364 results for Dataset


Relevance: 10.00%

Abstract:

Extracting and aggregating the event records relevant to an identified security incident from the multitude of heterogeneous logs in an enterprise network is a difficult challenge. Presenting the information in a meaningful way is an additional challenge. This paper looks at solutions to this problem by first identifying three main transforms: log collection, correlation, and visual transformation. Having identified that the CEE project will address the first transform, this paper focuses on the second, while the third is left for future work. To aggregate event records by correlating them, we demonstrate two correlation methods, simple and composite. These use a defined mapping schema and confidence values to dynamically query the normalised dataset and to constrain result events to a time window. Doing so improves the quality of the results, which is required for the iterative re-querying process. The final results of the process are output as nodes and edges suitable for presentation as a network graph.
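The simple correlation method can be pictured as an iterative re-query over the normalised dataset. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the record fields (`src_ip`, `user`, `host`), the mapping schema `MAPPING_CONFIDENCE` and its confidence values, the 300-second window and the depth limit are all hypothetical, and composite correlation is not shown. Starting from a seed event, it links any record that shares a sufficiently confident mapped field value and lies within the time window, then re-queries from the newly found records, returning nodes and edges for a network graph.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Event:
    """A normalised event record; the field names used below are hypothetical."""
    event_id: str
    timestamp: datetime
    attrs: tuple  # ((field, value), ...) pairs, kept as a tuple so the record is hashable

# Hypothetical mapping schema: normalised fields that may link two events,
# each with an assumed confidence value for links made through that field.
MAPPING_CONFIDENCE = {"src_ip": 0.9, "user": 0.7, "host": 0.5}

def correlate(seed, events, window=timedelta(seconds=300), min_conf=0.6, max_depth=3):
    """Simple correlation: repeatedly re-query for events that share a mapped field
    value with an already-found event and fall inside the time window.
    Returns (nodes, edges) suitable for a network graph."""
    nodes = {seed.event_id}
    edges = []
    frontier = [seed]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            current_attrs = dict(current.attrs)
            for other in events:
                if other.event_id in nodes:
                    continue
                # Constrain result events to the time window around the current event.
                if abs(other.timestamp - current.timestamp) > window:
                    continue
                other_attrs = dict(other.attrs)
                for fld, conf in MAPPING_CONFIDENCE.items():
                    value = current_attrs.get(fld)
                    # Follow only sufficiently confident mappings with a shared value.
                    if conf >= min_conf and value is not None and other_attrs.get(fld) == value:
                        nodes.add(other.event_id)
                        edges.append((current.event_id, other.event_id, fld, conf))
                        next_frontier.append(other)
                        break
        if not next_frontier:
            break
        frontier = next_frontier
    return nodes, edges

# Made-up records: e4 shares an IP with e1 but lies outside the window, and e5 shares
# only "host" with e2, a mapping whose assumed confidence is below the threshold.
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    Event("e1", t0, (("src_ip", "10.0.0.5"), ("user", "alice"))),
    Event("e2", t0 + timedelta(seconds=60), (("src_ip", "10.0.0.5"), ("host", "web01"))),
    Event("e3", t0 + timedelta(seconds=120), (("user", "alice"),)),
    Event("e4", t0 + timedelta(hours=2), (("src_ip", "10.0.0.5"),)),
    Event("e5", t0 + timedelta(seconds=90), (("host", "web01"),)),
]
nodes, edges = correlate(events[0], events)
```

Here the seed e1 pulls in e2 (shared IP) and e3 (shared user); re-querying from those finds nothing further, so the graph has three nodes and two edges.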

Relevance: 10.00%

Abstract:

Recommender systems (RS) are currently applied widely on commercial e-commerce sites to help users deal with the information overload problem. Recommender systems provide personalized recommendations to users and thus help them make good decisions about which product to buy from the vast number of product choices available to them. Many current recommender systems are developed for simple and frequently purchased products, such as books and videos, using collaborative-filtering and content-based approaches. These approaches are not suitable for recommending luxury and infrequently purchased products, as they rely on a large amount of ratings data that is not usually available for such products. This research aims to explore novel approaches for recommending infrequently purchased products by exploiting user-generated content such as user reviews and product clickstream data. From reviews given by previous users, association rules between product attributes are extracted using an association rule mining technique. From product clickstream data, user profiles are generated using the proposed user profiling approach. Two recommendation approaches are proposed based on the knowledge extracted from these resources. The first approach formulates a new query from the target user's initial query by expanding it with suitable association rules. In the second approach, collaborative-filtering and search-based approaches are integrated within a hybrid system. In this hybrid system, user profiles are used to find the target user's neighbours, and the products they subsequently viewed are then used to search for other relevant products. Experiments were conducted on a real-world dataset collected from an online car sales company in Australia to evaluate the effectiveness of the proposed recommendation approaches.
The experimental results show that user profiles generated from clickstream data and association rules generated from user reviews can improve recommendation accuracy. They also show that the proposed query expansion approach and the hybrid collaborative-filtering and search-based approach perform better than the baseline approaches. Integrating collaborative-filtering and search-based approaches has been challenging, as this strategy has not been widely explored so far, especially for recommending infrequently purchased products. This research therefore makes a theoretical contribution to the recommender system field by developing a new technique for combining collaborative-filtering and search-based approaches, as well as a new query expansion technique for recommending infrequently purchased products. It also makes a practical contribution through the development of a prototype system for recommending cars.
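The first approach, expanding a target user's query with association rules mined from reviews, can be sketched as below. This is a minimal illustration under assumptions, not the thesis's technique: the rule format (antecedent attributes ⇒ consequent attributes with a confidence), the car attribute names, the 0.6 confidence threshold and the policy of never overriding a user-specified attribute are all hypothetical, since the abstract does not give the rule-selection criteria.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

Attr = Tuple[str, str]  # (attribute, value), e.g. ("body_type", "SUV"); hypothetical names

class Rule(NamedTuple):
    antecedent: FrozenSet[Attr]
    consequent: FrozenSet[Attr]
    confidence: float

def expand_query(query: Set[Attr], rules: List[Rule], min_conf: float = 0.6) -> Set[Attr]:
    """Add the consequents of confident rules whose antecedent the initial query satisfies."""
    expanded = set(query)
    constrained = {attr for attr, _ in query}
    # Apply the most confident rules first so they win any conflict.
    for rule in sorted(rules, key=lambda r: r.confidence, reverse=True):
        if rule.confidence < min_conf or not rule.antecedent <= query:
            continue
        for attr, value in rule.consequent:
            # Never override an attribute the user (or a stronger rule) already fixed.
            if attr not in constrained:
                expanded.add((attr, value))
                constrained.add(attr)
    return expanded

rules = [
    Rule(frozenset({("body_type", "SUV")}), frozenset({("drive", "4WD")}), 0.8),
    Rule(frozenset({("body_type", "SUV")}), frozenset({("fuel", "diesel")}), 0.5),
    Rule(frozenset({("make", "Toyota")}), frozenset({("drive", "2WD")}), 0.7),
]
query = {("body_type", "SUV"), ("make", "Toyota")}
expanded = expand_query(query, rules)
```

With these made-up rules the query gains `("drive", "4WD")`: the conflicting 2WD rule is less confident, and the diesel rule falls below the threshold.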

Relevance: 10.00%

Abstract:

This project sought to investigate parameters of residual soil materials located in South East Queensland (SEQ), as determined from a large number of historical site investigation records. This was undertaken to quantify material parameter variability and to assess the validity of using commonly adopted correlations to estimate "typical" soil parameters for this region. A dataset of in situ and laboratory-derived residual soil parameters was constructed and analysed to identify potential correlations that related either to the entire area considered or to specific residual soils derived from a common parent material. The variability of SEQ soil parameters was generally found to be greater than that reported in equivalent studies that analysed datasets dominated by transported soils. Noteworthy differences in material properties also became evident when residual soils weathered from different parent materials were considered independently. Large variation was found between the correlations developed for specific soil types, which highlighted both the heterogeneity of the studied materials and the incompatibility of generic correlations with the residual soils present in SEQ. Region- and parent-material-specific correlations that estimate shear strength from in situ penetration tests have been proposed for the various residual soil types considered.

Relevance: 10.00%

Abstract:

It is often suggested that there is a psychological advantage to be leading in a competition. It is, however, hard to identify such an effect econometrically. Using a Regression Discontinuity Design over a large dataset of tennis matches (N=634,095), the present paper exploits the randomised variation in first-set results that occurs when the first set is decided by a close tie-break (N=72,294). I find that winning the first set has a significant and strong effect on the result of the second set. A player who wins a close first-set tie-break will, on average, win one game more in the second set. I discuss the likely economic and psychological explanations of this phenomenon.

Relevance: 10.00%

Abstract:

In this paper we present a new simulation methodology in order to obtain exact or approximate Bayesian inference for models for low-valued count time series data that have computationally demanding likelihood functions. The algorithm fits within the framework of particle Markov chain Monte Carlo (PMCMC) methods. The particle filter requires only model simulations and, in this regard, our approach has connections with approximate Bayesian computation (ABC). However, an advantage of using the PMCMC approach in this setting is that simulated data can be matched with data observed one-at-a-time, rather than attempting to match on the full dataset simultaneously or on a low-dimensional non-sufficient summary statistic, which is common practice in ABC. For low-valued count time series data we find that it is often computationally feasible to match simulated data with observed data exactly. Our particle filter maintains $N$ particles by repeating the simulation until $N+1$ exact matches are obtained. Our algorithm creates an unbiased estimate of the likelihood, resulting in exact posterior inferences when included in an MCMC algorithm. In cases where exact matching is computationally prohibitive, a tolerance is introduced as per ABC. A novel aspect of our approach is that we introduce auxiliary variables into our particle filter so that partially observed and/or non-Markovian models can be accommodated. We demonstrate that Bayesian model choice problems can be easily handled in this framework.
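The exact-matching particle filter can be sketched for a toy model. Everything model-specific here is an assumption for illustration, not the paper's model: a latent INAR(1) count process (binomial thinning of the previous count with survival probability `alpha`, plus Poisson(`lam`) arrivals), observed through binomial under-reporting with probability `rho`, with a known initial count `x0`; the auxiliary-variable extension for non-Markovian models and the ABC tolerance are omitted. At each time step it simulates until N + 1 simulated observations match the data exactly, keeps the first N matching states, and multiplies the likelihood estimate by N / (trials − 1); the product of these factors is the unbiased likelihood estimator of this "alive" construction.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's multiplication method; adequate for the small rates of low-valued counts.
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def binomial(n, prob, rng):
    return sum(rng.random() < prob for _ in range(n))

def simulate_step(x_prev, alpha, lam, rho, rng):
    """Propagate the hypothetical latent INAR(1) state and simulate one observation."""
    x_t = binomial(x_prev, alpha, rng) + poisson(lam, rng)
    y_t = binomial(x_t, rho, rng)
    return x_t, y_t

def alive_pf_loglik(y, n_particles, alpha, lam, rho, x0, rng, max_trials=10**6):
    """Log of a likelihood estimate that matches each observation exactly."""
    particles = [x0] * n_particles
    loglik = 0.0
    for y_obs in y:
        matches, trials = [], 0
        # Keep simulating until N + 1 simulated observations equal y_obs.
        while len(matches) < n_particles + 1:
            trials += 1
            if trials > max_trials:
                # Practical guard (not part of the exact method): treat as zero likelihood.
                return -math.inf
            parent = rng.choice(particles)  # matched particles are equally weighted
            x_t, y_sim = simulate_step(parent, alpha, lam, rho, rng)
            if y_sim == y_obs:
                matches.append(x_t)
        # N / (trials - 1), not (N + 1) / trials, is what keeps the estimator unbiased.
        loglik += math.log(n_particles / (trials - 1))
        particles = matches[:n_particles]  # discard the (N + 1)-th match
    return loglik
```

For example, `alive_pf_loglik([3, 2, 4], n_particles=500, alpha=0.5, lam=1.5, rho=0.8, x0=2, rng=random.Random(0))` returns one noisy log-likelihood estimate; inside a PMCMC sampler it would be recomputed at each proposed parameter value.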

Relevance: 10.00%

Abstract:

Atmospheric ultrafine particles play an important role in affecting human health, altering climate and degrading visibility. Numerous studies have been conducted to better understand the formation process of these particles, including field measurements, laboratory chamber studies and mathematical modeling approaches. Field studies on new particle formation found that formation processes were significantly affected by atmospheric conditions, such as the availability of particle precursors and meteorological conditions. However, those studies were mainly carried out in rural areas of the Northern Hemisphere, and information on new particle formation in urban areas, especially those in subtropical regions, is limited. In general, subtropical regions display a higher level of solar radiation, along with stronger photochemical reactivity, than the regions investigated in previous studies. However, based on the results of these studies, the mechanisms involved in the new particle formation process remain unclear, particularly in the Southern Hemisphere. Therefore, to fill this gap in knowledge, a new particle formation study was conducted in a subtropical urban area in the Southern Hemisphere during 2009, measuring particle size distributions at different locations in Brisbane, Australia. Nucleation events were characterised at the campus building of the Queensland University of Technology (QUT), located in an urban area of Brisbane. Overall, the annual average number concentrations of ultrafine, Aitken and nucleation mode particles were found to be 9.3 × 10³, 3.7 × 10³ and 5.6 × 10³ cm⁻³, respectively. These levels were comparable to those measured in urban areas of northern Europe, but lower than those in polluted urban areas such as the Yangtze River Delta, China, and Huelva and Santa Cruz de Tenerife, Spain.
Average particle number concentration (PNC) in the Brisbane region did not show significant seasonal variation; however, a relatively large variation was observed during the warmer season. The diurnal variations of Aitken and nucleation mode particles displayed different patterns, suggesting that direct vehicle exhaust emissions were a major contributor to Aitken mode particles, while nucleation mode particles originated from vehicle exhaust emissions in the morning and from photochemical production around noon. A total of 65 nucleation events were observed during 2009, of which 40 were classified as nucleation growth events and the remainder as nucleation burst events. An interesting observation in this study was that all nucleation growth events were associated with vehicle exhaust emission plumes, while the nucleation burst events were associated with emission plumes from an industrial area. The average particle growth rate for nucleation events was found to be 4.6 nm h⁻¹ (range 1.79–7.78 nm h⁻¹), which is comparable to other urban studies conducted in the United States, while monthly particle growth rates were positively related to monthly solar radiation (r = 0.76, p < 0.05). The particle growth rates reported in this work are the first of their kind for a subtropical urban area of Australia. Furthermore, the influence of nucleation events on PNC within the urban airshed was investigated. PNC was measured simultaneously at urban (QUT), roadside (Woolloongabba) and semi-urban (Rocklea) sites in Brisbane during 2009. Total PNC at these sites was found to be significantly affected by regional nucleation events. The fractions of total daily PNC attributable to regional nucleation events were 12%, 9% and 14% at QUT, Woolloongabba and Rocklea, respectively.
These values were higher than the fractions resulting from vehicle exhaust emissions during weekday mornings, which ranged from 5.1% to 5.5% at QUT and Woolloongabba. In addition, PNC in the semi-urban area of Rocklea increased by a factor of 15.4 when urban pollution sources were upwind of the site under the influence of nucleation burst events. Finally, we investigated the influence of sulphuric acid on new particle formation in the study region. An H₂SO₄ proxy was calculated from [SO₂], solar radiation and particle condensation sink data to represent the strength of new particle production in the urban, roadside and semi-urban areas of Brisbane during June–July 2009. The temporal variations of the H₂SO₄ proxy and the nucleation mode particle concentration were in phase during nucleation events in the urban and roadside areas. In contrast, the proxy peaked 1–2 h before the observed peak in nucleation mode particle concentration at the downwind semi-urban area of Brisbane. A moderate to strong linear relationship was found between the proxy and freshly formed particles, with r² values of 0.26–0.77 during the nucleation events. In addition, the log[H₂SO₄ proxy] required to produce new particles was found to be ~1.0 ppb W m⁻² s and below 0.5 ppb W m⁻² s for the urban and semi-urban areas, respectively. Particle growth rates were similar during nucleation events at the three study locations, with an average of 2.7 ± 0.5 nm h⁻¹. This result suggests that a similar nucleation mechanism, strongly related to sulphuric acid concentration, dominated in the study region; however, the relationship between the proxy and PNC was poor in the semi-urban area of Rocklea. This can be explained by the nucleation process having been initiated upwind of the site, with the resulting particles transported by the wind to Rocklea.
This explanation is also supported by the larger geometric mean diameter observed for particles during the nucleation event and by the time lag between the H₂SO₄ proxy and PNC observed at Rocklea. In summary, particle size distributions were measured continuously in a subtropical urban area of the Southern Hemisphere during 2009, forming the first particle size distribution dataset for the study region. The characteristics of nucleation events in the Brisbane region were quantified, and the properties of nucleation growth and burst events are discussed in detail using a case-study approach. To further investigate the influence of nucleation events on PNC in the study region, PNC was measured simultaneously at three locations to examine its spatial variation during regional nucleation events. In addition, the impact of upwind urban pollution on the downwind semi-urban area was quantified during these events. Sulphuric acid was found to be an important factor influencing new particle formation in the urban and roadside areas of the study region; however, a direct relationship with nucleation events at the semi-urban site was not observed. This study provides an overview of new particle formation in the Brisbane region and its influence on PNC in the surrounding area. Its findings are the first of their kind for an urban area in the Southern Hemisphere.
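The proxy and lag analysis can be sketched numerically. The abstract names only the inputs ([SO₂], solar radiation, condensation sink) and the units (ppb W m⁻² s), which are consistent with the commonly used form proxy = [SO₂] × radiation / CS; that form, the hourly sampling and the lag search below are assumptions for illustration, not the thesis's exact procedure. `best_lag` shifts the nucleation-mode series against the proxy and keeps the shift with the highest Pearson r, the kind of comparison behind a proxy peak that leads the particle peak by 1–2 h.

```python
from math import sqrt
from typing import List, Sequence

def h2so4_proxy(so2_ppb: Sequence[float], radiation_w_m2: Sequence[float],
                cs_per_s: Sequence[float]) -> List[float]:
    """Assumed proxy form [SO2] * radiation / CS, in ppb W m^-2 s."""
    return [s * r / c for s, r, c in zip(so2_ppb, radiation_w_m2, cs_per_s)]

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def best_lag(proxy: Sequence[float], nuc: Sequence[float], max_lag: int) -> int:
    """Number of samples by which the nucleation-mode series trails the proxy,
    chosen by maximum Pearson r (series must have equal length)."""
    best, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Pair proxy[i] with nuc[i + lag].
        r = pearson_r(proxy[: len(proxy) - lag], nuc[lag:])
        if r > best_r:
            best, best_r = lag, r
    return best
```

With hourly samples, a returned lag of 2 would correspond to the nucleation-mode peak following the proxy peak by about 2 h.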

Relevance: 10.00%

Abstract:

This study used next generation sequencing technologies to investigate growth in a cultured crustacean. The objective was to identify and characterise specific gene loci that contribute important phenotypic variation to growth as well as to develop a large set of SNP markers in candidate genes for assessing correlations between specific mutations and individual growth performance. The genomic dataset generated provides a fundamental resource for application in future crustacean stock improvement programs. Ultimately, the data can be applied to development of culture lines with improved growth performance.

Relevance: 10.00%

Abstract:

Benhabib and Spiegel (1994) examine the role of human capital in the development process empirically using a theory-driven specification rather than the standard production function approach. While they find evidence of a positive impact of human capital on income growth, their result is not robust to the inclusion of inequality as an additional covariate. Using an alternative dataset and different measures of inequality, we find robust support for the hypothesis that human capital matters even when we account for the adverse effect of income inequality on growth.

Relevance: 10.00%

Abstract:

In this paper we propose a method to generate a large-scale, accurate, dense 3D semantic map of street scenes. A dense 3D semantic model of the environment can significantly improve a number of robotic applications such as autonomous driving, navigation or localisation. Instead of using offline-trained classifiers for semantic segmentation, our approach employs a data-driven, nonparametric scene parsing method that scales easily to large environments and generalises to different scenes. We use stereo image pairs collected from cameras mounted on a moving car to produce dense depth maps, which are combined into a global 3D reconstruction using camera poses from stereo visual odometry. Simultaneously, 2D semantic segmentation produced automatically by the nonparametric scene parsing method is fused into the 3D model. Furthermore, the resulting 3D semantic model is improved by taking moving objects in the scene into account. We demonstrate our method on the publicly available KITTI dataset and evaluate its performance against manually generated ground truth.
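The depth-map step rests on standard rectified-stereo geometry: depth Z = f·B / d for focal length f (pixels), baseline B (metres) and disparity d (pixels). Each pixel is then back-projected with the pinhole intrinsics and moved into the global frame with the visual-odometry pose, p_w = R·p_c + t. The sketch below shows that chain for a single pixel; the function names, intrinsics and pose values are hypothetical (not KITTI calibration values or the paper's code), and semantic label fusion is not shown.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def disparity_to_depth(d_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_px

def back_project(u: float, v: float, z: float, fx: float, fy: float,
                 cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of pixel (u, v) at depth z into the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def camera_to_world(p_cam: Vec3, rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Apply a camera-to-world pose (R, t), e.g. from visual odometry: p_w = R p_c + t."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

For instance, with a made-up 700 px focal length and 0.5 m baseline, a 35 px disparity gives a depth of 10 m; applying the same chain to every valid pixel of every frame yields the points that are accumulated into the global reconstruction.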

Relevance: 10.00%

Abstract:

Despite the increasing number of immigrants, there is a limited body of literature describing the use of hospital emergency department (ED) care by immigrants in Australia. This study aims to describe how immigrants from refugee source countries (IRSC) use ED care, compared with immigrants from the main English-speaking countries (MESC), immigrants from other countries (IOC) and the local population in Queensland. A retrospective analysis of a Queensland state-wide hospital ED dataset (ED Information System) from 1 January 2008 to 31 December 2010 was conducted. Our study showed that immigrants are not a homogeneous group. We found that IRSC are more likely than IOC to use interpreters in the ED (8.9%). Furthermore, compared with the Australian-born population, IRSC have a higher rate of ambulance use (odds ratio 1.2, 95% confidence interval (CI) 1.2–1.3), are less likely to be admitted to hospital from the ED (odds ratio 0.7, 95% CI 0.7–0.8), and have a longer ED length of stay (LOS; mean difference 33.0 minutes, 95% CI 28.8–37.2). Our findings highlight the need to develop policies and educational interventions to ensure the equitable use of health services among vulnerable immigrant populations.
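Odds ratios with 95% confidence intervals, like those reported above, are conventionally computed from a 2 × 2 table, with the interval taken on the log scale (Woolf's method). The sketch below shows that calculation on made-up counts; the study's estimates may instead come from regression models with covariate adjustment, which this abstract does not specify.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for the 2x2 table

                   outcome   no outcome
        group         a           b
        reference     c           d
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: 120 of 1,000 group presentations and 1,000 of 10,000
# reference presentations arrived by ambulance.
or_, lo, hi = odds_ratio_ci(120, 880, 1000, 9000)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why reported intervals such as 1.2–1.3 around 1.2 need not be centred.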

Relevance: 10.00%

Abstract:

Currently there are ~3000 known species of Sarcophagidae (Diptera), classified into 173 genera in three subfamilies. Almost 25% of sarcophagids belong to the genus Sarcophaga (sensu lato); however, little is known about the validity of, and relationships between, the ~150 (or more) subgenera of Sarcophaga s.l. In this preliminary study, we evaluated the usefulness of three sources of data for resolving relationships between 35 species from 14 Sarcophaga s.l. subgenera: the mitochondrial COI barcode region, ~800 bp of the nuclear gene CAD, and 110 morphological characters. Bayesian, maximum likelihood (ML) and maximum parsimony (MP) analyses were performed on the combined dataset. Much of the tree was supported only by the Bayesian and ML analyses, with the MP tree poorly resolved. The genus Sarcophaga s.l. was resolved as monophyletic in both the Bayesian and ML analyses, and strong support was obtained at the species level. Notably, the only subgenus consistently resolved as monophyletic was Liopygia. The monophyly of, and relationships between, the remaining Sarcophaga s.l. subgenera sampled remain questionable. We suggest that future phylogenetic studies on the genus Sarcophaga s.l. use combined datasets for analyses. We also advocate the use of additional data and a range of inference strategies to help resolve relationships within Sarcophaga s.l.

Relevance: 10.00%

Abstract:

Background: Random Breath Testing (RBT) is the main drink-driving law enforcement tool used throughout Australia. International comparative research considers Australia to have the most successful RBT program of any country in terms of crash reductions (Erke, Goldenbeld, & Vaa, 2009). This success is attributed to the program's high intensity (Erke et al., 2009). Our review of the extant literature found no research evidence indicating an optimal level of alcohol breath testing. That is, no research exists to guide policy on whether there is a point of diminishing returns in the reduction of alcohol-related crashes achieved through either saturated or targeted RBT. Aims: In this paper we first examine RBTs and alcohol-related crashes across Australian jurisdictions. We then address whether an optimal level of random breath testing exists by examining the relationship between the number of RBTs conducted and the occurrence of alcohol-related crashes over time, across all Australian states. Method: To examine the association between RBT rates and alcohol-related crashes, and to assess whether an optimal ratio of RBT tests per licensed driver can be determined, we draw on three administrative data sources from each jurisdiction. Where possible, the data span 1 January 2000 to 30 September 2012. The RBT administrative dataset includes the number of random breath tests conducted per month. The traffic crash administrative dataset contains aggregated monthly counts of traffic crashes in which an individual's recorded BAC reached or exceeded 0.05 g/100 mL. The licensed driver data are the monthly number of registered licensed drivers from January 2000 to December 2011. Results: The data highlight that the Australian story is not reflective of all states and territories.
The stable RBT to licensed driver ratio in Queensland (1:1) is accompanied by a stable alcohol-related crash rate of 5.5 per 100,000 licensed drivers. Yet in South Australia, where a relatively stable RBT to licensed driver ratio of 1:2 is maintained, the rate of alcohol-related traffic crashes is substantially lower, at 3.7 per 100,000. We use joinpoint regression techniques and a range of regression models to fit the data and compare patterns between jurisdictions. Discussion: The results of this study provide an updated review and evaluation of RBTs conducted in Australia and examine the association between RBTs and alcohol-related traffic crashes. We also present an evidence base to guide policy decisions for RBT operations.
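Joinpoint regression fits connected linear segments whose change points are estimated from the data. A minimal single-joinpoint version is sketched below: for each candidate τ it fits y = b0 + b1·t + b2·(t − τ)₊ by ordinary least squares and keeps the τ with the smallest residual sum of squares. The series `y` could be, for example, monthly alcohol-related crashes per 100,000 licensed drivers (crashes / drivers × 100,000), but the data here are hypothetical. Dedicated joinpoint software typically also chooses the number of joinpoints (for example by permutation tests) and often fits log-linear models; none of that is reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

def fit_one_joinpoint(t: Sequence[float], y: Sequence[float],
                      candidates: Sequence[float]) -> Optional[Tuple[float, List[float], float]]:
    """Return (tau, [b0, b1, b2], sse) for the best continuous two-segment fit.
    Candidates must lie strictly inside the range of t, or the hinge column is all zeros."""
    best = None
    for tau in candidates:
        rows = [[1.0, ti, max(ti - tau, 0.0)] for ti in t]
        # Normal equations: (X^T X) beta = X^T y.
        xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        beta = solve3(xtx, xty)
        sse = sum((yi - sum(bk * xk for bk, xk in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
        if best is None or sse < best[2]:
            best = (tau, beta, sse)
    return best
```

The slope before the joinpoint is b1 and after it is b1 + b2, so a negative b2 marks a point where the trend flattens or reverses.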

Relevance: 10.00%

Abstract:

The study assessed natural levels and patterns of genetic variation in Arabian Gulf populations of a native pearl oyster to define wild population structure, considering potential intrinsic and extrinsic factors that could influence any structure detected. The study was also the first attempt to develop microsatellite markers and to generate a genome survey sequence (GSS) dataset for the target species using next-generation sequencing technology. The partial genome dataset generated has potential future applications in biotechnology and in pearl oyster farming.

Relevance: 10.00%

Abstract:

Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was used to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was performed using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with a classification sensitivity of 62.3% and specificity of 75.9%. No differences in disease progression or T2-lesion volumes were observed among the four predicted genetic risk groups (high, medium, low, misclassified).
On the other hand, a significant difference (F = 2.75, P = 0.04) in age of disease onset was detected between affected individuals misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and the completeness of current knowledge in MS genetics.
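Reading the cumulative genetic risk (P-Hat) as the case probability predicted by the fitted logistic model, classification sensitivity and specificity follow from thresholding it. The sketch below assumes additive allele-count coding (0, 1 or 2 risk alleles per marker), hypothetical coefficients and a 0.5 threshold; none of these are stated in the abstract, and the real profile uses 350 markers rather than three.

```python
import math
from typing import Sequence, Tuple

def p_hat(genotypes: Sequence[int], intercept: float, betas: Sequence[float]) -> float:
    """Predicted case probability from a logistic model on allele counts."""
    eta = intercept + sum(b * g for b, g in zip(betas, genotypes))
    return 1.0 / (1.0 + math.exp(-eta))

def sensitivity_specificity(probs: Sequence[float], is_case: Sequence[bool],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(p >= threshold and c for p, c in zip(probs, is_case))
    fn = sum(p < threshold and c for p, c in zip(probs, is_case))
    tn = sum(p < threshold and not c for p, c in zip(probs, is_case))
    fp = sum(p >= threshold and not c for p, c in zip(probs, is_case))
    return tp / (tp + fn), tn / (tn + fp)
```

Cases whose P-Hat falls below the threshold are the "misclassified" group referred to above.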

Relevance: 10.00%

Abstract:

We conducted an association study across the human leukocyte antigen (HLA) complex to identify loci associated with multiple sclerosis (MS). Comparing 1927 SNPs in 1618 MS cases and 3413 controls of European ancestry, we identified seven SNPs that were independently associated with MS conditional on the others. All associations were significant in an independent replication cohort of 2212 cases and 2251 controls and were highly significant in the combined dataset. The associated SNPs included proxies for HLA-DRB1*15:01 and HLA-DRB1*03:01, and SNPs in moderate linkage disequilibrium (LD) with HLA-A*02:01, HLA-DRB1*04:01 and HLA-DRB1*13:03. We also found a strong association with rs9277535 in the class II gene HLA-DPB1. HLA-DPB1 is located centromeric of the more commonly typed class II genes HLA-DRB1, -DQA1 and -DQB1. It is separated from these genes by a recombination hotspot, and the association is not affected by conditioning on genotypes at DRB1, DQA1 and DQB1. Hence rs9277535 represents an independent MS-susceptibility locus of genome-wide significance. It is correlated with the HLA-DPB1*03:01 allele, which has previously been implicated in MS in smaller studies. Further genotyping in large datasets is required to confirm and resolve this association.