978 resultados para Categorical data
Resumo:
Currently there are ~3000 known species of Sarcophagidae (Diptera), which are classified into 173 genera in three subfamilies. Almost 25% of sarcophagids belong to the genus Sarcophaga (sensu lato) however little is known about the validity of, and relationships between the ~150 (or more) subgenera of Sarcophaga s.l. In this preliminary study, we evaluated the usefulness of three sources of data for resolving relationships between 35 species from 14 Sarcophaga s.l. subgenera: the mitochondrial COI barcode region, ~800. bp of the nuclear gene CAD, and 110 morphological characters. Bayesian, maximum likelihood (ML) and maximum parsimony (MP) analyses were performed on the combined dataset. Much of the tree was only supported by the Bayesian and ML analyses, with the MP tree poorly resolved. The genus Sarcophaga s.l. was resolved as monophyletic in both the Bayesian and ML analyses and strong support was obtained at the species-level. Notably, the only subgenus consistently resolved as monophyletic was Liopygia. The monophyly of and relationships between the remaining Sarcophaga s.l. subgenera sampled remain questionable. We suggest that future phylogenetic studies on the genus Sarcophaga s.l. use combined datasets for analyses. We also advocate the use of additional data and a range of inference strategies to assist with resolving relationships within Sarcophaga s.l.
Resumo:
Big Data is a rising IT trend similar to cloud computing, social networking or ubiquitous computing. Big Data can offer beneficial scenarios in the e-health arena. However, one of the scenarios can be that Big Data needs to be kept secured for a long period of time in order to gain its benefits such as finding cures for infectious diseases and protecting patient privacy. From this connection, it is beneficial to analyse Big Data to make meaningful information while the data is stored securely. Therefore, the analysis of various database encryption techniques is essential. In this study, we simulated 3 types of technical environments, namely, Plain-text, Microsoft Built-in Encryption, and custom Advanced Encryption Standard, using Bucket Index in Data-as-a-Service. The results showed that custom AES-DaaS has a faster range query response time than MS built-in encryption. Furthermore, while carrying out the scalability test, we acknowledged that there are performance thresholds depending on physical IT resources. Therefore, for the purpose of efficient Big Data management in eHealth it is noteworthy to examine their scalability limits as well even if it is under a cloud computing environment. In addition, when designing an e-health database, both patient privacy and system performance needs to be dealt as top priorities.
Resumo:
This paper describes the work being conducted in the baseline rail level crossing project, supported by the Australian rail industry and the Cooperative Research Centre for Rail Innovation. The paper discusses the limitations of near-miss data for analysis obtained using current level crossing occurrence reporting practices. The project is addressing these limitations through the development of a data collection and analysis system with an underlying level crossing accident causation model. An overview of the methodology and improved data recording process are described. The paper concludes with a brief discussion of benefits this project is expected to provide the Australian rail industry.
Resumo:
This research aims to develop a reliable density estimation method for signalised arterials based on cumulative counts from upstream and downstream detectors. In order to overcome counting errors associated with urban arterials with mid-link sinks and sources, CUmulative plots and Probe Integration for Travel timE estimation (CUPRITE) is employed for density estimation. The method, by utilizing probe vehicles’ samples, reduces or cancels the counting inconsistencies when vehicles’ conservation is not satisfied within a section. The method is tested in a controlled environment, and the authors demonstrate the effectiveness of CUPRITE for density estimation in a signalised section, and discuss issues associated with the method.
Resumo:
Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to calculate and estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was done using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/ signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity 62.3% and specificity 75.9%. No differences in disease progression or T2-lesion volumes were observed among four levels of predicted genetic risk groups (high, medium, low, misclassified). On the other hand, a significant difference (F = 2.75, P = 0.04) was detected for age of disease onset between the affected misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and completeness of current knowledge in MS genetics.
Resumo:
We present a method for optical encryption of information, based on the time-dependent dynamics of writing and erasure of refractive index changes in a bulk lithium niobate medium. Information is written into the photorefractive crystal with a spatially amplitude modulated laser beam which when overexposed significantly degrades the stored data making it unrecognizable. We show that the degradation can be reversed and that a one-to-one relationship exists between the degradation and recovery rates. It is shown that this simple relationship can be used to determine the erasure time required for decrypting the scrambled index patterns. In addition, this method could be used as a straightforward general technique for determining characteristic writing and erasure rates in photorefractive media.
Resumo:
During the current (1995-present) eruptive phase of the Soufrière Hills volcano on Montserrat, voluminous pyroclastic flows entered the sea off the eastern flank of the island, resulting in the deposition of well-defined submarine pyroclastic lobes. Previously reported bathymetric surveys documented the sequential construction of these deposits, but could not image their internal structure, the morphology or extent of their base, or interaction with the underlying sediments. We show, by combining these bathymetric data with new high-resolution three dimensional (3D) seismic data, that the sequence of previously detected pyroclastic deposits from different phases of the ongoing eruptive activity is still well preserved. A detailed interpretation of the 3D seismic data reveals the absence of significant (> 3. m) basal erosion in the distal extent of submarine pyroclastic deposits. We also identify a previously unrecognized seismic unit directly beneath the stack of recent lobes. We propose three hypotheses for the origin of this seismic unit, but prefer an interpretation that the deposit is the result of the subaerial flank collapse that formed the English's Crater scarp on the Soufrière Hills volcano. The 1995-recent volcanic activity on Montserrat accounts for a significant portion of the sediments on the southeast slope of Montserrat, in places forming deposits that are more than 60. m thick, which implies that the potential for pyroclastic flows to build volcanic island edifices is significant.
Resumo:
The Bluetooth technology is being increasingly used to track vehicles throughout their trips, within urban networks and across freeway stretches. One important opportunity offered by this type of data is the measurement of Origin-Destination patterns, emerging from the aggregation and clustering of individual trips. In order to obtain accurate estimations, however, a number of issues need to be addressed, through data filtering and correction techniques. These issues mainly stem from the use of the Bluetooth technology amongst drivers, and the physical properties of the Bluetooth sensors themselves. First, not all cars are equipped with discoverable Bluetooth devices and the Bluetooth-enabled vehicles may belong to some small socio-economic groups of users. Second, the Bluetooth datasets include data from various transport modes; such as pedestrian, bicycles, cars, taxi driver, buses and trains. Third, the Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey for some vehicles may become a latent pattern that will need to be extracted from the data. Finally, sensors that are in close proximity to each other may have overlapping detection areas, thus making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. Further, we propose a methodology that can be followed, in order to cleanse, correct and aggregate Bluetooth data. We postulate that the methods introduced by this paper are the first crucial steps that need to be followed in order to compute accurate Origin-Destination matrices in urban road networks.
Resumo:
This thesis is a study for automatic discovery of text features for describing user information needs. It presents an innovative data-mining approach that discovers useful knowledge from both relevance and non-relevance feedback information. The proposed approach can largely reduce noises in discovered patterns and significantly improve the performance of text mining systems. This study provides a promising method for the study of Data Mining and Web Intelligence.
Resumo:
Literature is limited in its knowledge of the Bluetooth protocol based data acquisition process and in the accuracy and reliability of the analysis performed using the data. This paper extends the body of knowledge surrounding the use of data from the Bluetooth Media Access Control Scanner (BMS) as a complementary traffic data source. A multi layer simulation model named Traffic and Communication Simulation (TCS) is developed. TCS is utilised to model the theoretical properties of the BMS data and analyse the accuracy and reliability of travel time estimation using the BMS data.
Resumo:
A significant amount of speech data is required to develop a robust speaker verification system, but it is difficult to find enough development speech to match all expected conditions. In this paper we introduce a new approach to Gaussian probabilistic linear discriminant analysis (GPLDA) to estimate reliable model parameters as a linearly weighted model taking more input from the large volume of available telephone data and smaller proportional input from limited microphone data. In comparison to a traditional pooled training approach, where the GPLDA model is trained over both telephone and microphone speech, this linear-weighted GPLDA approach is shown to provide better EER and DCF performance in microphone and mixed conditions in both the NIST 2008 and NIST 2010 evaluation corpora. Based upon these results, we believe that linear-weighted GPLDA will provide a better approach than pooled GPLDA, allowing for the further improvement of GPLDA speaker verification in conditions with limited development data.
Resumo:
Travel time prediction has long been the topic of transportation research. But most relevant prediction models in the literature are limited to motorways. Travel time prediction on arterial networks is challenging due to involving traffic signals and significant variability of individual vehicle travel time. The limited availability of traffic data from arterial networks makes travel time prediction even more challenging. Recently, there has been significant interest of exploiting Bluetooth data for travel time estimation. This research analysed the real travel time data collected by the Brisbane City Council using the Bluetooth technology on arterials. Databases, including experienced average daily travel time are created and classified for approximately 8 months. Thereafter, based on data characteristics, Seasonal Auto Regressive Integrated Moving Average (SARIMA) modelling is applied on the database for short-term travel time prediction. The SARMIA model not only takes the previous continuous lags into account, but also uses the values from the same time of previous days for travel time prediction. This is carried out by defining a seasonality coefficient which improves the accuracy of travel time prediction in linear models. The accuracy, robustness and transferability of the model are evaluated through comparing the real and predicted values on three sites within Brisbane network. The results contain the detailed validation for different prediction horizons (5 min to 90 minutes). The model performance is evaluated mainly on congested periods and compared to the naive technique of considering the historical average.
Resumo:
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g. GPS or RFID sensors) as it does not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players need to be tracked continuously over long-periods of time (e.g. 35 minutes in field-hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix-up missed or false detections. However, in instances where a human can not intervene due to the sheer amount of data being generated - this data can not be used due to the missing/noisy data. In this paper, we investigate two representations based on raw player detections (and not tracking) which are immune to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
Resumo:
The promise of ‘big data’ has generated a significant deal of interest in the development of new approaches to research in the humanities and social sciences, as well as a range of important critical interventions which warn of an unquestioned rush to ‘big data’. Drawing on the experiences made in developing innovative ‘big data’ approaches to social media research, this paper examines some of the repercussions for the scholarly research and publication practices of those researchers who do pursue the path of ‘big data’–centric investigation in their work. As researchers import the tools and methods of highly quantitative, statistical analysis from the ‘hard’ sciences into computational, digital humanities research, must they also subscribe to the language and assumptions underlying such ‘scientificity’? If so, how does this affect the choices made in gathering, processing, analysing, and disseminating the outcomes of digital humanities research? In particular, is there a need to rethink the forms and formats of publishing scholarly work in order to enable the rigorous scrutiny and replicability of research outcomes?
Resumo:
It is only in recent years that the critical role that spatial data can play in disaster management and strengthening community resilience has been recognised. The recognition of this importance is singularly evident from the fact that in Australia spatial data is considered as soft infrastructure. In the aftermath of every disaster this importance is being increasingly strengthened with state agencies paying greater attention to ensuring the availability of accurate spatial data based on the lessons learnt. For example, the major flooding in Queensland during the summer of 2011 resulted in a comprehensive review of responsibilities and accountability for the provision of spatial information during such natural disasters. A high level commission of enquiry completed a comprehensive investigation of the 2011 Brisbane flood inundation event and made specific recommendations concerning the collection of and accessibility to spatial information for disaster management and for strengthening community resilience during and after a natural disaster. The lessons learnt and processes implemented were subsequently tested by natural disasters during subsequent years. This paper provides an overview of the practical implementation of the recommendations of the commission of enquiry. It focuses particularly on the measures adopted by the state agencies with the primary role for managing spatial data and the evolution of this role in Queensland State, Australia. The paper concludes with a review of the development of the role and the increasing importance of spatial data as an infrastructure for disaster planning and management which promotes the strengthening of community resilience.