809 results for Superparamagnetic clustering
Abstract:
The Bluetooth technology is being increasingly used to track vehicles throughout their trips, within urban networks and across freeway stretches. One important opportunity offered by this type of data is the measurement of Origin-Destination patterns, emerging from the aggregation and clustering of individual trips. In order to obtain accurate estimations, however, a number of issues need to be addressed through data filtering and correction techniques. These issues mainly stem from the patterns of Bluetooth adoption amongst drivers and from the physical properties of the Bluetooth sensors themselves. First, not all cars are equipped with discoverable Bluetooth devices, and the Bluetooth-enabled vehicles may belong to a few small socio-economic groups of users. Second, Bluetooth datasets include data from various transport modes, such as pedestrians, bicycles, cars, taxis, buses and trains. Third, the Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey of some vehicles may become a latent pattern that needs to be extracted from the data. Finally, sensors in close proximity to each other may have overlapping detection areas, making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. We then propose a methodology for cleansing, correcting and aggregating Bluetooth data. We postulate that the methods introduced in this paper are the first crucial steps towards computing accurate Origin-Destination matrices in urban road networks.
Abstract:
Smart Card data from Automated Fare Collection systems has been considered a promising source of information for transit planning. However, the literature has been limited to mining travel patterns from transit users and suggesting the potential of using this information. This paper proposes a method for mining the spatially regular origins and destinations and the temporally habitual travelling times of transit users. This travel regularity is discussed as being useful for transit planning. After reconstructing the travel itineraries, three levels of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) are utilised to retrieve the travel regularity of each frequent transit user. Analyses of passenger classification and personal travel time variability estimation are performed as examples of using travel regularity in transit planning. The methodology introduced in this paper is of interest to transit authorities for planning and management.
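As an illustration of the density-based step described above, the following sketch runs scikit-learn's DBSCAN on synthetic boarding coordinates for a single frequent user; the coordinates, eps and min_samples values are our own toy assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic boarding locations (lon, lat) for one frequent transit user:
# two habitual origins plus one isolated stop. Values are illustrative only.
rng = np.random.default_rng(0)
home = rng.normal(loc=[153.02, -27.47], scale=0.001, size=(20, 2))
work = rng.normal(loc=[153.05, -27.50], scale=0.001, size=(20, 2))
isolated = np.array([[153.10, -27.40]])
stops = np.vstack([home, work, isolated])

# Density-based clustering: points with at least `min_samples` neighbours
# within `eps` degrees form a cluster; isolated points get label -1 (noise).
labels = DBSCAN(eps=0.005, min_samples=5).fit_predict(stops)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)   # two habitual boarding areas recovered
print(labels[-1])   # the one-off stop is flagged as noise (-1)
```

Running DBSCAN at several eps levels, as the paper's three-level procedure suggests, would separate regularity at progressively coarser spatial scales.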
Abstract:
Crashes on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestion. Hence, reducing crashes will help address congestion issues (Meyer, 2008). Crash likelihood estimation studies commonly focus on traffic conditions in a short time window around the time of a crash, while longer-term pre-crash traffic flow trends are neglected. In this paper we show, through data mining techniques, that a relationship between pre-crash traffic flow patterns and crash occurrence on motorways exists, and that this knowledge has the potential to improve the accuracy of existing models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with traffic flow data from the hour prior to each crash using an incident detection algorithm. Traffic flow trends (traffic speed/occupancy time series) revealed that crashes could be clustered with regard to the dominant traffic flow pattern prior to the crash. Using the k-means clustering method allowed the crashes to be clustered based on their flow trends rather than their spatial distance. Four major trends were found in the clustering results. Based on these findings, crash likelihood estimation algorithms can be fine-tuned based on the monitored traffic flow conditions with a sliding window of 60 minutes to increase the accuracy of the results and minimise false alarms.
Abstract:
Crashes that occur on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestion. Hence, reducing the frequency of crashes assists in addressing congestion issues (Meyer, 2008). Crash likelihood estimation studies commonly focus on traffic conditions in a short time window around the time of a crash, while longer-term pre-crash traffic flow trends are neglected. In this paper we show, through data mining techniques, that a relationship between pre-crash traffic flow patterns and crash occurrence on motorways exists. We compare these patterns with normal traffic trends and show that this knowledge has the potential to improve the accuracy of existing models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with their corresponding traffic flow data using an incident detection algorithm. Traffic trends (traffic speed time series) revealed that crashes can be clustered with regard to the dominant traffic patterns prior to the crash. Using the K-Means clustering method with the Euclidean distance function allowed the crashes to be clustered. Then, normal situation data were extracted based on the time distribution of crashes and were clustered for comparison with the “high risk” clusters. Five major trends were found in the clustering results for both high-risk and normal conditions. The study discovered that the traffic regimes differed in their speed trends. Based on these findings, crash likelihood estimation models can be fine-tuned based on the monitored traffic conditions with a sliding window of 30 minutes to increase the accuracy of the results and minimise false alarms.
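The clustering step named above can be sketched as follows, using scikit-learn's KMeans with the Euclidean distance on synthetic pre-crash speed series; the two toy regimes and k = 2 are illustrative assumptions, not the study's 824-crash dataset or its five trends.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 30-minute pre-crash speed profiles (one value per minute, km/h).
# Two toy regimes: steady free flow (~80 km/h) and a congested slowdown.
rng = np.random.default_rng(1)
t = np.arange(30)
free_flow = 80 + rng.normal(0, 2, size=(10, 30))
congested = 80 - 2 * t + rng.normal(0, 2, size=(10, 30))  # decaying speed
profiles = np.vstack([free_flow, congested])

# K-Means treats each 30-dimensional speed series as a point in Euclidean
# space, so crashes are grouped by the shape of the pre-crash speed trend.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = km.labels_

# All free-flow profiles share one label, all congested profiles the other.
print(len(set(labels[:10])), len(set(labels[10:])))
```

With real data, the cluster centroids themselves are the "dominant trends" the abstract refers to, and k would be chosen by inspecting how distinct the centroid shapes are.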
Abstract:
Cloud computing is an emerging computing paradigm in which IT resources are provided over the Internet as a service to users. One such service offered through the Cloud is Software as a Service, or SaaS. SaaS can be delivered in a composite form, consisting of a set of application and data components that work together to deliver higher-level functional software. SaaS is receiving substantial attention today from both software providers and users, and analyst firms predict a positive future market for it. This raises new challenges for providers managing SaaS, especially in large-scale data centres like the Cloud. One of these challenges is providing management of Cloud resources for SaaS that guarantees SaaS performance while optimising resource use. Extensive research on the resource optimisation of Cloud services has not yet addressed the challenges of managing resources for composite SaaS. This research addresses this gap by focusing on three new problems of composite SaaS: placement, clustering and scalability. The overall aim is to develop efficient and scalable mechanisms that facilitate the delivery of high-performance composite SaaS for users while optimising the resources used. All three problems are characterised as highly constrained, large-scale and complex combinatorial optimisation problems. Therefore, evolutionary algorithms are adopted as the main technique for solving them. The first research problem concerns how a composite SaaS is placed onto Cloud servers to optimise its performance while satisfying the SaaS resource and response time constraints. Existing research on this problem often ignores the dependencies between components and considers the placement of a homogeneous type of component only. A precise formulation of the composite SaaS placement problem is presented.
A classical genetic algorithm and two versions of cooperative co-evolutionary algorithms are designed to manage the placement of heterogeneous types of SaaS components together with their dependencies, requirements and constraints. Experimental results demonstrate the efficiency and scalability of these new algorithms. In the second problem, SaaS components are assumed to be already running on Cloud virtual machines (VMs). However, due to the dynamic nature of the Cloud environment, the current placement may need to be modified. Existing techniques have focused mostly on the infrastructure level rather than the application level. This research addresses the problem at the application level by clustering suitable components onto VMs to optimise resource use and maintain SaaS performance. Two versions of grouping genetic algorithms (GGAs) are designed to cater for the structural grouping of a composite SaaS. The first GGA uses a repair-based method while the second uses a penalty-based method to handle the problem constraints. The experimental results confirmed that the GGAs always produced a better reconfiguration placement plan than a common heuristic for clustering problems. The third research problem deals with the replication or deletion of SaaS instances in coping with the SaaS workload. Determining a scaling plan that minimises resource use while maintaining SaaS performance is a critical task. Additionally, the problem involves constraints and interdependencies between components, making solutions even more difficult to find. A hybrid genetic algorithm (HGA) was developed to solve this problem by exploring the problem search space through its genetic operators and fitness function to determine the SaaS scaling plan. The HGA also uses the problem's domain knowledge to ensure that the solutions meet the problem's constraints and achieve its objectives.
The experimental results demonstrated that the HGA consistently outperformed a heuristic algorithm by achieving a low-cost scaling and placement plan. This research has identified three significant new problems for composite SaaS in the Cloud. Various types of evolutionary algorithms have been developed to address these problems, contributing to the evolutionary computation field. The algorithms provide solutions for efficient resource management of composite SaaS in the Cloud, resulting in a low total cost of ownership for users while guaranteeing SaaS performance.
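As a heavily simplified sketch of the penalty-based evolutionary approach described in this abstract (not the thesis's actual GGA or HGA), the toy example below assigns components to VMs with a genetic algorithm whose fitness adds a penalty for capacity violations; all demands, capacities and GA parameters are invented for illustration.

```python
import random

random.seed(42)

# Toy instance: assign 6 SaaS components to 3 VMs. Each component has a
# resource demand; each VM has a capacity. All numbers are illustrative.
demand = [4, 3, 2, 5, 1, 2]
capacity = [8, 8, 8]
N_VMS = len(capacity)

def fitness(assign):
    # Objective: use few VMs; capacity violations are penalised rather than
    # repaired (the penalty-based variant mentioned above). Lower is better.
    load = [0] * N_VMS
    for comp, vm in enumerate(assign):
        load[vm] += demand[comp]
    vms_used = sum(1 for l in load if l > 0)
    overload = sum(max(0, l - c) for l, c in zip(load, capacity))
    return vms_used + 10 * overload

def evolve(pop_size=40, gens=60):
    pop = [[random.randrange(N_VMS) for _ in demand] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(demand))  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:               # mutation
                child[random.randrange(len(demand))] = random.randrange(N_VMS)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print(fitness(best))  # total demand 17 forces all 3 VMs; no overload penalty
```

The real thesis problems add component dependencies and response-time constraints to the fitness, which is where domain knowledge enters the HGA.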
Abstract:
Mathematical descriptions of birth–death–movement processes are often calibrated to measurements from cell biology experiments to quantify tissue growth rates. Here we describe and analyze a discrete model of a birth–death–movement process applied to a typical two–dimensional cell biology experiment. We present three different descriptions of the system: (i) a standard mean–field description which neglects correlation effects and clustering; (ii) a moment dynamics description which approximately incorporates correlation and clustering effects; and (iii) averaged data from repeated discrete simulations which directly incorporate correlation and clustering effects. Comparing these three descriptions indicates that the mean–field and moment dynamics approaches are valid only for certain parameter regimes, and that both descriptions fail to make accurate predictions of the system for sufficiently fast birth and death rates, where the effects of spatial correlations and clustering are sufficiently strong. Without any method to distinguish between the parameter regimes where these descriptions are valid, it is possible that either the mean–field or moment dynamics model could be calibrated to experimental data under inappropriate conditions, leading to errors in parameter estimation. In this work we demonstrate that a simple measurement of agent clustering and correlation, based on coordination number data, provides an indirect measure of agent correlation and clustering effects, and can therefore be used to distinguish between the validity of the different descriptions of the birth–death–movement process.
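A minimal sketch of description (iii), a repeated discrete simulation with crowding effects, might look as follows; the lattice rules and parameter values are our own assumptions, not the paper's model, and serve only to illustrate how exclusion generates the correlations that a mean-field description ignores.

```python
import random

random.seed(0)

# Agents on a periodic square lattice undergo death, birth and movement.
# Birth and movement target a random neighbour and are aborted if it is
# occupied: this exclusion builds the spatial correlations and clustering
# that the mean-field description neglects. Parameters are illustrative.
L = 20                               # lattice side length
P_MOVE, P_BIRTH, P_DEATH = 1.0, 0.05, 0.01

def simulate(steps=100):
    # Start from ~10% random occupancy.
    occ = {(i, j) for i in range(L) for j in range(L) if random.random() < 0.1}
    for _ in range(steps):
        for (i, j) in list(occ):
            if (i, j) not in occ:    # agent died earlier this step
                continue
            if random.random() < P_DEATH:
                occ.discard((i, j))
                continue
            di, dj = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            target = ((i + di) % L, (j + dj) % L)
            if random.random() < P_BIRTH and target not in occ:
                occ.add(target)      # proliferation into an empty site
            elif random.random() < P_MOVE and target not in occ:
                occ.discard((i, j))  # motility into an empty site
                occ.add(target)
    return len(occ) / L**2

density = simulate()
print(density)  # net birth > death drives density above the initial ~10%
```

Averaging this density over many realisations, and comparing it with the logistic mean-field curve, reproduces the kind of discrepancy the abstract describes when birth and death are fast relative to movement.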
Abstract:
Speaker attribution is the task of annotating a spoken audio archive based on speaker identities. This can be achieved using speaker diarization and speaker linking. In our previous work, we proposed an efficient attribution system, using complete-linkage clustering, for conducting attribution of large sets of two-speaker telephone data. In this paper, we build on our proposed approach to achieve a robust system, applicable to multiple recording domains. To do this, we first extend the diarization module of our system to accommodate multi-speaker (>2) recordings. We achieve this by using a robust cross-likelihood ratio (CLR) threshold as the stopping criterion for clustering, as opposed to the original two-speaker stopping criterion used for telephone data. We evaluate this baseline diarization module across a dataset of Australian broadcast news recordings, showing a significant lack of diarization accuracy without prior knowledge of the true number of speakers within a recording. We thus propose applying an additional pass of complete-linkage clustering to the diarization module, demonstrating an absolute improvement of 20% in diarization error rate (DER). We then evaluate our proposed multi-domain attribution system across the broadcast news data, demonstrating achievable attribution error rates (AER) as low as 17%.
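The clustering stage can be sketched with scikit-learn's AgglomerativeClustering; as an assumption, toy speaker embeddings and a Euclidean distance threshold stand in for the speaker models and CLR threshold used in the actual system.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy "speaker embeddings" for segments from three speakers; in the real
# system these would be speaker models compared via cross-likelihood ratios.
rng = np.random.default_rng(3)
spk_a = rng.normal(0.0, 0.1, size=(5, 8))    # segments from speaker A
spk_b = rng.normal(3.0, 0.1, size=(5, 8))    # segments from speaker B
spk_c = rng.normal(-3.0, 0.1, size=(4, 8))   # segments from speaker C
segments = np.vstack([spk_a, spk_b, spk_c])

# Complete linkage merges the two clusters whose *farthest* members are
# closest; merging stops once that distance exceeds the threshold, so the
# number of speakers never needs to be known in advance.
clus = AgglomerativeClustering(
    n_clusters=None, distance_threshold=4.0, linkage="complete"
).fit(segments)
print(clus.n_clusters_)  # three speakers recovered without specifying k
```

The threshold plays the role of the CLR stopping criterion: set it too low and one speaker fragments into several clusters; too high and distinct speakers merge.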
Abstract:
QUT Library continues to rethink research support with eResearch as a primary driver. Support for the development of the Lens, an open global cyberinfrastructure, has been especially important in light of technology transfer promotion, and partly in response to researchers’ needs in following the innovation landscape not only within the scientific but also the patent literature. The Lens (http://www.lens.org/lens/) project makes innovation more efficient, fair, transparent and inclusive. It is a joint effort between Cambia (http://www.cambia.org.au) and Queensland University of Technology (QUT). The Lens serves more than 84 million patent documents from around the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. Users can link from search results to visualizations and document clusters; from a patent document description to its full text; and from there, if applicable, to the sequence data. Figure 1 shows a BLAST Alignment (DNA) using the Lens. A unique feature of the Lens is the ability to embed search and BLAST results into blogs and websites, and to provide real-time updates to them. PatSeq Explorer (http://www.lens.org/lens/bio/patseqexplorer) allows users to navigate patent sequences that map onto the human genome and, in the future, many other genomes. PatSeq Explorer offers three-level views of the sequence information and links each group of sequences at the chromosomal level to their corresponding patent documents in the Lens. By integrating sequence search, patent search and document clustering capabilities, users can now understand the details, large and small, of the true extent and scope of genetic sequence patents. QUT Library supported Cambia in developing, testing and promoting the Lens.
This poster demonstrates QUT Library’s provision of best-practice, holistic research support to a research group, and how QUT Librarians have acquired new capabilities to meet researchers’ needs beyond traditional research support practices.
Abstract:
This paper elaborates the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) tasks of the 2013 MediaEval Benchmark. We extended a constrained clustering algorithm for the first, semi-supervised clustering task, and compared several classifiers using Latent Dirichlet Allocation as a feature selector for the second, event classification task. The proposed approach focuses on scalability and efficient memory allocation when applied to high-dimensional data with large clusters. Results from the first task show the effectiveness of the proposed method. Results from task 2 indicate that attention to the imbalanced category distributions is needed.
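The feature-selection step for the classification task can be sketched with scikit-learn; the toy documents and topic count below are illustrative assumptions, not the MediaEval data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# LDA reduces sparse word counts to a low-dimensional topic distribution
# per document, which can then be fed to a downstream event classifier.
docs = [
    "concert stage band live music crowd",
    "music festival band guitar crowd",
    "football match goal team score",
    "team score match league football",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(counts)  # one topic mixture per document

print(topics.shape)  # (4, 2): 4 documents, 2 topic features each
```

Because each document is summarised by a short, dense topic vector, memory use scales with the number of topics rather than the vocabulary size, which is the efficiency property the abstract emphasises.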
Abstract:
Crashes that occur on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestion. Hence, reducing the frequency of crashes assists in addressing congestion issues (Meyer, 2008). Analysing traffic conditions and discovering risky traffic trends and patterns are essential foundations of crash likelihood estimation studies and still require more attention and investigation. In this paper we show, through data mining techniques, that there is a relationship between pre-crash traffic flow patterns and crash occurrence on motorways, compare these patterns with normal traffic trends, and show that this knowledge has the potential to improve the accuracy of existing crash likelihood estimation models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with their corresponding traffic flow data using an incident detection algorithm. Traffic trends (traffic speed time series) revealed that crashes can be clustered with regard to the dominant traffic patterns prior to the crash occurrence. The K-Means clustering algorithm was applied to determine dominant pre-crash traffic patterns. In the first phase of this research, traffic regimes were identified by analysing crashes and normal traffic situations using half an hour of speed data at locations upstream of crashes. The second phase then investigated different combinations of speed risk indicators to distinguish crashes from normal traffic situations more precisely. Five major trends were found in the first phase for both high-risk and normal conditions. The study discovered that the traffic regimes differed in their speed trends.
Moreover, the second phase shows that the spatiotemporal difference of speed is a better risk indicator among the different combinations of speed-related risk indicators. Based on these findings, crash likelihood estimation models can be fine-tuned to increase the accuracy of estimations and minimise false alarms.
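As an illustration only, a spatiotemporal speed difference indicator of the kind named above might be computed as follows; the specific form (temporal change at the upstream detector plus the upstream-downstream gap) and the speed values are our assumptions, not the paper's exact definition.

```python
import numpy as np

# Toy 30-minute window of speeds (km/h), sampled every 5 minutes, at two
# detector stations bracketing a potential crash site.
upstream = np.array([78.0, 75.0, 60.0, 45.0, 40.0, 38.0])
downstream = np.array([80.0, 79.0, 78.0, 77.0, 76.0, 75.0])

temporal_diff = upstream[-1] - upstream[0]     # change over the window
spatial_diff = downstream - upstream           # gap between the detectors
spatiotemporal = float(np.mean(spatial_diff))  # a growing gap suggests a
                                               # shockwave forming upstream
print(temporal_diff, round(spatiotemporal, 1))
```

A sharp upstream slowdown combined with a widening upstream-downstream gap is exactly the kind of pattern that separates pre-crash conditions from normal congestion, since normal congestion tends to affect both stations together.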
Abstract:
Background The expansion of cell colonies is driven by a delicate balance of several mechanisms including cell motility, cell-to-cell adhesion and cell proliferation. New approaches that can be used to independently identify and quantify the role of each mechanism will help us understand how each mechanism contributes to the expansion process. Standard mathematical modelling approaches to describe such cell colony expansion typically neglect cell-to-cell adhesion, despite the fact that cell-to-cell adhesion is thought to play an important role. Results We use a combined experimental and mathematical modelling approach to determine the cell diffusivity, D, cell-to-cell adhesion strength, q, and cell proliferation rate, λ, in an expanding colony of MM127 melanoma cells. Using a circular barrier assay, we extract several types of experimental data and use a mathematical model to independently estimate D, q and λ. In our first set of experiments, we suppress cell proliferation and analyse three different types of data to estimate D and q. We find that standard types of data, such as the area enclosed by the leading edge of the expanding colony and more detailed cell density profiles throughout the expanding colony, do not provide sufficient information to uniquely identify D and q. We find that additional data relating to the degree of cell-to-cell clustering is required to provide independent estimates of q, and in turn D. In our second set of experiments, where proliferation is not suppressed, we use data describing temporal changes in cell density to determine the cell proliferation rate. In summary, we find that our experiments are best described using the range D = 161-243 μm2 hour-1, q = 0.3-0.5 (low to moderate strength) and λ = 0.0305-0.0398 hour-1, and with these parameters we can accurately predict the temporal variations in the spatial extent and cell density profile throughout the expanding melanoma cell colony.
Conclusions Our systematic approach to identify the cell diffusivity, cell-to-cell adhesion strength and cell proliferation rate highlights the importance of integrating multiple types of data to accurately quantify the factors influencing the spatial expansion of melanoma cell colonies.
Abstract:
This paper investigates business cycle co-movement across countries and regions since 1950 as a measure for quantifying economic interdependence in the ongoing globalisation process. Our methodological approach is based on the analysis of a correlation matrix and the networks it contains. Such an approach summarises the interaction and interdependence of all elements, and it represents a more accurate measure of the global interdependence involved in an economic system. Our results show that (1) the dynamics of interdependence have been driven more by synchronisation in regional growth patterns than by synchronisation of the world economy, and (2) world crisis periods dramatically increase global co-movement in the world economy.
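A minimal sketch of the correlation-matrix approach, using synthetic growth series in place of the 1950-onwards data; the four "countries", the common-cycle construction and the 0.5 link threshold are illustrative assumptions.

```python
import numpy as np

# Synthetic GDP growth series: two countries share a regional cycle, two
# are independent. Co-movement is read off the correlation matrix, and a
# network is formed by keeping links above a threshold.
rng = np.random.default_rng(7)
regional = rng.normal(0, 1, 60)                 # shared regional cycle
growth = np.vstack([
    regional + rng.normal(0, 0.3, 60),          # region 1, country A
    regional + rng.normal(0, 0.3, 60),          # region 1, country B
    rng.normal(0, 1, 60),                       # detached economy C
    rng.normal(0, 1, 60),                       # detached economy D
])

corr = np.corrcoef(growth)                      # pairwise co-movement
adjacency = (corr > 0.5) & ~np.eye(4, dtype=bool)  # threshold the links

print(int(adjacency.sum()) // 2)  # number of strong co-movement links
```

On real data, crisis periods would show up as a sudden densification of this network, and regional synchronisation as dense blocks along the diagonal of the correlation matrix.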
De Novo Transcriptome Sequence Assembly and Analysis of RNA Silencing Genes of Nicotiana benthamiana
Abstract:
Background: Nicotiana benthamiana has been widely used for transient gene expression assays and as a model plant in the study of plant-microbe interactions, lipid engineering and RNA silencing pathways. Assembling the sequence of its transcriptome provides information that, in conjunction with the genome sequence, will facilitate gaining insight into the plant's capacity for high-level transient transgene expression, generation of mobile gene silencing signals, and hyper-susceptibility to viral infection. Methodology/Results: RNA-seq libraries from 9 different tissues were deep sequenced and assembled, de novo, into a representation of the transcriptome. The assembly, of 16 GB of sequence, yielded 237,340 contigs, clustering into 119,014 transcripts (unigenes). Between 80 and 85% of reads from all tissues could be mapped back to the full transcriptome. Approximately 63% of the unigenes exhibited a match to the Solgenomics tomato predicted proteins database. Approximately 94% of the Solgenomics N. benthamiana unigene set (16,024 sequences) matched our unigene set (119,014 sequences). Using homology searches, we identified 31 homologues that are involved in RNAi-associated pathways in Arabidopsis thaliana, and show that they possess the domains characteristic of these proteins. Of these genes, the RNA-dependent RNA polymerase gene, Rdr1, is transcribed but has a 72 nt insertion in exon 1 that would cause premature termination of translation. Dicer-like 3 (DCL3) appears to lack both the DEAD helicase motif and the second dsRNA binding motif, and DCL2 and AGO4b have unexpectedly high levels of transcription. Conclusions: The assembled and annotated representation of the transcriptome and the list of RNAi-associated sequences are accessible at www.benthgenome.com alongside a draft genome assembly. These genomic resources will be very useful for further study of the developmental, metabolic and defense pathways of N.
benthamiana and in understanding the mechanisms behind the features which have made it such a well-used model plant. © 2013 Nakasugi et al.
Abstract:
Rubus yellow net virus (RYNV) was cloned and sequenced from a red raspberry (Rubus idaeus L.) plant exhibiting symptoms of mosaic and mottling in the leaves. Its genomic sequence indicates that it is a distinct member of the genus Badnavirus, with 7932 bp and seven ORFs, the first three corresponding in size and location to the ORFs found in the type member Commelina yellow mottle virus. Bioinformatic analysis of the genomic sequence detected several features including nucleic acid binding motifs, multiple zinc finger-like sequences and domains associated with cellular signaling. Subsequent sequencing of the small RNAs (sRNAs) from RYNV-infected R. idaeus leaf tissue was used to determine any RYNV sequences targeted by RNA silencing and identified abundant virus-derived small RNAs (vsRNAs). The majority of the vsRNAs were 22 nt in length. We observed a highly uneven genome-wide distribution of vsRNAs, with strong clustering to small defined regions distributed over both strands of the RYNV genome. Together, our data show that sequences of the aphid-transmitted pararetrovirus RYNV are targeted in red raspberry by the interfering RNA pathway, a predominant antiviral defense mechanism in plants. © 2013.
Abstract:
Introduction Falls are the most frequent adverse event reported in hospitals. Approximately 30% of in-hospital falls lead to an injury and up to 2% result in a fracture. A large randomised trial found that a trained health professional providing individualised falls prevention education to older inpatients reduced falls in a cognitively intact subgroup. This study aims to investigate whether this efficacious intervention can reduce falls and be clinically useful and cost-effective when delivered in the real-life clinical environment. Methods A stepped-wedge cluster randomised trial will be used across eight subacute units (clusters), which will be randomised to one of four dates to start the intervention. Usual care on these units includes patient screening, assessment and implementation of individualised falls prevention strategies, ongoing staff training and environmental strategies. Patients with better levels of cognition (Mini-Mental State Examination >23/30) will receive the individualised education from a trained health professional in addition to usual care, while patient feedback received during education sessions will be provided to unit staff. Unit staff will receive training to assist in intervention delivery and to enhance the uptake of strategies by patients. Falls data will be collected by two methods: case note audit by research assistants and the hospital falls reporting system. Cluster-level data including patient admissions, length of stay and diagnosis will be collected from hospital systems. Data will be analysed allowing for the correlation of outcomes (clustering) within units. An economic analysis will be undertaken, which includes an incremental cost-effectiveness analysis. Ethics and dissemination The study was approved by The University of Notre Dame Australia Human Research Ethics Committee and local hospital ethics committees.
Results The results will be disseminated through local site networks and will inform future funding and delivery of falls prevention programmes within WA Health. Results will also be disseminated through peer-reviewed publications and at medical conferences.