826 results for Data mining models
Abstract:
Over the last few years, investigations of human epigenetic profiles have identified histone modifications, stable and heritable DNA methylation, and chromatin remodeling as key elements of change. These factors determine gene expression levels and characterise conditions leading to disease. Data mining and pattern recognition tools are widely used to extract information embedded in long DNA sequences, but to date relatively little effort has been devoted to analyzing epigenetic changes and their role as catalysts in disease onset. Useful insight, however, can be gained by investigating the associated dinucleotide distributions. The focus of this paper is to explore specific dinucleotide frequencies across defined regions of the human genome and to identify new patterns linking epigenetic mechanisms and DNA content. Signal processing methods, including Fourier and wavelet transforms, are employed and the principal results are reported.
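A minimal sketch of the kind of dinucleotide frequency analysis described above, assuming a sliding-window CG (CpG) frequency profile and NumPy's FFT; the window size, step, and dinucleotide choice are illustrative and not taken from the paper.

```python
import numpy as np

def dinucleotide_profile(seq, dinuc="CG", window=1000, step=100):
    """Frequency of a given dinucleotide in sliding windows along a DNA sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        count = sum(1 for i in range(len(chunk) - 1) if chunk[i:i + 2] == dinuc)
        profile.append(count / (window - 1))
    return np.array(profile)

def dominant_periods(profile, step=100):
    """Fourier spectrum of the profile; peaks suggest periodic dinucleotide patterns."""
    detrended = profile - profile.mean()
    spectrum = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(detrended), d=step)  # cycles per base pair
    return freqs, spectrum

# Example on a toy random sequence (for illustration only)
rng = np.random.default_rng(0)
toy_seq = "".join(rng.choice(list("ACGT"), size=50_000))
prof = dinucleotide_profile(toy_seq)
freqs, spec = dominant_periods(prof)
print(freqs[np.argmax(spec[1:]) + 1])  # strongest non-DC frequency
```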
Abstract:
This thesis is concerned with the detection and prediction of rain in environmental recordings using different machine learning algorithms. The results obtained in this research will help ecologists to efficiently analyse environmental data and monitor biodiversity.
Abstract:
The proliferation of the web presents an unsolved problem: automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes: ClueWeb09 and ClueWeb12 contain 500 million and 733 million web pages respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not previously been demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection during clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections, where the number of distinct topics grows linearly with collection size. The fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations address the problem of assessing cluster quality where categorical labeling is unavailable or infeasible.
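A rough illustration of clustering compressed document representations; it substitutes scikit-learn's HashingVectorizer and MiniBatchKMeans for the paper's bit-signature representations and EM-tree algorithm, so it shows the general workflow rather than the method itself.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.cluster import MiniBatchKMeans

docs = [
    "web scale document clustering with compressed representations",
    "spam pages and ad hoc search relevance judgments",
    "fine grained clusters in massive web collections",
    # in practice, documents would be streamed from a crawl
]

# Hashing keeps the representation compact and avoids a vocabulary pass,
# loosely analogous to the compressed document signatures in the abstract.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False, norm="l2")
X = vectorizer.transform(docs)

# MiniBatchKMeans stands in for the EM-tree algorithm; on a real crawl the
# number of clusters would be in the hundreds of thousands.
km = MiniBatchKMeans(n_clusters=2, batch_size=2, n_init=3, random_state=0)
labels = km.fit_predict(X)
print(labels)
```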
Abstract:
Introduction & Aims: Optimising fracture treatments requires a sound understanding of the relationships between stability, callus development and healing outcomes. This has been the goal of computational modelling, but discrepancies remain between simulations and experimental results. We compared healing patterns versus fixation stiffness between a novel computational callus growth model and corresponding experimental data.
Hypothesis: We hypothesised that callus growth is stimulated by diffusible signals, whose production is in turn regulated by mechanical conditions at the fracture site. We proposed that introducing this scheme into computational models would better replicate the observed tissue patterns and the inverse relationship between callus size and fixation stiffness.
Method: Finite element models of bone healing under stiff and flexible fixation were constructed, based on the parameters of a parallel rat femoral osteotomy study. An iterative procedure was implemented to simulate the development of callus and its mechanical regulation. Tissue changes were regulated according to published mechano-biological criteria. Predictions of healing patterns were compared between standard models, with a pre-defined domain for callus development, and a novel approach in which periosteal callus growth is driven by a diffusible signal whose production is governed by local mechanical conditions. Finally, each model's predictions were compared to the corresponding histological data.
Results: Models in which healing progressed within a prescribed callus domain predicted that greater interfragmentary movements would displace early periosteal bone formation further from the fracture. This resulted from artificially large distortional strains predicted near the fracture edge. While experiments showed increased hard callus size under flexible fixation, this was not reflected in the standard models. Allowing the callus to grow from a thin soft tissue layer, in response to a mechanically stimulated diffusible signal, produced a callus shape and tissue distribution closer to those observed histologically. Importantly, the callus volume increased with increasing interfragmentary movement.
Conclusions: A novel method for incorporating callus growth into computational models of fracture healing allowed us to capture the relationship between callus size and fixation stability observed in our rat experiments. This approach expands our toolkit for understanding the influence of different fixation strategies on healing outcomes.
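A deliberately simplified 1D sketch of the mechanically regulated diffusible-signal idea, not the finite element model itself; the strain profile, constants, and growth rule are all illustrative placeholders.

```python
import numpy as np

# 1D toy domain along the periosteal surface (array indices, not a real FE mesh)
n = 100
signal = np.zeros(n)          # concentration of the diffusible signal
tissue = np.zeros(n)          # 0 = thin soft tissue layer, grows toward 1 = callus
strain = np.exp(-((np.arange(n) - 50) ** 2) / 200.0)  # illustrative peak near the fracture

D, dt, production, decay, threshold = 0.5, 0.1, 1.0, 0.05, 0.3

for _ in range(500):
    # signal production regulated by local mechanical conditions (strain)
    signal += dt * (production * strain - decay * signal)
    # explicit diffusion step
    lap = np.roll(signal, 1) - 2 * signal + np.roll(signal, -1)
    signal += dt * D * lap
    # callus grows where the signal is high enough
    tissue = np.minimum(1.0, tissue + dt * 0.1 * (signal > threshold))

print(f"callus extent: {int((tissue > 0.5).sum())} of {n} elements")
```

Under this toy rule, a larger strain peak (flexible fixation) raises signal production and so enlarges the region that reaches the growth threshold, mirroring the qualitative relationship the abstract describes.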
Abstract:
Public buildings and large infrastructure are typically monitored by tens or hundreds of cameras, all capturing different physical spaces and observing different types of interactions and behaviours. To date, however, largely because of limited data availability, crowd monitoring and operational surveillance research has focused on single-camera scenarios that are not representative of real-world applications. In this paper we present a new, publicly available database for large-scale crowd surveillance. Footage from 12 cameras over a full work day is provided, covering the main floor of a busy university campus building, including an internal and an external foyer, elevator foyers, and the main external approach; annotations for crowd counting (single- or multi-camera) and pedestrian flow analysis are provided for 10 and 6 sites respectively. We describe how this large dataset can be used to perform distributed monitoring of building utilisation, and demonstrate its potential for understanding and learning the relationships between different areas of a building.
Abstract:
It is not uncommon to hear a person of interest described by their height, build, and clothing (i.e. type and colour). These semantic descriptions are commonly used by people to describe others, as they are quick to communicate and easy to understand. However, such queries are not easily utilised within intelligent video surveillance systems, as they are difficult to transform into a representation that can be used by computer vision algorithms. In this paper we propose a novel approach that transforms such a semantic query into an avatar, in the form of a channel representation that is searchable within a video stream. We show how spatial and colour information, together with prior information on person shape, can be incorporated into the channel representation to locate a target using a particle-filter-like approach. We demonstrate state-of-the-art performance for locating a subject in video based on a description, achieving a relative performance improvement of 46.7% over the baseline. We also apply this approach to person re-detection, and show that it can re-detect a person in a video stream without the use of a person detector.
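A rough sketch of turning a clothing-colour description into a searchable "avatar" and scoring candidate image regions; a plain colour histogram with histogram intersection stands in for the channel representation and particle-filter-like search described above, and all colours and boxes are illustrative.

```python
import numpy as np

def colour_channels(patch, bins=8):
    """Colour histogram of an image patch (H x W x 3, values 0-255), L1-normalised."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)

def avatar_from_description(torso_rgb, legs_rgb, bins=8):
    """Build a crude 'avatar' from a clothing-colour description (RGB triples assumed)."""
    torso = np.tile(torso_rgb, (20, 10, 1)).astype(float)
    legs = np.tile(legs_rgb, (20, 10, 1)).astype(float)
    return colour_channels(np.concatenate([torso, legs], axis=0), bins)

def score_candidates(frame, avatar, boxes, bins=8):
    """Histogram-intersection score for each candidate box (x, y, w, h) in a frame."""
    scores = []
    for x, y, w, h in boxes:
        patch = frame[y:y + h, x:x + w].astype(float)
        scores.append(np.minimum(avatar, colour_channels(patch, bins)).sum())
    return np.array(scores)

# Toy usage: a red-shirt, blue-jeans description searched over two candidate regions
frame = np.random.randint(0, 256, (240, 320, 3))
avatar = avatar_from_description((200, 30, 30), (30, 30, 150))
boxes = [(10, 10, 40, 80), (100, 50, 40, 80)]
print(score_candidates(frame, avatar, boxes))
```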
Abstract:
Automated digital recordings are useful for large-scale temporal and spatial environmental monitoring. An important research effort has been the automated classification of calling bird species. In this paper we examine a related task: retrieval of birdcalls, similar to a user-supplied query call, from a database of audio recordings. Such a retrieval task can sometimes be more useful than an automated classifier. We compare three approaches to similarity-based birdcall retrieval: spectral ridge features and two kinds of gradient features, the structure tensor and the histogram of oriented gradients. The retrieval accuracy of our spectral ridge method is 94%, compared to 82% for the structure tensor method and 90% for the histogram of oriented gradients method. Additionally, the spectral ridge approach potentially offers a more compact representation and is more computationally efficient.
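A minimal sketch of similarity-based retrieval over spectrogram-derived features, using SciPy's spectrogram and cosine similarity; the per-frame peak-frequency histogram below is only a crude stand-in for the spectral ridge features described above.

```python
import numpy as np
from scipy.signal import spectrogram

def ridge_feature(audio, fs=22050):
    """Crude feature: histogram of the dominant frequency bin in each spectrogram frame."""
    f, t, S = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
    peak_bins = S.argmax(axis=0)
    hist, _ = np.histogram(peak_bins, bins=32, range=(0, len(f)))
    return hist / (hist.sum() + 1e-9)

def retrieve(query, database, fs=22050, top_k=3):
    """Rank database recordings by cosine similarity of their features to the query."""
    q = ridge_feature(query, fs)
    feats = np.array([ridge_feature(x, fs) for x in database])
    sims = feats @ q / (np.linalg.norm(feats, axis=1) * np.linalg.norm(q) + 1e-9)
    order = np.argsort(-sims)[:top_k]
    return order, sims[order]

# Toy usage with synthetic tones standing in for recordings
fs = 22050
t = np.linspace(0, 1, fs, endpoint=False)
query = np.sin(2 * np.pi * 3000 * t)
database = [np.sin(2 * np.pi * f0 * t) for f0 in (1000, 3000, 5000)]
print(retrieve(query, database, fs))
```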
Abstract:
This project developed three mathematical models for scheduling ambulances and ambulance crews, and solved each model for test scenarios based on real data. Results from these models can serve as decision aids for dispatching or relocating ambulances, and for strategic decisions on the ambulance crews needed each shift. The thesis used flexible flow shop scheduling techniques to formulate strategic, dynamic and real-time models, and metaheuristic solution techniques were applied in a case study with realistic data. These models are suitable for use by ambulance planners and dispatchers.
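A toy sketch of one ingredient of this kind of scheduling: assigning jobs to crews with a greedy rule and improving the assignment with a simple local search, standing in for the flexible flow shop models and metaheuristics of the thesis; job durations and crew counts are illustrative.

```python
import random

def makespan(assignment, durations, n_crews):
    """Completion time of the busiest crew under a job-to-crew assignment."""
    load = [0.0] * n_crews
    for job, crew in enumerate(assignment):
        load[crew] += durations[job]
    return max(load)

def greedy_assign(durations, n_crews):
    """Assign each job (longest first) to the currently least-loaded crew."""
    order = sorted(range(len(durations)), key=lambda j: -durations[j])
    load = [0.0] * n_crews
    assignment = [0] * len(durations)
    for j in order:
        crew = min(range(n_crews), key=lambda c: load[c])
        assignment[j] = crew
        load[crew] += durations[j]
    return assignment

def local_search(assignment, durations, n_crews, iters=1000, seed=0):
    """Randomly move single jobs between crews, keeping improvements (a simple metaheuristic)."""
    rng = random.Random(seed)
    best = list(assignment)
    best_cost = makespan(best, durations, n_crews)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.randrange(n_crews)
        cost = makespan(cand, durations, n_crews)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

durations = [30, 45, 20, 60, 25, 50, 35]   # illustrative job times (minutes)
start = greedy_assign(durations, 3)
print(local_search(start, durations, 3))
```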