897 resultados para Optical pattern recognition -- Mathematical models
Resumo:
We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on different type and quality of data by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those contained in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations of training size, data type and quality in presence of sufficient training data.
Resumo:
Objective Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and Materials The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and their quality, with one of the datasets containing optical character recognition errors. Results Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion Findings show that Anonym compares to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations of training size, data type and quality in presence of sufficient training data.
Resumo:
Faunal vocalisations are vital indicators for environmental change and faunal vocalisation analysis can provide information for answering ecological questions. Therefore, automated species recognition in environmental recordings has become a critical research area. This thesis presents an automated species recognition approach named Timed and Probabilistic Automata. A small lexicon for describing animal calls is defined, six algorithms for acoustic component detection are developed, and a series of species recognisers are built and evaluated.The presented automated species recognition approach yields significant improvement on the analysis performance over a real world dataset, and may be transferred to commercial software in the future.
Resumo:
Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (eg. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (eg. frontal face being compared to non-frontal face). To address the above problem, we propose a novel approach to enhance nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis.
Resumo:
Existing multi-model approaches for image set classification extract local models by clustering each image set individually only once, with fixed clusters used for matching with other image sets. However, this may result in the two closest clusters to represent different characteristics of an object, due to different undesirable environmental conditions (such as variations in illumination and pose). To address this problem, we propose to constrain the clustering of each query image set by forcing the clusters to have resemblance to the clusters in the gallery image sets. We first define a Frobenius norm distance between subspaces over Grassmann manifolds based on reconstruction error. We then extract local linear subspaces from a gallery image set via sparse representation. For each local linear subspace, we adaptively construct the corresponding closest subspace from the samples of a probe image set by joint sparse representation. We show that by minimising the sparse representation reconstruction error, we approach the nearest point on a Grassmann manifold. Experiments on Honda, ETH-80 and Cambridge-Gesture datasets show that the proposed method consistently outperforms several other recent techniques, such as Affine Hull based Image Set Distance (AHISD), Sparse Approximated Nearest Points (SANP) and Manifold Discriminant Analysis (MDA).
Resumo:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distribution and/or non-linear inter-variable relationship. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions, which is then utilized to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Resumo:
A novel shape recognition algorithm was developed to autonomously classify the Northern Pacific Sea Star (Asterias amurenis) from benthic images that were collected by the Starbug AUV during 6km of transects in the Derwent estuary. Despite the effects of scattering, attenuation, soft focus and motion blur within the underwater images, an optimal joint classification rate of 77.5% and misclassification rate of 13.5% was achieved. The performance of algorithm was largely attributed to its ability to recognise locally deformed sea star shapes that were created during the segmentation of the distorted images.
Resumo:
Wound healing and tumour growth involve collective cell spreading, which is driven by individual motility and proliferation events within a population of cells. Mathematical models are often used to interpret experimental data and to estimate the parameters so that predictions can be made. Existing methods for parameter estimation typically assume that these parameters are constants and often ignore any uncertainty in the estimated values. We use approximate Bayesian computation (ABC) to estimate the cell diffusivity, D, and the cell proliferation rate, λ, from a discrete model of collective cell spreading, and we quantify the uncertainty associated with these estimates using Bayesian inference. We use a detailed experimental data set describing the collective cell spreading of 3T3 fibroblast cells. The ABC analysis is conducted for different combinations of initial cell densities and experimental times in two separate scenarios: (i) where collective cell spreading is driven by cell motility alone, and (ii) where collective cell spreading is driven by combined cell motility and cell proliferation. We find that D can be estimated precisely, with a small coefficient of variation (CV) of 2–6%. Our results indicate that D appears to depend on the experimental time, which is a feature that has been previously overlooked. Assuming that the values of D are the same in both experimental scenarios, we use the information about D from the first experimental scenario to obtain reasonably precise estimates of λ, with a CV between 4 and 12%. Our estimates of D and λ are consistent with previously reported values; however, our method is based on a straightforward measurement of the position of the leading edge whereas previous approaches have involved expensive cell counting techniques. Additional insights gained using a fully Bayesian approach justify the computational cost, especially since it allows us to accommodate information from different experiments in a principled way.
Resumo:
Urbanisation significantly changes the characteristics of a catchment as natural areas are transformed to impervious surfaces such as roads, roofs and parking lots. The increased fraction of impervious surfaces leads to changes to the stormwater runoff characteristics, whilst a variety of anthropogenic activities common to urban areas generate a range of pollutants such as nutrients, solids and organic matter. These pollutants accumulate on catchment surfaces and are removed and trans- ported by stormwater runoff and thereby contribute pollutant loads to receiving waters. In summary, urbanisation influences the stormwater characteristics of a catchment, including hydrology and water quality. Due to the growing recognition that stormwater pollution is a significant environmental problem, the implementation of mitigation strategies to improve the quality of stormwater runoff is becoming increasingly common in urban areas. A scientifically robust stormwater quality treatment strategy is an essential requirement for effective urban stormwater management. The efficient design of treatment systems is closely dependent on the state of knowledge in relation to the primary factors influencing stormwater quality. In this regard, stormwater modelling outcomes provide designers with important guidance and datasets which significantly underpin the design of effective stormwater treatment systems. Therefore, the accuracy of modelling approaches and the reliability modelling outcomes are of particular concern. This book discusses the inherent complexity and key characteristics in the areas of urban hydrology and stormwater quality, based on the influence exerted by a range of rainfall and catchment characteristics. A comprehensive field sampling and testing programme in relation to pollutant build-up, an urban catchment monitoring programme in relation to stormwater quality and the outcomes from advanced statistical analyses provided the platform for the knowledge creation. Two case studies and two real-world applications are discussed to illustrate the translation of the knowledge created to practical use in relation to the role of rainfall and catchment characteristics on urban stormwater quality. An innovative rainfall classification based on stormwater quality was developed to support the effective and scientifically robust design of stormwater treatment systems. Underpinned by the rainfall classification methodology, a reliable approach for design rainfall selection is proposed in order to optimise stormwater treatment based on both, stormwater quality and quantity. This is a paradigm shift from the common approach where stormwater treatment systems are designed based solely on stormwater quantity data. Additionally, how pollutant build-up and stormwater runoff quality vary with a range of catchment characteristics was also investigated. Based on the study out- comes, it can be concluded that the use of only a limited number of catchment parameters such as land use and impervious surface percentage, as it is the case in current modelling approaches, could result in appreciable error in water quality estimation. Influential factors which should be incorporated into modelling in relation to catchment characteristics, should also include urban form and impervious surface area distribution. The knowledge created through the research investigations discussed in this monograph is expected to make a significant contribution to engineering practice such as hydrologic and stormwater quality modelling, stormwater treatment design and urban planning, as the study outcomes provide practical approaches and recommendations for urban stormwater quality enhancement. Furthermore, this monograph also demonstrates how fundamental knowledge of stormwater quality processes can be translated to provide guidance on engineering practice, the comprehensive application of multivariate data analyses techniques and a paradigm on integrative use of computer models and mathematical models to derive practical outcomes.
Resumo:
Oxygen flux between aquatic ecosystems and the water column is a measure of ecosystem metabolism. However, the oxygen flux varies during the day in a “hysteretic” pattern: there is higher net oxygen production at a given irradiance in the morning than in the afternoon. In this study, we investigated the mechanism responsible for the hysteresis in oxygen flux by measuring the daily pattern of oxygen flux, light, and temperature in a seagrass ecosystem (Zostera muelleri in Swansea Shoals, Australia) at three depths. We hypothesised that the oxygen flux pattern could be due to diel variations in either gross primary production or respiration in response to light history or temperature. Hysteresis in oxygen flux was clearly observed at all three depths. We compared this data to mathematical models, and found that the modification of ecosystem respiration by light history is the best explanation for the hysteresis in oxygen flux. Light history-dependent respiration might be due to diel variations in seagrass respiration or the dependence of bacterial production on dissolved organic carbon exudates. Our results indicate that the daily variation in respiration rate may be as important as the daily changes of photosynthetic characteristics in determining the metabolic status of aquatic ecosystems.
Resumo:
Flos Chrysanthemum is a generic name for a particular group of edible plants, which also have medicinal properties. There are, in fact, twenty to thirty different cultivars, which are commonly used in beverages and for medicinal purposes. In this work, four Flos Chrysanthemum cultivars, Hangju, Taiju, Gongju, and Boju, were collected and chromatographic fingerprints were used to distinguish and assess these cultivars for quality control purposes. Chromatography fingerprints contain chemical information but also often have baseline drifts and peak shifts, which complicate data processing, and adaptive iteratively reweighted, penalized least squares, and correlation optimized warping were applied to correct the fingerprint peaks. The adjusted data were submitted to unsupervised and supervised pattern recognition methods. Principal component analysis was used to qualitatively differentiate the Flos Chrysanthemum cultivars. Partial least squares, continuum power regression, and K-nearest neighbors were used to predict the unknown samples. Finally, the elliptic joint confidence region method was used to evaluate the prediction ability of these models. The partial least squares and continuum power regression methods were shown to best represent the experimental results.
Resumo:
A simple sequential thinning algorithm for peeling off pixels along contours is described. An adaptive algorithm obtained by incorporating shape adaptivity into this sequential process is also given. The distortions in the skeleton at the right-angle and acute-angle corners are minimized in the adaptive algorithm. The asymmetry of the skeleton, which is a characteristic of sequential algorithm, and is due to the presence of T-corners in some of the even-thickness pattern is eliminated. The performance (in terms of time requirements and shape preservation) is compared with that of a modern thinning algorithm.
Resumo:
This thesis concerns the development of mathematical models to describe the interactions that occur between spray droplets and leaves. Models are presented that not only provide a contribution to mathematical knowledge in the field of fluid dynamics, but are also of utility within the agrichemical industry. The thesis is presented in two parts. First, thin film models are implemented with efficient numerical schemes in order to simulate droplets on virtual leaf surfaces. Then the interception event is considered, whereby energy balance techniques are employed to instantaneously predict whether an impacting droplet will bounce, splash, or adhere to a leaf.