909 resultados para Automatic Inference
Resumo:
At present, the most reliable method to obtain end-user perceived quality is through subjective tests. In this paper, the impact of automatic region-of-interest (ROI) coding on perceived quality of mobile video is investigated. The evidence, which is based on perceptual comparison analysis, shows that the coding strategy improves perceptual quality. This is particularly true in low bit rate situations. The ROI detection method used in this paper is based on two approaches: - (1) automatic ROI by analyzing the visual contents automatically, and; - (2) eye-tracking based ROI by aggregating eye-tracking data across many users, used to both evaluate the accuracy of automatic ROI detection and the subjective quality of automatic ROI encoded video. The perceptual comparison analysis is based on subjective assessments with 54 participants, across different content types, screen resolutions, and target bit rates while comparing the two ROI detection methods. The results from the user study demonstrate that ROI-based video encoding has higher perceived quality compared to normal video encoded at a similar bit rate, particularly in the lower bit rate range.
Resumo:
Automatic-dishwasher detergent is a common household substance which is extremely corrosive and potentially fatal if ingested. In this report, we discuss the implications of the ingestion of automatic-dishwasher detergent in 18 children over a three-year period. Ten of the 18 children gained access to the automatic-dishwasher detergent from the dishwasher on the completion of the washing-cycle, while the remainder ingested the detergent directly from the packet. There was a poor correlation between the presenting signs and symptoms and the subsequent endoscopic finding in the 14 children who underwent endoscopy.
Resumo:
Maximum intensity contrast has been used as a measure of lens defocus. A photodiode array under the control of 8085 microprocessor is used to measure the maximum intensity contrast and to position the lens for best focus. The lens is moved by a stepper motor under processor control at a speed of 350 to 500 steps/s. At this speed, focusing time was found to be between 5 and 8 s. Under coherent illuminating conditions, an accuracy of ± 50 μm has been achieved.
Resumo:
We are addressing the problem of jointly using multiple noisy speech patterns for automatic speech recognition (ASR), given that they come from the same class. If the user utters a word K times, the ASR system should try to use the information content in all the K patterns of the word simultaneously and improve its speech recognition accuracy compared to that of the single pattern based speech recognition. T address this problem, recently we proposed a Multi Pattern Dynamic Time Warping (MPDTW) algorithm to align the K patterns by finding the least distortion path between them. A Constrained Multi Pattern Viterbi algorithm was used on this aligned path for isolated word recognition (IWR). In this paper, we explore the possibility of using only the MPDTW algorithm for IWR. We also study the properties of the MPDTW algorithm. We show that using only 2 noisy test patterns (10 percent burst noise at -5 dB SNR) reduces the noisy speech recognition error rate by 37.66 percent when compared to the single pattern recognition using the Dynamic Time Warping algorithm.
Resumo:
Objective Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates – an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer related causes of death from death certificates. Methods Detailed features, including terms, n-grams and SNOMED CT concepts were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/nocancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed a combination of features were important for cancer type classification, with SNOMED CT concept and oncology specific morphology features proving the most valuable. Conclusion The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
Resumo:
Recent axiomatic derivations of the maximum entropy principle from consistency conditions are critically examined. We show that proper application of consistency conditions alone allows a wider class of functionals, essentially of the form ∝ dx p(x)[p(x)/g(x)] s , for some real numbers, to be used for inductive inference and the commonly used form − ∝ dx p(x)ln[p(x)/g(x)] is only a particular case. The role of the prior densityg(x) is clarified. It is possible to regard it as a geometric factor, describing the coordinate system used and it does not represent information of the same kind as obtained by measurements on the system in the form of expectation values.
Inference of the genetic architecture underlying BMI and height with the use of 20,240 sibling pairs
Resumo:
Evidence that complex traits are highly polygenic has been presented by population-based genome-wide association studies (GWASs) through the identification of many significant variants, as well as by family-based de novo sequencing studies indicating that several traits have a large mutational target size. Here, using a third study design, we show results consistent with extreme polygenicity for body mass index (BMI) and height. On a sample of 20,240 siblings (from 9,570 nuclear families), we used a within-family method to obtain narrow-sense heritability estimates of 0.42 (SE = 0.17, p = 0.01) and 0.69 (SE = 0.14, p = 6 x 10(-)(7)) for BMI and height, respectively, after adjusting for covariates. The genomic inflation factors from locus-specific linkage analysis were 1.69 (SE = 0.21, p = 0.04) for BMI and 2.18 (SE = 0.21, p = 2 x 10(-10)) for height. This inflation is free of confounding and congruent with polygenicity, consistent with observations of ever-increasing genomic-inflation factors from GWASs with large sample sizes, implying that those signals are due to true genetic signals across the genome rather than population stratification. We also demonstrate that the distribution of the observed test statistics is consistent with both rare and common variants underlying a polygenic architecture and that previous reports of linkage signals in complex traits are probably a consequence of polygenic architecture rather than the segregation of variants with large effects. The convergent empirical evidence from GWASs, de novo studies, and within-family segregation implies that family-based sequencing studies for complex traits require very large sample sizes because the effects of causal variants are small on average.
Resumo:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up in the University of Helsinki Research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking was calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided in two parts and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validating dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
Resumo:
The family of location and scale mixtures of Gaussians has the ability to generate a number of flexible distributional forms. The family nests as particular cases several important asymmetric distributions like the Generalized Hyperbolic distribution. The Generalized Hyperbolic distribution in turn nests many other well known distributions such as the Normal Inverse Gaussian. In a multivariate setting, an extension of the standard location and scale mixture concept is proposed into a so called multiple scaled framework which has the advantage of allowing different tail and skewness behaviours in each dimension with arbitrary correlation between dimensions. Estimation of the parameters is provided via an EM algorithm and extended to cover the case of mixtures of such multiple scaled distributions for application to clustering. Assessments on simulated and real data confirm the gain in degrees of freedom and flexibility in modelling data of varying tail behaviour and directional shape.
Resumo:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling under various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox equipped herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is in probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
Resumo:
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations including: lighting and background changes, occlusions, changes in expression and make up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus; and incorporate this into a novel two stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues; after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
Resumo:
Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is under a rapid development because of the recent advances in technologies by which molecular data can be obtained from living organisms. In order that most information from such data can be extracted, the analyses need to be carried out using statistical models that are tailored to take account of the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo algorithm (MCMC). The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes as well as at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where also a quantitative phenotype has been measured from the contemporary individuals. In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
Resumo:
This thesis which consists of an introduction and four peer-reviewed original publications studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important as the direct measurement of haplotypes is difficult, whereas the genotypes are easier to quantify. Haplotypes are the key-players when studying for example the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point-mutations in a biologically plausible way. In this mosaic model, the current population is assumed to be evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point--mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point-mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences are due to the fact that the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score and from that, a p-value is computed. This p-value is the probability of picking two sequences from the null model that have as good or better best local alignment score. Local alignment significance is used routinely for example in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affeced by so-called edge-effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
Automatic detection of diabetic foot complications with infrared thermography by asymmetric analysis
Resumo:
Early identification of diabetic foot complications and their precursors is essential in preventing their devastating consequences, such as foot infection and amputation. Frequent, automatic risk assessment by an intelligent telemedicine system might be feasible and cost effective. Infrared thermography is a promising modality for such a system. The temperature differences between corresponding areas on contralateral feet are the clinically significant parameters. This asymmetric analysis is hindered by (1) foot segmentation errors, especially when the foot temperature and the ambient temperature are comparable, and by (2) different shapes and sizes between contralateral feet due to deformities or minor amputations. To circumvent the first problem, we used a color image and a thermal image acquired synchronously. Foot regions, detected in the color image, were rigidly registered to the thermal image. This resulted in 97.8% ± 1.1% sensitivity and 98.4% ± 0.5% specificity over 76 high-risk diabetic patients with manual annotation as a reference. Nonrigid landmark-based registration with Bsplines solved the second problem. Corresponding points in the two feet could be found regardless of the shapes and sizes of the feet. With that, the temperature difference of the left and right feet could be obtained.