950 results for Gaussian random fields
Abstract:
Objective: Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and Materials: The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and quality, with one of the datasets containing optical character recognition errors. Results: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion: Findings show that Anonym is comparable to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations in training size, data type and quality in the presence of sufficient training data.
Abstract:
This thesis investigates the fusion of 3D visual information with 2D image cues to provide 3D semantic maps of large-scale environments that a robot traverses, for use in robotic applications. A major theme of this thesis is exploiting the 3D information acquired from robot sensors to improve upon 2D object classification alone. The proposed methods have been evaluated on several indoor and outdoor datasets collected from mobile robotic platforms, including a quadcopter and a ground vehicle covering several kilometres of urban roads.
Abstract:
Objective: This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of the incremental active learning framework across different selection criteria and datasets are determined. Materials and methods: The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields were used as the supervised method, and least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled. Discussion: Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction in annotation effort is always above that of the random sampling and longest sequence baselines. Conclusion: Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.
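The least confidence criterion named in the abstract above can be illustrated with a short sketch. It is not the authors' implementation: it assumes the model (for example a CRF) already supplies, for each unlabelled sequence, the probability of its most likely labelling, and the function names and example probabilities are hypothetical.

```python
import numpy as np

def least_confidence(best_sequence_probs):
    """Least-confidence score: 1 - P(most likely labelling | sequence)."""
    return 1.0 - np.asarray(best_sequence_probs, dtype=float)

def select_batch(best_sequence_probs, batch_size):
    """Return indices of the batch_size sequences the model is least sure about."""
    scores = least_confidence(best_sequence_probs)
    # argsort is ascending, so negate the scores to rank the most uncertain first
    return np.argsort(-scores, kind="stable")[:batch_size]

# Hypothetical model confidences for five unlabelled sequences
probs = [0.95, 0.40, 0.75, 0.55, 0.99]
chosen = select_batch(probs, batch_size=2)
print(chosen)  # indices of the two sequences to send for annotation
```

In an active learning loop, the selected sequences would be annotated, added to the training set, and the model retrained before the next selection round.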
Abstract:
This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy achieves, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources, which have often been exploited for feature generation. In addition, the clinical domain offers a compelling use case for active learning because of the high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrate that 1) amongst existing query strategies, the ones based on the classification model’s confidence are a better choice for clinical data as they perform equally well with a much lighter computational load, and 2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to manually annotate than with state-of-the-art query strategies.
Abstract:
Background: Magnetic resonance diffusion tensor imaging (DTI) shows promise in the early detection of microstructural pathophysiological changes in the brain. Objectives: To measure microstructural differences in the brains of participants with amnestic mild cognitive impairment (MCI) compared with an age-matched control group using an optimised DTI technique with fully automated image analysis tools and to investigate the correlation between diffusivity measurements and neuropsychological performance scores across groups. Methods: 34 participants (17 participants with MCI, 17 healthy elderly adults) underwent magnetic resonance imaging (MRI)-based DTI. To control for the effects of anatomical variation, diffusion images of all participants were registered to standard anatomical space. Statistically significant differences in diffusivity measurements between the two groups were determined on a pixel-by-pixel basis using Gaussian random field theory. Results: Significantly raised mean diffusivity measurements (p<0.001) were observed in the left and right entorhinal cortices (BA28), posterior occipital-parietal cortex (BA18 and BA19), right parietal supramarginal gyrus (BA40) and right frontal precentral gyri (BA4 and BA6) in participants with MCI. With respect to fractional anisotropy, participants with MCI had significantly reduced measurements (p<0.001) in the limbic parahippocampal subgyral white matter, right thalamus and left posterior cingulate. Pearson's correlation coefficients calculated across all participants showed significant correlations between neuropsychological assessment scores and regional measurements of mean diffusivity and fractional anisotropy. Conclusions: DTI-based diffusivity measures may offer a sensitive method of detecting subtle microstructural brain changes associated with preclinical Alzheimer's disease.
Abstract:
Within online learning communities, receiving timely and meaningful insights into the quality of learning activities is an important part of an effective educational experience. Commonly adopted methods – such as the Community of Inquiry framework – rely on manual coding of online discussion transcripts, which is a costly and time-consuming process. There are several efforts underway to enable the automated classification of online discussion messages using supervised machine learning, which would enable the real-time analysis of interactions occurring within online learning communities. This paper investigates the importance of incorporating features that utilise the structure of online discussions for the classification of "cognitive presence" – the central dimension of the Community of Inquiry framework focusing on the quality of students' critical thinking within online learning communities. We implemented a Conditional Random Field classification solution, which incorporates structural features that may be useful in increasing classification performance over other implementations. Our approach leads to an improvement in classification accuracy of 5.8% over existing techniques when tested on the same dataset, with a precision and recall of 0.630 and 0.504 respectively.
Abstract:
Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem. This paper introduces a novel unsupervised pattern-based method to extract within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method to identify different categories of within-organisation spatial information. The results show that the proposed method can achieve high performance in terms of F-Score, indicating that this purely syntactic method based on web search and an analysis of HTML structure is well-suited for retrieving within-organisation spatial information.
Abstract:
From the autocorrelation function of geomagnetic polarity intervals, it is shown that the field reversal intervals are not independent but form a process akin to the Markov process, where the random input to the model is itself a moving average process. The input to the moving average model is, however, an independent Gaussian random sequence. All the parameters in this model of the geomagnetic field reversal have been estimated. In physical terms this model implies that the mechanism of reversal possesses a memory.
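The model structure described above, a Markov-like recursion driven by a moving average of an independent Gaussian sequence, can be sketched as follows. The abstract does not give the model orders or parameter values, so the first-order recursion, the first-order moving average, and the coefficients `phi` and `theta` are all assumptions chosen for illustration. The series represents zero-mean fluctuations, not actual polarity interval lengths.

```python
import numpy as np

def simulate_intervals(n, phi=0.5, theta=0.3, sigma=1.0, seed=0):
    """Simulate a first-order recursion whose input is a moving average
    of an independent Gaussian sequence (all orders/values are assumed)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, size=n + 1)  # independent Gaussian sequence
    u = e[1:] + theta * e[:-1]              # moving-average input
    x = np.zeros(n)
    x[0] = u[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + u[t]        # Markov-like recursion with memory
    return x

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

x = simulate_intervals(20000)
print(autocorr(x, 1), autocorr(x, 5))
```

A nonzero autocorrelation at small lags is what distinguishes such a process from a sequence of independent intervals, which is the sense in which the model "possesses a memory".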
Abstract:
Spike detection in neural recordings is the initial step in the creation of brain machine interfaces. The Teager energy operator (TEO) treats a spike as an increase in the 'local' energy and detects this increase. The performance of the TEO in detecting action potential spikes suffers because of its sensitivity to the frequency of spikes in the presence of the noise found in microelectrode array (MEA) recordings. The multiresolution TEO (mTEO) method overcomes this shortcoming of the TEO by tuning the parameter k to an optimal value m so as to match the frequency of the spike. In this paper, we present an algorithm for the mTEO using the multiresolution structure of wavelets along with inbuilt lowpass filtering of the subband signals. The algorithm is efficient and can be implemented for real-time processing of neural signals for spike detection. The performance of the algorithm is tested on a simulated neural signal with 10 spike templates obtained from [14]. The background noise is modeled as a colored Gaussian random process. Using the noise standard deviation and autocorrelation functions obtained from recorded data, background noise was simulated by an autoregressive (AR(5)) filter. The simulations show a spike detection accuracy of 90% and above with less than 5% false positives at an SNR of 2.35 dB, compared with the 80% accuracy and 10% false positives reported in [6] on simulated neural signals.
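For readers unfamiliar with the operator, here is a minimal sketch of the discrete k-TEO, psi_k[n] = x[n]^2 - x[n-k] x[n+k], and of a simplified multiresolution combination that takes the pointwise maximum over several k. It is not the wavelet-based algorithm of the paper: the smoothing and normalisation of each k-TEO output are omitted, and the set of k values is an arbitrary assumption.

```python
import numpy as np

def k_teo(x, k=1):
    """Discrete k-TEO: psi[n] = x[n]^2 - x[n-k] * x[n+k]; edges are left at 0."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[k:-k] = x[k:-k] ** 2 - x[:-2 * k] * x[2 * k:]
    return psi

def multi_teo(x, ks=(1, 3, 5)):
    """Simplified multiresolution TEO: pointwise maximum over several k.
    The choice of ks is an assumption made for illustration only."""
    return np.max(np.stack([k_teo(x, k) for k in ks]), axis=0)

# A unit impulse stands in for a spike at sample 50
signal = np.zeros(100)
signal[50] = 1.0
print(int(np.argmax(multi_teo(signal))))
```

For a pure sinusoid A cos(Ωn), the k-TEO output is the constant A² sin²(kΩ), which shows why the response depends on how well k matches the frequency content of the spike.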
Abstract:
When the size ($L$) of a one-dimensional metallic conductor is less than the correlation length $\lambda^{-1}$ of the Gaussian random potential, one expects transport properties to show ballistic behaviour. Using an invariant imbedding method, we study the exact distribution of the resistance, of the phase $\theta$ of the reflection amplitude of an incident electron of wave number $k_0$, and of $d\theta/dk_0$, for $\lambda L \ll 1$. The resistance is non-self-averaging and the $n$-th resistance moment varies periodically as $(1 - \cos 2k_0 L)^n$. The charge fluctuation noise, determined by the distribution of $d\theta/dk_0$, is constant at low frequencies.
Abstract:
In this paper, we propose a novel and efficient algorithm for modelling sub-65 nm clock interconnect networks in the presence of process variation. We develop a method for delay analysis of interconnects considering the impact of Gaussian metal process variations. The resistance and capacitance of a distributed RC line are expressed as correlated Gaussian random variables, which are then used to compute the standard deviation of the delay Probability Distribution Function (PDF) at all nodes in the interconnect network. The main objective is to find the delay PDF at a lower computational cost. The approach converges in probability distribution but not in the mean of the delay. We validate our approach against SPICE-based Monte Carlo simulations; the proposed method entails significantly lower computational cost.
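The idea of comparing an analytical delay spread with a Monte Carlo reference can be illustrated on a single line. This sketch is not the algorithm of the paper: it assumes the Elmore delay of a distributed RC line, D = RC/2, a first-order (linearised) variance estimate, and hypothetical means, standard deviations and correlation. NumPy sampling stands in for the SPICE-based simulations.

```python
import numpy as np

def delay_std_first_order(mu_r, mu_c, sd_r, sd_c, rho):
    """First-order standard deviation of D = R*C/2 for correlated Gaussian R, C."""
    var = 0.25 * ((mu_c * sd_r) ** 2 + (mu_r * sd_c) ** 2
                  + 2.0 * rho * mu_r * mu_c * sd_r * sd_c)
    return float(np.sqrt(var))

def delay_std_monte_carlo(mu_r, mu_c, sd_r, sd_c, rho, n=200_000, seed=0):
    """Monte Carlo reference: sample correlated (R, C) pairs and evaluate D."""
    rng = np.random.default_rng(seed)
    cov = [[sd_r ** 2, rho * sd_r * sd_c],
           [rho * sd_r * sd_c, sd_c ** 2]]
    r, c = rng.multivariate_normal([mu_r, mu_c], cov, size=n).T
    return float(np.std(0.5 * r * c))

# Hypothetical line: R in ohms, C in femtofarads, so the delay is in femtoseconds
params = dict(mu_r=100.0, mu_c=200.0, sd_r=5.0, sd_c=10.0, rho=0.5)
fo = delay_std_first_order(**params)
mc = delay_std_monte_carlo(**params)
print(fo, mc)
```

The closed-form estimate needs a handful of arithmetic operations, whereas the sampling reference needs many evaluations, which is the kind of cost gap the abstract refers to.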
Abstract:
Columns which have stochastically distributed Young's modulus and mass density and are subjected to deterministic periodic axial loadings are considered. The general case of a column supported on a Winkler elastic foundation of random stiffness and also on discrete elastic supports, which are also random, is considered. Material property fluctuations are modeled as independent one-dimensional univariate homogeneous real random fields in space. In addition to autocorrelation functions or their equivalent power spectral density functions, the input random fields are characterized by scales of fluctuation or variance functions for their second order properties. The foundation stiffness coefficient and the stiffnesses of discrete elastic supports are treated as independent random variables. The system equations of boundary frequencies are obtained using Bolotin's method for deterministic systems. Stochastic FEM is used to obtain the discrete system with random as well as periodic coefficients. Statistical properties of boundary frequencies are derived in terms of input parameter statistics. A complete covariance structure is obtained. The equations developed are illustrated using a numerical example employing a practical correlation structure.
Abstract:
A computational scheme for determining the dynamic stiffness coefficients of a linear, inclined, translating and viscously/hysteretically damped cable element is outlined. Also taken into account is the coupling between in-plane transverse and longitudinal forms of cable vibration. The scheme is based on conversion of the governing set of quasistatic boundary value problems into a larger equivalent set of initial value problems, which are subsequently numerically integrated in a spatial domain using marching algorithms. Numerical results which bring out the nature of the dynamic stiffness coefficients are presented. A specific example of random vibration analysis of a long span cable subjected to earthquake support motions modeled as vector Gaussian random processes is also discussed. The approach presented is versatile and capable of handling many complicating effects in cable dynamics in a unified manner.
Abstract:
Nonconservatively loaded columns which have stochastically distributed material property values and stochastic loadings in space are considered. Young's modulus and mass density are treated as random fields. The support stiffness coefficient and tip follower load are considered to be random variables. The fluctuations of external and distributed loadings are considered to constitute a random field. The variational formulation is adopted to get the differential equation and boundary conditions. The non self-adjoint operators are used at the boundary of the regularity domain. The statistics of vibration frequencies and modes are obtained using the standard perturbation method, by treating the fluctuations as stochastic perturbations. Linear dependence of vibration and stability parameters on property value fluctuations and loading fluctuations is assumed. Bounds for the statistics of vibration frequencies are obtained. The critical load is first evaluated for the averaged problem and the corresponding eigenvalue statistics are sought. Then, the frequency equation is employed to transform the eigenvalue statistics to critical load statistics. Specialization of the general procedure to Beck, Leipholz and Pfluger columns is carried out. For the Pfluger column, nonlinear transformations are avoided by directly expressing the critical load statistics in terms of input variable statistics.
Abstract:
The Leipholz column, with Young's modulus and mass per unit length modeled as stochastic processes and with a distributed tangential follower load that also behaves stochastically, is considered. The non self-adjoint differential equation and boundary conditions are considered to have random field coefficients. The standard perturbation method is employed. The non self-adjoint operators are used within the regularity domain. The full covariance structure of the free vibration eigenvalues and critical loads is derived in terms of second order properties of the input random fields characterizing the system parameter fluctuations. The mean value of the critical load is calculated using the averaged problem and the corresponding eigenvalue statistics are sought. Through the frequency equation, a transformation is done to yield load parameter statistics. A numerical study incorporating commonly observed correlation models is reported, which illustrates the full potential of the derived expressions.