961 results for VARIABLE NEIGHBORHOOD RANDOM FIELDS


Relevance:

100.00%

Publisher:

Abstract:

We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on data of different types and quality by evaluating the approach on scanned pathology reports from an Australian institution. These data contain optical character recognition errors, as well as linguistic conventions that differ from those in the i2b2 dataset, for example different date formats. The findings suggest that our approach performs comparably to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations in training size, data type and quality in the presence of sufficient training data.
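As a rough illustration of the kind of pipeline described above, the Python sketch below combines simple lexical features with a date-matching regular expression and trains a sequence classifier with the sklearn-crfsuite library. The features, labels and toy sentence are illustrative assumptions, not the authors' actual configuration.

import re
import sklearn_crfsuite

# Hypothetical pattern feature: a simple date regex (one of many PHI patterns).
DATE_RE = re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")

def token_features(tokens, i):
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "matches_date": bool(DATE_RE.match(tok)),
        "suffix3": tok[-3:],
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    return feats

# Toy training data: one sentence with BIO-style PHI labels.
train_sents = [["John", "Smith", "seen", "on", "12/03/2007"]]
train_tags = [["B-NAME", "I-NAME", "O", "O", "B-DATE"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))   # predicted PHI tags; tagged spans would then be removed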

Relevance:

100.00%

Publisher:

Abstract:

Objective: Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random fields classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary use of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and Materials: The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and quality, with one of the datasets containing optical character recognition errors. Results: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion: Findings show that Anonym performs comparably to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations in training size, data type and quality in the presence of sufficient training data.

Relevance:

100.00%

Publisher:

Abstract:

This thesis investigates the fusion of 3D visual information with 2D image cues to provide 3D semantic maps of the large-scale environments that a robot traverses, for robotic applications. A major theme of this thesis is to exploit the availability of 3D information acquired from robot sensors to improve upon 2D object classification alone. The proposed methods have been evaluated on several indoor and outdoor datasets collected from mobile robotic platforms, including a quadcopter and a ground vehicle, covering several kilometres of urban roads.

Relevance:

100.00%

Publisher:

Abstract:

Objective: This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, we determine (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of an incremental active learning framework across different selection criteria and datasets. Materials and methods: The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields were used as the supervised method, with least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled. Discussion: Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction in annotation effort is always above the random sampling and longest sequence baselines. Conclusion: Incremental active learning is a promising approach for building effective and robust medical concept extraction models while significantly reducing the burden of manual annotation.
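A minimal sketch of the least confidence selection criterion is given below, assuming per-token marginal probabilities are available from the current model; the paper's exact scoring (and the information density criterion) is not reproduced, and the pool of "reports" is synthetic.

import numpy as np

def least_confidence_scores(token_label_probs):
    # token_label_probs: one (T_i x K) array per unlabelled sequence, holding
    # per-token marginal probabilities from the current model (an assumption;
    # sequence-level confidence could also come from the best-path probability).
    scores = []
    for p in token_label_probs:
        best = p.max(axis=1)               # confidence of the most likely label per token
        scores.append(1.0 - best.prod())   # low sequence confidence -> high score
    return np.array(scores)

def select_batch(unlabelled_ids, probs, batch_size=5):
    scores = least_confidence_scores(probs)
    order = np.argsort(-scores)            # most uncertain sequences first
    return [unlabelled_ids[i] for i in order[:batch_size]]

rng = np.random.default_rng(0)
pool = [f"report_{i}" for i in range(20)]
fake_probs = [rng.dirichlet(np.ones(4), size=rng.integers(5, 15)) for _ in pool]
print(select_batch(pool, fake_probs, batch_size=3))   # ids to send for annotation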

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a new active learning query strategy for information extraction, called Domain Knowledge Informativeness (DKI). Active learning is often used to reduce the amount of annotation effort required to obtain training data for machine learning algorithms. A key component of an active learning approach is the query strategy, which is used to iteratively select samples for annotation. Knowledge resources have been used in information extraction as a means to derive additional features for sample representation. DKI is, however, the first query strategy that exploits such resources to inform sample selection. To evaluate the merits of DKI, in particular with respect to the reduction in annotation effort that the new query strategy enables, we conduct a comprehensive empirical comparison of active learning query strategies for information extraction within the clinical domain. The clinical domain was chosen for this work because of the availability of extensive structured knowledge resources, which have often been exploited for feature generation, and because it offers a compelling use case for active learning given the high costs and hurdles associated with obtaining annotations in this domain. Our experimental findings demonstrated that (1) amongst existing query strategies, those based on the classification model's confidence are a better choice for clinical data, as they perform equally well with a much lighter computational load, and (2) significant reductions in annotation effort are achievable by exploiting knowledge resources within active learning query strategies, with up to 14% fewer tokens and concepts to manually annotate than with state-of-the-art query strategies.
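The exact formulation of DKI is not given in this abstract; the following sketch is only a hypothetical illustration of how a query strategy might mix model uncertainty with coverage by a domain knowledge resource (here a tiny hand-made term set) when ranking candidate sentences for annotation.

def dki_score(tokens, model_uncertainty, knowledge_terms, alpha=0.5):
    # Hypothetical scoring: blend model uncertainty with the fraction of tokens
    # covered by a domain knowledge resource (e.g., a clinical terminology).
    coverage = sum(t.lower() in knowledge_terms for t in tokens) / max(len(tokens), 1)
    return alpha * model_uncertainty + (1 - alpha) * coverage

knowledge_terms = {"hypertension", "metformin", "dyspnea"}   # toy stand-in resource
candidates = [
    (["Patient", "denies", "dyspnea"], 0.40),
    (["Follow", "up", "in", "two", "weeks"], 0.55),
    (["Started", "metformin", "for", "hypertension"], 0.35),
]
ranked = sorted(candidates, key=lambda c: dki_score(c[0], c[1], knowledge_terms), reverse=True)
print([c[0] for c in ranked])   # sentences ordered by the hypothetical DKI-style score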

Relevance:

100.00%

Publisher:

Abstract:

Within online learning communities, receiving timely and meaningful insights into the quality of learning activities is an important part of an effective educational experience. Commonly adopted methods – such as the Community of Inquiry framework – rely on manual coding of online discussion transcripts, which is a costly and time-consuming process. Several efforts are underway to enable the automated classification of online discussion messages using supervised machine learning, which would enable the real-time analysis of interactions occurring within online learning communities. This paper investigates the importance of incorporating features that utilise the structure of online discussions for the classification of "cognitive presence" – the central dimension of the Community of Inquiry framework, focusing on the quality of students' critical thinking within online learning communities. We implemented a Conditional Random Field classification solution, which incorporates structural features that may be useful in increasing classification performance over other implementations. Our approach leads to an improvement in classification accuracy of 5.8% over existing techniques when tested on the same dataset, with a precision and recall of 0.630 and 0.504, respectively.
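To make the idea of structural features concrete, the sketch below encodes each message of a toy discussion thread with both content and structural features (position in the thread, reply relation, author continuity); one thread would form one sequence for a CRF, with cognitive presence phases as labels. The feature set is an assumption for illustration, not the one used in the paper.

def message_features(thread, i):
    msg = thread[i]
    return {
        "n_words": len(msg["text"].split()),
        "has_question": "?" in msg["text"],
        "position_in_thread": i / max(len(thread) - 1, 1),   # structural feature
        "is_reply": msg["parent"] is not None,                # structural feature
        "same_author_as_prev": i > 0 and msg["author"] == thread[i - 1]["author"],
    }

thread = [
    {"author": "A", "parent": None, "text": "How does photosynthesis store energy?"},
    {"author": "B", "parent": 0, "text": "Light reactions produce ATP and NADPH."},
    {"author": "A", "parent": 1, "text": "So the Calvin cycle then fixes CO2, integrating both ideas."},
]
X = [message_features(thread, i) for i in range(len(thread))]
# Each thread is one sequence; labels would be the phases of cognitive presence
# (triggering event, exploration, integration, resolution).
print(X)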

Relevance:

100.00%

Publisher:

Abstract:

Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem. This paper introduces a novel unsupervised pattern-based method to extract within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method to identify different categories of within-organisation spatial information. The results show that the proposed method achieves high performance in terms of F-score, indicating that this purely syntactic method, based on web search and an analysis of HTML structure, is well suited for retrieving within-organisation spatial information.
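A toy example of the pattern-based idea, assuming office locations are exposed in repeated HTML table rows; the regular expressions below are illustrative and much simpler than a full analysis of HTML structure patterns.

import re

html = """
<table>
  <tr><td>Dr. Jane Doe</td><td>Room 2.14, Building 78</td></tr>
  <tr><td>Finance Office</td><td>Level 3, Block B</td></tr>
</table>
"""

# Exploit the repeated <td>name</td><td>location</td> row structure.
ROW_RE = re.compile(r"<tr><td>(.*?)</td><td>(.*?)</td></tr>")
LOCATION_RE = re.compile(r"(Room|Level|Building|Block)\s+[\w.]+", re.IGNORECASE)

for name, cell in ROW_RE.findall(html):
    if LOCATION_RE.findall(cell):                 # cell carries spatial information
        print(name, "->", cell)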

Relevance:

100.00%

Publisher:

Abstract:

Rapid granular flows are defined as flows in which the time scales for the particle interactions are small compared to the inverse of the strain rate, so that the particle interactions can be treated as instantaneous collisions. We first show, using Discrete Element simulations, that even very dense flows of sand or glass beads with volume fraction between 0.5 and 0.6 are rapid granular flows. Since collisions are instantaneous, a kinetic theory approach for the constitutive relations is most appropriate, and we present kinetic theory results for different microscopic models for particle interaction. The significant difference between granular flows and normal fluids is that energy is not conserved in a granular flow. The differences in the hydrodynamic modes caused by the non-conserved nature of energy are discussed. Going beyond the Boltzmann equation, the effect of correlations is studied using the ring kinetic approximation, and it is shown that the divergences in the viscometric coefficients, which are present for elastic fluids, are not present for granular flows because energy is not conserved. The hydrodynamic model is applied to the flow down an inclined plane. Since energy is not a conserved variable, the hydrodynamic fields in the bulk of a granular flow are obtained from the mass and momentum conservation equations alone. Energy becomes a relevant variable only in thin 'boundary layers' at the boundaries of the flow where there is a balance between the rates of conduction and dissipation. We show that such a hydrodynamic model can predict the salient features of a chute flow, including the flow initiation when the angle of inclination is increased above the 'friction angle', the striking lack of observable variation of the volume fraction with height, the observation of a steady flow only for certain restitution coefficients, and the density variations in the boundary layers.

Relevance:

100.00%

Publisher:

Abstract:

Columns with stochastically distributed Young's modulus and mass density, subjected to deterministic periodic axial loadings, are considered. The general case of a column supported on a Winkler elastic foundation of random stiffness and on discrete elastic supports of random stiffness is treated. Material property fluctuations are modeled as independent one-dimensional univariate homogeneous real random fields in space. In addition to autocorrelation functions or their equivalent power spectral density functions, the input random fields are characterized by scale of fluctuation or variance functions for their second-order properties. The foundation stiffness coefficient and the stiffnesses of the discrete elastic supports are treated as independent random variables. The system equations for the boundary frequencies are obtained using Bolotin's method for deterministic systems. Stochastic FEM is used to obtain the discrete system with random as well as periodic coefficients. Statistical properties of the boundary frequencies are derived in terms of the input parameter statistics, and a complete covariance structure is obtained. The equations developed are illustrated using a numerical example employing a practical correlation structure.
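A generic sketch of sampling one such homogeneous random field (here Young's modulus with an exponential autocorrelation) on a grid of points along the column; the grid, correlation model and parameter values are assumptions for illustration, not the stochastic FEM discretisation used in the paper.

import numpy as np

n, L, lc = 50, 3.0, 0.5                      # grid points, column length [m], correlation length [m]
x = np.linspace(0.0, L, n)
mean_E, cov_E = 2.1e11, 0.05                 # mean Young's modulus [Pa] and coefficient of variation
C = np.exp(-np.abs(x[:, None] - x[None, :]) / lc)   # exponential autocorrelation matrix
A = np.linalg.cholesky(C + 1e-10 * np.eye(n))       # factor for correlated sampling
rng = np.random.default_rng(1)
E_field = mean_E * (1.0 + cov_E * (A @ rng.standard_normal(n)))   # one realization of E(x)
print(E_field[:5])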

Relevance:

100.00%

Publisher:

Abstract:

A Leipholz column whose Young's modulus and mass per unit length are stochastic processes, and whose distributed tangential follower load also behaves stochastically, is considered. The non-self-adjoint differential equation and boundary conditions are taken to have random field coefficients. The standard perturbation method is employed, and the non-self-adjoint operators are used within the regularity domain. The full covariance structure of the free vibration eigenvalues and critical loads is derived in terms of the second-order properties of the input random fields characterizing the system parameter fluctuations. The mean value of the critical load is calculated using the averaged problem, and the corresponding eigenvalue statistics are sought. Through the frequency equation, a transformation is made to yield the load parameter statistics. A numerical study incorporating commonly observed correlation models is reported, which illustrates the full potential of the derived expressions.

Relevance:

100.00%

Publisher:

Abstract:

In this paper, we consider the application of belief propagation (BP) to achieve near-optimal signal detection in large multiple-input multiple-output (MIMO) systems at low complexity. Large-MIMO architectures based on spatial multiplexing (V-BLAST) as well as non-orthogonal space-time block codes (STBC) from cyclic division algebra (CDA) are considered. We adopt graphical models based on Markov random fields (MRF) and factor graphs (FG). In the MRF-based approach, we use pairwise compatibility functions, although the graphical models of MIMO systems are fully/densely connected. In the FG approach, we employ a Gaussian approximation (GA) of the multi-antenna interference, which significantly reduces the complexity while achieving very good performance for large dimensions. We show that (i) both MRF and FG based BP approaches exhibit large-system behavior, where increasingly close to optimal performance is achieved with increasing number of dimensions, and (ii) damping of messages/beliefs significantly improves the bit error performance.
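The damping of messages mentioned in (ii) can be written in a few lines; below is a minimal sketch of a damped update for a binary-symbol belief, with the damping factor delta chosen arbitrarily for illustration.

import numpy as np

def damp(new_msg, old_msg, delta=0.4):
    # Damped belief update often used to stabilise loopy BP on dense graphs:
    # b = (1 - delta) * b_new + delta * b_old, followed by renormalisation.
    # delta = 0.4 is an arbitrary choice for illustration.
    b = (1.0 - delta) * new_msg + delta * old_msg
    return b / b.sum()

old = np.array([0.9, 0.1])     # belief about one transmitted bit at iteration t-1
new = np.array([0.2, 0.8])     # raw update at iteration t (oscillating)
print(damp(new, old))          # damped belief, e.g. [0.48, 0.52]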

Relevance:

100.00%

Publisher:

Abstract:

Structural Support Vector Machines (SSVMs) and Conditional Random Fields (CRFs) are popular discriminative methods used for classifying structured and complex objects such as parse trees, image segments and part-of-speech tags. The datasets involved are very high-dimensional, and the models designed using typical training algorithms for SSVMs and CRFs are non-sparse. This non-sparse nature of the models results in slow inference, so there is a need to devise new algorithms for sparse SSVM and CRF classifier design. The use of the elastic net and the L1-regularizer has already been explored for solving the primal CRF and SSVM problems, respectively, to design sparse classifiers. In this work, we focus on the dual elastic net regularized SSVM and CRF. By exploiting the weakly coupled structure of these convex programming problems, we propose a new sequential alternating proximal (SAP) algorithm to solve these dual problems. The algorithm works by sequentially visiting each training set example and solving a simple subproblem restricted to a small subset of variables associated with that example. Numerical experiments on various benchmark sequence labeling datasets demonstrate that the proposed algorithm scales well. Further, the classifiers designed are sparser than those designed by solving the respective primal problems and demonstrate comparable generalization performance. Thus, the proposed SAP algorithm is a useful alternative for sparse SSVM and CRF classifier design.
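The following toy sketch illustrates only the sequential pattern of such an algorithm (visiting one example's block of dual variables at a time and applying a proximal step) on a simple separable quadratic with an L1 term; it is not the SSVM/CRF dual objective or the SAP update of the paper.

import numpy as np

rng = np.random.default_rng(0)
n_examples, block = 5, 3
alpha = [np.zeros(block) for _ in range(n_examples)]       # per-example dual blocks
target = [rng.standard_normal(block) for _ in range(n_examples)]
lam, step = 0.1, 0.5

def prox_l1(v, t):
    # Soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for epoch in range(20):
    for i in range(n_examples):                             # sequential visits to examples
        grad = alpha[i] - target[i]                         # gradient of this block's smooth loss
        alpha[i] = prox_l1(alpha[i] - step * grad, step * lam)

print(np.round(alpha[0], 3))                                # sparse block after the passes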

Relevance:

100.00%

Publisher:

Abstract:

The spatial variability that exists in pavement systems can be conveniently represented by means of random fields; in this study, a probabilistic analysis that considers this spatial variability, including the anisotropic nature of the pavement layer properties, is presented. The integration of the spatially varying lognormal random fields into a linear-elastic finite difference analysis is achieved through the expansion optimal linear estimation method. For the estimation of the critical pavement responses, metamodels based on polynomial chaos expansion (PCE) are developed to replace the computationally expensive finite-difference model. A sparse polynomial chaos expansion based on an adaptive regression algorithm, enhanced by the combined use of global sensitivity analysis (GSA), is used, with significant savings in computational effort. The effect of anisotropy in each layer on the pavement responses is studied separately, and an effort is made to identify the pavement layer in which the introduction of anisotropic characteristics has the most significant impact on the critical strains. It is observed that anisotropy in the base layer has a significant but diverse effect on both critical strains: while the compressive strain tends to be considerably higher than that observed for the isotropic section, the tensile strain shows a decrease in its mean value with the introduction of base-layer anisotropy. Furthermore, asphalt-layer anisotropy also tends to decrease the critical tensile strain while having little effect on the critical compressive strain. (C) 2015 American Society of Civil Engineers.
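A minimal sketch of a polynomial chaos metamodel of the kind described: Hermite polynomials of standard-normal inputs are fit by least squares to a handful of runs of a placeholder "expensive" model standing in for the finite-difference pavement analysis; the model, basis truncation and sample size are illustrative assumptions, and no adaptive sparsity or GSA enhancement is included.

import numpy as np

rng = np.random.default_rng(3)

def expensive_model(xi):
    # Placeholder for one finite-difference run returning a critical strain.
    return 200.0 + 30.0 * xi[0] - 10.0 * xi[1] + 5.0 * xi[0] * xi[1]

def hermite_basis(xi):
    # Probabilists' Hermite polynomials up to total degree 2 in two variables.
    x1, x2 = xi
    return np.array([1.0, x1, x2, x1**2 - 1.0, x2**2 - 1.0, x1 * x2])

xis = rng.standard_normal((40, 2))                 # 40 sampled input realizations
y = np.array([expensive_model(x) for x in xis])    # "expensive" responses
Psi = np.array([hermite_basis(x) for x in xis])
coeffs, *_ = np.linalg.lstsq(Psi, y, rcond=None)   # regression-based PCE coefficients
print(np.round(coeffs, 2))                         # surrogate now replaces the full model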

Relevance:

100.00%

Publisher:

Abstract:

This thesis presents a technique for obtaining the response of linear structural systems with parameter uncertainties subjected to either deterministic or random excitation. The parameter uncertainties are modeled as random variables or random fields, and are assumed to be time-independent. The new method is an extension of the deterministic finite element method to the space of random functions.

First, the general formulation of the method is developed, in the case where the excitation is deterministic in time. Next, the application of this formulation to systems satisfying the one-dimensional wave equation with uncertainty in their physical properties is described. A particular physical conceptualization of this equation is chosen for study, and some engineering applications are discussed in both an earthquake ground motion and a structural context.

Finally, the formulation of the new method is extended to include cases where the excitation is random in time. Application of this formulation to the random response of a primary-secondary system is described. It is found that parameter uncertainties can have a strong effect on the system response characteristics.
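For contrast with the method developed in the thesis, the sketch below shows the brute-force alternative it improves upon: plain Monte Carlo over a single uncertain stiffness in a one-degree-of-freedom oscillator under deterministic excitation, from which response statistics are estimated. All parameter values are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(4)
m, c = 1.0, 0.05                         # mass and damping (deterministic)
t = np.linspace(0.0, 20.0, 2000)
dt = t[1] - t[0]
peaks = []
for _ in range(200):                     # Monte Carlo samples of the uncertain parameter
    k = rng.lognormal(mean=np.log(4.0), sigma=0.2)   # uncertain stiffness
    x, v = 0.0, 0.0
    xs = []
    for ti in t:                         # simple time stepping under a sinusoidal load
        a = (np.sin(2.0 * ti) - c * v - k * x) / m
        v += a * dt
        x += v * dt
        xs.append(x)
    peaks.append(max(np.abs(xs)))
print(np.mean(peaks), np.std(peaks))     # response statistics under parameter uncertainty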

Relevance:

100.00%

Publisher:

Abstract:

Tracer injection techniques have been widely used to investigate flows in porous media, mainly in problems involving the numerical simulation of miscible displacements in petroleum reservoirs and the transport of contaminants in aquifers. Subsurface reservoirs are generally heterogeneous and may exhibit significant variations of their properties over several length scales. These spatial variations are incorporated into the equations governing the flow through the porous medium by means of random fields. Such fields can provide a description of the heterogeneities of the subsurface formation in cases where geological knowledge does not provide the detail required for a deterministic prediction of the flow through the porous medium. In this thesis, a lognormal model is employed for the permeability field in order to reproduce the permeability distribution of the real medium, and the numerical generation of these random fields is carried out with the Successive Sum of Independent Gaussian Fields (SSCGI) method. The main objective of this work is the study of uncertainty quantification for the inverse problem of tracer transport in a heterogeneous porous medium, employing a Bayesian approach to update the permeability fields based on measurements of the spatial concentration of the tracer at specific times. A two-stage Markov Chain Monte Carlo method is used to sample the posterior probability distribution, and the Markov chain is built from the random reconstruction of the permeability fields. The pressure-velocity problem governing the flow is solved with a mixed finite element method suited to the accurate computation of fluxes in heterogeneous permeability fields, and a Lagrangian approach, the Forward Integral Tracking (FIT) method, is used for the numerical simulation of the tracer transport problem. Numerical results are obtained and presented for a set of sample realizations of the permeability fields.
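As a highly simplified illustration of the Bayesian updating described above, the sketch below runs a single-stage random-walk Metropolis sampler over a small vector of log-permeabilities, with a crude stand-in for the forward transport solver; the two-stage acceptance step, the SSCGI field generation, and the mixed finite element / FIT solvers of the thesis are not reproduced.

import numpy as np

rng = np.random.default_rng(5)
n_cells = 10
true_logk = rng.normal(0.0, 1.0, n_cells)

def forward(logk):
    # Placeholder "tracer concentration" map; the thesis uses a mixed FEM / FIT solver.
    return np.cumsum(np.exp(logk)) / np.sum(np.exp(logk))

data = forward(true_logk) + rng.normal(0.0, 0.01, n_cells)   # synthetic observations
sigma_obs, sigma_prop = 0.01, 0.2

def log_post(lk):
    # Gaussian likelihood on the concentration misfit plus a standard-normal prior.
    misfit = forward(lk) - data
    return -0.5 * np.sum(misfit**2) / sigma_obs**2 - 0.5 * np.sum(lk**2)

logk = np.zeros(n_cells)
current = log_post(logk)
accepted = 0
for it in range(2000):
    prop = logk + sigma_prop * rng.normal(0.0, 1.0, n_cells)   # random-walk proposal
    cand = log_post(prop)
    if np.log(rng.uniform()) < cand - current:                  # Metropolis acceptance
        logk, current = prop, cand
        accepted += 1
print("acceptance rate:", accepted / 2000)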