47 results for data gathering algorithm
at Université de Lausanne, Switzerland
Abstract:
An online algorithm for determining respiratory mechanics in patients receiving non-invasive ventilation (NIV) in pressure support mode was developed and embedded in a ventilator system. Based on multiple linear regression (MLR) of respiratory data, the algorithm was tested on a patient bench model under conditions with and without leak, simulating a variety of mechanics. Bland-Altman analysis indicates reliable measures of compliance across the clinical range of interest (±11-18% limits of agreement). Resistance measures showed large quantitative errors (30-50%); however, it was still possible to distinguish qualitatively between normal and obstructive resistances. This outcome provides clinically significant information for ventilator titration and patient management.
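The abstract does not state the regression model. A common choice for MLR-based respiratory mechanics is the single-compartment equation of motion, P_aw(t) = R·Q(t) + V(t)/C + P0, and the sketch below assumes that model, ignores leak compensation, and fits R, elastance 1/C and P0 by ordinary least squares. All function names and the synthetic breath are hypothetical illustrations, not the ventilator's embedded algorithm.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_mechanics(pressure, flow, volume):
    """Estimate (R, C, P0) from sampled airway pressure, flow and volume (assumed model)."""
    rows = [(q, v, 1.0) for q, v in zip(flow, volume)]  # regressors: flow, volume, intercept
    # Normal equations (X^T X) beta = X^T y for beta = (R, E, P0), with E = 1 / C
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(rows, pressure)) for i in range(3)]
    resistance, elastance, p0 = solve3(xtx, xty)
    return resistance, 1.0 / elastance, p0


def synthetic_breath(r, c, p0, n=101):
    """Hypothetical noise-free signals: half-sine volume and its derivative as flow."""
    t = [i / (n - 1) for i in range(n)]
    flow = [0.5 * math.cos(math.pi * ti) for ti in t]  # L/s
    volume = [0.5 / math.pi * math.sin(math.pi * ti) for ti in t]  # L
    pressure = [r * q + v / c + p0 for q, v in zip(flow, volume)]  # cmH2O
    return pressure, flow, volume
```

For example, `fit_mechanics(*synthetic_breath(10.0, 0.05, 5.0))` recovers R = 10 cmH2O/(L/s), C = 0.05 L/cmH2O and P0 = 5 cmH2O on noise-free data. With real NIV signals, leak flow biases both Q and V, which is one plausible reason resistance estimates are less accurate than compliance.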
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the current challenges in the field, and it is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel alternative to geostatistics for modelling the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both "labelled" observed data and "unlabelled" data; the latter have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and to describe the stochastic variability and non-uniqueness of spatial properties. Moreover, it is able to capture and preserve key spatial dependencies, such as the connectivity of high-permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. As a data-driven algorithm, semi-supervised SVR is designed to integrate various kinds of conditioning information and to learn dependencies from it. The semi-supervised SVR model is able to balance signal/noise levels and to control the prior belief in the available data. In this work, the stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify the uncertainty of reservoir production, with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, and smoothness/variability of the spatial property distribution.
The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
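The MCMC step mentioned above can be illustrated with a random-walk Metropolis sampler over one geological parameter. This is a generic sketch, not the paper's sampler: `toy_log_post` is a hypothetical stand-in for the log-posterior that, in the study, would come from running the SVR geomodel and a flow simulator against the production history.

```python
import math
import random


def metropolis(log_post, x0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for one scalar model parameter."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, post(prop) / post(x)), evaluated in log space;
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def toy_log_post(length):
    """Hypothetical stand-in: positivity prior times a Gaussian misfit centred at 3."""
    if length <= 0:
        return -math.inf
    return -0.5 * ((length - 3.0) / 0.5) ** 2
```

Discarding an initial burn-in (for example the first 2,000 of 20,000 draws) and summarising the rest gives the kind of posterior description of a correlation length that the abstract refers to; each real posterior evaluation would be far more expensive than this toy.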
Abstract:
Detecting local differences between groups of connectomes is a great challenge in neuroimaging, because of the large number of tests that have to be performed and the resulting impact of multiplicity correction. Any available information should be exploited to increase the power to detect true between-group effects. We present an adaptive strategy that exploits the data structure and the prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and apply a screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data-driven algorithm. Given the results of the screening step, a filtering step is performed to seek real differences at the node/connection level. The proposed strategy can be used to strongly control either the family-wise error rate or the false discovery rate. We show the benefit of the proposed strategy by means of several simulations, and we present a real application comparing the connectomes of preschool children and adolescents.
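The screen-then-filter idea can be sketched with a well-known selective-inference recipe. This is a swapped-in technique, not the authors' procedure: subnetworks are screened with Simes p-values and Benjamini-Hochberg (BH), and connections inside the selected subnetworks are then tested with BH at the reduced level q·R/M (Benjamini-Bogomolov), which targets FDR rather than FWER. The subnetwork names and p-values are hypothetical.

```python
def simes(pvals):
    """Simes combination p-value for the global null of one subnetwork."""
    m = len(pvals)
    return min(m * p / (i + 1) for i, p in enumerate(sorted(pvals)))


def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank satisfying the step-up condition
    return sorted(order[:k])


def screen_then_filter(subnets, q=0.05):
    """Screen subnetworks, then test connections inside the selected ones."""
    names = list(subnets)
    family_p = [simes(subnets[n]) for n in names]  # screening step
    selected = [names[i] for i in benjamini_hochberg(family_p, q)]
    if not selected:
        return {}
    # Filtering step: within-subnetwork level shrinks with the fraction selected
    q_within = q * len(selected) / len(names)
    return {n: benjamini_hochberg(subnets[n], q_within) for n in selected}
```

With `{"A": [0.0001, 0.0002, 0.3], "B": [0.5, 0.6, 0.9], "C": [0.04, 0.7, 0.8]}`, only subnetwork A survives screening, and its first two connections are then declared different.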
Abstract:
The atomic force microscope is not only a very convenient tool for studying the topography of different samples; it can also be used to measure specific binding forces between molecules. For this purpose, one type of molecule is attached to the tip and the other to the substrate. Approaching the tip to the substrate allows the molecules to bind together, and retracting the tip breaks the newly formed bond. The rupture of a specific bond appears in the force-distance curves as a spike from which the binding force can be deduced. In this article we present an algorithm that automatically processes force-distance curves to obtain bond strength histograms. The algorithm is based on a fuzzy logic approach that permits an evaluation of the "quality" of every event and makes the detection procedure much faster than manual selection. The software was applied to measure the binding strength between tubulin and microtubule-associated proteins.
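The abstract does not give the fuzzy rules, so the following is only a sketch of how a fuzzy "quality" score for rupture events might look. It assumes two hypothetical rules, "the force drop clearly exceeds the noise" and "the force lands near the zero baseline", each expressed as a linear ramp membership and combined with the fuzzy AND (minimum). Thresholds, names and units are illustrative.

```python
def ramp(x, lo, hi):
    """Fuzzy membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def detect_ruptures(force, noise, min_quality=0.5):
    """Return (index, rupture_force, quality) for candidate bond ruptures.

    `force` is the retraction trace as tensile force (e.g. pN), so a rupture
    shows up as a sudden drop back towards the zero-force baseline.
    """
    events = []
    for i in range(1, len(force)):
        drop = force[i - 1] - force[i]
        big_drop = ramp(drop, noise, 3 * noise)  # rule 1: drop clearly exceeds noise
        at_baseline = 1.0 - ramp(abs(force[i]), noise, 3 * noise)  # rule 2: lands near zero
        quality = min(big_drop, at_baseline)  # fuzzy AND of both rules
        if quality >= min_quality:
            events.append((i, drop, quality))
    return events
```

Collecting the `rupture_force` values of all accepted events across many curves yields the bond strength histogram; the quality score lets borderline events be down-weighted instead of being accepted or rejected by hand.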
Abstract:
High-throughput technologies are now used to generate more than one type of data from the same biological samples. To properly integrate such data, we propose using co-modules, which describe coherent patterns across paired data sets, and we devise several modular methods for their identification. We first test these methods using in silico data, demonstrating that the integrative scheme of our Ping-Pong Algorithm uncovers drug-gene associations more accurately when considering noisy or complex data. Second, we provide an extensive comparative study using the gene-expression and drug-response data from the NCI-60 cell lines. Using information from the DrugBank and Connectivity Map databases, we show that the Ping-Pong Algorithm predicts drug-gene associations significantly better than other methods. Co-modules provide insights into possible mechanisms of action for a wide range of drugs and suggest new targets for therapy.
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending corresponding approaches beyond the local scale still represents a major challenge, yet is critically important for the development of reliable groundwater flow and contaminant transport models. To address this issue, I have developed a hydrogeophysical data integration technique based on a two-step Bayesian sequential simulation procedure that is specifically targeted towards larger-scale problems. The objective is to simulate the distribution of a target hydraulic parameter based on spatially exhaustive, but poorly resolved, measurements of a pertinent geophysical parameter and locally highly resolved, but spatially sparse, measurements of the considered geophysical and hydraulic parameters. To this end, my algorithm links the low- and high-resolution geophysical data via a downscaling procedure before relating the downscaled regional-scale geophysical data to the high-resolution hydraulic parameter field. I first illustrate the application of this novel data integration approach to a realistic synthetic database consisting of collocated high-resolution borehole measurements of the hydraulic and electrical conductivities and spatially exhaustive, low-resolution electrical conductivity estimates obtained from electrical resistivity tomography (ERT). The overall viability of this method is tested and verified by performing and comparing flow and transport simulations through the original and simulated hydraulic conductivity fields.
The corresponding results indicate that the proposed data integration procedure does indeed yield faithful estimates of the larger-scale hydraulic conductivity structure and reliable predictions of the transport characteristics over medium- to regional-scale distances. The approach is then applied to a corresponding field scenario consisting of collocated high-resolution measurements of the electrical conductivity, as measured using a cone penetrometer testing (CPT) system, and the hydraulic conductivity, as estimated from electromagnetic flowmeter and slug test measurements, in combination with spatially exhaustive low-resolution electrical conductivity estimates obtained from surface-based electrical resistivity tomography (ERT). The corresponding results indicate that the newly developed data integration approach is indeed capable of adequately capturing both the small-scale heterogeneity and the larger-scale trend of the prevailing hydraulic conductivity field. The results also indicate that this novel data integration approach is remarkably flexible and robust and hence can be expected to be applicable to a wide range of geophysical and hydrological data at all scale ranges. In the second part of my thesis, I evaluate in detail the viability of sequential geostatistical resampling as a proposal mechanism for Markov chain Monte Carlo (MCMC) methods applied to high-dimensional geophysical and hydrological inverse problems, in order to allow for a more accurate and realistic quantification of the uncertainty associated with the inferred models. Focusing on a series of pertinent crosshole georadar tomographic examples, I investigate two classes of geostatistical resampling strategies with regard to their ability to efficiently and accurately generate independent realizations from the Bayesian posterior distribution.
The corresponding results indicate that, despite its popularity, sequential resampling is rather inefficient at drawing independent posterior samples for realistic synthetic case studies, notably for the practically common and important scenario of pronounced spatial correlation between model parameters. To address this issue, I have developed a new gradual-deformation-based perturbation approach, which is flexible with regard to the number of model parameters as well as the perturbation strength. Compared to sequential resampling, this newly proposed approach proved highly effective in decreasing the number of iterations required to draw independent samples from the Bayesian posterior distribution.
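The core of a gradual-deformation proposal is the combination m' = m·cos θ + z·sin θ of the current model m with an independent realization z. The sketch below assumes that the field is represented by whitened (independent standard-normal) coefficients, for example before a Cholesky or FFT-based colouring step; the function name and the optional `indices` argument are hypothetical illustrations of the flexibility in the number of perturbed parameters and in the perturbation strength θ.

```python
import math
import random


def gradual_deformation(current, theta, rng=None, indices=None):
    """Gradual-deformation proposal m' = m*cos(theta) + z*sin(theta).

    `current` holds whitened (independent standard-normal) model coefficients;
    `z` is a fresh standard-normal draw. Because cos^2 + sin^2 = 1, the proposal
    keeps the N(0, 1) prior for any theta, which sets the perturbation strength
    (theta = 0 returns the current model). `indices` optionally restricts the
    perturbation to a subset of the parameters.
    """
    rng = rng if rng is not None else random.Random()
    c, s = math.cos(theta), math.sin(theta)
    proposal = list(current)
    for i in (range(len(current)) if indices is None else indices):
        proposal[i] = c * current[i] + s * rng.gauss(0.0, 1.0)
    return proposal
```

Because this proposal leaves the Gaussian prior invariant (it has the same form as a preconditioned Crank-Nicolson step), a Metropolis-Hastings acceptance test for it only needs the likelihood ratio of the proposed and current models; θ can then be tuned to reach a reasonable acceptance rate.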
Abstract:
Significant progress has been made with regard to the quantitative integration of geophysical and hydrological data at the local scale. However, extending the corresponding approaches to the scale of a field site represents a major, and as yet largely unresolved, challenge. To address this problem, we have developed a downscaling procedure based on a non-linear Bayesian sequential simulation approach. The main objective of this algorithm is to estimate the value of the sparsely sampled hydraulic conductivity at non-sampled locations based on its relation to the electrical conductivity logged at collocated wells and to surface resistivity measurements, which are available throughout the studied site. The in situ relationship between the hydraulic and electrical conductivities is described through a non-parametric multivariate kernel density function. A stochastic integration of low-resolution, large-scale electrical resistivity tomography (ERT) data with high-resolution, local-scale downhole measurements of the hydraulic and electrical conductivities is then applied. The overall viability of this downscaling approach is tested and validated by comparing flow and transport simulations through the original and the upscaled hydraulic conductivity fields. Our results indicate that the proposed procedure yields remarkably faithful estimates of the regional-scale hydraulic conductivity structure and correspondingly reliable predictions of the transport characteristics over relatively long distances.
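To show how a kernel density description of the conductivity relationship can drive stochastic simulation, the sketch below draws hydraulic conductivity K conditioned on an electrical conductivity value σ*. It assumes a product Gaussian kernel over collocated (σ, K) pairs (for example log10 values) with hand-picked bandwidths `h_sigma` and `h_k`. It covers only the petrophysical conditioning, not the sequential simulation's spatial conditioning or the ERT downscaling, and all names and numbers are hypothetical.

```python
import math
import random


def conditional_weights(pairs, sigma_star, h_sigma):
    """Normalised kernel weights of each collocated (sigma, K) pair at sigma_star."""
    logs = [-0.5 * ((sigma_star - s) / h_sigma) ** 2 for s, _ in pairs]
    top = max(logs)  # subtract the maximum to avoid underflow far from the data
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [wi / total for wi in w]


def conditional_mean(pairs, sigma_star, h_sigma):
    """E[K | sigma = sigma_star] under the product-kernel KDE."""
    w = conditional_weights(pairs, sigma_star, h_sigma)
    return sum(wi * k for wi, (_, k) in zip(w, pairs))


def sample_conditional(pairs, sigma_star, h_sigma, h_k, n=1, rng=None):
    """Draw n values of K | sigma = sigma_star from the product-Gaussian-kernel KDE."""
    rng = rng if rng is not None else random.Random()
    w = conditional_weights(pairs, sigma_star, h_sigma)
    centres = rng.choices([k for _, k in pairs], weights=w, k=n)  # pick kernels by weight
    return [k + rng.gauss(0.0, h_k) for k in centres]  # then sample within each kernel
```

With a product kernel, the conditional density of K is itself a Gaussian mixture whose component weights depend only on how close each logged σ is to σ*. This is why sampling reduces to picking a kernel and adding noise, with no explicit parametric σ-K law needed.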
Abstract:
Despite the central role of quantitative PCR (qPCR) in the quantification of mRNA transcripts, most analyses of qPCR data are still delegated to the software that comes with the qPCR apparatus. This is especially true for the handling of the fluorescence baseline. This article shows that baseline estimation errors are directly reflected in the observed PCR efficiency values and are thus propagated exponentially in the estimated starting concentrations as well as 'fold-difference' results. Because of the unknown origin and kinetics of the baseline fluorescence, the fluorescence values monitored in the initial cycles of the PCR reaction cannot be used to estimate a useful baseline value. An algorithm that estimates the baseline by reconstructing the log-linear phase downward from the early plateau phase of the PCR reaction was developed and shown to lead to very reproducible PCR efficiency values. PCR efficiency values were determined per sample by fitting a regression line to a subset of data points in the log-linear phase. The variability, as well as the bias, in qPCR results was significantly reduced when the mean of these PCR efficiencies per amplicon was used in the calculation of an estimate of the starting concentration per sample.
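The efficiency and starting-concentration calculations described above can be sketched as follows. A regression line through log10(fluorescence) against cycle number in a log-linear window gives slope b and a per-cycle amplification factor E = 10^b, and the starting concentration follows as N0 = Nq / E^Cq. The sketch assumes the fluorescence is already baseline-corrected and that the window has been chosen; the paper's baseline reconstruction from the early plateau is not implemented, and the function names are hypothetical.

```python
import math


def pcr_efficiency(cycles, fluorescence):
    """Per-sample amplification factor from a log-linear-phase window.

    Fits log10(F) = a + b * cycle by least squares and returns 10**b
    (2.0 means perfect doubling per cycle). Assumes F is already
    baseline-corrected; any baseline error biases this estimate.
    """
    ys = [math.log10(f) for f in fluorescence]
    n = len(cycles)
    xbar, ybar = sum(cycles) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(cycles, ys))
    sxx = sum((x - xbar) ** 2 for x in cycles)
    return 10 ** (sxy / sxx)


def starting_concentration(threshold, cq, efficiency):
    """N0 = N_q / E**Cq: back-extrapolate from the quantification threshold."""
    return threshold / efficiency ** cq
```

Following the abstract, the per-sample efficiencies of one amplicon would be averaged, and that mean passed to `starting_concentration` for every sample. Leaving a constant offset of 0.5 fluorescence units in otherwise ideal data with E = 1.9 pulls the fitted efficiency below 1.9, a small-scale version of how baseline errors propagate into efficiency values.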
Abstract:
SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be assigned to multiple modules, and the level of required coherence can be varied, resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: the isa2 package includes an optimized implementation of the ISA, and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch
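The alternating gene/condition iteration behind the ISA can be illustrated with a deliberately simplified variant. This sketch uses binary memberships instead of weighted scores, keeps only items scoring above mean + t·sd (only up-regulated modules), and stops at a fixed point. It is not the isa2 implementation, and the function names and thresholds are hypothetical.

```python
import statistics


def _standardize(vec):
    """z-score a list; constant vectors map to zeros."""
    mu, sd = statistics.fmean(vec), statistics.pstdev(vec)
    return [0.0 if sd == 0 else (v - mu) / sd for v in vec]


def _threshold(scores, t):
    """Binary membership: keep items scoring above mean + t * sd."""
    mu, sd = statistics.fmean(scores), statistics.pstdev(scores)
    return [1 if s > mu + t * sd else 0 for s in scores]


def isa(data, seed_genes, t_cond=0.5, t_gene=0.5, max_iter=50):
    """Simplified ISA on a genes x conditions matrix given as a list of rows."""
    n_genes, n_cond = len(data), len(data[0])
    rows_z = [_standardize(row) for row in data]  # each gene across conditions
    # each condition standardized across genes, transposed back to gene rows
    cols_z = list(zip(*[_standardize(list(col)) for col in zip(*data)]))
    genes = list(seed_genes)
    for _ in range(max_iter):
        cond_scores = [sum(rows_z[g][c] * genes[g] for g in range(n_genes)) for c in range(n_cond)]
        conds = _threshold(cond_scores, t_cond)
        gene_scores = [sum(cols_z[g][c] * conds[c] for c in range(n_cond)) for g in range(n_genes)]
        new_genes = _threshold(gene_scores, t_gene)
        if new_genes == genes:
            break  # fixed point reached: genes and conditions form a module
        genes = new_genes
    return [g for g in range(n_genes) if genes[g]], [c for c in range(n_cond) if conds[c]]
```

Starting from a single seed gene inside a planted 3 x 3 block of a 6 x 6 matrix, the iteration grows the module to the full block after one round and then stays fixed. Raising `t_cond` and `t_gene` demands tighter coherence, which corresponds to the different "resolutions" mentioned above.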
Abstract:
The geometry and connectivity of fractures exert a strong influence on the flow and transport properties of fracture networks. We present a novel approach to stochastically generate three-dimensional discrete networks of connected fractures that are conditioned to hydrological and geophysical data. A hierarchical rejection sampling algorithm is used to draw realizations from the posterior probability density function at different conditioning levels. The method is applied to a well-studied granitic formation using data acquired within two boreholes located 6 m apart. The prior models include 27 fractures with their geometry (position and orientation) bounded by information derived from single-hole ground-penetrating radar (GPR) data acquired during saline tracer tests and optical televiewer logs. Eleven cross-hole hydraulic connections between fractures in neighboring boreholes and the order in which the tracer arrives at different fractures are used for conditioning. Furthermore, the networks are conditioned to the observed relative hydraulic importance of the different hydraulic connections by numerically simulating the flow response. Among the conditioning data considered, constraints on the relative flow contributions were the most effective in determining the variability among the network realizations. Nevertheless, we find that the posterior model space is strongly determined by the imposed prior bounds. Strong prior bounds were derived from GPR measurements and helped to make the approach computationally feasible. We analyze a set of 230 posterior realizations that reproduce all data given their uncertainties, assuming the same uniform transmissivity in all fractures. The posterior models provide valuable statistics on length scales and density of connected fractures, as well as their connectivity.
In an additional analysis, effective transmissivity estimates of the posterior realizations indicate a strong influence of the discrete fracture network (DFN) structure, in that it induces large variations of equivalent transmissivities between realizations. The transmissivity estimates agree well with previous estimates at the site based on pumping, flowmeter and temperature data.
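Hierarchical rejection sampling can be sketched generically: draw a model from the prior and test the conditioning levels in order from cheapest to most expensive, discarding the draw at the first failure so that costly checks (such as flow simulations) run only on survivors. The prior sampler and the predicates are hypothetical placeholders for the GPR-bounded fracture geometry, the hydraulic connections and tracer arrival order, and the simulated relative flow contributions.

```python
import random


def hierarchical_rejection(sample_prior, levels, n_accept, rng=None, max_draws=100_000):
    """Rejection sampling with conditioning levels ordered from cheap to expensive.

    `levels` is a list of predicates; a draw is discarded at the first level it
    fails, so costly checks only run on draws that passed the earlier ones.
    Returns the accepted draws and how many draws reached each level.
    """
    rng = rng if rng is not None else random.Random()
    accepted, reached = [], [0] * len(levels)
    for _ in range(max_draws):
        x = sample_prior(rng)
        for lvl, check in enumerate(levels):
            reached[lvl] += 1
            if not check(x):
                break
        else:  # passed every conditioning level
            accepted.append(x)
            if len(accepted) == n_accept:
                break
    return accepted, reached
```

The `reached` counts make the cost argument visible: if the last level is a flow simulation, it is run only `reached[-1]` times instead of once per prior draw. Tight prior bounds raise the pass rate of every level, consistent with the abstract's remark that they made the approach computationally feasible.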
Abstract:
The multiscale finite volume (MsFV) method has been developed to efficiently solve large heterogeneous problems (elliptic or parabolic); it is usually employed for pressure equations and delivers conservative flux fields to be used in transport problems. The method essentially relies on the hypothesis that the (fine-scale) problem can be reasonably described by a set of local solutions coupled by a conservative global (coarse-scale) problem. In most cases, the boundary conditions assigned to the local problems are satisfactory and the approximate conservative fluxes provided by the method are accurate. In numerically challenging cases, however, a more accurate localization is required to obtain a good approximation of the fine-scale solution. In this paper we develop a procedure to iteratively improve the boundary conditions of the local problems. The algorithm relies on the data structure of the MsFV method and employs a Krylov-subspace projection method to obtain an unconditionally stable scheme and accelerate convergence. Two variants are considered: in the first, only the MsFV operator is used; in the second, the MsFV operator is combined in a two-step method with an operator derived from the problem solved to construct the conservative flux field. The resulting iterative MsFV algorithms allow arbitrary reduction of the solution error without compromising the construction of a conservative flux field, which is guaranteed at any iteration. Since it converges to the exact solution, the method can be regarded as a linear solver. In this context, the schemes proposed here can be viewed as preconditioned versions of the Generalized Minimal Residual method (GMRES), with the distinctive property that the residual on the coarse grid is zero at every iteration (so that conservative fluxes can always be obtained).
Abstract:
Given the very large amount of data obtained every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we are interested in a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure combining a logistic regression with an Expectation-Maximization algorithm. Results show that, despite the lack of data, this procedure can perform better than standard matching procedures.
Abstract:
Retinoblastoma (Rb) is a tumor arising from the retinal progenitor cells of the photoreceptors. It is the most frequent malignant intraocular tumor in children, with an incidence estimated at between 1 in 15,000 and 1 in 20,000 births. The vast majority of children with Rb are diagnosed before the age of 4, which corresponds to the time needed for photoreceptor differentiation and maturation, and thus for the disappearance of the Rb cell of origin. Patient survival, eye preservation and visual prognosis remain excellent provided that treatment is not delayed. In its non-hereditary form (60%), Rb is always unilateral and sporadic. Hereditary Rb (40%), transmitted as an autosomal dominant trait, occurs in all forms, familial (10%) or sporadic (30%), with either unilateral or bilateral involvement. Most causal mutations are unique and randomly distributed over the entire RB1 gene, with no predisposing region. Detecting these mutations is costly and time-consuming, and the detection rate is relatively low, especially in sporadic unilateral Rb. To identify patients at real risk of developing Rb, and to reduce the number of examinations under anesthesia required to screen at-risk individuals, we developed a sensitive, rapid, efficient and inexpensive strategy based on intragenic haplotype analysis. This algorithm takes into account (a) intratumoral loss of heterozygosity of the RB1 gene, (b) the preferential paternal origin of new germline mutations, and (c) an a priori risk derived from Vogel's empirical data. Over the period from January 1994 to December 2006, we compared the occurrence of new Rb cases among the siblings and offspring of affected patients with the number of new cases expected as calculated by our algorithm. A total of 134 families were studied.
Molecular analysis was performed in 570 individuals, including 99 patients younger than 4 years and therefore at risk of developing Rb. In this cohort, we observed one new case of Rb, whereas the cumulative a posteriori risks calculated by our algorithm predicted 1.77 new cases. In this study, we were thus able to validate our algorithm for predicting Rb recurrence in first-degree relatives of affected patients. This tool should greatly facilitate genetic counseling and the follow-up of patients at risk of developing Rb, especially when direct sequencing of the RB1 gene is unavailable or has remained uninformative. - Purpose: Most RB1 mutations are unique and distributed throughout the RB1 gene. Their detection can be time-consuming, and the yield is especially low in conservatively treated patients with sporadic unilateral retinoblastoma (Rb). In order to identify patients with a true risk of developing Rb, and to reduce the number of unnecessary examinations under anesthesia in all other cases, we developed a universal, sensitive, efficient and cost-effective strategy based on intragenic haplotype analysis. Methods: This algorithm allows the calculation of the a posteriori risk of developing Rb and takes into account (a) RB1 loss of heterozygosity in tumors, (b) preferential paternal origin of new germline mutations, (c) a priori risk derived from empirical data by Vogel, and (d) disease penetrance of 90% in most cases. We report the occurrence of Rb in first-degree relatives of patients with sporadic Rb who visited the Jules Gonin Eye Hospital, Lausanne, Switzerland, from January 1994 to December 2006, compared to the number of new Rb cases expected using our algorithm. Results: A total of 134 families with sporadic Rb were enrolled; testing was performed in 570 individuals, and 99 patients younger than 4 years old were identified. We observed one new case of Rb.
Using our algorithm, the cumulative total a posteriori risk of recurrence was 1.77. Conclusions: This is the first time that linkage analysis has been validated to monitor the risk of recurrence in sporadic Rb. This should be a useful tool in genetic counseling, especially when direct RB1 mutation screening gives a negative result or is unavailable.
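The arithmetic behind comparing observed with expected cases can be sketched as follows. Each at-risk relative gets a posterior carrier probability by Bayes' rule and a recurrence risk equal to that probability times the 90% penetrance, and the expected number of new cases is the sum of the individual risks (1.77 in the study). The likelihood values fed to `posterior_carrier` are hypothetical, and treating the observed count as Poisson with that mean is an approximation assumed here, not a step reported in the abstract.

```python
import math


def posterior_carrier(prior, lik_carrier, lik_noncarrier):
    """Bayes' rule: P(carrier | haplotype data)."""
    num = prior * lik_carrier
    return num / (num + (1.0 - prior) * lik_noncarrier)


def recurrence_risk(p_carrier, penetrance=0.9):
    """Risk of developing Rb = P(carrier) * penetrance (90% per the abstract)."""
    return p_carrier * penetrance


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), used to compare observed vs expected cases."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))
```

Under that approximation, `poisson_cdf(1, 1.77)` is about 0.47, so observing one case when 1.77 were expected is well within chance variation.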
Abstract:
The 2008 Data Fusion Contest organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee deals with the classification of high-resolution hyperspectral data from an urban area. Unlike previous editions of the contest, the goal was not only to identify the best algorithm but also to foster a collaborative effort: the decision fusion of the best individual algorithms aimed to further improve classification performance, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods such as neural networks and support vector machines.
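Decision fusion and contribution ranking can be sketched with per-pixel (optionally weighted) majority voting plus a leave-one-out score. The abstract does not say which fusion rule or contribution measure the contest used; both choices here are assumptions, and the class labels and function names are hypothetical.

```python
from collections import Counter


def majority_vote(predictions, weights=None):
    """Fuse per-pixel class labels from several classifiers by (weighted) voting.

    predictions: one equal-length list of labels per classifier.
    Ties are broken in favour of the label proposed earliest in classifier order.
    """
    weights = weights if weights is not None else [1.0] * len(predictions)
    fused = []
    for labels in zip(*predictions):
        score = Counter()
        for label, w in zip(labels, weights):
            score[label] += w
        fused.append(max(score, key=score.get))
    return fused


def accuracy(pred, truth):
    """Fraction of pixels whose fused label matches the reference label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def leave_one_out_contribution(predictions, truth):
    """Accuracy lost by the fused map when each classifier is removed in turn."""
    full = accuracy(majority_vote(predictions), truth)
    return [
        full - accuracy(majority_vote(predictions[:i] + predictions[i + 1:]), truth)
        for i in range(len(predictions))
    ]
```

A leave-one-out score rewards classifiers that correct the others' errors rather than those that merely agree with the majority. Because tie-breaking here depends on classifier order, the ranking can shift when only a few classifiers remain.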
Abstract:
Objective: The Agency for Healthcare Research and Quality (AHRQ) developed Patient Safety Indicators (PSIs) for use with ICD-9-CM data. Many countries have adopted ICD-10 for coding hospital diagnoses. We conducted this study to develop an internationally harmonized ICD-10 coding algorithm for the AHRQ PSIs. Methods: The AHRQ PSI Version 2.1 has been translated into ICD-10-AM (Australian Modification), and PSI Version 3.0a has been independently translated into ICD-10-GM (German Modification). We converted these two country-specific coding algorithms into ICD-10-WHO (World Health Organization version) and combined them to form one master list. Members of an international expert panel (including physicians, professional medical coders, disease classification specialists, health services researchers, epidemiologists, and users of the PSI) independently evaluated this master list and rated each code as "include," "exclude," or "uncertain," following the AHRQ PSI definitions. After summarizing the independent rating results, we held a face-to-face meeting to discuss codes for which there was no unanimous consensus, as well as newly proposed codes. A modified Delphi method was employed to generate a final ICD-10-WHO coding list. Results: Of 20 PSIs, 15 that were based mainly on diagnosis codes were selected for translation. At the meeting, panelists discussed 794 codes for which consensus had not been achieved and 2,541 additional codes that individual panelists had proposed for consideration prior to the meeting. Three documents were generated: a PSI ICD-10-WHO version coding list, a list of issues for consideration on certain AHRQ PSIs and ICD-9-CM codes, and a recommendation to WHO to improve the specification of some disease classifications. Conclusion: An ICD-10-WHO PSI coding list has been developed and structured in a manner similar to the AHRQ manual.
Although face validity of the list has been ensured through a rigorous expert panel assessment, its true validity and applicability should be assessed internationally.
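The triage step described in the Methods, where codes without unanimous consensus go to the face-to-face meeting, can be sketched as a simple aggregation of panel ratings. The data layout, the placeholder code identifiers and the rule that a unanimous "uncertain" also goes to discussion are illustrative assumptions, not the panel's documented procedure.

```python
def summarize_ratings(ratings):
    """Split codes into unanimous decisions and codes needing panel discussion.

    ratings: dict mapping a code to the list of panelists' ratings,
    each "include", "exclude" or "uncertain".
    """
    decided, to_discuss = {}, []
    for code, votes in ratings.items():
        distinct = set(votes)
        if len(distinct) == 1 and distinct != {"uncertain"}:
            decided[code] = votes[0]  # unanimous include or exclude
        else:
            to_discuss.append(code)  # no unanimous consensus: goes to the meeting
    return decided, to_discuss
```

In a modified Delphi process, the `to_discuss` list would be the agenda for the next round, and codes newly proposed by panelists would be appended to it before re-rating.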