854 results for "data gathering algorithm"


Relevance: 30.00%

Abstract:

This work is concerned with the development and application of novel unsupervised learning methods, having in mind two target applications: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms, and applied to the problem of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
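The scalable clustering idea described above, a parametric model updated by stochastic gradient descent that sidesteps the out-of-sample problem, can be illustrated with a minimal sketch. The hypothetical `sgd_kmeans` below uses the simplest possible functional model, a centroid table, rather than the neural network of the abstract:

```python
import numpy as np

def sgd_kmeans(X, k, lr=0.05, epochs=20, seed=0):
    """Online k-means trained by stochastic gradient descent: the
    'functional model' here is just a table of centroids, updated one
    sample at a time. Because the model is parametric, it scales to
    very large datasets and assigns unseen samples directly (no
    out-of-sample problem)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = int(np.argmin(((centroids - X[i]) ** 2).sum(axis=1)))
            # SGD step on the quantization error of the winning centroid
            centroids[j] += lr * (X[i] - centroids[j])
    labels = np.array([int(np.argmin(((centroids - x) ** 2).sum(axis=1)))
                       for x in X])
    return centroids, labels
```

Training touches one sample per update, so memory and per-step cost do not depend on the dataset size.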

Relevance: 30.00%

Abstract:

For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree, and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on the one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria; on the other hand, the average consensus, split fit, and most similar supertree methods showed poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data.
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
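Matrix representation with parsimony (MRP), the best-performing family above, begins by recoding each input tree's clades as binary characters (the Baum-Ragan coding); the parsimony search itself is then delegated to a dedicated program. A minimal sketch of the encoding step, using a hypothetical input format:

```python
def mrp_matrix(trees, taxa):
    """Baum-Ragan coding: each clade of each input tree becomes one
    binary character. '1' = taxon inside the clade, '0' = in that tree
    but outside the clade, '?' = taxon absent from that source tree.
    `trees` is a list of (tree_taxa, clades) pairs, each clade a
    frozenset of taxa (a hypothetical input format for this sketch)."""
    columns = [(tree_taxa, clade)
               for tree_taxa, clades in trees
               for clade in clades]
    matrix = {}
    for taxon in taxa:
        row = []
        for tree_taxa, clade in columns:
            if taxon not in tree_taxa:
                row.append("?")
            else:
                row.append("1" if taxon in clade else "0")
        matrix[taxon] = "".join(row)
    return matrix
```

Taxa missing from a source tree receive '?', which is what lets MRP combine input trees with partially overlapping taxon sets.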

Relevance: 30.00%

Abstract:

The primary goal of this project is to demonstrate the accuracy and utility of a freezing drizzle algorithm that can be implemented on roadway environmental sensing systems (ESSs). The problems related to the occurrence of freezing precipitation range from simple traffic delays to major accidents involving fatalities. Freezing drizzle can also have economic impacts on communities through lost work hours, vehicular damage, and downed power lines. Transportation agencies have means to perform preventive and reactive treatments of roadways, but freezing drizzle can be difficult to forecast accurately or even to detect, because weather radar and surface observation networks poorly observe these conditions. The detection of freezing precipitation is problematic and requires special instrumentation and analysis. Federal Aviation Administration (FAA) work on aircraft anti-icing and deicing technologies has led to the development of a freezing drizzle algorithm that utilizes air temperature data and a specialized sensor capable of detecting ice accretion. However, at present, roadway ESSs are not capable of reporting freezing drizzle. This study investigates the use of the methods developed for the FAA and the National Weather Service (NWS) within a roadway environment to detect the occurrence of freezing drizzle using a combination of icing detection equipment and available ESS sensors. The work performed in this study incorporated the algorithm initially developed, and subsequently modified, for FAA aircraft-icing applications. The freezing drizzle algorithm developed for the FAA was applied using data from standard roadway ESSs.
The work performed in this study lays the foundation for addressing the central question of interest to winter maintenance professionals as to whether it is possible to use roadside freezing precipitation detection (e.g., icing detection) sensors to determine the occurrence of pavement icing during freezing precipitation events and the rates at which this occurs.
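As a rough illustration of the kind of rule such an algorithm combines, the hypothetical function below flags freezing drizzle when the air is at or below freezing, the icing sensor reports active accretion, and the precipitation rate is drizzle-light. The threshold values are invented for this sketch, not taken from the FAA/NWS algorithm:

```python
def detect_freezing_drizzle(air_temp_c, ice_accretion_mm_hr,
                            precip_rate_mm_hr, max_drizzle_rate=0.5):
    """Flag freezing drizzle when the air is at or below freezing, the
    icing sensor reports active accretion, and the precipitation rate
    is light (drizzle-like). All thresholds are illustrative."""
    freezing = air_temp_c <= 0.0
    accreting = ice_accretion_mm_hr > 0.0
    drizzle_light = 0.0 <= precip_rate_mm_hr <= max_drizzle_rate
    return freezing and accreting and drizzle_light
```

The interesting case is the third condition: accretion with a *heavier* measured precipitation rate would point to freezing rain rather than drizzle.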

Relevance: 30.00%

Abstract:

The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.
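The role of preservation in the model above can be illustrated with a toy piece of it: if fossils accumulate along a lineage's (unobserved) lifespan as a Poisson process with rate q, the likelihood of the observed fossil counts is a product of Poisson terms, and constant q has a closed-form maximum-likelihood estimate. A sketch under that constant-rate assumption (the actual model lets rates vary through time and samples everything jointly in a Bayesian MCMC):

```python
import math

def preservation_loglik(q, lifespans, fossil_counts):
    """Log-likelihood of a constant Poisson preservation rate q: a
    lineage alive for duration d yields k ~ Poisson(q * d) fossils."""
    ll = 0.0
    for d, k in zip(lifespans, fossil_counts):
        lam = q * d
        ll += k * math.log(lam) - lam - math.lgamma(k + 1)
    return ll

def preservation_mle(lifespans, fossil_counts):
    """Closed-form MLE: total fossils over total lifespan."""
    return sum(fossil_counts) / sum(lifespans)
```

In the full framework the lifespans themselves are unknown, which is exactly why preservation must be modeled: low q makes long unobserved extensions of a lineage's range plausible.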

Relevance: 30.00%

Abstract:

Cross-hole radar tomography is a useful tool for mapping shallow subsurface electrical properties, namely dielectric permittivity and electrical conductivity. Common practice is to invert cross-hole radar data with ray-based tomographic algorithms using first-arrival traveltimes and first-cycle amplitudes. However, the resolution of conventional ray-based inversion schemes for cross-hole ground-penetrating radar (GPR) is limited because only a fraction of the information contained in the radar data is used. The resolution can be improved significantly by using a full-waveform inversion that considers the entire waveform, or significant parts thereof. A recently developed 2D time-domain vectorial full-waveform cross-hole radar inversion code has been modified in the present study to allow optimized acquisition setups that reduce the acquisition time and computational costs significantly. This is achieved by minimizing the number of transmitter points and maximizing the number of receiver positions. The improved algorithm was employed to invert cross-hole GPR data acquired within a gravel aquifer (4-10 m depth) in the Thur valley, Switzerland. The simulated traces of the final model obtained by the full-waveform inversion fit the observed traces very well in the lower part of the section and reasonably well in the upper part. Compared to the ray-based inversion, the full-waveform inversion yields significantly higher-resolution images. Borehole logs were acquired on either side, at 2.5 m distance from the cross-hole plane. There is a good correspondence between the conductivity tomograms and the natural gamma logs at the boundary of the gravel layer and the underlying lacustrine clay deposits. Using existing petrophysical models, the inversion results and neutron-neutron logs are converted to porosity. Without any additional calibration, the values obtained from the converted neutron-neutron logs and the permittivity results are very close, and similar vertical variations can be observed. In both cases, the full-waveform inversion provides additional information about the subsurface. Due to the presence of the water table and associated refracted/reflected waves, the upper traces are not well fitted, and the upper 2 m of the permittivity and conductivity tomograms are not reliably reconstructed because the unsaturated zone is not incorporated into the inversion domain.
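The contrast drawn above between ray-based inversion (first arrivals only) and full-waveform inversion (the entire trace) can be caricatured in one dimension: fit a single velocity-like parameter by gradient descent on the least-squares misfit of the whole waveform. Everything here (the sinusoidal forward model, the step size) is invented for illustration:

```python
import numpy as np

def invert_velocity(observed, t, v0, lr=0.05, iters=200, h=1e-4):
    """Toy 1-D 'full-waveform' inversion: fit the entire synthetic
    trace (not just a first-arrival time) to the observed trace by
    gradient descent on the least-squares waveform misfit."""
    def misfit(v):
        # invented forward model: a trace sin(v * t)
        return float(np.sum((np.sin(v * t) - observed) ** 2))
    v = v0
    for _ in range(iters):
        # finite-difference gradient of the misfit
        grad = (misfit(v + h) - misfit(v - h)) / (2 * h)
        v -= lr * grad
    return v
```

The real 2D vectorial inversion works the same way in spirit, but with millions of unknowns, a finite-difference solution of Maxwell's equations as the forward model, and adjoint-based gradients.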

Relevance: 30.00%

Abstract:

Fetal MRI reconstruction aims at finding a high-resolution image given a small set of low-resolution images. It is usually modeled as an inverse problem where the regularization term plays a central role in the reconstruction quality. The literature has considered several regularization terms such as Dirichlet/Laplacian energy, Total Variation (TV)-based energies and, more recently, non-local means. Although TV energies are quite attractive because of their ability to preserve edges, standard explicit steepest-gradient techniques have been applied to optimize fetal-based TV energies. The main contribution of this work lies in the introduction of a well-posed TV algorithm from the point of view of convex optimization. Specifically, our proposed TV optimization algorithm for fetal reconstruction is optimal with respect to the asymptotic and iterative convergence speeds, O(1/n²) and O(1/√ε), improving on existing techniques. We apply our algorithm to (1) clinical newborn data, considered as ground truth, and (2) clinical fetal acquisitions. Our algorithm compares favorably with the literature in terms of speed and accuracy.
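The O(1/n²) rate cited above is characteristic of accelerated first-order schemes. Below is a sketch of Nesterov/FISTA-style acceleration applied to a smoothed 1-D total-variation denoising energy; the smoothing parameter, step size, and the 1-D setting are all simplifications of the paper's reconstruction problem:

```python
import numpy as np

def fista_tv_denoise(y, lam=0.2, eps=1e-3, iters=300):
    """Nesterov/FISTA-style accelerated gradient descent on a smoothed
    1-D TV denoising energy:
        E(x) = 0.5 * ||x - y||^2 + lam * sum(sqrt(diff(x)^2 + eps)).
    The momentum (z, t) sequence is what buys the O(1/n^2) rate."""
    x = y.copy()
    z = y.copy()
    t = 1.0
    L = 1.0 + 4.0 * lam / np.sqrt(eps)  # Lipschitz bound of grad E
    for _ in range(iters):
        d = np.diff(z)
        tv_grad = d / np.sqrt(d * d + eps)
        g = z - y
        g[:-1] -= lam * tv_grad   # derivative w.r.t. the lower index
        g[1:] += lam * tv_grad    # derivative w.r.t. the upper index
        x_new = z - g / L
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```

Plain gradient descent on the same energy achieves only O(1/n); the extrapolation step costs almost nothing per iteration yet squares the rate.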

Relevance: 30.00%

Abstract:

The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow a deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/), which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via either phenotype or genotype, and in a variety of ways, including graphical display, statistical analysis, and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome are annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data are annotated using combinations of terms from biological ontologies.
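The annotation pipeline's core step, automatically flagging mutants that differ statistically from baseline, might be sketched with a simple Welch t statistic. The threshold and the test itself are illustrative assumptions, not EuroPhenome's actual procedure:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a)
                                           + variance(b) / len(b))

def flag_mutant(baseline, mutant, threshold=2.0):
    """Flag a mutant line as 'statistically different' from baseline
    when |t| exceeds a rough cutoff (illustrative; a real pipeline
    would use proper p-values and multiple-testing control)."""
    return abs(welch_t(baseline, mutant)) > threshold
```

A flagged result would then be paired with the ontology term for the specific phenotyping test that produced the measurement.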

Relevance: 30.00%

Abstract:

Pyogenic liver abscess is a severe condition and a therapeutic challenge. Treatment failure may be due to an unrecognized ingested foreign body that migrated from the gastrointestinal tract. There has recently been a marked increase in the number of reported cases of this condition, but initial misdiagnosis as cryptogenic liver abscess still occurs in the majority of cases. We conducted the current study to characterize this entity and provide a diagnostic strategy applicable worldwide. To this end, data were collected from our case and from a systematic review that identified 59 well-described cases. Another systematic review identified series of cryptogenic and Asian Klebsiella liver abscess; these data were pooled and compared with the data from the cases of migrated foreign body liver abscess. The review points out the low diagnostic accuracy of history taking, modern imaging, and even surgical exploration. A fistula found through imaging procedures or endoscopy warrants surgical exploration. Findings suggestive of foreign body migration are symptoms of gastrointestinal perforation, computed tomography demonstration of a thickened gastrointestinal wall in continuity with the abscess, and adhesions seen during surgery. Treatment failure, left lobe location, unique location (that is, only 1 abscess location within the liver), and absence of underlying conditions also point to the diagnosis, as shown by comparison with the cryptogenic liver abscess series. This study demonstrates that migrated foreign body liver abscess is a specific entity, increasingly reported. It usually is not cured when unrecognized, and diagnosis is mainly delayed. This study provides what we consider the best available evidence for timely diagnosis with worldwide applicability. Increased awareness is required to treat this underestimated condition effectively, and further studies are needed.

Relevance: 30.00%

Abstract:

BACKGROUND: Surveillance of multiple congenital anomalies is considered to be more sensitive for the detection of new teratogens than surveillance of all or isolated congenital anomalies. Current literature proposes the manual review of all cases for classification into isolated or multiple congenital anomalies. METHODS: Multiple anomalies were defined as two or more major congenital anomalies, excluding sequences and syndromes. A computer algorithm for classification of major congenital anomaly cases in the EUROCAT database according to International Classification of Diseases, 10th revision (ICD-10) codes was programmed, further developed, and implemented for 1 year's data (2004) from 25 registries. The group of cases classified with potential multiple congenital anomalies were manually reviewed by three geneticists to reach a final agreement of classification as "multiple congenital anomaly" cases. RESULTS: A total of 17,733 cases with major congenital anomalies were reported, giving an overall prevalence of major congenital anomalies of 2.17%. The computer algorithm classified 10.5% of all cases as "potentially multiple congenital anomalies". After manual review of these cases, 7% were agreed to have true multiple congenital anomalies. Furthermore, the algorithm classified 15% of all cases as having chromosomal anomalies, 2% as monogenic syndromes, and 76% as isolated congenital anomalies. The proportion of multiple anomalies varies by congenital anomaly subgroup, reaching up to 35% of cases with bilateral renal agenesis. CONCLUSIONS: The implementation of the EUROCAT computer algorithm is a feasible, efficient, and transparent way to improve classification of congenital anomalies for surveillance and research.
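The classification flow described (chromosomal anomalies first, then monogenic syndromes, then counting the remaining major anomalies) could be sketched as follows; the code ranges and the tiny syndrome list are illustrative stand-ins for the full EUROCAT flowchart:

```python
def classify_case(codes, syndrome_codes=frozenset({"Q87.4"})):
    """Sketch of the classification flow: chromosomal anomalies take
    precedence, then monogenic syndromes, then remaining major
    (Q-coded) anomalies are counted. The ICD-10 Q90-Q99 block covers
    chromosomal abnormalities; the syndrome set here is a tiny
    illustrative stand-in for the real exclusion lists."""
    if any("Q90" <= c[:3] <= "Q99" for c in codes):
        return "chromosomal"
    if any(c in syndrome_codes for c in codes):
        return "monogenic syndrome"
    major = [c for c in codes if c.startswith("Q")]
    return "potential multiple" if len(major) >= 2 else "isolated"
```

Cases landing in the "potential multiple" bucket are exactly those that the study then sent to manual review by the geneticists.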

Relevance: 30.00%

Abstract:

Geophysical techniques can help to bridge the inherent gap, with regard to spatial resolution and range of coverage, that plagues classical hydrological methods. This has led to the emergence of the new and rapidly growing field of hydrogeophysics. Given the differing sensitivities of various geophysical techniques to hydrologically relevant parameters, and their inherent trade-off between resolution and range, the fundamental usefulness of multi-method hydrogeophysical surveys for reducing uncertainties in data analysis and interpretation is widely accepted. A major challenge arising from such endeavors is the quantitative integration of the resulting vast and diverse database in order to obtain a unified model of the probed subsurface region that is internally consistent with all available data. To address this problem, we have developed a strategy for hydrogeophysical data integration based on Monte-Carlo-type conditional stochastic simulation that we consider to be particularly suitable for local-scale studies characterized by high-resolution and high-quality datasets. Monte-Carlo-based optimization techniques are flexible and versatile, can account for a wide variety of data and constraints of differing resolution and hardness, and thus have the potential of providing, in a geostatistical sense, highly detailed and realistic models of the pertinent target parameter distributions. Compared to more conventional approaches of this kind, our approach provides significant advancements in the way that the larger-scale deterministic information resolved by the hydrogeophysical data can be accounted for, which represents an inherently problematic, and as yet unresolved, aspect of Monte-Carlo-type conditional simulation techniques. We present the results of applying our algorithm to the integration of porosity log and tomographic cross-hole georadar data to generate stochastic realizations of the local-scale porosity structure.
Our procedure is first tested on pertinent synthetic data and then applied to corresponding field data collected at the Boise Hydrogeophysical Research Site near Boise, Idaho, USA.
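A stripped-down version of such a Monte-Carlo-type conditional simulation can be sketched in one dimension: perturb a porosity profile cell by cell, keeping moves that better honor both the conditioning points (the porosity log) and a larger-scale mean constraint standing in for the deterministic information resolved by the georadar. The greedy acceptance rule and all numbers are simplifications:

```python
import random

def conditional_simulate(n, point_data, target_mean, w=10.0,
                         iters=5000, seed=0):
    """Perturb a 1-D porosity profile one cell at a time, keeping moves
    that reduce a two-part objective: mismatch at conditioning points
    (the porosity log) plus weighted mismatch of the profile mean (a
    stand-in for the larger-scale geophysical constraint). Greedy
    acceptance with a decaying step; a full annealing scheme would
    also accept some uphill moves."""
    rng = random.Random(seed)
    x = [0.3] * n  # flat initial porosity guess

    def objective(v):
        miss = sum((v[i] - p) ** 2 for i, p in point_data.items())
        m = sum(v) / n
        return miss + w * (m - target_mean) ** 2

    best = objective(x)
    step = 0.1
    for _ in range(iters):
        i = rng.randrange(n)
        old = x[i]
        x[i] = old + rng.gauss(0.0, step)
        new = objective(x)
        if new <= best:
            best = new          # keep the improving move
        else:
            x[i] = old          # reject
        step *= 0.9995          # slowly shrink the proposal size
    return x, best
```

Running the same procedure with different seeds would yield an ensemble of realizations, all honoring the data, whose spread expresses the remaining uncertainty.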

Relevance: 30.00%

Abstract:

In this article, we propose a mixed management of patients' medical records, sharing responsibilities between the patient and the Medical Practitioner (MP) by making patients responsible for the validation of their administrative information and MPs responsible for the validation of their patients' medical information. Our proposal can be considered a solution to the main problem faced by patients, health practitioners, and the authorities, namely the gathering and updating of administrative and medical data belonging to the patient in order to accurately reconstitute a patient's medical history. This method is based on two processes. The aim of the first process is to provide a patient's administrative data, in order to know where and when the patient received care (name of the health structure or health practitioner, type of care: outpatient or inpatient). The aim of the second process is to provide a patient's medical information and to validate it under the accountability of the Medical Practitioner, with the help of the patient if needed. During these two processes, the patient's privacy is ensured through cryptographic hash functions such as the Secure Hash Algorithm, which allows pseudonymisation of a patient's identity. The proposed Medical Record Search Engines will be able to retrieve and provide, upon a request formulated by the Medical Practitioner, all the available information concerning a patient who has received care in different health structures, without divulging the patient's identity. Our method can lead to improved efficiency of personal medical record management under the mixed responsibilities of the patient and the MP.
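The pseudonymisation step mentioned above can be sketched directly with Python's standard hashlib. The salt-concatenation scheme shown is an illustrative assumption; a real deployment would need a carefully managed secret, for example a keyed hash (HMAC):

```python
import hashlib

def pseudonymise(identity, salt):
    """Replace a patient identity with a stable SHA-256 digest. The
    same (identity, salt) pair always maps to the same pseudonym, so
    records from different health structures can be linked without
    revealing who the patient is. Simple salting is illustrative
    only; a real system would use a managed secret key."""
    return hashlib.sha256((salt + identity).encode("utf-8")).hexdigest()
```

Determinism is the point: each health structure computes the same pseudonym independently, which is what lets the search engines aggregate a patient's records across structures.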

Relevance: 30.00%

Abstract:

Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis, and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm grows linearly with the number of hypothesis tests. Gene-gene interaction studies require memory proportional to the squared number of SNPs, so a genome-wide epistasis search would require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT that requires an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease. Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions in a dataset of 100,000 SNPs typed on 1,000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 2.1 GHz processors. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data.
Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
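The memory argument above can be made concrete. Classic maxT stores a permutations-by-tests matrix of statistics; a streaming variant keeps only one running maximum per permutation, so memory grows with the number of permutations rather than with the number of tests. A sketch with a simple mean-difference statistic (the actual MB-MDR statistic is different):

```python
import random

def group_diff(values, labels):
    """Test statistic: absolute difference of group means."""
    g0 = [v for v, l in zip(values, labels) if l == 0]
    g1 = [v for v, l in zip(values, labels) if l == 1]
    return abs(sum(g0) / len(g0) - sum(g1) / len(g1))

def maxT(features, labels, n_perm=199, seed=0):
    """maxT-adjusted p-values, streaming variant: for each permutation
    only the running maximum over all tests is kept, so memory is
    independent of the number of features under test."""
    rng = random.Random(seed)
    observed = [group_diff(f, labels) for f in features]
    perm_max = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        # one pass over the tests; only the maximum is retained
        perm_max.append(max(group_diff(f, perm) for f in features))
    return [(1 + sum(m >= o for m in perm_max)) / (n_perm + 1)
            for o in observed]
```

Comparing every observed statistic against the permutation maxima is what delivers family-wise error control across all tests simultaneously.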

Relevance: 30.00%

Abstract:

A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization, and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa, and estimates computed for these eight selected statistics are provided for the streamgage.
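The frequency-analysis step described, fitting a Pearson Type III distribution to the logarithms of annual peaks, can be sketched with ordinary method-of-moments estimation and the Wilson-Hilferty frequency factor. This is a simplified stand-in for the expected moments algorithm, which additionally handles low outliers and historical flood information:

```python
import math
from statistics import NormalDist, mean, stdev

def lp3_quantile(peaks, aep):
    """Log-Pearson Type III flood quantile by method of moments: fit
    the mean, standard deviation, and skew of log10(annual peaks),
    then apply the Wilson-Hilferty frequency factor. A simplified
    stand-in for the expected moments algorithm."""
    logs = [math.log10(q) for q in peaks]
    m, s = mean(logs), stdev(logs)
    n = len(logs)
    # bias-corrected sample skew of the log peaks
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    z = NormalDist().inv_cdf(1.0 - aep)  # standard normal quantile
    if abs(g) < 1e-9:
        k = z  # zero skew: reduces to the log-normal case
    else:
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (m + k * s)
```

For example, `lp3_quantile(peaks, 0.01)` estimates the 1-percent annual exceedance-probability discharge (the "100-year flood") for the gaged record.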

Relevance: 30.00%

Abstract:

Traditionally, the Iowa Department of Transportation has used the Iowa Runoff Chart and single-variable regional-regression equations (RREs) from a U.S. Geological Survey report (published in 1987) as the primary methods to estimate annual exceedance-probability discharge (AEPD) for small (20 square miles or less) drainage basins in Iowa. With the publication of new multi- and single-variable RREs by the U.S. Geological Survey (published in 2013), the Iowa Department of Transportation needs to determine which methods of AEPD estimation provide the best accuracy and the least bias for small drainage basins in Iowa. Twenty-five streamgages with drainage areas less than 2 square miles (mi²) and 55 streamgages with drainage areas between 2 and 20 mi² were selected for the comparisons, which used two evaluation metrics. Estimates of AEPDs calculated for the streamgages using the expected moments algorithm/multiple Grubbs-Beck test analysis method were compared to estimates of AEPDs calculated from the 2013 multivariable RREs; the 2013 single-variable RREs; the 1987 single-variable RREs; the TR-55 rainfall-runoff model; and the Iowa Runoff Chart. For the 25 streamgages with drainage areas less than 2 mi², results of the comparisons indicate that the best overall accuracy and the least bias may be achieved by using the TR-55 method for flood regions 1 and 3 (published in 2013) and by using the 1987 single-variable RREs for flood region 2 (published in 2013). For drainage basins with areas between 2 and 20 mi², results of the comparisons indicate that the best overall accuracy and the least bias may be achieved by using the 1987 single-variable RREs for the Southern Iowa Drift Plain landform region and for flood region 3 (published in 2013), by using the 2013 multivariable RREs for the Iowan Surface landform region, and by using the 2013 or 1987 single-variable RREs for flood region 2 (published in 2013). For all other landform or flood regions in Iowa, use of the 2013 single-variable RREs may provide the best overall accuracy and the least bias. An examination was conducted to understand why the 1987 single-variable RREs seem to provide better accuracy and less bias than either of the 2013 multi- or single-variable RREs. A comparison of 1-percent annual exceedance-probability regression lines for hydrologic regions 1-4 from the 1987 single-variable RREs and for flood regions 1-3 from the 2013 single-variable RREs indicates that the 1987 single-variable regional-regression lines generally have steeper slopes and lower discharges when compared to the 2013 single-variable regional-regression lines for corresponding areas of Iowa. The combination of the definition of hydrologic regions, the lower discharges, and the steeper slopes of the regression lines associated with the 1987 single-variable RREs seems to provide better accuracy and less bias when compared to the 2013 multi- or single-variable RREs, particularly for drainage areas less than 2 mi², and also for some drainage areas between 2 and 20 mi². The 2013 multi- and single-variable RREs are considered to provide better accuracy and less bias for larger drainage areas. Results of this study indicate that additional research is needed to address the curvilinear relation between drainage area and AEPDs for areas of Iowa.

Relevance: 30.00%

Abstract:

This paper introduces a mixture model based on the beta distribution, without preestablished means and variances, to analyze a large set of Beauty-Contest data obtained from diverse groups of experiments (Bosch-Domenech et al. 2002). This model gives a better fit of the experimental data, and more precision to the hypothesis that a large proportion of individuals follow a common pattern of reasoning, described as iterated best reply (degenerate), than mixture models based on the normal distribution. The analysis shows that the means of the distributions across the groups of experiments are fairly stable, while the proportions of choices at different levels of reasoning vary across groups.
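The "iterated best reply" pattern of reasoning referred to above has a simple generative form in the p-Beauty Contest: level-0 players pick the midpoint of the interval, and each higher level best-responds by multiplying the previous level's choice by p (p = 2/3 is the value commonly used in these experiments):

```python
def level_k_choices(p=2/3, levels=4, start=50.0):
    """Iterated best reply in the p-Beauty Contest: level-0 plays the
    interval midpoint (50 on [0, 100]); each higher level best-responds
    to the previous one by multiplying its choice by p."""
    choices = [start]
    for _ in range(levels):
        choices.append(choices[-1] * p)
    return choices
```

The mixture components in the paper are centered near such level-k choices; as the depth of reasoning grows without bound, the choices shrink toward 0, the unique Nash equilibrium of the game.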