854 resultados para data gathering algorithm


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) can not be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. ^ This dissertation research proposes methods to analyze longitudinal data (1) that have categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered for empirical studies using methods that include EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated by the Schizophrenia example. The relevance of public health, the strength and limitations, and possible future research were also discussed. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). The quality of the inferences about copy number can be affected by many factors including batch effects, DNA sample preparation, signal processing, and analytical approach. Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP genotyping data. However, these algorithms lack specificity to detect small CNVs due to the high false positive rate when calling CNVs based on the intensity values. Association tests based on detected CNVs therefore lack power even if the CNVs affecting disease risk are common. In this research, by combining an existing Hidden Markov Model (HMM) and the logistic regression model, a new genome-wide logistic regression algorithm was developed to detect CNV associations with diseases. We showed that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than an existing popular algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The effectiveness of the Anisotropic Analytical Algorithm (AAA) implemented in the Eclipse treatment planning system (TPS) was evaluated using theRadiologicalPhysicsCenteranthropomorphic lung phantom using both flattened and flattening-filter-free high energy beams. Radiation treatment plans were developed following the Radiation Therapy Oncology Group and theRadiologicalPhysicsCenterguidelines for lung treatment using Stereotactic Radiation Body Therapy. The tumor was covered such that at least 95% of Planning Target Volume (PTV) received 100% of the prescribed dose while ensuring that normal tissue constraints were followed as well. Calculated doses were exported from the Eclipse TPS and compared with the experimental data as measured using thermoluminescence detectors (TLD) and radiochromic films that were placed inside the phantom. The results demonstrate that the AAA superposition-convolution algorithm is able to calculate SBRT treatment plans with all clinically used photon beams in the range from 6 MV to 18 MV. The measured dose distribution showed a good agreement with the calculated distribution using clinically acceptable criteria of ±5% dose or 3mm distance to agreement. These results show that in a heterogeneous environment a 3D pencil beam superposition-convolution algorithms with Monte Carlo pre-calculated scatter kernels, such as AAA, are able to reliably calculate dose, accounting for increased lateral scattering due to the loss of electronic equilibrium in low density medium. The data for high energy plans (15 MV and 18 MV) showed very good tumor coverage in contrast to findings by other investigators for less sophisticated dose calculation algorithms, which demonstrated less than expected tumor doses and generally worse tumor coverage for high energy plans compared to 6MV plans. This demonstrates that the modern superposition-convolution AAA algorithm is a significant improvement over previous algorithms and is able to calculate doses accurately for SBRT treatment plans in the highly heterogeneous environment of the thorax for both lower (≤12 MV) and higher (greater than 12 MV) beam energies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genome-wide association studies (GWAS) have successfully identified several genetic loci associated with inherited predisposition to primary biliary cirrhosis (PBC), the most common autoimmune disease of the liver. Pathway-based tests constitute a novel paradigm for GWAS analysis. By evaluating genetic variation across a biological pathway (gene set), these tests have the potential to determine the collective impact of variants with subtle effects that are individually too weak to be detected in traditional single variant GWAS analysis. To identify biological pathways associated with the risk of development of PBC, GWAS of PBC from Italy (449 cases and 940 controls) and Canada (530 cases and 398 controls) were independently analyzed. The linear combination test (LCT), a recently developed pathway-level statistical method was used for this analysis. For additional validation, pathways that were replicated at the P <0.05 level of significance in both GWAS on LCT analysis were also tested for association with PBC in each dataset using two complementary GWAS pathway approaches. The complementary approaches included a modification of the gene set enrichment analysis algorithm (i-GSEA4GWAS) and Fisher's exact test for pathway enrichment ratios. Twenty-five pathways were associated with PBC risk on LCT analysis in the Italian dataset at P<0.05, of which eight had an FDR<0.25. The top pathway in the Italian dataset was the TNF/stress related signaling pathway (p=7.38×10 -4, FDR=0.18). Twenty-six pathways were associated with PBC at the P<0.05 level using the LCT in the Canadian dataset with the regulation and function of ChREBP in liver pathway (p=5.68×10-4, FDR=0.285) emerging as the most significant pathway. Two pathways, phosphatidylinositol signaling system (Italian: p=0.016, FDR=0.436; Canadian: p=0.034, FDR=0.693) and hedgehog signaling (Italian: p=0.044, FDR=0.636; Canadian: p=0.041, FDR=0.693), were replicated at LCT P<0.05 in both datasets. Statistically significant association of both pathways with PBC genetic susceptibility was confirmed in the Italian dataset on i-GSEA4GWAS. Results for the phosphatidylinositol signaling system were also significant in both datasets on applying Fisher's exact test for pathway enrichment ratios. This study identified a combination of known and novel pathway-level associations with PBC risk. If functionally validated, the findings may yield fresh insights into the etiology of this complex autoimmune disease with possible preventive and therapeutic application.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The electron pencil-beam redefinition algorithm (PBRA) of Shiu and Hogstrom has been developed for use in radiotherapy treatment planning (RTP). Earlier studies of Boyd and Hogstrom showed that the PBRA lacked an adequate incident beam model, that PBRA might require improved electron physics, and that no data existed which allowed adequate assessment of the PBRA-calculated dose accuracy in a heterogeneous medium such as one presented by patient anatomy. The hypothesis of this research was that by addressing the above issues the PBRA-calculated dose would be accurate to within 4% or 2 mm in regions of high dose gradients. A secondary electron source was added to the PBRA to account for collimation-scattered electrons in the incident beam. Parameters of the dual-source model were determined from a minimal data set to allow ease of beam commissioning. Comparisons with measured data showed 3% or better dose accuracy in water within the field for cases where 4% accuracy was not previously achievable. A measured data set was developed that allowed an evaluation of PBRA in regions distal to localized heterogeneities. Geometries in the data set included irregular surfaces and high- and low-density internal heterogeneities. The data was estimated to have 1% precision and 2% agreement with accurate, benchmarked Monte Carlo (MC) code. PBRA electron transport was enhanced by modeling local pencil beam divergence. This required fundamental changes to the mathematics of electron transport (divPBRA). Evaluation of divPBRA with the measured data set showed marginal improvement in dose accuracy when compared to PBRA; however, 4% or 2mm accuracy was not achieved by either PBRA version for all data points. Finally, PBRA was evaluated clinically by comparing PBRA- and MC-calculated dose distributions using site-specific patient RTP data. Results show PBRA did not agree with MC to within 4% or 2mm in a small fraction (<3%) of the irradiated volume. Although the hypothesis of the research was shown to be false, the minor dose inaccuracies should have little or no impact on RTP decisions or patient outcome. Therefore, given ease of beam commissioning, documentation of accuracy, and calculational speed, the PBRA should be considered a practical tool for clinical use. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Wadden Sea is located in the southeastern part of the North Sea forming an extended intertidal area along the Dutch, German and Danish coast. It is a highly dynamic and largely natural ecosystem influenced by climatic changes and anthropogenic use of the North Sea. Changes in the environment of the Wadden Sea, natural or anthropogenic origin, cannot be monitored by the standard measurement methods alone, because large-area surveys of the intertidal flats are often difficult due to tides, tidal channels and unstable underground. For this reason, remote sensing offers effective monitoring tools. In this study a multi-sensor concept for classification of intertidal areas in the Wadden Sea has been developed. Basis for this method is a combined analysis of RapidEye (RE) and TerraSAR-X (TSX) satellite data coupled with ancillary vector data about the distribution of vegetation, mussel beds and sediments. The classification of the vegetation and mussel beds is based on a decision tree and a set of hierarchically structured algorithms which use object and texture features. The sediments are classified by an algorithm which uses thresholds and a majority filter. Further improvements focus on radiometric enhancement and atmospheric correction. First results show that we are able to identify vegetation and mussel beds with the use of multi-sensor remote sensing. The classification of the sediments in the tidal flats is a challenge compared to vegetation and mussel beds. The results demonstrate that the sediments cannot be classified with high accuracy by their spectral properties alone due to their similarity which is predominately caused by their water content.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ENVISAT ASAR WSM images with pixel size 150 × 150 m, acquired in different meteorological, oceanographic and sea ice conditions were used to determined icebergs in the Amundsen Sea (Antarctica). An object-based method for automatic iceberg detection from SAR data has been developed and applied. The object identification is based on spectral and spatial parameters on 5 scale levels, and was verified with manual classification in four polygon areas, chosen to represent varying environmental conditions. The algorithm works comparatively well in freezing temperatures and strong wind conditions, prevailing in the Amundsen Sea during the year. The detection rate was 96% which corresponds to 94% of the area (counting icebergs larger than 0.03 km**2), for all seasons. The presented algorithm tends to generate errors in the form of false alarms, mainly caused by the presence of ice floes, rather than misses. This affects the reliability since false alarms were manually corrected post analysis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This database (Leemans & Cramer 1991) contains monthly averages of mean temperature, temperature range, precipitation, rain days and sunshine hours for the terrestrial surface of the globe, gridded at 0.5 degree longitude/latitude resolution. All grd-files contain the same 62483 pixels in the same order, with 30' latitude and longitude resolution. The coordinates are in degree-decimals and indicate the SW corner of each pixel. Topography is from ETOPO5 and indicates modal elevation. Data were generated from a large data base, using the partial thin-plate splining algorithm (Hutchinson & Bischof 1983). This version is widely used around the globe, notably by all groups participating in the IGBP NPP model intercomparison.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Breeding distribution of the Adelie penguin, Pygoscelis adeliae, was surveyed with Landsat-7 Enhanced Thematic Mapper Plus (ETM+) data along the coastline of Antarctica, an area covering approximately 330° of longitude. An algorithm was designed to minimize the radiometric contribution from exogenous sources and to retrieve Adelie penguin colony location and spatial extent from the ETM+ data. In all, 9143 individual pixels were classified as belonging to an Adelie penguin colony class out of the entire dataset of 195 ETM+ scenes, where the dimension of each pixel is 30 m by 30 m, and each scene is approximately 180 km by 180 km. Pixel clustering identified a total of 187 individual Adelie penguin colonies, ranging in size from a single pixel (900 m**2) to a maximum of 875 pixels (0.788 km**2). Colony retrievals have a very low error of commission, on the order of 1 percent or less, and the error of omission was estimated to be 2.9 percent by population based on comparisons with direct observations from surveys across east Antarctica. Thus, the Landsat retrievals can successfully locate Adelie penguin colonies that account for ~97 percent of a regional population. Geographic coordinates and the spatial extent of each colony retrieved from the Landsat data are available publically. Regional analysis found several areas where the Landsat retrievals suggest populations that are significantly larger than published estimates. Six Adelie penguin colonies were found that are believed to be unreported in the literature.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A simple method for efficient inversion of arbitrary radiative transfer models for image analysis is presented. The method operates by representing the shape of the function that maps model parameters to spectral reflectance by an adaptive look-up tree (ALUT) that evenly distributes the discretization error of tabulated reflectances in spectral space. A post-processing step organizes the data into a binary space partitioning tree that facilitates an efficient inversion search algorithm. In an example shallow water remote sensing application, the method performs faster than an implementation of previously published methodology and has the same accuracy in bathymetric retrievals. The method has no user configuration parameters requiring expert knowledge and minimizes the number of forward model runs required, making it highly suitable for routine operational implementation of image analysis methods. For the research community, straightforward and robust inversion allows research to focus on improving the radiative transfer models themselves without the added complication of devising an inversion strategy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The data acquired by Remote Sensing systems allow obtaining thematic maps of the earth's surface, by means of the registered image classification. This implies the identification and categorization of all pixels into land cover classes. Traditionally, methods based on statistical parameters have been widely used, although they show some disadvantages. Nevertheless, some authors indicate that those methods based on artificial intelligence, may be a good alternative. Thus, fuzzy classifiers, which are based on Fuzzy Logic, include additional information in the classification process through based-rule systems. In this work, we propose the use of a genetic algorithm (GA) to select the optimal and minimum set of fuzzy rules to classify remotely sensed images. Input information of GA has been obtained through the training space determined by two uncorrelated spectral bands (2D scatter diagrams), which has been irregularly divided by five linguistic terms defined in each band. The proposed methodology has been applied to Landsat-TM images and it has showed that this set of rules provides a higher accuracy level in the classification process

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The application of thematic maps obtained through the classification of remote images needs the obtained products with an optimal accuracy. The registered images from the airplanes display a very satisfactory spatial resolution, but the classical methods of thematic classification not always give better results than when the registered data from satellite are used. In order to improve these results of classification, in this work, the LIDAR sensor data from first return (Light Detection And Ranging) registered simultaneously with the spectral sensor data from airborne are jointly used. The final results of the thematic classification of the scene object of study have been obtained, quantified and discussed with and without LIDAR data, after applying different methods: Maximum Likehood Classification, Support Vector Machine with four different functions kernel and Isodata clustering algorithm (ML, SVM-L, SVM-P, SVM-RBF, SVM-S, Isodata). The best results are obtained for SVM with Sigmoide kernel. These allow the correlation with others different physical parameters with great interest like Manning hydraulic coefficient, for their incorporation in a GIS and their application in hydraulic modeling.