956 results for Data sets


Relevance: 60.00%

Abstract:

A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can adapt to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy-tailed distributions. Implementation is straightforward and R programs are available.
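
As a rough illustration of the underlying idea, the sketch below estimates a cdf with a plain Gaussian kernel and inverts it by bisection to obtain a quantile. It is not the double transformation estimator described in the abstract: the transformation step is omitted, and the bandwidth `h` is a hand-picked assumption rather than the optimal bandwidth the authors derive. The function names are hypothetical.

```python
import math
import random


def kernel_cdf(x, sample, h):
    """Gaussian-kernel estimate of the cdf at x: the mean of Phi((x - X_i) / h)."""
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in sample) / len(sample)


def kernel_quantile(p, sample, h, tol=1e-9):
    """Invert the smooth cdf estimate by bisection to get the p-quantile (0 < p < 1)."""
    lo = min(sample) - 10.0 * h  # estimated cdf is essentially 0 here
    hi = max(sample) + 10.0 * h  # estimated cdf is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    random.seed(0)
    # Assumed toy data: a lognormal sample standing in for a heavy-tailed data set.
    data = [random.lognormvariate(0.0, 1.0) for _ in range(500)]
    print(kernel_quantile(0.99, data, h=0.5))
```

With a fixed bandwidth, the plain kernel estimator tends to behave poorly in the far tail of heavy-tailed data, which is the situation the transformation step in the abstract is meant to address.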

Relevance: 60.00%

Abstract:

All of the imputation techniques usually applied for replacing values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm that is applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set and the results are compared with the usual imputation techniques.
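
For readers unfamiliar with the additive log-ratio (alr) transformation, a minimal sketch follows. It shows only the transformation and its inverse, taking the last part as the reference, which is an assumption. The modified EM algorithm from the abstract is not reproduced, and the function names are hypothetical.

```python
import math


def alr(composition):
    """Additive log-ratio: log of each part relative to the last (reference) part."""
    *parts, ref = composition
    return [math.log(p / ref) for p in parts]


def alr_inverse(coords):
    """Map alr coordinates back to a composition whose parts sum to 1."""
    expd = [math.exp(c) for c in coords] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]          # a three-part composition
    print(alr(x))                # two unconstrained real coordinates
    print(alr_inverse(alr(x)))   # recovers the original composition
```

Because every part must be strictly positive for the logarithm to exist, values below the detection limit have to be replaced before the transformation can be applied, which is why the choice of imputation technique matters here.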

Relevance: 60.00%

Abstract:

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

Relevance: 60.00%

Abstract:

Asymptomatic Plasmodium infection carriers represent a major threat to malaria control worldwide as they are silent natural reservoirs and do not seek medical care. There are no standard criteria for asymptomatic Plasmodium infection; therefore, its diagnosis relies on the presence of the parasite during a specific period of symptomless infection. The antiparasitic immune response can result in reduced Plasmodium sp. load with control of disease manifestations, which leads to asymptomatic infection. Both the innate and adaptive immune responses seem to play major roles in asymptomatic Plasmodium infection; T regulatory cell activity (through the production of interleukin-10 and transforming growth factor-β) and B-cells (with a broad antibody response) both play prominent roles. Furthermore, molecules involved in the haem detoxification pathway (such as haptoglobin and haem oxygenase-1) and iron metabolism (ferritin and activated c-Jun N-terminal kinase) have emerged in recent years as potential biomarkers and thus are helping to unravel the immune response underlying asymptomatic Plasmodium infection. The acquisition of large data sets and the use of robust statistical tools, including network analysis, associated with well-designed malaria studies will likely help elucidate the immune mechanisms responsible for asymptomatic infection.

Relevance: 60.00%

Abstract:

MRI tractography is the mapping of neural fiber pathways based on diffusion MRI of tissue diffusion anisotropy. Tractography based on diffusion tensor imaging (DTI) cannot directly image multiple fiber orientations within a single voxel. To address this limitation, diffusion spectrum MRI (DSI) and related methods were developed to image complex distributions of intravoxel fiber orientation. Here we demonstrate that tractography based on DSI has the capacity to image crossing fibers in neural tissue. DSI was performed in formalin-fixed brains of adult macaque and in the brains of healthy human subjects. Fiber tract solutions were constructed by a streamline procedure, following directions of maximum diffusion at every point, and analyzed in an interactive visualization environment (TrackVis). We report that DSI tractography accurately shows the known anatomic fiber crossings in optic chiasm, centrum semiovale, and brainstem; fiber intersections in gray matter, including cerebellar folia and the caudate nucleus; and radial fiber architecture in cerebral cortex. In contrast, none of these examples of fiber crossing and complex structure was identified by DTI analysis of the same data sets. These findings indicate that DSI tractography is able to image crossing fibers in neural tissue, an essential step toward non-invasive imaging of connectional neuroanatomy.

Relevance: 60.00%

Abstract:

Many archaeobotanical studies, taking a qualitative-descriptive approach, restrict their field of study to presence/absence and/or frequency analyses of taxa based on counts within the plant assemblages. The resulting data are therefore inconclusive and unreliable for reconstructing the palaeoenvironment and for determining diet and economic practice (gathering versus agriculture), and wholly insufficient for determining the historical changes that occurred in production processes. As regards Peru, since the earliest studies of plant remains recovered from archaeological sites, mainly on the coast, the important role that plant species played in the life of pre-Hispanic communities has been documented. Despite the exceptional abundance and excellent preservation of this (botanical) material at many archaeological sites in the region, owing to the extreme climatic and environmental conditions, especially in its arid coastal zones, the archaeobotanical studies carried out to date are very few, and the analytical limitations that most of them show reflect the little importance given to archaeobotanical research. In this work we develop and apply a quantitative analytical methodology for the study of plant macroremains from a site on the southern coast of Peru. We thereby aim to obtain representative and objective data on the assemblages analysed, whose processing leads to an exhaustive and correct interpretation of the information.

Relevance: 60.00%

Abstract:

Barraclough and co-workers (in a paper published in 1996) observed that there was a significant positive correlation between the rate of evolution of the rbcL chloroplast gene within families of flowering plants and the number of species in those families. We tested three additional data sets of our own (based on both plastid and nuclear genes) and used methods designed specifically for the comparison of sister families (based on random speciation and extinction). We show that, over all sister groups, the correlation between the rate of gene evolution and an increased diversity is not always present. Despite tending towards a positive association, the observation of individual probabilities presents a U-shaped distribution of association (i.e. it can be either significantly positive or negative). We discuss the influence of both phylogenetic sampling and applied taxonomies on the results.

Relevance: 60.00%

Abstract:

It has been shown that the accuracy of mammographic abnormality detection methods is strongly dependent on the breast tissue characteristics, where a dense breast drastically reduces detection sensitivity. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. Here, we describe the development of an automatic breast tissue classification methodology, which can be summarized in a number of distinct steps: 1) the segmentation of the breast area into fatty versus dense mammographic tissue; 2) the extraction of morphological and texture features from the segmented breast areas; and 3) the use of a Bayesian combination of a number of classifiers. The evaluation, based on a large number of cases from two different mammographic data sets, shows a strong correlation ( and 0.67 for the two data sets) between automatic and expert-based Breast Imaging Reporting and Data System mammographic density assessment.

Relevance: 60.00%

Abstract:

Like numerous torrents in mountainous regions, the Illgraben creek (canton of Wallis, SW Switzerland) produces several debris flows almost every year. The total area of the active catchment is only 4.7 km², but large events ranging from 50'000 to 400'000 m³ are common (Zimmermann 2000). Consequently, the pathway of the main channel often changes suddenly. A single event can, for instance, fill the whole river bed and dig new channels several meters deep elsewhere (Bardou et al. 2003). Quantifying both the rhythm and the magnitude of these changes is very important for assessing the variability of the bed's cross section and long profile. These parameters are indispensable for numerical modelling, as they should be considered as initial conditions. To monitor the channel evolution, an Optech ILRIS 3D terrestrial laser scanner (LIDAR) was used. LIDAR makes it possible to build a complete high-precision 3D model of the channel and its surroundings by scanning it from different viewpoints. The 3D data are processed and interpreted with the software Polyworks from Innovmetric Software Inc. Sequential 3D models allow the variation in the bed's cross section and long profile to be determined. These data will afterwards be used to quantify the erosion and the deposition in the torrent reaches. To complete the chronological evolution of the landforms, precise digital terrain models, obtained by high-resolution photogrammetry based on old aerial photographs, will be used. A 500 m long section of the Illgraben channel was scanned on 18 August 2005 and on 7 April 2006. These two data sets make it possible to identify the changes of the channel that occurred during the winter season. An upcoming scanning campaign in September 2006 will allow the changes during this summer to be determined. Preliminary results show huge variations in the pathway of the Illgraben channel, as well as substantial vertical and lateral erosion of the river bed.
Here we present the results for a river bank on the left (north-western) flank of the channel (Figure 1). For the August 2005 model the scans from 3 viewpoints were superposed, whereas the April 2006 3D image was obtained by combining 5 separate scans. The bank was eroded mainly on its left part (up to 6.3 m), where it is hit by the river and the debris flows (Figures 2 and 3). A debris cone has also formed (Figure 3), which suggests that part of the bank erosion is due to shallow landslides. These probably occur when the river erosion creates an undercut slope. These geometrical data allow the alluvial dynamics (i.e. aggradation and degradation) to be monitored on different time scales, together with the influence of debris-flow occurrence on these changes. Finally, the resistance against erosion of the bed's cross section and long profile will be analysed to assess the variability of these two key parameters. This information may then be used in debris flow simulation.

Relevance: 60.00%

Abstract:

Multiple genome-wide association studies (GWAS) have been performed in HIV-1 infected individuals, identifying common genetic influences on viral control and disease course. Similarly, common genetic correlates of acquisition of HIV-1 after exposure have been interrogated using GWAS, although in generally small samples. Under the auspices of the International Collaboration for the Genomics of HIV, we have combined the genome-wide single nucleotide polymorphism (SNP) data collected by 25 cohorts, studies, or institutions on HIV-1 infected individuals and compared them to carefully matched population-level data sets (a list of all collaborators appears in Note S1 in Text S1). After imputation using the 1,000 Genomes Project reference panel, we tested approximately 8 million common DNA variants (SNPs and indels) for association with HIV-1 acquisition in 6,334 infected patients and 7,247 population samples of European ancestry. Initial association testing identified the SNP rs4418214, the C allele of which is known to tag the HLA-B*57:01 and B*27:05 alleles, as genome-wide significant (p = 3.6 × 10⁻¹¹). However, restricting analysis to individuals with a known date of seroconversion suggested that this association was due to the frailty bias in studies of lethal diseases. Further analyses including testing recessive genetic models, testing for bulk effects of non-genome-wide significant variants, stratifying by sexual or parenteral transmission risk and testing previously reported associations showed no evidence for genetic influence on HIV-1 acquisition (with the exception of CCR5Δ32 homozygosity). Thus, these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.

Relevance: 60.00%

Abstract:

Models of codon evolution have attracted particular interest because of their unique capabilities to detect selection forces and their high fit when applied to sequence evolution. We describe here a novel approach for modeling codon evolution, which is based on the Kronecker product of matrices. The 61 × 61 codon substitution rate matrix is created using the Kronecker product of three 4 × 4 nucleotide substitution matrices, the equilibrium frequency of codons, and the selection rate parameter. The entries of the nucleotide substitution matrices and the selection rate are considered as parameters of the model, which are optimized by maximum likelihood. Our fully mechanistic model allows the instantaneous substitution matrix between codons to be fully estimated with only 19 parameters instead of 3,721, by using the biological interdependence existing between positions within codons. We illustrate the properties of our models using computer simulations and assess its relevance by comparing the AICc measures of our model and other models of codon evolution on simulations and a large range of empirical data sets. We show that our model fits most biological data better than current codon models. Furthermore, the parameters in our model can be interpreted in a similar way as the exchangeability rates found in empirical codon models.
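
The dimension bookkeeping behind these numbers can be sketched as follows. The Kronecker product of three 4 × 4 matrices is indexed by the 64 nucleotide triplets, and dropping the three stop codons of the standard genetic code leaves a 61 × 61 matrix with 61² = 3,721 entries. This is only an illustration of that indexing, under the assumption that the three matrices are combined by a plain Kronecker product. The way the authors fold in codon frequencies and the selection parameter is not reproduced, and the helper names are hypothetical.

```python
from itertools import product

NUCS = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

# All 64 triplets; the first codon position varies slowest, matching kron(kron(m1, m2), m3).
CODONS = ["".join(t) for t in product(NUCS, repeat=3)]
SENSE = [c for c in CODONS if c not in STOPS]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    size = n * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(size)]
            for i in range(size)]


def codon_matrix(m1, m2, m3):
    """Build the 64 x 64 product of three per-position matrices, then drop stop codons."""
    full = kron(kron(m1, m2), m3)
    keep = [k for k, c in enumerate(CODONS) if c not in STOPS]
    return [[full[i][j] for j in keep] for i in keep]


if __name__ == "__main__":
    # Placeholder 4 x 4 matrix, not a fitted substitution model.
    m = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
    q = codon_matrix(m, m, m)
    print(len(q), len(q[0]), len(q) * len(q[0]))
```

Each entry of the product is the product of one entry from each per-position matrix, which is how a small number of nucleotide-level parameters can determine all codon-level entries.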

Relevance: 60.00%

Abstract:

In recent research, both soil (root-zone) and air temperature have been used as predictors for the treeline position worldwide. In this study, we intended to (a) test the proposed temperature limitation at the treeline, and (b) investigate effects of season length for both heat sum and mean temperature variables in the Swiss Alps. As soil temperature data are available for a limited number of sites only, we developed an air-to-soil transfer model (ASTRAMO). The air-to-soil transfer model predicts daily mean root-zone temperatures (10 cm below the surface) at the treeline exclusively from daily mean air temperatures. The model, which uses calibrated air and root-zone temperature measurements at nine treeline sites in the Swiss Alps, incorporates time lags to account for the damping effect between air and soil temperatures as well as the temporal autocorrelations typical for such chronological data sets. Based on the measured and modeled root-zone temperatures, we analyzed the suitability of the thermal treeline indicators seasonal mean and degree-days to describe the Alpine treeline position. The root-zone indicators were then compared to the respective indicators based on measured air temperatures, with all indicators calculated for two different indicator period lengths. For both temperature types (root-zone and air) and both indicator periods, seasonal mean temperature was the indicator with the lowest variation across all treeline sites. The resulting indicator values were 7.0 °C ± 0.4 SD (short indicator period) and 7.1 °C ± 0.5 SD (long indicator period) for root-zone temperature, and 8.0 °C ± 0.6 SD (short indicator period) and 8.8 °C ± 0.8 SD (long indicator period) for air temperature. Generally, a higher variation was found for all air-based treeline indicators when compared to the root-zone temperature indicators.
Despite this, we showed that treeline indicators calculated from both air and root-zone temperatures can be used to describe the Alpine treeline position.
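
The two thermal indicators named above can be sketched as follows, using common textbook definitions: the seasonal mean is the average daily mean temperature over the indicator period, and degree-days are the sum of daily excesses above a base temperature. The base temperature and the choice of indicator period are assumptions here, since the abstract does not state them, and the function names are hypothetical.

```python
def seasonal_mean(daily_temps):
    """Mean of the daily mean temperatures over the indicator period."""
    return sum(daily_temps) / len(daily_temps)


def degree_days(daily_temps, base=0.0):
    """Heat sum: total of the daily excess above an assumed base temperature."""
    return sum(max(t - base, 0.0) for t in daily_temps)


if __name__ == "__main__":
    # Made-up daily mean root-zone temperatures in degrees C, for illustration only.
    temps = [5.5, 6.0, 7.5, 8.0, 8.0]
    print(seasonal_mean(temps))
    print(degree_days(temps, base=5.0))
```

A longer indicator period adds cooler days at both ends of the season, which generally lowers the seasonal mean while raising the heat sum, so the two indicators respond differently to season length.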

Relevance: 60.00%

Abstract:

One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. 
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.

Relevance: 60.00%

Abstract:

The completion of the sequencing of the mouse genome promises to help predict human genes with greater accuracy. While current ab initio gene prediction programs are remarkably sensitive (i.e., they predict at least a fragment of most genes), their specificity is often low, predicting a large number of false-positive genes in the human genome. Sequence conservation at the protein level with the mouse genome can help eliminate some of those false positives. Here we describe SGP2, a gene prediction program that combines ab initio gene prediction with TBLASTX searches between two genome sequences to provide both sensitive and specific gene predictions. The accuracy of SGP2 when used to predict genes by comparing the human and mouse genomes is assessed on a number of data sets, including single-gene data sets, the highly curated human chromosome 22 predictions, and entire genome predictions from ENSEMBL. Results indicate that SGP2 outperforms purely ab initio gene prediction methods. Results also indicate that SGP2 works about as well with 3x shotgun data as it does with fully assembled genomes. SGP2 provides a high enough specificity that its predictions can be experimentally verified at a reasonable cost. SGP2 was used to generate a complete set of gene predictions on both the human and mouse by comparing the genomes of these two species. Our results suggest that another few thousand human and mouse genes currently not in ENSEMBL are worth verifying experimentally.

Relevance: 60.00%

Abstract:

Machine learning and pattern recognition methods have been used to diagnose Alzheimer's disease (AD) and mild cognitive impairment (MCI) from individual MRI scans. Another application of such methods is to predict clinical scores from individual scans. Using relevance vector regression (RVR), we predicted individuals' performances on established tests from their T1-weighted MRI images in two independent data sets. From Mayo Clinic, 73 probable AD patients and 91 cognitively normal (CN) controls completed the Mini-Mental State Examination (MMSE), Dementia Rating Scale (DRS), and Auditory Verbal Learning Test (AVLT) within 3 months of their scan. Baseline MRIs from the Alzheimer's Disease Neuroimaging Initiative (ADNI) comprised the other data set; 113 AD, 351 MCI, and 122 CN subjects completed the MMSE and Alzheimer's Disease Assessment Scale-Cognitive subtest (ADAS-cog) and 39 AD, 92 MCI, and 32 CN ADNI subjects completed MMSE, ADAS-cog, and AVLT. Predicted and actual clinical scores were highly correlated for the MMSE, DRS, and ADAS-cog tests (P<0.0001). Training with one data set and testing with another demonstrated stability between data sets. DRS, MMSE, and ADAS-cog correlated better than AVLT with whole brain grey matter changes associated with AD. This result underscores their utility for screening and tracking disease. RVR offers a novel way to measure interactions between structural changes and neuropsychological tests beyond that of univariate methods. In clinical practice, we envision using RVR to aid in diagnosis and predict clinical outcome.