101 resultados para multidimensional data


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data are quickly becoming essential knowledge repositories of the research community. This present paper surveys several databases, which are considered "pillars" of research and important nodes in the network. This paper focuses on a generalized workflow scheme typical for microarray experiments using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

OBJECTIVE: The optimal coronary MR angiography sequence has yet to be determined. We sought to quantitatively and qualitatively compare four coronary MR angiography sequences. SUBJECTS AND METHODS. Free-breathing coronary MR angiography was performed in 12 patients using four imaging sequences (turbo field-echo, fast spin-echo, balanced fast field-echo, and spiral turbo field-echo). Quantitative comparisons, including signal-to-noise ratio, contrast-to-noise ratio, vessel diameter, and vessel sharpness, were performed using a semiautomated analysis tool. Accuracy for detection of hemodynamically significant disease (> 50%) was assessed in comparison with radiographic coronary angiography. RESULTS: Signal-to-noise and contrast-to-noise ratios were markedly increased using the spiral (25.7 +/- 5.7 and 15.2 +/- 3.9) and balanced fast field-echo (23.5 +/- 11.7 and 14.4 +/- 8.1) sequences compared with the turbo field-echo (12.5 +/- 2.7 and 8.3 +/- 2.6) sequence (p < 0.05). Vessel diameter was smaller with the spiral sequence (2.6 +/- 0.5 mm) than with the other techniques (turbo field-echo, 3.0 +/- 0.5 mm, p = 0.6; balanced fast field-echo, 3.1 +/- 0.5 mm, p < 0.01; fast spin-echo, 3.1 +/- 0.5 mm, p < 0.01). Vessel sharpness was highest with the balanced fast field-echo sequence (61.6% +/- 8.5% compared with turbo field-echo, 44.0% +/- 6.6%; spiral, 44.7% +/- 6.5%; fast spin-echo, 18.4% +/- 6.7%; p < 0.001). The overall accuracies of the sequences were similar (range, 74% for turbo field-echo, 79% for spiral). Scanning time for the fast spin-echo sequences was longest (10.5 +/- 0.6 min), and for the spiral acquisitions was shortest (5.2 +/- 0.3 min). CONCLUSION: Advantages in signal-to-noise and contrast-to-noise ratios, vessel sharpness, and the qualitative results appear to favor spiral and balanced fast field-echo coronary MR angiography sequences, although subjective accuracy for the detection of coronary artery disease was similar to that of other sequences.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents general problems and approaches for the spatial data analysis using machine learning algorithms. Machine learning is a very powerful approach to adaptive data analysis, modelling and visualisation. The key feature of the machine learning algorithms is that they learn from empirical data and can be used in cases when the modelled environmental phenomena are hidden, nonlinear, noisy and highly variable in space and in time. Most of the machines learning algorithms are universal and adaptive modelling tools developed to solve basic problems of learning from data: classification/pattern recognition, regression/mapping and probability density modelling. In the present report some of the widely used machine learning algorithms, namely artificial neural networks (ANN) of different architectures and Support Vector Machines (SVM), are adapted to the problems of the analysis and modelling of geo-spatial data. Machine learning algorithms have an important advantage over traditional models of spatial statistics when problems are considered in a high dimensional geo-feature spaces, when the dimension of space exceeds 5. Such features are usually generated, for example, from digital elevation models, remote sensing images, etc. An important extension of models concerns considering of real space constrains like geomorphology, networks, and other natural structures. Recent developments in semi-supervised learning can improve modelling of environmental phenomena taking into account on geo-manifolds. An important part of the study deals with the analysis of relevant variables and models' inputs. This problem is approached by using different feature selection/feature extraction nonlinear tools. To demonstrate the application of machine learning algorithms several interesting case studies are considered: digital soil mapping using SVM, automatic mapping of soil and water system pollution using ANN; natural hazards risk analysis (avalanches, landslides), assessments of renewable resources (wind fields) with SVM and ANN models, etc. The dimensionality of spaces considered varies from 2 to more than 30. Figures 1, 2, 3 demonstrate some results of the studies and their outputs. Finally, the results of environmental mapping are discussed and compared with traditional models of geostatistics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e. g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is not a generally accepted standard technique for screening a large number of candidates and identifying the best ones. Results: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at http://www.isrec.isb-sib.ch/similar to vpopovic/research/ Conclusion: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm. It was designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules and the level of required coherence can be varied resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: The isa2 package includes an optimized implementation of the ISA and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch

Relevância:

20.00% 20.00%

Publicador:

Resumo:

CONTEXT: Several genetic risk scores to identify asymptomatic subjects at high risk of developing type 2 diabetes mellitus (T2DM) have been proposed, but it is unclear whether they add extra information to risk scores based on clinical and biological data. OBJECTIVE: The objective of the study was to assess the extra clinical value of genetic risk scores in predicting the occurrence of T2DM. DESIGN: This was a prospective study, with a mean follow-up time of 5 yr. SETTING AND SUBJECTS: The study included 2824 nondiabetic participants (1548 women, 52 ± 10 yr). MAIN OUTCOME MEASURE: Six genetic risk scores for T2DM were tested. Four were derived from the literature and two were created combining all (n = 24) or shared (n = 9) single-nucleotide polymorphisms of the previous scores. A previously validated clinic + biological risk score for T2DM was used as reference. RESULTS: Two hundred seven participants (7.3%) developed T2DM during follow-up. On bivariate analysis, no differences were found for all but one genetic score between nondiabetic and diabetic participants. After adjusting for the validated clinic + biological risk score, none of the genetic scores improved discrimination, as assessed by changes in the area under the receiver-operating characteristic curve (range -0.4 to -0.1%), sensitivity (-2.9 to -1.0%), specificity (0.0-0.1%), and positive (-6.6 to +0.7%) and negative (-0.2 to 0.0%) predictive values. Similarly, no improvement in T2DM risk prediction was found: net reclassification index ranging from -5.3 to -1.6% and nonsignificant (P ≥ 0.49) integrated discrimination improvement. CONCLUSIONS: In this study, adding genetic information to a previously validated clinic + biological score does not seem to improve the prediction of T2DM.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ce guide présente la méthode Data Envelopment Analysis (DEA), une méthode d'évaluation de la performance . Il est destiné aux responsables d'organisations publiques qui ne sont pas familiers avec les notions d'optimisation mathématique, autrement dit de recherche opérationnelle. L'utilisation des mathématiques est par conséquent réduite au minimum. Ce guide est fortement orienté vers la pratique. Il permet aux décideurs de réaliser leurs propres analyses d'efficience et d'interpréter facilement les résultats obtenus. La méthode DEA est un outil d'analyse et d'aide à la décision dans les domaines suivants : - en calculant un score d'efficience, elle indique si une organisation dispose d'une marge d'amélioration ; - en fixant des valeurs-cibles, elle indique de combien les inputs doivent être réduits et les outputs augmentés pour qu'une organisation devienne efficiente ; - en identifiant le type de rendements d'échelle, elle indique si une organisation doit augmenter ou au contraire réduire sa taille pour minimiser son coût moyen de production ; - en identifiant les pairs de référence, elle désigne quelles organisations disposent des best practice à analyser.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can cause wrong inference about population parameters. When the missing data process relates to the trait of interest, a valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.