924 resultados para Automatic Analysis of Multivariate Categorical Data Sets
Resumo:
L'imagerie intravasculaire ultrasonore (IVUS) est une technologie médicale par cathéter qui produit des images de coupe des vaisseaux sanguins. Elle permet de quantifier et d'étudier la morphologie de plaques d'athérosclérose en plus de visualiser la structure des vaisseaux sanguins (lumière, intima, plaque, média et adventice) en trois dimensions. Depuis quelques années, cette méthode d'imagerie est devenue un outil de choix en recherche aussi bien qu'en clinique pour l'étude de la maladie athérosclérotique. L'imagerie IVUS est par contre affectée par des artéfacts associés aux caractéristiques des capteurs ultrasonores, par la présence de cônes d'ombre causés par les calcifications ou des artères collatérales, par des plaques dont le rendu est hétérogène ou par le chatoiement ultrasonore (speckle) sanguin. L'analyse automatisée de séquences IVUS de grande taille représente donc un défi important. Une méthode de segmentation en trois dimensions (3D) basée sur l'algorithme du fast-marching à interfaces multiples est présentée. La segmentation utilise des attributs des régions et contours des images IVUS. En effet, une nouvelle fonction de vitesse de propagation des interfaces combinant les fonctions de densité de probabilité des tons de gris des composants de la paroi vasculaire et le gradient des intensités est proposée. La segmentation est grandement automatisée puisque la lumière du vaisseau est détectée de façon entièrement automatique. Dans une procédure d'initialisation originale, un minimum d'interactions est nécessaire lorsque les contours initiaux de la paroi externe du vaisseau calculés automatiquement sont proposés à l'utilisateur pour acceptation ou correction sur un nombre limité d'images de coupe longitudinale. La segmentation a été validée à l'aide de séquences IVUS in vivo provenant d'artères fémorales provenant de différents sous-groupes d'acquisitions, c'est-à-dire pré-angioplastie par ballon, post-intervention et à un examen de contrôle 1 an suivant l'intervention. Les résultats ont été comparés avec des contours étalons tracés manuellement par différents experts en analyse d'images IVUS. Les contours de la lumière et de la paroi externe du vaisseau détectés selon la méthode du fast-marching sont en accord avec les tracés manuels des experts puisque les mesures d'aire sont similaires et les différences point-à-point entre les contours sont faibles. De plus, la segmentation par fast-marching 3D s'est effectuée en un temps grandement réduit comparativement à l'analyse manuelle. Il s'agit de la première étude rapportée dans la littérature qui évalue la performance de la segmentation sur différents types d'acquisition IVUS. En conclusion, la segmentation par fast-marching combinant les informations des distributions de tons de gris et du gradient des intensités des images est précise et efficace pour l'analyse de séquences IVUS de grandes tailles. Un outil de segmentation robuste pourrait devenir largement répandu pour la tâche ardue et fastidieuse qu'est l'analyse de ce type d'images.
Resumo:
We study the workings of the factor analysis of high-dimensional data using artificial series generated from a large, multi-sector dynamic stochastic general equilibrium (DSGE) model. The objective is to use the DSGE model as a laboratory that allow us to shed some light on the practical benefits and limitations of using factor analysis techniques on economic data. We explain in what sense the artificial data can be thought of having a factor structure, study the theoretical and finite sample properties of the principal components estimates of the factor space, investigate the substantive reason(s) for the good performance of di¤usion index forecasts, and assess the quality of the factor analysis of highly dissagregated data. In all our exercises, we explain the precise relationship between the factors and the basic macroeconomic shocks postulated by the model.
Resumo:
We propose to show in this paper, that the time series obtained from biological systems such as human brain are invariably nonstationary because of different time scales involved in the dynamical process. This makes the invariant parameters time dependent. We made a global analysis of the EEG data obtained from the eight locations on the skull space and studied simultaneously the dynamical characteristics from various parts of the brain. We have proved that the dynamical parameters are sensitive to the time scales and hence in the study of brain one must identify all relevant time scales involved in the process to get an insight in the working of brain.
Resumo:
Multivariate lifetime data arise in various forms including recurrent event data when individuals are followed to observe the sequence of occurrences of a certain type of event; correlated lifetime when an individual is followed for the occurrence of two or more types of events, or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models.The well known Cox proportional hazards model and its variations, using the marginal hazard functions employed for the analysis of multivariate survival data in literature are not sufficient to explain the complete dependence structure of pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2, we introduced a bivariate proportional hazards model using vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effect on two components of the vector hazard function. The proposed model is useful in real life situations to study the dependence structure of pair of lifetimes on the covariate vector . The well known partial likelihood approach is used for the estimation of parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector . In many fields of application, mean residual life function is considered superior concept than the hazard function. Motivated by this, in Chapter 4, we considered a new semi-parametric model, bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap time of recurrent events. The counting process approach is used for the inference procedures of the gap time of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of different models developed in previous chapters, were studied. The proposed models were applied to various real life situations.
Resumo:
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a powerful tool for genome-wide transcription studies. Unlike microarrays, it has the ability to detect novel forms of RNA such as alternatively spliced and antisense transcripts, without the need for prior knowledge of their existence. One limitation of using SAGE on an organism with a complex genome and lacking detailed sequence information, such as the hexaploid bread wheat Triticum aestivum, is accurate annotation of the tags generated. Without accurate annotation it is impossible to fully understand the dynamic processes involved in such complex polyploid organisms. Hence we have developed and utilised novel procedures to characterise, in detail, SAGE tags generated from the whole grain transcriptome of hexaploid wheat. RESULTS: Examination of 71,930 Long SAGE tags generated from six libraries derived from two wheat genotypes grown under two different conditions suggested that SAGE is a reliable and reproducible technique for use in studying the hexaploid wheat transcriptome. However, our results also showed that in poorly annotated and/or poorly sequenced genomes, such as hexaploid wheat, considerably more information can be extracted from SAGE data by carrying out a systematic analysis of both perfect and "fuzzy" (partially matched) tags. This detailed analysis of the SAGE data shows first that while there is evidence of alternative polyadenylation this appears to occur exclusively within the 3' untranslated regions. Secondly, we found no strong evidence for widespread alternative splicing in the developing wheat grain transcriptome. However, analysis of our SAGE data shows that antisense transcripts are probably widespread within the transcriptome and appear to be derived from numerous locations within the genome. Examination of antisense transcripts showing sequence similarity to the Puroindoline a and Puroindoline b genes suggests that such antisense transcripts might have a role in the regulation of gene expression. CONCLUSION: Our results indicate that the detailed analysis of transcriptome data, such as SAGE tags, is essential to understand fully the factors that regulate gene expression and that such analysis of the wheat grain transcriptome reveals that antisense transcripts maybe widespread and hence probably play a significant role in the regulation of gene expression during grain development.
Resumo:
We provide a system identification framework for the analysis of THz-transient data. The subspace identification algorithm for both deterministic and stochastic systems is used to model the time-domain responses of structures under broadband excitation. Structures with additional time delays can be modelled within the state-space framework using additional state variables. We compare the numerical stability of the commonly used least-squares ARX models to that of the subspace N4SID algorithm by using examples of fourth-order and eighth-order systems under pulse and chirp excitation conditions. These models correspond to structures having two and four modes simultaneously propagating respectively. We show that chirp excitation combined with the subspace identification algorithm can provide a better identification of the underlying mode dynamics than the ARX model does as the complexity of the system increases. The use of an identified state-space model for mode demixing, upon transformation to a decoupled realization form is illustrated. Applications of state-space models and the N4SID algorithm to THz transient spectroscopy as well as to optical systems are highlighted.
Resumo:
Rainfall can be modeled as a spatially correlated random field superimposed on a background mean value; therefore, geostatistical methods are appropriate for the analysis of rain gauge data. Nevertheless, there are certain typical features of these data that must be taken into account to produce useful results, including the generally non-Gaussian mixed distribution, the inhomogeneity and low density of observations, and the temporal and spatial variability of spatial correlation patterns. Many studies show that rigorous geostatistical analysis performs better than other available interpolation techniques for rain gauge data. Important elements are the use of climatological variograms and the appropriate treatment of rainy and nonrainy areas. Benefits of geostatistical analysis for rainfall include ease of estimating areal averages, estimation of uncertainties, and the possibility of using secondary information (e.g., topography). Geostatistical analysis also facilitates the generation of ensembles of rainfall fields that are consistent with a given set of observations, allowing for a more realistic exploration of errors and their propagation in downstream models, such as those used for agricultural or hydrological forecasting. This article provides a review of geostatistical methods used for kriging, exemplified where appropriate by daily rain gauge data from Ethiopia.
Resumo:
We provide a unified framework for a range of linear transforms that can be used for the analysis of terahertz spectroscopic data, with particular emphasis on their application to the measurement of leaf water content. The use of linear transforms for filtering, regression, and classification is discussed. For illustration, a classification problem involving leaves at three stages of drought and a prediction problem involving simulated spectra are presented. Issues resulting from scaling the data set are discussed. Using Lagrange multipliers, we arrive at the transform that yields the maximum separation between the spectra and show that this optimal transform is equivalent to computing the Euclidean distance between the samples. The optimal linear transform is compared with the average for all the spectra as well as with the Karhunen–Loève transform to discriminate a wet leaf from a dry leaf. We show that taking several principal components into account is equivalent to defining new axes in which data are to be analyzed. The procedure shows that the coefficients of the Karhunen–Loève transform are well suited to the process of classification of spectra. This is in line with expectations, as these coefficients are built from the statistical properties of the data set analyzed.
Resumo:
Overall phylogenetic relationships within the genus Pelargonium (Geraniaceae) were inferred based on DNA sequences from mitochondrial(mt)-encoded nad1 b/c exons and from chloroplast(cp)-encoded trnL (UAA) 5' exon-trnF (GAA) exon regions using two species of Geranium and Sarcocaulon vanderetiae as outgroups. The group II intron between nad1 exons b and c was found to be absent from the Pelargonium, Geranium, and Sarcocaulon sequences presented here as well as from Erodium, which is the first recorded loss of this intron in angiosperms. Separate phylogenetic analyses of the mtDNA and cpDNA data sets produced largely congruent topologies, indicating linkage between mitochondrial and chloroplast genome inheritance. Simultaneous analysis of the combined data sets yielded a well-resolved topology with high clade support exhibiting a basic split into small and large chromosome species, the first group containing two lineages and the latter three. One large chromosome lineage (x = 11) comprises species from sections Myrrhidium and Chorisma and is sister to a lineage comprising P. mutans (x = 11) and species from section Jenkinsonia (x = 9). Sister to these two lineages is a lineage comprising species from sections Ciconium (x = 9) and Subsucculentia (x = 10). Cladistic evaluation of this pattern suggests that x = 11 is the ancestral basic chromosome number for the genus.
Resumo:
It is thought that speciation in phytophagous insects is often due to colonization of novel host plants, because radiations of plant and insect lineages are typically asynchronous. Recent phylogenetic comparisons have supported this model of diversification for both insect herbivores and specialized pollinators. An exceptional case where contemporaneous plant insect diversification might be expected is the obligate mutualism between fig trees (Ficus species, Moraceae) and their pollinating wasps (Agaonidae, Hymenoptera). The ubiquity and ecological significance of this mutualism in tropical and subtropical ecosystems has long intrigued biologists, but the systematic challenge posed by >750 interacting species pairs has hindered progress toward understanding its evolutionary history. In particular, taxon sampling and analytical tools have been insufficient for large-scale co-phylogenetic analyses. Here, we sampled nearly 200 interacting pairs of fig and wasp species from across the globe. Two supermatrices were assembled: on average, wasps had sequences from 77% of six genes (5.6kb), figs had sequences from 60% of five genes (5.5 kb), and overall 850 new DNA sequences were generated for this study. We also developed a new analytical tool, Jane 2, for event-based phylogenetic reconciliation analysis of very large data sets. Separate Bayesian phylogenetic analyses for figs and fig wasps under relaxed molecular clock assumptions indicate Cretaceous diversification of crown groups and contemporaneous divergence for nearly half of all fig and pollinator lineages. Event-based co-phylogenetic analyses further support the co-diversification hypothesis. Biogeographic analyses indicate that the presentday distribution of fig and pollinator lineages is consistent with an Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance. Overall, our findings indicate that the fig-pollinator mutualism represents an extreme case among plant-insect interactions of coordinated dispersal and long-term co-diversification.
Resumo:
The aim of this study was to determine whether geographical differences impact the composition of bacterial communities present in the airways of cystic fibrosis (CF) patients attending CF centers in the United States or United Kingdom. Thirty-eight patients were matched on the basis of clinical parameters into 19 pairs comprised of one U.S. and one United Kingdom patient. Analysis was performed to determine what, if any, bacterial correlates could be identified. Two culture-independent strategies were used: terminal restriction fragment length polymorphism (T-RFLP) profiling and 16S rRNA clone sequencing. Overall, 73 different terminal restriction fragment lengths were detected, ranging from 2 to 10 for U.S. and 2 to 15 for United Kingdom patients. The statistical analysis of T-RFLP data indicated that patient pairing was successful and revealed substantial transatlantic similarities in the bacterial communities. A small number of bands was present in the vast majority of patients in both locations, indicating that these are species common to the CF lung. Clone sequence analysis also revealed that a number of species not traditionally associated with the CF lung were present in both sample groups. The species number per sample was similar, but differences in species presence were observed between sample groups. Cluster analysis revealed geographical differences in bacterial presence and relative species abundance. Overall, the U.S. samples showed tighter clustering with each other compared to that of United Kingdom samples, which may reflect the lower diversity detected in the U.S. sample group. The impact of cross-infection and biogeography is considered, and the implications for treating CF lung infections also are discussed.
Resumo:
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that although have fairly indistinguishable features in the optical spectrum, they also possess a few discernable spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude as well as the phase of the recorded spectra. Classification speed and accuracy are contrasted with that achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinic diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
1. Distance sampling is a widely used technique for estimating the size or density of biological populations. Many distance sampling designs and most analyses use the software Distance. 2. We briefly review distance sampling and its assumptions, outline the history, structure and capabilities of Distance, and provide hints on its use. 3. Good survey design is a crucial prerequisite for obtaining reliable results. Distance has a survey design engine, with a built-in geographic information system, that allows properties of different proposed designs to be examined via simulation, and survey plans to be generated. 4. A first step in analysis of distance sampling data is modeling the probability of detection. Distance contains three increasingly sophisticated analysis engines for this: conventional distance sampling, which models detection probability as a function of distance from the transect and assumes all objects at zero distance are detected; multiple-covariate distance sampling, which allows covariates in addition to distance; and mark–recapture distance sampling, which relaxes the assumption of certain detection at zero distance. 5. All three engines allow estimation of density or abundance, stratified if required, with associated measures of precision calculated either analytically or via the bootstrap. 6. Advanced analysis topics covered include the use of multipliers to allow analysis of indirect surveys (such as dung or nest surveys), the density surface modeling analysis engine for spatial and habitat-modeling, and information about accessing the analysis engines directly from other software. 7. Synthesis and applications. Distance sampling is a key method for producing abundance and density estimates in challenging field conditions. The theory underlying the methods continues to expand to cope with realistic estimation situations. In step with theoretical developments, state-of- the-art software that implements these methods is described that makes the methods accessible to practicing ecologists.
Resumo:
In the analysis of instrumented indentation data, it is common practice to incorporate the combined moduli of the indenter (E-i) and the specimen (E) in the so-called reduced modulus (E-r) to account for indenter deformation. Although indenter systems with rigid or elastic tips are considered as equivalent if E-r is the same, the validity of this practice has been questioned over the years. The present work uses systematic finite element simulations to examine the role of the elastic deformation of the indenter tip in instrumented indentation measurements and the validity of the concept of the reduced modulus in conical and pyramidal (Berkovich) indentations. It is found that the apical angle increases as a result of the indenter deformation, which influences in the analysis of the results. Based upon the inaccuracies introduced by the reduced modulus approximation in the analysis of the unloading segment of instrumented indentation applied load (P)-penetration depth (delta) curves, a detailed examination is then conducted on the role of indenter deformation upon the dimensionless functions describing the loading stages of such curves. Consequences of the present results in the extraction of the uniaxial stress-strain characteristics of the indented material through such dimensional analyses are finally illustrated. It is found that large overestimations in the assessment of the strain hardening behavior result by neglecting tip compliance. Guidelines are given in the paper to reduce such overestimations.