924 results for Automatic Analysis of Multivariate Categorical Data Sets
Abstract:
This thesis develops a comprehensive and flexible statistical framework for the analysis and detection of spatial, temporal and spatio-temporal clusters of environmental point data. The developed clustering methods were applied to both simulated datasets and real-world environmental phenomena; however, only the cases of forest fires in the Canton of Ticino (Switzerland) and in Portugal are expounded in this document. Environmental phenomena can normally be modelled as stochastic point processes where each event, e.g. the forest fire ignition point, is characterised by its spatial location and occurrence in time. Additionally, information such as burned area, ignition causes, land use, and topographic, climatic and meteorological features can also be used to characterise the studied phenomenon. The space-time pattern characterisation therefore represents a powerful tool to understand the distribution and behaviour of the events and their correlation with underlying processes, for instance socio-economic, environmental and meteorological factors. Consequently, we propose a methodology based on the adaptation and application of statistical and fractal point process measures for both global (e.g. the Morisita index, the box-counting fractal method, the multifractal formalism and Ripley's K-function) and local (e.g. scan statistics) analysis. Many measures describing the space-time distribution of environmental phenomena have been proposed in a wide variety of disciplines; nevertheless, most of these measures are global in character and do not consider the complex spatial constraints, high variability and multivariate nature of the events. Therefore, we propose a statistical framework that takes into account the complexities of the geographical space in which phenomena take place, by introducing the Validity Domain concept and carrying out clustering analyses on data with differently constrained geographical spaces, hence assessing the relative degree of clustering of the real distribution. Moreover, exclusively for the forest fire case, this research proposes two new methodologies, one for defining and mapping the Wildland-Urban Interface (WUI), described as the interaction zone between burnable vegetation and anthropogenic infrastructures, and one for predicting fire ignition susceptibility. In this regard, the main objective of this thesis was to carry out basic statistical/geospatial research with a strong applied component, in order to analyse and describe complex phenomena as well as to overcome unsolved methodological problems in the characterisation of space-time patterns, in particular forest fire occurrences. Thus, this thesis provides a response to the increasing demand for environmental monitoring and management tools for the assessment of natural and anthropogenic hazards and risks, sustainable development, retrospective success analysis, etc. The major contributions of this work were presented at national and international conferences and published in 5 scientific journals. National and international collaborations were also established and successfully accomplished.
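As a minimal illustration of one of the global clustering measures listed above, the sketch below estimates Ripley's K-function for a 2D point pattern with a naive estimator; the point coordinates, study-area size and radii are hypothetical, and edge correction is omitted for brevity.

```python
import numpy as np

def ripley_k(points, area, radii):
    """Naive estimate of Ripley's K-function for a 2D point pattern.

    points : (n, 2) array of event coordinates
    area   : area of the study region
    radii  : 1D array of distances r at which K(r) is evaluated
    (No edge correction is applied; purely illustrative.)
    """
    n = len(points)
    # pairwise distances between all distinct events
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(d, np.inf)           # exclude self-pairs
    lam = n / area                        # estimated intensity (events per unit area)
    # K(r) = mean number of neighbours within r, divided by the intensity
    return np.array([(d < r).sum() / n / lam for r in radii])

# Hypothetical usage: 200 uniformly random points in a 10 x 10 window.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(200, 2))
r = np.linspace(0.1, 2.5, 25)
k = ripley_k(pts, area=100.0, radii=r)
# Under complete spatial randomness, K(r) is close to pi * r**2.
print(np.column_stack([r, k, np.pi * r**2])[:5].round(3))
```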
Abstract:
We present a participant study that compares biological data exploration tasks using volume renderings of laser confocal microscopy data across three environments that vary in level of immersion: a desktop, fishtank, and cave system. For the tasks, data, and visualization approach used in our study, we found that subjects qualitatively preferred and quantitatively performed better in the cave compared with the fishtank and desktop. Subjects performed real-world biological data analysis tasks that emphasized understanding spatial relationships, including characterizing the general features in a volume, identifying colocated features, and reporting geometric relationships such as whether clusters of cells were coplanar. After analyzing data in each environment, subjects were asked to choose which environment they wanted to use for analyzing additional data sets; subjects uniformly selected the cave environment.
Abstract:
In this work we study the classification of forest types using mathematics-based image analysis of satellite data. We are interested in improving the classification of forest segments when a combination of information from two or more different satellites is used. The experimental part is based on real satellite data originating from Canada. This thesis gives a summary of the mathematical basics of image analysis and supervised learning, the methods used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles), 2) variance, 3) skewness and 4) kurtosis. Good overall performance was achieved when a combination of the ASTERBAND and RADARSAT2 data sets was used.
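To make the feature sets concrete, the sketch below computes per-segment quantile, variance, skewness and kurtosis features from pixel intensities and feeds them to a generic supervised classifier; the segment data, quantile choices and classifier are hypothetical stand-ins, not the algorithm used in the thesis.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier

def segment_features(pixels, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Summary-statistic features for one image segment (1D array of pixel values)."""
    q = np.quantile(pixels, quantiles)   # histogram-type (quantile) features
    return np.concatenate([q, [pixels.var(), skew(pixels), kurtosis(pixels)]])

# Hypothetical training data: segments (pixel arrays) with known forest type labels.
rng = np.random.default_rng(1)
segments = [rng.normal(loc=cls, scale=1.0, size=500) for cls in (0, 1, 2) for _ in range(30)]
labels = [cls for cls in (0, 1, 2) for _ in range(30)]

X = np.array([segment_features(s) for s in segments])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.score(X, labels))   # training accuracy of the illustrative model
```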
Abstract:
In vivo proton magnetic resonance spectroscopy (¹H-MRS) is a technique capable of assessing biochemical content and pathways in normal and pathological tissue. In the brain, ¹H-MRS complements the information given by magnetic resonance images. The main goal of the present study was to assess the accuracy of ¹H-MRS for the classification of brain tumors in a pilot study comparing results obtained by manual and semi-automatic quantification of metabolites. In vivo single-voxel ¹H-MRS was performed in 24 control subjects and 26 patients with brain neoplasms that included meningiomas, high-grade neuroglial tumors and pilocytic astrocytomas. Seven metabolite groups (lactate, lipids, N-acetyl-aspartate, glutamate and glutamine group, total creatine, total choline, myo-inositol) were evaluated in all spectra by two methods: a manual one consisting of integration of manually defined peak areas, and the advanced method for accurate, robust and efficient spectral fitting (AMARES), a semi-automatic quantification method implemented in the jMRUI software. Statistical methods included discriminant analysis and the leave-one-out cross-validation method. Both manual and semi-automatic analyses detected differences in metabolite content between tumor groups and controls (P < 0.005). The classification accuracy obtained with the manual method was 75% for high-grade neuroglial tumors, 55% for meningiomas and 56% for pilocytic astrocytomas, while for the semi-automatic method it was 78, 70, and 98%, respectively. Both methods classified all control subjects correctly. The study demonstrated that ¹H-MRS accurately differentiated normal from tumoral brain tissue and confirmed the superiority of the semi-automatic quantification method.
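The statistical approach mentioned above, discriminant analysis with leave-one-out cross-validation, can be sketched as follows; the metabolite feature matrix and class labels here are synthetic placeholders, not the study data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical feature matrix: one row per subject, one column per metabolite group
# (e.g. lactate, lipids, NAA, Glx, total creatine, total choline, myo-inositol).
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 7))
y = rng.integers(0, 4, size=50)        # 4 classes: controls plus 3 tumour groups

lda = LinearDiscriminantAnalysis()
# Leave-one-out cross-validation: each subject is classified by a model
# trained on all remaining subjects.
scores = cross_val_score(lda, X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", scores.mean())
```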
Abstract:
We compare correspondence analysis to the logratio approach based on compositional data. We also compare correspondence analysis to an alternative approach using the Hellinger distance for representing categorical data in a contingency table. We propose a coefficient which globally measures the similarity between these approaches. This coefficient can be decomposed into several components, one for each principal dimension, indicating the contribution of the dimensions to the difference between the two representations. These three methods of representation can produce quite similar results. One illustrative example is given.
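The sketch below computes row coordinates from classical correspondence analysis and from one common formulation of the Hellinger-distance approach, then compares the two configurations with the RV coefficient as a stand-in for the similarity coefficient proposed in the abstract; the contingency table is invented.

```python
import numpy as np

def ca_row_coords(N):
    """Row principal coordinates from classical correspondence analysis."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U * s) / np.sqrt(r)[:, None]

def hellinger_row_coords(N):
    """Row coordinates from a weighted PCA of square-rooted row profiles
    (one common formulation of the Hellinger-distance approach)."""
    P = N / N.sum()
    r = P.sum(axis=1)
    H = np.sqrt(P / r[:, None])                  # square-rooted row profiles
    Hbar = r @ H                                 # weighted mean profile
    Z = np.sqrt(r)[:, None] * (H - Hbar)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return (U * s) / np.sqrt(r)[:, None]

def rv_coefficient(X, Y):
    """RV coefficient between two configurations (a stand-in similarity measure)."""
    Sx, Sy = X @ X.T, Y @ Y.T
    return np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))

# Hypothetical 4 x 3 contingency table.
N = np.array([[20, 10, 5], [12, 18, 7], [6, 9, 22], [15, 5, 11]], dtype=float)
print(rv_coefficient(ca_row_coords(N)[:, :2], hellinger_row_coords(N)[:, :2]))
```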
Abstract:
In order to obtain a high-resolution Pleistocene stratigraphy, eleven continuously cored boreholes, 100 to 220 m deep, were drilled in the northern part of the Po Plain by Regione Lombardia in the last five years. Quantitative provenance analysis (QPA; Weltje and von Eynatten, 2004) of Pleistocene sands was carried out using multivariate statistical analysis (principal component analysis, PCA, and similarity analysis) on an integrated data set, including high-resolution bulk petrography and heavy-mineral analyses of the Pleistocene sands and of 250 major and minor modern rivers draining the southern flank of the Alps from west to east (Garzanti et al., 2004; 2006). Prior to the onset of major Alpine glaciations, metamorphic and quartzofeldspathic detritus from the Western and Central Alps was carried from the axial belt to the Po basin by a trunk river flowing longitudinally, parallel to the South-Alpine belt (Vezzoli and Garzanti, 2008). This scenario changed rapidly during marine isotope stage 22 (0.87 Ma), with the onset of the first major Pleistocene glaciation in the Alps (Muttoni et al., 2003). PCA and similarity analysis of the core samples show that the longitudinal trunk river was shifted southward at this time by the rapid southward and westward progradation of transverse alluvial river systems fed from the Central and Southern Alps. Sediments were transported southward by braided river systems, and glacial sediments carried by Alpine valley glaciers invaded the alluvial plain. Key words: detrital modes; modern sands; provenance; principal component analysis; similarity; Canberra distance; palaeodrainage.
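As an illustration of the two multivariate tools named above, the sketch below runs a PCA on a compositional data matrix and computes a Canberra-distance matrix between samples for the similarity analysis; the petrographic variables and values are invented placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA

# Hypothetical detrital-mode data: rows are sand samples, columns are
# petrographic categories (e.g. quartz, feldspar, lithics, heavy minerals).
rng = np.random.default_rng(3)
X = rng.dirichlet(alpha=[4, 3, 2, 1], size=12) * 100.0   # percentages summing to 100

# Principal component analysis on the standardized composition.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
pca = PCA(n_components=2).fit(Xc)
scores = pca.transform(Xc)
print("explained variance ratios:", pca.explained_variance_ratio_.round(3))

# Similarity analysis: pairwise Canberra distances between samples.
D = squareform(pdist(X, metric="canberra"))
print("Canberra distance matrix shape:", D.shape)
```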
Abstract:
A multivariate fit to the variation in global mean surface air temperature anomaly over the past half century is presented. The fit procedure allows for the effect of response time on the waveform, amplitude and lag of each radiative forcing input, and each is allowed to have its own time constant. It is shown that the contribution of solar variability to the temperature trend since 1987 is small and downward; the best estimate is -1.3% and the 2σ confidence level sets the uncertainty range at -0.7% to -1.9%. The result is the same if one quantifies the solar variation using galactic cosmic ray fluxes (for which the analysis can be extended back to 1953) or the most accurate total solar irradiance data composite. The rise in global mean surface air temperature is predominantly associated with a linear increase that represents the combined effects of changes in anthropogenic well-mixed greenhouse gases and aerosols, although, in recent decades, there is also a considerable contribution by a relative lack of major volcanic eruptions. The best estimate is that the anthropogenic factors contribute 75% of the rise since 1987, with an uncertainty range (set by the 2σ confidence level using an AR(1) noise model) of 49–160%; thus, the uncertainty is large, but we can state that at least half of the temperature trend comes from the linear term and that this term could explain the entire rise. The results are consistent with the Intergovernmental Panel on Climate Change (IPCC) estimates of the changes in radiative forcing (given for 1961–1995) and are here combined with those estimates to find the response times, equilibrium climate sensitivities and pertinent heat capacities (i.e. the depth into the oceans to which a given radiative forcing variation penetrates) of the quasi-periodic (decadal-scale) input forcing variations. As shown by previous studies, the decadal-scale variations do not penetrate as deeply into the oceans as the longer-term drifts and have shorter response times. Hence, conclusions about the response to century-scale forcing changes (and hence the associated equilibrium climate sensitivity and the temperature rise commitment) cannot be made from studies of the response to shorter-period forcing changes.
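A heavily simplified sketch of this kind of multivariate fit is given below: each forcing series is convolved with an exponential impulse response (one time constant per input) and the temperature anomaly is regressed on the responses plus a linear term. All series, time constants and coefficients are synthetic, and the AR(1) noise treatment used for the confidence intervals in the paper is omitted.

```python
import numpy as np

def exp_response(forcing, tau):
    """Convolve a forcing series with a normalized exponential impulse
    response exp(-t/tau), giving a lagged, smoothed temperature response."""
    t = np.arange(len(forcing))
    kernel = np.exp(-t / tau)
    kernel /= kernel.sum()
    return np.convolve(forcing, kernel)[: len(forcing)]

# Synthetic annual series (stand-ins for solar, volcanic and anthropogenic inputs).
rng = np.random.default_rng(7)
years = np.arange(1953, 2011)
solar = np.sin(2 * np.pi * (years - 1953) / 11.0)        # ~11-year cycle
volcanic = np.zeros(len(years))
volcanic[[10, 29, 38]] = -1.0                            # pulses at hypothetical eruption years
linear = (years - years[0]) / len(years)                 # combined GHG + aerosol drift

# "Observed" anomaly = responses + noise (purely for illustration).
temp = 0.05 * exp_response(solar, 1.5) + 0.3 * exp_response(volcanic, 2.0) \
       + 0.6 * linear + 0.05 * rng.normal(size=len(years))

# Multivariate least-squares fit of the anomaly on the three regressors.
A = np.column_stack([exp_response(solar, 1.5), exp_response(volcanic, 2.0),
                     linear, np.ones(len(years))])
coef, *_ = np.linalg.lstsq(A, temp, rcond=None)
print("fitted amplitudes (solar, volcanic, linear, offset):", coef.round(3))
```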
Abstract:
Baking and 2-g mixograph analyses were performed for 55 cultivars (19 spring and 36 winter wheat) from various quality classes from the 2002 harvest in Poland. An instrumented 2-g direct-drive mixograph was used to study the mixing characteristics of the wheat cultivars. A number of parameters were extracted automatically from each mixograph trace and correlated with baking volume and flour quality parameters (protein content and high molecular weight glutenin subunit [HMW-GS] composition by SDS-PAGE) using multiple linear regression statistical analysis. Principal component analysis of the mixograph data discriminated between four flour quality classes, and predictions of baking volume were obtained using several selected mixograph parameters, chosen using a best subsets regression routine, giving R² values of 0.862–0.866. In particular, three new spring wheat strains (CHD 502a-c) recently registered in Poland were highly discriminated and predicted to give high baking volume on the basis of two mixograph parameters: peak bandwidth and 10-min bandwidth.
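The best-subsets step can be sketched as below: linear models are fitted on every small subset of candidate mixograph parameters and the subset with the highest R² is kept. The parameter names and data are hypothetical.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression

# Hypothetical mixograph parameters (columns) for 55 flours, plus baking volume.
params = ["peak_time", "peak_height", "peak_bandwidth", "bandwidth_10min", "tail_slope"]
rng = np.random.default_rng(11)
X = rng.normal(size=(55, len(params)))
volume = 2.0 * X[:, 2] + 1.5 * X[:, 3] + rng.normal(scale=0.5, size=55)  # synthetic target

best = (None, -np.inf)
for k in (1, 2, 3):                          # subsets of up to three parameters
    for subset in combinations(range(len(params)), k):
        cols = list(subset)
        r2 = LinearRegression().fit(X[:, cols], volume).score(X[:, cols], volume)
        if r2 > best[1]:
            best = (cols, r2)

print("best subset:", [params[i] for i in best[0]], "R^2 =", round(best[1], 3))
```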
Abstract:
Background: Robot-mediated therapies offer entirely new approaches to neurorehabilitation. In this paper we present the results obtained from trialling the GENTLE/S neurorehabilitation system, assessed using the upper limb section of the Fugl-Meyer (FM) outcome measure. Methods: We present the design of our clinical trial and its results, analysed using a novel statistical approach based on a multivariate analytical model. This paper provides the rationale for using multivariate models in robot-mediated clinical trials and draws conclusions from the clinical data gathered during the GENTLE/S study. Results: The FM outcome measures recorded during the baseline (8 sessions), robot-mediated therapy (9 sessions) and sling-suspension (9 sessions) phases were analysed using a multiple regression model. The results indicate positive but modest recovery trends favouring both interventions used in the GENTLE/S clinical trial. The modest recovery shown occurred at a time late after stroke when changes are not clinically anticipated. Conclusion: This study has applied a new method for analysing clinical data obtained from rehabilitation robotics studies. While the data obtained during the clinical trial are multivariate, multipoint and progressive in nature, the multiple regression model used showed great potential for drawing conclusions from this study. An important conclusion to draw from this paper is that both the intervention and control phases caused changes over a period of 9 sessions in comparison to the baseline. This might indicate that the use of new, challenging and motivational therapies can influence the outcome of therapies at a point when clinical changes are not expected. Further work is required to investigate the effects arising from early intervention, longer exposure and greater intensity of the therapies. Finally, more function-oriented robot-mediated or sling-suspension therapies are needed to clarify the effects resulting from each intervention on stroke recovery.
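A minimal sketch of this kind of regression model: FM scores from repeated sessions are regressed on session number and phase indicator variables (baseline, robot-mediated, sling-suspension). The scores and coding are invented for illustration only.

```python
import numpy as np

# Hypothetical FM scores for one participant across 26 sessions:
# 8 baseline, 9 robot-mediated, 9 sling-suspension.
rng = np.random.default_rng(5)
session = np.arange(26)
phase = np.array([0] * 8 + [1] * 9 + [2] * 9)     # 0=baseline, 1=robot, 2=sling
fm = 30 + 0.1 * session + 1.5 * (phase == 1) + 1.0 * (phase == 2) \
     + rng.normal(scale=1.0, size=26)

# Design matrix: intercept, session (time trend), and dummy variables for the
# two intervention phases relative to baseline.
X = np.column_stack([np.ones(26), session, (phase == 1).astype(float),
                     (phase == 2).astype(float)])
beta, *_ = np.linalg.lstsq(X, fm, rcond=None)
print("intercept, session trend, robot effect, sling effect:", beta.round(2))
```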
Abstract:
We are developing computational tools supporting the detailed analysis of the dependence of neural electrophysiological responses on dendritic morphology. We approach this problem by combining simulations of faithful models of neurons (experimental real-life morphological data with known models of channel kinetics) with algorithmic extraction of morphological and physiological parameters and statistical analysis. In this paper, we present a novel method for the automatic recognition of spike trains in voltage traces, which eliminates the need for human intervention. This enables classification of waveforms with consistent criteria across all the analyzed traces and so amounts to a reduction of the noise in the data. This method allows for the automatic extraction of relevant physiological parameters necessary for further statistical analysis. In order to illustrate the usefulness of this procedure for analyzing voltage traces, we characterized the influence of the somatic current injection level on several electrophysiological parameters in a set of modeled neurons. This application suggests that such algorithmic processing of physiological data extracts parameters in a form suitable for further investigation of structure-activity relationships in single neurons.
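A compact sketch of automatic spike recognition in a voltage trace, using a simple threshold-plus-peak criterion; the trace is synthetic and the threshold and refractory settings are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic voltage trace: resting potential with noise plus a few spike-like events.
rng = np.random.default_rng(2)
dt = 0.1e-3                                   # 0.1 ms sampling step
t = np.arange(0, 0.5, dt)                     # 0.5 s trace
v = -65.0 + rng.normal(scale=0.5, size=t.size)
for spike_time in (0.1, 0.22, 0.23, 0.4):     # injected spike times (s)
    idx = int(spike_time / dt)
    v[idx:idx + 20] += 100.0 * np.exp(-np.arange(20) / 5.0)   # brief depolarization

# Spike detection: peaks above a fixed voltage threshold, separated by a
# minimum inter-spike interval (a crude refractory period).
peaks, _ = find_peaks(v, height=0.0, distance=int(1e-3 / dt))
spike_times = t[peaks]
print("detected spike times (s):", spike_times.round(3))
```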
Abstract:
Variability in the strength of the stratospheric Lagrangian mean meridional, or Brewer-Dobson, circulation and in horizontal mixing into the tropics over the past three decades is examined using observations of stratospheric mean age of air and ozone. We use a simple representation of the stratosphere, the tropical leaky pipe (TLP) model, guided by mean meridional circulation and horizontal mixing changes in several reanalysis data sets and chemistry climate model (CCM) simulations, to help elucidate reasons for the observed changes in stratospheric mean age and ozone. We find that the TLP model is able to accurately simulate multiyear variability in ozone following recent major volcanic eruptions and the early 2000s sea surface temperature changes, as well as the lasting impact on mean age of relatively short-term circulation perturbations. We also find that the best quantitative agreement with the observed mean age and ozone trends over the past three decades is found assuming a small strengthening of the mean circulation in the lower stratosphere, a moderate weakening of the mean circulation in the middle and upper stratosphere, and a moderate increase in the horizontal mixing into the tropics. The mean age trends are strongly sensitive to trends in the horizontal mixing into the tropics, and the uncertainty in the mixing trends causes uncertainty in the mean circulation trends. Comparisons of the mean circulation and mixing changes suggested by the measurements with those from a recent suite of CCM runs reveal significant differences that may have important implications for the accurate simulation of future stratospheric climate.
Abstract:
An analysis method for diffusion tensor (DT) magnetic resonance imaging data is described, which, contrary to the standard method (multivariate fitting), does not require a specific functional model for diffusion-weighted (DW) signals. The method uses principal component analysis (PCA) under the assumption of a single fibre per pixel. PCA and the standard method were compared using simulations and human brain data. The two methods were equivalent in determining fibre orientation. PCA-derived fractional anisotropy and DT relative anisotropy had similar signal-to-noise ratio (SNR) and dependence on fibre shape. PCA-derived mean diffusivity had similar SNR to the respective DT scalar, and it depended on fibre anisotropy. Appropriate scaling of the PCA measures resulted in very good agreement between PCA and DT maps. In conclusion, the assumption of a specific functional model for DW signals is not necessary for characterization of anisotropic diffusion in a single fibre.
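For reference, the "standard method" the abstract compares against can be sketched as a log-linear least-squares fit of the diffusion tensor to DW signals, from which fractional anisotropy and the principal fibre direction follow; the gradient scheme, b-value and noiseless signals below are synthetic.

```python
import numpy as np

# Synthetic single-fibre tensor (fast diffusion along x) and a minimal scheme:
# one b=0 measurement plus six diffusion-weighted directions.
D_true = np.diag([1.7e-3, 0.3e-3, 0.3e-3])               # mm^2/s
bvals = np.array([0.0] + [1000.0] * 6)                   # s/mm^2
g = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]], float)
norms = np.linalg.norm(g, axis=1)
norms[norms == 0] = 1.0
g /= norms[:, None]
S0 = 1.0
S = S0 * np.exp(-bvals * np.einsum('ij,jk,ik->i', g, D_true, g))

# Log-linear least-squares fit of the six tensor elements plus ln(S0).
A = np.column_stack([-bvals * g[:, 0]**2, -bvals * g[:, 1]**2, -bvals * g[:, 2]**2,
                     -2 * bvals * g[:, 0] * g[:, 1], -2 * bvals * g[:, 0] * g[:, 2],
                     -2 * bvals * g[:, 1] * g[:, 2], np.ones(len(g))])
x, *_ = np.linalg.lstsq(A, np.log(S), rcond=None)
D = np.array([[x[0], x[3], x[4]], [x[3], x[1], x[5]], [x[4], x[5], x[2]]])

# Fractional anisotropy and principal eigenvector (fibre orientation).
w, V = np.linalg.eigh(D)
md = w.mean()
fa = np.sqrt(1.5 * ((w - md)**2).sum() / (w**2).sum())
print("FA =", round(fa, 3), "principal direction =", V[:, np.argmax(w)].round(3))
```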
Abstract:
The Advanced Along-Track Scanning Radiometer (AATSR) was launched on Envisat in March 2002. The AATSR instrument is designed to retrieve precise and accurate global sea surface temperature (SST) that, combined with the large data set collected from its predecessors, ATSR and ATSR-2, will provide a long-term record of SST data spanning more than 15 years. This record can be used for independent monitoring and detection of climate change. The AATSR validation programme has successfully completed its initial phase. The programme involves validation of the AATSR-derived SST values using in situ radiometers, in situ buoys and global SST fields from other data sets. The results of the initial programme presented here demonstrate that the AATSR instrument is currently close to meeting its scientific objective of determining global SST to an accuracy of 0.3 K (one sigma). For night-time data, the analysis gives a warm bias of between +0.04 K (0.28 K) for buoys and +0.06 K (0.20 K) for radiometers, with slightly higher errors observed for daytime data, which shows warm biases of between +0.02 K (0.39 K) for buoys and +0.11 K (0.33 K) for radiometers. The results show that the ATSR series of instruments continues to be the world leader in delivering accurate space-based observations of SST, which is a key climate parameter.
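The core validation statistics quoted above (a mean bias with a one-sigma scatter in parentheses) can be computed as in the short sketch below; the matchup values are invented.

```python
import numpy as np

# Hypothetical matchups: satellite-retrieved SST and coincident buoy SST (kelvin).
rng = np.random.default_rng(8)
buoy_sst = 285.0 + rng.normal(scale=2.0, size=500)
satellite_sst = buoy_sst + 0.04 + rng.normal(scale=0.28, size=500)  # warm bias + scatter

diff = satellite_sst - buoy_sst
bias = diff.mean()                  # mean bias (satellite minus in situ)
sigma = diff.std(ddof=1)            # one-sigma scatter about the bias
print(f"bias = {bias:+.2f} K ({sigma:.2f} K)")
```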
Abstract:
A procedure is presented for fitting incoherent scatter radar data from non-thermal F-region ionospheric plasma, using theoretical spectra previously predicted. It is found that values of the shape distortion factor D∗, associated with deviations of the ion velocity distribution from a Maxwellian distribution, and ion temperatures can be deduced (the results being independent of the path of iteration) if the angle between the line-of-sight and the geomagnetic field is larger than about 15–20°. The procedure can be used with one or both of two sets of assumptions. These concern the validity of the adopted model for the line-of-sight ion velocity distribution in the one case or for the full three-dimensional ion velocity distribution function in the other. The distribution function employed was developed to describe the line-of-sight velocity distribution for large aspect angles, but both experimental data and Monte Carlo simulations indicate that the form of the field-perpendicular distribution can also describe the distribution at more general aspect angles. The assumption of this form for the line-of-sight velocity distribution at a general aspect angle enables rigorous derivation of values of the one-dimensional, line-of-sight ion temperature. With some additional assumptions (principally that the field-parallel distribution is always Maxwellian and there is a simple relationship between the ion temperature anisotropy and the distortion of the field-perpendicular distribution from a Maxwellian), fits to data for large aspect angles enable determination of line-of-sight temperatures at all aspect angles and hence, of the average ion temperature and the ion temperature anisotropy. For small aspect angles, the analysis is restricted to the determination of the line-of-sight ion temperature because the theoretical spectrum is insensitive to non-thermal effects when the plasma is viewed along directions almost parallel to the magnetic field. This limitation is expected to apply to any realistic model of the ion velocity distribution function and its consequences are discussed. Fit strategies which allow for mixed ion composition are also considered. Examples of fits to data from various EISCAT observing programmes are presented.
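The fitting procedure itself is a nonlinear least-squares adjustment of a parametric theoretical spectrum to measured data. The sketch below shows only that generic fitting step, with a deliberately toy spectral model standing in for the real incoherent-scatter theory; the functional form, parameter values and bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def toy_spectrum(freq, T_i, D_star, amplitude):
    """Toy stand-in for a theoretical spectrum: a Lorentzian whose width grows
    with ion temperature and whose wings are reshaped by a distortion factor.
    Purely illustrative, not the real plasma-physics model."""
    width = 1.0 + 0.1 * np.sqrt(T_i)
    base = amplitude / (1.0 + (freq / width) ** 2)
    return base * (1.0 + D_star * (freq / width) ** 2 / (1.0 + (freq / width) ** 4))

# Synthetic "measured" spectrum with noise.
rng = np.random.default_rng(9)
freq = np.linspace(-10, 10, 201)
measured = toy_spectrum(freq, T_i=1500.0, D_star=0.3, amplitude=1.0) \
           + rng.normal(scale=0.01, size=freq.size)

# Least-squares fit of the model parameters (ion temperature, distortion
# factor D*, amplitude) to the measured spectrum.
popt, pcov = curve_fit(toy_spectrum, freq, measured, p0=[1000.0, 0.0, 0.8],
                       bounds=([0.0, -1.0, 0.0], [5000.0, 1.0, 10.0]))
print("fitted T_i, D*, amplitude:", np.round(popt, 2))
```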
Abstract:
The stratospheric mean-meridional circulation (MMC) and eddy mixing are compared among six meteorological reanalysis data sets: NCEP-NCAR, NCEP-CFSR, ERA-40, ERA-Interim, JRA-25, and JRA-55 for the period 1979–2012. The reanalysis data sets produced using advanced systems (i.e., NCEP-CFSR, ERA-Interim, and JRA-55) generally reveal a weaker MMC in the Northern Hemisphere (NH) compared with those produced using older systems (i.e., NCEP-NCAR, ERA-40, and JRA-25). The mean mixing strength differs considerably among the data products. In the NH lower stratosphere, the contribution of planetary-scale mixing is larger in the new data sets than in the old data sets, whereas that of small-scale mixing is weaker in the new data sets. Conventional data assimilation techniques introduce analysis increments without maintaining physical balance, which may have caused an overly strong MMC and spurious small-scale eddies in the old data sets. At the NH mid-latitudes, only ERA-Interim reveals a weakening MMC trend in the deep branch of the Brewer–Dobson circulation (BDC). The relative importance of the eddy mixing compared with the mean-meridional transport in the subtropical lower stratosphere shows increasing trends in ERA-Interim and JRA-55; this, together with the weakened MMC in the deep branch, may imply an increasing age-of-air (AoA) in the NH middle stratosphere in ERA-Interim. Overall, discrepancies between the different variables and trends therein as derived from the different reanalyses are still relatively large, suggesting that more investment in these products is needed in order to obtain a consolidated picture of observed changes in the BDC and the mechanisms that drive them.