914 results for statistical data analysis
Abstract:
The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in exploratory data analysis and in the extraction of knowledge embedded in these data. However, the special characteristics of these data pose new challenges for visualization and clustering: complex structures, large numbers of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes.

The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist the knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes.

I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and a second for exploratory visual data analysis, the Tree-structured Self-Organizing Map Component Planes. In addition, I present methodologies that, combined with the FGHSON and the Tree-structured SOM Component Planes, allow space and time to be integrated seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.

The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability in cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON are: (1) it does not require an a priori setting of the number of clusters; (2) the algorithm executes several self-organizing processes in parallel, so when dealing with large datasets the processes can be distributed, reducing the computational cost; and (3) only three parameters are needed to set up the algorithm.

In the case of the Tree-structured SOM Component Planes, the novelty of the algorithm lies in its ability to create a structure that supports visual exploratory analysis of large high-dimensional datasets. The algorithm builds a hierarchical structure of Self-Organizing Map Component Planes, arranging the projections of similar variables in the same branches of the tree. Similarities in the behavior of variables (e.g. local correlations, maximal and minimal values, and outliers) can therefore be detected easily.

Both the FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems, proving to be very efficient for the exploratory analysis and clustering of spatio-temporal datasets.

In this thesis I also tested three soft competitive learning algorithms: two well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs), and our original contribution, the FGHSON. Although these algorithms have been used in several areas, to my knowledge there is no previous work applying and comparing their performance on spatio-temporal geospatial data, as is presented in this thesis.

I also propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by means of the FGHSON clustering algorithm. The developed methodologies are used in two case studies: in the first, the objective was to find similar agroecozones through time, and in the second, to find similar environmental patterns shifted in time.

Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production.

Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
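The FGHSON builds on self-organizing networks; as a rough, hedged sketch of the competitive-learning step that such methods share, the following Python fragment trains a plain Self-Organizing Map on synthetic data. It is not the thesis implementation (that is a Matlab toolbox), and the grid size, learning rate and neighborhood radius are illustrative assumptions.

```python
# Minimal sketch of the competitive-learning step underlying SOM-based methods
# such as the FGHSON described above. NOT the thesis implementation; grid size,
# learning rate and radius below are illustrative assumptions.
import numpy as np

def train_som(data, grid_shape=(5, 5), epochs=20, lr=0.5, radius=2.0, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Codebook: one prototype vector per map unit, initialized from random samples.
    weights = data[rng.choice(len(data), rows * cols)].astype(float)
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for epoch in range(epochs):
        decay = np.exp(-epoch / epochs)          # shrink learning rate and radius over time
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best-matching unit
            grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(grid_dist ** 2) / (2 * (radius * decay) ** 2))  # neighborhood kernel
            weights += (lr * decay) * h[:, None] * (x - weights)
    return weights

# Usage: prototypes summarizing a small synthetic spatio-temporal feature table.
data = np.random.default_rng(1).normal(size=(200, 4))
prototypes = train_som(data)
```

The fuzzy and growing hierarchical aspects of the FGHSON extend this basic update with membership degrees and recursive map growth, which are beyond this sketch.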
Roadway Lighting and Safety: Phase II – Monitoring Quality, Durability and Efficiency, November 2011
Abstract:
This Phase II project follows a previous project titled Strategies to Address Nighttime Crashes at Rural, Unsignalized Intersections. Based on the results of the previous study, the Iowa Highway Research Board (IHRB) indicated interest in pursuing further research to address the quality of lighting, rather than just the presence of light, with respect to safety. The research team supplemented the literature review from the previous study, specifically addressing lighting level in terms of measurement, the relationship between light levels and safety, and lamp durability and efficiency. The Center for Transportation Research and Education (CTRE) teamed with the Virginia Tech Transportation Institute (VTTI), a national research leader in roadway lighting, to collect the data. Integral to the data collection effort was the creation of the Roadway Monitoring System (RMS). The RMS allowed the research team to collect lighting data and approach information for each rural intersection identified in the previous phase. After data cleanup, the final data set contained illuminance data for 101 lighted intersections (of 137 lighted intersections in the first study). Data analysis included a robust statistical analysis based on Bayesian techniques. Average illuminance, average glare, and average uniformity ratio values were used to classify the quality of lighting at the intersections.
Abstract:
The project "Quantification and qualification of ambulatory health care", financed by the Swiss National Science Foundation and covering the Cantons of Vaud and Fribourg, has two main goals: -- a structural study of the elements of the ambulatory care sector. This is done through inventories of the professions concerned (physicians, public health nurses, physiotherapists, pharmacists, medical laboratories), allowing better characterization of the "offer". This inventory work includes the collection and analysis of existing statistical data as well as surveys, through questionnaires sent (from September 1980) to the different professions and through interviews. -- a functional study, inspired by the US National Ambulatory Medical Care Survey and by similar studies elsewhere, in order to investigate the modes of practice of various providers, with particular regard to interprofessional collaboration (by studying referrals between them). The first months of the project were devoted to methodological research in this regard, centered on the use of systems analysis, and to the elaboration of adequate instruments.
Abstract:
The dynamics of N losses from fertilizer through ammonia volatilization is affected by several factors, making investigation of these dynamics more complex. Moreover, some features of the behavior of the variable can lead to deviation from a normal distribution, making the statistical strategies most commonly adopted for data analysis inadequate. Thus, the purpose of this study was to evaluate the patterns of cumulative N losses from urea through ammonia volatilization in order to find a more adequate and detailed way of assessing the behavior of the variable. For that reason, changes in the patterns of ammonia volatilization losses resulting from different combinations of two soil classes [Planossolo and Chernossolo (Typic Albaqualf and Vertic Argiaquolls)] and different rates of urea (50, 100 and 150 kg ha-1 N), in the presence or absence of a urease inhibitor, were evaluated in a 2 × 3 × 2 factorial design with four replications. Univariate and multivariate analyses of variance were performed using the fitted parameter values of a logistic function as response variables. The results of the multivariate analysis indicated a prominent effect of the soil class factor on the set of parameters, indicating the greater relevance of soil adsorption potential to ammonia volatilization losses. Univariate analysis showed that the parameters related to total N losses and to the rate of volatilization were more affected by soil class and the rate of urea applied. The urease inhibitor affected only the rate and inflection-point parameters, decreasing the rate of losses and delaying the beginning of the process, but had no effect on total ammonia losses. Patterns of ammonia volatilization losses provide details on the behavior of the variable, details which can be used to develop and adopt more accurate techniques for more efficient use of urea.
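As a hedged illustration of the analysis strategy summarized above, the sketch below fits a three-parameter logistic curve to hypothetical cumulative volatilization data and recovers the fitted parameters (asymptote, rate and inflection point) that would then serve as response variables in the analyses of variance. The functional form, the data and the parameter values are illustrative assumptions, not material from the study.

```python
# Sketch: fit a three-parameter logistic curve to hypothetical cumulative
# NH3-volatilization data, then report the fitted parameters (asymptote A,
# rate k, inflection point t0), which would serve as response variables.
# Data and parameter values are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, A, k, t0):
    """Cumulative loss: asymptote A, rate k, inflection point t0 (days)."""
    return A / (1.0 + np.exp(-k * (t - t0)))

days = np.arange(0, 21)
rng = np.random.default_rng(42)
observed = logistic(days, A=32.0, k=0.6, t0=7.0) + rng.normal(0, 1.0, days.size)

params, _ = curve_fit(logistic, days, observed, p0=[30.0, 0.5, 5.0])
A_hat, k_hat, t0_hat = params
print(f"A={A_hat:.1f} kg/ha, k={k_hat:.2f} 1/day, t0={t0_hat:.1f} days")
```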
Abstract:
The use of synthetic combinatorial peptide libraries in positional scanning format (PS-SCL) has recently emerged as an alternative approach for the identification of peptides recognized by T lymphocytes. The choice of both the PS-SCL used for screening experiments and the method used for data analysis is crucial for implementing this approach. With this aim, we tested the recognition of different PS-SCL by a tyrosinase 368-376-specific CTL clone and analyzed the data obtained with a recently developed biometric data analysis based on a model of independent and additive contributions of individual amino acids to peptide antigen recognition. Mixtures defined with amino acids present at the corresponding positions in the native sequence were among the most active for all of the libraries. Somewhat surprisingly, a higher number of native amino acids were identifiable by using amidated COOH-terminal rather than free COOH-terminal PS-SCL. Also, our data clearly indicate that when using PS-SCL longer than optimal, frame shifts occur frequently and should be taken into account. Biometric analysis of the data obtained with the amidated COOH-terminal nonapeptide library allowed the identification of the native ligand as the sequence with the highest score in a public human protein database. However, the adequacy of the PS-SCL data for the identification of the peptide ligand varied depending on the PS-SCL used. Altogether, these results provide insight into the potential of PS-SCL for the identification of CTL-defined tumor-derived antigenic sequences and may significantly improve our ability to interpret the results of these analyses.
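The biometric analysis described above scores candidate peptides by summing independent positional contributions derived from the PS-SCL screening data. The following sketch illustrates that additive scoring scheme on an arbitrary example protein; the scoring matrix is random and purely hypothetical, standing in for the experimentally derived mixture activities.

```python
# Sketch of the additive-contribution scoring idea: each position of a
# candidate 9-mer contributes an independent score, and candidate peptides
# from a protein are ranked by the sum of positional scores.
# The scoring matrix is random, standing in for PS-SCL mixture activities.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 9

rng = np.random.default_rng(0)
# Hypothetical position-specific scoring matrix (rows: positions, cols: residues).
score_matrix = rng.normal(size=(PEPTIDE_LEN, len(AMINO_ACIDS)))
aa_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def peptide_score(peptide):
    return sum(score_matrix[pos, aa_index[aa]] for pos, aa in enumerate(peptide))

def best_peptides(protein, top=3):
    windows = [protein[i:i + PEPTIDE_LEN] for i in range(len(protein) - PEPTIDE_LEN + 1)]
    return sorted(windows, key=peptide_score, reverse=True)[:top]

# Example protein sequence used purely for illustration.
print(best_peptides("MLLAVLYCLLWSFQTSAGHFPRACVSSKNLMEKECCPPW"))
```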
Abstract:
Quantitative information from magnetic resonance imaging (MRI) may substantiate clinical findings and provide additional insight into the mechanism of clinical interventions in therapeutic stroke trials. The PERFORM study is exploring the efficacy of terutroban versus aspirin for secondary prevention in patients with a history of ischemic stroke. We report on the design of an exploratory longitudinal MRI follow-up study that was performed in a subgroup of the PERFORM trial. An international multi-centre longitudinal follow-up MRI study was designed for different MR systems, employing safety and efficacy readouts: new T2 lesions, new DWI lesions, whole-brain volume change, hippocampal volume change, changes in tissue microstructure as depicted by mean diffusivity and fractional anisotropy, vessel patency on MR angiography, and the presence and development of new microbleeds. A total of 1,056 patients (men and women ≥ 55 years) were included. The data analysis included 3D reformation, image registration of different contrasts, tissue segmentation, and automated lesion detection. This large international multi-centre study demonstrates how new MRI readouts can be used to provide key information on the evolution of lesions in cerebral tissue and in the macrovasculature after atherothrombotic stroke in a large sample of patients.
Abstract:
In June 2006, the Swiss Parliament made two important decisions with regard to the governance of public registers and the identification of individuals. It adopted a new law on the harmonisation of population registers in order to simplify statistical data collection and data exchange across around 4'000 decentralized registers, and it also approved the introduction of a Unique Person Identifier (UPI). The law is rather vague about the implementation of this harmonisation, and even though many projects are currently being undertaken in this domain, most of them are quite technical. We believe there is a need for analysis tools, and we therefore propose a conceptual framework based on three pillars (Privacy, Identity and Governance) to analyse the requirements in terms of data management for population registers.
Abstract:
This paper introduces a nonlinear measure of dependence between random variables in the context of remote sensing data analysis. The Hilbert-Schmidt Independence Criterion (HSIC) is a kernel method for evaluating statistical dependence. HSIC is based on computing the Hilbert-Schmidt norm of the cross-covariance operator of mapped samples in the corresponding Hilbert spaces. The HSIC empirical estimator is very easy to compute and has good theoretical and practical properties. We exploit the capabilities of HSIC to explain nonlinear dependences in two remote sensing problems: temperature estimation and chlorophyll concentration prediction from spectra. Results show that, when the relationship between random variables is nonlinear or when few data are available, the HSIC criterion outperforms other standard methods, such as the linear correlation or mutual information.
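For reference, the empirical HSIC estimator mentioned above is straightforward to compute from two kernel matrices and a centering matrix. The sketch below uses Gaussian kernels with a median-heuristic bandwidth, which is an illustrative choice rather than necessarily the setting used in the paper.

```python
# Hedged sketch of the empirical HSIC estimator:
# HSIC(X, Y) ≈ trace(K H L H) / (n - 1)^2, where K and L are kernel matrices
# on the two samples and H is the centering matrix. Gaussian kernels with a
# median-heuristic bandwidth are an illustrative choice.
import numpy as np

def gaussian_kernel(X, sigma):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

def median_bandwidth(X):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.sqrt(0.5 * np.median(sq[sq > 0]))

def hsic(X, Y):
    n = X.shape[0]
    K = gaussian_kernel(X, median_bandwidth(X))
    L = gaussian_kernel(Y, median_bandwidth(Y))
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Nonlinear dependence (y = x^2 + noise) yields a clearly higher HSIC than
# independent noise, even though the linear correlation is near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=(300, 1))
y = x ** 2 + 0.1 * rng.normal(size=(300, 1))
print(hsic(x, y), hsic(x, rng.normal(size=(300, 1))))
```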
Abstract:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
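To make the "dedicated kernels per sub-problem" idea more concrete, the sketch below combines one Gaussian kernel per hypothetical feature group as a convex weighted sum and feeds the result to a support vector regression with a precomputed kernel. Real MKL optimizes the kernel weights jointly with the regression; here the weight is chosen by a coarse validation grid, so this is only a conceptual illustration, not the algorithm used in the paper.

```python
# Simplified illustration of combining dedicated kernels per feature group.
# The kernel weight is selected by validation rather than by MKL optimization.
# Data, groups and kernel parameters are invented for illustration.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def rbf(A, B, gamma=0.5):
    # Gaussian kernel between row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + 0.3 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)
train, test = np.arange(150), np.arange(150, 200)
groups = [slice(0, 2), slice(2, 4)]     # two hypothetical feature groups (e.g. terrain features at two scales)

best = None
for w0 in np.linspace(0.0, 1.0, 11):    # weight of group 0; group 1 gets 1 - w0
    K_tr = w0 * rbf(X[train][:, groups[0]], X[train][:, groups[0]]) \
         + (1 - w0) * rbf(X[train][:, groups[1]], X[train][:, groups[1]])
    K_te = w0 * rbf(X[test][:, groups[0]], X[train][:, groups[0]]) \
         + (1 - w0) * rbf(X[test][:, groups[1]], X[train][:, groups[1]])
    model = SVR(kernel="precomputed").fit(K_tr, y[train])
    err = mean_squared_error(y[test], model.predict(K_te))
    if best is None or err < best[0]:
        best = (err, w0)
print("selected weight for group 0:", best[1], "| test MSE:", round(best[0], 3))
```

The selected weight plays the interpretability role described in the abstract: it indicates which group of covariates the model actually relies on.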
Abstract:
(from the journal abstract) Scientific interest in the concept of alliance has been maintained and stimulated by repeated findings that a strong alliance is associated with a facilitative treatment process and favourable treatment outcome. However, because the alliance is not in itself a therapeutic technique, these findings have been unsuccessful in bringing about significant improvements in clinical practice. An essential issue in modern psychotherapeutic research concerns the relation between common factors, which are known to explain great variance in empirical results, and the specific therapeutic techniques which are the primary basis of clinical training and practice. This pilot study explored sequences in therapist interventions over four sessions of brief psychodynamic investigation. It aims at determining whether patterns of interventions can be found during brief psychodynamic investigation and whether these patterns can be associated with differences in the therapeutic alliance. Therapist interventions were coded using the Psychodynamic Intervention Rating Scale (PIRS), which enables the classification of each therapist utterance into one of 9 categories of interpretive interventions (defence interpretation, transference interpretation), supportive interventions (question, clarification, association, reflection, supportive strategy) or interventions about the therapeutic frame (work-enhancing statement, contractual arrangement). Data analysis was done using lag sequential analysis, a statistical procedure which identifies contingent relationships in time among a large number of behaviours. The sample includes N = 20 therapist-patient dyads assigned to three groups with: (1) a high and stable alliance profile, (2) a low and stable alliance profile and (3) an improving alliance profile. Results suggest that therapists most often have a single intention when interacting with patients. Large sequences of questions, associations and clarifications were found, which indicates that if a therapist asks a question, clarifies or associates, there is a significant probability that he will continue doing so. A single-theme sequence involving frame interventions was also observed. These sequences were found in all three alliance groups. One exception was found for mixed sequences of interpretations and supportive interventions. The simultaneous use of these two interventions was associated with a high or an improving alliance over the course of treatment, but not with a low and stable alliance, where only single-theme sequences of interpretations were found. In other words, in this last group, therapists were either supportive or interpretive, whereas with a high or improving alliance, interpretations were always given along with supportive interventions. This finding provides evidence that examining therapist interpretations individually can only yield incomplete findings. How interpretations are given is important for alliance building. It also suggests that therapists should carefully dose their interpretations and be supportive when necessary in order to build a strong therapeutic alliance. From a research point of view, to study technical interventions we must look into dynamic variables such as dosage, the supportive quality of an intervention, and timing. (PsycINFO Database Record (c) 2005 APA, all rights reserved)
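As a hedged illustration of lag sequential analysis, the sketch below counts lag-1 transitions between coded intervention categories and converts them into adjusted residuals (z-scores) that flag transitions occurring more often than chance. The coded sequence is invented, and the category names only loosely echo the PIRS codes.

```python
# Hedged sketch of lag-1 sequential analysis: count transitions between coded
# categories and compare each observed transition with what chance predicts
# (adjusted residuals). The coded sequence and labels are invented.
import numpy as np

codes = ["question", "clarification", "association", "interpretation", "support"]
idx = {c: i for i, c in enumerate(codes)}

rng = np.random.default_rng(3)
sequence = list(rng.choice(codes, size=400))        # stand-in for a coded session

n = len(codes)
counts = np.zeros((n, n))
for a, b in zip(sequence[:-1], sequence[1:]):       # lag-1 transitions
    counts[idx[a], idx[b]] += 1

row_tot = counts.sum(axis=1, keepdims=True)
col_p = counts.sum(axis=0) / counts.sum()           # unconditional target probabilities
expected = row_tot * col_p
# Adjusted residual: how far each transition deviates from independence.
z = (counts - expected) / np.sqrt(expected * (1 - col_p) * (1 - row_tot / counts.sum()))
print(np.round(z, 2))
```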
Abstract:
The present study proposes a modification of one of the most frequently applied effect size procedures in single-case data analysis: the percent of nonoverlapping data. In contrast to other techniques, the calculation and interpretation of this procedure are straightforward, and it can easily be complemented by visual inspection of the graphed data. Although the percent of nonoverlapping data has been found to perform reasonably well for N = 1 data, the magnitude of effect estimates it yields can be distorted by trend and autocorrelation. Therefore, the data correction procedure focuses on removing the baseline trend from the data prior to estimating the change produced in the behavior by the intervention. A simulation study is carried out in order to compare the original and the modified procedures under several experimental conditions. The results suggest that the new proposal is unaffected by trend and autocorrelation and can be used in the case of unstable baselines and sequentially related measurements.
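A minimal sketch of the modification described above, under the assumption that the baseline trend is estimated by an ordinary least squares line and extrapolated into the treatment phase before the percent of nonoverlapping data is computed (the study's exact detrending procedure may differ):

```python
# Sketch: remove the baseline (phase A) trend before computing the percent of
# nonoverlapping data (PND). OLS linear detrending is an illustrative assumption.
import numpy as np

def pnd_detrended(baseline, treatment, expect_increase=True):
    t_a = np.arange(len(baseline))
    slope, intercept = np.polyfit(t_a, baseline, 1)          # baseline linear trend
    t_b = np.arange(len(baseline), len(baseline) + len(treatment))
    baseline_resid = baseline - (slope * t_a + intercept)
    treatment_resid = treatment - (slope * t_b + intercept)  # extrapolate trend into phase B
    threshold = baseline_resid.max() if expect_increase else baseline_resid.min()
    if expect_increase:
        nonoverlap = treatment_resid > threshold
    else:
        nonoverlap = treatment_resid < threshold
    return 100.0 * nonoverlap.mean()

# Example: a rising baseline that would inflate the classical PND.
baseline = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
treatment = np.array([7.0, 8.0, 8.5, 9.0, 9.5, 10.0])
print(pnd_detrended(baseline, treatment))
```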
Abstract:
The present study focuses on single-case data analysis and specifically on two procedures for quantifying differences between baseline and treatment measurements. The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which provides similar information. The comparison is carried out in the context of generated data representing a variety of patterns (i.e., independent measurements, different serial dependence underlying processes, constant or phase-specific autocorrelation and data variability, different types of trend, and slope and level change). The results suggest that the two techniques perform adequately for a wide range of conditions and that researchers can use either of them with certain guarantees. The regression-based procedure offers more efficient estimates, whereas the proposed non-regression procedure is more sensitive to intervention effects. Considering current and previous findings, some tentative recommendations are offered to applied researchers to help them choose among the plurality of single-case data analysis techniques.
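As a hedged illustration of the regression-based approach, the sketch below estimates level and slope change between phases while accounting for AR(1) serial dependence, using statsmodels' GLSAR. It is a generic specification for single-case interrupted time-series data, not necessarily the one evaluated in the study.

```python
# Sketch: estimate level and slope change between baseline and treatment phases
# with a regression model that accounts for AR(1) serial dependence.
# The simulated data and model specification are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_a, n_b = 10, 10
phase = np.r_[np.zeros(n_a), np.ones(n_b)]           # 0 = baseline, 1 = treatment
time = np.arange(n_a + n_b)
time_in_b = np.where(phase == 1, time - n_a, 0)      # slope-change regressor

# Simulated data: small trend, level change of 3, AR(1) noise.
e = np.zeros(n_a + n_b)
for t in range(1, n_a + n_b):
    e[t] = 0.4 * e[t - 1] + rng.normal(scale=0.5)
y = 0.1 * time + 3.0 * phase + e

X = sm.add_constant(np.column_stack([time, phase, time_in_b]))
fit = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=5)  # iteratively estimate the AR(1) coefficient
print(fit.params)    # [intercept, trend, level change, slope change]
```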
Abstract:
The focus of my PhD research was the concept of modularity. In the last 15 years, modularity has become a classic term in different fields of biology. On the conceptual level, a module is a set of interacting elements that remain mostly independent from the elements outside of the module. I used modular analysis techniques to study gene expression evolution in vertebrates. In particular, I identified "natural" modules of gene expression in mouse and human, and I showed that the expression of organ-specific and system-specific genes tends to be conserved between vertebrates as distant as mammals and fishes. Also with a modular approach, I studied patterns of developmental constraints on transcriptome evolution. I showed that neither of the two commonly accepted models of the evolution of embryonic development ("evo-devo") is exclusively valid. In particular, I found that the conservation of the sequences of regulatory regions is highest during mid-development of zebrafish, which supports the "hourglass model". In contrast, events of gene duplication and the introduction of new genes are rarest in early development, which supports the "early conservation model". In addition to the biological insights on transcriptome evolution, I have also discussed in detail the advantages of modular approaches in large-scale data analysis. Moreover, I re-analyzed several studies (published in high-ranking journals) and showed that their conclusions do not hold up under detailed analysis. This demonstrates that complex analysis of high-throughput data requires co-operation between biologists, bioinformaticians, and statisticians.