915 results for Secondary Data Analysis
Abstract:
Recently, there has been growing interest in the field of metabolomics, reflected in a remarkable growth in experimental techniques, available data and related biological applications. Techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, and Infrared and UV-visible spectroscopy have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, none of the available software tools addresses the full range of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple-to-use environment. The package, already available on CRAN, is accompanied by a website where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
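Pre-processing of this kind usually means transforming and scaling the intensity matrix before multivariate analysis. The Python sketch below is not specmine, which is an R package, and it does not reproduce that package's functions. It only illustrates two common, generic operations on a samples × features matrix stored as nested lists: a log transform with an assumed offset of 1, and autoscaling (unit-variance scaling).

```python
import math
import statistics

def log_transform(data, offset=1.0):
    """Return log10(x + offset) for every intensity; the offset (an assumption) avoids log(0)."""
    return [[math.log10(x + offset) for x in row] for row in data]

def autoscale(data):
    """Mean-centre each feature (column) and divide it by its sample standard deviation.

    Raises ZeroDivisionError for a constant feature, whose standard deviation is 0.
    """
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]

# Rows are samples, columns are features (e.g., spectral bins or metabolites).
spectra = [[10.0, 200.0], [20.0, 180.0], [30.0, 260.0]]
scaled = autoscale(log_transform(spectra))
```

After autoscaling, every feature has mean 0 and unit standard deviation. This gives low- and high-intensity metabolites equal weight in a subsequent PCA.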
Abstract:
In this study we propose an application of the MuSIASEM approach to provide an integrated analysis of Laos across different scales. By "integrated analysis across scales" we mean the generation of a series of packages of quantitative indicators characterizing the performance of the socioeconomic activities performed in Laos when considering: (i) different hierarchical levels of organization (farming systems described at the level of the household, rural villages, regions of Laos, and the whole country); and (ii) different dimensions of analysis (economic, social, ecological and technical). What is relevant in this application is that the information carried by these different packages of indicators is integrated in an accounting system that establishes interlinkages across the indicators. This is an essential feature for studying sustainability trade-offs and for building more robust scenarios of possible changes. The multi-scale integrated representation presented in this study is based on secondary data (gathered in a three-year EU project, SEAtrans, and supplemented by other available statistical sources) and is integrated in a GIS for the spatial representation of Laos. However, even though we use data referring to Laos, the goal of this study is not to provide useful information about a practical policy issue in Laos. Rather, it is to illustrate the possibility of using a multipurpose grammar to produce an integrated set of sustainability indicators at three different levels: (i) local; (ii) meso; and (iii) macro. The technical issue addressed is the simultaneous adoption of two multi-level matrices: one characterizing human activity over a set of different categories, and another characterizing land uses over the same set of categories.
In this way, it becomes possible to explain the characteristics of Laos (an integrated set of indicators defining the performance of the whole country) in relation to the characteristics of rural Laos and urban Laos. The characteristics of rural Laos can in turn be explained using the characteristics of three regions defined within Laos (Northern, Central and Southern Laos). Each region can itself be defined, using an analogous package of indicators, starting from the characteristics of the three main typologies of farming systems found in the regions.
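The nested explanation above relies on both matrices sharing one set of categories at every level: the country explained by rural and urban Laos, rural Laos by its regions, and the regions by farming-system typologies. The Python sketch below illustrates only that bookkeeping idea, under two assumptions: aggregation is simple addition, and the indicator is a hypothetical hours-per-hectare ratio. It is not the MuSIASEM accounting used in the study, and the categories and numbers are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Compartment:
    """One level of the hierarchy (e.g. a farming system, a region, the country)."""
    name: str
    hours: dict = field(default_factory=dict)     # human activity per category (hypothetical unit: h/year)
    hectares: dict = field(default_factory=dict)  # land use per category (ha)
    children: list = field(default_factory=list)  # lower-level compartments

    def totals(self):
        """Return (hours, hectares) per category; a parent is the sum of its children."""
        if not self.children:
            return dict(self.hours), dict(self.hectares)
        hours, hectares = {}, {}
        for child in self.children:
            child_hours, child_hectares = child.totals()
            for cat, value in child_hours.items():
                hours[cat] = hours.get(cat, 0.0) + value
            for cat, value in child_hectares.items():
                hectares[cat] = hectares.get(cat, 0.0) + value
        return hours, hectares

def intensity(compartment):
    """Illustrative cross-matrix indicator: hours of activity per hectare, per shared category."""
    hours, hectares = compartment.totals()
    return {cat: hours[cat] / hectares[cat]
            for cat in hours if hectares.get(cat, 0.0) > 0}

# Toy numbers only, not data from the Laos study.
farm_a = Compartment("farm A", hours={"agriculture": 100.0}, hectares={"agriculture": 2.0})
farm_b = Compartment("farm B", hours={"agriculture": 50.0, "household": 30.0},
                     hectares={"agriculture": 3.0})
region = Compartment("region", children=[farm_a, farm_b])
```

Because a parent's values are always derived from its children, the indicators at the higher level stay consistent with those of the levels below by construction.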
Abstract:
Factor analysis, a common technique for inspecting multivariate data, is also widely used in compositional data analysis. The usual approach is to apply a centered logratio (clr) transformation to obtain the random vector $y$ of dimension $D$. The factor model is then

$$y = \Lambda f + e \qquad (1)$$

with the factors $f$ of dimension $k \ll D$, the error term $e$, and the loadings matrix $\Lambda$. Under the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as

$$\operatorname{Cov}(y) = \Lambda \Lambda^T + \psi \qquad (2)$$

where $\psi = \operatorname{Cov}(e)$ is diagonal. The diagonal elements of $\psi$ and the loadings matrix $\Lambda$ are estimated from an estimate of $\operatorname{Cov}(y)$.

Let the observed clr-transformed data $Y$ be realizations of the random vector $y$. Outliers or deviations from the idealized model assumptions of factor analysis can severely affect the parameter estimates. As a remedy, robust estimation of the covariance matrix of $Y$ leads to robust estimates of $\Lambda$ and $\psi$ in (2) (see Pison et al., 2003). Well-known robust covariance estimators with good statistical properties, such as the MCD or S-estimators (see, e.g., Maronna et al., 2006), require a full-rank data matrix $Y$, which is not the case for clr-transformed data (see, e.g., Aitchison, 1986).

The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix $Y$ is transformed to a matrix $Z$ using an orthonormal basis of lower dimension. From the ilr-transformed data, a robust covariance matrix $C(Z)$ can be estimated. The result can be back-transformed to the clr space by

$$C(Y) = V\, C(Z)\, V^T$$

where the matrix $V$, with orthonormal columns, comes from the relation between the clr and ilr transformations. The parameters of model (2) can then be estimated (Basilevsky, 1994). The results have a direct interpretation because the links to the original variables are preserved.

This procedure will be applied to geochemical data. Of particular interest is a comparison with the results of Reimann et al. (2002) for the Kola project data.
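The clr/ilr relations above can be checked numerically. The Python sketch below assumes one particular orthonormal (Helmert-type) basis for $V$. It also uses the classical sample covariance as a stand-in for the robust MCD or S-estimator, which is not implemented here. For any such basis, back-transforming $C(Z)$ reproduces the clr covariance exactly, because each clr vector satisfies $y = Vz$.

```python
import math

def clr(x):
    """Centered logratio transform of a composition with strictly positive parts."""
    logs = [math.log(v) for v in x]
    g = sum(logs) / len(logs)
    return [l - g for l in logs]

def helmert_basis(D):
    """D x (D-1) matrix V with orthonormal columns, all orthogonal to (1, ..., 1)."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(1, D):
        c = 1.0 / math.sqrt(j * (j + 1))
        for i in range(j):
            V[i][j - 1] = c
        V[j][j - 1] = -j * c
    return V

def ilr(x, V):
    """ilr coordinates z = V^T clr(x)."""
    y = clr(x)
    return [sum(y[i] * V[i][k] for i in range(len(y))) for k in range(len(V[0]))]

def covariance(rows):
    """Classical sample covariance (placeholder for a robust estimator such as the MCD)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def back_transform(CZ, V):
    """C(Y) = V C(Z) V^T."""
    D, m = len(V), len(CZ)
    return [[sum(V[a][k] * CZ[k][l] * V[b][l] for k in range(m) for l in range(m))
             for b in range(D)] for a in range(D)]

# Four toy 3-part compositions.
X = [[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.25, 0.25, 0.5]]
V = helmert_basis(3)
Z = [ilr(x, V) for x in X]          # full-rank coordinates for the covariance estimator
CY = back_transform(covariance(Z), V)
```

In practice, `covariance` would be replaced by a robust estimator applied to `Z`. The ilr coordinates themselves depend on the chosen basis, but the back-transformed clr covariance does not.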
Abstract:
Several eco-toxicological studies have shown that insectivorous mammals, due to their feeding habits, easily accumulate high amounts of pollutants relative to other mammal species. To assess the bio-accumulation levels of toxic metals and their influence on essential metals, we quantified the concentration of 19 elements (Ca, K, Fe, B, P, S, Na, Al, Zn, Ba, Rb, Sr, Cu, Mn, Hg, Cd, Mo, Cr and Pb) in the bones of 105 greater white-toothed shrews (Crocidura russula) from a polluted area (Ebro Delta) and a control area (Medas Islands). Since the chemical contents of a bio-indicator are mainly compositional data, the conventional statistical analyses currently used in eco-toxicology can give misleading results. Therefore, to improve the interpretation of the data obtained, we used statistical techniques for compositional data analysis to define groups of metals and to evaluate the relationships between them from an inter-population viewpoint. Hypothesis testing on suitable balance coordinates allowed us to confirm intuition-based hypotheses and some previous results. The main statistical goal was to test for equal means of the balance coordinates in the two defined populations. After checking normality, one-way ANOVA or Mann-Whitney tests were carried out for the inter-group balances.
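A balance coordinate contrasts two groups of parts through the log-ratio of their geometric means. It is scaled so that it is an orthonormal (ilr) coordinate. The Python sketch below assumes the standard balance formula $b = \sqrt{rs/(r+s)}\,\ln\big(g(x_{num})/g(x_{den})\big)$, where $r$ parts are in the numerator and $s$ in the denominator, and $g$ is the geometric mean. The part groupings (for example, toxic versus essential metals) are left to the user. Only the Mann-Whitney U statistic is computed, not its p-value.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, num, den):
    """Balance between the parts of composition x indexed by `num` and by `den`."""
    r, s = len(num), len(den)
    g_num = geometric_mean([x[i] for i in num])
    g_den = geometric_mean([x[i] for i in den])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic of sample a versus sample b (ties count one half)."""
    u = 0.0
    for xa in a:
        for xb in b:
            if xa > xb:
                u += 1.0
            elif xa == xb:
                u += 0.5
    return u
```

A typical use is to compute the same balance for every animal in each area and then compare the two lists of balances with `mann_whitney_u`. The balance is unchanged if a sample's concentrations are all multiplied by the same factor, which is why it suits compositional data.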
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
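The weighted log-ratio analysis described above can be sketched as two steps: a weighted double-centring of the log-transformed table, followed by a singular value decomposition. The Python sketch below assumes, as is usual for this method, that the row and column weights are the table's margins divided by the grand total. The final SVD is left to a linear algebra library such as `numpy.linalg.svd`. Only the weighted double-centred logarithms enter the analysis, so for fixed weights, rescaling any row or column by a positive constant leaves the result unchanged. This reflects the fact that only ratios of the data values are analysed.

```python
import math

def margins(table):
    """Row and column masses: margins divided by the grand total (each sums to 1)."""
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]
    c = [sum(col) / total for col in zip(*table)]
    return r, c

def weighted_double_centre(table, r, c):
    """Double-centre log(table) with weighted row means (weights c) and column means (weights r)."""
    L = [[math.log(v) for v in row] for row in table]  # requires strictly positive data
    n, p = len(L), len(L[0])
    row_means = [sum(c[j] * L[i][j] for j in range(p)) for i in range(n)]
    col_means = [sum(r[i] * L[i][j] for i in range(n)) for j in range(p)]
    grand = sum(r[i] * row_means[i] for i in range(n))
    return [[L[i][j] - row_means[i] - col_means[j] + grand for j in range(p)]
            for i in range(n)]

def lra_input(table, r, c):
    """Matrix sqrt(r_i * c_j) * Y_ij whose SVD yields the weighted log-ratio solution."""
    Y = weighted_double_centre(table, r, c)
    return [[math.sqrt(r[i] * c[j]) * Y[i][j] for j in range(len(c))]
            for i in range(len(r))]
```

With equal weights, the same code reduces to the unweighted double-centring used by ordinary log-ratio PCA.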
Abstract:
Whether for investigative or intelligence purposes, crime analysts often need to analyse the spatiotemporal distribution of crimes or of traces left by suspects. This article presents a visualisation methodology supporting recurrent practical analytical tasks, such as the detection of crime series or the analysis of traces left by digital devices such as mobile phones or GPS devices. The proposed approach has led to the development of a dedicated tool that has proven its effectiveness in real inquiries and intelligence practice. It supports a more fluent visual analysis of the collected data and may provide critical clues to support police operations, as illustrated by the case studies presented.
Abstract:
Quantitative information from magnetic resonance imaging (MRI) may substantiate clinical findings and provide additional insight into the mechanisms of clinical interventions in therapeutic stroke trials. The PERFORM study is exploring the efficacy of terutroban versus aspirin for secondary prevention in patients with a history of ischemic stroke. We report on the design of an exploratory longitudinal MRI follow-up study performed in a subgroup of the PERFORM trial. An international multi-centre longitudinal follow-up MRI study was designed for different MR systems, employing the following safety and efficacy readouts: new T2 lesions, new DWI lesions, whole-brain volume change, hippocampal volume change, changes in tissue microstructure as depicted by mean diffusivity and fractional anisotropy, vessel patency on MR angiography, and the presence and development of new microbleeds. A total of 1,056 patients (men and women aged ≥ 55 years) were included. The data analysis included 3D reformation, registration of images of different contrasts, tissue segmentation, and automated lesion detection. This large international multi-centre study demonstrates how new MRI readouts can provide key information on the evolution of cerebral tissue lesions and of changes within the macrovasculature after atherothrombotic stroke in a large sample of patients.
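Mean diffusivity and fractional anisotropy, two of the readouts listed above, are computed voxel-wise from the three eigenvalues of the diffusion tensor. The Python sketch below implements the standard definitions. Eigen-decomposition of the tensor, registration and segmentation are outside its scope, and nothing here reflects the study's actual processing pipeline.

```python
import math

def mean_diffusivity(l1, l2, l3):
    """MD: average of the three diffusion tensor eigenvalues."""
    return (l1 + l2 + l3) / 3.0

def fractional_anisotropy(l1, l2, l3):
    """FA in [0, 1]: 0 for isotropic diffusion, 1 for diffusion along a single axis."""
    md = mean_diffusivity(l1, l2, l3)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        raise ValueError("FA is undefined when all eigenvalues are zero")
    return math.sqrt(1.5 * num / den)
```

Eigenvalues are typically given in mm²/s. FA is dimensionless, so the unit cancels.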
Abstract:
The Office of Special Investigations at the Iowa Department of Transportation (DOT) collects falling weight deflectometer (FWD) data on a regular basis to evaluate pavement structural conditions. The primary objective of this study was to develop a fully automated software system, along with a user manual, for rapid processing of the FWD data. The software system automatically reads the raw FWD data collected by the JILS-20 type FWD machine owned by Iowa DOT, and processes and analyzes the collected data with the rapid prediction algorithms developed during the phase I study. The system smoothly integrates the FWD data analysis algorithms with the computer program used to collect the pavement deflection data. It can be used by the Iowa DOT pavement management team to assess pavement condition, estimate remaining pavement life, and eventually help assess pavement rehabilitation strategies. This report describes the developed software in detail and can also be used as a user manual for conducting simulation studies and detailed analyses.
Abstract:
The present research deals with an important public health threat: the pollution created by radon gas accumulating inside dwellings. The spatial modeling of indoor radon in Switzerland is particularly complex and challenging because of the many influencing factors that should be taken into account. Indoor radon data analysis must be addressed from both a statistical and a spatial point of view. Because indoor radon is a multivariate process, it was important first to define the influence of each factor. In particular, it was important to define the influence of geology, which is closely associated with indoor radon. This association was indeed observed for the Swiss data, but geology was not proved to be the sole determinant for the spatial modeling. The statistical analysis of the data, at both univariate and multivariate levels, was followed by an exploratory spatial analysis. Many tools proposed in the literature were tested and adapted, including fractality, declustering and moving-window methods. The Quantité Morisita Index (QMI) was proposed as a procedure to evaluate data clustering as a function of the radon level. Existing declustering methods were revised and applied in an attempt to approach the global histogram parameters. The exploratory phase is accompanied by the definition of multiple scales of interest for indoor radon mapping in Switzerland. The analysis followed a top-down resolution approach, from regional to local levels, in order to find the appropriate scales for modeling. In this sense, the data partition was optimized to satisfy the stationarity conditions of geostatistical models. Common methods of spatial modeling such as K Nearest Neighbors (KNN), variography and General Regression Neural Networks (GRNN) were proposed as exploratory tools. In the following section, different spatial interpolation methods were applied to a particular dataset.
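The classical Morisita index measures clustering by counting how points fall into the cells of a regular grid. Values near 1 suggest a random pattern, values above 1 indicate clustering, and values below 1 indicate a more regular pattern. The Python sketch below implements that classical index for 2-D locations. The abstract does not specify how the QMI varies the index with the radon level. One plausible use, assumed here, is to recompute the index on the subset of measurements exceeding each radon threshold, through the hypothetical helper `morisita_by_threshold`. Repeating the calculation over several cell sizes gives a multi-scale view.

```python
import math
from collections import Counter

def morisita_index(points, cell_size, bounds):
    """Classical Morisita index of 2-D points counted on a square grid.

    points: list of (x, y); bounds: (xmin, ymin, xmax, ymax); cell_size: grid cell side.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    xmin, ymin, xmax, ymax = bounds
    nx = max(1, math.ceil((xmax - xmin) / cell_size))
    ny = max(1, math.ceil((ymax - ymin) / cell_size))
    counts = Counter()
    for x, y in points:
        # Points on the upper/right boundary are put in the last cell.
        i = min(int((x - xmin) // cell_size), nx - 1)
        j = min(int((y - ymin) // cell_size), ny - 1)
        counts[(i, j)] += 1
    q = nx * ny
    return q * sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))

def morisita_by_threshold(points, values, thresholds, cell_size, bounds):
    """Hypothetical helper: index for the points whose radon value is >= each threshold."""
    return {t: morisita_index([p for p, v in zip(points, values) if v >= t], cell_size, bounds)
            for t in thresholds}
```

If all points fall into one cell, the index equals the number of cells $Q$, its maximum for that grid.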
A bottom-up approach in terms of method complexity was adopted, and the results were analyzed together in order to find common definitions of continuity and neighborhood parameters. Additionally, a data filter based on cross-validation (the CVMF) was tested with the purpose of reducing noise at the local scale. At the end of the chapter, a series of tests of data consistency and method robustness was performed. This led to conclusions about the importance of data splitting and about the limitations of generalization methods for reproducing statistical distributions. The last section was dedicated to modeling methods with probabilistic interpretations. Data transformation and simulations allowed the use of multi-Gaussian models and helped take the uncertainty of the indoor radon pollution data into consideration. The categorization transform was presented as a solution for modeling extreme values through classification. Simulation scenarios were proposed, including an alternative proposal for reproducing the global histogram based on the sampling domain. Sequential Gaussian simulation (SGS) was presented as the method giving the most complete information, while classification performed more robustly. An error measure was defined in relation to the decision function for hardening the data classification. Among the classification methods, probabilistic neural networks (PNN) proved better suited to modeling high-threshold categorization and to automation. Support vector machines (SVM), on the contrary, performed well under balanced category conditions. In general, it was concluded that no single prediction or estimation method is best under all conditions of scale and neighborhood definition. Simulations should be the basis, while other methods can provide complementary information to support efficient decision making on indoor radon.
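Choosing a neighborhood size for KNN is one place where cross-validation enters this workflow. The Python sketch below performs leave-one-out cross-validation for KNN regression on 2-D locations. It assumes Euclidean distance, an unweighted mean of the neighbors, and RMSE as the selection criterion. It is not the CVMF filter mentioned above, whose details are not given here.

```python
import math

def knn_predict(train, query, k):
    """Unweighted mean value of the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

def loo_rmse(data, k):
    """Leave-one-out RMSE of KNN for data = [((x, y), value), ...]."""
    squared_errors = []
    for i, (location, value) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out observation i
        squared_errors.append((knn_predict(train, location, k) - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def best_k(data, candidates):
    """Candidate k with the smallest leave-one-out RMSE (first one wins on ties)."""
    return min(candidates, key=lambda k: loo_rmse(data, k))
```

Distance ties are resolved by list order, because `sorted` is stable. `math.dist` requires Python 3.8 or later.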
Abstract:
In general, laboratory activities are costly in terms of time, space, and money. As such, the ability to provide realistically simulated laboratory data that enable students to practice data analysis techniques as a complementary activity would be expected to reduce these costs while opening up very interesting possibilities. In the present work, a novel methodology is presented for the design of analytical chemistry instrumental analysis exercises that can be automatically personalized for each student and evaluated immediately. Rather than using data obtained from actual laboratory work, the proposed system provides each student with a different set of experimental data, generated randomly while satisfying a set of constraints. This allows the instructor to provide students with a set of practical problems that complement their regular laboratory work, along with the corresponding feedback from the system's automatic evaluation process. To this end, the Goodle Grading Management System (GMS) was developed: an innovative web-based educational tool for automating the collection and assessment of practical exercises for engineering and scientific courses. The proposed methodology takes full advantage of the Goodle GMS fusion code architecture. The design of a particular exercise is provided ad hoc by the instructor and requires basic Matlab knowledge. The system has been employed with satisfactory results in several university courses. To demonstrate the automatic evaluation process, three exercises are presented in detail. The first exercise involves a linear regression analysis of data and the calculation of the quality parameters of an instrumental analysis method. The second and third exercises address two different comparison tests: a comparison-of-means test and a paired t-test.
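Both the quality parameters in the first exercise and the paired t-test in the third reduce to a few formulas. The Python sketch below is illustrative only: the Goodle GMS exercises themselves are written in Matlab, and their exact formulas are not given above. It assumes ordinary least squares, the common conventions LOD = 3.3·s_y/x / b and LOQ = 10·s_y/x / b, and a paired t statistic computed on the differences. P-values would require a t distribution from a statistics library.

```python
import math
import statistics

def calibration(x, y):
    """Ordinary least-squares line y = a + b*x and the residual standard deviation s_y/x."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return a, b, s_yx

def detection_limits(b, s_yx):
    """Limits of detection and quantification under the assumed 3.3 and 10 conventions."""
    return 3.3 * s_yx / b, 10.0 * s_yx / b

def paired_t(first, second):
    """t statistic for paired samples: mean difference over its standard error (df = n - 1)."""
    d = [p - q for p, q in zip(first, second)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
```

For randomly generated exercise data, the same functions can serve as the reference answer against which a student's submitted values are compared within a tolerance.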
Abstract:
Our objective was to clone, express and characterize the group 1 allergen of adult Dermatophagoides farinae (Der f 1). The aim was to produce recombinant allergens for future clinical applications that avoid the side reactions caused by crude mite extracts. Based on GenBank data, we designed primers and amplified the cDNA fragment coding for Der f 1 by nested PCR. After purification and recovery, the cDNA fragment was cloned into the pMD19-T vector. The fragment was then sequenced, subcloned into the plasmid pET28a(+), expressed in Escherichia coli BL21 and identified by Western blotting. The cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Sequence analysis showed the presence of an open reading frame of 966 bp that encodes a protein of 321 amino acids. Interestingly, homology analysis showed that Der p 1 shared more than 87% amino acid sequence identity with Eur m 1 but only 80% with Der f 1. Furthermore, phylogenetic analyses suggested that D. pteronyssinus was evolutionarily closer to Euroglyphus maynei than to D. farinae, even though D. pteronyssinus and D. farinae belong to the same genus, Dermatophagoides. A total of three cysteine peptidase active sites were found in the predicted amino acid sequence: 127-138 (QGGCGSCWAFSG), 267-277 (NYHAVNIVGYG) and 284-303 (YWIVRNSWDTTWGDSGYGYF). Moreover, secondary structure analysis revealed that Der f 1 contains α-helix (33.96%), extended strand (17.13%), β-turn (5.61%), and random coil (43.30%). A simple three-dimensional model of the protein was constructed using the SWISS-MODEL server. In conclusion, the cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Alignment and phylogenetic analysis suggest that D. pteronyssinus is evolutionarily more similar to E. maynei than to D. farinae.
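The figures above can be cross-checked with simple arithmetic. A 966 bp open reading frame is 322 codons, which gives 321 amino acids if the last codon is the stop codon. That is an assumption, but it is consistent with the reported length. Likewise, each active-site range has as many residues as its motif. The Python sketch below performs these checks and adds a percent-identity function for two already-aligned sequences. Identity is computed here over columns without gaps, which is only one of several conventions that alignment tools use.

```python
def protein_length(orf_bp, ends_with_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp base pairs."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if ends_with_stop else codons

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Active-site ranges (1-based, inclusive) and motifs reported for Der f 1.
sites = {(127, 138): "QGGCGSCWAFSG", (267, 277): "NYHAVNIVGYG", (284, 303): "YWIVRNSWDTTWGDSGYGYF"}
for (start, end), motif in sites.items():
    assert end - start + 1 == len(motif)  # range length matches motif length
```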
Abstract:
This study examined students considered at risk of not completing their Ontario Secondary School Diploma and aimed to offer insight into the questions "What factors currently lead to school disconnect?" and "How can these factors be addressed?" Eight students currently enrolled in an alternative learning environment participated in the study. Each was asked to take part in two digitally recorded interviews, which were subsequently transcribed by the researcher. The data were then coded and analysed according to specific themes: obstacles, empowerment, goals, views about success, opinions of school, and the power of the teacher. From these themes, three broad focus areas emerged that were used to keep the data analysis focused: worldview, school effects, and self-image. Differences between the data collected and the ideas presented in the current literature were highlighted as a reminder that, when dealing with a human population, we cannot rely on textbook definitions and theory alone.
Abstract:
This qualitative investigation examined the nature of 7 highly artistic visual arts students at 2 secondary schools in south-central Ontario. Through interviews, questionnaires, observations, and artwork documents, this study attempted to understand these highly artistic students in terms of creativity, motivation, social and emotional perspectives, and cognitive processes. Data collection occurred over a 3-month period, and the data analysis program NVivo 7 was used for coding to develop themes and categories for organizing the data. The findings of this study illustrate the significant place that visual arts can take in the growth and development of today's youth. Participants identified developing critical thinking and problem-solving skills, taking risks, and meeting challenges through their engagement in the creative process. The participants connected the transferability of these skills to numerous aspects of their lives. By enhancing individual perspectives through the study of visual arts, their local and world connections were extended, and environmental and societal concerns evolved. In addition, the communicative opportunities for personal expression that visual arts provided supported these students' emotional health and opened paths of personal discovery. Through the production of artwork, with the many stages this involves, combined with insight into their own needs, the participants relayed important suggestions for programming enhancements and educational settings for visual arts classrooms. These suggestions are meaningful for future educators and curriculum developers.
Abstract:
The main objective of this research was to examine the relationship between surface electromyographic (SEMG) spike activity and force. The secondary objective was to determine to what extent subcutaneous tissue affects the high-frequency component of the signal, and to examine the relationship between measures of SEMG spike shape and their traditional time and frequency analogues. A total of 96 participants (46 males and 50 females) aged 18-35 years generated three 5-second isometric step contractions at each force level of 40, 60, 80, and 100 percent of maximal voluntary contraction (MVC). The order of the contractions was balanced across subjects. The subject's right arm was positioned in the sagittal plane, with the shoulder and elbow flexed to 90 degrees. The elbow rested on a support, with the forearm in a neutral position (mid-pronation/mid-supination) and the wrist placed within a cuff fastened below the styloid process. The wrist cuff was attached to a load cell (JR3 Inc., Woodland, CA) that recorded the force produced. Biceps brachii activity was monitored with a pair of Ag/AgCl recording electrodes (Grass F-E9, Astro-Med Inc., West Warwick, RI) placed in a bipolar configuration with an interelectrode distance (IED) of 2 cm, distal to the motor point. Data analysis was performed on a 1-second window of data in the middle of the 5-second contraction. The results indicated that all spike shape measures exhibited significant (p < 0.01) differences as force increased from 40 to 100% MVC. The spike shape measures suggest that increased motor unit (MU) recruitment was responsible for increasing force up to 80% MVC. The results suggested that further increases in force relied on MU synchronization. The results also revealed that subcutaneous tissue (skinfold thickness) had no relationship (r = 0.02; p > 0.05) with the mean number of peaks per spike (MNPPS), the high-frequency component of the signal.
Mean spike amplitude (MSA) and mean spike frequency (MSF) were highly correlated with their traditional counterparts, root mean square (RMS) and mean power frequency (MPF), respectively (r = 0.99 and r = 0.97; p < 0.01).
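The traditional measures against which MSA and MSF were compared are straightforward to compute from a signal window. The Python sketch below assumes a uniformly sampled window and computes RMS amplitude. MPF is taken as the power-weighted mean frequency of the one-sided DFT power spectrum, with DC excluded. A plain O(n²) DFT is used for clarity; in practice an FFT would be used instead. The spike shape measures themselves are not implemented, because their definitions are not given above.

```python
import cmath
import math

def rms(signal):
    """Root mean square amplitude of a signal window."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def mean_power_frequency(signal, fs):
    """Power-weighted mean frequency (Hz) of the one-sided DFT spectrum, DC excluded.

    fs is the sampling rate in Hz.
    """
    n = len(signal)
    freqs, powers = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        powers.append(abs(coeff) ** 2)
    return sum(f * p for f, p in zip(freqs, powers)) / sum(powers)

# Example: a 50 Hz sine sampled at 1000 Hz for 100 samples (five full cycles).
fs = 1000.0
window = [math.sin(2 * math.pi * 50.0 * t / fs) for t in range(100)]
```

For this pure sine, all the power lies in the 50 Hz bin. The MPF is therefore 50 Hz, and the RMS is 1/√2.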
Abstract:
One hundred and seventy-two subjects participated in this quantitative, correlational survey, which tested Hackman and Oldham's Job Characteristics Model in an educational setting. Subjects were Teaching Masters, Chairmen and Deans from an Ontario community college. Data on all variables of the model were collected via a mailed questionnaire. Several reliable, valid instruments were used to measure the variables. Data analysis through Pearson correlation and stepwise multiple regression revealed that core job characteristics predicted certain critical psychological states. These critical psychological states, in turn, were able to predict various personal and work outcomes, but not absenteeism. The context variable Satisfaction with Co-workers was the only consistent moderating variable between core characteristics and critical psychological states. However, individual employee differences did moderate the relationship between critical psychological states and all of the personal and work outcomes except Internal Work Motivation. Two other moderator variables, Satisfaction with Context and Growth Need Strength, demonstrated an ability to predict the outcome General Job Satisfaction. The research suggests that this model may be used for job design and redesign purposes within the community college setting.
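The Pearson correlations used in this analysis, and the test of whether a coefficient differs from zero, follow standard formulas. The Python sketch below computes r and the usual significance statistic t = r·sqrt((n − 2)/(1 − r²)), which has n − 2 degrees of freedom. Its p-value is left to a statistics library, and the stepwise regression step is not reproduced. Python 3.10 and later also provide `statistics.correlation` for r.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic (n - 2 degrees of freedom) for testing H0: rho = 0; undefined for |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```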