922 resultados para Compositional data analysis-roots in geosciences


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Raw measurement data does not always immediately convey useful information, but applying mathematical statistical analysis tools into measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study mathematical tools like Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on a subset of variables that seem to have most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with other variables. For SB and linear regression, the results show that both SB and linear regression models built on 1-day averaged data seriously underestimate the variance of true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit well the whole available dataset and therefore, it is proposed for future work to make piecewise non linear regression models if the same available dataset is used, or the plant to provide another dataset that should be collected in a more systematic fashion than the present data for further analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In general, laboratory activities are costly in terms of time, space, and money. As such, the ability to provide realistically simulated laboratory data that enables students to practice data analysis techniques as a complementary activity would be expected to reduce these costs while opening up very interesting possibilities. In the present work, a novel methodology is presented for design of analytical chemistry instrumental analysis exercises that can be automatically personalized for each student and the results evaluated immediately. The proposed system provides each student with a different set of experimental data generated randomly while satisfying a set of constraints, rather than using data obtained from actual laboratory work. This allows the instructor to provide students with a set of practical problems to complement their regular laboratory work along with the corresponding feedback provided by the system's automatic evaluation process. To this end, the Goodle Grading Management System (GMS), an innovative web-based educational tool for automating the collection and assessment of practical exercises for engineering and scientific courses, was developed. The proposed methodology takes full advantage of the Goodle GMS fusion code architecture. The design of a particular exercise is provided ad hoc by the instructor and requires basic Matlab knowledge. The system has been employed with satisfactory results in several university courses. To demonstrate the automatic evaluation process, three exercises are presented in detail. The first exercise involves a linear regression analysis of data and the calculation of the quality parameters of an instrumental analysis method. The second and third exercises address two different comparison tests, a comparison test of the mean and a t-paired test.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ion mobility spectrometry (IMS) is a straightforward, low cost method for fast and sensitive determination of organic and inorganic analytes. Originally this portable technique was applied to the determination of gas phase compounds in security and military use. Nowadays, IMS has received increasing attention in environmental and biological analysis, and in food quality determination. This thesis consists of literature review of suitable sample preparation and introduction methods for liquid matrices applicable to IMS from its early development stages to date. Thermal desorption, solid phase microextraction (SPME) and membrane extraction were examined in experimental investigations of hazardous aquatic pollutants and potential pollutants. Also the effect of different natural waters on the extraction efficiency was studied, and the utilised IMS data processing methods are discussed. Parameters such as extraction and desorption temperatures, extraction time, SPME fibre depth, SPME fibre type and salt addition were examined for the studied sample preparation and introduction methods. The observed critical parameters were extracting material and temperature. The extraction methods showed time and cost effectiveness because sampling could be performed in single step procedures and from different natural water matrices within a few minutes. Based on these experimental and theoretical studies, the most suitable method to test in the automated monitoring system is membrane extraction. In future an IMS based early warning system for monitoring water pollutants could ensure the safe supply of drinking water. IMS can also be utilised for monitoring natural waters in cases of environmental leakage or chemical accidents. When combined with sophisticated sample introduction methods, IMS possesses the potential for both on-line and on-site identification of analytes in different water matrices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Results of subgroup analysis (SA) reported in randomized clinical trials (RCT) cannot be adequately interpreted without information about the methods used in the study design and the data analysis. Our aim was to show how often inaccurate or incomplete reports occur. First, we selected eight methodological aspects of SA on the basis of their importance to a reader in determining the confidence that should be placed in the author's conclusions regarding such analysis. Then, we reviewed the current practice of reporting these methodological aspects of SA in clinical trials in four leading journals, i.e., the New England Journal of Medicine, the Journal of the American Medical Association, the Lancet, and the American Journal of Public Health. Eight consecutive reports from each journal published after July 1, 1998 were included. Of the 32 trials surveyed, 17 (53%) had at least one SA. Overall, the proportion of RCT reporting a particular methodological aspect ranged from 23 to 94%. Information on whether the SA preceded/followed the analysis was reported in only 7 (41%) of the studies. Of the total possible number of items to be reported, NEJM, JAMA, Lancet and AJPH clearly mentioned 59, 67, 58 and 72%, respectively. We conclude that current reporting of SA in RCT is incomplete and inaccurate. The results of such SA may have harmful effects on treatment recommendations if accepted without judicious scrutiny. We recommend that editors improve the reporting of SA in RCT by giving authors a list of the important items to be reported.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of the present study was to compare heart rate variability (HRV) at rest and during exercise using a temporal series obtained with the Polar S810i monitor and a signal from a LYNX® signal conditioner (BIO EMG 1000 model) with a channel configured for the acquisition of ECG signals. Fifteen healthy subjects aged 20.9 ± 1.4 years were analyzed. The subjects remained at rest for 20 min and performed exercise for another 20 min with the workload selected to achieve 60% of submaximal heart rate. RR series were obtained for each individual with a Polar S810i instrument and with an ECG analyzed with a biological signal conditioner. The HRV indices (rMSSD, pNN50, LFnu, HFnu, and LF/HF) were calculated after signal processing and analysis. The unpaired Student t-test and intraclass correlation coefficient were used for data analysis. No statistically significant differences were observed when comparing the values analyzed by means of the two devices for HRV at rest and during exercise. The intraclass correlation coefficient demonstrated satisfactory correlation between the values obtained by the devices at rest (pNN50 = 0.994; rMSSD = 0.995; LFnu = 0.978; HFnu = 0.978; LF/HF = 0.982) and during exercise (pNN50 = 0.869; rMSSD = 0.929; LFnu = 0.973; HFnu = 0.973; LF/HF = 0.942). The calculation of HRV values by means of temporal series obtained from the Polar S810i instrument appears to be as reliable as those obtained by processing the ECG signal captured with a signal conditioner.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our surrounding landscape is in a constantly dynamic state, but recently the rate of changes and their effects on the environment have considerably increased. In terms of the impact on nature, this development has not been entirely positive, but has rather caused a decline in valuable species, habitats, and general biodiversity. Regardless of recognizing the problem and its high importance, plans and actions of how to stop the detrimental development are largely lacking. This partly originates from a lack of genuine will, but is also due to difficulties in detecting many valuable landscape components and their consequent neglect. To support knowledge extraction, various digital environmental data sources may be of substantial help, but only if all the relevant background factors are known and the data is processed in a suitable way. This dissertation concentrates on detecting ecologically valuable landscape components by using geospatial data sources, and applies this knowledge to support spatial planning and management activities. In other words, the focus is on observing regionally valuable species, habitats, and biotopes with GIS and remote sensing data, using suitable methods for their analysis. Primary emphasis is given to the hemiboreal vegetation zone and the drastic decline in its semi-natural grasslands, which were created by a long trajectory of traditional grazing and management activities. However, the applied perspective is largely methodological, and allows for the application of the obtained results in various contexts. Models based on statistical dependencies and correlations of multiple variables, which are able to extract desired properties from a large mass of initial data, are emphasized in the dissertation. In addition, the papers included combine several data sets from different sources and dates together, with the aim of detecting a wider range of environmental characteristics, as well as pointing out their temporal dynamics. The results of the dissertation emphasise the multidimensionality and dynamics of landscapes, which need to be understood in order to be able to recognise their ecologically valuable components. This not only requires knowledge about the emergence of these components and an understanding of the used data, but also the need to focus the observations on minute details that are able to indicate the existence of fragmented and partly overlapping landscape targets. In addition, this pinpoints the fact that most of the existing classifications are too generalised as such to provide all the required details, but they can be utilized at various steps along a longer processing chain. The dissertation also emphases the importance of landscape history as an important factor, which both creates and preserves ecological values, and which sets an essential standpoint for understanding the present landscape characteristics. The obtained results are significant both in terms of preserving semi-natural grasslands, as well as general methodological development, giving support to science-based framework in order to evaluate ecological values and guide spatial planning.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose ofthis study was to explore various types ofreflection and to explore reflection on action, reflection as a practice, and reflection as a process. In doing this, the intent was to discover the perceived benefits of reflection in the classroom and to provide guidelines for future use at the undergraduate and graduate level. The qualitative components in this study included the data collection strategy of semistructured interviews with 2 undergraduate students, 2 graduate students, 1 undergraduate studies professor, and 1 graduate studies professor. The data analysis strategies included a within-case analysis and a cross-case analysis. Through the interviews participants discussed their experiences with the use ofreflection in the classroom. Through the completion ofthis analysis the researcher expected to discover the benefits ofreflection at this level of education, as well as provide suggestions for future use. Both undergraduate and graduate students and professors were found to benefit from the use of reflection in the classroom. The use ofreflection in the undergraduate and graduate classroom was found to improve student/teacher and student/peer relationships, foster critical thinking, allow for connections between learned theory and life experience, and improve students' writing abilities. Based on the results ofthe study the implications ofreflection for the undergraduate and graduate classroom and for further research are provided.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research used a quantitative study approach to investigate the “boy crisis” in Canada. Boy crisis advocates suggest that boys are being surpassed by girls on reading assessments and promote strategies to assist male students. A feminist framework was used in this study that allowed for an investigation and discussion of the factors that mediate between gender and success at reading comprehension, interpretation, and response to text without ignoring female students. Reading scores and questionnaire data compiled by the Pan-Canadian Assessment Program were used in this research, specifically the PCAP-13 2007 assessment of approximately 30,000 13-year-old students from all Canadian provinces and Yukon Territory (CMEC, 2008). Approximately 20,000 participants wrote the reading assessment, while 30,000 students completed the questionnaire responses. Predictor variables were tested using parametric tests such as independent samples t-test, one-way ANOVA, chi-square analysis, and Pearson r. Findings from this study indicate that although boys scored lower than girls on the PCAP-13 2007 reading assessment, factors were found to influence the reading scores of both male and female students to varying degrees. Socioeconomic status, perceptions of the reading material used in language arts classrooms, reading preference, reading interest, parental involvement, parental encouragement for reading, and self-efficacy were all found to affect the reading performance of boys and girls. Relationships between variables were also found and are discussed in this research. The analysis presented in this study allows parents, educators, and policy makers to begin to critically examine and re-evaluate boy crisis literature and offers suggestions on how to improve reading performance for all students of all socioeconomic backgrounds.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Behavioral researchers commonly use single subject designs to evaluate the effects of a given treatment. Several different methods of data analysis are used, each with their own set of methodological strengths and limitations. Visual inspection is commonly used as a method of analyzing data which assesses the variability, level, and trend both within and between conditions (Cooper, Heron, & Heward, 2007). In an attempt to quantify treatment outcomes, researchers developed two methods for analysing data called Percentage of Non-overlapping Data Points (PND) and Percentage of Data Points Exceeding the Median (PEM). The purpose of the present study is to compare and contrast the use of Hierarchical Linear Modelling (HLM), PND and PEM in single subject research. The present study used 39 behaviours, across 17 participants to compare treatment outcomes of a group cognitive behavioural therapy program, using PND, PEM, and HLM on three response classes of Obsessive Compulsive Behaviour in children with Autism Spectrum Disorder. Findings suggest that PEM and HLM complement each other and both add invaluable information to the overall treatment results. Future research should consider using both PEM and HLM when analysing single subject designs, specifically grouped data with variability.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Triple quadrupole mass spectrometers coupled with high performance liquid chromatography are workhorses in quantitative bioanalyses. It provides substantial benefits including reproducibility, sensitivity and selectivity for trace analysis. Selected Reaction Monitoring allows targeted assay development but data sets generated contain very limited information. Data mining and analysis of non-targeted high-resolution mass spectrometry profiles of biological samples offer the opportunity to perform more exhaustive assessments, including quantitative and qualitative analysis. The objectives of this study was to test method precision and accuracy, statistically compare bupivacaine drug concentration in real study samples and verify if high resolution and accurate mass data collected in scan mode can actually permit retrospective data analysis, more specifically, extract metabolite related information. The precision and accuracy data presented using both instruments provided equivalent results. Overall, the accuracy was ranging from 106.2 to 113.2% and the precision observed was from 1.0 to 3.7%. Statistical comparisons using a linear regression between both methods reveal a coefficient of determination (R2) of 0.9996 and a slope of 1.02 demonstrating a very strong correlation between both methods. Individual sample comparison showed differences from -4.5% to 1.6% well within the accepted analytical error. Moreover, post acquisition extracted ion chromatograms at m/z 233.1648 ± 5 ppm (M-56) and m/z 305.2224 ± 5 ppm (M+16) revealed the presence of desbutyl-bupivacaine and three distinct hydroxylated bupivacaine metabolites. Post acquisition analysis allowed us to produce semiquantitative evaluations of the concentration-time profiles for bupicavaine metabolites.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

While most data analysis and decision support tools use numerical aspects of the data, Conceptual Information Systems focus on their conceptual structure. This paper discusses how both approaches can be combined.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traditionally, compositional data has been identified with closed data, and the simplex has been considered as the natural sample space of this kind of data. In our opinion, the emphasis on the constrained nature of compositional data has contributed to mask its real nature. More crucial than the constraining property of compositional data is the scale-invariant property of this kind of data. Indeed, when we are considering only few parts of a full composition we are not working with constrained data but our data are still compositional. We believe that it is necessary to give a more precise definition of composition. This is the aim of this oral contribution

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients’ steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US President ial Election, where interest may be in how the three-part composition, the percentage division among the three candidates - Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots