919 results for exploratory spatial data analysis
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology, which quantifies the expression of thousands of genes simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. These data are characterized by a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods to analyse the data are today well developed and close to a standard organization (through the efforts of dedicated international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians working with specific experimental designs. Chapter 1 starts from a necessary biological introduction and then reviews microarray technologies and all the important steps of an experiment, from the production of the array to quality control, ending with the preprocessing steps that are used in the data analysis in the rest of the dissertation.
Chapter 2 provides a critical review of standard analysis methods, stressing most of their problems. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared with that of SAM and LIMMA [3] on two data sets simulated via beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4].
Indeed, when microarray data are considered in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. Given three classes, the goal could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability that each G2 sample belongs to class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
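The resampling scheme behind MultiSAM (compare the less populated class with repeated same-size random subsets of the more populated class, then count how often each probe is called differentially expressed) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: a Welch t-statistic with a fixed cutoff stands in for the SAM test, and the function names, the cutoff `t_cut`, and the toy data are all hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for two lists of expression values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def recurrence_scores(lpc, mpc, n_iter=1000, t_cut=4.0, seed=0):
    """Count, per probe, how many resampling rounds call it differential.

    lpc, mpc: lists of samples; each sample is a list of probe values.
    Each round compares lpc with a random subset of mpc of size len(lpc).
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    lpc_by_probe = [[s[p] for s in lpc] for p in range(n_probes)]
    scores = [0] * n_probes
    for _ in range(n_iter):
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            mpc_vals = [s[p] for s in subset]
            # Stand-in test (assumption): the real method applies SAM here,
            # not a fixed cutoff on a Welch t-statistic.
            if abs(welch_t(lpc_by_probe[p], mpc_vals)) > t_cut:
                scores[p] += 1
    return scores


def base(i):
    """Small deterministic baseline value used to build the toy data."""
    return (i % 5) * 0.1 - 0.2


# Toy data (hypothetical): 4 probes, 40 MPC samples, 8 LPC samples.
# Only probe 0 is shifted (by +5) in the less populated class.
mpc = [[base(i)] * 4 for i in range(40)]
lpc = [[base(j) + 5.0] + [base(j)] * 3 for j in range(8)]

scores = recurrence_scores(lpc, mpc, n_iter=200)
print(scores)
```

Each score ranges from 0 to `n_iter`, mirroring the 0 to 1,000 range described in the abstract; a threshold on the score (such as the >300 used there) would then select the reported probes.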
Abstract:
Precision horticulture and spatial analysis applied to orchards are a growing and evolving part of precision agriculture technology. The aim of this discipline is to reduce production costs by monitoring and analysing orchard-derived information to improve crop performance in an environmentally sound manner. Georeferencing and geostatistical analysis, coupled with point-specific data mining, make it possible to devise and implement management decisions tailored within a single orchard. Potential applications include verifying, in real time during the season, the effectiveness of cultural practices in achieving production targets in terms of fruit size, number, yield and, in the near future, fruit quality traits. These data will affect not only the pre-harvest sector of the fruit chain but also the post-harvest sector. Chapter 1 provides an updated overview of precision horticulture, while Chapter 2 provides a preliminary spatial statistical analysis of the variability in apple orchards before and after manual thinning, and offers an interpretation of this variability and of how it can be managed to maximize orchard performance. Chapter 3 then stratifies spatial data into management classes to interpret and manage spatial variation in the orchard. An inverse model approach is also applied to verify whether crop production explains environmental variation. Chapter 4 presents an integration of the techniques adopted in the previous chapters and offers a new key for reading the information gathered within the field. The overall goal of this dissertation was to probe the feasibility, desirability and effectiveness of a precision approach to fruit growing, following the lines of other areas of agriculture that already adopt this management tool. As existing applications of precision horticulture had already shown, crop specificity is an important factor to be accounted for.
This work focused on apple because of its importance in the area where the work was carried out, and worldwide.
Abstract:
Nuclear Magnetic Resonance (NMR) is a branch of spectroscopy based on the fact that many atomic nuclei can be oriented by a strong magnetic field and will absorb radiofrequency radiation at characteristic frequencies. The parameters that can be measured on the resulting spectral lines (line positions, intensities, line widths, multiplicities and transients in time-dependent experiments) can be interpreted in terms of molecular structure, conformation, molecular motion and other rate processes. In this way, high-resolution (HR) NMR allows qualitative and quantitative analysis of samples in solution, in order to determine, among other things, the structure of molecules in solution. In the past, high-field NMR spectroscopy was mainly concerned with the elucidation of chemical structure in solution, but today it is emerging as a powerful exploratory tool for probing biochemical and physical processes. It is a versatile tool for the analysis of foods. Many NMR studies have been reported in the literature on different types of food, such as wine, olive oil, coffee, fruit juices, milk, meat, egg, starch granules and flour, using different NMR techniques. Traditionally, univariate analytical methods have been used to explore spectroscopic data. Such methods measure or select a single descriptive variable from the whole spectrum, and in the end only this variable is analysed. Applied to HR-NMR data, this univariate approach leads to several problems, due especially to the complexity of an NMR spectrum. The latter is composed of different signals belonging to different molecules, but the same molecule can also be represented by different signals, which are generally strongly correlated. Univariate methods, in this case, take into account only one or a few variables, causing a loss of information.
Thus, when dealing with complex samples such as foodstuffs, univariate analysis of spectral data is not powerful enough. Spectra need to be considered as a whole and, to analyse them, the whole data matrix must be taken into consideration: chemometric methods are designed to treat such multivariate data. Multivariate data analysis is used for a number of distinct purposes, and the aims can be divided into three main groups:
• data description (exploratory data structure modelling of any generic n-dimensional data matrix, for example PCA);
• regression and prediction (PLS);
• classification and prediction of class membership for new samples (LDA, PLS-DA and ECVA).
The aim of this PhD thesis was to verify the possibility of identifying and classifying plants or foodstuffs into different classes, based on the concerted variation in metabolite levels detected by NMR spectra, using multivariate data analysis as a tool to interpret the NMR information. It is important to underline that the results obtained are useful for pointing out the metabolic consequences of a specific modification of foodstuffs, avoiding the use of a targeted analysis for the different metabolites. The data analysis is performed by applying chemometric multivariate techniques to the acquired NMR spectra. The research work presented in this thesis is the result of a three-year PhD study. This thesis reports the main results obtained from two main activities:
A1) Evaluation of a data pre-processing system in order to minimize unwanted sources of variation due to different instrumental set-ups, manual spectra processing and sample preparation artefacts;
A2) Application of multivariate chemometric models in data analysis.
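To illustrate the data-description step mentioned above, the following minimal sketch extracts the first principal component of a small matrix of binned spectra by power iteration. It is not taken from the thesis: the function name `first_pc`, the binning into five intensity values, and the toy spectra are assumptions made for the example. Two bins (indices 1 and 3) vary together between the two groups, mimicking correlated signals from the same molecule.

```python
import random


def first_pc(X, n_iter=200, seed=0):
    """Return (loading, scores) of the first principal component.

    X: list of spectra, each a list of bin intensities of equal length.
    Columns are mean-centred, then power iteration is run on X^T X.
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    Xc = [[row[j] - means[j] for j in range(m)] for row in X]

    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]

    for _ in range(n_iter):
        t = [sum(r[j] * v[j] for j in range(m)) for r in Xc]            # X v
        w = [sum(Xc[i][j] * t[i] for i in range(n)) for j in range(m)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break  # no variance left along any direction
        v = [x / norm for x in w]

    scores = [sum(r[j] * v[j] for j in range(m)) for r in Xc]
    return v, scores


# Toy "spectra" (hypothetical): 3 samples per group, 5 bins each.
spectra = [
    [1.0, 2.0, 1.0, 0.5, 1.0],
    [1.1, 2.1, 0.9, 0.5, 1.0],
    [0.9, 1.9, 1.0, 0.6, 1.1],
    [1.0, 4.0, 1.0, 1.5, 1.0],
    [1.1, 4.1, 1.1, 1.4, 0.9],
    [0.9, 3.9, 1.0, 1.6, 1.0],
]

loading, scores = first_pc(spectra)
print([round(x, 2) for x in loading])
print([round(s, 2) for s in scores])
```

In this toy case the loading is dominated by the two correlated bins and the scores split the samples into the two groups, which a single-variable analysis would capture only in part. The sign of a principal component is arbitrary, so only the separation, not the direction, is meaningful.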
Abstract:
Although many studies in Europe and in the USA focus on organic food, little is known on the topic in China. This research provides an insight into Shanghai consumers’ perception of organic food, aiming to understand and represent in graphic form the network of mental associations that stems from the organic concept. The “Brand concept mapping” methodology (Roedder et al., 2006) was used to acquire, process and aggregate the individual networks, while the data analysis was also carried out using analytic procedures. The results suggest that organic food is perceived as healthy, safe and costly. Although these attributes are largely consistent with the European perception, some relevant differences emerged. First, organic is not necessarily synonymous with natural product in China, partly because of a poor translation of the term into Chinese, which conveys the idea of a manufactured product. Secondly, the organic label has to compete with the green food label in terms of image and market positioning, since the two are easily associated and often confused. “Environmental protection” also emerged as a relevant association, while ethical and social values were not mentioned. In conclusion, health and safety concerns are the factors that most influence food consumption in China (many people are so concerned about food safety that they find it difficult to shop), and the associations “Safe”, “Pure and natural”, “without chemicals” and “healthy” have been identified as the best candidates for leveraging a sound image of organic food.