922 resultados para Functional data analysis
Resumo:
The thesis is the result of work conducted during a period of six months at the Strategy department of Automobili Lamborghini S.p.A. in Sant'Agata Bolognese (BO) and concerns the study and analysis of Big Data relating to Lamborghini's connected cars. The Big Data is a project of Connected Car Project House, that is an inter-departmental team which works toward the definition of the Lamborghini corporate connectivity strategy and its implementation in the product portfolio. The Data of the connected cars is one of the hottest topics right now in the automotive industry; in fact, all the largest automotive companies are investi,ng a lot in this direction, in order to derive the greatest advantages both from a purely economic point of view, because from these data you can understand a lot the behaviors and habits of each driver, and from a technological point of view because it will increasingly promote the development of 5G that will be an important enabler for the future of connectivity. The main purpose of the work by Lamborghini prospective is to analyze the data of the connected cars, in particular a data-set referred to connected Huracans that had been already placed on the market, and, starting from that point, derive valuable Key Performance Indicators (KPIs) on which the company could partly base the decisions to be made in the near future. The key result that we have obtained at the end of this period was the creation of a Dashboard, in which is possible to visualize many parameters and indicators both related to driving habits and the use of the vehicle itself, which has brought great insights on the huge potential and value that is present behind the study of these data. The final Demo of the project has received great interest, not only from the whole strategy department but also from all the other business areas of Lamborghini, making mostly a great awareness that this will be the road to follow in the coming years.
Resumo:
I principi Agile, pubblicati nell’omonimo Manifesto più di 20 anni fa, al giorno d’oggi sono declinati in una moltitudine di framework: Scrum, XP, Kanban, Lean, Adaptive, Crystal, etc. Nella prima parte della tesi (Capitoli 1 e 2) sono stati descritti alcuni di questi framework e si è analizzato come un approccio Agile è utilizzato nella pratica in uno specifico caso d’uso: lo sviluppo di una piattaforma software a supporto di un sistema di e-grocery da parte di un team di lab51. Si sono verificate le differenze e le similitudini rispetto alcuni metodi Agile formalizzati in letteratura spiegando le motivazioni che hanno portato a differenziarsi da questi framework illustrando i vantaggi per il team. Nella seconda parte della tesi (Capitoli 3 e 4) è stata effettuata un’analisi dei dati raccolti dal supermercato online negli ultimi anni con l’obiettivo di migliorare l’algoritmo di riordino. In particolare, per prevedere le vendite dei singoli prodotti al fine di avere degli ordini più adeguati in quantità e frequenza, sono stati studiati vari approcci: dai modelli statistici di time series forecasting, alle reti neurali, fino ad una metodologia sviluppata ad hoc.
Resumo:
There are many natural events that can negatively affect the urban ecosystem, but weather-climate variations are certainly among the most significant. The history of settlements has been characterized by extreme events like earthquakes and floods, which repeat themselves at different times, causing extensive damage to the built heritage on a structural and urban scale. Changes in climate also alter various climatic subsystems, changing rainfall regimes and hydrological cycles, increasing the frequency and intensity of extreme precipitation events (heavy rainfall). From an hydrological risk perspective, it is crucial to understand future events that could occur and their magnitude in order to design safer infrastructures. Unfortunately, it is not easy to understand future scenarios as the complexity of climate is enormous. For this thesis, precipitation and discharge extremes were primarily used as data sources. It is important to underline that the two data sets are not separated: changes in rainfall regime, due to climate change, could significantly affect overflows into receiving water bodies. It is imperative that we understand and model climate change effects on water structures to support the development of adaptation strategies. The main purpose of this thesis is to search for suitable water structures for a road located along the Tione River. Therefore, through the analysis of the area from a hydrological point of view, we aim to guarantee the safety of the infrastructure over time. The observations made have the purpose to underline how models such as a stochastic one can improve the quality of an analysis for design purposes, and influence choices.
Resumo:
Cette thèse porte sur l'analyse bayésienne de données fonctionnelles dans un contexte hydrologique. L'objectif principal est de modéliser des données d'écoulements d'eau d'une manière parcimonieuse tout en reproduisant adéquatement les caractéristiques statistiques de celles-ci. L'analyse de données fonctionnelles nous amène à considérer les séries chronologiques d'écoulements d'eau comme des fonctions à modéliser avec une méthode non paramétrique. Dans un premier temps, les fonctions sont rendues plus homogènes en les synchronisant. Ensuite, disposant d'un échantillon de courbes homogènes, nous procédons à la modélisation de leurs caractéristiques statistiques en faisant appel aux splines de régression bayésiennes dans un cadre probabiliste assez général. Plus spécifiquement, nous étudions une famille de distributions continues, qui inclut celles de la famille exponentielle, de laquelle les observations peuvent provenir. De plus, afin d'avoir un outil de modélisation non paramétrique flexible, nous traitons les noeuds intérieurs, qui définissent les éléments de la base des splines de régression, comme des quantités aléatoires. Nous utilisons alors le MCMC avec sauts réversibles afin d'explorer la distribution a posteriori des noeuds intérieurs. Afin de simplifier cette procédure dans notre contexte général de modélisation, nous considérons des approximations de la distribution marginale des observations, nommément une approximation basée sur le critère d'information de Schwarz et une autre qui fait appel à l'approximation de Laplace. En plus de modéliser la tendance centrale d'un échantillon de courbes, nous proposons aussi une méthodologie pour modéliser simultanément la tendance centrale et la dispersion de ces courbes, et ce dans notre cadre probabiliste général. Finalement, puisque nous étudions une diversité de distributions statistiques au niveau des observations, nous mettons de l'avant une approche afin de déterminer les distributions les plus adéquates pour un échantillon de courbes donné.
Resumo:
Besides the spinal deformity, scoliosis modifies notably the general appearance of the trunk resulting in trunk rotation, imbalance, and asymmetries that constitutes patients' major concern. Existing classifications of scoliosis, based on the type of spinal curve as depicted on radiographs, are currently used to guide treatment strategies. Unfortunately, even though a perfect correction of the spinal curve is achieved, some trunk deformities remain, making patients dissatisfied with the treatment received. The purpose of this study is to identify possible shape patterns of trunk surface deformity associated with scoliosis. First, trunk surface is represented by a multivariate functional trunk shape descriptor based on 3-D clinical measurements computed on cross sections of the trunk. Then, the classical formulation of hierarchical clustering is adapted to the case of multivariate functional data and applied to a set of 236 trunk surface 3-D reconstructions. The highest internal validity is obtained when considering 11 clusters that explain up to 65% of the variance in our dataset. Our clustering result shows a concordance with the radiographic classification of spinal curves in 68% of the cases. As opposed to radiographic evaluation, the trunk descriptor is 3-D and its functional nature offers a compact and elegant description of not only the type, but also the severity and extent of the trunk surface deformity along the trunk length. In future work, new management strategies based on the resulting trunk shape patterns could be thought of in order to improve the esthetic outcome after treatment, and thus patients satisfaction.
Resumo:
Fractal theory presents a large number of applications to image and signal analysis. Although the fractal dimension can be used as an image object descriptor, a multiscale approach, such as multiscale fractal dimension (MFD), increases the amount of information extracted from an object. MFD provides a curve which describes object complexity along the scale. However, this curve presents much redundant information, which could be discarded without loss in performance. Thus, it is necessary the use of a descriptor technique to analyze this curve and also to reduce the dimensionality of these data by selecting its meaningful descriptors. This paper shows a comparative study among different techniques for MFD descriptors generation. It compares the use of well-known and state-of-the-art descriptors, such as Fourier, Wavelet, Polynomial Approximation (PA), Functional Data Analysis (FDA), Principal Component Analysis (PCA), Symbolic Aggregate Approximation (SAX), kernel PCA, Independent Component Analysis (ICA), geometrical and statistical features. The descriptors are evaluated in a classification experiment using Linear Discriminant Analysis over the descriptors computed from MFD curves from two data sets: generic shapes and rotated fish contours. Results indicate that PCA, FDA, PA and Wavelet Approximation provide the best MFD descriptors for recognition and classification tasks. (C) 2012 Elsevier B.V. All rights reserved.
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Resumo:
Functional magnetic resonance imaging (fMRI) based on BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). In cases when the task is cognitively complex or in cases of diseases, variations in shape and/or delay may reduce the reliability of results. A novel exploratory method using fMRI data, which attempts to discriminate between neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors, is introduced in this paper. This new method is based on the fusion between correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Resumo:
Plant growth analysis presents difficulties related to statistical comparison of growth rates, and the analysis of variance of primary data could guide the interpretation of results. The objective of this work was to evaluate the analysis of variance of data from distinct harvests of an experiment, focusing especially on the homogeneity of variances and the choice of an adequate ANOVA model. Data from five experiments covering different crops and growth conditions were used. From the total number of variables, 19% were originally homoscedastic, 60% became homoscedastic after logarithmic transformation, and 21% remained heteroscedastic after transformation. Data transformation did not affect the F test in one experiment, whereas in the other experiments transformation modified the F test usually reducing the number of significant effects. Even when transformation has not altered the F test, mean comparisons led to divergent interpretations. The mixed ANOVA model, considering harvest as a random effect, reduced the number of significant effects of every factor which had the F test modified by this model. Examples illustrated that analysis of variance of primary variables provides a tool for identifying significant differences in growth rates. The analysis of variance imposes restrictions to experimental design thereby eliminating some advantages of the functional growth analysis.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Abstract Background Prostate cancer is a leading cause of death in the male population, therefore, a comprehensive study about the genes and the molecular networks involved in the tumoral prostate process becomes necessary. In order to understand the biological process behind potential biomarkers, we have analyzed a set of 57 cDNA microarrays containing ~25,000 genes. Results Principal Component Analysis (PCA) combined with the Maximum-entropy Linear Discriminant Analysis (MLDA) were applied in order to identify genes with the most discriminative information between normal and tumoral prostatic tissues. Data analysis was carried out using three different approaches, namely: (i) differences in gene expression levels between normal and tumoral conditions from an univariate point of view; (ii) in a multivariate fashion using MLDA; and (iii) with a dependence network approach. Our results show that malignant transformation in the prostatic tissue is more related to functional connectivity changes in their dependence networks than to differential gene expression. The MYLK, KLK2, KLK3, HAN11, LTF, CSRP1 and TGM4 genes presented significant changes in their functional connectivity between normal and tumoral conditions and were also classified as the top seven most informative genes for the prostate cancer genesis process by our discriminant analysis. Moreover, among the identified genes we found classically known biomarkers and genes which are closely related to tumoral prostate, such as KLK3 and KLK2 and several other potential ones. Conclusion We have demonstrated that changes in functional connectivity may be implicit in the biological process which renders some genes more informative to discriminate between normal and tumoral conditions. Using the proposed method, namely, MLDA, in order to analyze the multivariate characteristic of genes, it was possible to capture the changes in dependence networks which are related to cell transformation.
Resumo:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
Resumo:
CONTEXT Although open radical cystectomy (ORC) is still the standard approach, laparoscopic radical cystectomy (LRC) and robot-assisted radical cystectomy (RARC) are increasingly performed. OBJECTIVE To report on a systematic literature review and cumulative analysis of pathologic, oncologic, and functional outcomes of RARC in comparison with ORC and LRC. EVIDENCE ACQUISITION Medline, Scopus, and Web of Science databases were searched using a free-text protocol including the terms robot-assisted radical cystectomy or da Vinci radical cystectomy or robot* radical cystectomy. RARC case series and studies comparing RARC with either ORC or LRC were collected. A cumulative analysis was conducted. EVIDENCE SYNTHESIS The searches retrieved 105 papers, 87 of which reported on pathologic, oncologic, or functional outcomes. Most series were retrospective and had small case numbers, short follow-up, and potential patient selection bias. The lymph node yield during lymph node dissection was 19 (range: 3-55), with half of the series following an extended template (yield range: 11-55). The lymph node-positive rate was 22%. The performance of lymphadenectomy was correlated with surgeon and institutional volume. Cumulative analyses showed no significant difference in lymph node yield between RARC and ORC. Positive surgical margin (PSM) rates were 5.6% (1-1.5% in pT2 disease and 0-25% in pT3 and higher disease). PSM rates did not appear to decrease with sequential case numbers. Cumulative analyses showed no significant difference in rates of surgical margins between RARC and ORC or RARC and LRC. Neoadjuvant chemotherapy use ranged from 0% to 31%, with adjuvant chemotherapy used in 4-29% of patients. Only six series reported a mean follow-up of >36 mo. Three-year disease-free survival (DFS), cancer-specific survival (CSS), and overall survival (OS) rates were 67-76%, 68-83%, and 61-80%, respectively. The 5-yr DFS, CSS, and OS rates were 53-74%, 66-80%, and 39-66%, respectively. Similar to ORC, disease of higher pathologic stage or evidence of lymph node involvement was associated with worse survival. Very limited data were available with respect to functional outcomes. The 12-mo continence rates with continent diversion were 83-100% in men for daytime continence and 66-76% for nighttime continence. In one series, potency was recovered in 63% of patients who were evaluable at 12 mo. CONCLUSIONS Oncologic and functional data from RARC remain immature, and longer-term prospective studies are needed. Cumulative analyses demonstrated that lymph node yields and PSM rates were similar between RARC and ORC. Conclusive long-term survival outcomes for RARC were limited, although oncologic outcomes up to 5 yr were similar to those reported for ORC. PATIENT SUMMARY Although open radical cystectomy (RC) is still regarded as the standard treatment for muscle-invasive bladder cancer, laparoscopic and robot-assisted RCs are becoming more popular. Templates of lymph node dissection, lymph node yields, and positive surgical margin rates are acceptable with robot-assisted RC. Although definitive comparisons with open RC with respect to oncologic or functional outcomes are lacking, early results appear comparable.
Resumo:
Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.
Resumo:
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over-or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.