8 resultados para data-mining application

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Precision horticulture and spatial analysis applied to orchards are a growing and evolving part of precision agriculture technology. The aim of this discipline is to reduce production costs by monitoring and analysing orchard-derived information to improve crop performance in an environmentally sound manner. Georeferencing and geostatistical analysis coupled to point-specific data mining allow to devise and implement management decisions tailored within the single orchard. Potential applications range from the opportunity to verify in real time along the season the effectiveness of cultural practices to achieve the production targets in terms of fruit size, number, yield and, in a near future, fruit quality traits. These data will impact not only the pre-harvest but their effect will extend to the post-harvest sector of the fruit chain. Chapter 1 provides an updated overview on precision horticulture , while in Chapter 2 a preliminary spatial statistic analysis of the variability in apple orchards is provided before and after manual thinning; an interpretation of this variability and how it can be managed to maximize orchard performance is offered. Then in Chapter 3 a stratification of spatial data into management classes to interpret and manage spatial variation on the orchard is undertaken. An inverse model approach is also applied to verify whether the crop production explains environmental variation. In Chapter 4 an integration of the techniques adopted before is presented. A new key for reading the information gathered within the field is offered. The overall goal of this Dissertation was to probe into the feasibility, the desirability and the effectiveness of a precision approach to fruit growing, following the lines of other areas of agriculture that already adopt this management tool. As existing applications of precision horticulture already had shown, crop specificity is an important factor to be accounted for. This work focused on apple because of its importance in the area where the work was carried out, and worldwide.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In the last years, Intelligent Tutoring Systems have been a very successful way for improving learning experience. Many issues must be addressed until this technology can be defined mature. One of the main problems within the Intelligent Tutoring Systems is the process of contents authoring: knowledge acquisition and manipulation processes are difficult tasks because they require a specialised skills on computer programming and knowledge engineering. In this thesis we discuss a general framework for knowledge management in an Intelligent Tutoring System and propose a mechanism based on first order data mining to partially automate the process of knowledge acquisition that have to be used in the ITS during the tutoring process. Such a mechanism can be applied in Constraint Based Tutor and in the Pseudo-Cognitive Tutor. We design and implement a part of the proposed architecture, mainly the module of knowledge acquisition from examples based on first order data mining. We then show that the algorithm can be applied at least two different domains: first order algebra equation and some topics of C programming language. Finally we discuss the limitation of current approach and the possible improvements of the whole framework.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this work, new tools in atmospheric pollutant sampling and analysis were applied in order to go deeper in source apportionment study. The project was developed mainly by the study of atmospheric emission sources in a suburban area influenced by a municipal solid waste incinerator (MSWI), a medium-sized coastal tourist town and a motorway. Two main research lines were followed. For what concerns the first line, the potentiality of the use of PM samplers coupled with a wind select sensor was assessed. Results showed that they may be a valid support in source apportionment studies. However, meteorological and territorial conditions could strongly affect the results. Moreover, new markers were investigated, particularly focusing on the processes of biomass burning. OC revealed a good biomass combustion process indicator, as well as all determined organic compounds. Among metals, lead and aluminium are well related to the biomass combustion. Surprisingly PM was not enriched of potassium during bonfire event. The second research line consists on the application of Positive Matrix factorization (PMF), a new statistical tool in data analysis. This new technique was applied to datasets which refer to different time resolution data. PMF application to atmospheric deposition fluxes identified six main sources affecting the area. The incinerator’s relative contribution seemed to be negligible. PMF analysis was then applied to PM2.5 collected with samplers coupled with a wind select sensor. The higher number of determined environmental indicators allowed to obtain more detailed results on the sources affecting the area. Vehicular traffic revealed the source of greatest concern for the study area. Also in this case, incinerator’s relative contribution seemed to be negligible. Finally, the application of PMF analysis to hourly aerosol data demonstrated that the higher the temporal resolution of the data was, the more the source profiles were close to the real one.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this thesis the evolution of the techno-social systems analysis methods will be reported, through the explanation of the various research experience directly faced. The first case presented is a research based on data mining of a dataset of words association named Human Brain Cloud: validation will be faced and, also through a non-trivial modeling, a better understanding of language properties will be presented. Then, a real complex system experiment will be introduced: the WideNoise experiment in the context of the EveryAware european project. The project and the experiment course will be illustrated and data analysis will be displayed. Then the Experimental Tribe platform for social computation will be introduced . It has been conceived to help researchers in the implementation of web experiments, and aims also to catalyze the cumulative growth of experimental methodologies and the standardization of tools cited above. In the last part, three other research experience which already took place on the Experimental Tribe platform will be discussed in detail, from the design of the experiment to the analysis of the results and, eventually, to the modeling of the systems involved. The experiments are: CityRace, about the measurement of human traffic-facing strategies; laPENSOcosì, aiming to unveil the political opinion structure; AirProbe, implemented again in the EveryAware project framework, which consisted in monitoring air quality opinion shift of a community informed about local air pollution. At the end, the evolution of the technosocial systems investigation methods shall emerge together with the opportunities and the threats offered by this new scientific path.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Autism Spectrum Disorders (ASDs) describe a set of neurodevelopmental disorders. ASD represents a significant public health problem. Currently, ASDs are not diagnosed before the 2nd year of life but an early identification of ASDs would be crucial as interventions are much more effective than specific therapies starting in later childhood. To this aim, cheap an contact-less automatic approaches recently aroused great clinical interest. Among them, the cry and the movements of the newborn, both involving the central nervous system, are proposed as possible indicators of neurological disorders. This PhD work is a first step towards solving this challenging problem. An integrated system is presented enabling the recording of audio (crying) and video (movements) data of the newborn, their automatic analysis with innovative techniques for the extraction of clinically relevant parameters and their classification with data mining techniques. New robust algorithms were developed for the selection of the voiced parts of the cry signal, the estimation of acoustic parameters based on the wavelet transform and the analysis of the infant’s general movements (GMs) through a new body model for segmentation and 2D reconstruction. In addition to a thorough literature review this thesis presents the state of the art on these topics that shows that no studies exist concerning normative ranges for newborn infant cry in the first 6 months of life nor the correlation between cry and movements. Through the new automatic methods a population of control infants (“low-risk”, LR) was compared to a group of “high-risk” (HR) infants, i.e. siblings of children already diagnosed with ASD. A subset of LR infants clinically diagnosed as newborns with Typical Development (TD) and one affected by ASD were compared. The results show that the selected acoustic parameters allow good differentiation between the two groups. This result provides new perspectives both diagnostic and therapeutic.