925 results for Modelling lifetime data
Abstract:
The ability to accurately predict the lifetime of building components is crucial to optimizing building design, material selection and scheduling of required maintenance. This paper discusses a number of data mining methods that can be applied to predict the lifetime of metallic components, and how different sources of service life information could be integrated to form the basis of the lifetime prediction model.
Abstract:
In a seminal data mining article, Leo Breiman [1] argued that to develop effective predictive classification and regression models, we need to move away from the sole dependency on statistical algorithms and embrace a wider toolkit of modeling algorithms that includes data mining procedures. Nevertheless, many researchers still rely solely on statistical procedures when undertaking data modeling tasks; the sole reliance on these procedures has led to the development of irrelevant theory and questionable research conclusions ([1], p.199). We will outline initiatives that the HPC & Research Support group is undertaking to engage researchers with data mining tools and techniques, including a new range of seminars, workshops, and one-on-one consultations covering data mining algorithms, the relationship between data mining and the research cycle, and limitations and problems with these new algorithms. Organisational limitations and restrictions to these initiatives are also discussed.
Abstract:
Background: Birth weight and length have seasonal fluctuations. Previous analyses of birth weight by latitude effects identified seemingly contradictory results, showing both 6 and 12 monthly periodicities in weight. The aims of this paper are twofold: (a) to explore seasonal patterns in a large Danish Medical Birth Register, and (b) to explore models based on seasonal exposures and a non-linear exposure-risk relationship. Methods: Birth weight and birth length for over 1.5 million Danish singleton live births were examined for seasonality. We modelled seasonal patterns based on linear, U- and J-shaped exposure-risk relationships. We then added an extra layer of complexity by modelling weighted population-based exposure patterns. Results: The Danish data showed clear seasonal fluctuations for both birth weight and birth length. A bimodal model best fits the data; however, the amplitude of the 6 and 12 month peaks changed over time. In the modelling exercises, U- and J-shaped exposure-risk relationships generate time series with both 6 and 12 month periodicities. Changing the weightings of the population exposure risks results in unexpected properties. A J-shaped exposure-risk relationship with a diminishing population exposure over time fitted the observed seasonal pattern in the Danish birth weight data. Conclusion: In keeping with many other studies, Danish birth anthropometric data show complex and shifting seasonal patterns. We speculate that annual periodicities with non-linear exposure-risk models may underlie these findings. Understanding the nature of seasonal fluctuations can help generate candidate exposures.
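The claim that a purely annual exposure can yield both 6 and 12 month periodicities once it passes through a non-linear exposure-risk curve can be illustrated with a small sketch. This is not the paper's model: the unit-amplitude cosine exposure, the quadratic stand-in for a "U-shaped" relationship, the linear-plus-quadratic stand-in for a "J-shaped" one, and all function names are assumptions made purely for illustration.

```python
import cmath
import math


def monthly_exposure(n_months):
    """Hypothetical annual exposure: a unit-amplitude cosine sampled monthly."""
    return [math.cos(2 * math.pi * t / 12) for t in range(n_months)]


def amplitude_at_period(series, period):
    """Amplitude of the Fourier component of `series` at `period` months."""
    n = len(series)
    mean = sum(series) / n
    coeff = sum((x - mean) * cmath.exp(-2j * math.pi * t / period)
                for t, x in enumerate(series))
    return 2 * abs(coeff) / n


exposure = monthly_exposure(120)  # ten years of monthly values

# Assumed shapes for the exposure-risk relationship (illustrative only).
linear = exposure                                 # risk proportional to exposure
u_shaped = [e ** 2 for e in exposure]             # symmetric around zero exposure
j_shaped = [e + 0.5 * e ** 2 for e in exposure]   # asymmetric mix of both terms

for name, risk in [("linear", linear), ("U-shaped", u_shaped), ("J-shaped", j_shaped)]:
    print(name,
          "12-month amplitude:", round(amplitude_at_period(risk, 12), 3),
          "6-month amplitude:", round(amplitude_at_period(risk, 6), 3))
```

Under these assumptions the linear curve keeps only the 12 month component, the symmetric quadratic curve keeps only the 6 month component (because cos² θ = ½ + ½ cos 2θ), and the asymmetric curve shows both, which is one way a single annual exposure could produce the mixed periodicities described above.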
Abstract:
This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Some of the core components of data modelling are addressed. A selection of results from the first data modelling activity implemented during the second year (2010; second grade) of a current longitudinal study is reported. Data modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. Reported here are children's abilities to identify diverse and complex attributes, sort and classify data in different ways, and create and interpret models to represent their data.
Abstract:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered.
The first source of data comprises symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS), and its analysis constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real time neural activity in the brain.
Abstract:
This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Results are reported from the first year of a 3-year longitudinal study in which three classes of first-grade children (6-year-olds) and their teachers engaged in data modelling activities. The theme of Looking after our Environment, part of the children’s science curriculum, provided the task context. The goals for the two activities addressed here included engaging children in core components of data modelling, namely, selecting attributes, structuring and representing data, identifying variation in data, and making predictions from given data. Results include the various ways in which children represented and re-represented collected data, including attribute selection, and the metarepresentational competence they displayed in doing so. The “data lenses” through which the children dealt with informal inference (variation and prediction) are also reported.
Abstract:
Concerns regarding groundwater contamination with nitrate and the long-term sustainability of groundwater resources have prompted the development of a multi-layered three dimensional (3D) geological model to characterise the aquifer geometry of the Wairau Plain, Marlborough District, New Zealand. The 3D geological model, which consists of eight litho-stratigraphic units, has subsequently been used to synthesise hydrogeological and hydrogeochemical data for different aquifers. The approach aims to demonstrate how integrating water chemistry data within the physical framework of a 3D geological model can help to better understand and conceptualise groundwater systems in complex geological settings. Multivariate statistical techniques (e.g. Principal Component Analysis and Hierarchical Cluster Analysis) were applied to groundwater chemistry data to identify hydrochemical facies which are characteristic of distinct evolutionary pathways and a common hydrologic history of groundwaters. Principal Component Analysis on hydrochemical data demonstrated that natural water-rock interactions, redox potential and human agricultural impact are the key controls of groundwater quality in the Wairau Plain. Hierarchical Cluster Analysis revealed distinct hydrochemical water quality groups in the Wairau Plain groundwater system. Visualisation of the results of the multivariate statistical analyses and distribution of groundwater nitrate concentrations in the context of aquifer lithology highlighted the link between groundwater chemistry and the lithology of host aquifers. The methodology followed in this study can be applied in a variety of hydrogeological settings to synthesise geological, hydrogeological and hydrochemical data and present them in a format readily understood by a wide range of stakeholders. This enables a more efficient communication of the results of scientific studies to the wider community.
Abstract:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, because the information is gathered from the crowd, data quality is hard to manage. There are many ways to manage data quality, and reputation management is one of the common approaches. In recent years, many research teams have deployed audio or image sensors in natural environments in order to monitor the status of animals or plants. The collected data are analysed by ecologists. However, as the amount of collected data is very large and the number of ecologists is very limited, it is impossible for scientists to manually analyse all these data. Existing automated tools for processing the data are still limited in function, and their results are not yet very accurate. Therefore, researchers have turned to recruiting general citizens who are interested in helping scientific research to do pre-processing tasks such as species tagging. Although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help data analysis, the reliability of contributed data varies considerably. Therefore, this research aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects, especially acoustic sensing projects. In particular, we aim to investigate how to use reputation management to enhance data reliability. Reputation systems have been used to reduce uncertainty and improve data quality in many marketing and E-Commerce domains. The commercial organizations which have chosen to embrace reputation management and implement the technology have gained many benefits.
Data quality issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. However, research on reputation management in this area is relatively new. We therefore start our investigation by examining existing reputation systems in different domains. Then we design novel reputation management approaches for Citizen Science projects to categorise participants and data. We have investigated some critical elements which may influence data reliability in Citizen Science projects. These elements include personal information, such as location and education, and performance information, such as the ability to recognise certain bird calls. The designed reputation framework is evaluated by a series of experiments involving many participants for collecting and interpreting data, in particular, environmental acoustic data. Our research in exploring the advantages of reputation management in Citizen Science (or crowdsourcing in general) will help increase awareness among organizations that are unacquainted with its potential benefits.
Abstract:
longitudinal study of data modelling across grades 1-3. The activity engaged children in designing, implementing, and analysing a survey about their new playground. Data modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. The core components of data modelling addressed here are children’s structuring and representing of data, with a focus on their display of metarepresentational competence (diSessa, 2004). Such competence includes students’ abilities to invent or design a variety of new representations, explain their creations, understand the role they play, and critique and compare the adequacy of representations. Reported here are the ways in which the children structured and represented their data, the metarepresentational competence displayed, and links between their metarepresentational competence and conceptual competence.
Abstract:
Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help data analysis, the reliability of contributed data varies considerably. Data reliability issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. Participants may submit low quality, misleading, inaccurate, or even malicious data. Therefore, finding a way to improve data reliability has become an urgent need. This study aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects, especially acoustic sensing projects. In particular, we propose to design a reputation framework to enhance data reliability and also investigate some critical elements that should be considered when designing and developing new reputation systems.