80 resultados para Data Modelling
Resumo:
A large number of models have been derived from the two-parameter Weibull distribution and are referred to as Weibull models. They exhibit a wide range of shapes for the density and hazard functions, which makes them suitable for modelling complex failure data sets. The WPP and IWPP plot allows one to determine in a systematic manner if one or more of these models are suitable for modelling a given data set. This paper deals with this topic.
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Resumo:
The performance of three analytical methods for multiple-frequency bioelectrical impedance analysis (MFBIA) data was assessed. The methods were the established method of Cole and Cole, the newly proposed method of Siconolfi and co-workers and a modification of this procedure. Method performance was assessed from the adequacy of the curve fitting techniques, as judged by the correlation coefficient and standard error of the estimate, and the accuracy of the different methods in determining the theoretical values of impedance parameters describing a set of model electrical circuits. The experimental data were well fitted by all curve-fitting procedures (r = 0.9 with SEE 0.3 to 3.5% or better for most circuit-procedure combinations). Cole-Cole modelling provided the most accurate estimates of circuit impedance values, generally within 1-2% of the theoretical values, followed by the Siconolfi procedure using a sixth-order polynomial regression (1-6% variation). None of the methods, however, accurately estimated circuit parameters when the measured impedances were low (<20 Omega) reflecting the electronic limits of the impedance meter used. These data suggest that Cole-Cole modelling remains the preferred method for the analysis of MFBIA data.
Resumo:
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
Resumo:
The movement of chemicals through the soil to the groundwater or discharged to surface waters represents a degradation of these resources. In many cases, serious human and stock health implications are associated with this form of pollution. The chemicals of interest include nutrients, pesticides, salts, and industrial wastes. Recent studies have shown that current models and methods do not adequately describe the leaching of nutrients through soil, often underestimating the risk of groundwater contamination by surface-applied chemicals, and overestimating the concentration of resident solutes. This inaccuracy results primarily from ignoring soil structure and nonequilibrium between soil constituents, water, and solutes. A multiple sample percolation system (MSPS), consisting of 25 individual collection wells, was constructed to study the effects of localized soil heterogeneities on the transport of nutrients (NO3-, Cl-, PO43-) in the vadose zone of an agricultural soil predominantly dominated by clay. Very significant variations in drainage patterns across a small spatial scale were observed tone-way ANOVA, p < 0.001) indicating considerable heterogeneity in water flow patterns and nutrient leaching. Using data collected from the multiple sample percolation experiments, this paper compares the performance of two mathematical models for predicting solute transport, the advective-dispersion model with a reaction term (ADR), and a two-region preferential flow model (TRM) suitable for modelling nonequilibrium transport. These results have implications for modelling solute transport and predicting nutrient loading on a larger scale. (C) 2001 Elsevier Science Ltd. All rights reserved.
Resumo:
The nuclear magnetic resonance (NMR) spin-spin relaxation time (T-2) is related to the radiation-dependent concentration of polymer formed in polymer gel dosimeters manufactured from monomers in an aqueous gelatin matrix. Changes in T-2 with time post-irradiation have been reported in the literature but their nature is not fully understood. We investigated those changes with time after irradiation using FT-Raman spectroscopy and the precise determination of T-2 at high magnetic field in a polymer gel dosimeter, A model of fast exchange of magnetization taking into account ongoing gelation and strengthening of the gelatin matrix as well as the polymerization of the monomers with time is presented. Published data on the changes of T-2 in gelatin gels as a function of post-manufacture time are used and fitted closely by the model presented. The same set of parameters characterizing the variations of T-2 in gelatin gels and the increasing concentration of polymer determined from Fr-Raman spectroscopy are used successfully in the modelling of irradiated polymer gel dosimeters. Minimal variations in T-2 in an irradiated PAG dosimeter are observed after 13 h.
Resumo:
Functional magnetic resonance imaging (FMRI) analysis methods can be quite generally divided into hypothesis-driven and data-driven approaches. The former are utilised in the majority of FMRI studies, where a specific haemodynamic response is modelled utilising knowledge of event timing during the scan, and is tested against the data using a t test or a correlation analysis. These approaches often lack the flexibility to account for variability in haemodynamic response across subjects and brain regions which is of specific interest in high-temporal resolution event-related studies. Current data-driven approaches attempt to identify components of interest in the data, but currently do not utilise any physiological information for the discrimination of these components. Here we present a hypothesis-driven approach that is an extension of Friman's maximum correlation modelling method (Neurolmage 16, 454-464, 2002) specifically focused on discriminating the temporal characteristics of event-related haemodynamic activity. Test analyses, on both simulated and real event-related FMRI data, will be presented.
Resumo:
A two-component survival mixture model is proposed to analyse a set of ischaemic stroke-specific mortality data. The survival experience of stroke patients after index stroke may be described by a subpopulation of patients in the acute condition and another subpopulation of patients in the chronic phase. To adjust for the inherent correlation of observations due to random hospital effects, a mixture model of two survival functions with random effects is formulated. Assuming a Weibull hazard in both components, an EM algorithm is developed for the estimation of fixed effect parameters and variance components. A simulation study is conducted to assess the performance of the two-component survival mixture model estimators. Simulation results confirm the applicability of the proposed model in a small sample setting. Copyright (C) 2004 John Wiley Sons, Ltd.
Resumo:
Computational models complement laboratory experimentation for efficient identification of MHC-binding peptides and T-cell epitopes. Methods for prediction of MHC-binding peptides include binding motifs, quantitative matrices, artificial neural networks, hidden Markov models, and molecular modelling. Models derived by these methods have been successfully used for prediction of T-cell epitopes in cancer, autoimmunity, infectious disease, and allergy. For maximum benefit, the use of computer models must be treated as experiments analogous to standard laboratory procedures and performed according to strict standards. This requires careful selection of data for model building, and adequate testing and validation. A range of web-based databases and MHC-binding prediction programs are available. Although some available prediction programs for particular MHC alleles have reasonable accuracy, there is no guarantee that all models produce good quality predictions. In this article, we present and discuss a framework for modelling, testing, and applications of computational methods used in predictions of T-cell epitopes. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
Pollution by polycyclic aromatic hydrocarbons(PAHs) is widespread due to unsuitable disposal of industrial waste. They are mostly defined as priority pollutants by environmental protection authorities worldwide. Phenanthrene, a typical PAH, was selected as the target in this paper. The PAH-degrading mixed culture, named ZM, was collected from a petroleum contaminated river bed. This culture was injected into phenanthrene solutions at different concentrations to quantify the biodegradation process. Results show near-complete removal of phenanthrene in three days of biodegradation if the initial phenanthrene concentration is low. When the initial concentration is high, the removal rate is increased but 20%-40% of the phenanthrene remains at the end of the experiment. The biomass shows a peak on the third day due to the combined effects of microbial growth and decay. Another peak is evident for cases with a high initial concentration, possibly due to production of an intermediate metabolite. The pH generally decreased during biodegradation because of the production of organic acid. Two phenomenological models were designed to simulate the phenanthrene biodegradation and biomass growth. A relatively simple model that does not consider the intermediate metabolite and its inhibition of phenanthrene biodegradation cannot fit the observed data. A modified Monod model that considered an intermediate metabolite (organic acid) and its inhibiting reversal effect reasonably depicts the experimental results.