813 results for Imputation techniques
Abstract:
When it comes to real-life data sets, parts of the whole set are often unavailable. This problem can have various origins and therefore follows different patterns; in the literature it is known as Missing Data. It can be handled in several ways, from discarding incomplete observations, to estimating what the missing values originally were, to simply ignoring the fact that some values are missing. The methods used to estimate missing data are called Imputation Methods. The work presented in this thesis has two main goals. The first is to determine whether any interactions exist between Missing Data, Imputation Methods and Supervised Classification algorithms when they are applied together. For this first problem we consider a scenario in which the databases used are discrete, where discrete means that no relation between observations is assumed. These datasets underwent processes involving different combinations of the three components mentioned. The outcome showed that the missing data pattern strongly influences the results produced by a classifier, and that in some cases the complex imputation techniques investigated in the thesis obtained better results than simple ones. The second goal of this work is to propose a new imputation strategy, this time constraining the previous problem to a special kind of dataset: multivariate Time Series. We designed new imputation techniques for this particular domain and combined them with some of the contrasted strategies tested in the previous chapter of the thesis. The time series were likewise subjected to missing data and imputation processes, in order to finally propose an overall better imputation method. The final chapter of this work presents a real-world example, a water quality prediction problem.
The databases that characterize this problem contain their own original latent values, which provides a real-world benchmark for testing the algorithms developed in this thesis.
Abstract:
All of the imputation techniques usually applied for replacing values below the detection limit in compositional data sets have adverse effects on the variability. In this work we propose a modification of the EM algorithm that is applied using the additive log-ratio transformation. This new strategy is applied to a compositional data set and the results are compared with the usual imputation techniques.
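The additive log-ratio (alr) transformation at the core of the proposed strategy can be illustrated with a minimal sketch; the function names `alr` and `alr_inv` are illustrative, and the modified EM step itself is omitted — only the coordinate transform in which it would operate is shown:

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform: map a D-part composition
    (positive parts summing to 1) to R^(D-1), using the last part
    as the common divisor."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_inv(y):
    """Inverse alr: recover the closed composition from alr coordinates."""
    z = np.append(np.exp(y), 1.0)
    return z / z.sum()
```

Values below the detection limit would be treated as censored in alr space, imputed there by the modified EM algorithm, and mapped back with `alr_inv`, which preserves the unit-sum constraint by construction.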
Abstract:
The amount of biological data has grown exponentially in recent decades. Modern biotechnologies, such as microarrays and next-generation sequencing, are capable of producing massive amounts of biomedical data in a single experiment. As the amount of data grows rapidly, there is an urgent need for reliable computational methods for analyzing and visualizing it. This thesis addresses this need by studying how to efficiently and reliably analyze and visualize high-dimensional data, especially data obtained from gene expression microarray experiments. First, we study ways to improve the quality of microarray data by replacing (imputing) the missing data entries with estimated values. Missing value imputation is commonly used to make the original incomplete data complete, thus making it easier to analyze with statistical and computational methods. Our novel approach was to use curated external biological information as a guide for the missing value imputation. Secondly, we studied the effect of missing value imputation on downstream data analysis methods such as clustering. We compared multiple recent imputation algorithms on 8 publicly available microarray data sets. It was observed that missing value imputation is indeed a rational way to improve the quality of biological data. The research revealed differences between the clustering results obtained with different imputation methods. On most data sets, the simple and fast k-NN imputation was good enough, but there was also a need for more advanced imputation methods, such as Bayesian Principal Component Analysis (BPCA). Finally, we studied the visualization of biological network data. Biological interaction networks are examples of the outcome of multiple biological experiments, such as those using gene microarray techniques.
Such networks are typically very large and highly connected, so fast algorithms are needed to produce visually pleasant layouts. A computationally efficient way to produce layouts of large biological interaction networks was developed. The algorithm uses multilevel optimization within a standard force-directed graph layout algorithm.
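As a rough illustration of the k-NN imputation found adequate on most data sets, here is a minimal sketch; the function name `knn_impute`, the unweighted mean over donors, and the Euclidean distance on jointly observed columns are assumptions, and real microarray pipelines typically use more refined weighting:

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs in each row with the mean of the k nearest complete
    rows, where distance is Euclidean over the row's observed columns."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]       # candidate donor rows
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        d = np.sqrt(((complete[:, obs] - row[obs]) ** 2).sum(axis=1))
        nn = complete[np.argsort(d)[:k]]         # k nearest donors
        filled[i, miss] = nn[:, miss].mean(axis=0)
    return filled
```

The same skeleton extends to distance-weighted means, which is the variant most commonly used for gene expression matrices.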
Abstract:
Attrition in longitudinal studies can lead to biased results. The study is motivated by the unexpected observation that alcohol consumption decreased despite increased availability, which may be due to sample attrition of heavy drinkers. Several imputation methods have been proposed, but rarely compared in longitudinal studies of alcohol consumption. The imputation of consumption-level measurements is computationally particularly challenging because alcohol consumption is a semi-continuous variable (dichotomous drinking status and continuous volume among drinkers) and because of the non-normality of the continuous part. Data come from a longitudinal study in Denmark with four waves (2003-2006) and 1771 individuals at baseline. Five techniques for missing data are compared: last value carried forward (LVCF) as a single imputation method, and Hotdeck, Heckman modelling, multivariate imputation by chained equations (MICE), and a Bayesian approach as multiple imputation methods. Predictive mean matching was used to account for non-normality: instead of imputing regression estimates, "real" observed values from similar cases are imputed. The methods were also compared on a simulated dataset. The simulation showed that the Bayesian approach yielded the least biased estimates. The finding of no increase in consumption levels despite higher availability remained unaltered. Copyright (C) 2011 John Wiley & Sons, Ltd.
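Predictive mean matching, as described above, replaces a missing value with an observed value from a case with a similar predicted mean. A minimal single-donor sketch, where the name `pmm_impute` and the single-predictor linear model are illustrative (MICE draws donors from a small pool and iterates across chained equations):

```python
import numpy as np

def pmm_impute(x, y):
    """Predictive mean matching (single imputation, one donor):
    regress y on x over complete cases, then replace each missing y
    with the *observed* y of the case whose fitted value is closest."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    obs = ~np.isnan(y)
    # least-squares line fitted on complete cases only
    b1, b0 = np.polyfit(x[obs], y[obs], 1)
    pred = b0 + b1 * x
    y_out = y.copy()
    for i in np.where(~obs)[0]:
        donor = np.argmin(np.abs(pred[obs] - pred[i]))
        y_out[i] = y[obs][donor]
    return y_out
```

Because every imputed value is an actually observed one, the method cannot produce impossible values (e.g. negative consumption volumes), which is what makes it attractive for non-normal, semi-continuous data.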
Abstract:
The question of the criminal liability of technical intermediaries is a central and current issue in the regulation of cyberspace. Not only are the economic implications enormous, but the entire legal framework of the criminal liability of technical intermediaries is at stake. The Internet environment has specific features that make it difficult to impute liability to the author of the unlawful activity, who may be out of reach or insolvent. Prosecuting the technical intermediaries thus becomes a viable option for the authorities responsible for punishing offences, given the intermediaries' solvency and the fact that they are more easily identifiable. As a result, these intermediaries find themselves caught up in the judicial machinery for having merely facilitated the commission of the activity in question, without having taken any part in carrying it out. The absence in the Canadian legislative corpus of a liability regime specifically applicable to technical intermediaries obliges us to delineate the criteria that trigger their criminal liability, starting from the "guiding principles" of imputability emerging from several national and international texts. In this context, the thesis will study, first, the conditions giving rise to the criminal liability of technical intermediaries in Canadian criminal law and, second, will answer the question of whether Canadian criminal law on the imputability of technical intermediaries complies with the guiding principles emerging from international norms and practices.
Abstract:
Long-term measurements of CO2 flux can be obtained using the eddy covariance technique, but these datasets are affected by gaps which hinder the estimation of robust long-term means and annual ecosystem exchanges. We compare results obtained using three gap-filling techniques: multiple regression (MR), multiple imputation (MI), and artificial neural networks (ANNs), applied to a one-year dataset of hourly CO2 flux measurements collected in Lutjewad, over a flat agricultural area near the Wadden Sea dike in the north of the Netherlands. The dataset was separated into two subsets: a learning set and a validation set. The performance of the gap-filling techniques was analysed by calculating statistical criteria: coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), maximum absolute error (MaxAE), and mean square bias (MSB). The gap-filling accuracy is seasonally dependent, with better results in cold seasons. The highest accuracy is obtained with the ANN technique, which is also less sensitive to environmental and seasonal conditions. We argue that filling gaps directly on measured CO2 fluxes is more advantageous than the common method of filling gaps on the calculated net ecosystem exchange, because ANN is an empirical method and smaller scatter is expected when gap filling is applied directly to measurements.
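The validation criteria listed above can be computed as follows. This is a generic sketch; in particular, the mean square bias is taken here as the square of the mean error, which is an assumption about the paper's definition:

```python
import numpy as np

def gapfill_scores(y_true, y_pred):
    """Validation criteria for comparing gap-filling techniques on the
    held-out set: R2, RMSE, MAE, maximum absolute error, and mean
    square bias (here: squared mean error)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_pred - y_true
    ss_res = (err ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt((err ** 2).mean()),
        "MAE": np.abs(err).mean(),
        "MaxAE": np.abs(err).max(),
        "MSB": err.mean() ** 2,
    }
```

Computed over the validation subset for each technique, these scores support the kind of season-by-season comparison the study reports.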
Abstract:
Spatio-temporal modelling is an area of increasing importance, in which models and methods have often been developed to deal with specific applications. In this study, a spatio-temporal model was used to estimate daily rainfall data. Rainfall records from several weather stations, obtained from the Agritempo system for two climatically homogeneous zones, were used. Rainfall values estimated for two fixed dates (January 1 and May 1, 2012) using the spatio-temporal model were compared with the geostatistical techniques of ordinary kriging and ordinary cokriging, with altitude as an auxiliary variable. The spatio-temporal model produced estimates of daily precipitation more than 17% better than kriging and cokriging in the first zone, and more than 18% better in the second zone. The spatio-temporal model proved to be a versatile technique, adapting to different seasons and dates.
Abstract:
The aim of this investigation was to compare the skeletal stability of three different rigid fixation methods after mandibular advancement. Fifty-five class II malocclusion patients treated with bilateral sagittal split ramus osteotomy and mandibular advancement were selected for this retrospective study. Group 1 (n = 17) had miniplates with monocortical screws, Group 2 (n = 16) had bicortical screws, and Group 3 (n = 22) had the osteotomy fixed by means of the hybrid technique. Cephalograms were taken preoperatively, 1 week into the postoperative period, and 6 months after the orthognathic surgery. Linear and angular changes of the cephalometric landmarks of the chin region were measured at each time point, and the changes at each landmark were determined for the intervals between them. Postoperative changes in mandibular shape were analyzed to determine the stability of the fixation methods. There was minimal difference in relapse of the mandibular advancement among the three groups, and statistical analysis showed no significant difference in postoperative stability. However, a positive correlation between the amount of advancement and the amount of postoperative relapse was demonstrated by linear multiple regression (p < 0.05). It can be concluded that all three techniques can be used to obtain stable postoperative results in mandibular advancement at 6 months.
Abstract:
Quantification of dermal exposure to pesticides in rural workers, used in risk assessment, can be performed with different techniques such as patches or whole-body evaluation. However, the wide variety of methods can jeopardize the process by producing disparate results, depending on the principles underlying sample collection. A critical review was thus performed on the main techniques for quantifying dermal exposure, calling attention to this issue and to the need to establish a single methodology for quantification of dermal exposure in rural workers. Such harmonization of the different techniques should help achieve safer and healthier working conditions. Techniques that can provide reliable exposure data are an essential first step towards avoiding harm to workers' health.
Abstract:
The Centers for High Cost Medication (Centros de Medicação de Alto Custo, CEDMAC) of the São Paulo Health Department were instituted by a project in partnership with the Clinical Hospital of the Faculty of Medicine, USP, sponsored by the Foundation for Research Support of the State of São Paulo (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP), aimed at forming a statewide network for the comprehensive care of patients referred for the use of immunobiological agents in rheumatological diseases. The CEDMAC of the Hospital de Clínicas, Universidade Estadual de Campinas (HC-Unicamp), implemented by the Division of Rheumatology, Faculty of Medical Sciences, identified the need to standardize the conduct of the multidisciplinary team, given the specificity of the care involved, and recognized the importance of describing its operational and technical processes in manual format. The aim of this study is to present the methodology applied to the elaboration of the CEDMAC/HC-Unicamp Manual as an institutional tool, with the aim of offering the best assistance and administrative quality. In the methodology for preparing manuals at HC-Unicamp since 2008, the premise has been to obtain a document that is participatory and multidisciplinary, focused on work processes integrated with institutional rules, with objective and didactic descriptions, in a standardized format, and disseminated electronically. The CEDMAC/HC-Unicamp Manual was elaborated over 10 months, with the involvement of the entire multidisciplinary team, and comprises 19 chapters on work processes and techniques, in addition to those concerning the organizational structure and its annexes. Published in the electronic portal of HC Manuals in July 2012 as an e-Book (ISBN 978-85-63274-17-5), the manual has been a valuable instrument for guiding professionals in healthcare, teaching and research activities.
Abstract:
The aim of this study was to evaluate three transfer techniques used to obtain working casts of implant-supported prostheses, through the marginal misfit and the strain induced in the metallic framework. Thirty working casts were obtained from a metallic master cast, each containing two implant analogues simulating a clinical situation of a three-unit implant-supported fixed prosthesis, according to the following transfer impression techniques: Group A, squared transfers splinted with dental floss and acrylic resin, sectioned and re-splinted; Group B, squared transfers splinted with dental floss and bis-acrylic resin; and Group N, squared transfers not splinted. A metallic framework was made from the metallic master cast for the marginal misfit and strain measurements. The misfit between the metallic framework and the working casts was evaluated with an optical microscope following the single-screw test protocol. Under the same conditions, the strain was evaluated using strain gauges placed on the metallic framework. The data were submitted to one-way ANOVA followed by Tukey's test (α = 5%). For both marginal misfit and strain, there were statistically significant differences between Groups A and N (p < 0.01) and between Groups B and N (p < 0.01), with greater values for Group N. According to Pearson's test, there was a positive correlation between misfit and strain (r = 0.5642). The results of this study showed that the impression techniques with splinted transfers achieved better accuracy than the non-splinted one, regardless of the splinting material used.
Abstract:
The El Niño Southern Oscillation (ENSO) is a climatic phenomenon related to the inter-annual variability of global meteorological patterns, influencing sea surface temperature and rainfall variability. It influences human health indirectly through extreme temperature and moisture conditions that may accelerate the spread of some vector-borne viral diseases, like dengue fever (DF). This work examines the spatial distribution of the association between ENSO and DF in the countries of the Americas during 1995-2004, a period that includes the 1997-1998 El Niño, one of the most important climatic events of the 20th century. Data on the Southern Oscillation index (SOI), indicating El Niño-La Niña activity, were obtained from the Australian Bureau of Meteorology. The annual DF incidence (AIy) by country was computed using Pan American Health Organization data. SOI and AIy values were standardised as deviations from the mean and plotted as bar-line graphs. The regression coefficient values between SOI and AIy (rSOI,AI) were calculated and spatially interpolated by an inverse distance weighted algorithm. The results indicate that of the five years registering the highest numbers of cases (1998, 2002, 2001, 2003 and 1997), four had El Niño activity. In the southern hemisphere, the annual spatial weighted mean centre of epidemics moved southward, from 6° 31' S in 1995 to 21° 12' S in 1999, and the rSOI,AI values were negative in Cuba, Belize, Guyana and Costa Rica, indicating synchrony between higher DF incidence rates and higher El Niño activity. The rSOI,AI map allows visualisation of a graded surface with higher values of the ENSO-DF association in Mexico, Central America, the northern Caribbean islands and the extreme north-northwest of South America.
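The inverse distance weighted interpolation used above to map the rSOI,AI values can be sketched as follows; the function name `idw` and the power-2 default are illustrative, not taken from the study:

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted interpolation: each query point gets
    a weighted mean of the known values, with weights 1/distance**power.
    A query point coinciding with a station returns that station's value."""
    xy_known = np.asarray(xy_known, float)
    values = np.asarray(values, float)
    out = []
    for q in np.asarray(xy_query, float):
        d = np.linalg.norm(xy_known - q, axis=1)
        if d.min() == 0:                      # exact hit on a known point
            out.append(values[d.argmin()])
            continue
        w = 1.0 / d ** power
        out.append((w * values).sum() / w.sum())
    return np.array(out)
```

Evaluating `idw` over a regular grid of coordinates yields the kind of graded association surface described in the abstract.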
Abstract:
The aim of this study was to compare the performance of the following techniques for isolating volatiles of importance for the aroma/flavor of fresh cashew apple juice: dynamic headspace analysis using Porapak Q® as trap, solvent extraction with and without further concentration of the isolate, and solid-phase microextraction (DVB/CAR/PDMS fiber). A total of 181 compounds were identified, of which 44 were esters, 20 terpenes, 19 alcohols, 17 hydrocarbons, 15 ketones and 14 aldehydes, among others. Sensory evaluation of the gas chromatography effluents revealed esters (n = 24) and terpenes (n = 10) as the most important aroma compounds. The four techniques were efficient in isolating esters, a chemical class of high impact in the cashew aroma/flavor. However, the dynamic headspace methodology produced an isolate in which the analytes were in greater concentration, which facilitates their identification (by gas chromatography-mass spectrometry) and their sensory evaluation in the chromatographic effluents. Solvent extraction (dichloromethane) without further concentration of the isolate was the most efficient methodology for the isolation of terpenes. Because these two techniques also isolated, in greater concentration, the volatiles from other chemical classes important to the cashew aroma, such as aldehydes and alcohols, they were considered the most advantageous for the study of cashew aroma/flavor.
Abstract:
The aim was to perform a comparative evaluation of the mechanical resistance of simulated mandibular body fractures repaired using different fixation techniques with two different brands of 2.0 mm locking fixation systems. Four aluminum hemimandibles with linear sectioning simulating a mandibular body fracture were used as the substrates and were fixed using the two techniques and the two brands of fixation plate. These were divided into four groups: Groups I and II were fixed with one four-hole plate with four 6 mm screws in the tension zone and one four-hole plate with four 10 mm screws in the compression zone; Groups III and IV were fixed with one four-hole plate with four 6 mm screws in the neutral zone. Fixation plates manufactured by Tóride were used for Groups I and III, and by Traumec for Groups II and IV. The hemimandibles were submitted to vertical, linear load testing in an Instron 4411 servohydraulic mechanical testing unit, and the loads at displacements of 3 mm, 5 mm and 7 mm, as well as the peak loads, were measured. Means and standard deviations were evaluated by analysis of variance at a significance level of 5%. The only significant difference between the brands was seen at a displacement of 7 mm. Comparing the techniques, Groups I and II showed higher mechanical strength than Groups III and IV, as expected. For the treatment of linear mandibular body fractures, two locking plates, one in the tension zone and another in the compression zone, provide greater mechanical strength than a single locking plate in the neutral zone.