973 results for MISSING VALUE ESTIMATION


Relevance:

30.00%

Publisher:

Abstract:

Renewable energy is growing in demand, and the manufacture of solar cells and photovoltaic arrays has advanced dramatically in recent years. This is evidenced by the fact that photovoltaic production has doubled every 2 years, increasing by an average of 48% each year since 2002. After a general overview of solar cell operation and its model, this thesis covers the three generations of photovoltaic solar cell technology and then the motivation for dedicating research to nanostructured solar cells. For current-generation solar cells, efficiency depends on several factors, such as photon capture, photon reflection, carrier generation by photons, and carrier transport and collection; in particular, it depends on the absorption of photons. The absorption coefficient, α, and its dependence on the wavelength, λ, are of major concern for improving efficiency. Nano-silicon structures (quantum wells and quantum dots) have a unique advantage over bulk and thin-film crystalline silicon: multiple direct and indirect band gaps can be realized by appropriate size control of the quantum wells. This enables photons at multiple wavelengths of the solar spectrum to be absorbed efficiently. There is limited research on the calculation of the absorption coefficient in silicon nanostructures. We present a theoretical approach to calculate the absorption coefficient using quantum mechanical calculations of the interaction of photons with the electrons of the valence band. One model is that the oscillator strength of the direct optical transitions is enhanced by the quantum confinement effect in Si nanocrystallites. These kinds of quantum wells can be realized in practice in porous silicon. The absorption coefficient shows a peak of 64638.2 cm⁻¹ at λ = 343 nm, at a photon energy of ξ = 3.49 eV (λ = 355.532 nm). I have shown that a large value of the absorption coefficient α, comparable to that of bulk silicon, is possible in silicon QDs because of carrier confinement.
Our results show that we can enhance the absorption coefficient by an order of magnitude while at the same time obtaining a nearly constant absorption coefficient curve over the visible spectrum. The validity of the plots is verified by their correlation with experimental photoluminescence plots. A very generic comparison of the efficiency of a p-i-n junction solar cell is given for a cell with and without QDs. The design and fabrication technique is discussed in brief. I have shown that by using QDs in the intrinsic region of a cell, we can improve the efficiency by a factor of 1.865. Thus, for a first-generation solar cell with an efficiency of 26%, the efficiency can be improved to nearly 48.5% by using QDs.
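
The wavelength-energy relation underlying the absorption figures, and the quoted efficiency scaling, can be checked with a short sketch. The constant hc ≈ 1239.84 eV·nm is standard; the 26% baseline and the 1.865 factor are the numbers quoted above.

```python
# Minimal sketch: photon wavelength from energy, plus the quoted efficiency
# scaling. hc ~ 1239.84 eV*nm is a standard physical constant.

H_C_EV_NM = 1239.84

def wavelength_nm(energy_ev):
    """Wavelength (nm) of a photon with the given energy (eV)."""
    return H_C_EV_NM / energy_ev

lam = wavelength_nm(3.49)   # ~355 nm, matching the wavelength quoted for 3.49 eV
improved = 0.26 * 1.865     # 26% baseline scaled by the quoted 1.865 factor
```

This reproduces the ~48.5% improved efficiency figure from a 26% baseline.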

Relevance:

30.00%

Publisher:

Abstract:

A basic approach to studying an NVH problem is to break the system down into three basic elements: source, path, and receiver. While the receiver (response) and the transfer path can be measured, it is difficult to measure the source (forces) acting on the system. It therefore becomes necessary to predict these forces in order to know how they influence the responses, which requires inverting the transfer path. The Singular Value Decomposition (SVD) method is used to decompose the transfer path matrix into its principal components, which is required for the inversion. The usual approach to force prediction rejects the small singular values obtained during SVD by setting a threshold, as these small values dominate the inverse matrix. An ill-chosen threshold, however, may reject important singular values, severely affecting force prediction. The new approach discussed in this report looks at the column space of the transfer path matrix, which is the basis for the predicted response. The response participation is an indication of how the small singular values influence the force participation. The ability to accurately reconstruct the response vector is important to establish confidence in the force vector prediction. The goal of this report is to suggest, through examples, a solution that is mathematically feasible, physically meaningful, and numerically more efficient. This understanding adds new insight into the effects of the current code and into how to apply these algorithms and this understanding to new codes.
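
The thresholded inversion described above can be sketched in a few lines of NumPy: invert the transfer path matrix via SVD, zeroing singular values below a relative threshold so they cannot dominate the inverse. The matrix, threshold, and force vector below are illustrative assumptions, not values from the report.

```python
import numpy as np

def truncated_pinv(H, rel_tol=1e-3):
    """Pseudoinverse of the transfer path matrix H, rejecting singular
    values smaller than rel_tol times the largest one."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s_inv = np.where(s > rel_tol * s[0], 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_inv) @ U.T

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))          # 6 measured responses, 3 unknown forces
f_true = np.array([1.0, -2.0, 0.5])  # operating forces to recover
x = H @ f_true                       # measured response vector
f_est = truncated_pinv(H) @ x        # predicted forces
```

With a well-conditioned H and noise-free responses, the predicted forces match the true ones; the interesting cases discussed in the report arise when small singular values carry real force information.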

Relevance:

30.00%

Publisher:

Abstract:

This thesis develops high-performance real-time signal processing modules for direction of arrival (DOA) estimation for localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization in a wide range of applications. As the antenna array size increases, the complexity of the signal processing algorithms increases, making it increasingly difficult to satisfy real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms that offer considerable improvement over traditional algorithms, especially for systems with a larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field-programmable gate arrays (FPGAs), single instruction multiple data (SIMD) hardware, or application-specific integrated circuits (ASICs), which offer a large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable, and easy to implement. First, this thesis proposes a fast-converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to the correct singular values, thus coming closer to real-time performance. A general algorithm and a modular system design are provided, making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited on the various hardware platforms mentioned earlier. A fixed-point implementation of the proposed SVD algorithm is presented.
The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput, and various modern cores available in FPGAs were used to maximize performance; these modules are presented in detail. Finally, a parallel polynomial rooting technique based on Newton's method, applicable exclusively to root-MUSIC polynomials, is proposed. Unique characteristics of the root-MUSIC polynomial's complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within a fixed number of iterations, making it suitable for rooting polynomials of large degree. We believe this is the first time that the complex dynamics of the root-MUSIC polynomial have been analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system by providing simple, high-throughput, parallel algorithms.
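
The building block of such a rooting scheme is a Newton iteration on the polynomial, with each processing element starting from a different initial point. The sketch below shows plain complex-valued Newton iteration on an illustrative quadratic, not a root-MUSIC polynomial or the thesis's actual parallel schedule.

```python
def newton_root(coeffs, z0, iters=50):
    """Newton's method on p(z); coefficients ordered highest degree first."""
    def horner(c, z):
        acc = 0j
        for a in c:          # evaluate polynomial by Horner's rule
            acc = acc * z + a
        return acc
    n = len(coeffs) - 1
    dcoeffs = [a * (n - i) for i, a in enumerate(coeffs[:-1])]  # p'(z) coeffs
    z = z0
    for _ in range(iters):
        z = z - horner(coeffs, z) / horner(dcoeffs, z)
    return z

# p(z) = z^2 - 2z + 2 has roots 1 ± i; a start in the upper half-plane
# converges to 1 + i.
root = newton_root([1, -2, 2], 1.5 + 1.2j)
```

In a parallel implementation, many such iterations run independently from distinct starting points, which is what makes the method attractive for FPGA or SIMD hardware.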

Relevance:

30.00%

Publisher:

Abstract:

Integrated choice and latent variable (ICLV) models represent a promising new class of models which merge classic choice models with the structural equation modeling (SEM) approach for latent variables. Despite their conceptual appeal, applications of ICLV models in marketing remain rare. We extend previous ICLV applications by first estimating a multinomial choice model and, second, by estimating hierarchical relations between latent variables. An empirical study on travel mode choice clearly demonstrates the value of ICLV models for enhancing the understanding of choice processes. In addition to the usually studied, directly observable variables such as travel time, we show how abstract motivations such as power and hedonism, as well as attitudes such as a desire for flexibility, impact travel mode choice. Furthermore, we show that it is possible to estimate such a complex ICLV model with the widely available structural equation modeling package Mplus. This finding is likely to encourage more widespread application of this appealing model class in the marketing field.
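
In generic notation (the symbols are illustrative, not the paper's own), an ICLV model couples a utility-based choice model to structural and measurement equations for the latent variables:

```latex
U_{in} = \beta' x_{in} + \gamma' \eta_n + \varepsilon_{in} \quad \text{(utility of alternative } i \text{ for person } n\text{)}
```
```latex
\eta_n = \Gamma w_n + \zeta_n \quad \text{(structural model for latent variables)}
```
```latex
I_n = \Lambda \eta_n + \nu_n \quad \text{(measurement model for attitudinal indicators)}
```

The chosen alternative maximizes \(U_{in}\); the hierarchical relations mentioned above correspond to latent variables entering the structural equation for other latent variables.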

Relevance:

30.00%

Publisher:

Abstract:

Haldane (1935) developed a method for estimating the male-to-female ratio of mutation rate (α) using sex-linked recessive genetic disease, but in six different studies using hemophilia A data the estimates of α varied from 1.2 to 29.3. Direct genomic sequencing is a better approach, but it is laborious and not readily applicable to non-human organisms. To study the sex ratios of mutation rate in various mammals, I used an indirect method proposed by Miyata et al. (1987). This method takes advantage of the fact that different chromosomes segregate differently between males and females, and uses the ratios of mutation rates in sequences on different chromosomes to estimate the male-to-female ratio of mutation rate. I sequenced the last intron of the ZFX and ZFY genes in 6 species of primates and 2 species of rodents; I also sequenced partial genomic sequences of the Ube1x and Ube1y genes of mice and rats. The purposes of my study, in addition to estimating α in different mammalian species, are to test the hypothesis that most mutations are replication dependent and to examine the generation-time effect on α. The α value estimated from the ZFX and ZFY introns of the six primate species is ~6. This estimate is the same as an earlier estimate using only 4 species of primates, but the 95% confidence interval has been reduced from (2, 84) to (2, 33). The estimate of α in the rodents obtained from the Zfx and Zfy introns is ~1.9, and that derived from the Ube1x and Ube1y introns is ~2. Both estimates have a 95% confidence interval from 1 to 3. These two estimates are very close to each other, but are only one-third of that of the primates, suggesting a generation-time effect on α.
The α of 6 in primates and 2 in rodents are close to the estimates of the male-to-female ratio of the number of germ-cell divisions per generation in humans and mice, which are 6 and 2, respectively, assuming that the generation time is 20 years in humans and 5 months in mice. These findings suggest that errors during germ-cell DNA replication are the primary source of mutation and that α decreases with decreasing generation time.
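
The indirect estimate rests on the different transmission of the X and Y chromosomes: a Y-linked sequence spends every generation in males, while an X-linked one spends a third of its time in males. If male and female per-generation mutation rates differ by a factor α, the standard Miyata et al. (1987) relation for the Y-to-X rate ratio r, and its inversion, are (a sketch of the usual derivation, not quoted from the dissertation):

```latex
r = \frac{\mu_Y}{\mu_X} = \frac{3\alpha}{\alpha + 2},
\qquad
\alpha = \frac{2r}{3 - r}.
```

As a check, α = 6 gives r = 18/8 = 2.25, and inverting r = 2.25 recovers α = 4.5/0.75 = 6.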

Relevance:

30.00%

Publisher:

Abstract:

Instruments for on-farm determination of colostrum quality, such as refractometers and densimeters, are increasingly used on dairy farms. The colour of colostrum is also supposed to reflect its quality: a paler, mature-milk-like colour is associated with a lower colostrum value in terms of general composition compared with a more yellowish and darker colour. The objective of this study was to investigate the relationships between colour measurements of colostrum using the CIELAB colour space (CIE L* = from white to black, a* = from red to green, b* = from yellow to blue, chroma value G = perceived colourfulness) and its composition. Dairy cow colostrum samples (n=117) obtained at 4.7±1.5 h after parturition were analysed for immunoglobulin G (IgG) by ELISA and for fat, protein and lactose by infrared spectroscopy. For colour measurements, a calibrated spectrophotometer was used. At a cut-off value of 50 mg IgG/ml, colour measurement had a sensitivity of 50.0%, a specificity of 49.5%, and a negative predictive value of 87.9%. Colostral IgG concentration was not correlated with the chroma value G, but it was with relative lightness L*. While milk fat content showed a relationship to the parameters L*, a*, b* and G from the colour measurement, milk protein content was not correlated with a*, but was with L*, b*, and G. Lactose concentration in colostrum showed a relationship only with b* and G. In conclusion, parameters of the colour measurement showed clear relationships to colostral IgG, fat, protein and lactose concentrations in dairy cows. Implementation of colour measuring devices in automatic milking systems and milking parlours might be a potential instrument to assess colostrum quality as well as to detect abnormal milk.
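
Two pieces of the analysis above are easy to make concrete: chroma is conventionally computed from a* and b* as sqrt(a*² + b*²), and the sensitivity, specificity, and negative predictive value follow from a 2×2 confusion matrix at the 50 mg IgG/ml cut-off. The counts below are illustrative, not study data.

```python
import math

def chroma(a_star, b_star):
    """Conventional CIELAB chroma, sqrt(a*^2 + b*^2) (called G above)."""
    return math.hypot(a_star, b_star)

def diagnostics(tp, fp, fn, tn):
    """Classifier metrics from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),   # negative predictive value
    }

d = diagnostics(tp=10, fp=50, fn=10, tn=49)  # hypothetical counts
```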

Relevance:

30.00%

Publisher:

Abstract:

After attending this presentation, attendees will: (1) understand how body height can be estimated from computed tomography data; and (2) gain knowledge about the accuracy and limitations of estimated body height. The presentation will impact the forensic science community by providing knowledge and competence that will enable attendees to develop formulas for single bones to reconstruct body height using postmortem Computed Tomography (p-CT) data. The estimation of Body Height (BH) is an important component of the identification of corpses and skeletal remains. Stature can be estimated with relative accuracy via the measurement of long bones, such as the femora. Compared to time-consuming maceration procedures, p-CT allows fast and simple measurements of bones. This study undertook four objectives concerning the accuracy of BH estimation via p-CT: (1) accuracy between measurements on native bone and p-CT-imaged bone (F1 according to Martin 1914); (2) intra-observer p-CT measurement precision; (3) accuracy between formula-based estimation of BH and conventional body length measurement during autopsy; and (4) accuracy of the different estimation formulas available.1 In the first step, the accuracy of measurements in the CT compared to those obtained using an osteometric board was evaluated on the basis of eight defleshed femora. Then the femora of 83 female and 144 male corpses of a Swiss population, for which p-CTs had been performed, were measured at the Institute of Forensic Medicine in Bern. After two months, 20 individuals were measured again in order to assess the intra-observer error. The mean age of the men was 53±17 years and that of the women was 61±20 years. Additionally, the body length of the corpses was measured conventionally. The mean body length was 176.6±7.2 cm for men and 163.6±7.8 cm for women. The images, obtained using a six-slice CT, were reconstructed with a slice thickness of 1.25 mm.
Analysis and measurements of the CT images were performed on a multipurpose workstation. As a forensic standard procedure, stature was estimated by means of the regression equations of Penning & Riepert, developed on a Southern German population, and, for comparison, also those referenced by Trotter & Gleser for “American Whites.”2,3 All statistical tests were performed with statistical software. No significant differences were found between the CT and osteometric board measurements. The double p-CT measurement of 20 individuals resulted in an absolute intra-observer difference of 0.4±0.3 mm. For both sexes, the correlation between body length and the BH estimated from the F1 measurements was highly significant. The correlation coefficient was slightly higher for women. The differences in accuracy of the different formulas were small. While the errors of BH estimation were generally ±4.5–5.0 cm, taking age into account increased accuracy by a few millimetres to about 1 cm. BH estimations according to Penning & Riepert and Trotter & Gleser were slightly more accurate when age-at-death was taken into account.2,3 In that way, stature estimations in the group of individuals older than 60 years were improved by about 2.4 cm and 3.1 cm, respectively.2,3 The error of estimation is therefore about a third of the common ±4.7 cm error range. Femur measurements in p-CT allow very accurate BH estimations. Estimations according to Penning led to good results that (barely) come closer to the true value than the frequently used formulas by Trotter & Gleser for “American Whites.”2,3 Therefore, the formulas by Penning & Riepert are also validated for this substantial recent Swiss population.
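
Stature formulas of the kind cited above all share the same generic linear form. The sketch below uses that form with placeholder coefficients; the slope and intercept are ILLUSTRATIVE assumptions of plausible magnitude, not the published Penning & Riepert or Trotter & Gleser values.

```python
def estimate_stature_cm(femur_cm, slope, intercept):
    """Generic linear stature estimate: BH = slope * femur length + intercept.
    Published formulas also carry an error term of roughly +/-4-5 cm."""
    return slope * femur_cm + intercept

# Hypothetical coefficients (NOT the published ones) applied to a 47 cm femur:
bh = estimate_stature_cm(47.0, 2.32, 65.5)
```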

Relevance:

30.00%

Publisher:

Abstract:

Many things have been said about literature after postmodernism, but one point on which there seems to be some agreement is that it does not turn its back radically on its postmodernist forerunner, but rather generally continues to heed and value its insights. There seems to be something strikingly non-oedipal about the recent aesthetic shift. It is a project of reconstruction that remains deeply rooted in postmodernist tenets. Such an essentially non-oedipal attitude, I would argue, is central to the nature of the reconstructive shift. This, however, also raises questions about the wider cultural context from which such an aesthetic stance arises. If postmodernism was nurtured by the revolutionary spirits of the late 1960s, reconstruction faces a different world with different strategies. Instead of the postmodernist urge to subvert, expose and undermine, reconstruction yearns towards tentative and fragile intersubjective understanding, towards responsibility and community. Instead of revolt and rebellion, it explores reconciliation and compromise. One instance in which this becomes visible in reconstructive narratives is the recurring figure of the lost father. Missing father figures abound in recent novels by authors such as Mark Z. Danielewski, Dave Eggers, Yann Martel, and David Mitchell. It almost seems as if a younger generation is yearning for the fathers that postmodernism struggled so hard to do away with. My paper will focus on one particularly striking example to explore the implications of this development: Daniel Wallace’s novel Big Fish and Tim Burton’s well-known film adaptation of the same. In their negotiation of fact and fiction, of doubt and belief, of freedom and responsibility, all of which converge in a father-son relationship, they serve well to illustrate central characteristics and concerns of recent attempts to leave postmodernism behind.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND Estimating glomerular filtration rate (eGFR) using a common formula for both adult and pediatric populations is challenging. Using inulin clearances (iGFRs), this study aims to investigate the existence of a precise age cutoff beyond which the Modification of Diet in Renal Disease (MDRD), Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI), or Cockcroft-Gault (CG) formulas can be applied with acceptable precision. The performance of the new Schwartz formula according to age is also evaluated. METHODS We compared 503 iGFRs for 503 children aged between 33 months and 18 years to eGFRs. To define the most precise age cutoff value for each formula, a circular binary segmentation method analyzing the formulas' bias values according to the children's ages was performed. Bias was defined as the difference between iGFRs and eGFRs. To validate the identified cutoff, 30% accuracy was calculated. RESULTS For MDRD, CKD-EPI and CG, the best age cutoffs were ≥14.3, ≥14.2 and ≤10.8 years, respectively. The lowest mean bias and highest accuracy were -17.11 and 64.7% for MDRD, 27.4 and 51% for CKD-EPI, and 8.31 and 77.2% for CG. The Schwartz formula showed the best performance below the age of 10.9 years. CONCLUSION For the MDRD and CKD-EPI formulas, the mean bias values decreased with increasing child age, and these formulas were more accurate beyond age cutoffs of 14.3 and 14.2 years, respectively. For the CG and Schwartz formulas, the lowest mean bias values and the best accuracies were below age cutoffs of 10.8 and 10.9 years, respectively. Nevertheless, the accuracies of the formulas were still below the National Kidney Foundation Kidney Disease Outcomes Quality Initiative target to be validated in these age groups; therefore, none of these formulas can be used to estimate GFR in children and adolescent populations.
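
Two of the equations compared above, and the 30% accuracy (P30) metric used to validate the cutoffs, can be sketched directly. The bedside Schwartz and Cockcroft-Gault forms below are the widely published versions (creatinine in mg/dL, height in cm, weight in kg); the study itself may have used variant coefficients.

```python
def schwartz_bedside(height_cm, scr_mg_dl):
    """Bedside Schwartz eGFR, mL/min/1.73 m^2."""
    return 0.413 * height_cm / scr_mg_dl

def cockcroft_gault(age_y, weight_kg, scr_mg_dl, female=False):
    """Cockcroft-Gault creatinine clearance, mL/min."""
    crcl = (140 - age_y) * weight_kg / (72 * scr_mg_dl)
    return crcl * 0.85 if female else crcl

def p30_accuracy(measured, estimated):
    """Share of estimates within 30% of the measured (e.g. inulin) GFR."""
    hits = [abs(e - m) <= 0.3 * m for m, e in zip(measured, estimated)]
    return sum(hits) / len(hits)
```

Bias as defined above is simply iGFR minus eGFR for each child; P30 is then computed over the cohort (or an age stratum) to check a candidate cutoff.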

Relevance:

30.00%

Publisher:

Abstract:

Background: Diabetes mellitus is spreading throughout the world, and diabetic individuals have been shown to often assess their food intake inaccurately; therefore, it is a matter of urgency to develop automated diet assessment tools. The recent availability of mobile phones with enhanced capabilities, together with advances in computer vision, has permitted the development of image analysis apps for the automated assessment of meals. GoCARB is a mobile phone-based system designed to support individuals with type 1 diabetes during daily carbohydrate estimation. In a typical scenario, the user places a reference card next to the dish and acquires two images using a mobile phone. A series of computer vision modules detect the plate and automatically segment and recognize the different food items, while their 3D shape is reconstructed. Finally, the carbohydrate content is calculated by combining the volume of each food item with the nutritional information provided by the USDA Nutrient Database for Standard Reference. Objective: The main objective of this study is to assess the accuracy of the GoCARB prototype when used by individuals with type 1 diabetes and to compare it with their own performance in carbohydrate counting. In addition, the user experience and usability of the system are evaluated by questionnaires. Methods: The study was conducted at the Bern University Hospital, “Inselspital” (Bern, Switzerland) and involved 19 adult volunteers with type 1 diabetes, each participating once. On each study day, a total of six meals of broad diversity were taken from the hospital’s restaurant and presented to the participants. The food items were weighed on a standard balance, and the true amount of carbohydrate was calculated from the USDA nutrient database. Participants were asked to count the carbohydrate content of each meal independently and then by using GoCARB. At the end of each session, a questionnaire was completed to assess the user’s experience with GoCARB.
Results: The mean absolute error of the participants’ estimates was 27.89 (SD 38.20) grams of carbohydrate, whereas the corresponding value for the GoCARB system was 12.28 (SD 9.56) grams of carbohydrate, a significantly better performance (P=.001). In 75.4% (86/114) of the meals, the GoCARB automatic segmentation was successful, and 85.1% (291/342) of individual food items were successfully recognized. Most participants found GoCARB easy to use. Conclusions: This study indicates that the system is able to estimate, on average, the carbohydrate content of meals with higher accuracy than individuals with type 1 diabetes can. The participants found the app useful and easy to use. GoCARB seems to be a well-accepted, supportive mHealth tool for the assessment of served-on-a-plate meals.
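
The headline comparison above is a mean absolute error (with its spread) between the true and estimated grams of carbohydrate per meal. A minimal sketch, with illustrative meals rather than study data (and population SD rather than whichever variant the paper used):

```python
import statistics

def mae_sd(true_vals, est_vals):
    """Mean absolute error and its (population) standard deviation."""
    errs = [abs(t - e) for t, e in zip(true_vals, est_vals)]
    return statistics.mean(errs), statistics.pstdev(errs)

# Three hypothetical meals: true vs estimated grams of carbohydrate.
mae, sd = mae_sd([60, 45, 80], [50, 50, 70])
```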

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this study is to investigate the effects of predictor variable correlations and patterns of missingness with dichotomous and/or continuous data in small samples when missing data are multiply imputed. Missing data on predictor variables are multiply imputed under three different multivariate models: the multivariate normal model for continuous data, the multinomial model for dichotomous data, and the general location model for mixed dichotomous and continuous data. Subsequent to the multiple imputation process, Type I error rates of the regression coefficients obtained with logistic regression analysis are estimated under various conditions of correlation structure, sample size, type of data, and patterns of missing data. The distributional properties of the average mean, variance, and correlations among the predictor variables are assessed after the multiple imputation process. For continuous predictor data under the multivariate normal model, Type I error rates are generally within the nominal values with samples of size n = 100. Smaller samples of size n = 50 resulted in more conservative estimates (i.e., lower than the nominal value). Correlation and variance estimates of the original data are retained after multiple imputation with less than 50% missing continuous predictor data. For dichotomous predictor data under the multinomial model, Type I error rates are generally conservative, which in part is due to the sparseness of the data. The correlation structure of the predictor variables is not well retained on multiply-imputed data from small samples with more than 50% missing data under this model. For mixed continuous and dichotomous predictor data, the results are similar to those found under the multivariate normal model for continuous data and under the multinomial model for dichotomous data.
With all data types, a fully observed variable included alongside the variables subject to missingness in the multiple imputation process and subsequent statistical analysis produced liberal (larger than nominal) Type I error rates under a specific pattern of missing data. It is suggested that future studies focus on the effects of multiple imputation in multivariate settings with more realistic data characteristics and a variety of multivariate analyses, assessing both Type I error and power.
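
After the imputation models above produce m completed data sets, the per-imputation estimates are combined by Rubin's rules, which is where the Type I error behavior is ultimately assessed. A minimal pooling sketch (q holds the coefficient estimate from each imputation, u its squared standard error; the numbers are illustrative):

```python
import statistics

def pool_rubin(q, u):
    """Rubin's rules: pooled estimate and total variance across m imputations."""
    m = len(q)
    qbar = statistics.mean(q)       # pooled point estimate
    ubar = statistics.mean(u)       # within-imputation variance
    b = statistics.variance(q)      # between-imputation variance
    t = ubar + (1 + 1 / m) * b      # total variance
    return qbar, t

est, var = pool_rubin([0.50, 0.54, 0.46], [0.010, 0.012, 0.011])
```

A Wald test of the pooled coefficient then uses est / sqrt(var), which is how the Type I error rates in the study are obtained.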

Relevance:

30.00%

Publisher:

Abstract:

The discrete-time Markov chain is commonly used to describe changes of health states for chronic diseases in a longitudinal study. Statistical inferences for comparing treatment effects or finding determinants of disease progression usually require estimation of transition probabilities. In many situations, when the outcome data have some missing observations or the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze a categorical latent outcome. For (1), different missing mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM, and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated on a schizophrenia example. The relevance to public health, strengths and limitations, and possible future research are also discussed.
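
The forward pass of the forward-backward procedure mentioned above can be sketched compactly. Here pi is the initial state distribution, A the transition matrix over latent health states, and B the emission probabilities linking latent states to observed surrogate categories; the two-state parameters are illustrative, not from the dissertation.

```python
def forward(obs, pi, A, B):
    """Likelihood of an observation sequence under an HMM (forward algorithm)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]   # initialize
    for o in obs[1:]:                                  # recurse over time
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)                                  # terminate

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]     # latent state transitions
B = [[0.9, 0.1], [0.2, 0.8]]     # surrogate emission probabilities
p = forward([0, 1, 0], pi, A, B)
```

The backward pass is symmetric, and combining the two gives the posterior state probabilities used for E-step updates and, in the extension above, standard error computation.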

Relevance:

30.00%

Publisher:

Abstract:

Objective: In this secondary data analysis, three statistical methodologies were implemented to handle cases with missing data in a motivational interviewing and feedback study. The aim was to evaluate the impact these methodologies have on the data analysis. Methods: We first evaluated whether the assumption of missing completely at random held for this study. We then conducted a secondary data analysis using a mixed linear model to handle missing data with three methodologies: (a) complete case analysis, (b) multiple imputation with an explicit model containing outcome variables, time, and the interaction of time and treatment, and (c) multiple imputation with an explicit model containing outcome variables, time, the interaction of time and treatment, and additional covariates (e.g., age, gender, smoking status, years in school, marital status, housing, race/ethnicity, and whether participants played on an athletic team). Several comparisons were conducted, including the following: 1) the motivational interviewing with feedback group (MIF) vs. the assessment-only group (AO), the motivational interviewing group (MIO) vs. AO, and the feedback-only group (FBO) vs. AO; 2) MIF vs. FBO; and 3) MIF vs. MIO. Results: We first evaluated the patterns of missingness in this study: about 13% of participants showed monotone missing patterns, and about 3.5% showed non-monotone missing patterns. We then evaluated the assumption of missing completely at random with Little's missing completely at random (MCAR) test, in which the chi-square test statistic was 167.8 with 125 degrees of freedom and an associated p-value of p=0.006, indicating that the data could not be assumed to be missing completely at random. After that, we compared whether the three different strategies reached the same results.
For the comparison between MIF and AO, as well as the comparison between MIF and FBO, only the multiple imputations with additional covariates under uncongenial and congenial models reached different results. For the comparison between MIF and MIO, all the methodologies for handling missing values obtained different results. Discussion: The study indicated, first, that missingness was crucial in this study. Second, understanding the assumptions of the model was important, since we could not identify whether the data were missing at random or missing not at random. Therefore, future research should focus on exploring more sensitivity analyses under the missing not at random assumption.

Relevance:

30.00%

Publisher:

Abstract:

In most clinical trials, missing data present a statistical problem in evaluating a treatment's efficacy. There are many methods commonly used to handle missing data; however, these methods leave room for bias to enter the study. This thesis was a secondary analysis of data taken from TIME, a phase 2 randomized clinical trial conducted to evaluate the safety and the effect of the administration timing of bone marrow mononuclear cells (BMMNC) for subjects with acute myocardial infarction (AMI). We evaluated the effect of missing data by comparing the variance inflation factor (VIF) of the effect of therapy between all subjects and only subjects with complete data. Through the general linear model, an unbiased solution was derived for the VIF of the treatment's efficacy using the weighted least squares method to incorporate missing data. Two groups were identified from the TIME data: 1) all subjects and 2) subjects with complete data (baseline and follow-up measurements). After the general solution was found for the VIF, it was migrated to Excel 2010 to evaluate the data from TIME. The resulting numerical values from the two groups were compared to assess the effect of missing data. The VIF values from the TIME study were considerably smaller in the group that included subjects with missing data. By design, we varied the correlation factor in order to evaluate the VIFs of both groups. As the correlation factor increased, the VIF values increased at a faster rate in the group with only complete data. Furthermore, while varying the correlation factor, the number of subjects with missing data was also varied to see how missing data affect the VIF. When the number of subjects with only baseline data was increased, we saw a significant rate increase in the VIF values in the group with only complete data, while the group with missing data saw a steady and consistent increase in the VIF. The same was seen when we varied the group with follow-up-only data.
This essentially showed that the VIFs increase steadily when missing data are not ignored. When missing data are ignored, as with our comparison group, the VIF values increase sharply as correlation increases.
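
The qualitative behavior described above follows from the textbook form of the variance inflation factor: for a predictor whose squared multiple correlation with the other predictors is R², VIF = 1/(1 − R²), which grows sharply as correlation rises. (This is the generic definition; the thesis derives a VIF for the treatment effect under its weighted least squares setup.)

```python
def vif(r_squared):
    """Variance inflation factor from a squared multiple correlation."""
    return 1.0 / (1.0 - r_squared)

low, high = vif(0.2), vif(0.9)   # rises from 1.25 to 10 as R^2 grows
```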

Relevance:

30.00%

Publisher:

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained. In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of data analysis methods derived from computer science, predominantly machine learning approaches, was proposed and explored in this study. The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational outcomes based on literature-derived databases and to compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used. When compared to regression estimations, the results showed better accuracy of decision tree/ensemble techniques for the categorical case, while neural networks were better for the estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy.
Estimations based on literature-based databases using machine learning techniques might provide an advantage when applied to other methodologies that combine 'expert inputs' with current exposure measurements, such as the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent a starting point for independence from expert judgment.
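
The evaluation strategy described above, splitting a data set and comparing held-out error across model families, can be illustrated with deliberately minimal stand-ins: a global-mean baseline against a one-split decision stump (the simplest tree). The data, split rule, and models are illustrative assumptions, not the regression and ensemble models actually studied.

```python
import random

def stump_predict(x_split, left_mean, right_mean, x):
    """One-split decision stump: predict the mean of the matching side."""
    return left_mean if x < x_split else right_mean

random.seed(1)
# Synthetic "exposure" data with a step at x = 5 plus small noise.
data = [(x, (0.0 if x < 5 else 2.0) + random.gauss(0, 0.1)) for x in range(10)]
train, test = data[::2], data[1::2]   # simple data split

mean_pred = sum(y for _, y in train) / len(train)        # baseline model
left = [y for x, y in train if x < 5]
right = [y for x, y in train if x >= 5]
stump = (5, sum(left) / len(left), sum(right) / len(right))

mse_mean = sum((y - mean_pred) ** 2 for _, y in test) / len(test)
mse_stump = sum((y - stump_predict(stump[0], stump[1], stump[2], x)) ** 2
                for x, y in test) / len(test)
```

Because the stump captures the step structure that the global mean cannot, its held-out error is lower, which is the same kind of comparison the study makes between tree ensembles and regression.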