881 results for Regression-based decomposition.
Abstract:
Objective: We used demographic and clinical data to design practical classification models for the prediction of neurocognitive impairment (NCI) in people with HIV infection. Methods: The study population comprised 331 HIV-infected patients with available demographic, clinical, and neurocognitive data collected using a comprehensive battery of neuropsychological tests. Classification and regression trees (CART) were developed to obtain detailed and reliable models to predict NCI. Following a practical clinical approach, NCI was considered the main study outcome, and analyses were performed separately in treatment-naïve and treatment-experienced patients. Results: The study sample comprised 52 treatment-naïve and 279 treatment-experienced patients. In the first group, the variables identified as the best predictors of NCI were CD4 cell count and age (correct classification [CC]: 79.6%, 3 final nodes). In treatment-experienced patients, the variables most closely related to NCI were years of education, nadir CD4 cell count, central nervous system penetration-effectiveness score, age, employment status, and confounding comorbidities (CC: 82.1%, 7 final nodes). In patients with an undetectable viral load and no comorbidities, we obtained a fairly accurate model in which the main variables were nadir CD4 cell count, current CD4 cell count, time on current treatment, and past highest viral load (CC: 88%, 6 final nodes). Conclusion: Practical classification models to predict NCI in HIV infection can be obtained using demographic and clinical variables. An approach based on CART analyses may facilitate screening for HIV-associated neurocognitive disorders and complement clinical information about risk and protective factors for NCI in HIV-infected patients.
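The core CART step described above is a recursive search for the (variable, threshold) split that best separates impaired from unimpaired patients. A minimal sketch of one such split on toy data (the "age"/"CD4" values and labels are invented for illustration, not the study's data):

```python
# Find the single best CART split by minimizing weighted Gini impurity.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Exhaustively search one (feature, threshold) split minimising
    the weighted Gini impurity of the two child nodes."""
    n = len(rows)
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [labels[i] for i in range(n) if rows[i][f] <= t]
            right = [labels[i] for i in range(n) if rows[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Toy cohort: [age in years, CD4 cell count]; label 1 = impaired (NCI).
X = [[35, 600], [40, 550], [62, 180], [58, 150], [45, 500], [66, 120]]
y = [0, 0, 1, 1, 0, 1]
feature, threshold, score = best_split(X, y)
```

A full CART model repeats this search recursively in each child node and prunes the resulting tree; the study's reported models (3 and 7 final nodes) are the pruned end products of that process.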
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from vast amounts of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data that take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics.
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently on large amounts of data. For situations where only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
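The regularized least-squares approach to preference learning can be illustrated in its simplest linear form: regress pairwise score differences onto pairwise feature differences with a ridge penalty. This is only a toy sketch of the model family on synthetic data, not the thesis's kernelized or sparse algorithms:

```python
import numpy as np

# Linear pairwise least-squares ranker: fit w so that (x_i - x_j)·w
# approximates y_i - y_j for all ordered pairs, with ridge regularization.

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))        # toy feature vectors
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true                      # noiseless utility scores

# Build all ordered pairs (i, j) with i < j.
i, j = np.triu_indices(len(y), k=1)
D = X[i] - X[j]                     # pairwise feature differences
t = y[i] - y[j]                     # pairwise target differences

lam = 1.0                           # regularization parameter
w = np.linalg.solve(D.T @ D + lam * np.eye(3), D.T @ t)
```

Because the pairwise loss depends only on score differences, the learned `w` recovers the direction that induces the correct ordering; the kernelized versions in the thesis replace the linear map with functions in a reproducing kernel Hilbert space.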
Abstract:
This thesis concentrates on developing a practical local approach methodology, based on micromechanical models, for the analysis of ductile fracture of welded joints. Two major problems involved in the local approach have been studied in detail: the dilational constitutive relation reflecting the softening behaviour of the material, and the failure criterion associated with the constitutive equation. Firstly, considerable effort was devoted to the numerical integration and computer implementation of the non-trivial dilational Gurson-Tvergaard model. Considering the weaknesses of the widely used Euler forward integration algorithms, a family of generalized mid-point algorithms is proposed for the Gurson-Tvergaard model. Correspondingly, based on the decomposition of stresses into hydrostatic and deviatoric parts, an explicit seven-parameter expression for the consistent tangent moduli of the algorithms is presented. This explicit formula avoids any matrix inversion during numerical iteration and thus greatly facilitates the computer implementation of the algorithms and increases the efficiency of the code. The accuracy of the proposed algorithms and of other conventional algorithms has been assessed in a systematic manner in order to identify the best algorithm for this study. The accurate and efficient performance of the present finite element implementation of the proposed algorithms has been demonstrated by various numerical examples. It has been found that the true mid-point algorithm (α = 0.5) is the most accurate when the deviatoric strain increment is radial to the yield surface, and that it is very important to use the consistent tangent moduli in the Newton iteration procedure. Secondly, the consistency of current local failure criteria for ductile fracture has been assessed: the critical void growth criterion, the constant critical void volume fraction criterion, and Thomason's plastic limit load failure criterion.
Significant differences in the predictions of ductility by the three criteria were found. By assuming that the void grows spherically and using the void volume fraction from the Gurson-Tvergaard model to calculate the current void-matrix geometry, Thomason's failure criterion has been modified and a new failure criterion for the Gurson-Tvergaard model is presented. Comparison with Koplik and Needleman's finite element results shows that the new failure criterion is fairly accurate. A novel feature of the new criterion is that a mechanism for void coalescence is incorporated into the constitutive model. Hence material failure is a natural result of the development of macroscopic plastic flow and the microscopic internal necking mechanism. Under the new criterion, the critical void volume fraction is not a material constant; the initial void volume fraction and/or the void nucleation parameters essentially control material failure. This feature is very desirable and makes the numerical calibration of the void nucleation parameter(s) possible and physically sound. Thirdly, a local approach methodology based on the above two contributions has been built in ABAQUS via the user material subroutine UMAT and applied to welded T-joints. Using void nucleation parameters calibrated from simple smooth and notched specimens, it was found that the fracture behaviour of the welded T-joints can be predicted well with the present methodology. This application has shown how the damage parameters of both the base material and the heat-affected zone (HAZ) material can be obtained in a step-by-step manner, and how useful and capable the local approach methodology is in the analysis of fracture behaviour, crack development, and structural integrity assessment of practical problems involving non-homogeneous materials. Finally, a procedure for the possible engineering application of the present methodology is suggested and discussed.
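The generalized mid-point family mentioned above can be sketched in standard return-mapping notation (the notation here is generic, not copied from the thesis; Φ denotes the yield function):

```latex
\Delta\boldsymbol{\varepsilon}^{p}
  = \Delta\lambda\,
    \left.\frac{\partial \Phi}{\partial \boldsymbol{\sigma}}\right|_{n+\alpha},
\qquad
\boldsymbol{\sigma}_{n+\alpha}
  = (1-\alpha)\,\boldsymbol{\sigma}_{n} + \alpha\,\boldsymbol{\sigma}_{n+1},
\qquad 0 \le \alpha \le 1 .
```

Here α = 0 recovers the Euler forward scheme, α = 1 the backward Euler scheme, and α = 0.5 the true mid-point rule that the thesis found most accurate for radial deviatoric strain increments.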
Abstract:
Electroencephalography (EEG) is one of the most widely used techniques for studying the brain. It records the electrical signals produced in the human cortex through electrodes placed on the scalp. The technique, however, has some limitations when making recordings; the main one is known as artifacts, which are unwanted signals that mix with the EEG signals. The aim of this master's thesis is to present three new artifact-cleaning methods that can be applied to EEG. They are based on the application of Multivariate Empirical Mode Decomposition, a new signal processing technique. The proposed cleaning methods are applied to simulated EEG data containing artifacts (eye blinks), and once the cleaning procedures have been applied, the results are compared with blink-free EEG data to assess the improvement obtained. Two of the three proposed cleaning methods are then applied to real EEG data. The conclusion of this work is that two of the proposed cleaning procedures can be used to preprocess real data in order to remove eye-blink artifacts.
Abstract:
The quantitative structure-property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled using multivariate linear regression (MLR) and an artificial neural network (ANN), respectively. Leave-one-out cross-validation and external validation were carried out to assess the prediction performance of the models developed. For the MLR method, the prediction root mean square relative errors (RMSRE) of leave-one-out cross-validation and external validation were 1.77 and 1.23, respectively. For the ANN method, they were 1.65 and 1.16, respectively. A quantitative relationship between the MDEV index and the Tb of PCDD/Fs was demonstrated. Both MLR and ANN are practicable for modeling this relationship, and the models developed can be used to predict the Tb of PCDD/Fs. Accordingly, the Tb of each PCDD/F was predicted by the developed models.
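The leave-one-out RMSRE protocol used above can be sketched as follows. The two-column "descriptor" matrix and the boiling points below are invented stand-ins (the real model uses the MDEV index); the sketch only illustrates the validation loop:

```python
import numpy as np

# Leave-one-out cross-validation of an MLR model, reporting the root
# mean square relative error (RMSRE) of the held-out predictions.

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(20, 2))       # hypothetical descriptors
Tb = 300.0 + 40.0 * X[:, 0] + 15.0 * X[:, 1]  # hypothetical boiling points (K)
Tb += rng.normal(scale=1.0, size=20)

A = np.column_stack([np.ones(20), X])          # design matrix with intercept

rel_errors = []
for k in range(20):
    mask = np.arange(20) != k                  # leave sample k out
    coef, *_ = np.linalg.lstsq(A[mask], Tb[mask], rcond=None)
    pred = A[k] @ coef
    rel_errors.append((pred - Tb[k]) / Tb[k])

rmsre = 100.0 * np.sqrt(np.mean(np.square(rel_errors)))  # in percent
```

Each of the 20 predictions comes from a model that never saw the held-out compound, which is what makes the RMSRE an honest estimate of prediction error.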
Abstract:
ABSTRACT In the present study, onion plants were tested under controlled conditions to develop a climate model based on the influence of temperature (10, 15, 20 and 25°C) and leaf wetness duration (6, 12, 24 and 48 hours) on the severity of Botrytis leaf blight of onion, caused by Botrytis squamosa. Relative lesion density was influenced by temperature and leaf wetness duration (P < 0.05). The disease was most severe at 20°C. Data were subjected to nonlinear regression analysis. A generalized beta function was used to fit the severity and temperature data, while a logistic function was chosen to represent the effect of leaf wetness on the severity of Botrytis leaf blight. The response surface obtained as the product of the two functions was expressed as ES = 0.008192 * ((x-5)^1.01089) * ((30-x)^1.19052) * (0.33859 / (1 + 3.77989 * exp(-0.10923*y))), where ES represents the estimated severity value (0-1); x, the temperature (°C); and y, the leaf wetness duration (hours). This climate model should be validated under field conditions to verify its use in a computational system for forecasting Botrytis leaf blight in onion.
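The fitted response surface is easy to implement directly from the abstract's equation (the exponents are read from the published coefficients; the valid temperature range 5-30 °C follows from the beta function's zeros):

```python
import math

# Fitted severity surface: generalized beta in temperature times a
# logistic function in leaf wetness duration.

def estimated_severity(x, y):
    """x: temperature (°C, meaningful for 5 <= x <= 30); y: leaf wetness (h)."""
    beta = 0.008192 * ((x - 5.0) ** 1.01089) * ((30.0 - x) ** 1.19052)
    logistic = 0.33859 / (1.0 + 3.77989 * math.exp(-0.10923 * y))
    return beta * logistic
```

Severity vanishes at the 5 °C and 30 °C limits and increases monotonically with wetness duration; the surface's temperature optimum lies between the tested 15 °C and 20 °C levels.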
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of this setting we recover the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions for cross-validation when using this approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
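Leave-pair-out cross-validation for AUC, mentioned above, can be sketched in its naive (non-shortcut) form: hold out each positive-negative pair, refit on the rest, and count the fraction of held-out pairs scored in the correct order. The data and the least-squares scorer below are toy stand-ins, not the thesis's RankRLS implementation:

```python
import numpy as np

# Naive leave-pair-out cross-validation (LPOCV) estimate of AUC.

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)

pos = np.flatnonzero(y == 1)
neg = np.flatnonzero(y == 0)

correct = 0
for i in pos:
    for j in neg:
        mask = np.ones(30, dtype=bool)
        mask[[i, j]] = False           # hold out one positive-negative pair
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        s_i = coef[0] + X[i] @ coef[1:]
        s_j = coef[0] + X[j] @ coef[1:]
        correct += s_i > s_j           # held-out pair ranked correctly?

lpo_auc = correct / (len(pos) * len(neg))
```

The naive version refits once per pair; the point of the thesis's matrix-algebra shortcuts is to obtain the same estimate without the quadratic number of refits.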
Abstract:
Litter fall consists of all organic material deposited on the forest floor and is extremely important for the structure and maintenance of the ecosystem through nutrient cycling. This study aimed to evaluate the production and decomposition of litter fall in a fragment of secondary Atlantic Forest at the Guarapiranga Ecological Park, in São Paulo, SP. Litter samples were taken monthly from May 2012 to May 2013. To assess litter fall production, forty collectors were installed randomly within an area of 0.5 ha. The collected material was sent to the laboratory, dried at 65 °C for 72 hours, subsequently separated into fractions of leaves, twigs, reproductive parts and miscellaneous, and weighed to obtain the dry biomass. Litterbags were placed and tied close to the collectors to estimate the decomposition rate by evaluating the loss of dry biomass at 30, 60, 90, 120 and 150 days. After each collection, the material was sent to the laboratory to be dried and weighed again. Total litter fall throughout the year reached 5.7 Mg.ha-1.yr-1, and most of the material was collected from September to March. Leaves made the largest contribution to total litter fall (72%), followed by twigs (14%), reproductive parts (11%) and miscellaneous (3%). Reproductive parts peaked during the wet season. Positive correlations were observed between total litter and precipitation, temperature and radiation (r = 0.66, p < 0.05; r = 0.76, p < 0.05; r = 0.58, p < 0.05, respectively). Multiple regression showed that precipitation and radiation contributed significantly to litter fall production. The decomposition rate was within the interval expected for secondary tropical forest and was correlated with rainfall.
It was concluded that this fragment of secondary forest showed a seasonality effect driven mainly by precipitation and radiation, both important components of foliage renewal for the plant community, and that decomposition proceeded at an intermediate rate.
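Litterbag series like the one above (mass remaining at 30-150 days) are conventionally summarized by the single negative-exponential model X(t) = X0·exp(-k·t), with k estimated by log-linear regression. The abstract does not state its exact model, and the mass values below are illustrative, so this is only a sketch of the standard calculation:

```python
import numpy as np

# Fit Olson's decomposition constant k from a litterbag time series.

days = np.array([0, 30, 60, 90, 120, 150], dtype=float)
mass = np.array([10.0, 8.8, 7.9, 7.0, 6.2, 5.6])   # g dry mass remaining

t_years = days / 365.0
slope, intercept = np.polyfit(t_years, np.log(mass), 1)
k = -slope                    # annual decomposition constant (yr^-1)
half_life = np.log(2) / k     # years to lose half the initial mass
```

The log transform turns the exponential decay into a straight line, so ordinary least squares suffices; k then allows direct comparison with rates reported for other tropical forests.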
Abstract:
The broiler rectal temperature (t rectal) is one of the most important physiological responses for classifying the thermal comfort of the animal. The aim of this study was therefore to fit regression models to predict the rectal temperature of broiler chickens under different thermal conditions based on age (A) and either a meteorological variable (air temperature, t air), a thermal comfort index (temperature and humidity index, THI, or black globe humidity index, BGHI), or a physical quantity, enthalpy (H). In addition, by inverting these models and using the expected t rectal intervals for each age, the comfort limits of t air, THI, BGHI and H for chicks in the heating phase were determined, aiding in the validation of the equations and in setting preliminary limits for H. The experimental data used to fit the mathematical models were collected in two commercial poultry farms, with Cobb chicks from 1 to 14 days of age. By applying the four fitted models it was possible to predict t rectal satisfactorily and to determine the lower and upper comfort thresholds of broilers, as well as to invert the models to predict the environmental H for the chicks' first 14 days of life.
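The fit-then-invert idea can be sketched with a linear model: regress t rectal on age and air temperature, then solve the fitted equation for the air temperature that yields a target rectal temperature. The coefficients and data below are synthetic, not the study's:

```python
import numpy as np

# Fit t_rectal = b0 + b1*age + b2*t_air, then invert for t_air.

rng = np.random.default_rng(3)
age = rng.integers(1, 15, size=60)             # days (1..14)
t_air = rng.uniform(24.0, 34.0, size=60)       # °C
t_rectal = 39.0 + 0.05 * age + 0.06 * t_air + rng.normal(scale=0.05, size=60)

A = np.column_stack([np.ones(60), age, t_air])
b0, b1, b2 = np.linalg.lstsq(A, t_rectal, rcond=None)[0]

def comfort_air_temperature(age_days, target_t_rectal):
    """Solve the fitted model t_rectal = b0 + b1*age + b2*t_air for t_air."""
    return (target_t_rectal - b0 - b1 * age_days) / b2
```

Given the expected comfortable t rectal interval at a given age, the inverted model maps it directly into an air-temperature comfort interval, which is exactly the kind of threshold the study reports.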
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining the PLS and PCR models. Both PLS and PCR are applied to real spectral data, and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models, but this has been overcome by using the various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were also applied to ecological data; the idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and the PLS model for spectral data are noted and explained in this thesis.
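PCR ties the SVD/PCA preliminaries directly to regression: project the predictors onto their leading principal components and regress the response on the scores. The "spectra" below are synthetic, built from three latent factors, so that three components suffice; they are not the thesis's data:

```python
import numpy as np

# Minimal principal component regression (PCR) via SVD.

rng = np.random.default_rng(4)
loadings = rng.normal(size=(3, 10))            # 3 latent directions
scores = rng.normal(size=(50, 3))              # latent sample scores
X = scores @ loadings + 0.01 * rng.normal(size=(50, 10))
y = scores @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=50)

Xc = X - X.mean(axis=0)                        # center predictors
yc = y - y.mean()
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

n_comp = 3                                     # retained components
T = Xc @ Vt[:n_comp].T                         # PC scores
gamma = np.linalg.lstsq(T, yc, rcond=None)[0]  # regress on the scores
beta = Vt[:n_comp].T @ gamma                   # coefficients in X-space

r2 = 1.0 - np.sum((yc - T @ gamma) ** 2) / np.sum(yc ** 2)
```

Choosing `n_comp` is exactly the model-selection challenge the thesis discusses; PLS differs in that it picks components to maximize covariance with the response rather than variance of the predictors alone.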
Abstract:
PURPOSE: To analyze the prevalence of and factors associated with fragility fractures in Brazilian women aged 50 years and older. METHODS: This cross-sectional population survey, conducted between May 10 and October 31, 2011, included 622 women aged ≥50 years living in a city in southeastern Brazil. A questionnaire was administered to each woman by a trained interviewer. The associations between the occurrence of a fragility fracture after age 50 years and sociodemographic data, health-related habits and problems, self-perception of health and evaluation of functional capacity were determined by the χ² test and Poisson regression using the backward selection criteria. RESULTS: The mean age of the 622 women was 64.1 years. The prevalence of fragility fractures was 10.8%, with 1.8% reporting hip fracture. In the final statistical model, a longer time since menopause (PR 1.03; 95%CI 1.01-1.05; p<0.01) and osteoporosis (PR 1.97; 95%CI 1.27-3.08; p<0.01) were associated with a higher prevalence of fractures. CONCLUSIONS: These findings may provide a better understanding of the risk factors associated with fragility fractures in Brazilian women and emphasize the importance of performing bone densitometry.
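Poisson regression with a log link, as used above, yields prevalence ratios (PR) directly as exponentiated coefficients. A minimal sketch fitted by iteratively reweighted least squares; the binary "osteoporosis" exposure and fracture outcome below are synthetic stand-ins generated with a true PR of 2.0, not the survey's data:

```python
import numpy as np

# Poisson regression (log link) fitted by IRLS; exp(coef) = prevalence ratio.

rng = np.random.default_rng(5)
n = 622
osteoporosis = rng.integers(0, 2, size=n)          # 0/1 exposure
p = 0.08 * np.where(osteoporosis == 1, 2.0, 1.0)    # true PR = 2.0
fracture = (rng.uniform(size=n) < p).astype(float)  # 0/1 outcome

X = np.column_stack([np.ones(n), osteoporosis])
beta = np.zeros(2)
for _ in range(25):                        # IRLS iterations
    mu = np.exp(X @ beta)                  # fitted means
    z = X @ beta + (fracture - mu) / mu    # working response
    W = mu                                 # Poisson working weights
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

pr = np.exp(beta[1])                       # estimated prevalence ratio
```

Using a Poisson model on a binary outcome is a common device for estimating PRs rather than odds ratios in cross-sectional data; in practice robust (sandwich) standard errors are added, which this sketch omits.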
Abstract:
In order to reduce greenhouse gas emissions from forest degradation and deforestation, the international programme REDD (Reducing Emissions from Deforestation and forest Degradation) was established in 2005 by the United Nations Framework Convention on Climate Change (UNFCCC). The programme aims to financially reward developing countries for emission reductions. Under this programme, a project to set up the payment system in Nepal was established, which aims to engage local communities in forest monitoring. The major objective of this thesis is to compare and verify data obtained from different sources - remotely sensed data, namely LiDAR, and field sample measurements made by two groups of researchers - using two regression models: Sparse Bayesian Regression and Bayesian Regression with Orthogonal Variables.
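Both models named above are members of the Bayesian linear regression family, whose basic posterior computation can be sketched as follows. This illustrates only the shared Gaussian-prior machinery on synthetic "plot feature" data; it is not the thesis's sparse or orthogonal-variable variants:

```python
import numpy as np

# Bayesian linear regression with a Gaussian prior w ~ N(0, tau2 * I):
# the posterior over weights is N(m, S), a ridge-type estimator.

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 4))          # e.g. LiDAR-derived plot features
w_true = np.array([3.0, 0.0, -1.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

sigma2 = 0.01                          # noise variance (assumed known)
tau2 = 1.0                             # prior variance of the weights

# S = (X^T X / sigma2 + I / tau2)^{-1},  m = S X^T y / sigma2
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(4) / tau2)
m = S @ X.T @ y / sigma2               # posterior mean of the weights
```

Sparse Bayesian regression replaces the single prior variance `tau2` with one per weight and learns them from the data, which drives irrelevant weights toward zero; the posterior covariance `S` additionally quantifies the uncertainty that verification against field measurements cares about.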
Abstract:
There are few population-based studies of renal dysfunction, and none have been conducted in developing countries. In the present study, the prevalence and predictors of elevated serum creatinine levels (SCr ≥ 1.3 mg/dl for men and ≥ 1.1 mg/dl for women) were determined among Brazilian adults (18-59 years) and older adults (≥60 years). Participants included all older adults (N = 1742) and a probabilistic sample of adults (N = 818) from Bambuí town, MG, Southeast Brazil. Predictors were investigated using multiple logistic regression. Mean SCr levels were 0.77 ± 0.15 mg/dl for adults, 1.02 ± 0.39 mg/dl for older men, and 0.81 ± 0.17 mg/dl for older women. Because there were only 4 cases (0.48%) with elevated SCr levels among adults, the analysis of elevated SCr levels was restricted to older adults. The overall prevalence of elevated SCr levels among the elderly was 5.09% (76/1494). The prevalence of hypercreatinemia increased significantly with age (χ² = 26.17, P < 0.001) and was higher for older men (8.19%) than for older women (5.29%; χ² = 5.00, P = 0.02). Elevated SCr levels were associated with age 70-79 years (odds ratio [OR] = 2.25, 95% confidence interval [CI]: 1.15-4.42), hypertension (OR = 3.04, 95% CI: 1.34-6.92), use of antihypertensive drugs (OR = 2.46, 95% CI: 1.26-4.82), chest pain (OR = 3.37, 95% CI: 1.31-8.74), and claudication (OR = 3.43, 95% CI: 1.30-9.09) among men, and with age >80 years (OR = 4.88, 95% CI: 2.24-10.65), use of antihypertensive drugs (OR = 4.06, 95% CI: 1.67-9.86), physical inactivity (OR = 2.11, 95% CI: 1.11-4.02) and myocardial infarction (OR = 3.89, 95% CI: 1.58-9.62) among women. The prevalence of renal dysfunction observed was much lower than that reported in other population-based studies, but the predictors were similar. New investigations are needed to confirm the variability in the prevalence and associated factors of renal dysfunction among populations.