876 results for classification and regression tree
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
This paper describes an investigation of the hybrid PSO/ACO algorithm for automatically classifying well drilling operation stages. The feasibility of the method is demonstrated by applying it to a real mud-logging dataset. The results are compared with those of bio-inspired methods and of rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.
Abstract:
The traditional characteristics and challenges of organizing and searching information on the World Wide Web are outlined and reviewed. The classification features of two approaches are analyzed: automated search engines, exemplified by Google, and subject directories, exemplified by Yahoo! Directory. Recent advances in the Semantic Web, particularly the growing application of ontologies and Linked Data, are also reviewed. Finally, some problems and prospects related to the use of classification and indexing on the World Wide Web are discussed, emphasizing the need to rethink the role of classification in the organization of these resources and outlining the possibilities of applying Ranganathan's facet theories of classification.
Abstract:
Prostate cancer is a serious public health problem, accounting for up to 30% of clinical tumors in men. The disease is diagnosed with clinical, laboratory and radiological exams, which may indicate the need for transrectal biopsy. Prostate biopsies are carefully evaluated by pathologists in an attempt to determine the most appropriate conduct. This paper presents a set of techniques for identifying and quantifying regions of interest in prostatic images. Analyses were performed using multi-scale lacunarity and distinct classification methods: decision tree, support vector machine and polynomial classifier. The performance evaluation measures were based on the area under the receiver operating characteristic curve (AUC). The most appropriate region for distinguishing the different tissues (normal, hyperplastic and neoplastic) was defined; the corresponding lacunarity values and a rule model were obtained considering combinations commonly explored by specialists in clinical practice. The best discriminative values (AUC) were 0.906, 0.891 and 0.859 for neoplastic versus normal, neoplastic versus hyperplastic and hyperplastic versus normal groups, respectively. The proposed protocol offers the advantage of making the findings comprehensible to pathologists. © 2014 Elsevier Ltd. All rights reserved.
Abstract:
Epidemiological research is important for understanding the distribution and etiology of oral diseases. Current studies showing the relationship between patient age, denture status and denture stomatitis are scarce. The aim of this study was therefore to identify Candida spp. in patients with denture stomatitis (DS) and to correlate the findings with gender, age, time of denture use and Newton's classification. A total of 204 complete denture patients (46 males and 158 females) were selected. DS was classified according to Newton's classification and related to gender, age and time of denture use. Samples from the palatal mucosa and the surface of the upper denture of patients with DS were evaluated using a PCR test for identification of Candida species. The t-test, chi-square test and Fisher's exact test were used for statistical analysis. DS was found in 54.4% of the sample. By gender, 41.3% of the males and 58.3% of the females had the disease, and the difference was statistically significant (p = 0.032). The type of DS was directly influenced by the time of denture use (p < 0.001), but it was not significantly related to the age of the participants (p > 0.05). C. albicans, C. tropicalis, C. glabrata, C. krusei and C. dubliniensis were identified by the PCR test. DS is more prevalent in women, and its prevalence was influenced by the time of denture use (years). C. albicans was the most frequently identified species in patients with DS.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
OBJECTIVE: Despite the high prevalence of substance abuse and mood disorders among victimized children and adolescents, few studies have investigated the association of these disorders with treatment adherence, represented by the number of visits per month, and with treatment duration. We aimed to investigate the effects of substance abuse and mood disorders on treatment adherence and duration in a special program for victimized children in São Paulo, Brazil. METHODS: A total of 351 participants were evaluated for psychiatric disorders and classified into one of five groups: mood disorders alone; substance abuse disorders alone; mood and substance abuse disorders; other psychiatric disorders; no psychiatric disorders. The associations of diagnostic classification with adherence to treatment and with the duration of program participation were tested with logistic regression and survival analysis, respectively. RESULTS: Children with mood disorders alone had the highest rate of adherence (79.5%); those with substance abuse disorders alone had the lowest (40%); and those with both disorders had an intermediate rate of adherence (50%). Those with other psychiatric disorders and no psychiatric disorders also had high rates of adherence (75.6% and 72.9%, respectively). Living with family significantly increased adherence for children with substance abuse disorders but decreased adherence for those with no psychiatric disorders. The diagnostic correlates of duration of participation were similar to those for adherence. CONCLUSIONS: Mood and substance abuse disorders were strong predictive factors for treatment adherence and duration, albeit in opposite directions. Living with family seems to have a positive effect on treatment adherence for patients with substance abuse disorders. More effective treatment is needed for victimized substance-abusing youth.
Abstract:
In my PhD thesis I propose a Bayesian nonparametric estimation method for structural econometric models where the functional parameter of interest describes the economic agent's behavior. The structural parameter is characterized as the solution of a functional equation or, in more technical terms, as the solution of an inverse problem that can be either ill-posed or well-posed. From a Bayesian point of view, the parameter of interest is a random function and the solution to the inference problem is the posterior distribution of this parameter. A regular version of the posterior distribution in functional spaces is characterized. However, the infinite dimension of the spaces considered causes the solution to be non-continuous and, consequently, the posterior distribution to be inconsistent from a frequentist point of view (i.e., a problem of ill-posedness). The contribution of this essay is to propose new methods to deal with this ill-posedness. The first consists in adopting a Tikhonov regularization scheme in the construction of the posterior distribution, which yields a new object that I call the regularized posterior distribution and that I conjecture to be a solution of the inverse problem. The second approach consists in specifying a prior distribution of the g-prior type on the parameter of interest. I then identify a class of models for which this prior distribution is able to correct for the ill-posedness even in infinite-dimensional problems. I study the asymptotic properties of the proposed solutions and prove that, under some regularity conditions satisfied by the true value of the parameter of interest, they are consistent in a "frequentist" sense. Having set out the general theory, I apply my Bayesian nonparametric methodology to different estimation problems. First, I apply this estimator to deconvolution and to hazard rate, density and regression estimation.
Then, I consider the estimation of an instrumental regression, which is useful in micro-econometrics when dealing with problems of endogeneity. Finally, I develop an application in finance: I derive the Bayesian estimator of the equilibrium asset pricing functional by using the Euler equation defined in Lucas (1978) tree-type models.
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori form of preprocessing. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain: the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
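The idea of counting shared substructures can be illustrated with a minimal sketch of a subtree-style kernel. This is not the thesis's proposed kernel: the tree encoding as `(label, [children])` pairs and the function names `subtree_counts` and `subtree_kernel` are assumptions made for illustration, and the sketch only counts pairs of identical complete subtrees in ordered trees. Each distinct subtree is stored once as a dictionary key, which loosely mirrors the compact sharing of substructures mentioned above.

```python
from collections import Counter


def subtree_counts(tree):
    """Count every complete subtree of `tree`.

    A tree is a (label, [children]) pair (hypothetical encoding).
    Each subtree is turned into a hashable canonical key so that
    identical subtrees are stored only once.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        key = (label, tuple(visit(child) for child in children))
        counts[key] += 1
        return key

    visit(tree)
    return counts


def subtree_kernel(t1, t2):
    """Number of node pairs (one per tree) rooting identical subtrees."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(n * c2[key] for key, n in c1.items())


t1 = ("S", [("A", []), ("B", [])])
t2 = ("S", [("A", []), ("C", [])])
t3 = ("S", [("A", []), ("A", [])])

print(subtree_kernel(t1, t1))  # 3: S, A and B each match themselves
print(subtree_kernel(t1, t2))  # 1: only the leaf A is shared
print(subtree_kernel(t1, t3))  # 2: the leaf A of t1 matches both A leaves of t3
```

The low value for `t1` versus `t2` shows the sparsity issue in miniature: a single differing leaf label already removes the match at the root.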
Abstract:
Nuclear Magnetic Resonance (NMR) is a branch of spectroscopy based on the fact that many atomic nuclei may be oriented by a strong magnetic field and will absorb radiofrequency radiation at characteristic frequencies. The parameters that can be measured on the resulting spectral lines (line positions, intensities, line widths, multiplicities and transients in time-dependent experiments) can be interpreted in terms of molecular structure, conformation, molecular motion and other rate processes. In this way, high resolution (HR) NMR allows qualitative and quantitative analysis of samples in solution in order to determine, among other things, the structure of molecules in solution. In the past, high-field NMR spectroscopy was mainly concerned with the elucidation of chemical structure in solution, but today it is emerging as a powerful exploratory tool for probing biochemical and physical processes. It represents a versatile tool for the analysis of foods. Many NMR studies have been reported in the literature on different types of food, such as wine, olive oil, coffee, fruit juices, milk, meat, egg, starch granules and flour, using different NMR techniques. Traditionally, univariate analytical methods have been used to explore spectroscopic data. Such methods measure or select a single descriptive variable from the whole spectrum and, in the end, only this variable is analyzed. This univariate approach, applied to HR-NMR data, leads to various problems, due especially to the complexity of an NMR spectrum. In fact, the latter is composed of different signals belonging to different molecules, but it is also true that the same molecule can be represented by different signals, which are generally strongly correlated. Univariate methods, in this case, take into account only one or a few variables, causing a loss of information.
Thus, when dealing with complex samples like foodstuffs, univariate analysis of spectral data is not powerful enough. Spectra need to be considered in their entirety and, to analyse them, the whole data matrix must be taken into consideration: chemometric methods are designed to treat such multivariate data. Multivariate data analysis is used for a number of distinct purposes, and the aims can be divided into three main groups:
• data description (explorative data structure modelling of any generic n-dimensional data matrix, PCA for example);
• regression and prediction (PLS);
• classification and prediction of class membership for new samples (LDA, PLS-DA and ECVA).
The aim of this PhD thesis was to verify the possibility of identifying and classifying plants or foodstuffs into different classes, based on the concerted variation in metabolite levels detected by NMR spectra, using multivariate data analysis as a tool to interpret the NMR information. It is important to underline that the results obtained are useful for pointing out the metabolic consequences of a specific modification of foodstuffs, avoiding the use of a targeted analysis for the different metabolites. The data analysis is performed by applying chemometric multivariate techniques to the NMR dataset of acquired spectra. The research work presented in this thesis is the result of a three-year PhD study. This thesis reports the main results obtained from two main activities:
A1) Evaluation of a data pre-processing system in order to minimize unwanted sources of variation due to different instrumental set-ups, manual spectra processing and sample preparation artefacts;
A2) Application of multivariate chemometric models in data analysis.
Abstract:
We present an update on clinical evaluation, staging, classification and treatment of canal cholesteatoma, including a meta-analysis of clinical data of the last 30 years.
Abstract:
The use of antibiotics is highest in primary care and directly associated with antibiotic resistance in the community. We assessed regional variations in antibiotic use in primary care in Switzerland and explored prescription patterns in relation to the use of point of care tests. Defined daily doses of antibiotics per 1000 inhabitants (DDD(1000pd)) were calculated for the year 2007 from reimbursement data of the largest Swiss health insurer, based on the anatomic therapeutic chemical classification and the DDD methodology recommended by WHO. We present ecological associations by use of descriptive and regression analysis. We analysed data from 1 067 934 adults, representing 17.1% of the Swiss population. The rate of outpatient antibiotic prescriptions in the entire population was 8.5 DDD(1000pd), and varied between 7.28 and 11.33 DDD(1000pd) for northwest Switzerland and the Lake Geneva region, respectively. DDD(1000pd) for the three most prescribed antibiotics were 2.90 for amoxicillin and amoxicillin-clavulanate, 1.77 for fluoroquinolones, and 1.34 for macrolides. Regions with higher DDD(1000pd) showed higher seasonal variability in antibiotic use and lower use of all point of care tests. In regression analysis for each class of antibiotics, the use of any point of care test was consistently associated with fewer antibiotic prescriptions. Prescription rates of primary care physicians varied between Swiss regions and were lower in northwest Switzerland and among physicians using point of care tests. Ecological studies are prone to bias, and whether point of care tests reduce antibiotic use has to be investigated in pragmatic primary care trials.
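As a rough illustration of the indicator used above, the following sketch computes defined daily doses per 1000 inhabitants per day in the standard WHO form (total DDDs divided by population times days, scaled to 1000 inhabitants). The abstract does not spell out the study's exact calculation, so the function name `ddd_per_1000_per_day`, its parameters and the input figures are assumptions chosen for illustration, not data from the study.

```python
def ddd_per_1000_per_day(total_ddd: float, population: int, days: int = 365) -> float:
    """Defined daily doses per 1000 inhabitants per day.

    total_ddd  -- total number of DDDs dispensed in the period (hypothetical input)
    population -- number of inhabitants covered by the data
    days       -- length of the observation period in days
    """
    if population <= 0 or days <= 0:
        raise ValueError("population and days must be positive")
    return total_ddd * 1000.0 / (population * days)


# Illustrative figures only: 365,000 DDDs dispensed to 100,000 people in one year.
rate = ddd_per_1000_per_day(total_ddd=365_000, population=100_000, days=365)
print(rate)  # 10.0
```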
Abstract:
Background: The currently proposed model of colorectal tumorigenesis is based primarily on CpG island methylator phenotype (CIMP), microsatellite instability (MSI), KRAS, BRAF, and methylation status of O-6-methylguanine-DNA methyltransferase (MGMT), and classifies tumors into five subgroups. The aim of this study is to validate this molecular classification and test its prognostic relevance. Methods: Three hundred two patients were included in this study. Molecular analysis was performed for five CIMP-related promoters (CRABP1, MLH1, p16INK4a, CACNA1G, NEUROG1), MGMT, MSI, KRAS, and BRAF. Methylation in at least four promoters was considered CIMP-high (CIMP-H), and methylation in one to three promoters was considered CIMP-low (CIMP-L). Results: CIMP-H, CIMP-L, and CIMP-negative were found in 7.1, 43, and 49.9% of cases, respectively. One hundred twenty-three tumors (41%) could not be classified into any one of the proposed molecular subgroups, including 107 CIMP-L, 14 CIMP-H, and two CIMP-negative cases. The 10-year survival rate for CIMP-H patients [22.6% (95% CI: 7-43)] was significantly lower than for CIMP-L or CIMP-negative patients (p = 0.0295). Only the combined analysis of BRAF and CIMP (negative versus L/H) led to distinct prognostic subgroups. Conclusion: Although CIMP status has an effect on outcome, our results underline the need for standardized definitions of low- and high-level CIMP, the lack of which clearly hinders an effective prognostic and molecular classification of colorectal cancer.
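The CIMP thresholds stated in the Methods can be written as a small rule, sketched below. The function name `classify_cimp` is hypothetical, and treating zero methylated promoters as CIMP-negative is an assumption inferred from the three categories reported in the Results; only the thresholds for CIMP-H and CIMP-L come from the abstract.

```python
def classify_cimp(methylated_promoters: int, panel_size: int = 5) -> str:
    """Assign a CIMP category from the number of methylated panel promoters.

    Thresholds follow the abstract: at least four methylated promoters is
    CIMP-H, one to three is CIMP-L. Zero is labelled CIMP-negative here,
    which is an assumption.
    """
    if not 0 <= methylated_promoters <= panel_size:
        raise ValueError("count must lie between 0 and the panel size")
    if methylated_promoters >= 4:
        return "CIMP-H"
    if methylated_promoters >= 1:
        return "CIMP-L"
    return "CIMP-negative"


for count in range(6):
    print(count, classify_cimp(count))
```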