54 results for random forest data analysis

in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance: 100.00%

Abstract:

Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevance: 100.00%

Abstract:

Mass transfer across a gas-liquid interface was studied theoretically and experimentally, using the transfer of oxygen into water as the gas-liquid system. The experimental results support the conclusions of a theoretical description of the concentration field that uses random square wave approximations. The effect of diffusion on the concentration records was quantified. It is shown that the peak of the normalized rms concentration fluctuation profiles must be lower than 0.5, and that the position of the peak of the rms value is an adequate measure of the thickness of the diffusive layer. The position of the peak is the boundary between the region more subject to molecular diffusion and the region more subject to turbulent transport of dissolved mass.
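
One way to see where the 0.5 bound comes from is to consider an idealized two-state signal. The following is a sketch under an assumption that the abstract does not spell out: the normalized concentration $n$ takes only the values 0 and 1, and $p$ (a symbol introduced here) is the fraction of time spent at 1.

```latex
\bar{n} = p, \qquad
n'_{\mathrm{rms}} = \sqrt{\overline{(n-\bar{n})^{2}}} = \sqrt{p\,(1-p)} \;\le\; \tfrac{1}{2},
\quad \text{with equality only at } p = \tfrac{1}{2}.
```

For any signal bounded between 0 and 1 with the same mean, the variance cannot exceed $p(1-p)$, so once diffusion smooths the square wave and intermediate values appear, the rms peak would be expected to fall below 0.5, which is consistent with the statement in the abstract.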

Relevance: 100.00%

Abstract:

The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the focus of many studies in neuroscience. The complex neural network structure and its correlations with brain functions have played a role in all areas of neuroscience, including the comprehension of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, in functional neuroimaging, connectivity analysis is a major tool for the exploration and characterization of the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROI) and then calculating an average BOLD time series (across the voxels in each cluster). Some studies have shown that the average may not be a good choice and have suggested, as an alternative, the use of principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method is to employ multiple eigen-time series in each ROI to avoid temporal information loss during identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single "representative" time series per ROI). This, in turn, may lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping.
By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.

Relevance: 100.00%

Abstract:

In this paper, we present an algorithm for cluster analysis that integrates aspects from cluster ensemble and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm, with a special crossover operator, which uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without the need for expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with Multi-Objective Clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
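
The idea of keeping "a concise set of partitions representing alternative trade-offs" rests on Pareto dominance. The sketch below illustrates only that selection step, not the genetic algorithm or its crossover operator. The objective values are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of the candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (objective_1, objective_2) values for four candidate partitions,
# e.g. two clustering validation measures that are both to be minimized.
scores = [(0.20, 0.90), (0.35, 0.40), (0.50, 0.45), (0.80, 0.10)]
front = pareto_front(scores)
print(front)
```

Here the third candidate is worse than the second on both objectives, so it is dropped; the remaining three each represent a different trade-off.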

Relevance: 100.00%

Abstract:

The TCABR data analysis and acquisition system has been upgraded to support a joint research programme using remote participation technologies. The architecture of the new system uses the Java language as its programming environment. Since application parameters and hardware in a joint experiment are complex, with a large variability of components, requirements and specification solutions need to be flexible and modular, independent of operating system and computer architecture. To describe and organize the information on all the components and the connections among them, systems are developed using the eXtensible Markup Language (XML) technology. The communication between clients and servers uses remote procedure call (RPC) based on XML (RPC-XML technology). The integration of the Java language, XML and RPC-XML technologies makes it easy to develop a standard data and communication access layer between users and laboratories using common software libraries and a Web application. The libraries allow data retrieval using the same methods for all user laboratories in the joint collaboration, and the Web application allows simple graphical user interface (GUI) access. The TCABR tokamak team, in collaboration with the IPFN (Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Universidade Tecnica de Lisboa), is implementing these remote participation technologies. The first version was tested at the Joint Experiment on TCABR (TCABRJE), a Host Laboratory Experiment, organized in cooperation with the IAEA (International Atomic Energy Agency) in the framework of the IAEA Coordinated Research Project (CRP) on "Joint Research Using Small Tokamaks". (C) 2010 Elsevier B.V. All rights reserved.
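
To make the XML-based RPC idea concrete, the sketch below shows what a call and its reply look like on the wire, using the protocol commonly known as XML-RPC. The system described above is written in Java; Python's standard library is used here only to keep the example short. The method name `get_signal`, the shot number and the signal name are hypothetical and are not taken from the TCABR software.

```python
import xmlrpc.client

# Client side: encode a call to a hypothetical remote method as XML.
request_xml = xmlrpc.client.dumps((12345, "plasma_current"), methodname="get_signal")

# Server side: decode the request back into a method name and its arguments.
params, method = xmlrpc.client.loads(request_xml)

# Server side: encode a reply (a response carries exactly one return value).
response_xml = xmlrpc.client.dumps(([0.0, 0.5, 1.0],), methodresponse=True)

# Client side: decode the reply; the method name is None for responses.
(values,), _ = xmlrpc.client.loads(response_xml)

print(method, params, values)
```

Because both the request and the reply are plain XML text, a client in one language and operating system can talk to a server in another, which is the portability argument the abstract makes.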

Relevance: 100.00%

Abstract:

This work presents a novel approach to increase the recognition power of Multiscale Fractal Dimension (MFD) techniques when applied to image classification. The proposal uses Functional Data Analysis (FDA) to enhance the precision of the MFD technique, yielding a more representative descriptor vector capable of recognizing and characterizing objects in an image more precisely. FDA is applied to signatures extracted with the Bouligand-Minkowski MFD technique to generate a descriptor vector from them. To evaluate the improvement obtained, an experiment using two datasets of objects was carried out: a dataset of character shapes (26 characters of the Latin alphabet) carrying different levels of controlled noise, and a dataset of fish image contours. A comparison with the well-known Fourier and wavelet descriptor methods was performed to verify the performance of the FDA method. The descriptor vectors were submitted to the Linear Discriminant Analysis (LDA) classification method, and we compared the correct classification rates among the descriptor methods. The results demonstrate that FDA outperforms the literature methods (Fourier and wavelets) in processing the information extracted from the MFD signature. The proposed method can therefore be considered an interesting choice for pattern recognition and image classification using fractal analysis.

Relevance: 100.00%

Abstract:

This paper presents groundwater favorability mapping of a fractured terrain in the eastern portion of Sao Paulo State, Brazil. Remote sensing, airborne geophysical data, photogeologic interpretation, geologic and geomorphologic maps, and geographic information system (GIS) techniques were used. Cross-tabulation between these maps and well yield data allowed groundwater prospective parameters to be defined for a fractured-bedrock aquifer. These prospective parameters are the basis for the favorability analysis, which follows the knowledge-driven method. A multicriteria analysis (weighted linear combination) was carried out to produce a groundwater favorability map, because the prospective parameters have different weights of importance and each parameter has different classes. The groundwater favorability map was tested by cross-tabulation with new well yield data and spring occurrences. The wells with the highest productivity values, as well as all the spring occurrences, are situated in the areas mapped as excellent and good favorability. This shows good coherence between the prospective parameters and the well yields, and the importance of GIS techniques for defining target areas for detailed study and well location. (c) 2008 Elsevier B.V. All rights reserved.
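
A weighted linear combination gives each map cell a score equal to the weighted average of its class scores across the parameters. The sketch below shows that arithmetic for a single cell. The parameter names, weights and class scores are hypothetical placeholders, not the values used in the study.

```python
from typing import Dict


def weighted_linear_combination(class_scores: Dict[str, float],
                                weights: Dict[str, float]) -> float:
    """Favorability score of one cell: sum(weight * class score) / sum(weights)."""
    total_weight = sum(weights.values())
    return sum(weights[name] * class_scores[name] for name in weights) / total_weight


# Hypothetical importance weights for four prospective parameters.
weights = {"lineament_density": 4, "lithology": 3, "slope": 2, "drainage_density": 1}

# Hypothetical class scores (1 = poor ... 5 = excellent) for one map cell.
cell = {"lineament_density": 4, "lithology": 3, "slope": 2, "drainage_density": 5}

score = weighted_linear_combination(cell, weights)
print(round(score, 2))
```

In a GIS the same expression would be evaluated for every cell of the raster stack, and the resulting scores binned into favorability classes.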

Relevance: 100.00%

Abstract:

Eusarcus Perty 1833 is one of the oldest described genera of Pachylinae, comprising 36 species distributed from northeastern to southern Brazil (including the central west region), northeastern Argentina, eastern Paraguay and Uruguay. The genus is reviewed and a new classification is proposed based on a cladistic analysis. A cladistic analysis was performed with the 34 valid species of Eusarcus and 11 species belonging to certain Gonyleptidae subfamilies. The data matrix has 67 characters: 14 from dorsal scutum and pedipalp, 38 from male legs and 15 from male genitalia. Two equally parsimonious trees were found (L=319; C. I.=0.26, R. I.=0.61). Pygophalangodus gemignanii uruguayensis Ringuelet 1955a and Pygophalangodus gemignanii gemignanii Mello-Leitao 1931b are here elevated to the category of species, and the following new combinations are proposed: E. catharinensis (Mello-Leitao 1927); E. berlae (Mello-Leitao 1932); E. gemignanii (Mello-Leitao 1931b); E. signatus(Roewer 1949); E. sooretamae (Soares & Soares 1946a); E. uruguayensis (Ringuelet 1955a). The following generic synonymies are proposed: Eusarcus Perty 1833 (type species E. armatus Perty 1833) = Metagraphinotus Mello-Leitao 1927 (type species M. catharinensis Mello-Leitao 1927), Pareusarcus Roewer 1929 (type species P. corniculatus Roewer 1929), Pygophalangodus Mello-Leitao 1931b (type species P. gemignanii-gemignanii Mello-Leitao 1931b) and Antetriceras Roewer 1949 (type species A. signatus Roewer 1949). The following specific synonymies are proposed: Eusarcus hastatus Sorensen 1884 = Pucrolioides argentina Roewer 1913, E. guimaraensi H. Soares 1945, Jacarepaguana pectinifemur Piza 1943, Canestrinia canalsi Mello-Leitao 1931a, and E. maquinensis H. Soares 1966b; E. armatus Perty 1833 = E. curvispinosus Mello-Leitao 1923b, and Enantiocentron montis Mello-Leitao 1936; Eusarcus catharinensis (Mello-Leitao 1927) = E. antoninae Mello-Leitao 1936, E. perpusillus Mello-Leitao 1945, E. 
tripos Mello-Leitao 1940, and Metagraphinotus trochanterspinosus Soares & Soares 1947b; E. nigrimaculatus Mello-Leitao 1924 = Pareusarcus centromelos Mello-Leitao 1935a, E. furcatus Roewer 1929, Orguesia armata Roewer 1913, and Pareusarcus corniculatus Roewer 1929; E. oxyacanthus Kollar in Koch 1839a = Enantiocentron doriphorus Mello-Leitao 1932, and E. spinimanu Mello-Leitao 1932; E. pusillus Sorensen 1884 = E. vervloeti B. Soares 1944c; E. berlae Mello-Leitao 1932 = Metagraphinotus arlei Mello-Leitao 1935a. Metapucrolia armata (Sorensen 1895) is revalidated, transferred to Eusarcus and considered as a species inquirenda. A new name, Eusarcus metapucrolia is proposed for this species to avoid homonymy with the type species of Eusarcus, E. armatus Perty 1833. Eusarcus aberrans Mello-Leitao 1939a is considered as a species inquirenda. The male of E. teresincola Soares & Soares 1946a is described. Female of the following species are described: E. bifidus Roewer 1929; E. dubius B. Soares 1943b; E. insperatus B. Soares 1944a; E. schubarti Soares & Soares 1946a; E. sooretamae (Soares & Soares 1946a). The following new species are described from Brazil: E. acrophthalmus (type locality: Bahia, Ilheus, Parataquice); E. alpinus (Rio de Janeiro, Santa Maria Madalena, Parque Estadual do Desengano); E. caparaoensis (Minas Gerais, Alto Caparao, Parque Nacional do Caparao); E. cavernicola (Goias, Sao Domingos, Parque Estadual de Terra Ronca, Lapa da Angelica); E. didactylus (Rio de Janeiro, Teresopolis, Parque Nacional Serra dos Orgaos); E. garibaldiae (Santa Catarina, Itajai); E. geometricus (Rio de Janeiro, Teresopolis, Parque Nacional Serra dos Orgaos); E. manero (Rio de Janeiro, Marica, Itaipuacu); E. matogrossensis (Mato Grosso, Chapada dos Guimaraes); E. mirabilis (Minas Gerais, Marlieria, Parque Estadual Rio Doce); E. sergipanus (Sergipe, Itabaiana, Parque Nacional de Itabaiana) and E. tripectinatus (Minas Gerais, Rio Preto). The holotype of E. 
curvispinosus is proposed as the neotype of E. armatus Perty 1833, the type material of which has been lost. Lectotypes for the following species were designated: E. aduncus; E. hastatus; E. oxyacanthus.

Relevance: 100.00%

Abstract:

This paper proposes a regression model based on the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes, and we also present some ways to assess global influence. In addition, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering discrete lifetime regression models. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, we use diagnostic measures based on case deletion, termed global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed to compare the performance of the four link functions of the regression models for grouped survival data under different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
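
The jackknife mentioned above re-estimates a quantity with each observation left out in turn. The sketch below shows the generic leave-one-out procedure for an arbitrary estimator. It uses the sample mean and made-up numbers purely for illustration; the study applies the idea to the parameters of its regression models, which are not reproduced here.

```python
import math
from statistics import mean, stdev
from typing import Callable, List, Tuple


def jackknife(data: List[float],
              estimator: Callable[[List[float]], float]) -> Tuple[float, float]:
    """Return the bias-corrected jackknife estimate and its standard error."""
    n = len(data)
    full = estimator(data)
    # Re-estimate with each observation removed in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(leave_one_out)
    bias_corrected = n * full - (n - 1) * loo_mean
    std_error = math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return bias_corrected, std_error


sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
estimate, std_error = jackknife(sample, mean)
print(estimate, std_error)
```

For the sample mean the jackknife reproduces the usual results (the mean itself and the standard deviation divided by the square root of n), which makes it a convenient sanity check before applying the same function to a less tractable estimator.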

Relevance: 100.00%

Abstract:

Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicates significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Markov chain Monte Carlo (MCMC) algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about the experimental results.

Relevance: 100.00%

Abstract:

Using data from a logging experiment in the eastern Brazilian Amazon region, we develop a matrix growth and yield model that captures the dynamic effects of harvest system choice on forest structure and composition. Multinomial logistic regression is used to estimate the growth transition parameters for a 10-year time step, while a Poisson regression model is used to estimate recruitment parameters. The model is designed to be easily integrated with an economic model of decision-making to perform tropical forest policy analysis. The model is used to compare the long-run structure and composition of a stand arising from the choice of implementing either conventional logging techniques or more carefully planned and executed reduced-impact logging (RIL) techniques, contrasted against a baseline projection of an unlogged forest. Results from log-and-leave scenarios show that a stand logged according to Brazilian management requirements will require well over 120 years to recover its initial commercial volume, regardless of the logging technique employed. Implementing RIL, however, accelerates this recovery. Scenarios imposing a 40-year cutting cycle raise the possibility of sustainable harvest volumes, although at significantly lower levels than are implied by current regulations. Meeting current Brazilian forest policy goals may require an increase in the planned total area of permanent production forest or the widespread adoption of silvicultural practices that increase stand recovery and volume accumulation rates after RIL harvests. Published by Elsevier B.V.
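
A matrix growth model of this general kind advances a stand table one time step by removing the harvest, applying a transition matrix to the residual trees, and adding recruits. The sketch below shows one such step in its simplest textbook form. All numbers are hypothetical: in the study the transition probabilities come from a multinomial logistic regression and recruitment from a Poisson regression, and both may depend on stand conditions, which this fixed-matrix sketch ignores.

```python
from typing import List


def project_stand(stand: List[float], harvest: List[float],
                  growth: List[List[float]], recruitment: List[float]) -> List[float]:
    """One time step: next = growth @ (stand - harvest) + recruitment."""
    residual = [s - h for s, h in zip(stand, harvest)]
    grown = [
        sum(growth[i][j] * residual[j] for j in range(len(residual)))
        for i in range(len(residual))
    ]
    return [g + r for g, r in zip(grown, recruitment)]


# Hypothetical 10-year transition matrix for three diameter classes.
# growth[i][j] is the share of trees in class j that end the step in class i;
# columns sum to less than 1, the remainder being mortality.
growth = [
    [0.70, 0.00, 0.00],
    [0.20, 0.75, 0.00],
    [0.00, 0.15, 0.90],
]
recruitment = [12.0, 0.0, 0.0]  # hypothetical ingrowth into the smallest class
stand = [100.0, 40.0, 10.0]     # hypothetical trees per hectare by class
harvest = [0.0, 0.0, 5.0]       # hypothetical cut, largest class only

next_stand = project_stand(stand, harvest, growth, recruitment)
print(next_stand)
```

Repeating the step, with a harvest only in cutting years, gives the kind of long-run projection used to compare logging scenarios against an unlogged baseline.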

Relevance: 100.00%

Abstract:

Functional magnetic resonance imaging (fMRI) based on the BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). When the task is cognitively complex, or in cases of disease, variations in the shape and/or delay of the response may reduce the reliability of results. This paper introduces a novel exploratory method for fMRI data that attempts to discriminate neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors. The new method is based on the fusion of correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

Aims: We conducted a meta-analysis to evaluate the accuracy of quantitative stress myocardial contrast echocardiography (MCE) in coronary artery disease (CAD). Methods and results: A database search was performed through January 2008. We included studies evaluating the accuracy of quantitative stress MCE for detection of CAD compared with coronary angiography or single-photon emission computed tomography (SPECT) and measuring reserve parameters of A, beta, and A beta. Data from studies were verified and supplemented by the authors of each study. Using random effects meta-analysis, we estimated weighted mean difference (WMD), likelihood ratios (LRs), diagnostic odds ratios (DORs), and summary area under curve (AUC), all with 95% confidence intervals (CI). Of 1443 studies, 13 including 627 patients (age range, 38-75 years) and comparing MCE with angiography (n = 10), SPECT (n = 1), or both (n = 2) were eligible. WMD (95% CI) were significantly less in the CAD group than in the no-CAD group: 0.12 (0.06-0.18) (P < 0.001), 1.38 (1.28-1.52) (P < 0.001), and 1.47 (1.18-1.76) (P < 0.001) for A, beta, and A beta reserves, respectively. Pooled LRs for a positive test were 1.33 (1.13-1.57), 3.76 (2.43-5.80), and 3.64 (2.87-4.78) and LRs for a negative test were 0.68 (0.55-0.83), 0.30 (0.24-0.38), and 0.27 (0.22-0.34) for A, beta, and A beta reserves, respectively. Pooled DORs were 2.09 (1.42-3.07), 15.11 (7.90-28.91), and 14.73 (9.61-22.57) and AUCs were 0.637 (0.594-0.677), 0.851 (0.828-0.872), and 0.859 (0.842-0.750) for A, beta, and A beta reserves, respectively. Conclusion: Evidence supports the use of quantitative MCE as a non-invasive test for detection of CAD. Standardizing MCE quantification analysis and adherence to reporting standards for diagnostic tests could enhance the quality of evidence in this field.
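
For readers less familiar with the reported measures, the sketch below computes likelihood ratios and the diagnostic odds ratio for a single study from its 2x2 table, using the standard definitions. The counts are hypothetical and are not taken from any study in the meta-analysis.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard accuracy measures from a 2x2 table of test result vs. disease."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        # The DOR is the ratio of the two LRs, i.e. (tp * tn) / (fp * fn).
        "dor": lr_positive / lr_negative,
    }


# Hypothetical counts for one study.
metrics = diagnostic_metrics(tp=80, fp=20, fn=20, tn=80)
print(metrics)
```

Within one study the DOR equals LR+ divided by LR-. The pooled values in the abstract are combined across studies with a random effects model, so a pooled DOR need not equal the ratio of the pooled LRs exactly.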

Relevance: 100.00%

Abstract:

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need them the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were used to develop the models, and data on 67 patients were used to perform a comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. The clinical data and endoscopic diagnosis collected for each patient were used to retrospectively ascertain optimal management for each patient. Clinical presentations and corresponding treatments were used as training examples. Eight mathematical models including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall, the random forest model best predicted the source, need for resuscitation, and disposition with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%).
The area under the ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model. Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB and the need for intervention, and allow optimization of care and healthcare resource allocation; these, however, require further validation. (c) 2007 Elsevier B.V. All rights reserved.
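
The area under the ROC curve used to compare the models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. The labels and scores are made up for illustration and do not come from the study; any classifier that outputs a score, a random forest included, could supply them.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


# Hypothetical held-out cases: 1 = needs urgent intervention, 0 = does not.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]

auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of cases, which is fine for a test set of a few dozen patients; larger sets would normally use a rank-based formula instead.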