50 results for random forest data analysis
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
Linear mixed effects models are frequently used to analyse longitudinal data, due to their flexibility in modelling the covariance structure between and within observations. Further, it is easy to deal with unbalanced data, either with respect to the number of observations per subject or per time period, and with varying time intervals between observations. In most applications of mixed models to biological sciences, a normal distribution is assumed both for the random effects and for the residuals. This, however, makes inferences vulnerable to the presence of outliers. Here, linear mixed models employing thick-tailed distributions for robust inferences in longitudinal data analysis are described. Specific distributions discussed include the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted, and the Gibbs sampler and the Metropolis-Hastings algorithms are used to carry out the posterior analyses. An example with data on orthodontic distance growth in children is discussed to illustrate the methodology. Analyses based on either the Student-t distribution or on the usual Gaussian assumption are contrasted. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process for modelling distributions of the random effects and of residuals in linear mixed models, and the MCMC implementation allows the computations to be performed in a flexible manner.
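A minimal sketch of why thick-tailed distributions fit naturally into a Gibbs sampler: a Student-t variate can be written as a normal variate divided by the square root of a gamma-distributed weight (a scale mixture of normals). This is a standard identity, not code from the paper; the function names, the seed, and the choice of 4 degrees of freedom are assumptions made for illustration.

```python
import math
import random


def student_t_draw(nu, rng):
    """Draw one Student-t variate with nu degrees of freedom
    using the normal scale-mixture representation."""
    # w ~ Gamma(shape = nu/2, rate = nu/2); gammavariate expects a scale,
    # so the scale argument is 2/nu.
    w = rng.gammavariate(nu / 2.0, 2.0 / nu)
    z = rng.gauss(0.0, 1.0)
    return z / math.sqrt(w)


def tail_fraction(draws, threshold=3.0):
    """Fraction of draws whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in draws) / len(draws)


rng = random.Random(0)  # fixed seed (illustrative assumption)
n = 20000
t_draws = [student_t_draw(4.0, rng) for _ in range(n)]
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]

t_tail = tail_fraction(t_draws)
normal_tail = tail_fraction(normal_draws)
print(f"P(|x| > 3): Student-t(4) = {t_tail:.4f}, normal = {normal_tail:.4f}")
```

The Student-t draws put noticeably more mass beyond three standard units than the Gaussian draws, so an outlying observation receives a small weight `w` rather than distorting the estimates.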
Abstract:
Background: Meat quality involves many traits, such as marbling, tenderness, juiciness, and backfat thickness, all of which require attention from livestock producers. Improving backfat thickness by means of traditional selection techniques in Canchim beef cattle has been challenging because the trait has low heritability and is measured late in an animal's life. Therefore, the implementation of new methodologies for identifying single nucleotide polymorphisms (SNPs) linked to backfat thickness is an important strategy for genetic improvement of carcass and meat quality. Results: The set of SNPs identified by the random forest approach explained as much as 50% of the deregressed estimated breeding value (dEBV) variance associated with backfat thickness, and a small set of 5 SNPs was able to explain 34% of the dEBV for backfat thickness. Several quantitative trait loci (QTL) for fat-related traits were found in the regions surrounding the SNPs, as well as many genes with roles in lipid metabolism. Conclusions: These results provide a better understanding of the backfat deposition and regulation pathways, and can be considered a starting point for future implementation of a genomic selection program for backfat thickness in Canchim beef cattle. © 2013 Mokry et al.; licensee BioMed Central Ltd.
Abstract:
In this paper, a set of representative Brazilian commercial gasoline samples from São Paulo State, selected by HCA, plus six samples obtained directly from refineries, were analysed by a highly sensitive gas chromatographic (GC) method, ASTM D6733. The levels of saturated hydrocarbons and anhydrous ethanol obtained by GC were correlated, through exploratory analysis (HCA and PCA), with the quality determined according to the specifications of the Brazilian Government Petroleum, Natural Gas and Biofuels Agency (ANP). This correlation showed that the GC method, together with HCA and PCA, could be employed as a screening technique to determine compliance with the prescribed legal standards for Brazilian gasoline.
Abstract:
In this work, initial crystallographic studies of human haemoglobin (Hb) crystallized in isoionic and oxygen-free PEG solution are presented. Under these conditions, functional measurements of the O2-linked binding of water molecules and release of protons have shown that Hb assumes an unforeseen new allosteric conformation. The determination of the high-resolution structure of the crystal of human deoxy-Hb fully stripped of anions may provide a structural explanation for the role of anions in the allosteric properties of Hb and, particularly, for the influence of chloride on the Bohr effect, the mechanism by which Hb oxygen affinity is regulated by pH. X-ray diffraction data were collected to 1.87 Å resolution using a synchrotron-radiation source. Crystals belong to the space group P2₁2₁2, and preliminary analysis revealed the presence of one tetramer in the asymmetric unit. The structure is currently being refined using maximum-likelihood protocols.
Abstract:
Hemoglobin remains, despite the enormous amount of research involving this molecule, as a prototype for allosteric models and new conformations. Functional studies carried out on Hemoglobin-I from the South-American Catfish Liposarcus anisitsi [1] suggest the existence of conformational states beyond those already described for human hemoglobin, which could be confirmed crystallographically. The present work represents the initial steps towards that goal.
Abstract:
The present study introduces a multi-agent architecture designed to automate data integration and intelligent data analysis. Unlike other approaches, the architecture was itself designed using an agent-based methodology, Tropos. Based on the proposed architecture, we describe a Web-based application in which the agents are responsible for analysing petroleum well drilling data to identify possible abnormalities. The intelligent data analysis method used was a neural network.
Abstract:
This paper reports the use of chromatographic profiles of volatiles to determine disease markers in plants, in this case leaves of Eucalyptus globulus contaminated by the necrotrophic fungus Teratosphaeria nubilosa. The volatile fraction was isolated by headspace solid phase microextraction (HS-SPME) and analyzed by comprehensive two-dimensional gas chromatography-fast quadrupole mass spectrometry (GC×GC-qMS). To correlate the metabolic profile described by the chromatograms with the presence of the infection, unfolded-partial least squares discriminant analysis (U-PLS-DA) with orthogonal signal correction (OSC) was employed. The proposed method was shown to be independent of factors such as the age of the harvested plants. Manipulation of the resulting mathematical model also produced graphic representations similar to real chromatograms, which allowed the tentative identification of more than 40 compounds potentially useful as disease biomarkers for this plant/pathogen pair. The proposed methodology can be considered highly reliable, since the diagnosis is based on the whole chromatographic profile rather than on the detection of a single analyte. © 2013 Elsevier B.V.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Bacillus thuringiensis is a Gram-positive bacterium whose main characteristic is the production of Cry proteins, which are toxic to some insects. When ingested by susceptible insects, these proteins become active and cause their death. In nature, it is possible to find B. thuringiensis strains that produce these proteins, but they differ in productivity (some isolates are more productive than others) and in the toxicity of the proteins produced. Two B. thuringiensis strains that were highly effective against Spodoptera frugiperda larvae were chosen to verify the effect of genetic mutation on Cry protein productivity. One strain produced spores prolifically, while the other produced only small amounts of spores. A genomic mutant library was constructed separately for each of these two isolates by random insertion of the Tn-5 transposon into the genome. Data analysis showed that mutation had a direct effect on spore production, inducing either an increase or a decrease in production depending on the strain observed. These results indicate, for the first time, that the described technique can be used with B. thuringiensis and that this bacterium can be genetically bred. They also open the possibility of functional genetic studies mediated by mutagenesis in this bacterium.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Cutting analysis is an important task for detecting and preventing problems during the petroleum well drilling process. Several studies have been developed for drilling inspection, but none of them addresses the analysis of the cuttings generated at the vibrating shale shakers. Here we propose a system to analyse the concentration of cuttings at the vibrating shale shakers, which can indicate problems during the petroleum well drilling process, such as the collapse of the well borehole walls. Images of the cuttings are acquired and sent to the data analysis module, whose main goal is to extract features and to classify frames into one of three previously defined classes of cuttings volume. A collection of supervised classifiers was applied in order to compare their accuracy and efficiency. We used the Optimum-Path Forest (OPF), Artificial Neural Network using Multilayer Perceptrons (ANN-MLP), Support Vector Machines (SVM) and a Bayesian Classifier (BC) for this task. The first one outperformed all the remaining classifiers. We are also the first to introduce the OPF classifier in this field of knowledge. The very good results show the robustness of the proposed system, which can also be integrated with another commonly used system (mud logging) in order to improve the latter's efficiency.
Abstract:
Cuttings return analysis is an important tool to detect and prevent problems during the petroleum well drilling process. Several measurements and tools have been developed for detecting drilling problems, including mud logging, PWD and downhole torque information. Cuttings flow meters were developed in the past to provide information regarding cuttings return at the shale shakers. Their use, however, significantly impacts the operation, including rig space issues, interference with geological analysis and the additional personnel required. This article proposes a non-intrusive system to analyze the cuttings concentration at the shale shakers, which can indicate problems during the drilling process, such as landslides, i.e., the collapse of the well borehole walls. Cuttings images are acquired by a high-definition camera installed above the shakers and sent to a computer coupled with a data analysis system, which aims at the quantification and closure of a cuttings material balance in the well surface system domain. No additional people at the rig site are required to operate the system. Modern artificial intelligence techniques are used for pattern recognition and data analysis. Techniques include the Optimum-Path Forest (OPF), Artificial Neural Network using Multilayer Perceptrons (ANN-MLP), Support Vector Machines (SVM) and a Bayesian Classifier (BC). Results of field tests conducted on offshore floating vessels are presented. Results show the robustness of the proposed system, which can also be integrated with other data to improve the efficiency of drilling problem detection. Copyright 2010, IADC/SPE Drilling Conference and Exhibition.
Abstract:
Background: Uterine Leiomyomas (ULs) are the most common benign tumours affecting women of reproductive age. ULs represent a major problem in public health, as they are the main indication for hysterectomy. Approximately 40-50% of ULs have non-random cytogenetic abnormalities, and half of ULs may have copy number alterations (CNAs). Gene expression microarray studies have demonstrated that cell proliferation genes act in response to growth factors and steroids. However, only a few genes mapping to CNA regions were found to be associated with ULs. Methodology: We applied an integrative analysis using genomic and transcriptomic data to identify the pathways and molecular markers associated with ULs. Fifty-one fresh frozen specimens were evaluated by array CGH (JISTIC) and gene expression microarrays (SAM). The CONEXIC algorithm was applied to integrate the data. Principal Findings: The integrated analysis identified the top 30 significant genes (P<0.01), which comprised genes associated with cancer, whereas the protein-protein interaction analysis indicated a strong association between FANCA and BRCA1. Functional in silico analysis revealed target molecules for drugs involved in cell proliferation, including FGFR1 and IGFBP5. Transcriptional and protein analyses showed that FGFR1 (P = 0.006 and P<0.01, respectively) and IGFBP5 (P = 0.0002 and P = 0.006, respectively) were up-regulated in the tumours when compared with the adjacent normal myometrium. Conclusions: The integrative genomic and transcriptomic approach indicated that FGFR1 and IGFBP5 amplification, as well as the consequent up-regulation of the protein products, plays an important role in the aetiology of ULs and thus provides data for the development of potential drug therapies targeting genes associated with cellular proliferation in ULs. © 2013 Cirilo et al.
Abstract:
An important tool for heart disease diagnosis is the analysis of electrocardiogram (ECG) signals, given the non-invasive nature and simplicity of the ECG exam. Depending on the application, ECG data analysis consists of steps such as preprocessing, segmentation, feature extraction and classification, aiming to detect cardiac arrhythmias (i.e., cardiac rhythm abnormalities). Aiming at a fast and accurate cardiac arrhythmia signal classification process, we apply and analyze a recent and robust supervised graph-based pattern recognition technique, the optimum-path forest (OPF) classifier. To the best of our knowledge, this is the first time that the OPF classifier has been applied to the ECG heartbeat signal classification task. We then compare the performance (in terms of training and testing time, accuracy, specificity, and sensitivity) of the OPF classifier with that of three other well-known expert system classifiers, i.e., support vector machine (SVM), Bayesian and multilayer artificial neural network (MLP), using features extracted from six main approaches considered in the literature for ECG arrhythmia analysis. In our experiments, we use the MIT-BIH Arrhythmia Database and the evaluation protocol recommended by The Association for the Advancement of Medical Instrumentation. A discussion of the obtained results shows that the OPF classifier presents a robust performance, i.e., there is no need for parameter setup, as well as high accuracy at an extremely low computational cost. Moreover, on average, the OPF classifier yielded greater performance than the MLP and SVM classifiers in terms of classification time and accuracy, and produced performance quite similar to that of the Bayesian classifier, showing it to be a promising technique for ECG signal analysis. © 2012 Elsevier Ltd. All rights reserved.
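The accuracy, sensitivity and specificity used to compare the classifiers can be computed from confusion counts. The following is a minimal sketch for a two-class case, not the evaluation code of the paper; the function name `binary_metrics`, the label coding (1 = arrhythmic beat, 0 = normal beat) and the toy label vectors are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for two-class labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # arrhythmic beat correctly flagged
        elif truth == positive:
            fn += 1  # arrhythmic beat missed
        elif pred == positive:
            fp += 1  # normal beat wrongly flagged
        else:
            tn += 1  # normal beat correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # sensitivity: share of true positives among all actual positives
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # specificity: share of true negatives among all actual negatives
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels, for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For a multi-class heartbeat task, the same counts would be taken one class at a time against all the others.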
Abstract:
The aim of this note is to describe preliminary results on the assessment of land use by cattle, obtained in a pilot study using a Geographic Information System (GIS). The research was carried out on a semi-natural pasture in Sweden, where the geographic positions of one cow were recorded on 25 consecutive days during summer. The cow, wearing a GPS collar, was integrated in a herd of 53 Hereford cattle. Each location point registered for the animal was considered a sampling unit (N=3,097). The spatial distributions of ground declivity, water sources, cattle tracks, and classes of woody vegetation cover (forest, grassland with trees and open grassland) were recorded. Data storage, processing and analysis were carried out using the Idrisi and GS+ software. Three occupation zones were identified according to the variation in the space used by the animal; these zones were occupied in a cyclical pattern, with the animal moving from one zone to another in cycles of five days. It was also clear that the cattle distribution in the area was neither random nor uniform, and that it was affected by environmental characteristics that act as conditioners of its distribution. These preliminary results suggest that the definition of occupation zones and of environmental conditioners is a promising tool for understanding land use by cattle.