986 results for R-Statistical computing


Relevance:

80.00%

Abstract:

Microarrays allow thousands of genes to be monitored simultaneously, quantifying the abundance of transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments have arisen in numerous technical protocols associated with genomic studies, and they are the focus of this work. Microarray experiments involve many steps, each of which can affect the quality of the raw data. Background correction and normalization are preprocessing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background correction methods and six normalization methods were used, yielding 36 preprocessing combinations, which were applied to a published cDNA microarray database (Liver) available at http://genome-www5.stanford.edu/, whose microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
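To make the two preprocessing stages concrete, the following is a minimal sketch of one possible pairing for two-channel data: background subtraction followed by global median normalization of the log-ratios. It is written in Python rather than R, the intensity values are invented, and the names `log_ratios`, `median_normalize`, and `floor` are hypothetical. This pairing is an assumption chosen for brevity and is not necessarily one of the 36 combinations evaluated in the study.

```python
import math
from statistics import median


def log_ratios(red_fg, red_bg, green_fg, green_bg, floor=1.0):
    """Subtract background per channel and return per-spot (M, A) values.

    M = log2(R / G) is the log-ratio and A = 0.5 * log2(R * G) is the mean
    log-intensity. Corrected intensities are clipped at `floor` (an assumed
    safeguard) so that the logarithm is always defined.
    """
    m_values, a_values = [], []
    for rf, rb, gf, gb in zip(red_fg, red_bg, green_fg, green_bg):
        r = max(rf - rb, floor)
        g = max(gf - gb, floor)
        m_values.append(math.log2(r / g))
        a_values.append(0.5 * math.log2(r * g))
    return m_values, a_values


def median_normalize(m_values):
    """Shift the log-ratios so that their median is zero."""
    center = median(m_values)
    return [m - center for m in m_values]


# Invented foreground/background intensities for four spots.
red_fg = [1200.0, 560.0, 3100.0, 90.0]
red_bg = [100.0, 60.0, 100.0, 80.0]
green_fg = [650.0, 560.0, 850.0, 400.0]
green_bg = [100.0, 60.0, 100.0, 80.0]

m, a = log_ratios(red_fg, red_bg, green_fg, green_bg)
m_norm = median_normalize(m)
print(m)       # raw log-ratios
print(m_norm)  # log-ratios centered on a median of zero
```

Swapping either function for an alternative method changes the resulting log-ratios, which is the kind of effect the exploratory comparison above is meant to expose.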

Relevance:

80.00%

Abstract:

INTRODUCTION: We describe the epidemiology of intestinal parasites in patients from an AIDS reference service in Northeastern São Paulo, Brazil. METHODS: A retrospective evaluation was carried out for all HIV-1/AIDS-positive patients whose laboratory analysis at Hospital de Base/São José do Rio Preto was positive for enteroparasites after diagnosis of HIV-1 infection, from January 1998 to December 2008. Statistical analysis was performed using the R statistical software, version 2.4.1. The level of significance adopted was 5%. RESULTS: The most frequent protozoan was Isospora belli (4.2%), followed by Giardia lamblia (3.5%), Entamoeba coli (2.8%), and Cryptosporidium parvum (0.3%). Ancylostoma duodenale (1.4%) was the most frequently detected helminth, while Taenia saginata and Strongyloides stercoralis were found in 0.7% of the samples. The results showed that diarrhea was significantly associated with giardiasis and isosporiasis. However, no association was observed between CD4+ cell counts, viral load, and the characteristics of any particular parasite. CONCLUSIONS: Our data may be useful for further comparisons with other Brazilian regions and other developing countries. The data may also provide important clues toward improving the understanding, prevention, and control of enteric parasites around the world.

Relevance:

80.00%

Abstract:

Master's dissertation in Quality Engineering and Management

Relevance:

80.00%

Abstract:

Research project carried out during a stay at the University of Bonn, Germany, between August and December 2008. Recently, following the creation of the Catalonia Cancer Registry, a new "state of the art" report on cancer in Catalonia has been produced, giving a complete picture of cancer incidence, mortality, and survival in Catalonia. It draws on data from the population-based cancer registries of Girona and Tarragona for cancer incidence, and from the Catalonia Mortality Registry for cancer mortality. The project had two main objectives. The first was to develop an integrated set of functions for the automated calculation of incidence, mortality, and survival, as well as for fitting the statistical models used to assess trends and obtain cancer projections for future years. The second was to apply these functions to the available data and obtain results for Catalonia, including projections of cancer incidence and mortality in Catalonia up to 2020. Both objectives have been substantially achieved. For the first, an R source file containing the macros and functions used has been developed. For the second, the analyses were used to prepare a monograph on cancer in Catalonia, which has now been accepted for publication. The results show that cancer incidence has increased and is expected to continue to do so, although the increase is expected to slow among men. Mortality shows a recent decrease that is expected to continue in the future, and this decrease is larger for men than for women.

Relevance:

80.00%

Abstract:

The main objective of the project was to develop conceptual and methodological improvements allowing better prediction of changes in species distributions (at a landscape scale) arising from environmental change in a context dominated by disturbances. In a first study, we compared the effectiveness of different dynamic models for predicting the distribution of the ortolan bunting (Emberiza hortulana). Our results indicate that a hybrid model combining changes in habitat quality, derived from landscape changes, with a spatially explicit population model is a suitable approach for addressing changes in species distributions in contexts of high environmental dynamics and limited dispersal capacity of the target species. In a second study, we addressed the calibration, using monitoring data, of dynamic distribution models for 12 species with a preference for open habitats. Among the conclusions drawn, we highlight: (1) the need for monitoring data to cover the areas where changes in quality occur; and (2) the bias in the estimates of the occupancy model parameters when the landscape change hypothesis or the habitat quality model is incorrect. In the final study, we examined the possible impact on 67 bird species of different fire regimes, defined from combinations of levels of climate change (leading to an expected increase in the size and frequency of forest fires) and firefighting efficiency. According to our model results, the combination of anthropogenic factors in the fire regime, such as rural abandonment and fire suppression, may be more decisive for distribution changes than the effects of climate change. The outputs produced include three scientific publications, a web page with the project results, and a library for the R statistical environment.

Relevance:

80.00%

Abstract:

The pipeline for macro- and microarray analyses (PMmA) is a set of scripts with a web interface developed to analyze DNA array data generated by array image quantification software. PMmA is designed for use with single- or double-color array data and works as a pipeline in five classes (data format, normalization, data analysis, clustering, and array maps). It can also be used as a plugin in the BioArray Software Environment, an open-source database for array analysis, or in a local version of the web service. All scripts in PMmA were developed in the PERL programming language, and the statistical analysis functions were implemented in the R statistical language. Consequently, our package is platform-independent. Our algorithms can correctly select almost 90% of the differentially expressed genes, showing superior performance compared to other methods of analysis. The pipeline software has been applied to public macroarray data for 1536 expressed sequence tags from sugarcane exposed to cold for 3 to 48 h. PMmA identified thirty cold-responsive genes previously unidentified in this public dataset. Fourteen genes were up-regulated, two had variable expression, and the other fourteen were down-regulated in the treatments. These new findings were certainly a consequence of using a superior statistical analysis approach, since the original study did not take into account the dependence of data variability on the average signal intensity of each gene. The web interface, supplementary information, and the package source code are available, free, to non-commercial users at http://ipe.cbmeg.unicamp.br/pub/PMmA.

Relevance:

80.00%

Abstract:

This thesis describes an ancillary project to the Early Diagnosis of Mesothelioma and Lung Cancer in Prior Asbestos Workers study, conducted to determine the effects of asbestos exposure, pulmonary function, and cigarette smoking in the prediction of pulmonary fibrosis. A total of 613 workers who were occupationally exposed to asbestos for an average of 25.9 (SD = 14.69) years were sampled from Sarnia, Ontario. A structured questionnaire was administered during a face-to-face interview, along with low-dose computed tomography (LDCT) of the thorax. Of these, 65 workers (10.7%, 95% CI 8.12 to 12.24) had LDCT-detected pulmonary fibrosis. The model predicting fibrosis included the variables age, smoking (dichotomized), post-FVC % splines, and post-FEV1 % splines. This model had a receiver operating characteristic area under the curve of 0.738. The calibration of the model was evaluated with the R statistical program, and the bootstrap optimism-corrected calibration slope was 0.692. Thus, our model demonstrated moderate predictive performance.
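As an illustration of the discrimination measure reported above, the sketch below computes the area under the ROC curve as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function name `roc_auc` and the scores and labels are invented for illustration; they are not data from the study, and the thesis's own R code is not reproduced here.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    `labels` holds 1 for cases and 0 for non-cases; `scores` holds the
    predicted risks in the same order.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Invented predicted risks and outcomes for eight hypothetical workers.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.5, 0.1]
labels = [1, 1, 1, 0, 0, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which places the reported 0.738 in the moderate range.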

Relevance:

80.00%

Abstract:

The statistical analyses in this thesis were performed using the MatchIt package, available in the R statistical analysis environment.

Relevance:

80.00%

Abstract:

This paper describes the solution devised for implementing a Geographic Information System to serve the Instituto Universitario del Agua y del Medio Ambiente of the Universidad de Murcia and the Instituto Euromediterráneo del Agua. Given the nature of both institutions, the tool is aimed primarily at the study of water resources and hydrological processes. The process began with the identification of user needs (with differing profiles and requirements), followed by the development of a conceptual design that could ensure those needs were met. Because user requirements demanded it, both users working in a Linux environment and those working in a Windows environment were taken into account. A system based on free software was chosen, using GRASS for raster data handling and modeling; PostGIS (on PostgreSQL) and GRASS for managing vector data; and QGIS, gvSIG, and Kosmo as graphical user interfaces. Other programs used for specific purposes include R, MapServer, and GMT.

Relevance:

80.00%

Abstract:

Adaptive methods which “equidistribute” a given positive weight function are now used fairly widely for selecting discrete meshes. The disadvantage of such schemes is that the resulting mesh may not be smoothly varying. In this paper a technique is developed for equidistributing a function subject to constraints on the ratios of adjacent steps in the mesh. Given a weight function $f \geq 0$ on an interval $[a,b]$ and constants $c$ and $K$, the method produces a mesh with points $x_0 = a$, $x_{j+1} = x_j + h_j$, $j = 0, 1, \ldots, n-1$, and $x_n = b$ such that \[ \int_{x_j}^{x_{j+1}} f \leq c \quad \text{and} \quad \frac{1}{K} \leq \frac{h_{j+1}}{h_j} \leq K \quad \text{for } j = 0, 1, \ldots, n-1. \] A theoretical analysis of the procedure is presented, and numerical algorithms for implementing the method are given. Examples show that the procedure is effective in practice. Other types of constraints on equidistributing meshes are also discussed. The principal application of the procedure is to the solution of boundary value problems, where the weight function is generally some error indicator, and accuracy and convergence properties may depend on the smoothness of the mesh. Other practical applications include the regrading of statistical data.
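The idea can be illustrated with a minimal sketch of plain, unconstrained equidistribution, which places mesh points so that each cell carries an equal share of the integral of $f$. This is not the constrained algorithm of the paper: the step-ratio bound is only measured afterwards, so that a given $K$ can be checked. The function names `equidistribute` and `max_step_ratio`, the `fine` grid parameter, and the example weight function are all assumptions made for illustration.

```python
def equidistribute(f, a, b, n, fine=2000):
    """Return n + 1 mesh points on [a, b] that equidistribute f.

    The cumulative integral of f is approximated with the trapezoid rule on
    a fine uniform grid and then inverted by linear interpolation.
    """
    step = (b - a) / fine
    xs = [a + i * step for i in range(fine + 1)]
    cum = [0.0]
    for i in range(fine):
        cum.append(cum[-1] + 0.5 * step * (f(xs[i]) + f(xs[i + 1])))
    total = cum[-1]
    if total <= 0.0:
        raise ValueError("weight function must have a positive integral")

    mesh = [a]
    k = 0
    for j in range(1, n):
        target = total * j / n
        while cum[k + 1] < target:   # find the fine cell containing target
            k += 1
        span = cum[k + 1] - cum[k]
        t = (target - cum[k]) / span if span > 0.0 else 0.0
        mesh.append(xs[k] + t * step)
    mesh.append(b)
    return mesh


def max_step_ratio(mesh):
    """Largest ratio between adjacent step sizes, taken in either direction."""
    h = [x1 - x0 for x0, x1 in zip(mesh, mesh[1:])]
    return max(max(h1 / h0, h0 / h1) for h0, h1 in zip(h, h[1:]))


# Example weight f(x) = 2x on [0, 1]: the exact mesh points are sqrt(j / n).
mesh = equidistribute(lambda x: 2.0 * x, 0.0, 1.0, 4)
print(mesh)
print(max_step_ratio(mesh))
```

In this example the first step is more than twice the second, so a constraint such as $K = 2$ would be violated by the unconstrained mesh. That is the lack of smoothness the constrained method is designed to remove.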

Relevance:

80.00%

Abstract:

Methods for producing nonuniform transformations, or regradings, of discrete data are discussed. The transformations are useful in image processing, principally for enhancement and normalization of scenes. Regradings which “equidistribute” the histogram of the data, that is, which transform it into a constant function, are determined. Techniques for smoothing the regrading, dependent upon a continuously variable parameter, are presented. Generalized methods for constructing regradings such that the histogram of the data is transformed into any prescribed function are also discussed. Numerical algorithms for implementing the procedures and applications to specific examples are described.
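A regrading that equidistributes the histogram can be sketched with the standard cumulative-distribution mapping used for histogram equalization. The sketch below covers only that basic case; the smoothing parameter and the generalization to prescribed target histograms discussed in the paper are not reproduced. The function name `equalizing_regrade` and the sample data are hypothetical.

```python
from collections import Counter


def equalizing_regrade(values, levels):
    """Map each distinct input value to an output level in 0..levels-1.

    The mapping rescales the empirical cumulative distribution so that the
    output histogram is as flat as the discrete data allow.
    """
    counts = Counter(values)
    n = len(values)
    ordered = sorted(counts)
    cdf_min = counts[ordered[0]]      # cumulative count at the smallest value
    denom = n - cdf_min
    mapping = {}
    cumulative = 0
    for v in ordered:
        cumulative += counts[v]
        if denom == 0:                # all values identical
            mapping[v] = 0
        else:
            mapping[v] = round((cumulative - cdf_min) * (levels - 1) / denom)
    return mapping


# Invented data: four gray levels crowded into a narrow range.
data = [10, 10, 10, 20, 20, 20, 30, 30, 40, 40]
mapping = equalizing_regrade(data, levels=8)
regraded = [mapping[v] for v in data]
print(mapping)
print(regraded)
```

The four input values, originally spaced evenly, are spread over the full output range in proportion to how many samples fall at or below each one.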

Relevance:

80.00%

Abstract:

Produced water is one of the most common wastes generated during oil exploration and production. This work aims to develop methodologies based on comparative statistical analysis of the hydrogeochemistry of production zones, in order to reduce the need for high-cost interventions to perform fluid identification tests (TIF). For the study, 27 samples were collected from five different production zones, and a total of 50 chemical species were measured. After the chemical analysis, statistical analysis was applied to the data using the R Statistical Software, version 2.11.1. The statistical analysis was performed in three steps. In the first step, the objective was to investigate the behavior of the chemical species under study in each production zone through descriptive graphical analysis. The second step was to identify a function that classifies each sample into its production zone, using discriminant analysis. In the training stage, the correct classification rate of the discriminant function was 85.19%. The next step applied Principal Component Analysis to reduce the number of variables, using linear combinations of the chemical species, in an attempt to improve the discriminant function obtained in the second step and increase the discriminating power of the data, but the result was not satisfactory. In Profile Analysis, curves were obtained for each production zone based on the characteristics of the chemical species present in each zone. This study made it possible to develop a method using hydrochemistry and statistical analysis that can distinguish the water produced in mature oil fields, making it possible to identify the production zone that is contributing to an excessive increase in water volume.

Relevance:

80.00%

Abstract:

The study of variability is becoming increasingly important. Studying the behavior of rainfall in the face of external events is of paramount importance. In the Vale do Paraíba region, studying variability is important because the region is influenced by the ocean and by frequent cold fronts that cause precipitation during most months of the year. This study aims to analyze rainfall variability in UGRHI-2 by examining the influence of El Niño/Southern Oscillation (ENSO) events and the South Atlantic Convergence Zone (SACZ) on the amount and distribution of rainfall. The UGRHIs were created for the distribution and control of water in the state of São Paulo, which was divided into watersheds to avoid problems such as poor distribution and water shortages in some areas of the state. To study variability, several software tools were used, including Excel, Variowin, and the R statistical package with the Climatol subroutine, with the goal of producing isolines showing the spatial distribution of rainfall anomalies in the years studied. The rainfall anomaly index (IAC) was also studied, identifying more effectively the years of positive and negative anomaly, in order to study the temporal variability of rainfall in the study area.

Relevance:

80.00%

Abstract:

This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of scanned microarray images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and the incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single-gene and multiple-gene levels, and the assessment of the significance of these findings, given that the data contain considerable noise and therefore random features. For all of these components, access to a flexible and efficient statistical computing environment is essential.

Relevance:

80.00%

Abstract:

DNA extraction was carried out as described on the MICROBIS project pages (http://icomm.mbl.edu/microbis) using a commercially available extraction kit. We amplified the hypervariable regions V4-V6 of archaeal and bacterial 16S rRNA genes using PCR and several sets of forward and reverse primers (http://vamps.mbl.edu/resources/primers.php). Massively parallel tag sequencing of the PCR products was carried out on a 454 Life Sciences GS FLX sequencer at the Marine Biological Laboratory, Woods Hole, MA, following the same experimental conditions for all samples. Sequence reads were submitted to a rigorous quality control procedure based on mothur v30 (doi:10.1128/AEM.01541-09), including denoising of the flowgrams using an algorithm based on PyroNoise (doi:10.1038/nmeth.1361), removal of PCR errors, and a chimera check using uchime (doi:10.1093/bioinformatics/btr381). The reads were taxonomically assigned according to the SILVA taxonomy (SSURef v119, 07-2014; doi:10.1093/nar/gks1219) implemented in mothur and clustered at 98% ribosomal RNA gene V4-V6 sequence identity. V4-V6 amplicon sequence abundance tables were standardized to account for unequal sampling effort by randomly subsampling 1000 (Archaea) and 2300 (Bacteria) sequences without replacement in mothur, and were then used to calculate inverse Simpson diversity indices and Chao1 richness (doi:10.2307/4615964). Bray-Curtis dissimilarities (doi:10.2307/1942268) between all samples were calculated and used for 2-dimensional non-metric multidimensional scaling (NMDS) ordinations with 20 random starts (doi:10.1007/BF02289694). Stress values below 0.2 indicated that the multidimensional dataset was well represented by the 2D ordination. NMDS ordinations were compared and tested using Procrustes correlation analysis (doi:10.1007/BF02291478).
All analyses were carried out with the R statistical environment and the packages vegan (available at: http://cran.r-project.org/package=vegan) and labdsv (available at: http://cran.r-project.org/package=labdsv), as well as with custom R scripts. Operational taxonomic units at 98% sequence identity (OTU0.03) that occurred only once in the whole dataset were termed absolute single sequence OTUs (SSOabs; doi:10.1038/ismej.2011.132). OTU0.03 sequences that occurred only once in at least one sample, but may occur more often in other samples, were termed relative single sequence OTUs (SSOrel). SSOrel are particularly interesting for community ecology, since they comprise rare organisms that might become abundant when conditions change. 16S rRNA amplicons and metagenomic reads have been stored in the sequence read archive under SRA project accession number SRP042162.
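The three community statistics named above can be written out in a few lines. The following Python sketch is a plain restatement of the textbook formulas, not the vegan, labdsv, or mothur implementations used in the study. The OTU count vectors are invented, and the Chao1 function uses the bias-corrected form, which is an assumption about which variant is wanted.

```python
def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i^2) over relative abundances p_i."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs observed exactly once and twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


# Invented OTU counts for two samples, already subsampled to equal depth.
sample_a = [10, 5, 2, 2, 1, 1, 1, 0]
sample_b = [8, 5, 0, 4, 1, 0, 0, 4]

print(inverse_simpson(sample_a))
print(chao1(sample_a))
print(bray_curtis(sample_a, sample_b))
```

Because Bray-Curtis depends on raw counts, comparing samples only after subsampling to equal depth, as described above, keeps differences in sequencing effort from being mistaken for differences in community composition.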