987 results for R software


Relevance:

70.00%

Abstract:

The MIGCLIM R package is a function library for the open-source R software that enables species-specific dispersal constraints to be implemented in projections of species distribution models under environmental change and/or landscape fragmentation scenarios. The model is based on a cellular automaton whose basic modeling unit is a cell that is either occupied or unoccupied. Model parameters include dispersal distance and kernel, long-distance dispersal, barriers to dispersal, propagule production potential and habitat invasibility. The MIGCLIM R package has been designed to be highly flexible in the parameter values it accepts and to offer good compatibility with existing species distribution modeling software. Possible applications include projecting future species distributions under environmental change and modeling the spread of invasive species.
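The cellular-automaton idea can be illustrated with a minimal Python sketch of a single colonization step. It is not the MIGCLIM API: the function name `dispersal_step`, the grid of booleans, the Chebyshev distance and the rule that each occupied source cell colonizes a target independently with probability `kernel[d]` are all assumptions made for illustration. Barriers are simplified to cells that can never be colonized; they do not block dispersal paths.

```python
import random

def dispersal_step(occupied, suitable, barrier, kernel, rng):
    """One colonization step of a simple, hypothetical dispersal cellular automaton.

    occupied, suitable, barrier: 2D lists of bools with the same shape.
    kernel: dict mapping a distance in cells (Chebyshev) to the probability
            that one occupied source cell at that distance colonizes a target.
    """
    rows, cols = len(occupied), len(occupied[0])
    max_d = max(kernel)
    new = [row[:] for row in occupied]  # occupied cells stay occupied
    for r in range(rows):
        for c in range(cols):
            # Only empty, suitable, non-barrier cells can be colonized.
            if occupied[r][c] or not suitable[r][c] or barrier[r][c]:
                continue
            p_escape = 1.0  # probability that no source cell colonizes this cell
            for dr in range(-max_d, max_d + 1):
                for dc in range(-max_d, max_d + 1):
                    d = max(abs(dr), abs(dc))
                    rr, cc = r + dr, c + dc
                    if d == 0 or d not in kernel:
                        continue
                    if 0 <= rr < rows and 0 <= cc < cols and occupied[rr][cc]:
                        p_escape *= 1.0 - kernel[d]
            if rng.random() < 1.0 - p_escape:
                new[r][c] = True
    return new

# Usage: one occupied centre cell, one unsuitable cell and one barrier cell.
rng = random.Random(0)
n = 5
occupied = [[False] * n for _ in range(n)]
occupied[2][2] = True
suitable = [[True] * n for _ in range(n)]
suitable[1][1] = False
barrier = [[False] * n for _ in range(n)]
barrier[3][3] = True
after = dispersal_step(occupied, suitable, barrier, {1: 1.0}, rng)
```

With a kernel of `{1: 1.0}` every suitable, non-barrier neighbour of the occupied cell is colonized in one step; a decaying kernel such as `{1: 0.6, 2: 0.2}` gives stochastic spread instead.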

Relevance:

70.00%

Abstract:

A methodology for downscaling solar irradiation from satellite-derived databases is described using R software. Several packages, namely raster, parallel, solaR, gstat, sp and rasterVis, are used in this study to improve solar resource estimation in areas with complex topography, where downscaling is a very useful tool for reducing the inherent deviations of satellite-derived irradiation databases, which lack high spatial resolution at a global scale. A topographical analysis of horizon blocking and sky view is performed with a digital elevation model to determine what fraction of hourly solar irradiation reaches the Earth's surface. Finally, kriging with external drift is applied to better estimate solar irradiation throughout the region analyzed. The methodology was implemented as an example for the region of La Rioja in northern Spain, where the mean absolute error was 25.5% lower than with the original database.

Relevance:

60.00%

Abstract:

Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples subjected to different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments that involve the molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, some genes may be differentially expressed only in sample subgroups, and these are missed when the usual statistical approaches are used. In this paper we propose a new graphical tool, the arrow plot, which identifies not only up- and down-regulated genes but also genes that are differentially expressed in different subclasses, which current statistical methods usually miss. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology was implemented in the open-source R software. Results: The method was applied to a publicly available dataset and to a simulated dataset. We compared our results with those obtained using some standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC).
In both datasets, the standard selection procedures did not select all of the differentially expressed genes with bimodal or multimodal distributions. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and allows different variances in the two samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically. Conclusion: Our results indicate that the arrow plot is a new, flexible and useful tool for the analysis of gene expression profiles from microarrays.
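The two distance measures behind the arrow plot can be sketched in Python. This is not the authors' R implementation: the abstract does not say how OVL is estimated, so this sketch assumes two normal densities and integrates min(f1, f2) numerically, while AUC is computed empirically in its Mann-Whitney form. The function names are hypothetical. The usage lines show why AUC alone can miss a bimodal group: a case group split symmetrically around the controls gives AUC = 0.5 even though its distribution clearly differs.

```python
from statistics import NormalDist

def empirical_auc(controls, cases):
    """Mann-Whitney estimate of AUC = P(case > control); ties count as 1/2."""
    wins = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(controls) * len(cases))

def ovl_normal(d1, d2, lo=-20.0, hi=20.0, steps=20000):
    """Overlapping coefficient of two normal densities: integral of min(f1, f2), midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(min(d1.pdf(lo + (i + 0.5) * h), d2.pdf(lo + (i + 0.5) * h))
                   for i in range(steps))

# A bimodal "case" group split symmetrically around the controls.
controls = [-0.5, 0.0, 0.5]
bimodal_cases = [-3.0, -3.0, 3.0, 3.0]
auc_bimodal = empirical_auc(controls, bimodal_cases)   # looks "not differential"

# Two unit-variance normals whose means differ by 1.
ovl_shift = ovl_normal(NormalDist(0, 1), NormalDist(1, 1))
```

For two unit-variance normals whose means differ by 1, the exact OVL is 2Φ(-0.5), which gives a convenient check on the numerical integration.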

Relevance:

60.00%

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevance:

60.00%

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevance:

60.00%

Abstract:

INTRODUCTION: Forecasting dengue cases in a population with time-series models can provide useful information for planning public health interventions. The objective of this article was to develop a forecasting model for dengue incidence in Campinas, southeast Brazil, using the Box-Jenkins modeling approach. METHODS: The forecasting model for dengue incidence was built in R software using a seasonal autoregressive integrated moving average (SARIMA) model. We fitted the model to the reported monthly incidence of dengue from 1998 to 2008 and validated it using data collected between January and December 2009. RESULTS: SARIMA(2,1,2)(1,1,1)12 was the model that best fit the data. This model indicated that the number of dengue cases in a given month can be estimated from the number of cases occurring one, two and twelve months earlier. The predicted values for 2009 were relatively close to the observed values. CONCLUSIONS: The results of this article indicate that SARIMA models are useful tools for monitoring dengue incidence. We also observe that the SARIMA model can represent the number of cases in the following year with relative precision.
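The "integrated" parts of SARIMA(2,1,2)(1,1,1)12 are one regular difference (d = 1) and one seasonal difference at lag 12 (D = 1, s = 12). The Python sketch below applies only those differences; the AR and MA terms (p = 2, q = 2, P = 1, Q = 1) would then be estimated on the differenced series, which this sketch does not attempt. The function names are illustrative and not taken from any R package.

```python
def difference(series, lag=1):
    """Apply the operator (1 - B^lag): y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def sarima_differencing(series, d=1, D=1, s=12):
    """Apply D seasonal differences (period s), then d regular differences."""
    out = list(series)
    for _ in range(D):
        out = difference(out, s)
    for _ in range(d):
        out = difference(out, 1)
    return out

# Synthetic monthly series: linear trend plus a fixed 12-month seasonal pattern.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
monthly = [10 + 2 * t + pattern[t % 12] for t in range(48)]
stationary = sarima_differencing(monthly)  # trend and seasonality removed
```

The seasonal difference turns the trend into a constant (24 per 12 months here), and the regular difference then removes that constant, leaving 48 - 13 = 35 values that are all zero for this noise-free example.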

Relevance:

60.00%

Abstract:

This data article refers to the research article entitled The role of ascorbate peroxidase, guaiacol peroxidase, and polysaccharides in cassava (Manihot esculenta Crantz) roots under postharvest physiological deterioration by Uarrota et al. (2015), Food Chemistry 197, Part A, 737-746. The stress caused by postharvest physiological deterioration (PPD) of cassava roots leads to the formation of reactive oxygen species (ROS), which are extremely harmful and accelerate cassava spoilage. To prevent or alleviate injuries from ROS, plants have evolved antioxidant systems that include non-enzymatic and enzymatic defence systems such as ascorbate peroxidase, guaiacol peroxidase and polysaccharides. This data article provides a dataset called newdata, in RData format, with 60 observations and 6 variables. The first 2 variables are Samples and Cultivars; the remaining variables contain spectrophotometric data for ascorbate peroxidase, guaiacol peroxidase, tocopherol and total proteins, together with arcsine-transformed cassava PPD scores. A report is also provided for further interpretation and analysis in R software. Means and standard deviations of all variables are provided in the Supplementary tables (data.long3.RData, data.long4.RData and meansEnzymes.RData), and the untransformed PPD scoring data (PPDmeans.RData) and days of storage (days.RData) are also provided so that the analysis can be reproduced in R software.

Relevance:

60.00%

Abstract:

Report for the scientific sojourn at the University of Reading, United Kingdom, from January until May 2008. The main objectives were, first, to infer population structure and the parameters of demographic models in 10 Palinurus elephas populations from both Mediterranean and Atlantic waters, genotyping approximately 30 individuals per population at a total of 13 microsatellite loci; and second, to develop statistical methods for identifying discrepant loci, possibly under selection, and to implement those methods in the R software environment. Computing the probability distribution of the demographic and mutational parameters for a full genetic dataset is numerically difficult when the demographic history is complex (Stephens 2003). Approximate Bayesian Computation (ABC), which uses summary statistics to infer the posterior distributions of parameters without explicit likelihood calculations, can overcome this difficulty. This makes it possible to gather information on different demographic prior values (i.e. effective population sizes, migration rate, microsatellite mutation rate and mutational processes) and to assess the sensitivity of inferences to the demographic priors by assuming different priors.
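Rejection ABC, the simplest form of the approach described above, can be sketched in Python. The toy model here (normal data with unknown mean and unit standard deviation, the sample mean as the summary statistic, a uniform prior) is an assumption chosen for brevity, as is the function name `abc_rejection`; a real application would simulate microsatellite genotypes under a demographic model and compare population-genetic summary statistics.

```python
import random
from statistics import fmean

def abc_rejection(observed, simulate, prior_sample, summary, n_sims, tolerance, rng):
    """Rejection ABC: keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_sample(rng)                          # draw a parameter from the prior
        simulated = simulate(theta, len(observed), rng)    # simulate a dataset of the same size
        if abs(summary(simulated) - s_obs) <= tolerance:   # compare summaries; no likelihood needed
            accepted.append(theta)
    return accepted

# Toy model (an assumption, not a microsatellite model): normal data, unknown mean, sd = 1.
rng = random.Random(42)
observed = [rng.gauss(3.0, 1.0) for _ in range(20)]
posterior = abc_rejection(
    observed,
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    prior_sample=lambda r: r.uniform(-10.0, 10.0),
    summary=fmean,
    n_sims=20000,
    tolerance=0.1,
    rng=rng,
)
```

The accepted draws approximate the posterior of the mean. Re-running with a different `prior_sample` is the kind of prior-sensitivity check the report describes.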

Relevance:

60.00%

Abstract:

The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq and q², respectively, where p is the allele frequency of A and q = 1 - p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample room for the use of graphics in HWE tests, in particular the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically “close” to the parabola, whereas compositions that differ significantly from HWE are “far”. By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. In this way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008). This leads to useful graphical representations in which large numbers of SNPs can be tested for HWE in a single graph.
Several examples of graphical tests for HWE, implemented in R software, will be shown using SNP data from different human populations.

Relevance:

60.00%

Abstract:

There is a demonstrable association between exposure to air pollutants and deaths due to cardiovascular diseases. The objective of this study was to estimate the effects of exposure to sulfur dioxide on mortality due to circulatory diseases in individuals 50 years of age or older residing in São José dos Campos, SP. This was a time-series ecological study for the years 2003 to 2007 using information on deaths due to circulatory disease obtained from Datasus reports. Daily data on pollutants (particulate matter, sulfur dioxide (SO2) and ozone), temperature and humidity were obtained from the São Paulo State Environmental Agency. Moving-average models for 2 to 7 days were fitted by Poisson regression using R software. Exposure to SO2 was analyzed using unipollutant, bipollutant and multipollutant models adjusted for mean temperature and humidity. Relative risks (RR) with 95% confidence intervals (95%CI) were obtained, and the percent decrease in risk was calculated. There were 1928 deaths, with a daily mean (± SD) of 1.05 ± 1.03 (range: 0-6). Exposure to SO2 was significantly associated with mortality due to circulatory disease: RR = 1.04 (95%CI = 1.01 to 1.06) for the 7-day moving average, after adjusting for ozone. In the multipollutant model, a decrease in SO2 concentrations corresponded to an 8.5% decrease in risk. The results of this study suggest that residents of medium-sized Brazilian cities with characteristics similar to those of São José dos Campos probably have health problems due to exposure to air pollutants.
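The moving-average exposure and the conversion of a relative risk into a percent change in risk can be sketched in Python. The function names and the daily values are hypothetical; in the study these averaged exposures entered Poisson regressions fitted in R, which this sketch does not reproduce, and the abstract does not state the SO2 increment to which RR = 1.04 refers.

```python
def moving_average_exposure(daily, window):
    """Mean of the current day and the previous window - 1 days; None until enough history exists."""
    out = []
    for t in range(len(daily)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(daily[t - window + 1:t + 1]) / window)
    return out

def percent_change(rr):
    """Percent change in risk implied by a relative risk, e.g. RR = 1.04 is roughly +4%."""
    return (rr - 1.0) * 100.0

so2 = [1, 2, 3, 4, 5, 6, 7, 8]            # illustrative daily SO2 levels, not study data
ma7 = moving_average_exposure(so2, 7)     # 7-day moving average, as in the 7-day model
```

The first six days have no 7-day average, which is why moving-average models lose the start of each series.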

Relevance:

60.00%

Abstract:

The Hardy-Weinberg law, formulated about 100 years ago, states that under certain assumptions, the three genotypes AA, AB and BB at a bi-allelic locus are expected to occur in the proportions p², 2pq and q², respectively, where p is the allele frequency of A and q = 1 - p. Many statistical tests are used to check whether empirical marker data obey the Hardy-Weinberg principle. Among these are the classical chi-square test (with or without continuity correction), the likelihood ratio test, Fisher's exact test, and exact tests in combination with Monte Carlo and Markov chain algorithms. Tests for Hardy-Weinberg equilibrium (HWE) are numerical in nature, requiring the computation of a test statistic and a p-value. There is, however, ample room for the use of graphics in HWE tests, in particular the ternary plot. Nowadays, many genetic studies use genetic markers known as Single Nucleotide Polymorphisms (SNPs). SNP data come in the form of counts, but from the counts one typically computes genotype frequencies and allele frequencies. These frequencies satisfy the unit-sum constraint, and their analysis therefore falls within the realm of compositional data analysis (Aitchison, 1986). SNPs are usually bi-allelic, which implies that the genotype frequencies can be adequately represented in a ternary plot. Compositions that are in exact HWE describe a parabola in the ternary plot. Compositions for which HWE cannot be rejected in a statistical test are typically “close” to the parabola, whereas compositions that differ significantly from HWE are “far”. By rewriting the statistics used to test for HWE in terms of heterozygote frequencies, acceptance regions for HWE can be obtained that can be depicted in the ternary plot. In this way, compositions can be tested for HWE purely on the basis of their position in the ternary plot (Graffelman & Morales, 2008).
This leads to useful graphical representations in which large numbers of SNPs can be tested for HWE in a single graph. Several examples of graphical tests for HWE, implemented in R software, will be shown using SNP data from different human populations.
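The classical chi-square test and the parabola that exact HWE traces in the ternary plot can be sketched in Python. This sketch does not reproduce the acceptance regions of Graffelman & Morales (2008) or their R code, and its function names are hypothetical; it only shows the ingredients: the allele frequency p estimated from genotype counts, the expected counts np², 2npq and nq², the 1-degree-of-freedom chi-square p-value, and the parabola condition f_AB² = 4 f_AA f_BB, which follows from (2pq)² = 4p²q².

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Classical chi-square test (no continuity correction) for HWE at a bi-allelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)     # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def parabola_gap(f_aa, f_ab, f_bb):
    """Zero exactly when the genotype composition lies on the HWE parabola f_AB^2 = 4 f_AA f_BB."""
    return f_ab ** 2 - 4.0 * f_aa * f_bb

# Usage: a complete heterozygote deficit lies far from the parabola.
deficit_stat, deficit_p = hwe_chisq(50, 0, 50)
```

A negative gap indicates a heterozygote deficit (the composition lies below the parabola) and a positive gap a heterozygote excess.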

Relevance:

60.00%

Abstract:

Existing distributed hydrologic models are too complex and computationally demanding to be used as rapid-forecasting policy-decision tools, or even as classroom educational tools. In addition, platform dependence, specific input/output data structures and non-dynamic data interaction with pluggable software components inside existing proprietary frameworks restrict these models to specialized user groups. RWater is a web-based hydrologic analysis and modeling framework that builds on the widely used R software within the HUBzero cyberinfrastructure of Purdue University. RWater is designed as an integrated framework for distributed hydrologic simulation, with subsequent parameter optimization and visualization. RWater provides a platform-independent web-based interface, flexible data integration capacity, grid-based simulations and user extensibility. RWater uses RStudio to simulate hydrologic processes on raster-based data obtained through conventional GIS pre-processing. The program integrates the Shuffled Complex Evolution (SCE) algorithm for parameter optimization. Moreover, RWater enables users to produce various descriptive statistics and visualizations of the outputs at different temporal resolutions. The applicability of RWater will be demonstrated on two watersheds in Indiana for multiple rainfall events.

Relevance:

60.00%

Abstract:

The impacts of climate variations have been widely researched in global macroeconomics and in sectors such as agriculture, energy and insurance. For the retail sector, however, a search of the main Brazilian journals returned no specific study. In more developed economies, weather-linked insurance products are widely traded, and through this work we also aim to assess the possibility of developing this market in Brazil. This study assessed the impact of climate variations on retail sales over a period of approximately 18 months (564 days) in 253 Brazilian cities. Climate data (precipitation, temperature, wind speed, relative humidity, sunshine duration and atmospheric pressure) were obtained from INMET (Instituto Nacional de Meteorologia, the Brazilian National Institute of Meteorology) and matched with transaction data for up to 206,000 active customers in an unbalanced sample from a financial institution in the credit card business. Both datasets have daily frequency. The econometric model was a fixed-effects panel data model for longitudinal data, estimated with the statistics/econometrics software EViews (proprietary software from IHS) and R (free software). The hypothesis tested was that weather influences customers' purchasing decisions in the short term, and the analyses supported this hypothesis. Assuming that retail consumer behavior does not change with the choice of payment method, rain has a negative impact on retail sales in local currency. This is explained by a reduction in the total number of transactions rather than in the average transaction value. Excluding the cities of São Paulo and Rio de Janeiro from the dataset did not change the significance or relevance of the results. On the other hand, rain has a substitution effect between online and offline sales.
When economic sectors were analyzed to check for differing behavior between consumption and purchases, no change in the results was observed. When demographic variables were included, we found that women and older people have a larger purchase history. When assessing the impact of rain on a given day over the following 6 to 29 days, we observed a significant effect on the number of transactions, but the effect on sales volume was not significant.
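The fixed-effects (within) estimator used in panel models like the one above can be sketched in Python for a single regressor: subtracting each city's mean from its observations removes the city-specific intercept before the slope is estimated. The function name, variables and data are hypothetical, and the study's full specification (estimated in EViews and R) is not given in the abstract.

```python
from collections import defaultdict

def within_slope(entity, x, y):
    """Fixed-effects slope for y_it = alpha_i + beta * x_it + e_it (one regressor)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])   # per entity: sum of x, sum of y, count
    for g, xi, yi in zip(entity, x, y):
        t = totals[g]
        t[0] += xi
        t[1] += yi
        t[2] += 1
    num = den = 0.0
    for g, xi, yi in zip(entity, x, y):
        sx, sy, n = totals[g]
        dx, dy = xi - sx / n, yi - sy / n   # demeaning within each entity removes alpha_i
        num += dx * dy
        den += dx * dx
    return num / den

# Hypothetical panel: two cities with different baselines and the same true slope of -2.
city = ["A", "A", "A", "B", "B", "B"]
rain = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
sales = [30.0 - 2.0 * r for r in rain[:3]] + [50.0 - 2.0 * r for r in rain[3:]]
beta_hat = within_slope(city, rain, sales)
```

Treating all rows as one entity reduces the function to pooled OLS, which on these data gives a positive slope because the rainier city also has higher baseline sales; the within estimator recovers the true negative effect.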

Relevance:

60.00%

Abstract:

Global energy consumption is currently based on fossil fuels, which have adverse effects on the environment. Renewable sources offer solutions to this scenario but must address issues related to the capacity of the power supply. Offshore wind energy is emerging as a promising alternative. Winds over the oceans are faster and more stable, but their variability can still cause fluctuations in electric power generation. To reduce these fluctuations, combining geographically distributed wind farms has been proposed: the greater the distance between farms, the lower the correlation between their wind speeds, which increases the likelihood that together they form a more stable power system with smaller fluctuations in generation. However, efficient use of the farms' production capacity depends on how they are distributed in the marine environment. The objective of this research was to analyze the optimal allocation of offshore wind farms on the east coast of the U.S. using Modern Portfolio Theory. Modern Portfolio Theory was used so that the construction of offshore wind energy portfolios accounts for the intermittency of wind, through calculations of the return and risk of the wind farms' production. The research used 25,934 observations of energy produced by 11 hypothetical offshore wind farms, each with one simulated 5 MW ocean turbine installed. The data have hourly resolution and cover the period from January 1, 1998 to December 31, 2002. Using the Matlab® software, six minimum-variance portfolios were calculated, each for a distinct time period. Because wind variability is unequal over time, four rebalancing strategies were set up to evaluate the performance of the portfolios, which made it possible to identify the strategy most beneficial to the stability of offshore wind energy production.
The results showed that wind energy production for 1998, 1999, 2000 and 2001 should be weighted by the portfolio weights calculated for the same respective periods. Energy data for 2002 should use the weights derived from the portfolio calculated for the previous time period. Finally, wind energy production over the whole period 1998-2002 should also be weighted by 1/11. It therefore follows that the portfolios found did not show lower variability than the individual production of the hypothetical offshore wind farms.
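The minimum-variance weights behind such portfolios can be sketched in Python with the closed-form solution w = C⁻¹1 / (1ᵀC⁻¹1), where C is the covariance matrix of the farms' production. This assumes a fully invested portfolio with no non-negativity constraint, which the abstract does not specify; for physical wind farms a constrained quadratic program may be more appropriate. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a square, non-singular)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def min_variance_weights(cov):
    """Global minimum-variance weights w = C^-1 1 / (1' C^-1 1); negative weights are allowed."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [v / total for v in z]

def portfolio_variance(w, cov):
    """w' C w."""
    return sum(w[i] * cov[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))

# Two uncorrelated farms with variances 1 and 4: the steadier farm gets more weight.
cov_two = [[1.0, 0.0], [0.0, 4.0]]
w_two = min_variance_weights(cov_two)
```

Equal weights such as 1/11 are the minimum-variance solution, for example, when the farms have identical, uncorrelated variances.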

Relevance:

60.00%

Abstract:

The objective of this study was to evaluate the use of probit and logit link functions for the genetic evaluation of early pregnancy using simulated data. The following simulation/analysis structures were constructed: logit/logit, logit/probit, probit/logit and probit/probit. The percentages of precocious females were 5, 10, 15, 20, 25 and 30% and were adjusted by changing the mean of the latent variable. The parametric heritability (h²) was 0.40. Simulation and genetic evaluation were implemented in R software. Heritability estimates (ĥ²) were compared with h² using the mean squared error. Pearson correlations between predicted and true breeding values were calculated, as was the percentage of coincidence between the true and predicted rankings of the 10% of bulls with the highest breeding values (TOP10). The mean ĥ² values were under- and overestimated for all percentages of precocious females when the logit/probit and probit/logit models were used. In addition, the mean squared errors of these models were high compared with those obtained with the probit/probit and logit/logit models. Considering ĥ², probit/probit and logit/logit were also superior to logit/probit and probit/logit, providing values close to the parametric heritability. Logit/probit and probit/logit presented low Pearson correlations, whereas the correlations obtained with probit/probit and logit/logit ranged from moderate to high. With respect to the TOP10 bulls, logit/probit and probit/logit presented much lower percentages than probit/probit and logit/logit. The genetic parameter estimates and predictions of breeding values obtained with the logit/logit and probit/probit models were similar. In contrast, the results obtained with probit/logit and logit/probit were not satisfactory. There is a need to further compare the estimation and prediction abilities of the logit and probit link functions.
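How a target percentage of precocious females maps to a shift in the mean of the latent variable can be sketched in Python. The sketch assumes a threshold fixed at 0 and a latent residual with unit-scale standard normal (probit) or standard logistic (logit) distribution, so that P(precocious) = F(μ) and μ = F⁻¹(p). It also shows the latent-scale heritability with residual variance fixed at 1 (probit) or π²/3 (logit); that this scale difference contributes to the bias of mismatched simulation/analysis links is our interpretation, not the authors' stated conclusion. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def threshold_shift(prop, link):
    """Latent mean (threshold at 0) giving a proportion `prop` of individuals above the threshold."""
    if link == "probit":
        return NormalDist().inv_cdf(prop)     # P(mu + e > 0) = Phi(mu)
    if link == "logit":
        return math.log(prop / (1.0 - prop))  # quantile of the standard logistic
    raise ValueError(f"unknown link: {link}")

def latent_heritability(var_additive, link):
    """Heritability on the latent scale, with residual variance fixed by the link (1 or pi^2/3)."""
    var_e = 1.0 if link == "probit" else math.pi ** 2 / 3.0
    return var_additive / (var_additive + var_e)

# Latent means for the six percentages of precocious females used in the study.
shifts = {p: (threshold_shift(p, "probit"), threshold_shift(p, "logit"))
          for p in (0.05, 0.10, 0.15, 0.20, 0.25, 0.30)}
```

Under the probit scale, an additive variance of 2/3 gives h² = 0.40, the parametric value used in the study; the same additive variance under the logit scale gives a much smaller h².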