993 results for Statistical Error


Relevance: 20.00%

Abstract:

Among the largest resources for biological sequence data are the expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors, so such errors must be taken into account when analyzing EST sequences computationally. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting such regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
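The codon-usage signal that this error correction exploits can be sketched outside the full HMM. The snippet below uses a hypothetical log-probability table for a handful of codons (a real model would estimate all 64 from known genes) and scores each reading frame by its average codon log-probability; a single inserted base shifts which frame scores best, which is the kind of evidence such models use to detect indel-type sequencing errors:

```python
import math

# Hypothetical codon-usage log-probabilities for a few codons; a real
# model would cover all 64 codons, estimated from verified coding sequences.
CODON_LOGP = {
    "ATG": math.log(0.022), "GAA": math.log(0.039), "AAA": math.log(0.033),
    "CTG": math.log(0.052), "GAT": math.log(0.032), "TTT": math.log(0.022),
}
DEFAULT_LOGP = math.log(1.0 / 64)  # fallback for codons not in the table

def frame_score(seq, offset):
    """Average codon log-probability of seq read in the given frame."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return sum(CODON_LOGP.get(c, DEFAULT_LOGP) for c in codons) / len(codons)

def best_frame(seq):
    """Pick the reading frame with the highest codon-usage score; a frame
    change along a transcript hints at an insertion or deletion error."""
    return max(range(3), key=lambda f: frame_score(seq, f))
```

For example, inserting one spurious base at the start of a coding sequence moves the best-scoring frame from 0 to 1, exactly the signature an error-correcting HMM looks for.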

Relevance: 20.00%

Abstract:

Public opinion surveys have become progressively incorporated into systems of official statistics. Business-climate surveys are usually qualitative because they collect the opinions of businesspeople and/or experts about long-term indicators described by a number of variables. In such cases the responses are ordinal: respondents report, for example, whether during a given quarter sales or new orders have increased, decreased or remained the same as in the previous quarter. These data make it possible to calculate the percentage of respondents in the total population (results are extrapolated) who select each of the three options. The data are often presented as an index calculated as the difference between the percentage of respondents who claim that a given variable has improved and the percentage who claim that it has deteriorated. As in any sample survey, the sampling error of the results must be measured, since it affects both the reliability of the results and the calculation of the sample size required for a desired confidence interval. The results presented here are based on data from the Survey of the Business Climate (Encuesta de Clima Empresarial), developed through the collaboration of the Statistical Institute of Catalonia (Institut d’Estadística de Catalunya) with the Chambers of Commerce (Cámaras de Comercio) of Sabadell and Terrassa.
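A minimal sketch of the balance index and its sampling error described above, assuming simple random sampling and treating the three answer categories as multinomial counts (function names are illustrative):

```python
import math

def balance_index(n_up, n_same, n_down):
    """Balance (diffusion) index: percentage reporting an increase
    minus percentage reporting a decrease."""
    n = n_up + n_same + n_down
    return 100.0 * (n_up - n_down) / n

def balance_std_error(n_up, n_same, n_down):
    """Sampling standard error of the balance under simple random sampling.
    Var(p_up - p_down) = [p_up(1-p_up) + p_down(1-p_down) + 2*p_up*p_down]/n:
    the covariance term enters with a plus sign because the 'up' and 'down'
    shares are negatively correlated multinomial cells."""
    n = n_up + n_same + n_down
    p_up, p_down = n_up / n, n_down / n
    var = (p_up * (1 - p_up) + p_down * (1 - p_down) + 2 * p_up * p_down) / n
    return 100.0 * math.sqrt(var)
```

With 50 "increased", 30 "same" and 20 "decreased" out of 100 respondents, the balance is +30 points with a standard error of about 7.8 points, which is what feeds the confidence-interval and sample-size calculations mentioned above.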

Relevance: 20.00%

Abstract:

This paper develops methods for Stochastic Search Variable Selection (currently popular with regression and Vector Autoregressive models) for Vector Error Correction models where there are many possible restrictions on the cointegration space. We show how this allows the researcher to begin with a single unrestricted model and either do model selection or model averaging in an automatic and computationally efficient manner. We apply our methods to a large UK macroeconomic model.
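The core device in Stochastic Search Variable Selection is a spike-and-slab mixture prior on each coefficient. The sketch below shows one Gibbs-sampling ingredient, the posterior probability that a coefficient is "included" (drawn from the wide slab rather than the narrow spike) given its current draw; the prior settings are illustrative, not those of the paper:

```python
import math

def inclusion_prob(beta, tau_spike, tau_slab, prior_incl=0.5):
    """Posterior inclusion probability of a coefficient under a
    spike-and-slab prior: spike N(0, tau_spike^2) vs slab N(0, tau_slab^2),
    evaluated at the coefficient's current Gibbs draw beta."""
    def normal_pdf(x, sd):
        return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    slab = prior_incl * normal_pdf(beta, tau_slab)
    spike = (1 - prior_incl) * normal_pdf(beta, tau_spike)
    return slab / (slab + spike)
```

A draw near zero is attributed to the spike (the restriction is imposed), while a clearly nonzero draw is attributed to the slab; averaging these indicators over the Gibbs chain gives the automatic model averaging the abstract refers to.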

Relevance: 20.00%

Abstract:

In this paper we propose a novel empirical extension of the standard market microstructure order flow model. The main idea is that heterogeneity of beliefs in the foreign exchange market can cause model instability, and such instability has not been fully accounted for in the existing empirical literature. We investigate this issue using two different data sets and focusing on out-of-sample forecasts. Forecasting power is measured using standard statistical tests and, additionally, using an alternative approach based on measuring the economic value of forecasts after building a portfolio of assets. We find there is substantial economic value in conditioning on the proposed models.
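The "economic value" criterion can be illustrated with a toy directional trading rule (a common simplification, not the paper's portfolio construction): go long when the forecast return is positive, short otherwise, and accumulate realized returns net of a per-trade cost:

```python
def directional_value(forecasts, returns, cost=0.0):
    """Economic value of forecasts via a simple trading rule: take a +1
    (long) position when the forecast return is positive and a -1 (short)
    position otherwise, and sum realized strategy returns net of cost.
    A purely statistical criterion such as RMSE can miss this sign
    information, which is why the two evaluations can disagree."""
    total = 0.0
    for f, r in zip(forecasts, returns):
        position = 1.0 if f > 0 else -1.0
        total += position * r - cost
    return total
```

Comparing this cumulative return against a buy-and-hold or random-walk benchmark gives an economic, rather than purely statistical, measure of forecasting power.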

Relevance: 20.00%

Abstract:

‘Modern’ Phillips curve theories predict that inflation is an integrated, or near integrated, process. However, inflation appears bounded above and below in developed economies, so it cannot be ‘truly’ integrated and is more likely stationary around a shifting mean. If agents believe inflation is integrated, as in the ‘modern’ theories, then they are making systematic errors concerning the statistical process of inflation. An alternative theory of the Phillips curve is developed that is consistent with the ‘true’ statistical process of inflation. It is demonstrated that United States inflation data are consistent with the alternative theory but not with the existing ‘modern’ theories.
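The distinction between the two statistical processes is easy to see by simulation (all parameters are purely illustrative): an integrated series accumulates shocks and wanders without bound, while a series that is stationary around a shifting mean stays within a band even across the shift:

```python
import random

def simulate(n, integrated, seed=0):
    """Simulate an inflation-like series either as an integrated process
    (random walk) or as a mean-reverting AR(1) around a mean that shifts
    once mid-sample; a crude stand-in for 'stationary around a shifting
    mean', with illustrative (not estimated) parameters."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(n):
        shock = rng.gauss(0, 1)
        if integrated:
            x = x + shock                        # unit root: shocks accumulate
        else:
            mean = 2.0 if t < n // 2 else 4.0    # one shift in the mean
            x = mean + 0.5 * (x - mean) + shock  # mean reversion
        path.append(x)
    return path

def sample_range(path):
    """Spread of the simulated series: max minus min."""
    return max(path) - min(path)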

Relevance: 20.00%

Abstract:

'Modern' theories of the Phillips curve imply that inflation is an integrated, or near integrated process. This paper explains this implication and why these 'modern' theories are logically inconsistent with what is commonly known about the statistical process of inflation.

Relevance: 20.00%

Abstract:

The usual way to investigate the statistical properties of finitely generated subgroups of free groups, and of finite presentations of groups, is based on the so-called word-based distribution: subgroups are generated (finite presentations are determined) by randomly chosen k-tuples of reduced words, whose maximal length is allowed to tend to infinity. In this paper we adopt a different, though equally natural point of view: we investigate the statistical properties of the same objects, but with respect to the so-called graph-based distribution, recently introduced by Bassino, Nicaud and Weil. Here, subgroups (and finite presentations) are determined by randomly chosen Stallings graphs whose number of vertices tends to infinity. Our results show that these two distributions behave quite differently from each other, shedding new light on which properties of finitely generated subgroups can be considered frequent or rare. For example, we show that malnormal subgroups of a free group are negligible in the graph-based distribution, while they are exponentially generic in the word-based distribution. Quite surprisingly, a random finite presentation generically presents the trivial group in this new distribution, while in the classical one it is known to generically present an infinite hyperbolic group.
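For contrast with the graph-based distribution, sampling from the word-based distribution is straightforward: draw a uniformly random reduced word, i.e. one in which no letter is immediately followed by its inverse. A minimal sketch (letter encoding is illustrative; this is the classical construction, not the Bassino–Nicaud–Weil Stallings-graph sampler):

```python
import random

def random_reduced_word(length, rank=2, seed=None):
    """Uniformly random reduced word of the given length over the free
    group of the given rank. A letter is a pair (generator, sign); the
    word is reduced if no letter is followed by its inverse."""
    rng = random.Random(seed)
    letters = [(g, s) for g in range(rank) for s in (1, -1)]
    word = []
    while len(word) < length:
        c = rng.choice(letters)
        if word and word[-1] == (c[0], -c[1]):
            continue  # c would cancel the previous letter: re-draw
        word.append(c)
    return word
```

A random k-tuple of such words, with the length tending to infinity, is exactly how subgroups are generated (and finite presentations determined) under the word-based distribution.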

Relevance: 20.00%

Abstract:

1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately.

2. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we moved each coordinate with a random number drawn from the normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions.

3. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors.

4. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
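The error treatment described above amounts to jittering each coordinate with Gaussian noise of standard deviation 5 km. A minimal sketch (assumes coordinates are already in km, e.g. in an equal-area projection; the seed and helper name are illustrative):

```python
import random

def degrade(occurrences, sd_km=5.0, seed=42):
    """Simulate locational error: shift each (x, y) occurrence coordinate
    (in km) by independent draws from N(0, sd_km), mirroring the error
    treatment used to degrade the occurrence data."""
    rng = random.Random(seed)
    return [(x + rng.gauss(0, sd_km), y + rng.gauss(0, sd_km))
            for x, y in occurrences]
```

Calibrating each modelling technique on both the original and the degraded coordinates, and comparing performance, is the two-treatment design the abstract describes.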

Relevance: 20.00%

Abstract:

Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
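The replicate-based error estimate reduces to counting genotype mismatches at loci successfully called in both replicates of the same sample. A minimal sketch for the locus-level rate (genotype encoding and the use of None for missing calls are illustrative):

```python
def genotype_error_rate(rep1, rep2):
    """Locus-level genotyping error rate from one pair of sample
    replicates, which should carry identical genotypes: the fraction of
    loci typed in both replicates whose genotype calls disagree. Loci
    missing in either replicate (None) are excluded from the denominator."""
    compared = mismatch = 0
    for g1, g2 in zip(rep1, rep2):
        if g1 is None or g2 is None:
            continue
        compared += 1
        if g1 != g2:
            mismatch += 1
    return mismatch / compared if compared else float("nan")
```

Running this across assembly-parameter settings, and keeping the setting that minimizes the error rate while maximizing the number of informative loci retained, is the optimization strategy the abstract describes.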

Relevance: 20.00%

Abstract:

Coronary artery calcification (CAC) is quantified based on a computed tomography (CT) scan image. A calcified region is identified. Modified expectation maximization (MEM) of a statistical model for the calcified and background material is used to estimate the partial calcium content of the voxels. The algorithm limits the region over which MEM is performed. By using MEM, the statistical properties of the model are iteratively updated based on the calculated resultant calcium distribution from the previous iteration. The estimated statistical properties are used to generate a map of the partial calcium content in the calcified region. The volume of calcium in the calcified region is determined based on the map. The experimental results on a cardiac phantom, scanned 90 times using 15 different protocols, demonstrate that the proposed method is less sensitive to partial volume effect and noise, with an average error of 9.5% (standard deviation (SD) of 5–7 mm³) compared with 67% (SD of 3–20 mm³) for conventional techniques. The high reproducibility of the proposed method for 35 patients, scanned twice using the same protocol at a minimum interval of 10 min, shows that the method provides 2–3 times lower interscan variation than conventional techniques.
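The EM idea at the heart of this approach can be sketched in one dimension: alternate between computing each voxel's posterior probability of being calcium (its estimated partial calcium content) and re-estimating the class means from those probabilities. This toy version with a known, shared spread is only an illustration of plain EM on a two-class mixture, not the modified MEM algorithm itself:

```python
import math

def em_partial_calcium(voxels, mu_bg, mu_ca, sd, iters=20):
    """Toy EM for a two-class (background/calcium) Gaussian mixture with a
    fixed, shared standard deviation. E-step: posterior calcium probability
    per voxel intensity. M-step: responsibility-weighted re-estimation of
    the two class means. Returns (responsibilities, mu_bg, mu_ca)."""
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / sd) ** 2)
    resp = []
    for _ in range(iters):
        resp = []
        for x in voxels:                           # E-step: P(calcium | x)
            p_ca, p_bg = pdf(x, mu_ca), pdf(x, mu_bg)
            resp.append(p_ca / (p_ca + p_bg))
        w = sum(resp)                              # M-step: update the means
        mu_ca = sum(r * x for r, x in zip(resp, voxels)) / w
        mu_bg = sum((1 - r) * x for r, x in zip(resp, voxels)) / (len(voxels) - w)
    return resp, mu_bg, mu_ca
```

Summing the responsibilities (times voxel volume) over the calcified region gives a soft calcium volume estimate, which is how a partial-content map reduces sensitivity to the partial volume effect compared with hard thresholding.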

Relevance: 20.00%

Abstract:

This paper presents reflections on statistical considerations in illicit drug profiling, and more specifically on the calculation of thresholds for determining whether seizures are linked or not. The specific case of heroin and cocaine profiling is presented, with the necessary details on the target profiling variables (major alkaloids) selected and the analytical method used. A statistical approach to comparing illicit drug seizures is also presented, with the introduction of different scenarios dealing with different data pre-treatments or transformations of variables. The main aim is to demonstrate the influence of data pre-treatment on the statistical outputs. A thorough study of the evolution of the true positive rate (TP) and the false positive rate (FP) in heroin and cocaine comparisons is then proposed to investigate this specific topic and to demonstrate that there is no universal approach available and that the calculations have to be re-evaluated for each new specific application.
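Once a similarity score and a linkage threshold are chosen, the TP and FP rates follow directly, and re-running the computation under each data pre-treatment shows how the rates shift. A minimal sketch (scores and ground-truth labels are illustrative):

```python
def tp_fp_rates(scores, linked, threshold):
    """True and false positive rates of the decision 'seizures are linked
    if similarity score >= threshold', given ground-truth linked/unlinked
    labels for each pairwise comparison. Sweeping the threshold traces the
    TP/FP trade-off for one choice of variable pre-treatment."""
    tp = sum(1 for s, l in zip(scores, linked) if l and s >= threshold)
    fp = sum(1 for s, l in zip(scores, linked) if not l and s >= threshold)
    pos = sum(1 for l in linked if l)
    neg = len(linked) - pos
    return tp / pos, fp / neg
```

Computing these rates for the same comparisons under raw, normalized and otherwise transformed variables makes the influence of the pre-treatment on the statistical output directly visible, which is the point the abstract argues.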