920 resultados para nonparametric inference
Resumo:
Low concentrations of elements in geochemical analyses have the peculiarity of beingcompositional data and, for a given level of significance, are likely to be beyond thecapabilities of laboratories to distinguish between minute concentrations and completeabsence, thus preventing laboratories from reporting extremely low concentrations of theanalyte. Instead, what is reported is the detection limit, which is the minimumconcentration that conclusively differentiates between presence and absence of theelement. A spatially distributed exhaustive sample is employed in this study to generateunbiased sub-samples, which are further censored to observe the effect that differentdetection limits and sample sizes have on the inference of population distributionsstarting from geochemical analyses having specimens below detection limit (nondetects).The isometric logratio transformation is used to convert the compositional data in thesimplex to samples in real space, thus allowing the practitioner to properly borrow fromthe large source of statistical techniques valid only in real space. The bootstrap method isused to numerically investigate the reliability of inferring several distributionalparameters employing different forms of imputation for the censored data. The casestudy illustrates that, in general, best results are obtained when imputations are madeusing the distribution best fitting the readings above detection limit and exposes theproblems of other more widely used practices. When the sample is spatially correlated, itis necessary to combine the bootstrap with stochastic simulation
Resumo:
First: A continuous-time version of Kyle's model (Kyle 1985), known as the Back's model (Back 1992), of asset pricing with asymmetric information, is studied. A larger class of price processes and of noise traders' processes are studied. The price process, as in Kyle's model, is allowed to depend on the path of the market order. The process of the noise traders' is an inhomogeneous Lévy process. Solutions are found by the Hamilton-Jacobi-Bellman equations. With the insider being risk-neutral, the price pressure is constant, and there is no equilibirium in the presence of jumps. If the insider is risk-averse, there is no equilibirium in the presence of either jumps or drifts. Also, it is analised when the release time is unknown. A general relation is established between the problem of finding an equilibrium and of enlargement of filtrations. Random announcement time is random is also considered. In such a case the market is not fully efficient and there exists equilibrium if the sensitivity of prices with respect to the global demand is time decreasing according with the distribution of the random time. Second: Power variations. it is considered, the asymptotic behavior of the power variation of processes of the form _integral_0^t u(s-)dS(s), where S_ is an alpha-stable process with index of stability 0&alpha&2 and the integral is an Itô integral. Stable convergence of corresponding fluctuations is established. These results provide statistical tools to infer the process u from discrete observations. Third: A bond market is studied where short rates r(t) evolve as an integral of g(t-s)sigma(s) with respect to W(ds), where g and sigma are deterministic and W is the stochastic Wiener measure. Processes of this type are particular cases of ambit processes. These processes are in general not of the semimartingale kind.
Resumo:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) presents a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. Further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single and fixed number contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoiding such complication. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other procedure is probabilistic using Bayes' theorem and provides a probability distribution for a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued since quantitative information such as peak area and height data are not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N. Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that uses categoric assumptions about N.
Resumo:
A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can accommodate to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy tailed distributions. Implementation is straightforward and R programs are available.
Resumo:
Background: Two genes are called synthetic lethal (SL) if mutation of either alone is not lethal, but mutation of both leads to death or a significant decrease in organism's fitness. The detection of SL gene pairs constitutes a promising alternative for anti-cancer therapy. As cancer cells exhibit a large number of mutations, the identification of these mutated genes' SL partners may provide specific anti-cancer drug candidates, with minor perturbations to the healthy cells. Since existent SL data is mainly restricted to yeast screenings, the road towards human SL candidates is limited to inference methods. Results: In the present work, we use phylogenetic analysis and database manipulation (BioGRID for interactions, Ensembl and NCBI for homology, Gene Ontology for GO attributes) in order to reconstruct the phylogenetically-inferred SL gene network for human. In addition, available data on cancer mutated genes (COSMIC and Cancer Gene Census databases) as well as on existent approved drugs (DrugBank database) supports our selection of cancer-therapy candidates.Conclusions: Our work provides a complementary alternative to the current methods for drug discovering and gene target identification in anti-cancer research. Novel SL screening analysis and the use of highly curated databases would contribute to improve the results of this methodology.
Resumo:
The paper presents a competence-based instructional design system and a way to provide a personalization of navigation in the course content. The navigation aid tool builds on the competence graph and the student model, which includes the elements of uncertainty in the assessment of students. An individualized navigation graph is constructed for each student, suggesting the competences the student is more prepared to study. We use fuzzy set theory for dealing with uncertainty. The marks of the assessment tests are transformed into linguistic terms and used for assigning values to linguistic variables. For each competence, the level of difficulty and the level of knowing its prerequisites are calculated based on the assessment marks. Using these linguistic variables and approximate reasoning (fuzzy IF-THEN rules), a crisp category is assigned to each competence regarding its level of recommendation.
Resumo:
Aim Recently developed parametric methods in historical biogeography allow researchers to integrate temporal and palaeogeographical information into the reconstruction of biogeographical scenarios, thus overcoming a known bias of parsimony-based approaches. Here, we compare a parametric method, dispersal-extinction-cladogenesis (DEC), against a parsimony-based method, dispersal-vicariance analysis (DIVA), which does not incorporate branch lengths but accounts for phylogenetic uncertainty through a Bayesian empirical approach (Bayes-DIVA). We analyse the benefits and limitations of each method using the cosmopolitan plant family Sapindaceae as a case study.Location World-wide.Methods Phylogenetic relationships were estimated by Bayesian inference on a large dataset representing generic diversity within Sapindaceae. Lineage divergence times were estimated by penalized likelihood over a sample of trees from the posterior distribution of the phylogeny to account for dating uncertainty in biogeographical reconstructions. We compared biogeographical scenarios between Bayes-DIVA and two different DEC models: one with no geological constraints and another that employed a stratified palaeogeographical model in which dispersal rates were scaled according to area connectivity across four time slices, reflecting the changing continental configuration over the last 110 million years.Results Despite differences in the underlying biogeographical model, Bayes-DIVA and DEC inferred similar biogeographical scenarios. The main differences were: (1) in the timing of dispersal events - which in Bayes-DIVA sometimes conflicts with palaeogeographical information, and (2) in the lower frequency of terminal dispersal events inferred by DEC. Uncertainty in divergence time estimations influenced both the inference of ancestral ranges and the decisiveness with which an area can be assigned to a node.Main conclusions By considering lineage divergence times, the DEC method gives more accurate reconstructions that are in agreement with palaeogeographical evidence. In contrast, Bayes-DIVA showed the highest decisiveness in unequivocally reconstructing ancestral ranges, probably reflecting its ability to integrate phylogenetic uncertainty. Care should be taken in defining the palaeogeographical model in DEC because of the possibility of overestimating the frequency of extinction events, or of inferring ancestral ranges that are outside the extant species ranges, owing to dispersal constraints enforced by the model. The wide-spanning spatial and temporal model proposed here could prove useful for testing large-scale biogeographical patterns in plants.
Resumo:
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
Resumo:
Adaptation to different ecological environments can promote speciation. Although numerous examples of such 'ecological speciation' now exist, the genomic basis of the process, and the role of gene flow in it, remains less understood. This is, at least in part, because systems that are well characterized in terms of their ecology often lack genomic resources. In this study, we characterize the transcriptome of Timema cristinae stick insects, a system that has been researched intensively in terms of ecological speciation, but for which genomic resources have not been previously developed. Specifically, we obtained >1 million 454 sequencing reads that assembled into 84,937 contigs representing approximately 18,282 unique genes and tens of thousands of potential molecular markers. Second, as an illustration of their utility, we used these genomic resources to assess multilocus genetic divergence within both an ecotype pair and a species pair of Timema stick insects. The results suggest variable levels of genetic divergence and gene flow among taxon pairs and genes and illustrate a first step towards future genomic work in Timema.
Resumo:
Condence intervals in econometric time series regressions suffer fromnotorious coverage problems. This is especially true when the dependencein the data is noticeable and sample sizes are small to moderate, as isoften the case in empirical studies. This paper suggests using thestudentized block bootstrap and discusses practical issues, such as thechoice of the block size. A particular data-dependent method is proposedto automate the method. As a side note, it is pointed out that symmetricconfidence intervals are preferred over equal-tailed ones, since theyexhibit improved coverage accuracy. The improvements in small sampleperformance are supported by a simulation study.
Resumo:
How much would output increase if underdeveloped economies were toincrease their levels of schooling? We contribute to the development accounting literature by describing a non-parametric upper bound on theincrease in output that can be generated by more schooling. The advantage of our approach is that the upper bound is valid for any number ofschooling levels with arbitrary patterns of substitution/complementarity.Another advantage is that the upper bound is robust to certain forms ofendogenous technology response to changes in schooling. We also quantify the upper bound for all economies with the necessary data, compareour results with the standard development accounting approach, andprovide an update on the results using the standard approach for a largesample of countries.