990 results for threshold random variable
Abstract:
Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study, a novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting, can account for non-additive interactions between these input variables, and can be used for variable selection without being adversely affected by collinearities. Random Forests handles large-scale data sets without rigorous data preprocessing and has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variables-to-observations ratio, Random Forests complemented the established logistic regression, suggesting that Random Forests is recommended for such high-dimensional data: one can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect sizes of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
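A minimal sketch of the noise-level cut-off idea described above, using scikit-learn rather than the thesis code: synthetic pure-noise columns are appended to a hypothetical predictor matrix, and only real predictors whose permutation importance (a mean-decrease-in-accuracy analogue) exceeds the largest noise importance are kept. The data, column counts, and parameter values are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                    # hypothetical predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# Append synthetic pure-noise columns to estimate the data noise level.
noise = rng.normal(size=(500, 10))
X_aug = np.hstack([X, noise])

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_aug, y)
imp = permutation_importance(forest, X_aug, y, n_repeats=20, random_state=0)

# Cut-off: the largest importance reached by any pure-noise column.
cutoff = imp.importances_mean[X.shape[1]:].max()
selected = np.where(imp.importances_mean[:X.shape[1]] > cutoff)[0]
print("selected predictors:", selected)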
Abstract:
Division of labor is a widely studied aspect of colony behavior of social insects. Division of labor models indicate how individuals distribute themselves in order to perform different tasks simultaneously. However, models that study division of labor from a dynamical system point of view cannot be found in the literature. In this paper, we define a division of labor model as a discrete-time dynamical system, in order to study the equilibrium points and their properties related to convergence and stability. By making use of this analytical model, an adaptive algorithm based on division of labor can be designed to satisfy dynamic criteria. In this way, we have designed and tested an algorithm that varies the response thresholds in order to modify the dynamic behavior of the system. This behavior modification allows the system to adapt to specific environmental and collective situations, making the algorithm a good candidate for distributed control applications. The variable threshold algorithm is based on specialization mechanisms. It is able to achieve asymptotically stable behavior of the system in different environments, independently of the number of individuals. The algorithm has been successfully tested under several initial conditions and numbers of individuals.
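The paper's exact dynamical system is not reproduced in the abstract; the following is only a minimal sketch of a classical response-threshold rule with threshold adaptation (specialization), in the spirit described above. The single-task setup and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
n, steps = 20, 200               # individuals, time steps (illustrative)
theta = rng.uniform(1, 10, n)    # response thresholds, one per individual
s = 0.0                          # task stimulus
delta, alpha = 1.0, 3.0          # stimulus growth rate / work efficiency
xi, phi = 0.1, 0.05              # threshold learning / forgetting rates

for t in range(steps):
    # Threshold rule: probability of engaging grows with the stimulus, drops with theta.
    p = s**2 / (s**2 + theta**2)
    active = rng.random(n) < p
    # Discrete-time stimulus dynamics: constant demand minus work performed.
    s = max(0.0, s + delta - alpha * active.mean())
    # Variable thresholds (specialization): active individuals lower theta, idle ones raise it.
    theta = np.clip(theta - xi * active + phi * (~active), 0.1, 10.0)

print("final fraction active:", active.mean())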
Abstract:
The reliability of measurement refers to unsystematic error in observed responses. Investigations of the prevalence of random error in stated estimates of willingness to pay (WTP) are important to an understanding of why tests of validity in CV can fail. However, published reliability studies have tended to adopt empirical methods that have practical and conceptual limitations when applied to WTP responses. This contention is supported by a review of contingent valuation reliability studies that demonstrates important limitations of existing approaches to WTP reliability. It is argued that empirical assessments of the reliability of contingent values may be better dealt with by using multiple indicators to measure the latent WTP distribution. This latent variable approach is demonstrated with data obtained from a WTP study for stormwater pollution abatement. Attitude variables were employed as a way of assessing the reliability of open-ended WTP (with benchmarked payment cards) for stormwater pollution abatement. The results indicated that participants' decisions to pay were reliably measured, but not the magnitude of the WTP bids. This finding highlights the need to better discern what is actually being measured in WTP studies. (C) 2003 Elsevier B.V. All rights reserved.
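The paper's latent variable model would typically be fitted with structural-equation software; as a much simpler illustration of using multiple attitude indicators to assess reliability, the sketch below computes Cronbach's alpha for a set of simulated indicators and correlates their mean with log WTP. The data, the number of indicators, and the error levels are hypothetical.

import numpy as np

def cronbach_alpha(items):
    # items: (n_respondents, k_indicators) array of attitude item scores
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=300)                             # hypothetical latent attitude
items = latent[:, None] + rng.normal(0, 0.8, (300, 4))    # four observed attitude indicators
wtp = np.exp(0.5 * latent + rng.normal(0, 1.0, 300))      # hypothetical open-ended WTP bids

print("Cronbach alpha:", round(cronbach_alpha(items), 2))
print("corr(indicator score, log WTP):", round(np.corrcoef(items.mean(1), np.log(wtp))[0, 1], 2))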
Abstract:
This paper considers a stochastic SIR (susceptible-infective-removed) epidemic model in which individuals may make infectious contacts in two ways, both within 'households' (which for ease of exposition are assumed to have equal size) and along the edges of a random graph describing additional social contacts. Heuristically-motivated branching process approximations are described, which lead to a threshold parameter for the model and methods for calculating the probability of a major outbreak, given few initial infectives, and the expected proportion of the population who are ultimately infected by such a major outbreak. These approximate results are shown to be exact as the number of households tends to infinity by proving associated limit theorems. Moreover, simulation studies indicate that these asymptotic results provide good approximations for modestly-sized finite populations. The extension to unequal sized households is discussed briefly.
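For intuition about the two routes of infection described above, here is a discrete-generation (Reed-Frost-style) caricature of the model: equal-sized households plus a sparse random graph of additional social contacts. It is not the branching-process approximation used in the paper, and the population size and per-contact infection probabilities are illustrative assumptions.

import numpy as np
import networkx as nx

rng = np.random.default_rng(3)
m, h = 200, 4                                  # households, household size (equal sizes)
n = m * h
household = np.repeat(np.arange(m), h)
G = nx.gnp_random_graph(n, 3.0 / n, seed=3)    # sparse random graph of social contacts

p_house, p_graph = 0.4, 0.1                    # illustrative infection probabilities
status = np.zeros(n, dtype=int)                # 0 = susceptible, 1 = infective, 2 = removed
status[rng.integers(n)] = 1

while (status == 1).any():
    new = []
    for i in np.where(status == 1)[0]:
        # Household contacts, then graph contacts; each infective is infectious for one generation.
        mates = np.where((household == household[i]) & (status == 0))[0]
        new += [j for j in mates if rng.random() < p_house]
        new += [j for j in G.neighbors(i) if status[j] == 0 and rng.random() < p_graph]
        status[i] = 2
    status[list(set(new))] = 1

print("final size proportion:", (status == 2).mean())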
Abstract:
In this paper we determine the local and global resilience of random graphs G(n,p) (p >> n^{-1}) with respect to the property of containing a cycle of length at least (1 - alpha)n. Roughly speaking, given alpha > 0, we determine the smallest r_g(G, alpha) with the property that almost surely every subgraph of G = G(n,p) having more than r_g(G, alpha)|E(G)| edges contains a cycle of length at least (1 - alpha)n (global resilience). We also obtain, for alpha < 1/2, the smallest r_l(G, alpha) such that any H ⊆ G having deg_H(v) larger than r_l(G, alpha) deg_G(v) for all v ∈ V(G) contains a cycle of length at least (1 - alpha)n (local resilience). The results above are in fact proved in the more general setting of pseudorandom graphs.
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in polynomial time in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
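The construction of the flow network from the two samples of context trees is specific to the paper and not reproduced here; the sketch below only illustrates the final computational step, a maximum-flow computation solved with an augmenting-path (Ford-Fulkerson/Edmonds-Karp) solver via networkx, on a toy network with assumed capacities.

import networkx as nx
from networkx.algorithms.flow import edmonds_karp

# Toy flow network; the actual construction from the two context-tree samples
# is paper-specific and not reproduced here.
G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=2.0)
G.add_edge("a", "b", capacity=1.0)
G.add_edge("a", "t", capacity=2.5)
G.add_edge("b", "t", capacity=2.0)

# Edmonds-Karp is a polynomial-time augmenting-path (Ford-Fulkerson) implementation.
flow_value, flow_dict = nx.maximum_flow(G, "s", "t", flow_func=edmonds_karp)
print("max flow:", flow_value)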
Abstract:
The adaptive process in motor learning was examined in terms of the effects of varying amounts of constant practice performed before random practice. Participants pressed five response keys sequentially, the last one coincident with the lighting of a final visual stimulus provided by a complex coincident timing apparatus. Different visual stimulus speeds were used during the random practice. 33 children (M age=11.6 yr.) were randomly assigned to one of three experimental groups: constant-random, constant-random 33%, and constant-random 66%. The constant-random group practiced constantly until they reached a performance stabilization criterion of three consecutive trials within 50 msec. of error. The other two groups had additional constant practice of 33 and 66%, respectively, of the number of trials needed to achieve the stabilization criterion. All three groups performed 36 trials under random practice; in the adaptation phase, they practiced at a visual stimulus speed different from that adopted in the stabilization phase. Global performance measures were absolute, constant, and variable errors, and movement pattern was analyzed by relative timing and overall movement time. There was no group difference in relation to global performance measures and overall movement time. However, differences between the groups were observed in movement pattern, since the constant-random 66% group changed its relative timing performance in the adaptation phase.
Abstract:
PURPOSE: To evaluate the impact of atypical retardation patterns (ARP) on detection of progressive retinal nerve fiber layer (RNFL) loss using scanning laser polarimetry with variable corneal compensation (VCC). DESIGN: Observational cohort study. METHODS: The study included 377 eyes of 221 patients with a median follow-up of 4.0 years. Images were obtained annually with the GDx VCC (Carl Zeiss Meditec Inc, Dublin, California, USA), along with optic disc stereophotographs and standard automated perimetry (SAP) visual fields. Progression was determined by the Guided Progression Analysis software for SAP and by masked assessment of stereophotographs by expert graders. The typical scan score (TSS) was used to quantify the presence of ARPs on GDx VCC images. Random coefficients models were used to evaluate the relationship between ARP and RNFL thickness measurements over time. RESULTS: Thirty-eight eyes (10%) showed progression over time on visual fields, stereophotographs, or both. Changes in TSS scores from baseline were significantly associated with changes in RNFL thickness measurements in both progressing and nonprogressing eyes. Each 1 unit increase in TSS score was associated with a 0.19 μm decrease in RNFL thickness measurement (P < .001) over time. CONCLUSIONS: ARPs had a significant effect on detection of progressive RNFL loss with the GDx VCC. Eyes with large amounts of atypical patterns, large fluctuations in these patterns over time, or both may show changes in measurements that can appear falsely as glaucomatous progression or can mask true changes in the RNFL. (Am J Ophthalmol 2009;148:155-163. (C) 2009 by Elsevier Inc. All rights reserved.)
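A random coefficients (mixed-effects) model of the kind described above can be sketched with statsmodels. The long-format data, the column names (rnfl, years, tss_change, eye), and the effect sizes below are simulated assumptions, not the study data, and the model is simplified to a random intercept and slope in time per eye.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_eyes, n_visits = 50, 5
df = pd.DataFrame({
    "eye": np.repeat(np.arange(n_eyes), n_visits),     # hypothetical eye identifier
    "years": np.tile(np.arange(n_visits), n_eyes),      # annual visits
})
df["tss_change"] = rng.normal(0, 2, len(df))            # change in TSS from baseline (simulated)
df["rnfl"] = (55 - 0.3 * df["years"] - 0.19 * df["tss_change"]
              + np.repeat(rng.normal(0, 3, n_eyes), n_visits)
              + rng.normal(0, 1, len(df)))

# Random coefficients model: random intercept and time slope per eye,
# with TSS change entering as a time-varying covariate.
model = smf.mixedlm("rnfl ~ years + tss_change", df, groups=df["eye"], re_formula="~years")
print(model.fit().summary())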
Abstract:
A finite-element method is used to study the elastic properties of random three-dimensional porous materials with highly interconnected pores. We show that Young's modulus, E, is practically independent of Poisson's ratio of the solid phase, nu(s), over the entire solid fraction range, and Poisson's ratio, nu, becomes independent of nu(s) as the percolation threshold is approached. We represent this behaviour of nu in a flow diagram. This interesting but approximate behaviour is very similar to the exactly known behaviour in two-dimensional porous materials. In addition, the behaviour of nu versus nu(s) appears to imply that information in the dilute porosity limit can affect behaviour in the percolation threshold limit. We summarize the finite-element results in terms of simple structure-property relations, instead of tables of data, to make it easier to apply the computational results. Without using accurate numerical computations, one is limited to various effective medium theories and rigorous approximations like bounds and expansions. The accuracy of these equations is unknown for general porous media. To verify a particular theory it is important to check that it predicts both isotropic elastic moduli, i.e. prediction of Young's modulus alone is necessary but not sufficient. The subtleties of Poisson's ratio behaviour actually provide a very effective method for showing differences between the theories and demonstrating their ranges of validity. We find that for moderate- to high-porosity materials, none of the analytical theories is accurate and, at present, numerical techniques must be relied upon.
Abstract:
The risk of cardiac events in patients undergoing major noncardiac surgery is dependent on their clinical characteristics and the results of stress testing. The purpose of this study was to develop a composite approach to defining levels of risk and to examine whether different approaches to prophylaxis influenced this prediction of outcome. One hundred forty-five consecutive patients (aged 68 +/- 9 years, 79 men) with >1 clinical risk variable were studied with standard dobutamine-atropine stress echo before major noncardiac surgery. Risk levels were stratified according to the presence of ischemia (new or worsening wall motion abnormality), ischemic threshold (heart rate at development of ischemia), and number of clinical risk variables. Patients were followed for perioperative events (during hospital admission) and death or infarction over the subsequent 16 +/- 10 months. Ten perioperative events occurred in 105 patients who proceeded to surgery (10%, 95% confidence interval [CI] 5% to 17%), 40 being cancelled because of cardiac or other risk. No ischemia was identified in 56 patients, 1 of whom (1.8%) had a perioperative infarction. Of the 49 patients with ischemia, 22 (45%) had 1 or 2 clinical risk factors; 2 (9%, 95% CI 1% to 29%) had events. Another 15 patients had a high ischemic threshold and 3 or 4 risk factors; 3 (20%, 95% CI 4% to 48%) had events. Twelve patients had a low ischemic threshold and 3 or 4 risk factors; 4 (33%, 95% CI 10% to 65%) had events. Preoperative myocardial revascularization was performed in only 3 patients, none of whom had events. Perioperative and long-term events occurred despite the use of beta blockers; 7 of 41 beta blocker-treated patients had a perioperative event (17%, 95% CI 7% to 32%); these treated patients were at higher anticipated risk than untreated patients (20 +/- 24% vs 10 +/- 19%, p = 0.02). The total event rate over late follow-up was 13%, and was predicted by dobutamine-atropine stress echo results and heart rate response. (C) 2002 by Excerpta Medica, Inc.
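As a schematic restatement of the composite strata reported above (not a validated clinical tool), the rule could be encoded as follows; the boolean ischemic-threshold flag stands in for the heart-rate cut, which is not specified in the abstract, and the observed event rates are quoted only as comments.

def risk_stratum(ischemia: bool, high_ischemic_threshold: bool, n_risk_factors: int) -> str:
    """Schematic restatement of the strata reported in the study abstract.

    Observed perioperative event rates in that cohort (illustrative only):
    no ischemia ~2%; ischemia with 1-2 risk factors ~9%;
    high-threshold ischemia with 3-4 risk factors ~20%;
    low-threshold ischemia with 3-4 risk factors ~33%.
    """
    if not ischemia:
        return "low risk"
    if n_risk_factors <= 2:
        return "intermediate risk"
    return "high risk" if high_ischemic_threshold else "highest risk"

print(risk_stratum(ischemia=True, high_ischemic_threshold=False, n_risk_factors=3))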
Abstract:
One of the main implications of the efficient market hypothesis (EMH) is that expected future returns on financial assets are not predictable if investors are risk neutral. In this paper we argue that financial time series offer more information than this hypothesis seems to suggest. In particular, we postulate that runs of very large returns can be predictable over small time periods. In order to prove this we propose a TAR(3,1)-GARCH(1,1) model that is able to describe two different types of extreme events: a first type generated by large-uncertainty regimes, where runs of extremes are not predictable, and a second type where extremes come from isolated dread/joy events. This model is new in the literature on nonlinear processes. Its novelty resides in two features that make it different from previous TAR methodologies: the regimes are motivated by the occurrence of extreme values, and the threshold variable is defined by the shock affecting the process in the preceding period. In this way the model is able to uncover dependence and clustering of extremes in high- as well as low-volatility periods. The model is tested with data on General Motors stock prices corresponding to two crises that had a substantial impact on financial markets worldwide: the Black Monday of October 1987 and September 11th, 2001. By analyzing the periods around these crises we find evidence of statistical significance of our model, and thereby of predictability of extremes, for September 11th but not for Black Monday. These findings support the hypotheses of a big negative event producing runs of negative returns in the first case, and of the burst of a worldwide stock market bubble in the second example.
JEL classification: C12; C15; C22; C51
Keywords and phrases: asymmetries, crises, extreme values, hypothesis testing, leverage effect, nonlinearities, threshold models
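A minimal simulation sketch of a three-regime threshold AR(1) with GARCH(1,1) errors, where, as in the model described above, the regime is selected by the shock of the preceding period. The coefficients and threshold values are illustrative assumptions, not the estimates from the paper.

import numpy as np

rng = np.random.default_rng(5)
T = 2000
phi = {"low": -0.3, "mid": 0.05, "high": -0.3}   # AR(1) coefficients per regime (illustrative)
r_lo, r_hi = -1.5, 1.5                           # thresholds on the lagged shock
omega, a1, b1 = 0.05, 0.08, 0.90                 # GARCH(1,1) parameters (illustrative)

y = np.zeros(T)
eps = np.zeros(T)
h = np.full(T, omega / (1 - a1 - b1))            # conditional variance, started at its unconditional value

for t in range(1, T):
    h[t] = omega + a1 * eps[t - 1] ** 2 + b1 * h[t - 1]
    # Regime chosen by the preceding period's shock (the threshold variable).
    regime = "low" if eps[t - 1] < r_lo else ("high" if eps[t - 1] > r_hi else "mid")
    eps[t] = np.sqrt(h[t]) * rng.standard_normal()
    y[t] = phi[regime] * y[t - 1] + eps[t]

print("sample kurtosis:", round(((y - y.mean()) ** 4).mean() / y.var() ** 2, 2))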
Abstract:
We study the concept of propagation connectivity on random 3-uniform hypergraphs. This concept is inspired by a simple linear time algorithm for solving instances of certain constraint satisfaction problems. We derive upper and lower bounds for the propagation connectivity threshold, and point out some algorithmic implications.
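Under one common formulation of propagation connectivity (marking spreads through a hyperedge once two of its three vertices are marked, and the hypergraph is propagation connected if some starting pair of vertices eventually marks every vertex), a brute-force check on a small random 3-uniform hypergraph can be sketched as follows. This is not the linear-time algorithm alluded to above, and the instance size and edge count are illustrative.

import itertools
import random

def propagates(n, edges, start):
    # Greedy closure: a hyperedge with exactly two marked vertices marks its third vertex.
    marked = set(start)
    changed = True
    while changed:
        changed = False
        for e in edges:
            if len(marked.intersection(e)) == 2:
                marked |= set(e)
                changed = True
    return len(marked) == n

random.seed(6)
n, m = 12, 40                                                # small illustrative instance
edges = [frozenset(random.sample(range(n), 3)) for _ in range(m)]

# Propagation connected iff some starting pair of vertices marks every vertex.
print(any(propagates(n, edges, pair) for pair in itertools.combinations(range(n), 2)))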
Abstract:
In this paper we consider extensions of smooth transition autoregressive (STAR) models to situations where the threshold is a time-varying function of variables that affect the separation of regimes of the time series under consideration. Our specification is motivated by the observation that unusually high/low values for an economic variable may sometimes be best thought of in relative terms. State-dependent logistic STAR and contemporaneous-threshold STAR models are introduced and discussed. These models are also used to investigate the dynamics of U.S. short-term interest rates, where the threshold is allowed to be a function of past output growth and inflation.
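A minimal simulation sketch of a two-regime logistic STAR process in which the threshold is time-varying, being a linear function of lagged stand-ins for output growth and inflation. All coefficients and the exogenous series are illustrative assumptions, not the paper's specification or estimates.

import numpy as np

rng = np.random.default_rng(7)
T = 500
# Illustrative exogenous state variables standing in for past output growth and inflation.
growth = rng.normal(0.5, 1.0, T)
inflation = rng.normal(2.0, 1.0, T)

gamma = 5.0                      # smoothness of the logistic transition
c0, c1, c2 = 0.0, 0.5, 0.3       # coefficients of the time-varying threshold (illustrative)
phi_low, phi_high = 0.9, 0.4     # AR(1) coefficients in the two regimes

y = np.zeros(T)
for t in range(1, T):
    c_t = c0 + c1 * growth[t - 1] + c2 * inflation[t - 1]   # state-dependent threshold
    G = 1.0 / (1.0 + np.exp(-gamma * (y[t - 1] - c_t)))     # logistic transition weight
    y[t] = ((1 - G) * phi_low + G * phi_high) * y[t - 1] + 0.3 * rng.standard_normal()

print("sample mean and std:", round(y.mean(), 3), round(y.std(), 3))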
Abstract:
This article analyzes empirically the main existing theories on income and population city growth: increasing returns to scale, locational fundamentals, and random growth. To do this we implement a threshold nonlinearity test that extends standard linear growth regression models to a dataset of urban, climatological, and macroeconomic variables for 1,175 U.S. cities. Our analysis reveals the existence of increasing returns when per-capita income levels are beyond $19,264. Despite this, income growth is mostly explained by social and locational fundamentals. Population growth also exhibits two distinct equilibria determined by a threshold value of 116,300 inhabitants, beyond which city population grows at a higher rate. Income and population growth do not go hand in hand, implying an optimal level of population beyond which income growth stagnates or deteriorates.
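A minimal sketch of estimating a single income threshold by grid search, splitting a linear growth regression at each candidate value and minimizing the pooled sum of squared residuals. The simulated data, the trimming fractions, and the two-regime specification are assumptions; the paper's threshold nonlinearity test additionally involves inference on whether the threshold exists at all.

import numpy as np

rng = np.random.default_rng(8)
n = 1000
income = rng.uniform(5_000, 40_000, n)               # hypothetical per-capita income levels
growth = np.where(income > 19_264, 0.03 + 2e-7 * income, 0.01) + rng.normal(0, 0.01, n)

def ssr_at(tau):
    # Fit a separate linear regression of growth on income in each regime and pool the SSR.
    ssr = 0.0
    for mask in (income <= tau, income > tau):
        X = np.column_stack([np.ones(mask.sum()), income[mask]])
        beta, res, *_ = np.linalg.lstsq(X, growth[mask], rcond=None)
        ssr += res[0] if res.size else ((growth[mask] - X @ beta) ** 2).sum()
    return ssr

# Grid search over candidate thresholds, trimming the tails so both regimes keep observations.
grid = np.quantile(income, np.linspace(0.15, 0.85, 200))
tau_hat = grid[np.argmin([ssr_at(t) for t in grid])]
print("estimated income threshold:", round(float(tau_hat)))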
Abstract:
Analyzing the relationship between the baseline value and subsequent change of a continuous variable is a frequent matter of inquiry in cohort studies. These analyses are surprisingly complex, particularly if only two waves of data are available. It is unclear to non-biostatisticians where the complexity of this analysis lies and which statistical method is adequate. With the help of simulated longitudinal data of body mass index in children, we review statistical methods for the analysis of the association between the baseline value and subsequent change, assuming linear growth with time. Key issues in such analyses are mathematical coupling, measurement error, variability of change between individuals, and regression to the mean. Ideally, it is better to rely on multiple repeated measurements at different times, and a linear random effects model is a standard approach if more than two waves of data are available. If only two waves of data are available, our simulations show that Blomqvist's method - which consists of adjusting the estimated regression coefficient of observed change on baseline value for the measurement error variance - provides accurate estimates. The adequacy of the methods to assess the relationship between the baseline value and subsequent change depends on the number of data waves, the availability of information on measurement error, and the variability of change between individuals.
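A minimal sketch of a Blomqvist-type correction as described above, assuming the measurement error variance is known: the naive slope of observed change on observed baseline is rescaled using that variance. The simulated BMI-like data and the error standard deviation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(9)
n = 2000
true_baseline = rng.normal(18, 2, n)                            # hypothetical true BMI at wave 1
true_change = 1.0 - 0.2 * (true_baseline - 18) + rng.normal(0, 0.5, n)
sigma_e = 0.8                                                   # assumed known measurement error SD
y1 = true_baseline + rng.normal(0, sigma_e, n)                  # observed wave-1 value
y2 = true_baseline + true_change + rng.normal(0, sigma_e, n)    # observed wave-2 value

d = y2 - y1
# Naive slope of observed change on observed baseline: biased because the wave-1
# measurement error appears (with opposite signs) in both the outcome and the predictor.
b_naive = np.cov(d, y1)[0, 1] / np.var(y1, ddof=1)

# Blomqvist-type correction: rescale the naive slope using the measurement error variance.
var_y1 = np.var(y1, ddof=1)
b_corrected = (b_naive * var_y1 + sigma_e**2) / (var_y1 - sigma_e**2)

print("naive slope:", round(b_naive, 3), " corrected slope:", round(b_corrected, 3))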