Biblioteca Digital

963 results for STATISTICAL MODELS

Effect sizes and standardization in neighbourhood models of forest stands: potential biases and misinterpretations

Relevance: 30.00%

Abstract:

Effects of conspecific neighbours on survival and growth of trees have been found to be related to species abundance. Both positive and negative relationships may explain observed abundance patterns. Surprisingly, it is rarely tested whether such relationships could be biased or even spurious due to transforming neighbourhood variables or influences of spatial aggregation, distance decay of neighbour effects and standardization of effect sizes. To investigate potential biases, communities of 20 identical species were simulated with log-series abundances but without species-specific interactions. No relationship of conspecific neighbour effects on survival or growth with species abundance was expected. Survival and growth of individuals were simulated in random and aggregated spatial patterns using no, linear, or squared distance decay of neighbour effects. Regression coefficients of statistical neighbourhood models were unbiased and unrelated to species abundance. However, variation in the number of conspecific neighbours was positively or negatively related to species abundance depending on transformations of neighbourhood variables, spatial pattern and distance decay. Consequently, effect sizes and standardized regression coefficients, often used in model fitting across large numbers of species, were also positively or negatively related to species abundance depending on transformation of neighbourhood variables, spatial pattern and distance decay. Tests using randomized tree positions and identities provide the best benchmarks by which to critically evaluate relationships of effect sizes or standardized regression coefficients with tree species abundance. This will better guard against potential misinterpretations.

Global models of planet formation and evolution

Relevance: 30.00%

Abstract:

Despite the strong increase in observational data on extrasolar planets, the processes that led to the formation of these planets are still not well understood. However, thanks to the high number of extrasolar planets that have been discovered, it is now possible to look at the planets as a population that puts statistical constraints on theoretical formation models. A method that uses these constraints is planetary population synthesis, where synthetic planetary populations are generated and compared to the actual population. The key element of the population synthesis method is a global model of planet formation and evolution. These models directly predict observable planetary properties based on properties of the natal protoplanetary disc, linking two important classes of astrophysical objects. To do so, global models build on the simplified results of many specialized models that address one specific physical mechanism. We thoroughly review the physics of the sub-models included in global formation models. The sub-models can be classified as models describing the protoplanetary disc (of gas and solids), those that describe one (proto)planet (its solid core, gaseous envelope and atmosphere), and finally those that describe the interactions (orbital migration and N-body interaction). We compare the approaches taken in different global models, discuss the links between specialized and global models, and identify physical processes that require improved descriptions in future work. We then briefly address important results of planetary population synthesis like the planetary mass function or the mass-radius relationship. With these statistical results, the global effects of physical mechanisms occurring during planet formation and evolution become apparent, and specialized models describing them can be put to the observational test.
Owing to their nature as meta-models, global models depend on the results of specialized models, and therefore on the development of the field of planet formation theory as a whole. Because there are important uncertainties in this theory, it is likely that the global models will undergo significant modifications in the future. Despite these limitations, global models can already yield many testable predictions. With future global models addressing the geophysical characteristics of the synthetic planets, it should eventually become possible to make predictions about the habitability of planets based on their formation and evolution.

Performance Evaluation of the New Connecticut Leading Employment Index Using Lead Profiles and BVAR Models

Relevance: 30.00%

Abstract:

Dua and Miller (1996) created leading and coincident employment indexes for the state of Connecticut, following Moore's (1981) work at the national level. The performance of the Dua-Miller indexes following the recession of the early 1990s fell short of expectations. This paper performs two tasks. First, it describes the process of revising the Connecticut Coincident and Leading Employment Indexes. Second, it analyzes the statistical properties and performance of the new indexes by comparing the lead profiles of the new and old indexes as well as their out-of-sample forecasting performance, using the Bayesian Vector Autoregressive (BVAR) method. The new indexes show improved performance in dating employment cycle chronologies. The lead profile test demonstrates that superiority in a rigorous, non-parametric statistical fashion. The mixed evidence on the BVAR forecasting experiments illustrates the truth in the Granger and Newbold (1986) caution that leading indexes properly predict cycle turning points and do not necessarily provide accurate forecasts except at turning points, a view that our results support.

A full pedigree based method for the statistical assessment of genetic anticipation

Relevance: 30.00%

Abstract:

Genetic anticipation is defined as a decrease in age of onset or increase in severity as the disorder is transmitted through subsequent generations. Anticipation has been noted in the literature for over a century. Recently, anticipation in several diseases including Huntington's Disease, Myotonic Dystrophy and Fragile X Syndrome was shown to be caused by expansion of triplet repeats. Anticipation effects have also been observed in numerous mental disorders (e.g. Schizophrenia, Bipolar Disorder), cancers (Li-Fraumeni Syndrome, Leukemia) and other complex diseases. Several statistical methods have been applied to determine whether anticipation is a true phenomenon in a particular disorder, including standard statistical tests and newly developed affected parent/affected child pair methods. These methods have been shown to be inappropriate for assessing anticipation for a variety of reasons, including familial correlation and low power. Therefore, we have developed family-based likelihood modeling approaches to model the underlying transmission of the disease gene and penetrance function and hence detect anticipation. These methods can be applied in extended families, thus improving the power to detect anticipation compared with existing methods based only upon parents and children. The first method we have proposed is based on the regressive logistic hazard model. This approach models anticipation by a generational covariate. The second method allows alleles to mutate as they are transmitted from parents to offspring and is appropriate for modeling the known triplet repeat diseases in which the disease alleles can become more deleterious as they are transmitted across generations. To evaluate the new methods, we performed extensive simulation studies on data generated under different conditions, assessing how effectively the algorithms detect genetic anticipation.
Results from analysis by the first method yielded empirical power greater than 87% based on the 5% type I error critical value identified in each simulation depending on the method of data generation and current age criteria. Analysis by the second method was not possible due to the current formulation of the software. The application of this method to Huntington's Disease and Li-Fraumeni Syndrome data sets revealed evidence for a generation effect in both cases.

The Devil’s Calculus: Mathematical Models of Civil War

Relevance: 30.00%

Abstract:

In spite of the movement to turn political science into a real science, various mathematical methods that are now the staples of physics, biology, and even economics are thoroughly uncommon in political science, especially the study of civil war. This study seeks to apply such methods - specifically, ordinary differential equations (ODEs) - to model civil war based on what one might dub the capabilities school of thought, which roughly states that civil wars end only when one side’s ability to make war falls far enough to make peace truly attractive. I construct several different ODE-based models and then test them all to see which best predicts the instantaneous capabilities of both sides of the Sri Lankan civil war in the period from 1990 to 1994 given parameters and initial conditions. The model that the tests declare most accurate gives very accurate predictions of state military capabilities and reasonable short term predictions of cumulative deaths. Analysis of the model reveals the scale of the importance of rebel finances to the sustainability of insurgency, most notably that the number of troops required to put down the Tamil Tigers is reduced by nearly a full order of magnitude when Tiger foreign funding is stopped. The study thus demonstrates that accurate foresight may come of relatively simple dynamical models, and implies the great potential of advanced and currently unconventional non-statistical mathematical methods in political science.

Estimating parameters in Markov models for longitudinal studies with missing data or surrogate outcomes

Relevance: 30.00%

Abstract:

The discrete-time Markov chain is commonly used in describing changes of health states for chronic diseases in a longitudinal study. Statistical inferences on comparing treatment effects or on finding determinants of disease progression usually require estimation of transition probabilities. In many situations when the outcome data have some missing observations or the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered for empirical studies using methods that include the EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods were demonstrated by the Schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research were also discussed.
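
As an illustration of the transition-probability estimation mentioned in this abstract, the sketch below covers only the simplest case: fully observed chains with no missing or latent states, where the maximum-likelihood estimate is the row-normalized table of transition counts. It is not the dissertation's EM or hidden Markov procedure; the function name, state labels, and patient histories are hypothetical.

```python
from collections import Counter


def estimate_transition_matrix(sequences, states):
    """Maximum-likelihood transition probabilities from fully observed chains."""
    counts = Counter()
    for seq in sequences:
        # Count every observed one-step move (current state -> next state).
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1
    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        # A state that is never left has no estimable row; mark it as None.
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else None)
            for t in states
        }
    return matrix


# Hypothetical health-state histories for three subjects (assumption).
histories = [
    ["mild", "mild", "severe", "severe"],
    ["mild", "severe", "mild", "mild"],
    ["severe", "severe", "mild", "mild"],
]
P = estimate_transition_matrix(histories, ["mild", "severe"])
print(P)
```

With missing observations or surrogate outcomes, these counts are no longer directly available, which is the situation the EM-type and hidden Markov methods in the abstract are designed for.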

A computerized statistical framework for coalescent analysis

Relevance: 30.00%

Abstract:

Coalescent theory represents the most significant progress in theoretical population genetics in the past three decades. The coalescent theory states that all genes or alleles in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor. It is now widely recognized as a cornerstone for rigorous statistical analyses of molecular data from populations [1]. Scientists have developed a large number of coalescent models and methods [2,3,4,5,6], which are applied not only in coalescent analysis but also in today’s population genetics and genome studies, and even in public health. The thesis aims to complete a computer-based statistical framework for coalescent analysis. This framework provides a large number of coalescent models and statistical methods to assist students and researchers in coalescent analysis, with results presented in various formats such as text, graphics and printed pages. In particular, it also supports the creation of new coalescent models and statistical methods.

Assessment of the effect on statistical power of regression model misspecification by using techniques of mathematical statistics and simulation study

Relevance: 30.00%

Abstract:

Objectives. This paper seeks to assess the effect on statistical power of regression model misspecification in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with the higher correlations, provided a better approximation of the true relationship, which was illustrated by LOESS regression. In the third section, we present the results of simulation studies that demonstrate that overall misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, which ranged from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect on statistical power of misspecification when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate the situations with unknown or complex correct model specification.
Simulation of power for misspecified models confirmed the results based on correlation methods but also illustrated the effect of model degrees of freedom on power.

A STUDY OF INFLUENZA AND OF STATISTICAL METHODS USED TO REPORT INFLUENZA

Relevance: 30.00%

Abstract:

This paper defines and compares several models for describing excess influenza pneumonia mortality in Houston. First, the methodology used by the Center for Disease Control is examined and several variations of this methodology are studied. All of the models examined emphasize the difficulty of omitting epidemic weeks. In an attempt to find a better method of describing expected and epidemic mortality, time series methods are examined. Grouping in four-week periods, truncating the data series to adjust epidemic periods, and seasonally-adjusting the series y_t, by: (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI) is the best method examined. This new series w_t is stationary and a moving average model MA(1) gives a good fit for forecasting influenza and pneumonia mortality in Houston. Influenza morbidity, other causes of death, sex, race, age, climate variables, environmental factors, and school absenteeism are all examined in terms of their relationship to influenza and pneumonia mortality. Both influenza morbidity and ischemic heart disease mortality show a very high relationship that remains when seasonal trends are removed from the data. However, when jointly modeling the three series it is obvious that the simple time series MA(1) model of truncated, seasonally-adjusted four-week data gives a better forecast.

Using spatial linear models with SAR and CAR structure to examine Texas lung cancer incidence rates

Relevance: 30.00%

Abstract:

Scholars have found that socioeconomic status was one of the key factors that influenced early-stage lung cancer incidence rates in a variety of regions. This thesis examined the association between median household income and lung cancer incidence rates in Texas counties. A total of 254 individual counties in Texas with corresponding lung cancer incidence rates from 2004 to 2008 and median household incomes in 2006 were collected from the National Cancer Institute Surveillance System. A simple linear model and spatial linear models with two structures, Simultaneous Autoregressive Structure (SAR) and Conditional Autoregressive Structure (CAR), were used to link median household income and lung cancer incidence rates in Texas. The residuals of the spatial linear models were analyzed with Moran's I and Geary's C statistics, and the statistical results were used to detect similar lung cancer incidence rate clusters and disease patterns in Texas.
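
For readers unfamiliar with Moran's I, the sketch below computes the standard global statistic, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, for a toy example. The four-region adjacency matrix and the values are hypothetical and are not the Texas county data; in the thesis the statistic is applied to model residuals.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)  # total weight W
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / w_sum) * cross / denom


# Four hypothetical regions on a line; adjacent regions get weight 1 (assumption).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 5.0, 5.0], weights)    # similar values adjacent
alternating = morans_i([1.0, 5.0, 1.0, 5.0], weights)  # dissimilar values adjacent
print(clustered, alternating)
```

Positive values indicate that neighbouring regions tend to be alike (clustering), and negative values indicate that neighbours tend to differ.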

Comparison of machine learning approaches and statistical methods for estimation of airborne chemical exposures

Relevance: 30.00%

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, some models have been developed using published exposure databases with their corresponding exposure determinants. These models are designed to be applied to reported exposure determinants obtained from study subjects or exposure levels assigned by an industrial hygienist, so quantitative exposure estimates can be obtained. In an effort to improve the prediction accuracy and generalizability of these models, and taking into account that the limitations encountered in previous studies might be due to limitations in the applicability of traditional statistical methods and concepts, the use of computer science-derived data analysis methods, predominantly machine learning approaches, was proposed and explored in this study. The goal of this study was to develop a set of models using decision trees/ensemble and neural network methods to predict occupational outcomes based on literature-derived databases, and compare, using cross-validation and data splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was measured as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the result of the exposure is expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used. When compared to regression estimations, results showed better accuracy of decision trees/ensemble techniques for the categorical case while neural networks were better for estimation of continuous exposure values. Overrepresentation of classes and overfitting were the main causes for poor neural network performance and accuracy.
Estimations based on literature-based databases using machine learning techniques might provide an advantage when they are applied to other methodologies that combine 'expert inputs' with current exposure measurements, like the Bayesian Decision Analysis tool. The use of machine learning techniques to more accurately estimate exposures from literature-based exposure databases might represent the starting point for independence from expert judgment.

Age models and summer sea surface temperature and winter sea ice concentration for the EPILOG-LGM time slice in the Pacific Southern Ocean

Relevance: 30.00%

Abstract:

Sea surface temperatures and sea-ice extent are the most critical variables to evaluate the Southern Ocean paleoceanographic evolution in relation to the development of the global carbon cycle, atmospheric CO2 variability and ocean-atmosphere circulation. In contrast to the Atlantic and the Indian sectors, the Pacific sector of the Southern Ocean has been insufficiently investigated so far. To fill this information gap we present diatom-based estimates of summer sea surface temperature (SSST) and winter sea-ice concentration (WSI) from 17 sites in the polar South Pacific to study the Last Glacial Maximum (LGM) at the EPILOG time slice (19,000-23,000 cal. years BP). Applied statistical methods are the Imbrie and Kipp Method (IKM) and the Modern Analog Technique (MAT) to estimate temperature and sea-ice concentration, respectively. Our data display a distinct LGM east-west differentiation in SSST and WSI with steeper latitudinal temperature gradients and a winter sea-ice edge located consistently north of the Pacific-Antarctic Ridge in the Ross Sea sector. In the eastern sector of our study area, which is governed by the Amundsen Abyssal Plain, the estimates yield weaker latitudinal SSST gradients together with a variable extended winter sea-ice field. In this sector, sea-ice extent may have sporadically reached the area of the present Subantarctic Front at its maximum LGM expansion. This pattern points to topographic forcing as the major controller of the frontal system location and sea-ice extent in the western Pacific sector whereas atmospheric conditions like the Southern Annular Mode and the ENSO affected the oceanographic conditions in the eastern Pacific sector. Although it is difficult to depict the location and the physical nature of frontal systems separating the glacial Southern Ocean water masses into different zones, we found a distinct temperature gradient in latitudes straddled by the modern Southern Subtropical Front.
Considering that the glacial temperatures north of this zone are similar to the modern, we suggest that this represents the Glacial Southern Subtropical Front (GSSTF), which delimits the zone of strongest glacial SSST cooling (>4K) to its north. The southern boundary of the zone of maximum cooling is close to the glacial 4°C isotherm. This isotherm, which is in the range of SSST at the modern Antarctic Polar Front (APF), represents a circum-Antarctic feature and marks the northern edge of the glacial Antarctic Circumpolar Current (ACC). We also assume that a glacial front was established at the northern average winter sea ice edge, comparable with the modern Southern Antarctic Circumpolar Current Front (SACCF). During the glacial, this front would be located in the area of the modern APF. The northward deflection of colder than modern surface waters along the South American continent leads to a significant cooling of the glacial Humboldt Current surface waters (4-8K), which affects the temperature regimes as far north as tropical latitudes. The glacial reduction of ACC temperatures may also result in significant cooling in the Atlantic and Indian Southern Ocean, and may thus enhance thermal differentiation of the Southern Ocean and Antarctic continental cooling. Comparison with numerical simulations of temperature and sea ice for the last glacial shows that the majority of modern models overestimate summer and winter sea ice cover and that few models reproduce our temperature data well.

Comparison of methods for downscaling runoff from regional climate models in Spanish basins

Relevance: 30.00%

Abstract:

At present there is much literature that refers to the advantages and disadvantages of different methods of statistical and dynamical downscaling of climate variables projected by climate models. Less attention has been paid to other indirect variables, like runoff, which play a significant role in evaluating the impact of climate change on hydrological systems. Runoff presents a much greater bias in climate models than other climate variables, like temperature or precipitation. It is very important to identify the methods that minimize bias while downscaling runoff from the gridded results of climate models to the basin scale.

Regularization for sparsity in statistical analysis and machine learning

Relevance: 30.00%

Abstract:

Pragmatism is the leading motivation of regularization. We can understand regularization as a modification of the maximum-likelihood estimator so that a reasonable answer can be given in an unstable or ill-posed situation. To mention some typical examples, this happens when fitting parametric or non-parametric models with more parameters than data or when estimating large covariance matrices. Regularization is also commonly used to improve the bias-variance tradeoff of an estimation. The definition of regularization is therefore quite general, and, although the introduction of a penalty is probably the most popular type, it is just one out of multiple forms of regularization. In this dissertation, we focus on the applications of regularization for obtaining sparse or parsimonious representations, where only a subset of the inputs is used. A particular form of regularization, L1-regularization, plays a key role in reaching sparsity. Most of the contributions presented here revolve around L1-regularization, although other forms of regularization are explored (also pursuing sparsity in some sense). In addition to presenting a compact review of L1-regularization and its applications in statistical and machine learning, we devise methodology for regression, supervised classification and structure induction of graphical models. Within the regression paradigm, we focus on kernel smoothing learning, proposing techniques for kernel design that are suitable for high dimensional settings and sparse regression functions. We also present an application of regularized regression techniques for modeling the response of biological neurons. Supervised classification advances deal, on the one hand, with the application of regularization for obtaining a naïve Bayes classifier and, on the other hand, with a novel algorithm for brain-computer interface design that uses group regularization in an efficient manner.
Finally, we present a heuristic for inducing structures of Gaussian Bayesian networks using L1-regularization as a filter.
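
To illustrate why an L1 penalty yields sparse solutions, the sketch below applies the soft-thresholding operator, which solves the one-dimensional problem of minimizing 0.5 * (b - z)^2 + lam * |b| and which gives the lasso solution coordinate by coordinate when the design matrix is orthonormal. This is a textbook special case, not the dissertation's methodology; the coefficient values and the penalty lam = 0.5 are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer over b of 0.5 * (b - z)**2 + lam * abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    # Estimates whose magnitude is at most lam are set exactly to zero.
    return 0.0


# Hypothetical unpenalized coefficient estimates (assumption).
ols_coefficients = [3.0, -0.4, 0.1, -2.5]
lasso_coefficients = [soft_threshold(z, 0.5) for z in ols_coefficients]
print(lasso_coefficients)
```

Small coefficients are set exactly to zero while large ones are shrunk toward zero by lam, so only a subset of the inputs remains in the model.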

Mechanical properties of self-consolidating concrete using conventional concrete models

Relevance: 30.00%

Abstract:

The objective of this study is to analyze the applicability of current models used for estimating the mechanical properties of conventional concrete to self-consolidating concrete (SCC). The mechanical properties evaluated are modulus of elasticity, tensile strength, and modulus of rupture. As part of the study, it was necessary to build an extensive database that included the proportions and mechanical properties of 627 mixtures from 138 different references. The same models that are currently used for calculating the mechanical properties of conventional concrete were applied to SCC to evaluate their applicability to this type of concrete. The models considered are the ACI 318, ACI 363R, and EC2. These are the most commonly used models worldwide. In the first part of the study, the overall behavior and adaptability of the different models to SCC is evaluated. The specific characterization parameters for each concrete mixture are used to calculate the various mechanical properties applying the different estimation models. The second part of the analysis consists of comparing the experimental results of all the mixtures included in the database with the estimated results to evaluate the applicability of these models to SCC. Various statistical procedures, such as regression analysis and residual analysis, are used to compare the predicted and measured properties. In terms of general applicability, the evaluated models are suitable for estimating the modulus of elasticity, tensile strength, and modulus of rupture of SCC. These models have a rather low sensitivity, however, and adjust well only to mean values. This is because the models use the compressive strength as the main variable to characterize the concrete and do not consider other variables that affect these properties.
