134 results for Linear models (Statistics)


Relevance:

30.00%

Publisher:

Abstract:

This thesis aims to investigate the extremal properties of certain risk models of interest in various applications from insurance, finance and statistics. This thesis develops along two principal lines, namely: In the first part, we focus on two univariate risk models, i.e., deflated risk and reinsurance risk models. Therein we investigate their tail expansions under certain tail conditions of the common risks. Our main results are illustrated by some typical examples and numerical simulations as well. Finally, the findings are formulated into some applications in insurance fields, for instance, the approximations of Value-at-Risk, conditional tail expectations etc. The second part of this thesis is devoted to the following three bivariate models: The first model is concerned with bivariate censoring of extreme events. For this model, we first propose a class of estimators for both the tail dependence coefficient and the tail probability. These estimators are flexible due to a tuning parameter and their asymptotic distributions are obtained under some second order bivariate slowly varying conditions of the model. Then, we give some examples and present a small Monte Carlo simulation study followed by an application on a real data set from insurance. The objective of our second bivariate risk model is the investigation of the tail dependence coefficient of bivariate skew slash distributions. Such skew slash distributions are extensively useful in statistical applications and they are generated mainly by normal mean-variance mixture and scaled skew-normal mixture, which distinguish the tail dependence structure as shown by our principal results.
The third bivariate risk model is concerned with the approximation of the component-wise maxima of skew elliptical triangular arrays. The theoretical results are based on certain tail assumptions on the underlying random radius.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Alcohol consumption may affect the course of HIV infection and/or antiretroviral therapy (ART). The authors investigated the association between self-reported alcohol consumption and HIV surrogate markers in both treated and untreated individuals. DESIGN: Prospective cohort study. METHODS: Over a 7-year period, the authors analyzed 2 groups of individuals in the Swiss HIV Cohort Study: (1) ART-naïve individuals remaining off ART and (2) individuals initiating first ART. For individuals initiating first ART, time-dependent Cox proportional hazards models were used to assess the association between alcohol consumption, virological failure, and ART interruption. For both groups, trajectories of log-transformed CD4 cell counts were analyzed using linear mixed models with repeated measures. RESULTS: The authors included 2982 individuals initiating first ART and 2085 ART naives. In individuals initiating first ART, 241 (8%) experienced virological failure. Alcohol consumption was not associated with virological failure. ART interruption was noted in 449 (15%) individuals and was more prevalent in severe compared with none/light health risk drinkers [hazard ratio: 2.24, 95% confidence interval: 1.42 to 3.52]. The association remained significant even after adjusting for nonadherence. The authors did not find an association between alcohol consumption and change in CD4 cell count over time in either group. CONCLUSIONS: No effect of alcohol consumption on either virological failure or CD4 cell count in both groups of ART-initiating and ART-naive individuals was found. However, severe drinkers were more likely to interrupt ART. Efforts on ART continuation should be especially implemented in individuals reporting high alcohol consumption.
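The hazard ratio reported above can be unpacked with a short calculation. The sketch below is only an illustration: it assumes the reported interval is a Wald-type 95% interval that is symmetric on the log scale, which the abstract does not state, and the function name `wald_from_ratio_ci` is hypothetical.

```python
import math

def wald_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Recover the log-scale standard error, Wald z and two-sided p-value
    from a ratio estimate and its confidence interval.

    Assumption (not from the abstract): the interval was built as
    exp(log(ratio) +/- z_crit * se).
    """
    log_se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / log_se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_se, z, p

# Values quoted in the abstract for severe versus none/light health risk drinkers.
log_se, z, p = wald_from_ratio_ci(2.24, 1.42, 3.52)
print(f"log-scale SE = {log_se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Working on the log scale is the usual choice because Cox model coefficients are log hazard ratios, so the interval is symmetric around log(2.24) rather than around 2.24 itself.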

Relevance:

30.00%

Publisher:

Abstract:

Knowledge of the genetic structure of plant populations is necessary for the understanding of the dynamics of major ecological processes. It also has applications in conservation biology and risk assessment for genetically modified crops. This paper reports the genetic structure of a linear population of sea beet, Beta vulgaris ssp. maritima (the wild relative of sugar beet), on Furzey Island, Poole Harbour. The relative spatial positions of the plants were accurately mapped and the plants were scored for variation at isozyme and RFLP loci. Structure was analysed by repeated subdivision of the population to find the average size of a randomly mating group. Estimates of F-ST between randomly mating units were then made, and gave patterns consistent with the structure of the population being determined largely by founder effects. The implications of these results for the monitoring of transgene spread in wild sea beet populations are discussed.
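For readers unfamiliar with F-ST, the following minimal sketch computes Wright's fixation index for a single biallelic locus as (H_T - H_S) / H_T. It is not the estimator used in the paper, which the abstract does not specify; the function name and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's F_ST for one biallelic locus.

    freqs: allele frequency of one allele in each subpopulation.
    sizes: optional subpopulation sizes used as weights (equal if omitted).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]
    # Mean allele frequency over the whole population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within subpopulations.
    h_sub = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if h_total == 0:
        return 0.0  # locus is fixed everywhere, no differentiation to measure
    return (h_total - h_sub) / h_total

# Hypothetical frequencies in two randomly mating units.
print(fst_biallelic([0.2, 0.8]))
```

Values near 0 indicate that the units share similar allele frequencies, while values near 1 indicate that they are fixed for different alleles, the kind of pattern founder effects can produce.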

Relevance:

30.00%

Publisher:

Abstract:

An active strain formulation for orthotropic constitutive laws arising in cardiac mechanics modeling is introduced and studied. The passive mechanical properties of the tissue are described by the Holzapfel-Ogden relation. In the active strain formulation, the Euler-Lagrange equations for minimizing the total energy are written in terms of active and passive deformation factors, where the active part is assumed to depend, at the cell level, on the electrodynamics and on the specific orientation of the cardiac cells. The well-posedness of the linear system derived from a generic Newton iteration of the original problem is analyzed and different mechanical activation functions are considered. In addition, the active strain formulation is compared with the classical active stress formulation from both numerical and modeling perspectives. Taylor-Hood and MINI finite elements are employed to discretize the mechanical problem. The results of several numerical experiments show that the proposed formulation is mathematically consistent and is able to represent the key features of the phenomenon, while allowing savings in computational costs.

Relevance:

30.00%

Publisher:

Abstract:

"Sitting between your past and your future doesn't mean you are in the present." (Dakota Skye) Complex systems science is an interdisciplinary field that groups under the same umbrella dynamical phenomena from the social, natural and mathematical sciences. The emergence of a higher order organization or behavior, transcending that expected of the linear addition of the parts, is a key factor shared by all these systems. Most complex systems can be modeled as networks that represent the interactions amongst the system's components. In addition to the actual nature of the parts' interactions, the intrinsic topological structure of the underlying network is believed to play a crucial role in the remarkable emergent behaviors exhibited by the systems. Moreover, the topology is also a key factor in explaining the extraordinary flexibility and resilience to perturbations when applied to transmission and diffusion phenomena. In this work, we study the effect of different network structures on the performance and on the fault tolerance of systems in two different contexts. In the first part, we study cellular automata, which are a simple paradigm for distributed computation. Cellular automata are made of basic Boolean computational units, the cells, which rely on simple rules and information from the surrounding cells to perform a global task. The limited visibility of the cells can be modeled as a network, where interactions amongst cells are governed by an underlying structure, usually a regular one. In order to increase the performance of cellular automata, we chose to change their topology. We applied computational principles inspired by Darwinian evolution, called evolutionary algorithms, to alter the system's topological structure starting from either a regular or a random one.
The outcome is remarkable, as the resulting topologies share properties of both regular and random networks, and display similarities to the Watts-Strogatz small-world networks found in social systems. Moreover, the performance and tolerance to probabilistic faults of our small-world-like cellular automata surpass those of regular ones. In the second part, we use the context of biological genetic regulatory networks and, in particular, Kauffman's random Boolean networks model. In some ways, this model is close to cellular automata, although it is not expected to perform any task. Instead, it simulates the time-evolution of genetic regulation within living organisms under strict conditions. The original model, though very attractive for its simplicity, suffered from important shortcomings unveiled by the recent advances in genetics and biology. We propose to use these new discoveries to improve the original model. Firstly, we have used artificial topologies believed to be closer to that of gene regulatory networks. We have also studied actual biological organisms, and used parts of their genetic regulatory networks in our models. Secondly, we have addressed the improbable full synchronicity of the events taking place on Boolean networks and proposed a more biologically plausible cascading scheme. Finally, we tackled the actual Boolean functions of the model, i.e. the specifics of how genes activate according to the activity of upstream genes, and presented a new update function that takes into account the actual promoting and repressing effects of one gene on another. Our improved models demonstrate the expected, biologically sound behavior of previous GRN models, yet with superior resistance to perturbations. We believe they are one step closer to the biological reality.
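To make the baseline concrete, here is a minimal sketch of a classical random Boolean network with fully synchronous updates, i.e. the scheme the thesis sets out to improve, not its proposed cascading scheme or new update function. All names (`make_rbn`, `step`, `find_attractor`) and the network size are hypothetical choices for illustration.

```python
import random

def make_rbn(n_nodes, k_inputs, seed=0):
    """Build a random Boolean network: each node gets k random input nodes
    and a random truth table over the 2**k input combinations."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n_nodes), k_inputs) for _ in range(n_nodes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k_inputs)]
              for _ in range(n_nodes)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: every node reads the previous state of its inputs."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for src in node_inputs:
            index = (index << 1) | state[src]  # pack input bits into a table index
        new_state.append(table[index])
    return tuple(new_state)

def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, cycle length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]

inputs, tables = make_rbn(n_nodes=8, k_inputs=2, seed=1)
transient, cycle = find_attractor(tuple([0] * 8), inputs, tables)
print("transient length:", transient, "cycle length:", cycle)
```

Because the state space is finite and the update is deterministic, every trajectory must eventually enter a cycle; the attractor structure is what perturbation studies of such models typically examine.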

Relevance:

30.00%

Publisher:

Abstract:

The paper proposes an approach aimed at detecting optimal model parameter combinations to achieve the most representative description of uncertainty in the model performance. A classification problem is posed to find the regions of good fitting models according to the values of a cost function. Support Vector Machine (SVM) classification in the parameter space is applied to decide if a forward model simulation is to be computed for a particular generated model. SVM is particularly designed to tackle classification problems in high-dimensional space in a non-parametric and non-linear way. SVM decision boundaries determine the regions that are subject to the largest uncertainty in the cost function classification, and, therefore, provide guidelines for further iterative exploration of the model space. The proposed approach is illustrated by a synthetic example of fluid flow through porous media, which features highly variable response due to the parameter values' combination.

Relevance:

30.00%

Publisher:

Abstract:

Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available.
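The recovery step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' software: it assumes each study reports an effect estimate and its standard error per variant, approximates the score statistic as U_j = beta_j / se_j^2 with variance 1 / se_j^2, and uses an externally supplied correlation matrix. The function names and the numbers are hypothetical.

```python
import math

def recover_score_stats(betas, ses, corr):
    """Approximate the score vector U and its covariance matrix V
    from single-variant estimates and a correlation matrix.

    Assumption: U_j ~ beta_j / se_j**2 and V_jk ~ corr[j][k] / (se_j * se_k).
    """
    m = len(betas)
    u = [b / (s * s) for b, s in zip(betas, ses)]
    v = [[corr[j][k] / (ses[j] * ses[k]) for k in range(m)] for j in range(m)]
    return u, v

def burden_test(u, v, weights):
    """Weighted burden statistic (w'U)^2 / (w'Vw), referred to chi-square with 1 df."""
    m = len(u)
    numerator = sum(w * x for w, x in zip(weights, u)) ** 2
    denominator = sum(weights[j] * v[j][k] * weights[k]
                      for j in range(m) for k in range(m))
    stat = numerator / denominator
    # Upper tail of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical summary statistics for three rare variants in one gene.
betas = [0.40, 0.55, 0.30]
ses = [0.25, 0.30, 0.28]
corr = [[1.0, 0.1, 0.0],
        [0.1, 1.0, 0.2],
        [0.0, 0.2, 1.0]]
u, v = recover_score_stats(betas, ses, corr)
print(burden_test(u, v, weights=[1.0, 1.0, 1.0]))
```

With a single variant the burden statistic reduces to the squared single-variant Wald statistic, which is a quick sanity check on the reconstruction.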

Relevance:

30.00%

Publisher:

Abstract:

Due to practical difficulties in obtaining direct genetic estimates of effective sizes, conservation biologists have to rely on so-called 'demographic models' which combine life-history and mating-system parameters with F-statistics in order to produce indirect estimates of effective sizes. However, for the same practical reasons that prevent direct genetic estimates, the accuracy of demographic models is difficult to evaluate. Here we use individual-based, genetically explicit computer simulations in order to investigate the accuracy of two such demographic models aimed at investigating the hierarchical structure of populations. We show that, by and large, these models provide good estimates under a wide range of mating systems and dispersal patterns. However, one of the models should be avoided whenever the focal species' breeding system approaches monogamy with no sex bias in dispersal or when a substructure within social groups is suspected because effective sizes may then be strongly overestimated. The timing during the life cycle at which F-statistics are evaluated is also of crucial importance and attention should be paid to it when designing field sampling since different demographic models assume different timings. Our study shows that individual-based, genetically explicit models provide a promising way of evaluating the accuracy of demographic models of effective size and delineate their field of applicability.

Relevance:

30.00%

Publisher:

Abstract:

In recent years there has been an explosive growth in the development of adaptive and data-driven methods. One of the efficient data-driven approaches is based on statistical learning theory (SLT) (Vapnik 1998). The theory is based on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce the training error (that is, to fit the available data with a model) but also to reduce the complexity of the model and the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are extremely good when the input space is high dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy etc. A presentation of SVM for spatially distributed data is given in Kanevski and Maignan (2004).
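As a hedged illustration of how SRM trades training error against model complexity, the standard soft-margin SVM training problem can be written as below. The abstract does not spell this formula out; the symbols $w$, $b$, $\xi_i$ and $C$ are introduced here for illustration, with training pairs $(x_i, y_i)$ and labels $y_i \in \{-1, +1\}$.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n}\xi_i
\qquad\text{subject to}\qquad
y_i\,\bigl(w^{\top}x_i + b\bigr)\;\ge\;1-\xi_i,\quad \xi_i\ge 0,\quad i=1,\dots,n.
```

The first term penalises model complexity (a small $\lVert w\rVert$ means a wide margin), the second penalises training errors through the slack variables $\xi_i$, and $C$ sets the balance between the two. Only the points with active constraints, the support vectors, determine the solution.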

Relevance:

30.00%

Publisher:

Abstract:

In groundwater applications, Monte Carlo methods are employed to model the uncertainty on geological parameters. However, their brute-force application becomes computationally prohibitive for highly detailed geological descriptions, complex physical processes, and a large number of realizations. The Distance Kernel Method (DKM) overcomes this issue by clustering the realizations in a multidimensional space based on the flow responses obtained by means of an approximate (computationally cheaper) model; then, the uncertainty is estimated from the exact responses that are computed only for one representative realization per cluster (the medoid). Usually, DKM is employed to decrease the size of the sample of realizations that are considered to estimate the uncertainty. We propose to use the information from the approximate responses for uncertainty quantification. The subset of exact solutions provided by DKM is then employed to construct an error model and correct the potential bias of the approximate model. Two error models are devised that both employ the difference between approximate and exact medoid solutions, but differ in the way medoid errors are interpolated to correct the whole set of realizations. The Local Error Model rests upon the clustering defined by DKM and can be seen as a natural way to account for intra-cluster variability; the Global Error Model employs a linear interpolation of all medoid errors regardless of the cluster to which the single realization belongs. These error models are evaluated for an idealized pollution problem in which the uncertainty of the breakthrough curve needs to be estimated. For this numerical test case, we demonstrate that the error models improve the uncertainty quantification provided by the DKM algorithm and are effective in correcting the bias of the estimate computed solely from the MsFV results. 
The framework presented here is not specific to the methods considered and can be applied to other combinations of approximate models and techniques to select a subset of realizations.
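The two error models can be illustrated with a deliberately simplified sketch. It assumes each realization has a single scalar response, and it interpolates medoid errors along the approximate response value; both are simplifications made here, since the abstract does not give the interpolation space or the response type. All names and numbers are hypothetical.

```python
def local_correction(approx, labels, medoid_of, exact_medoid):
    """Local error model: shift each approximate response by the error
    observed at the medoid of its own cluster."""
    corrected = []
    for value, label in zip(approx, labels):
        m = medoid_of[label]
        corrected.append(value + (exact_medoid[m] - approx[m]))
    return corrected

def global_correction(approx, exact_medoid):
    """Global error model: piecewise-linear interpolation of all medoid
    errors over the approximate response, ignoring cluster membership."""
    points = sorted((approx[m], exact_medoid[m] - approx[m]) for m in exact_medoid)

    def error_at(x):
        if x <= points[0][0]:
            return points[0][1]   # constant extrapolation below the first medoid
        if x >= points[-1][0]:
            return points[-1][1]  # constant extrapolation above the last medoid
        for (x0, e0), (x1, e1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return e0 + (e1 - e0) * (x - x0) / (x1 - x0)

    return [x + error_at(x) for x in approx]

# Hypothetical data: four realizations in two clusters.
approx = [1.0, 2.0, 3.0, 4.0]      # responses of the cheap approximate model
labels = [0, 0, 1, 1]              # cluster of each realization
medoid_of = {0: 0, 1: 3}           # cluster label -> index of its medoid
exact_medoid = {0: 1.5, 3: 5.0}    # exact responses, computed for medoids only

print(local_correction(approx, labels, medoid_of, exact_medoid))
print(global_correction(approx, exact_medoid))
```

The local variant applies one constant shift per cluster, whereas the global variant lets the correction vary smoothly between medoids, so realizations near a cluster boundary are treated more gradually.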

Relevance:

30.00%

Publisher:

Abstract:

1. Identifying those areas suitable for recolonization by threatened species is essential to support efficient conservation policies. Habitat suitability models (HSM) predict species' potential distributions, but the quality of their predictions should be carefully assessed when the species-environment equilibrium assumption is violated.
2. We studied the Eurasian otter Lutra lutra, whose numbers are recovering in southern Italy. To produce widely applicable results, we chose standard HSM procedures and looked for the models' capacities in predicting the suitability of a recolonization area. We used two fieldwork datasets: presence-only data, used in the Ecological Niche Factor Analyses (ENFA), and presence-absence data, used in a Generalized Linear Model (GLM). In addition to cross-validation, we independently evaluated the models with data from a recolonization event, providing presences on a previously unoccupied river.
3. Three of the models successfully predicted the suitability of the recolonization area, but the GLM built with data before the recolonization disagreed with these predictions, missing the recolonized river's suitability and badly describing the otter's niche. Our results highlighted three points of relevance to modelling practices: (1) absences may prevent the models from correctly identifying areas suitable for a species' spread; (2) the selection of variables may lead to randomness in the predictions; and (3) the Area Under the Curve (AUC), a commonly used validation index, was not well suited to the evaluation of model quality, whereas the Boyce Index (CBI), based on presence data only, better highlighted the models' fit to the recolonization observations.
4. For species with unstable spatial distributions, presence-only models may work better than presence-absence methods in making reliable predictions of suitable areas for expansion. An iterative modelling process, using new occurrences from each step of the species' spread, may also help in progressively reducing errors.
5. Synthesis and applications. Conservation plans depend on reliable models of the species' suitable habitats. In non-equilibrium situations, such as the case for threatened or invasive species, models could be affected negatively by the inclusion of absence data when predicting the areas of potential expansion. Presence-only methods will here provide a better basis for productive conservation management practices.
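To show why the AUC depends on absence data, here is a minimal sketch of its rank-based definition: the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen absence site. The function name and the scores are hypothetical, and this is not the evaluation code used in the study.

```python
def auc(presence_scores, absence_scores):
    """AUC as the fraction of presence/absence pairs ranked correctly,
    counting ties as half a correct ranking."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

# Hypothetical suitability scores predicted by a model.
print(auc([0.9, 0.3], [0.5, 0.1]))
```

Every pair in the sum involves an absence, so if absences are recorded in suitable but not yet recolonized areas, a model that correctly scores those areas highly is penalised. Presence-only indices avoid that dependence.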

Relevance:

30.00%

Publisher:

Abstract:

Swiss death certification data over the period 1951-1984 for total cancer mortality and 30 major cancer sites in the population aged 25 to 74 years were analysed using a log-linear Poisson model with arbitrary constraints on the parameters to isolate the effects of birth cohort, calendar period of death and age. The overall pattern of total cancer mortality in males was stable for period values and showed some moderate decreases in cohort values restricted to the generations born after 1930. Cancer mortality trends were more favourable in females, with steady, though moderate, declines in both cohort and period values. According to the estimates from the model, the worst affected generation for male lung cancer was that born around 1910, and a flattening of trends or some moderate decline was observed for more recent cohorts, although this decline was considerably more limited than in other European countries. There were decreases in cohort and period values for cancers of the stomach, intestine and oesophagus in both sexes and of the (cervix) uteri in females. Increases were observed in both cohort and period trends for pancreas and liver in males and for several other neoplasms, including prostate, brain, leukaemias and lymphomas; for the latter sites, however, the increases were restricted to the earlier cohorts and hence are partly attributable to improved diagnosis and certification in the elderly. Although age values for lung cancer in females were around 10 times lower than in males, upward trends in female lung cancer cohort values were observed in subsequent cohorts and for period values from the late 1960s onwards. Therefore, future trends in female lung cancer mortality should continue to be monitored. The application of these age/period/cohort models thus provides a summary guide for the reading and interpretation of cancer mortality trends, although it cannot replace careful inspection of single age-specific rates.
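A generic form of the log-linear Poisson age/period/cohort model is sketched below to show why constraints on the parameters are needed. The notation is an assumption made here and is not taken from the paper: $d_{ap}$ and $n_{ap}$ denote deaths and person-years in age group $a$ and period $p$, and the cohort index is $c = p - a$.

```latex
d_{ap} \sim \mathrm{Poisson}(\mu_{ap}), \qquad
\log \mu_{ap} = \log n_{ap} + \mu + \alpha_a + \beta_p + \gamma_c, \qquad c = p - a .
```

Because $c = p - a$, for any constant $\delta$ the reparameterisation

```latex
\alpha_a \mapsto \alpha_a + \delta a, \qquad
\beta_p \mapsto \beta_p - \delta p, \qquad
\gamma_c \mapsto \gamma_c + \delta c
```

leaves every fitted value unchanged, since $\delta a - \delta p + \delta (p - a) = 0$. The linear trends of the three effects are therefore not separately identifiable, which is why some additional constraint must be imposed before age, period and cohort values can be reported.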

Relevance:

30.00%

Publisher:

Abstract:

Developmental constraints have been postulated to limit the space of feasible phenotypes and thus shape animal evolution. These constraints have been suggested to be the strongest during either early or mid-embryogenesis, which corresponds to the early conservation model or the hourglass model, respectively. Conflicting results have been reported, but in recent studies of animal transcriptomes the hourglass model has been favored. Studies usually report descriptive statistics calculated for all genes over all developmental time points. This introduces dependencies between the sets of compared genes and may lead to biased results. Here we overcome this problem using an alternative modular analysis. We used the Iterative Signature Algorithm to identify distinct modules of genes co-expressed specifically in consecutive stages of zebrafish development. We then performed a detailed comparison of several gene properties between modules, allowing for a less biased and more powerful analysis. Notably, our analysis corroborated the hourglass pattern at the regulatory level, with sequences of regulatory regions being most conserved for genes expressed in mid-development, but not at the level of gene sequence, age, or expression, in contrast to some previous studies. The early conservation model was supported by gene duplication and birth, which were rarest for genes expressed in early development. Finally, for all gene properties, we observed the least conservation for genes expressed in late development or adult, consistent with both models. Overall, with the modular approach, we showed that different levels of molecular evolution follow different patterns of developmental constraints. Thus both models are valid, but with respect to different genomic features.