947 resultados para genetics, statistical genetics, variable models


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An attractive way to improve our understanding of sex determination evolution is to study the underlying mechanisms in closely related species and in a phylogenetic perspective. Hymenopterans are well suited owing to the diverse sex determination mechanisms, including different types of Complementary Sex Determination (CSD) and maternal control sex determination. We investigated different types of CSD in four species within the braconid wasp genus Asobara that exhibit diverse life-history traits. Nine to thirteen generations of inbreeding were monitored for diploid male production, brood size, offspring sex ratio, and pupal mortality as indicators for CSD. In addition, simulation models were developed to compare these observations to predicted patterns for multilocus CSD with up to ten loci. The inbreeding regime did not result in diploid male production, decreased brood sizes, substantially increased offspring sex ratios nor in increased pupal mortality. The simulations further allowed us to reject CSD with up to ten loci, which is a strong refutation of the multilocus CSD model. We discuss how the absence of CSD can be reconciled with the variation in life-history traits among Asobara species, and the ramifications for the phylogenetic distribution of sex determination mechanisms in the Hymenoptera.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, Species Distribution Models (SDMs) are a widely used tool. Using different statistical approaches these models reconstruct the realized niche of a species using presence data and a set of variables, often topoclimatic. There utilization range is quite large from understanding single species requirements, to the creation of nature reserve based on species hotspots, or modeling of climate change impact, etc... Most of the time these models are using variables at a resolution of 50km x 50km or 1 km x 1 km. However in some cases these models are used with resolutions below the kilometer scale and thus called high resolution models (100 m x 100 m or 25 m x 25 m). Quite recently a new kind of data has emerged enabling precision up to lm x lm and thus allowing very high resolution modeling. However these new variables are very costly and need an important amount of time to be processed. This is especially the case when these variables are used in complex calculation like models projections over large areas. Moreover the importance of very high resolution data in SDMs has not been assessed yet and is not well understood. Some basic knowledge on what drive species presence-absences is still missing. Indeed, it is not clear whether in mountain areas like the Alps coarse topoclimatic gradients are driving species distributions or if fine scale temperature or topography are more important or if their importance can be neglected when balance to competition or stochasticity. In this thesis I investigated the importance of very high resolution data (2-5m) in species distribution models using either very high resolution topographic, climatic or edaphic variables over a 2000m elevation gradient in the Western Swiss Alps. I also investigated more local responses of these variables for a subset of species living in this area at two precise elvation belts. During this thesis I showed that high resolution data necessitates very good datasets (species and variables for the models) to produce satisfactory results. Indeed, in mountain areas, temperature is the most important factor driving species distribution and needs to be modeled at very fine resolution instead of being interpolated over large surface to produce satisfactory results. Despite the instinctive idea that topographic should be very important at high resolution, results are mitigated. However looking at the importance of variables over a large gradient buffers the importance of the variables. Indeed topographic factors have been shown to be highly important at the subalpine level but their importance decrease at lower elevations. Wether at the mountane level edaphic and land use factors are more important high resolution topographic data is more imporatant at the subalpine level. Finally the biggest improvement in the models happens when edaphic variables are added. Indeed, adding soil variables is of high importance and variables like pH are overpassing the usual topographic variables in SDMs in term of importance in the models. To conclude high resolution is very important in modeling but necessitate very good datasets. Only increasing the resolution of the usual topoclimatic predictors is not sufficient and the use of edaphic predictors has been highlighted as fundamental to produce significantly better models. This is of primary importance, especially if these models are used to reconstruct communities or as basis for biodiversity assessments. -- Ces dernières années, l'utilisation des modèles de distribution d'espèces (SDMs) a continuellement augmenté. Ces modèles utilisent différents outils statistiques afin de reconstruire la niche réalisée d'une espèce à l'aide de variables, notamment climatiques ou topographiques, et de données de présence récoltées sur le terrain. Leur utilisation couvre de nombreux domaines allant de l'étude de l'écologie d'une espèce à la reconstruction de communautés ou à l'impact du réchauffement climatique. La plupart du temps, ces modèles utilisent des occur-rences issues des bases de données mondiales à une résolution plutôt large (1 km ou même 50 km). Certaines bases de données permettent cependant de travailler à haute résolution, par conséquent de descendre en dessous de l'échelle du kilomètre et de travailler avec des résolutions de 100 m x 100 m ou de 25 m x 25 m. Récemment, une nouvelle génération de données à très haute résolution est apparue et permet de travailler à l'échelle du mètre. Les variables qui peuvent être générées sur la base de ces nouvelles données sont cependant très coûteuses et nécessitent un temps conséquent quant à leur traitement. En effet, tout calcul statistique complexe, comme des projections de distribution d'espèces sur de larges surfaces, demande des calculateurs puissants et beaucoup de temps. De plus, les facteurs régissant la distribution des espèces à fine échelle sont encore mal connus et l'importance de variables à haute résolution comme la microtopographie ou la température dans les modèles n'est pas certaine. D'autres facteurs comme la compétition ou la stochasticité naturelle pourraient avoir une influence toute aussi forte. C'est dans ce contexte que se situe mon travail de thèse. J'ai cherché à comprendre l'importance de la haute résolution dans les modèles de distribution d'espèces, que ce soit pour la température, la microtopographie ou les variables édaphiques le long d'un important gradient d'altitude dans les Préalpes vaudoises. J'ai également cherché à comprendre l'impact local de certaines variables potentiellement négligées en raison d'effets confondants le long du gradient altitudinal. Durant cette thèse, j'ai pu monter que les variables à haute résolution, qu'elles soient liées à la température ou à la microtopographie, ne permettent qu'une amélioration substantielle des modèles. Afin de distinguer une amélioration conséquente, il est nécessaire de travailler avec des jeux de données plus importants, tant au niveau des espèces que des variables utilisées. Par exemple, les couches climatiques habituellement interpolées doivent être remplacées par des couches de température modélisées à haute résolution sur la base de données de terrain. Le fait de travailler le long d'un gradient de température de 2000m rend naturellement la température très importante au niveau des modèles. L'importance de la microtopographie est négligeable par rapport à la topographie à une résolution de 25m. Cependant, lorsque l'on regarde à une échelle plus locale, la haute résolution est une variable extrêmement importante dans le milieu subalpin. À l'étage montagnard par contre, les variables liées aux sols et à l'utilisation du sol sont très importantes. Finalement, les modèles de distribution d'espèces ont été particulièrement améliorés par l'addition de variables édaphiques, principalement le pH, dont l'importance supplante ou égale les variables topographique lors de leur ajout aux modèles de distribution d'espèces habituels.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although exceptions may be readily identified, two generalizations concerning genetic differences among species may be drawn from the available allozyme and chromosome data. First, structural gene differences among species vary widely. In many cases, species pairs do not differ more than intraspecific populations. This suggests that either very few or no gene substitutions are required to produce barriers to reproduction (Avise 1976). Second, chromosome form and/or number differs among even closely related species (White 1963; 1978; Fredga 1977; Wright 1970). Many of the observed chromosomal differences involve translocational rearrangements; these produce severe fitness depression in heterozygotes and were, thus, long considered unlikely candidates for the fixation required of genetic changes leading to speciation (Wright 1977). Nonetheless, the fact that species differences are frequently translocational argues convincingly for their fixation despite prejudices to the contrary. Haldane's rule states that in the F of interspecific crosses, the heterogametic sex is absent or sterile in the preponderance of cases (Haldane 1932). This rule definitely applies in the genus Dr°sophila (Ehrman 1962). Sex chromosome translocations do not impose a fitness depression as severe as that imposed by autosomal translocations, and X-Y translocations may account for Haldane's rule (Haldane 1932). Consequently a study of the fit ness parameters of an X·yL and a yS chromosome in Drosophila melanogaster populations was initiated by Tracey (1972). Preliminary results suggested that x.yL//YSmales enjoyed a mating advantage with X·yL//X·yL females, that this advantage was frequency dependent, that the translocation produced sexual isolation and that interactions between the yL, yS and a yellow marker contributed to the observed isolation (Tracey and Espinet 1976; Espinet and Tracey 1976). Encouraged by the results of these prelimimary studies, further experiments were performed to clarify the genetic nature of the observed sexual isolation, S the reality of the y frequency dependent fitness .and the behavioural changes, if any, produced by the translocation. The results of this work are reported herein. Although the marker genes used in earlier studies, sparkling poliert an d yellow have both been found to affect activity,but only yellow effects asymmetric sexual isolation. In addition yellow effects isolation through an interaction with the T(X-y) chromosomes, yS also effects isolation, and translocational strains are isolated from those of normal karyotype in the absence of marker gene differences. When yS chromosomes are in competition with y chromosomes on an X.yL background, yS males are at a distinct advantage only when their frequency is less than 97%. The sex chromosome translocation alters the normal courtship pattern by the incorporation of circling between vibration and licking in the male repertoire. Finally a model of speciation base on the fixation of this sex chromosome translocation in a geographically isolated gene pool is proposed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper analyzes the measure of systemic importance ∆CoV aR proposed by Adrian and Brunnermeier (2009, 2010) within the context of a similar class of risk measures used in the risk management literature. In addition, we develop a series of testing procedures, based on ∆CoV aR, to identify and rank the systemically important institutions. We stress the importance of statistical testing in interpreting the measure of systemic importance. An empirical application illustrates the testing procedures, using equity data for three European banks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A significant challenge in the prediction of climate change impacts on ecosystems and biodiversity is quantifying the sources of uncertainty that emerge within and between different models. Statistical species niche models have grown in popularity, yet no single best technique has been identified reflecting differing performance in different situations. Our aim was to quantify uncertainties associated with the application of 2 complimentary modelling techniques. Generalised linear mixed models (GLMM) and generalised additive mixed models (GAMM) were used to model the realised niche of ombrotrophic Sphagnum species in British peatlands. These models were then used to predict changes in Sphagnum cover between 2020 and 2050 based on projections of climate change and atmospheric deposition of nitrogen and sulphur. Over 90% of the variation in the GLMM predictions was due to niche model parameter uncertainty, dropping to 14% for the GAMM. After having covaried out other factors, average variation in predicted values of Sphagnum cover across UK peatlands was the next largest source of variation (8% for the GLMM and 86% for the GAMM). The better performance of the GAMM needs to be weighed against its tendency to overfit the training data. While our niche models are only a first approximation, we used them to undertake a preliminary evaluation of the relative importance of climate change and nitrogen and sulphur deposition and the geographic locations of the largest expected changes in Sphagnum cover. Predicted changes in cover were all small (generally <1% in an average 4 m2 unit area) but also highly uncertain. Peatlands expected to be most affected by climate change in combination with atmospheric pollution were Dartmoor, Brecon Beacons and the western Lake District.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Accurate decadal climate predictions could be used to inform adaptation actions to a changing climate. The skill of such predictions from initialised dynamical global climate models (GCMs) may be assessed by comparing with predictions from statistical models which are based solely on historical observations. This paper presents two benchmark statistical models for predicting both the radiatively forced trend and internal variability of annual mean sea surface temperatures (SSTs) on a decadal timescale based on the gridded observation data set HadISST. For both statistical models, the trend related to radiative forcing is modelled using a linear regression of SST time series at each grid box on the time series of equivalent global mean atmospheric CO2 concentration. The residual internal variability is then modelled by (1) a first-order autoregressive model (AR1) and (2) a constructed analogue model (CA). From the verification of 46 retrospective forecasts with start years from 1960 to 2005, the correlation coefficient for anomaly forecasts using trend with AR1 is greater than 0.7 over parts of extra-tropical North Atlantic, the Indian Ocean and western Pacific. This is primarily related to the prediction of the forced trend. More importantly, both CA and AR1 give skillful predictions of the internal variability of SSTs in the subpolar gyre region over the far North Atlantic for lead time of 2 to 5 years, with correlation coefficients greater than 0.5. For the subpolar gyre and parts of the South Atlantic, CA is superior to AR1 for lead time of 6 to 9 years. These statistical forecasts are also compared with ensemble mean retrospective forecasts by DePreSys, an initialised GCM. DePreSys is found to outperform the statistical models over large parts of North Atlantic for lead times of 2 to 5 years and 6 to 9 years, however trend with AR1 is generally superior to DePreSys in the North Atlantic Current region, while trend with CA is superior to DePreSys in parts of South Atlantic for lead time of 6 to 9 years. These findings encourage further development of benchmark statistical decadal prediction models, and methods to combine different predictions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Logistic models are studied as a tool to convert dynamical forecast information (deterministic and ensemble) into probability forecasts. A logistic model is obtained by setting the logarithmic odds ratio equal to a linear combination of the inputs. As with any statistical model, logistic models will suffer from overfitting if the number of inputs is comparable to the number of forecast instances. Computational approaches to avoid overfitting by regularization are discussed, and efficient techniques for model assessment and selection are presented. A logit version of the lasso (originally a linear regression technique), is discussed. In lasso models, less important inputs are identified and the corresponding coefficient is set to zero, providing an efficient and automatic model reduction procedure. For the same reason, lasso models are particularly appealing for diagnostic purposes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The usual practice in using a control chart to monitor a process is to take samples of size n from the process every h hours. This article considers the properties of the X̄ chart when the size of each sample depends on what is observed in the preceding sample. The idea is that the sample should be large if the sample point of the preceding sample is close to but not actually outside the control limits and small if the sample point is close to the target. The properties of the variable sample size (VSS) X̄ chart are obtained using Markov chains. The VSS X̄ chart is substantially quicker than the traditional X̄ chart in detecting moderate shifts in the process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent theoretical studies have shown that the X̄ chart with variable sampling intervals (VSI) and the X̄ chart with variable sample size (VSS) are quicker than the traditional X̄ chart in detecting shifts in the process. This article considers the X̄ chart with variable sample size and sampling intervals (VSSI). It is assumed that the amount of time the process remains in control has exponential distribution. The properties of the VSSI X̄ chart are obtained using Markov chains. The VSSI X̄ chart is even quicker than the VSI or VSS X̄ charts in detecting moderate shifts in the process.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent studies indicate that polymorphic genetic markers are potentially helpful in resolving genealogical relationships among individuals in a natural population. Genetic data provide opportunities for paternity exclusion when genotypic incompatibilities are observed among individuals, and the present investigation examines the resolving power of genetic markers in unambiguous positive determination of paternity. Under the assumption that the mother for each offspring in a population is unambiguously known, an analytical expression for the fraction of males excluded from paternity is derived for the case where males and females may be derived from two different gene pools. This theoretical formulation can also be used to predict the fraction of births for each of which all but one male can be excluded from paternity. We show that even when the average probability of exclusion approaches unity, a substantial fraction of births yield equivocal mother-father-offspring determinations. The number of loci needed to increase the frequency of unambiguous determinations to a high level is beyond the scope of current electrophoretic studies in most species. Applications of this theory to electrophoretic data on Chamaelirium luteum (L.) shows that in 2255 offspring derived from 273 males and 70 females, only 57 triplets could be unequivocally determined with eight polymorphic protein loci, even though the average combined exclusionary power of these loci was 73%. The distribution of potentially compatible male parents, based on multilocus genotypes, was reasonably well predicted from the allele frequency data available for these loci. We demonstrate that genetic paternity analysis in natural populations cannot be reliably based on exclusionary principles alone. In order to measure the reproductive contributions of individuals in natural populations, more elaborate likelihood principles must be deployed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, disaster preparedness through assessment of medical and special needs persons (MSNP) has taken a center place in public eye in effect of frequent natural disasters such as hurricanes, storm surge or tsunami due to climate change and increased human activity on our planet. Statistical methods complex survey design and analysis have equally gained significance as a consequence. However, there exist many challenges still, to infer such assessments over the target population for policy level advocacy and implementation. ^ Objective. This study discusses the use of some of the statistical methods for disaster preparedness and medical needs assessment to facilitate local and state governments for its policy level decision making and logistic support to avoid any loss of life and property in future calamities. ^ Methods. In order to obtain precise and unbiased estimates for Medical Special Needs Persons (MSNP) and disaster preparedness for evacuation in Rio Grande Valley (RGV) of Texas, a stratified and cluster-randomized multi-stage sampling design was implemented. US School of Public Health, Brownsville surveyed 3088 households in three counties namely Cameron, Hidalgo, and Willacy. Multiple statistical methods were implemented and estimates were obtained taking into count probability of selection and clustering effects. Statistical methods for data analysis discussed were Multivariate Linear Regression (MLR), Survey Linear Regression (Svy-Reg), Generalized Estimation Equation (GEE) and Multilevel Mixed Models (MLM) all with and without sampling weights. ^ Results. Estimated population for RGV was 1,146,796. There were 51.5% female, 90% Hispanic, 73% married, 56% unemployed and 37% with their personal transport. 40% people attained education up to elementary school, another 42% reaching high school and only 18% went to college. Median household income is less than $15,000/year. MSNP estimated to be 44,196 (3.98%) [95% CI: 39,029; 51,123]. All statistical models are in concordance with MSNP estimates ranging from 44,000 to 48,000. MSNP estimates for statistical methods are: MLR (47,707; 95% CI: 42,462; 52,999), MLR with weights (45,882; 95% CI: 39,792; 51,972), Bootstrap Regression (47,730; 95% CI: 41,629; 53,785), GEE (47,649; 95% CI: 41,629; 53,670), GEE with weights (45,076; 95% CI: 39,029; 51,123), Svy-Reg (44,196; 95% CI: 40,004; 48,390) and MLM (46,513; 95% CI: 39,869; 53,157). ^ Conclusion. RGV is a flood zone, most susceptible to hurricanes and other natural disasters. People in the region are mostly Hispanic, under-educated with least income levels in the U.S. In case of any disaster people in large are incapacitated with only 37% have their personal transport to take care of MSNP. Local and state government’s intervention in terms of planning, preparation and support for evacuation is necessary in any such disaster to avoid loss of precious human life. ^ Key words: Complex Surveys, statistical methods, multilevel models, cluster randomized, sampling weights, raking, survey regression, generalized estimation equations (GEE), random effects, Intracluster correlation coefficient (ICC).^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visualization has proven to be a powerful and widely-applicable tool the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper introduces a new technique in the investigation of limited-dependent variable models. This paper illustrates that variable precision rough set theory (VPRS), allied with the use of a modern method of classification, or discretisation of data, can out-perform the more standard approaches that are employed in economics, such as a probit model. These approaches and certain inductive decision tree methods are compared (through a Monte Carlo simulation approach) in the analysis of the decisions reached by the UK Monopolies and Mergers Committee. We show that, particularly in small samples, the VPRS model can improve on more traditional models, both in-sample, and particularly in out-of-sample prediction. A similar improvement in out-of-sample prediction over the decision tree methods is also shown.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion - the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial, and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze when people abandon software to a high degree of detail.

To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically.

The empirical case study focuses on how the changing value of an innovation, introduced by the innovations' network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.