35 resultados para Statistical parameters
Resumo:
This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved with the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA on the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for discovery of putative cis-regulatory elements could help their experimental analysis, which would result in a more detailed view of the cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between the DNA sequences with evolutionarily conserved cis-regulatory elements and the DNA sequences that have evolved neutrally. In further inquiry, the set of highest scoring putative cis-regulatory elements were found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two Component Extreme Value Distribution. The p-values grade the conservation of the cis-regulatory elements above the neutral expectation. The parameter values for the distribution are estimated by simulating the neutral DNA evolution. The conservation of the transcription factor binding sites can be used in the upstream analysis of regulatory interactions. This approach may provide mechanistic insight to the transcription level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements. The software facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species including mouse. The data from these genome-wide analyzes is stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
Resumo:
Population dynamics are generally viewed as the result of intrinsic (purely density dependent) and extrinsic (environmental) processes. Both components, and potential interactions between those two, have to be modelled in order to understand and predict dynamics of natural populations; a topic that is of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II IV constitute empirical studies that statistically model environmental effects on population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine another important food resource for the species do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis), and point out effects of rainfall and habitat quality on population growth. Because the skylark time series and some of the environmental variables included show strong positive autocorrelation, the statistical significances are calculated using a Monte Carlo method, where random autocorrelated time series are generated. Chapter V is a simulation-based study, showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty, if the environmental variables are autocorrelated. It is concluded that the use of state-space models is an effective way to reach more accurate results. In summary, there are several biological assumptions and methodological issues that can affect the inferential outcome when estimating environmental effects from time series data, and that therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density dependent regulation, modelling potential observation error, and when needed, accounting for spatial and/or temporal autocorrelation.
Resumo:
The Baltic Sea is a geologically young, large brackish water basin, and few of the species living there have fully adapted to its special conditions. Many of the species live on the edge of their distribution range in terms of one or more environmental variables such as salinity or temperature. Environmental fluctuations are know to cause fluctuations in populations abundance, and this effect is especially strong near the edges of the distribution range, where even small changes in an environmental variable can be critical to the success of a species. This thesis examines which environmental factors are the most important in relation to the success of various commercially exploited fish species in the northern Baltic Sea. It also examines the uncertainties related to fish stocks current and potential status as well as to their relationship with their environment. The aim is to quantify the uncertainties related to fisheries and environmental management, to find potential management strategies that can be used to reduce uncertainty in management results and to develop methodology related to uncertainty estimation in natural resources management. Bayesian statistical methods are utilized due to their ability to treat uncertainty explicitly in all parts of the statistical model. The results show that uncertainty about important parameters of even the most intensively studied fish species such as salmon (Salmo salar L.) and Baltic herring (Clupea harengus membras L.) is large. On the other hand, management approaches that reduce uncertainty can be found. These include utilising information about ecological similarity of fish stocks and species, and using management variables that are directly related to stock parameters that can be measured easily and without extrapolations or assumptions.
Resumo:
In this thesis, the solar wind-magnetosphere-ionosphere coupling is studied observationally, with the main focus on the ionospheric currents in the auroral region. The thesis consists of five research articles and an introductory part that summarises the most important results reached in the articles and places them in a wider context within the field of space physics. Ionospheric measurements are provided by the International Monitor for Auroral Geomagnetic Effects (IMAGE) magnetometer network, by the low-orbit CHAllenging Minisatellite Payload (CHAMP) satellite, by the European Incoherent SCATter (EISCAT) radar, and by the Imager for Magnetopause-to-Aurora Global Exploration (IMAGE) satellite. Magnetospheric observations, on the other hand, are acquired from the four spacecraft of the Cluster mission, and solar wind observations from the Advanced Composition Explorer (ACE) and Wind spacecraft. Within the framework of this study, a new method for determining the ionospheric currents from low-orbit satellite-based magnetic field data is developed. In contrast to previous techniques, all three current density components can be determined on a matching spatial scale, and the validity of the necessary one-dimensionality approximation, and thus, the quality of the results, can be estimated directly from the data. The new method is applied to derive an empirical model for estimating the Hall-to-Pedersen conductance ratio from ground-based magnetic field data, and to investigate the statistical dependence of the large-scale ionospheric currents on solar wind and geomagnetic parameters. Equations describing the amount of field-aligned current in the auroral region, as well as the location of the auroral electrojets, as a function of these parameters are derived. Moreover, the mesoscale (10-1000 km) ionospheric equivalent currents related to two magnetotail plasma sheet phenomena, bursty bulk flows and flux ropes, are studied. Based on the analysis of 22 events, the typical equivalent current pattern related to bursty bulk flows is established. For the flux ropes, on the other hand, only two conjugate events are found. As the equivalent current patterns during these two events are not similar, it is suggested that the ionospheric signatures of a flux rope depend on the orientation and the length of the structure, but analysis of additional events is required to determine the possible ionospheric connection of flux ropes.
Resumo:
An efficient and statistically robust solution for the identification of asteroids among numerous sets of astrometry is presented. In particular, numerical methods have been developed for the short-term identification of asteroids at discovery, and for the long-term identification of scarcely observed asteroids over apparitions, a task which has been lacking a robust method until now. The methods are based on the solid foundation of statistical orbital inversion properly taking into account the observational uncertainties, which allows for the detection of practically all correct identifications. Through the use of dimensionality-reduction techniques and efficient data structures, the exact methods have a loglinear, that is, O(nlog(n)), computational complexity, where n is the number of included observation sets. The methods developed are thus suitable for future large-scale surveys which anticipate a substantial increase in the astrometric data rate. Due to the discontinuous nature of asteroid astrometry, separate sets of astrometry must be linked to a common asteroid from the very first discovery detections onwards. The reason for the discontinuity in the observed positions is the rotation of the observer with the Earth as well as the motion of the asteroid and the observer about the Sun. Therefore, the aim of identification is to find a set of orbital elements that reproduce the observed positions with residuals similar to the inevitable observational uncertainty. Unless the astrometric observation sets are linked, the corresponding asteroid is eventually lost as the uncertainty of the predicted positions grows too large to allow successful follow-up. Whereas the presented identification theory and the numerical comparison algorithm are generally applicable, that is, also in fields other than astronomy (e.g., in the identification of space debris), the numerical methods developed for asteroid identification can immediately be applied to all objects on heliocentric orbits with negligible effects due to non-gravitational forces in the time frame of the analysis. The methods developed have been successfully applied to various identification problems. Simulations have shown that the methods developed are able to find virtually all correct linkages despite challenges such as numerous scarce observation sets, astrometric uncertainty, numerous objects confined to a limited region on the celestial sphere, long linking intervals, and substantial parallaxes. Tens of previously unknown main-belt asteroids have been identified with the short-term method in a preliminary study to locate asteroids among numerous unidentified sets of single-night astrometry of moving objects, and scarce astrometry obtained nearly simultaneously with Earth-based and space-based telescopes has been successfully linked despite a substantial parallax. Using the long-term method, thousands of realistic 3-linkages typically spanning several apparitions have so far been found among designated observation sets each spanning less than 48 hours.
Resumo:
Knowledge of the physical properties of asteroids is crucial in many branches of solar-system research. Knowledge of the spin states and shapes is needed, e.g., for accurate orbit determination and to study the history and evolution of the asteroids. In my thesis, I present new methods for using photometric lightcurves of asteroids in the determination of their spin states and shapes. The convex inversion method makes use of a general polyhedron shape model and provides us at best with an unambiguous spin solution and a convex shape solution that reproduces the main features of the original shape. Deriving information about the non-convex shape features is, in principle, also possible, but usually requires a priori information about the object. Alternatively, a distribution of non-convex solutions, describing the scale of the non-convexities, is also possible to be obtained. Due to insufficient number of absolute observations and inaccurately defined asteroid phase curves, the $c/b$-ratio, i.e., the flatness of the shape model is often somewhat ill-defined. However, especially in the case of elongated objects, the flatness seems to be quite well constrained, even in the case when only relative lightcurves are available. The results prove that it is, contrary to the earlier misbelief, possible to derive shape information from the lightcurve data if a sufficiently wide range of observing geometries is covered by the observations. Along with the more accurate shape models, also the rotational states, i.e., spin vectors and rotation periods, are defined with improved accuracy. The shape solutions obtained so far reveal a population of irregular objects whose most descriptive shape characteristics, however, can be expressed with only a few parameters. Preliminary statistical analyses for the shapes suggests that there are correlations between shape and other physical properties, such as the size, rotation period and taxonomic type of the asteroids. More shape data of, especially, the smallest and largest asteroids, as well as the fast and slow rotators is called for in order to be able to study the statistics more thoroughly.
Resumo:
The core aim of machine learning is to make a computer program learn from the experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
Resumo:
Bone mass accrual and maintenance are regulated by a complex interplay between genetic and environmental factors. Recent studies have revealed an important role for the low-density lipoprotein receptor-related protein 5 (LRP5) in this process. The aim of this thesis study was to identify novel variants in the LRP5 gene and to further elucidate the association of LRP5 and its variants with various bone health related clinical characteristics. The results of our studies show that loss-of-function mutations in LRP5 cause severe osteoporosis not only in homozygous subjects but also in the carriers of these mutations, who have significantly reduced bone mineral density (BMD) and increased susceptibility to fractures. In addition, we demonstrated for the first time that a common polymorphic LRP5 variant (p.A1330V) was associated with reduced peak bone mass, an important determinant of BMD and osteoporosis in later life. The results from these two studies are concordant with results seen in other studies on LRP5 mutations and in association studies linking genetic variation in LRP5 with BMD and osteoporosis. Several rare LRP5 variants were identified in children with recurrent fractures. Sequencing and multiplex ligation-dependent probe amplification (MLPA) analyses revealed no disease-causing mutations or whole-exon deletions. Our findings from clinical assessments and family-based genotype-phenotype studies suggested that the rare LRP5 variants identified are not the definite cause of fractures in these children. Clinical assessments of our study subjects with LPR5 mutations revealed an unexpectedly high prevalence of impaired glucose tolerance and dyslipidaemia. Moreover, in subsequent studies we discovered that common polymorphic LRP5 variants are associated with unfavorable metabolic characteristics. Changes in lipid profile were already apparent in pre-pubertal children. These results, together with the findings from other studies, suggest an important role for LRP5 also in glucose and lipid metabolism. Our results underscore the important role of LRP5 not only in bone mass accrual and maintenance of skeletal health but also in glucose and lipid metabolism. The role of LRP5 in bone metabolism has long been studied, but further studies with larger study cohorts are still needed to evaluate the specific role of LRP5 variants as metabolic risk factors.
Resumo:
A diffusion/replacement model for new consumer durables designed to be used as a long-term forecasting tool is developed. The model simulates new demand as well as replacement demand over time. The model is called DEMSIM and is built upon a counteractive adoption model specifying the basic forces affecting the adoption behaviour of individual consumers. These forces are the promoting forces and the resisting forces. The promoting forces are further divided into internal and external influences. These influences are operationalized within a multi-segmental diffusion model generating the adoption behaviour of the consumers in each segment as an expected value. This diffusion model is combined with a replacement model built upon the same segmental structure as the diffusion model. This model generates, in turn, the expected replacement behaviour in each segment. To be able to use DEMSIM as a forecasting tool in early stages of a diffusion process estimates of the model parameters are needed as soon as possible after product launch. However, traditional statistical techniques are not very helpful in estimating such parameters in early stages of a diffusion process. To enable early parameter calibration an optimization algorithm is developed by which the main parameters of the diffusion model can be estimated on the basis of very few sales observations. The optimization is carried out in iterative simulation runs. Empirical validations using the optimization algorithm reveal that the diffusion model performs well in early long-term sales forecasts, especially as it comes to the timing of future sales peaks.