127 resultados para statistical lip modelling
em Université de Lausanne, Switzerland
Resumo:
1. Statistical modelling is often used to relate sparse biological survey data to remotely derived environmental predictors, thereby providing a basis for predictively mapping biodiversity across an entire region of interest. The most popular strategy for such modelling has been to model distributions of individual species one at a time. Spatial modelling of biodiversity at the community level may, however, confer significant benefits for applications involving very large numbers of species, particularly if many of these species are recorded infrequently. 2. Community-level modelling combines data from multiple species and produces information on spatial pattern in the distribution of biodiversity at a collective community level instead of, or in addition to, the level of individual species. Spatial outputs from community-level modelling include predictive mapping of community types (groups of locations with similar species composition), species groups (groups of species with similar distributions), axes or gradients of compositional variation, levels of compositional dissimilarity between pairs of locations, and various macro-ecological properties (e.g. species richness). 3. Three broad modelling strategies can be used to generate these outputs: (i) 'assemble first, predict later', in which biological survey data are first classified, ordinated or aggregated to produce community-level entities or attributes that are then modelled in relation to environmental predictors; (ii) 'predict first, assemble later', in which individual species are modelled one at a time as a function of environmental variables, to produce a stack of species distribution maps that is then subjected to classification, ordination or aggregation; and (iii) 'assemble and predict together', in which all species are modelled simultaneously, within a single integrated modelling process. These strategies each have particular strengths and weaknesses, depending on the intended purpose of modelling and the type, quality and quantity of data involved. 4. Synthesis and applications. The potential benefits of modelling large multispecies data sets using community-level, as opposed to species-level, approaches include faster processing, increased power to detect shared patterns of environmental response across rarely recorded species, and enhanced capacity to synthesize complex data into a form more readily interpretable by scientists and decision-makers. Community-level modelling therefore deserves to be considered more often, and more widely, as a potential alternative or supplement to modelling individual species.
Resumo:
The role of land cover change as a significant component of global change has become increasingly recognized in recent decades. Large databases measuring land cover change, and the data which can potentially be used to explain the observed changes, are also becoming more commonly available. When developing statistical models to investigate observed changes, it is important to be aware that the chosen sampling strategy and modelling techniques can influence results. We present a comparison of three sampling strategies and two forms of grouped logistic regression models (multinomial and ordinal) in the investigation of patterns of successional change after agricultural land abandonment in Switzerland. Results indicated that both ordinal and nominal transitional change occurs in the landscape and that the use of different sampling regimes and modelling techniques as investigative tools yield different results. Synthesis and applications. Our multimodel inference identified successfully a set of consistently selected indicators of land cover change, which can be used to predict further change, including annual average temperature, the number of already overgrown neighbouring areas of land and distance to historically destructive avalanche sites. This allows for more reliable decision making and planning with respect to landscape management. Although both model approaches gave similar results, ordinal regression yielded more parsimonious models that identified the important predictors of land cover change more efficiently. Thus, this approach is favourable where land cover change pattern can be interpreted as an ordinal process. Otherwise, multinomial logistic regression is a viable alternative.
Resumo:
Résumé: L'évaluation de l'exposition aux nuisances professionnelles représente une étape importante dans l'analyse de poste de travail. Les mesures directes sont rarement utilisées sur les lieux même du travail et l'exposition est souvent estimée sur base de jugements d'experts. Il y a donc un besoin important de développer des outils simples et transparents, qui puissent aider les spécialistes en hygiène industrielle dans leur prise de décision quant aux niveaux d'exposition. L'objectif de cette recherche est de développer et d'améliorer les outils de modélisation destinés à prévoir l'exposition. Dans un premier temps, une enquête a été entreprise en Suisse parmi les hygiénistes du travail afin d'identifier les besoins (types des résultats, de modèles et de paramètres observables potentiels). Il a été constaté que les modèles d'exposition ne sont guère employés dans la pratique en Suisse, l'exposition étant principalement estimée sur la base de l'expérience de l'expert. De plus, l'émissions de polluants ainsi que leur dispersion autour de la source ont été considérés comme des paramètres fondamentaux. Pour tester la flexibilité et la précision des modèles d'exposition classiques, des expériences de modélisations ont été effectuées dans des situations concrètes. En particulier, des modèles prédictifs ont été utilisés pour évaluer l'exposition professionnelle au monoxyde de carbone et la comparer aux niveaux d'exposition répertoriés dans la littérature pour des situations similaires. De même, l'exposition aux sprays imperméabilisants a été appréciée dans le contexte d'une étude épidémiologique sur une cohorte suisse. Dans ce cas, certains expériences ont été entreprises pour caractériser le taux de d'émission des sprays imperméabilisants. Ensuite un modèle classique à deux-zone a été employé pour évaluer la dispersion d'aérosol dans le champ proche et lointain pendant l'activité de sprayage. D'autres expériences ont également été effectuées pour acquérir une meilleure compréhension des processus d'émission et de dispersion d'un traceur, en se concentrant sur la caractérisation de l'exposition du champ proche. Un design expérimental a été développé pour effectuer des mesures simultanées dans plusieurs points d'une cabine d'exposition, par des instruments à lecture directe. Il a été constaté que d'un point de vue statistique, la théorie basée sur les compartiments est sensée, bien que l'attribution à un compartiment donné ne pourrait pas se faire sur la base des simples considérations géométriques. Dans une étape suivante, des données expérimentales ont été collectées sur la base des observations faites dans environ 100 lieux de travail différents: des informations sur les déterminants observés ont été associées aux mesures d'exposition des informations sur les déterminants observés ont été associé. Ces différentes données ont été employées pour améliorer le modèle d'exposition à deux zones. Un outil a donc été développé pour inclure des déterminants spécifiques dans le choix du compartiment, renforçant ainsi la fiabilité des prévisions. Toutes ces investigations ont servi à améliorer notre compréhension des outils des modélisations ainsi que leurs limitations. L'intégration de déterminants mieux adaptés aux besoins des experts devrait les inciter à employer cet outil dans leur pratique. D'ailleurs, en augmentant la qualité des outils des modélisations, cette recherche permettra non seulement d'encourager leur utilisation systématique, mais elle pourra également améliorer l'évaluation de l'exposition basée sur les jugements d'experts et, par conséquent, la protection de la santé des travailleurs. Abstract Occupational exposure assessment is an important stage in the management of chemical exposures. Few direct measurements are carried out in workplaces, and exposures are often estimated based on expert judgements. There is therefore a major requirement for simple transparent tools to help occupational health specialists to define exposure levels. The aim of the present research is to develop and improve modelling tools in order to predict exposure levels. In a first step a survey was made among professionals to define their expectations about modelling tools (what types of results, models and potential observable parameters). It was found that models are rarely used in Switzerland and that exposures are mainly estimated from past experiences of the expert. Moreover chemical emissions and their dispersion near the source have also been considered as key parameters. Experimental and modelling studies were also performed in some specific cases in order to test the flexibility and drawbacks of existing tools. In particular, models were applied to assess professional exposure to CO for different situations and compared with the exposure levels found in the literature for similar situations. Further, exposure to waterproofing sprays was studied as part of an epidemiological study on a Swiss cohort. In this case, some laboratory investigation have been undertaken to characterize the waterproofing overspray emission rate. A classical two-zone model was used to assess the aerosol dispersion in the near and far field during spraying. Experiments were also carried out to better understand the processes of emission and dispersion for tracer compounds, focusing on the characterization of near field exposure. An experimental set-up has been developed to perform simultaneous measurements through direct reading instruments in several points. It was mainly found that from a statistical point of view, the compartmental theory makes sense but the attribution to a given compartment could ñó~be done by simple geometric consideration. In a further step the experimental data were completed by observations made in about 100 different workplaces, including exposure measurements and observation of predefined determinants. The various data obtained have been used to improve an existing twocompartment exposure model. A tool was developed to include specific determinants in the choice of the compartment, thus largely improving the reliability of the predictions. All these investigations helped improving our understanding of modelling tools and identify their limitations. The integration of more accessible determinants, which are in accordance with experts needs, may indeed enhance model application for field practice. Moreover, while increasing the quality of modelling tool, this research will not only encourage their systematic use, but might also improve the conditions in which the expert judgments take place, and therefore the workers `health protection.
Resumo:
Altitudinal tree lines are mainly constrained by temperature, but can also be influenced by factors such as human activity, particularly in the European Alps, where centuries of agricultural use have affected the tree-line. Over the last decades this trend has been reversed due to changing agricultural practices and land-abandonment. We aimed to combine a statistical land-abandonment model with a forest dynamics model, to take into account the combined effects of climate and human land-use on the Alpine tree-line in Switzerland. Land-abandonment probability was expressed by a logistic regression function of degree-day sum, distance from forest edge, soil stoniness, slope, proportion of employees in the secondary and tertiary sectors, proportion of commuters and proportion of full-time farms. This was implemented in the TreeMig spatio-temporal forest model. Distance from forest edge and degree-day sum vary through feed-back from the dynamics part of TreeMig and climate change scenarios, while the other variables remain constant for each grid cell over time. The new model, TreeMig-LAb, was tested on theoretical landscapes, where the variables in the land-abandonment model were varied one by one. This confirmed the strong influence of distance from forest and slope on the abandonment probability. Degree-day sum has a more complex role, with opposite influences on land-abandonment and forest growth. TreeMig-LAb was also applied to a case study area in the Upper Engadine (Swiss Alps), along with a model where abandonment probability was a constant. Two scenarios were used: natural succession only (100% probability) and a probability of abandonment based on past transition proportions in that area (2.1% per decade). The former showed new forest growing in all but the highest-altitude locations. The latter was more realistic as to numbers of newly forested cells, but their location was random and the resulting landscape heterogeneous. Using the logistic regression model gave results consistent with observed patterns of land-abandonment: existing forests expanded and gaps closed, leading to an increasingly homogeneous landscape.
Resumo:
Over the last decade, the development of statistical models in support of forensic fingerprint identification has been the subject of increasing research attention, spurned on recently by commentators who claim that the scientific basis for fingerprint identification has not been adequately demonstrated. Such models are increasingly seen as useful tools in support of the fingerprint identification process within or in addition to the ACE-V framework. This paper provides a critical review of recent statistical models from both a practical and theoretical perspective. This includes analysis of models of two different methodologies: Probability of Random Correspondence (PRC) models that focus on calculating probabilities of the occurrence of fingerprint configurations for a given population, and Likelihood Ratio (LR) models which use analysis of corresponding features of fingerprints to derive a likelihood value representing the evidential weighting for a potential source.
Resumo:
Rare species have restricted geographic ranges, habitat specialization, and/or small population sizes. Datasets on rare species distribution usually have few observations, limited spatial accuracy and lack of valid absences; conversely they provide comprehensive views of species distributions allowing to realistically capture most of their realized environmental niche. Rare species are the most in need of predictive distribution modelling but also the most difficult to model. We refer to this contrast as the "rare species modelling paradox" and propose as a solution developing modelling approaches that deal with a sufficiently large set of predictors, ensuring that statistical models aren't overfitted. Our novel approach fulfils this condition by fitting a large number of bivariate models and averaging them with a weighted ensemble approach. We further propose that this ensemble forecasting is conducted within a hierarchic multi-scale framework. We present two ensemble models for a test species, one at regional and one at local scale, each based on the combination of 630 models. In both cases, we obtained excellent spatial projections, unusual when modelling rare species. Model results highlight, from a statistically sound approach, the effects of multiple drivers in a same modelling framework and at two distinct scales. From this added information, regional models can support accurate forecasts of range dynamics under climate change scenarios, whereas local models allow the assessment of isolated or synergistic impacts of changes in multiple predictors. This novel framework provides a baseline for adaptive conservation, management and monitoring of rare species at distinct spatial and temporal scales.
Resumo:
BACKGROUND: New HIV infections in men who have sex with men (MSM) have increased in Switzerland since 2000 despite combination antiretroviral therapy (cART). The objectives of this mathematical modelling study were: to describe the dynamics of the HIV epidemic in MSM in Switzerland using national data; to explore the effects of hypothetical prevention scenarios; and to conduct a multivariate sensitivity analysis. METHODOLOGY/PRINCIPAL FINDINGS: The model describes HIV transmission, progression and the effects of cART using differential equations. The model was fitted to Swiss HIV and AIDS surveillance data and twelve unknown parameters were estimated. Predicted numbers of diagnosed HIV infections and AIDS cases fitted the observed data well. By the end of 2010, an estimated 13.5% (95% CI 12.5, 14.6%) of all HIV-infected MSM were undiagnosed and accounted for 81.8% (95% CI 81.1, 82.4%) of new HIV infections. The transmission rate was at its lowest from 1995-1999, with a nadir of 46 incident HIV infections in 1999, but increased from 2000. The estimated number of new infections continued to increase to more than 250 in 2010, although the reproduction number was still below the epidemic threshold. Prevention scenarios included temporary reductions in risk behaviour, annual test and treat, and reduction in risk behaviour to levels observed earlier in the epidemic. These led to predicted reductions in new infections from 2 to 26% by 2020. Parameters related to disease progression and relative infectiousness at different HIV stages had the greatest influence on estimates of the net transmission rate. CONCLUSIONS/SIGNIFICANCE: The model outputs suggest that the increase in HIV transmission amongst MSM in Switzerland is the result of continuing risky sexual behaviour, particularly by those unaware of their infection status. Long term reductions in the incidence of HIV infection in MSM in Switzerland will require increased and sustained uptake of effective interventions.
Resumo:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of well known nonparametric principles of probability density modeling using kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that they can be easily used in decision support systems dealing with problems of automatic classification. Support vector machine is an implementation of the principles of statistical learning theory for the classification tasks. Recently they were successfully applied for different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, susceptibility mapping of natural hazards. In the present paper both simulated and real data case studies (low and high dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the algorithms applied.
Resumo:
Summary Ecotones are sensitive to change because they contain high numbers of species living at the margin of their environmental tolerance. This is equally true of tree-lines, which are determined by attitudinal or latitudinal temperature gradients. In the current context of climate change, they are expected to undergo modifications in position, tree biomass and possibly species composition. Attitudinal and latitudinal tree-lines differ mainly in the steepness of the underlying temperature gradient: distances are larger at latitudinal tree-lines, which could have an impact on the ability of tree species to migrate in response to climate change. Aside from temperature, tree-lines are also affected on a more local level by pressure from human activities. These are also changing as a consequence of modifications in our societies and may interact with the effects of climate change. Forest dynamics models are often used for climate change simulations because of their mechanistic processes. The spatially-explicit model TreeMig was used as a base to develop a model specifically tuned for the northern European and Alpine tree-line ecotones. For the latter, a module for land-use change processes was also added. The temperature response parameters for the species in the model were first calibrated by means of tree-ring data from various species and sites at both tree-lines. This improved the growth response function in the model, but also lead to the conclusion that regeneration is probably more important than growth for controlling tree-line position and species' distributions. The second step was to implement the module for abandonment of agricultural land in the Alps, based on an existing spatial statistical model. The sensitivity of its most important variables was tested and the model's performance compared to other modelling approaches. The probability that agricultural land would be abandoned was strongly influenced by the distance from the nearest forest and the slope, bath of which are proxies for cultivation costs. When applied to a case study area, the resulting model, named TreeMig-LAb, gave the most realistic results. These were consistent with observed consequences of land-abandonment such as the expansion of the existing forest and closing up of gaps. This new model was then applied in two case study areas, one in the Swiss Alps and one in Finnish Lapland, under a variety of climate change scenarios. These were based on forecasts of temperature change over the next century by the IPCC and the HadCM3 climate model (ΔT: +1.3, +3.5 and +5.6 °C) and included a post-change stabilisation period of 300 years. The results showed radical disruptions at both tree-lines. With the most conservative climate change scenario, species' distributions simply shifted, but it took several centuries reach a new equilibrium. With the more extreme scenarios, some species disappeared from our study areas (e.g. Pinus cembra in the Alps) or dwindled to very low numbers, as they ran out of land into which they could migrate. The most striking result was the lag in the response of most species, independently from the climate change scenario or tree-line type considered. Finally, a statistical model of the effect of reindeer (Rangifer tarandus) browsing on the growth of Pinus sylvestris was developed, as a first step towards implementing human impacts at the boreal tree-line. The expected effect was an indirect one, as reindeer deplete the ground lichen cover, thought to protect the trees against adverse climate conditions. The model showed a small but significant effect of browsing, but as the link with the underlying climate variables was unclear and the model was not spatial, it was not usable as such. Developing the TreeMig-LAb model allowed to: a) establish a method for deriving species' parameters for the growth equation from tree-rings, b) highlight the importance of regeneration in determining tree-line position and species' distributions and c) improve the integration of social sciences into landscape modelling. Applying the model at the Alpine and northern European tree-lines under different climate change scenarios showed that with most forecasted levels of temperature increase, tree-lines would suffer major disruptions, with shifts in distributions and potential extinction of some tree-line species. However, these responses showed strong lags, so these effects would not become apparent before decades and could take centuries to stabilise. Résumé Les écotones son sensibles au changement en raison du nombre élevé d'espèces qui y vivent à la limite de leur tolérance environnementale. Ceci s'applique également aux limites des arbres définies par les gradients de température altitudinaux et latitudinaux. Dans le contexte actuel de changement climatique, on s'attend à ce qu'elles subissent des modifications de leur position, de la biomasse des arbres et éventuellement des essences qui les composent. Les limites altitudinales et latitudinales diffèrent essentiellement au niveau de la pente des gradients de température qui les sous-tendent les distance sont plus grandes pour les limites latitudinales, ce qui pourrait avoir un impact sur la capacité des espèces à migrer en réponse au changement climatique. En sus de la température, la limite des arbres est aussi influencée à un niveau plus local par les pressions dues aux activités humaines. Celles-ci sont aussi en mutation suite aux changements dans nos sociétés et peuvent interagir avec les effets du changement climatique. Les modèles de dynamique forestière sont souvent utilisés pour simuler les effets du changement climatique, car ils sont basés sur la modélisation de processus. Le modèle spatialement explicite TreeMig a été utilisé comme base pour développer un modèle spécialement adapté pour la limite des arbres en Europe du Nord et dans les Alpes. Pour cette dernière, un module servant à simuler des changements d'utilisation du sol a également été ajouté. Tout d'abord, les paramètres de la courbe de réponse à la température pour les espèces inclues dans le modèle ont été calibrées au moyen de données dendrochronologiques pour diverses espèces et divers sites des deux écotones. Ceci a permis d'améliorer la courbe de croissance du modèle, mais a également permis de conclure que la régénération est probablement plus déterminante que la croissance en ce qui concerne la position de la limite des arbres et la distribution des espèces. La seconde étape consistait à implémenter le module d'abandon du terrain agricole dans les Alpes, basé sur un modèle statistique spatial existant. La sensibilité des variables les plus importantes du modèle a été testée et la performance de ce dernier comparée à d'autres approches de modélisation. La probabilité qu'un terrain soit abandonné était fortement influencée par la distance à la forêt la plus proche et par la pente, qui sont tous deux des substituts pour les coûts liés à la mise en culture. Lors de l'application en situation réelle, le nouveau modèle, baptisé TreeMig-LAb, a donné les résultats les plus réalistes. Ceux-ci étaient comparables aux conséquences déjà observées de l'abandon de terrains agricoles, telles que l'expansion des forêts existantes et la fermeture des clairières. Ce nouveau modèle a ensuite été mis en application dans deux zones d'étude, l'une dans les Alpes suisses et l'autre en Laponie finlandaise, avec divers scénarios de changement climatique. Ces derniers étaient basés sur les prévisions de changement de température pour le siècle prochain établies par l'IPCC et le modèle climatique HadCM3 (ΔT: +1.3, +3.5 et +5.6 °C) et comprenaient une période de stabilisation post-changement climatique de 300 ans. Les résultats ont montré des perturbations majeures dans les deux types de limites de arbres. Avec le scénario de changement climatique le moins extrême, les distributions respectives des espèces ont subi un simple glissement, mais il a fallu plusieurs siècles pour qu'elles atteignent un nouvel équilibre. Avec les autres scénarios, certaines espèces ont disparu de la zone d'étude (p. ex. Pinus cembra dans les Alpes) ou ont vu leur population diminuer parce qu'il n'y avait plus assez de terrains disponibles dans lesquels elles puissent migrer. Le résultat le plus frappant a été le temps de latence dans la réponse de la plupart des espèces, indépendamment du scénario de changement climatique utilisé ou du type de limite des arbres. Finalement, un modèle statistique de l'effet de l'abroutissement par les rennes (Rangifer tarandus) sur la croissance de Pinus sylvestris a été développé, comme première étape en vue de l'implémentation des impacts humains sur la limite boréale des arbres. L'effet attendu était indirect, puisque les rennes réduisent la couverture de lichen sur le sol, dont on attend un effet protecteur contre les rigueurs climatiques. Le modèle a mis en évidence un effet modeste mais significatif, mais étant donné que le lien avec les variables climatiques sous jacentes était peu clair et que le modèle n'était pas appliqué dans l'espace, il n'était pas utilisable tel quel. Le développement du modèle TreeMig-LAb a permis : a) d'établir une méthode pour déduire les paramètres spécifiques de l'équation de croissance ä partir de données dendrochronologiques, b) de mettre en évidence l'importance de la régénération dans la position de la limite des arbres et la distribution des espèces et c) d'améliorer l'intégration des sciences sociales dans les modèles de paysage. L'application du modèle aux limites alpines et nord-européennes des arbres sous différents scénarios de changement climatique a montré qu'avec la plupart des niveaux d'augmentation de température prévus, la limite des arbres subirait des perturbations majeures, avec des glissements d'aires de répartition et l'extinction potentielle de certaines espèces. Cependant, ces réponses ont montré des temps de latence importants, si bien que ces effets ne seraient pas visibles avant des décennies et pourraient mettre plusieurs siècles à se stabiliser.
Resumo:
Remote sensing using airborne imaging spectroscopy (AIS) is known to retrieve fundamental optical properties of ecosystems. However, the value of these properties for predicting plant species distribution remains unclear. Here, we assess whether such data can add value to topographic variables for predicting plant distributions in French and Swiss alpine grasslands. We fitted statistical models with high spectral and spatial resolution reflectance data and tested four optical indices sensitive to leaf chlorophyll content, leaf water content and leaf area index. We found moderate added-value of AIS data for predicting alpine plant species distribution. Contrary to expectations, differences between species distribution models (SDMs) were not linked to their local abundance or phylogenetic/functional similarity. Moreover, spectral signatures of species were found to be partly site-specific. We discuss current limits of AIS-based SDMs, highlighting issues of scale and informational content of AIS data.
Resumo:
Depth-averaged velocities and unit discharges within a 30 km reach of one of the world's largest rivers, the Rio Parana, Argentina, were simulated using three hydrodynamic models with different process representations: a reduced complexity (RC) model that neglects most of the physics governing fluid flow, a two-dimensional model based on the shallow water equations, and a three-dimensional model based on the Reynolds-averaged Navier-Stokes equations. Row characteristics simulated using all three models were compared with data obtained by acoustic Doppler current profiler surveys at four cross sections within the study reach. This analysis demonstrates that, surprisingly, the performance of the RC model is generally equal to, and in some instances better than, that of the physics based models in terms of the statistical agreement between simulated and measured flow properties. In addition, in contrast to previous applications of RC models, the present study demonstrates that the RC model can successfully predict measured flow velocities. The strong performance of the RC model reflects, in part, the simplicity of the depth-averaged mean flow patterns within the study reach and the dominant role of channel-scale topographic features in controlling the flow dynamics. Moreover, the very low water surface slopes that typify large sand-bed rivers enable flow depths to be estimated reliably in the RC model using a simple fixed-lid planar water surface approximation. This approach overcomes a major problem encountered in the application of RC models in environments characterised by shallow flows and steep bed gradients. The RC model is four orders of magnitude faster than the physics based models when performing steady-state hydrodynamic calculations. However, the iterative nature of the RC model calculations implies a reduction in computational efficiency relative to some other RC models. A further implication of this is that, if used to simulate channel morphodynamics, the present RC model may offer only a marginal advantage in terms of computational efficiency over approaches based on the shallow water equations. These observations illustrate the trade off between model realism and efficiency that is a key consideration in RC modelling. Moreover, this outcome highlights a need to rethink the use of RC morphodynamic models in fluvial geomorphology and to move away from existing grid-based approaches, such as the popular cellular automata (CA) models, that remain essentially reductionist in nature. In the case of the world's largest sand-bed rivers, this might be achieved by implementing the RC model outlined here as one element within a hierarchical modelling framework that would enable computationally efficient simulation of the morphodynamics of large rivers over millennial time scales. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
This paper investigates the use of ensemble of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computationally efficient than training a single SVR model while reducing error. Boosting, however, does not improve results on this specific problem.
Resumo:
Background: Bone health is a concern when treating early stage breast cancer patients with adjuvant aromatase inhibitors. Early detection of patients (pts) at risk of osteoporosis and fractures may be helpful for starting preventive therapies and selecting the most appropriate endocrine therapy schedule. We present statistical models describing the evolution of lumbar and hip bone mineral density (BMD) in pts treated with tamoxifen (T), letrozole (L) and sequences of T and L. Methods: Available dual-energy x-ray absorptiometry exams (DXA) of pts treated in trial BIG 1-98 were retrospectively collected from Swiss centers. Treatment arms: A) T for 5 years, B) L for 5 years, C) 2 years of T followed by 3 years of L and, D) 2 years of L followed by 3 years of T. Pts without DXA were used as a control for detecting selection biases. Patients randomized to arm A were subsequently allowed an unplanned switch from T to L. Allowing for variations between DXA machines and centres, two repeated measures models, using a covariance structure that allow for different times between DXA, were used to estimate changes in hip and lumbar BMD (g/cm2) from trial randomization. Prospectively defined covariates, considered as fixed effects in the multivariable models in an intention to treat analysis, at the time of trial randomization were: age, height, weight, hysterectomy, race, known osteoporosis, tobacco use, prior bone fracture, prior hormone replacement therapy (HRT), bisphosphonate use and previous neo-/adjuvant chemotherapy (ChT). Similarly, the T-scores for lumbar and hip BMD measurements were modeled using a per-protocol approach (allowing for treatment switch in arm A), specifically studying the effect of each therapy upon T-score percentage. Results: A total of 247 out of 546 pts had between 1 and 5 DXA; a total of 576 DXA were collected. Number of DXA measurements per arm were; arm A 133, B 137, C 141 and D 135. The median follow-up time was 5.8 years. Significant factors positively correlated with lumbar and hip BMD in the multivariate analysis were weight, previous HRT use, neo-/adjuvant ChT, hysterectomy and height. Significant negatively correlated factors in the models were osteoporosis, treatment arm (B/C/D vs. A), time since endocrine therapy start, age and smoking (current vs. never).Modeling the T-score percentage, differences from T to L were -4.199% (p = 0.036) and -4.907% (p = 0.025) for the hip and lumbar measurements respectively, before any treatment switch occurred. Conclusions: Our statistical models describe the lumbar and hip BMD evolution for pts treated with L and/or T. The results of both localisations confirm that, contrary to expectation, the sequential schedules do not seem less detrimental for the BMD than L monotherapy. The estimated difference in BMD T-score percent is at least 4% from T to L.
Resumo:
1. The ecological niche is a fundamental biological concept. Modelling species' niches is central to numerous ecological applications, including predicting species invasions, identifying reservoirs for disease, nature reserve design and forecasting the effects of anthropogenic and natural climate change on species' ranges. 2. A computational analogue of Hutchinson's ecological niche concept (the multidimensional hyperspace of species' environmental requirements) is the support of the distribution of environments in which the species persist. Recently developed machine-learning algorithms can estimate the support of such high-dimensional distributions. We show how support vector machines can be used to map ecological niches using only observations of species presence to train distribution models for 106 species of woody plants and trees in a montane environment using up to nine environmental covariates. 3. We compared the accuracy of three methods that differ in their approaches to reducing model complexity. We tested models with independent observations of both species presence and species absence. We found that the simplest procedure, which uses all available variables and no pre-processing to reduce correlation, was best overall. Ecological niche models based on support vector machines are theoretically superior to models that rely on simulating pseudo-absence data and are comparable in empirical tests. 4. Synthesis and applications. Accurate species distribution models are crucial for effective environmental planning, management and conservation, and for unravelling the role of the environment in human health and welfare. Models based on distribution estimation rather than classification overcome theoretical and practical obstacles that pervade species distribution modelling. In particular, ecological niche models based on machine-learning algorithms for estimating the support of a statistical distribution provide a promising new approach to identifying species' potential distributions and to project changes in these distributions as a result of climate change, land use and landscape alteration.
Resumo:
Abstract One of the most important issues in molecular biology is to understand regulatory mechanisms that control gene expression. Gene expression is often regulated by proteins, called transcription factors which bind to short (5 to 20 base pairs),degenerate segments of DNA. Experimental efforts towards understanding the sequence specificity of transcription factors is laborious and expensive, but can be substantially accelerated with the use of computational predictions. This thesis describes the use of algorithms and resources for transcriptionfactor binding site analysis in addressing quantitative modelling, where probabilitic models are built to represent binding properties of a transcription factor and can be used to find new functional binding sites in genomes. Initially, an open-access database(HTPSELEX) was created, holding high quality binding sequences for two eukaryotic families of transcription factors namely CTF/NF1 and LEFT/TCF. The binding sequences were elucidated using a recently described experimental procedure called HTP-SELEX, that allows generation of large number (> 1000) of binding sites using mass sequencing technology. For each HTP-SELEX experiments we also provide accurate primary experimental information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, and assembled clone sequences of binding sequences. The database also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols.The database is available at http://wwwisrec.isb-sib.ch/htpselex/ and and ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex. The Expectation-Maximisation(EM) algorithm is one the frequently used methods to estimate probabilistic models to represent the sequence specificity of transcription factors. We present computer simulations in order to estimate the precision of EM estimated models as a function of data set parameters(like length of initial sequences, number of initial sequences, percentage of nonbinding sequences). We observed a remarkable robustness of the EM algorithm with regard to length of training sequences and the degree of contamination. The HTPSELEX database and the benchmarked results of the EM algorithm formed part of the foundation for the subsequent project, where a statistical framework called hidden Markov model has been developed to represent sequence specificity of the transcription factors CTF/NF1 and LEF1/TCF using the HTP-SELEX experiment data. The hidden Markov model framework is capable of both predicting and classifying CTF/NF1 and LEF1/TCF binding sites. A covariance analysis of the binding sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism. We next tested the LEF1/TCF model by computing binding scores for a set of LEF1/TCF binding sequences for which relative affinities were determined experimentally using non-linear regression. The predicted and experimentally determined binding affinities were in good correlation.