917 resultados para structuration of lexical data bases
Resumo:
The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.
Resumo:
Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
Resumo:
Although cross-sectional diffusion tensor imaging (DTI) studies revealed significant white matter changes in mild cognitive impairment (MCI), the utility of this technique in predicting further cognitive decline is debated. Thirty-five healthy controls (HC) and 67 MCI subjects with DTI baseline data were neuropsychologically assessed at one year. Among them, there were 40 stable (sMCI; 9 single domain amnestic, 7 single domain frontal, 24 multiple domain) and 27 were progressive (pMCI; 7 single domain amnestic, 4 single domain frontal, 16 multiple domain). Fractional anisotropy (FA) and longitudinal, radial, and mean diffusivity were measured using Tract-Based Spatial Statistics. Statistics included group comparisons and individual classification of MCI cases using support vector machines (SVM). FA was significantly higher in HC compared to MCI in a distributed network including the ventral part of the corpus callosum, right temporal and frontal pathways. There were no significant group-level differences between sMCI versus pMCI or between MCI subtypes after correction for multiple comparisons. However, SVM analysis allowed for an individual classification with accuracies up to 91.4% (HC versus MCI) and 98.4% (sMCI versus pMCI). When considering the MCI subgroups separately, the minimum SVM classification accuracy for stable versus progressive cognitive decline was 97.5% in the multiple domain MCI group. SVM analysis of DTI data provided highly accurate individual classification of stable versus progressive MCI regardless of MCI subtype, indicating that this method may become an easily applicable tool for early individual detection of MCI subjects evolving to dementia.
Resumo:
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website ( http://www.uniprot.org/ ). It also evokes precautions that are necessary for successful predictions and extrapolations.
Resumo:
The present study proposes a modification in one of the most frequently applied effect size procedures in single-case data analysis the percent of nonoverlapping data. In contrast to other techniques, the calculus and interpretation of this procedure is straightforward and it can be easily complemented by visual inspection of the graphed data. Although the percent of nonoverlapping data has been found to perform reasonably well in N = 1 data, the magnitude of effect estimates it yields can be distorted by trend and autocorrelation. Therefore, the data correction procedure focuses on removing the baseline trend from data prior to estimating the change produced in the behavior due to intervention. A simulation study is carried out in order to compare the original and the modified procedures in several experimental conditions. The results suggest that the new proposal is unaffected by trend and autocorrelation and can be used in case of unstable baselines and sequentially related measurements.
Resumo:
The present study focuses on single-case data analysis and specifically on two procedures for quantifying differences between baseline and treatment measurements The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The comparison is carried out in the context of generated data representing a variety of patterns (i.e., independent measurements, different serial dependence underlying processes, constant or phase-specific autocorrelation and data variability, different types of trend, and slope and level change). The results suggest that the two techniques perform adequately for a wide range of conditions and researchers can use both of them with certain guarantees. The regression-based procedure offers more efficient estimates, whereas the proposed non-regression procedure is more sensitive to intervention effects. Considering current and previous findings, some tentative recommendations are offered to applied researchers in order to help choosing among the plurality of single-case data analysis techniques.
Resumo:
Continuous field mapping has to address two conflicting remote sensing requirements when collecting training data. On one hand, continuous field mapping trains fractional land cover and thus favours mixed training pixels. On the other hand, the spectral signature has to be preferably distinct and thus favours pure training pixels. The aim of this study was to evaluate the sensitivity of training data distribution along fractional and spectral gradients on the resulting mapping performance. We derived four continuous fields (tree, shrubherb, bare, water) from aerial photographs as response variables and processed corresponding spectral signatures from multitemporal Landsat 5 TM data as explanatory variables. Subsequent controlled experiments along fractional cover gradients were then based on generalised linear models. Resulting fractional and spectral distribution differed between single continuous fields, but could be satisfactorily trained and mapped. Pixels with fractional or without respective cover were much more critical than pure full cover pixels. Error distribution of continuous field models was non-uniform with respect to horizontal and vertical spatial distribution of target fields. We conclude that a sampling for continuous field training data should be based on extent and densities in the fractional and spectral, rather than the real spatial space. Consequently, adequate training plots are most probably not systematically distributed in the real spatial space, but cover the gradient and covariate structure of the fractional and spectral space well. (C) 2009 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
Resumo:
Statistical summaries of streamflow data collected at 156 streamflow-gaging stations in Iowa are presented in this report. All gaging stations included for analysis have at least 10 years of continuous record collected before or through September 1996. The statistical summaries include (1) statistics of monthly and annual mean discharges; (2) monthly and annual flow durations; (3) magnitudes and frequencies of instantaneous peak discharges (flood frequencies); and (4) magnitudes and frequencies of high and low discharges. Also presented for each gaging station is a graph of the annual mean flows and, for most stations, selected values from the most-recent stage-discharge rating table.
Resumo:
1. Few examples of habitat-modelling studies of rare and endangered species exist in the literature, although from a conservation perspective predicting their distribution would prove particularly useful. Paucity of data and lack of valid absences are the probable reasons for this shortcoming. Analytic solutions to accommodate the lack of absence include the ecological niche factor analysis (ENFA) and the use of generalized linear models (GLM) with simulated pseudo-absences. 2. In this study we tested a new approach to generating pseudo-absences, based on a preliminary ENFA habitat suitability (HS) map, for the endangered species Eryngium alpinum. This method of generating pseudo-absences was compared with two others: (i) use of a GLM with pseudo-absences generated totally at random, and (ii) use of an ENFA only. 3. The influence of two different spatial resolutions (i.e. grain) was also assessed for tackling the dilemma of quality (grain) vs. quantity (number of occurrences). Each combination of the three above-mentioned methods with the two grains generated a distinct HS map. 4. Four evaluation measures were used for comparing these HS maps: total deviance explained, best kappa, Gini coefficient and minimal predicted area (MPA). The last is a new evaluation criterion proposed in this study. 5. Results showed that (i) GLM models using ENFA-weighted pseudo-absence provide better results, except for the MPA value, and that (ii) quality (spatial resolution and locational accuracy) of the data appears to be more important than quantity (number of occurrences). Furthermore, the proposed MPA value is suggested as a useful measure of model evaluation when used to complement classical statistical measures. 6. Synthesis and applications. We suggest that the use of ENFA-weighted pseudo-absence is a possible way to enhance the quality of GLM-based potential distribution maps and that data quality (i.e. spatial resolution) prevails over quantity (i.e. number of data). Increased accuracy of potential distribution maps could help to define better suitable areas for species protection and reintroduction.
Resumo:
The draft of the new law on the confidentiality of personal data severely curtails medical and epidemiological research. This might be detrimental and dangerous to public health. The project therefore has to be amended.
Resumo:
This report evaluates the use of remotely sensed images in implementing the Iowa DOT LRS that is currently in the stages of system architecture. The Iowa Department of Transportation is investing a significant amount of time and resources into creation of a linear referencing system (LRS). A significant portion of the effort in implementing the system will be creation of a datum, which includes geographically locating anchor points and then measuring anchor section distances between those anchor points. Currently, system architecture and evaluation of different data collection methods to establish the LRS datum is being performed for the DOT by an outside consulting team.
Resumo:
Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels.
Resumo:
BACKGROUND: Several European HIV observational data bases have, over the last decade, accumulated a substantial number of resistance test results and developed large sample repositories, There is a need to link these efforts together, We here describe the development of such a novel tool that allows to bind these data bases together in a distributed fashion for which the control and data remains with the cohorts rather than classic data mergers.METHODS: As proof-of-concept we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01, and between 180 and 195 cm of height, and how many samples or resistance tests there would be available for these patients, The queries were uploaded with the tool to a central web server from which each participating cohort downloaded the queries with the tool and ran them against their database, The numbers gathered were then submitted back to the server and we could accumulate the number of available samples and resistance tests.RESULTS: We obtained the following results from the cohorts on available samples/resistance test: EuResist: not availableI11,194; EuroSIDA: 20,71611,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,78311,485, In total, 78,552 samples and 15,473 resistance tests were available amongst these five cohorts. Once these data items have been identified, it is trivial to generate lists of relevant samples that would be usefuI for ultra deep sequencing in addition to the already available resistance tests, Saon the tool will include small analysis packages that allow each cohort to pull a report on their cohort profile and also survey emerging resistance trends in their own cohort,CONCLUSIONS: We plan on providing this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) and will provide the tool free of charge to others for any non-commercial use, The potential of this tool is to ease collaborations, that is, in projects requiring data to speed up identification of novel resistance mutations by increasing the number of observations across multiple cohorts instead of awaiting single cohorts or studies to reach the critical number needed to address such issues.
Resumo:
The World Wide Web, the world¿s largest resource for information, has evolved from organizing information using controlled, top-down taxonomies to a bottom up approach that emphasizes assigning meaning to data via mechanisms such as the Social Web (Web 2.0). Tagging adds meta-data, (weak semantics) to the content available on the web. This research investigates the potential for repurposing this layer of meta-data. We propose a multi-phase approach that exploits user-defined tags to identify and extract domain-level concepts. We operationalize this approach and assess its feasibility by application to a publicly available tag repository. The paper describes insights gained from implementing and applying the heuristics contained in the approach, as well as challenges and implications of repurposing tags for extraction of domain-level concepts.
Resumo:
La gouvernance de l'Internet est une thématique récente dans la politique mondiale. Néanmoins, elle est devenue au fil des années un enjeu économique et politique important. La question a même pris une importance particulière au cours des derniers mois en devenant un sujet d'actualité récurrent. Forte de ce constat, c ette recherche retrace l'histoire de la gouvernance de l'Internet depuis son émergence comme enjeu politique dans les années 1980 jusqu'à la fin du Sommet Mondial sur la Société de l'Information (SMSI) en 2005. Plutôt que de se focaliser sur l'une ou l'autre des institutions impliquées dans la régulation du réseau informatique mondial, cette recherche analyse l'émergence et l'évolution historique d'un espace de luttes rassemblant un nombre croissant d'acteurs différents. Cette évolution est décrite à travers le prisme de la relation dialectique entre élites et non-élites et de la lutte autour de la définition de la gouvernance de l'Internet. Cette thèse explore donc la question de comment les relations au sein des élites de la gouvernance de l'Internet et entre ces élites et les non-élites expliquent l'emergence, l'évolution et la structuration d'un champ relativement autonome de la politique mondiale centré sur la gouvernance de l'Internet. Contre les perspectives dominantes réaliste et libérales, cette recherche s'ancre dans une approche issue de la combinaison des traditions hétérodoxes en économie politique internationale et des apports de la sociologie politique internationale. Celle-ci s'articule autour des concepts de champ, d'élites et d'hégémonie. Le concept de champ, développé par Bourdieu inspire un nombre croissant d'études de la politique mondiale. Il permet à la fois une étude différenciée de la mondialisation et l'émergence d'espaces de lutte et de domination au niveau transnational. La sociologie des élites, elle, permet une approche pragmatique et centrée sur les acteurs des questions de pouvoir dans la mondialisation. Cette recherche utilise plus particulièrement le concept d'élite du pouvoir de Wright Mills pour étudier l'unification d'élites a priori différentes autour de projets communs. Enfin, cette étude reprend le concept néo-gramscien d'hégémonie afin d'étudier à la fois la stabilité relative du pouvoir d'une élite garantie par la dimension consensuelle de la domination, et les germes de changement contenus dans tout ordre international. A travers l'étude des documents produits au cours de la période étudiée et en s'appuyant sur la création de bases de données sur les réseaux d'acteurs, cette étude s'intéresse aux débats qui ont suivi la commercialisation du réseau au début des années 1990 et aux négociations lors du SMSI. La première période a abouti à la création de l'Internet Corporation for Assigned Names and Numbers (ICANN) en 1998. Cette création est le résultat de la recherche d'un consensus entre les discours dominants des années 1990. C'est également le fruit d'une coalition entre intérêts au sein d'une élite du pouvoir de la gouvernance de l'Internet. Cependant, cette institutionnalisation de l'Internet autour de l'ICANN excluait un certain nombre d'acteurs et de discours qui ont depuis tenté de renverser cet ordre. Le SMSI a été le cadre de la remise en cause du mode de gouvernance de l'Internet par les États exclus du système, des universitaires et certaines ONG et organisations internationales. C'est pourquoi le SMSI constitue la seconde période historique étudiée dans cette thèse. La confrontation lors du SMSI a donné lieu à une reconfiguration de l'élite du pouvoir de la gouvernance de l'Internet ainsi qu'à une redéfinition des frontières du champ. Un nouveau projet hégémonique a vu le jour autour d'éléments discursifs tels que le multipartenariat et autour d'insitutions telles que le Forum sur la Gouvernance de l'Internet. Le succès relatif de ce projet a permis une stabilité insitutionnelle inédite depuis la fin du SMSI et une acceptation du discours des élites par un grand nombre d'acteurs du champ. Ce n'est que récemment que cet ordre a été remis en cause par les pouvoirs émergents dans la gouvernance de l'Internet. Cette thèse cherche à contribuer au débat scientifique sur trois plans. Sur le plan théorique, elle contribue à l'essor d'un dialogue entre approches d'économie politique mondiale et de sociologie politique internationale afin d'étudier à la fois les dynamiques structurelles liées au processus de mondialisation et les pratiques localisées des acteurs dans un domaine précis. Elle insiste notamment sur l'apport de les notions de champ et d'élite du pouvoir et sur leur compatibilité avec les anlayses néo-gramsciennes de l'hégémonie. Sur le plan méthodologique, ce dialogue se traduit par une utilisation de méthodes sociologiques telles que l'anlyse de réseaux d'acteurs et de déclarations pour compléter l'analyse qualitative de documents. Enfin, sur le plan empirique, cette recherche offre une perspective originale sur la gouvernance de l'Internet en insistant sur sa dimension historique, en démontrant la fragilité du concept de gouvernance multipartenaire (multistakeholder) et en se focalisant sur les rapports de pouvoir et les liens entre gouvernance de l'Internet et mondialisation. - Internet governance is a recent issue in global politics. However, it gradually became a major political and economic issue. It recently became even more important and now appears regularly in the news. Against this background, this research outlines the history of Internet governance from its emergence as a political issue in the 1980s to the end of the World Summit on the Information Society (WSIS) in 2005. Rather than focusing on one or the other institution involved in Internet governance, this research analyses the emergence and historical evolution of a space of struggle affecting a growing number of different actors. This evolution is described through the analysis of the dialectical relation between elites and non-elites and through the struggle around the definition of Internet governance. The thesis explores the question of how the relations among the elites of Internet governance and between these elites and non-elites explain the emergence, the evolution, and the structuration of a relatively autonomous field of world politics centred around Internet governance. Against dominant realist and liberal perspectives, this research draws upon a cross-fertilisation of heterodox international political economy and international political sociology. This approach focuses on concepts such as field, elites and hegemony. The concept of field, as developed by Bourdieu, is increasingly used in International Relations to build a differentiated analysis of globalisation and to describe the emergence of transnational spaces of struggle and domination. Elite sociology allows for a pragmatic actor-centred analysis of the issue of power in the globalisation process. This research particularly draws on Wright Mill's concept of power elite in order to explore the unification of different elites around shared projects. Finally, this thesis uses the Neo-Gramscian concept of hegemony in order to study both the consensual dimension of domination and the prospect of change contained in any international order. Through the analysis of the documents produced within the analysed period, and through the creation of databases of networks of actors, this research focuses on the debates that followed the commercialisation of the Internet throughout the 1990s and during the WSIS. The first time period led to the creation of the Internet Corporation for Assigned Names and Numbers (ICANN) in 1998. This creation resulted from the consensus-building between the dominant discourses of the time. It also resulted from the coalition of interests among an emerging power elite. However, this institutionalisation of Internet governance around the ICANN excluded a number of actors and discourses that resisted this mode of governance. The WSIS became the institutional framework within which the governance system was questioned by some excluded states, scholars, NGOs and intergovernmental organisations. The confrontation between the power elite and counter-elites during the WSIS triggered a reconfiguration of the power elite as well as a re-definition of the boundaries of the field. A new hegemonic project emerged around discursive elements such as the idea of multistakeholderism and institutional elements such as the Internet Governance Forum. The relative success of the hegemonic project allowed for a certain stability within the field and an acceptance by most non-elites of the new order. It is only recently that this order began to be questioned by the emerging powers of Internet governance. This research provides three main contributions to the scientific debate. On the theoretical level, it contributes to the emergence of a dialogue between International Political Economy and International Political Sociology perspectives in order to analyse both the structural trends of the globalisation process and the located practices of actors in a given issue-area. It notably stresses the contribution of concepts such as field and power elite and their compatibility with a Neo-Gramscian framework to analyse hegemony. On the methodological level, this perspective relies on the use of mixed methods, combining qualitative content analysis with social network analysis of actors and statements. Finally, on the empirical level, this research provides an original perspective on Internet governance. It stresses the historical dimension of current Internet governance arrangements. It also criticise the notion of multistakeholde ism and focuses instead on the power dynamics and the relation between Internet governance and globalisation.