979 results for Distributional semantics
Abstract:
In this work, we study the use of distributional semantics and machine learning to improve statistical machine translation. To this end, we propose a machine learning model based on logistic regression to dynamically model the translation probability of phrases (word sequences). We prove that the proposed model is a generalization of the standard translation probabilities used in statistical machine translation, and we use it to incorporate contextual and distributional semantic information through lexical features, word clusters and word embeddings (vector representations of words). In addition, we explore a second approach to incorporating distributional semantic knowledge into statistical machine translation: using bilingual word embeddings to model the similarity between phrase translations. Our experiments demonstrate the usefulness of the proposed models, yielding promising results over a strong baseline system. Our work also makes significant contributions to bilingual mappings of word embeddings and to phrase similarity measures based on word embeddings, which have value in their own right for distributional semantics beyond machine translation.
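As a hedged illustration of the generalization claim, the Python sketch below assumes the logistic regression model is a softmax (multinomial logistic) over the candidate target phrases of one source phrase. With one indicator feature per phrase pair whose weight is the log of the pair's count, it reproduces the usual relative-frequency translation probability; additional context features (here a hypothetical `ctx=cue`) then shift the probabilities dynamically. The second function scores phrase similarity as the cosine of averaged bilingual word vectors, one simple measure among many. The counts, vectors and feature names are invented for illustration and are not taken from the thesis.

```python
import math


def translation_probs(candidates, features, weights):
    """Softmax (multinomial logistic) model of p(target phrase | source phrase, context).

    candidates: candidate target phrases for one source phrase
    features:   dict mapping each candidate to {feature_name: value}
    weights:    dict {feature_name: weight}; unseen features get weight 0
    """
    scores = {
        t: sum(weights.get(f, 0.0) * v for f, v in features[t].items())
        for t in candidates
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def phrase_similarity(src_words, tgt_words, src_vecs, tgt_vecs):
    """Cosine similarity of the averaged word vectors of two phrases.

    Assumes src_vecs and tgt_vecs already share one bilingual embedding space.
    """
    def mean_vec(words, vecs):
        total = [0.0] * len(vecs[words[0]])
        for w in words:
            for i, x in enumerate(vecs[w]):
                total[i] += x
        return [x / len(words) for x in total]

    a = mean_vec(src_words, src_vecs)
    b = mean_vec(tgt_words, tgt_vecs)
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return sum(x * y for x, y in zip(a, b)) / norm


# Hypothetical counts of English translations of the Basque phrase "etxea".
counts = {"the house": 6, "home": 3, "the building": 1}
cands = list(counts)
# One indicator feature per (source, target) pair; setting its weight to the
# log count makes the softmax reproduce the relative-frequency estimate.
feats = {t: {f"pair=etxea|{t}": 1.0} for t in cands}
w = {f"pair=etxea|{t}": math.log(c) for t, c in counts.items()}
probs = translation_probs(cands, feats, w)  # approx. 0.6, 0.3, 0.1

# A hypothetical context feature then moves probability mass towards "home".
feats["home"]["ctx=cue"] = 1.0
w["ctx=cue"] = 1.5
probs_in_context = translation_probs(cands, feats, w)
```

In a real system the weights would be learned from data rather than set by hand; fixing them to log counts here only shows that the relative-frequency estimate is a special case of the model.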
Abstract:
DIANA is a coordinated project involving the Natural Language Engineering and Pattern Recognition group (Ingeniería del Lenguaje Natural y Reconocimiento de Formas, ELiRF) of the Universitat Politècnica de València and the Centre de Llenguatge i Computació (CLiC) group of the Universitat de Barcelona. It is an R&D programme project (TIN2012-38603) funded by the Spanish Ministry of Economy and Competitiveness (Ministerio de Economía y Competitividad). Paolo Rosso coordinates the DIANA project and leads the DIANA-Applications subproject, and M. Antònia Martí leads the DIANA-Constructions subproject.
Abstract:
1. Species distribution modelling is increasingly used in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data-sharing initiatives involving species occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately.
2. We evaluated how uncertainty in georeferences, and the associated locational error in occurrences, influences species distribution modelling using two treatments: (1) a control treatment, in which models were calibrated with the original, accurate data, and (2) an error treatment, in which data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we shifted each coordinate by a random number drawn from a normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions.
3. Locational error in occurrences reduced model performance in three of these regions; nevertheless, relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, performed best in the face of locational error: the results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors.
4. Synthesis and applications. To use the vast array of occurrence data that currently exists for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
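The error treatment can be sketched in Python as follows. This is a minimal sketch, not the study's code: it assumes occurrences are stored as (x, y) coordinates in kilometres in a projected coordinate system (latitude/longitude would need projecting first) and that x and y are each shifted by an independent draw from a normal distribution with mean 0 and standard deviation 5 km. The function name and sample points are hypothetical.

```python
import random


def degrade_occurrences(points, sd_km=5.0, seed=None):
    """Simulate locational error by jittering occurrence records.

    points: list of (x_km, y_km) tuples in a projected, metric coordinate system
    sd_km:  standard deviation of the normal error (5 km in the study)
    seed:   optional seed so that a degraded data set can be reproduced
    """
    rng = random.Random(seed)
    # Each coordinate gets its own independent N(0, sd_km) offset (an assumption).
    return [
        (x + rng.gauss(0.0, sd_km), y + rng.gauss(0.0, sd_km))
        for x, y in points
    ]


# Hypothetical control data set and its degraded counterpart.
control = [(100.0, 250.0), (102.5, 248.0), (340.0, 80.0)]
error_treatment = degrade_occurrences(control, sd_km=5.0, seed=42)
```

Models would then be calibrated once on `control` and once on `error_treatment`, and their evaluation scores compared.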
Distributional Issues in Regulatory Policy Implementation: The Case of Air Quality Control Policies
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. We also explain the close relationship between spectral mapping and correspondence analysis, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
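A minimal NumPy sketch of weighted log-ratio analysis follows. It assumes, as in correspondence analysis, that the row and column weights default to the table's margins; the log table is double-centred with those weights and a weighted singular value decomposition gives the map coordinates. The example table is invented. Its first two columns are proportional, so merging them leaves the row geometry unchanged, which is the principle of distributional equivalence described above.

```python
import numpy as np


def weighted_lra(N, row_w=None, col_w=None):
    """Weighted log-ratio analysis (spectral mapping) of a strictly positive table.

    Returns row principal coordinates F, column standard coordinates G and
    the singular values. Weights default to the row and column margins.
    """
    N = np.asarray(N, dtype=float)
    if np.any(N <= 0):
        raise ValueError("log-ratio analysis needs strictly positive data")
    P = N / N.sum()
    r = P.sum(axis=1) if row_w is None else np.asarray(row_w, dtype=float)
    c = P.sum(axis=0) if col_w is None else np.asarray(col_w, dtype=float)
    r, c = r / r.sum(), c / c.sum()  # weights must sum to 1 for the centring below
    L = np.log(N)
    L = L - (L @ c)[:, None]  # subtract each row's weighted mean over columns
    L = L - (r @ L)[None, :]  # subtract each column's weighted mean over rows
    S = np.sqrt(r)[:, None] * L * np.sqrt(c)[None, :]
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]  # row principal coordinates
    G = Vt.T / np.sqrt(c)[:, None]      # column standard coordinates
    return F, G, sv


# Hypothetical table whose column 1 is exactly twice column 0.
N = np.array([[10.0, 20.0, 5.0, 8.0],
              [4.0, 8.0, 9.0, 3.0],
              [6.0, 12.0, 2.0, 7.0]])
F, G, sv = weighted_lra(N)
# Merging the two proportional columns should not change the row geometry.
merged = np.column_stack([N[:, 0] + N[:, 1], N[:, 2:]])
F_m, G_m, sv_m = weighted_lra(merged)
```

Comparing `F @ F.T` with `F_m @ F_m.T`, rather than the coordinates themselves, avoids spurious differences caused by the arbitrary signs of singular vectors.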
Abstract:
Here, we investigate whether the specificity of the specialized lure-and-trap pollination antagonism involving the widespread European Arum maculatum and its associated Psychodid pollinators is constant across geographical space. Previous studies agreed that a single insect species, Psychoda phalaenoides, efficiently cross-pollinates the plant; however, this research was conducted only locally, in western Europe. In this study, we characterize for the first time the composition of flower visitors across the entire distribution range of A. maculatum by intensively collecting plants and insects throughout the European continent. We further correlate local climatic characteristics with the community composition of visiting arthropods. Our results show that flowers are generally visited by P. phalaenoides females, but not over the whole distribution range of the plant. In some regions this fly species is less frequent or even absent, and another species, Psychoda grisescens, becomes the prevailing visitor. This variability is geographically structured and can be explained by climatic factors: the proportion of P. grisescens increases with higher annual precipitation and lower precipitation in the warmest quarter, two characteristics typical of the Mediterranean zone. Climate thus seems to drive the specificity of this interaction, potentially by affecting the phenology of one or both interacting species, or even the plant's production of volatiles and heat. This result therefore challenges the assumed specificity of other presumably one-to-one interactions covering wide distribution ranges, and provides an example of the direct effect that the abiotic environment can have on the fate of plant-insect interactions.
Abstract:
Low concentrations of elements in geochemical analyses have the peculiarity of being compositional data and, for a given level of significance, are likely to be beyond laboratories' ability to distinguish minute concentrations from complete absence, which prevents laboratories from reporting extremely low concentrations of the analyte. Instead, what is reported is the detection limit, which is the minimum concentration that conclusively differentiates between presence and absence of the element. A spatially distributed exhaustive sample is employed in this study to generate unbiased sub-samples, which are further censored to observe the effect that different detection limits and sample sizes have on the inference of population distributions starting from geochemical analyses having specimens below the detection limit (nondetects). The isometric log-ratio transformation is used to convert the compositional data in the simplex to samples in real space, thus allowing the practitioner to properly borrow from the large body of statistical techniques valid only in real space. The bootstrap method is used to numerically investigate the reliability of inferring several distributional parameters using different forms of imputation for the censored data. The case study illustrates that, in general, the best results are obtained when imputations are made using the distribution that best fits the readings above the detection limit, and it exposes the problems of other, more widely used practices. When the sample is spatially correlated, it is necessary to combine the bootstrap with stochastic simulation.
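The isometric log-ratio step can be sketched in Python with pivot coordinates, one of many valid orthonormal bases; the abstract does not say which basis the study used, so this choice is an assumption. Because the logarithm of zero is undefined, nondetects must be imputed before transforming. The concentrations below are hypothetical. The Euclidean distance between ilr coordinates equals the distance between centred log-ratio (clr) vectors, which is what makes the transformation isometric.

```python
import math


def _logs(x):
    """Natural logs of strictly positive parts; nondetects must be imputed first."""
    if any(v <= 0 for v in x):
        raise ValueError("impute nondetects (zeros) before log-ratio transforming")
    return [math.log(v) for v in x]


def clr(x):
    """Centred log-ratio transform: each log minus the mean of the logs."""
    logs = _logs(x)
    m = sum(logs) / len(logs)
    return [v - m for v in logs]


def ilr(x):
    """Isometric log-ratio transform via pivot coordinates (D parts -> D-1 coordinates)."""
    logs = _logs(x)
    z = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]  # parts not yet used as a pivot
        k = len(rest)
        # Balance of part i against the geometric mean of the remaining parts.
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z


# Hypothetical 4-part composition in which 0.02 is an imputed nondetect.
sample = [0.02, 0.5, 12.0, 87.48]
coords = ilr(sample)  # three unconstrained real-valued coordinates
```

Standard real-space tools, such as the bootstrap, can then be applied to `coords`; results can be mapped back to the simplex with the inverse transformation.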
Abstract:
This thesis addresses how the hierarchy and heterogeneity of river systems affect the structure of benthic diatom communities. At the regional level, we searched for distinct groups of sampling sites and their indicator species, studied the response of diatom communities to environmental gradients, evaluated the usefulness of different diatom indices and sought the best classification system for reference conditions. At the catchment level, the aim was to identify the factors that determine the longitudinal distribution of diatom community diversity. Finally, at the habitat level, we determined which factors affect algae and cyanobacteria at this scale and examined the relative contributions of environment and space to the distribution of algal and cyanobacterial biomass and composition. Accordingly, the chapters of this thesis are organized following this scheme.
Abstract:
Aim: To evaluate whether observed geographical shifts in the distribution of the blue-winged macaw (Primolius maracana) are related to ongoing global climate change. This species is vulnerable to extinction and has shown striking range retractions in recent decades, withdrawing broadly from the southern portions of its historical distribution. Its range reduction has generally been attributed to habitat loss; however, because the species has also disappeared from large forested areas, other factors that may act in concert merit consideration.
Location: Historical distribution of the blue-winged macaw in Brazil, eastern Paraguay and northern Argentina.
Methods: We used a correlative approach to test the hypothesis that the observed shifts were caused by a climate-change-mediated reduction of habitable areas. We developed models of the ecological niche requirements of the blue-winged macaw, based on point-occurrence data and climate scenarios for the pre-1950 and post-1950 periods, and tested the models' ability to predict geographical distributions within each time period. We then projected each model onto the other time period and compared the distributions predicted under the two climate scenarios to assess shifts of habitable areas across decades and to evaluate this explanation for the observed range retractions.
Results: Differences between the predicted distributions of the blue-winged macaw over the twentieth century were generally minor, and no change in landscape suitability between time periods was predicted across large areas of the species' original range. No tendency towards range retraction in the south was predicted; rather, conditions in the southern part of the species' range tended to improve for the species.
Main conclusions: Our test allowed us to eliminate climate change as a likely explanation for the observed shifts in the distribution of the blue-winged macaw, and points instead to other causal explanations (e.g. changing regional land use, emerging diseases).
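The final comparison step can be sketched in Python. This is a hedged sketch that assumes each niche model's output has already been thresholded to binary suitable/unsuitable predictions on a common grid; the grid, latitudes and the -25 degree cut-off are hypothetical. Counting cells that stay suitable, are lost or are gained, optionally only south of a given latitude, is one simple way to check for the kind of southern range retraction the study tested for.

```python
def compare_ranges(pre, post, lats=None, south_of=None):
    """Tally grid-cell changes in predicted suitability between two periods.

    pre, post: equal-length sequences of 0/1 predictions (1 = suitable) for the
               same cells under the pre-1950 and post-1950 climate scenarios
    lats:      optional cell latitudes (negative = south of the equator)
    south_of:  if given, only cells with latitude below this value are counted
    """
    if len(pre) != len(post):
        raise ValueError("both predictions must cover the same cells")
    if south_of is not None and (lats is None or len(lats) != len(pre)):
        raise ValueError("a latitude is needed for every cell when south_of is set")
    tally = {"stable_suitable": 0, "stable_unsuitable": 0, "gained": 0, "lost": 0}
    for i, (before, after) in enumerate(zip(pre, post)):
        if south_of is not None and lats[i] >= south_of:
            continue  # skip cells outside the southern band of interest
        if before and after:
            tally["stable_suitable"] += 1
        elif before:
            tally["lost"] += 1
        elif after:
            tally["gained"] += 1
        else:
            tally["stable_unsuitable"] += 1
    return tally


# Hypothetical predictions for six cells ordered from north to south.
pre_1950 = [1, 1, 0, 0, 1, 0]
post_1950 = [1, 0, 1, 0, 1, 1]
latitudes = [-10.0, -15.0, -20.0, -28.0, -30.0, -32.0]
whole_range = compare_ranges(pre_1950, post_1950)
southern = compare_ranges(pre_1950, post_1950, latitudes, south_of=-25.0)
```

A southern band with more gained than lost cells, as in this toy example, would be consistent with improving rather than deteriorating climatic conditions there.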
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)