962 results for Data quality problems
Abstract:
Background and purpose: Survey data quality is a combination of the representativeness of the sample, the accuracy and precision of measurements, and data processing and management, with several subcomponents in each. The purpose of this paper is to show how, in the final risk factor surveys of the WHO MONICA Project, information on data quality was obtained, quantified, and used in the analysis. Methods and results: In the WHO MONICA (Multinational MONItoring of trends and determinants in CArdiovascular disease) Project, the information about the data quality components was documented in retrospective quality assessment reports. On the basis of the documented information and the survey data, the quality of each data component was assessed and summarized using quality scores. The quality scores were used in sensitivity testing of the results, both by excluding populations with low quality scores and by weighting the data by their quality scores. Conclusions: Detailed documentation of all survey procedures, with standardized protocols, training, and quality control, are steps towards optimizing data quality. Quantifying data quality is a further step. The methods used in the WHO MONICA Project could be adopted to improve quality in other health surveys.
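The two sensitivity tests described (excluding low-score populations, and weighting by quality score) can be sketched in a few lines. This is a minimal illustration, not the MONICA implementation; the population names, estimates, and the 0-2 score scale are invented for the example.

```python
# Sketch of quality-score sensitivity testing: recompute a summary
# statistic after (a) excluding low-quality populations and
# (b) weighting each population by its quality score.
# Population names and all numbers are illustrative, not MONICA data.

def weighted_mean(values, weights):
    total_w = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_w

populations = {
    # name: (risk-factor estimate, quality score on a hypothetical 0-2 scale)
    "pop_a": (5.2, 2.0),
    "pop_b": (5.8, 1.0),
    "pop_c": (6.4, 0.5),  # low quality
}

values = [v for v, _ in populations.values()]
scores = [q for _, q in populations.values()]

# Unweighted mean over all populations
overall = sum(values) / len(values)

# Sensitivity test 1: exclude populations below a quality threshold
kept = [v for v, q in populations.values() if q >= 1.0]
excluded_mean = sum(kept) / len(kept)

# Sensitivity test 2: weight each estimate by its quality score
weighted = weighted_mean(values, scores)

print(round(overall, 3), round(excluded_mean, 3), round(weighted, 3))
```

If the conclusions are stable across all three variants, the results are robust to data quality differences between populations.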
Abstract:
Even when data repositories exhibit near-perfect data quality, users may formulate queries that do not correspond to the information requested. Users' poor information retrieval performance may arise either from problems understanding the data models that represent the real-world systems, or from their query skills. This research focuses on users' understanding of the data structures, i.e., their ability to map the information request onto the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users' performance when undertaking component-, record-, and aggregate-level tasks. For the hypotheses associated with different representations but equivalent semantics, the results indicate that participants using the parsimonious data model performed better on component-level tasks, while participants using the ontologically clearer data model performed better on record- and aggregate-level tasks.
Abstract:
In order to survive in an increasingly customer-oriented marketplace, continuous quality improvement is a hallmark of the fastest-growing quality organizations. In recent years, attention has been focused on intelligent systems, which have shown great promise in supporting quality control. However, only a small number of the currently used systems are reported to be operating effectively, because they are designed to maintain a quality level within the specified process rather than to focus on cooperation within the production workflow. This paper proposes an intelligent system with a newly designed algorithm and the universal process data exchange standard to overcome the challenges of demanding customers who seek high-quality and low-cost products. The intelligent quality management system is equipped with a distributed process mining feature to provide employees at all levels with the ability to understand the relationships between processes, especially when any aspect of the process is going to degrade or fail. An example of generalized fuzzy association rules is applied in the manufacturing sector to demonstrate how the proposed iterative process mining algorithm finds the relationships between distributed process parameters and the presence of quality problems.
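To make the fuzzy-association-rule idea concrete, the standard fuzzy support and confidence measures can be computed from membership degrees. This is a generic sketch of those measures, not the paper's iterative algorithm; the parameter names and membership values are invented.

```python
# Minimal sketch of evaluating a fuzzy association rule between a
# process parameter and a quality problem.  Membership degrees are
# illustrative assumptions, not data from the paper.

def fuzzy_support(antecedent, consequent):
    """Fuzzy support: mean of min(mu_A, mu_B) over all records."""
    pairs = list(zip(antecedent, consequent))
    return sum(min(a, b) for a, b in pairs) / len(pairs)

def fuzzy_confidence(antecedent, consequent):
    """Fuzzy confidence: sum of min(mu_A, mu_B) / sum of mu_A."""
    num = sum(min(a, b) for a, b in zip(antecedent, consequent))
    den = sum(antecedent)
    return num / den

# Membership degrees per production record, e.g. "temperature is high"
# (antecedent) and "defect rate is high" (consequent).
mu_temp_high = [0.9, 0.7, 0.2, 0.8, 0.1]
mu_defect_high = [0.8, 0.6, 0.1, 0.9, 0.0]

support = fuzzy_support(mu_temp_high, mu_defect_high)
confidence = fuzzy_confidence(mu_temp_high, mu_defect_high)
print(round(support, 2), round(confidence, 2))
```

A rule is retained when both measures exceed user-set thresholds; a high-confidence rule linking a parameter to defects is exactly the kind of process relationship the proposed system surfaces to employees.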
Abstract:
Substantial altimetry datasets collected by different satellites have only become available during the past five years, but the future will bring a variety of new altimetry missions, both parallel and consecutive in time. The characteristics of each produced dataset vary with the different orbital heights and inclinations of the spacecraft, as well as with the technical properties of the radar instrument. An integral analysis of datasets with different properties offers advantages both in terms of data quantity and data quality. This thesis is concerned with the development of the means for such integral analysis, in particular for dynamic solutions in which precise orbits for the satellites are computed simultaneously. The first half of the thesis discusses the theory and numerical implementation of dynamic multi-satellite altimetry analysis. The most important aspect of this analysis is the application of dual satellite altimetry crossover points as a bi-directional tracking data type in simultaneous orbit solutions. The central problem is that the spatial and temporal distributions of the crossovers are in conflict with the time-organised nature of traditional solution methods. Their application to the adjustment of the orbits of both satellites involved in a dual crossover therefore requires several fundamental changes of the classical least-squares prediction/correction methods. The second part of the thesis applies the developed numerical techniques to the problems of precise orbit computation and gravity field adjustment, using the altimetry datasets of ERS-1 and TOPEX/Poseidon. Although the two datasets can be considered less compatible than those of planned future satellite missions, the obtained results adequately illustrate the merits of a simultaneous solution technique.
In particular, the geographically correlated orbit error is partially observable from a dataset consisting of crossover differences between two sufficiently different altimetry datasets, while being unobservable from the analysis of altimetry data of both satellites individually. This error signal, which has a substantial gravity-induced component, can be employed advantageously in simultaneous solutions for the two satellites in which also the harmonic coefficients of the gravity field model are estimated.
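The key property of a crossover difference can be illustrated in a few lines: at a crossover point both satellites observe the same sea surface, so differencing the two altimetric heights cancels the surface and leaves only the difference of the radial orbit errors. This is a deliberately simplified sketch (noise, tides, and time-variable sea state are omitted) with invented numbers.

```python
# Illustrative sketch of why dual-satellite crossover differences
# expose orbit error: at a crossover point the true sea-surface
# height cancels, leaving the difference of the two radial orbit
# errors.  All numbers are invented for illustration.

true_ssh = 31.42          # metres; same surface seen by both satellites
orbit_err_sat1 = 0.12     # radial orbit error of satellite 1 (metres)
orbit_err_sat2 = -0.05    # radial orbit error of satellite 2 (metres)

# Altimetric height = true surface + orbit error (noise omitted)
h1 = true_ssh + orbit_err_sat1
h2 = true_ssh + orbit_err_sat2

# The crossover difference no longer contains true_ssh at all
xover_diff = h1 - h2
print(round(xover_diff, 2))
```

In a single-satellite crossover the geographically correlated part of the orbit error largely cancels as well; with two sufficiently different orbits it does not, which is why the dual crossover difference makes that error component partially observable.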
Abstract:
The evaluation of geospatial data quality and trustworthiness presents a major challenge to geospatial data users when making a dataset selection decision. The research presented here therefore focused on defining and developing a GEO label: a decision support mechanism to assist data users in efficient and effective geospatial dataset selection on the basis of quality, trustworthiness and fitness for use. This thesis thus presents six phases of research and development conducted to: (1) identify the informational aspects upon which users rely when assessing geospatial dataset quality and trustworthiness; (2) elicit initial user views on the GEO label's role in supporting dataset comparison and selection; (3) evaluate prototype label visualisations; (4) develop a Web service to support GEO label generation; (5) develop a prototype GEO label-based dataset discovery and intercomparison decision support tool; and (6) evaluate the prototype tool in a controlled human-subject study. The results of the studies revealed, and subsequently confirmed, eight geospatial data informational aspects that users considered important when evaluating geospatial dataset quality and trustworthiness, namely: producer information, producer comments, lineage information, compliance with standards, quantitative quality information, user feedback, expert reviews, and citations information. Following an iterative user-centred design (UCD) approach, it was established that the GEO label should visually summarise the availability of, and allow interrogation of, these key informational aspects. A Web service was developed to support generation of dynamic GEO label representations and was integrated into a number of real-world GIS applications. The service was also utilised in the development of the GEO LINC tool, a GEO label-based dataset discovery and intercomparison decision support tool.
The results of the final evaluation study indicated that (a) the GEO label effectively communicates the availability of dataset quality and trustworthiness information and (b) GEO LINC successfully facilitates 'at a glance' dataset intercomparison and fitness-for-purpose-based dataset selection.
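The core of the label is a per-aspect availability summary over the eight informational aspects listed above. The following is a hedged sketch of that idea only; the metadata field names and record contents are assumptions, not the thesis's Web service API.

```python
# Sketch of a GEO-label-style availability summary: for each of the
# eight informational aspects identified in the studies, report
# whether a metadata record provides it.  Field names and the sample
# record are invented for illustration.

ASPECTS = [
    "producer_information", "producer_comments", "lineage_information",
    "compliance_with_standards", "quantitative_quality_information",
    "user_feedback", "expert_reviews", "citations_information",
]

def label_summary(metadata):
    """Map each aspect to 'available' or 'not available'."""
    return {a: ("available" if metadata.get(a) else "not available")
            for a in ASPECTS}

record = {
    "producer_information": "Hypothetical mapping agency",
    "lineage_information": "Derived from a 2019 aerial survey",
    "user_feedback": ["useful for regional planning"],
}

summary = label_summary(record)
available = sum(1 for v in summary.values() if v == "available")
print(available, "of", len(ASPECTS), "aspects available")
```

A visual label would render one facet per aspect, letting users compare candidate datasets at a glance before drilling into the underlying metadata.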
Abstract:
The speed with which data has moved from being scarce, expensive and valuable, justifying detailed and careful verification and analysis, to a situation where streams of detailed data are almost too large to handle has caused a series of shifts to occur. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. The capacity to harness massive quantities of existing data has driven Big Data applications until recently. Now data flows in real time are rising swiftly, becoming more invasive and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often quite remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as theirs. There have been many changes in ICT that have led to stresses in resolving the conflicts between IP exploiters and their customers, but this one is of a different scale due to the wide potential for individual customisation of pricing, identification, and the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift in the current confusion over who owns what data into alignments that better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer/subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits to all that are clearly available if these issues can be positively resolved. The customers need to make these data flows contestable in some form. These Big Data flows are only going to grow and become ever more instructive.
A better balance is necessary. For the first time these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown. Yet the data gathered is not available to the subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons. This Web extra is the audio part of a video in which author Marcus Wigan expands on his article "Big Data's Big Unintended Consequences".
Abstract:
The term "Artificial Intelligence" has acquired a lot of baggage since its introduction, and in its current incarnation it is synonymous with Deep Learning. The sudden availability of data and computing resources has opened the gates to myriad applications. Not all are created equal, though, and problems may arise, especially in fields not closely related to the tasks pursued by the tech companies that spearheaded DL. The perspective of practitioners seems to be changing, however. Human-Centric AI emerged in the last few years as a new way of thinking about DL and AI applications from the ground up, with special attention to their relationship with humans. The goal is designing a system that can gracefully integrate into already established workflows, as in many real-world scenarios AI may not be good enough to completely replace humans. Often this replacement may even be unneeded or undesirable. Another important perspective comes from Andrew Ng, a DL pioneer, who recently started shifting the focus of development from better models towards better, and smaller, data. He calls this approach Data-Centric AI. Without downplaying the importance of pushing the state of the art in DL, we must recognize that if the goal is creating a tool for humans to use, more raw performance may not align with more utility for the final user. A Human-Centric approach is compatible with a Data-Centric one, and we find that the two overlap nicely when human expertise is used as the driving force behind data quality. This thesis documents a series of case studies where these approaches were employed, to different extents, to guide the design and implementation of intelligent systems. We found that human expertise proved crucial in improving datasets and models. The last chapter includes a slight deviation, with studies on the pandemic, while still preserving the human- and data-centric perspective.
Abstract:
Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes the effectiveness of the investigated techniques in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
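A classic distance-based noise filter in the spirit of the techniques evaluated works by flagging a training instance when the majority of its k nearest neighbours disagree with its label (the Edited Nearest Neighbour idea). The following is a generic sketch of that family of filters, not the paper's exact method; the toy coordinates stand in for expression profiles.

```python
# Distance-based noise detection: flag an instance as noisy when the
# majority of its k nearest neighbours carry a different class label.
# Data are toy values, not gene expression measurements.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def noisy_indices(X, y, k=3):
    flagged = []
    for i, xi in enumerate(X):
        # distances from instance i to every other instance
        dists = sorted(
            (euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [y[j] for _, j in dists[:k]]
        majority = max(set(neighbours), key=neighbours.count)
        if majority != y[i]:
            flagged.append(i)
    return flagged

X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9),
     (4.9, 5.1), (0.15, 0.1)]
y = ["a", "a", "a", "b", "b", "b", "b"]  # last instance is mislabelled

print(noisy_indices(X, y, k=3))
```

Instances returned by the filter are removed before training, which is how such pre-processing can raise the downstream classifiers' accuracy.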
Abstract:
Much information on the flavonoid content of Brazilian foods has already been obtained; however, this information is spread across scientific publications and non-published data. The objectives of this work were to compile and evaluate the quality of national flavonoid data according to the United States Department of Agriculture's Data Quality Evaluation System (USDA-DQES), with few modifications, for future dissemination in the TBCA-USP (Brazilian Food Composition Database). For the compilation, the most abundant compounds in the flavonoid subclasses were considered (flavonols, flavones, isoflavones, flavanones, flavan-3-ols, and anthocyanidins), and analysis of the compounds by HPLC was adopted as the criterion for data inclusion. The evaluation system considers five categories, and the maximum score assigned to each category is 20. To each data point, a confidence code (CC) was attributed (A, B, C or D), indicating the quality and reliability of the information. Flavonoid data (773) present in 197 Brazilian foods were evaluated. The CC "C" (average) was attributed to 99% of the data and "B" (above average) to 1%. The categories assigned low average scores were: number of samples, sampling plan, and analytical quality control (average scores of 2, 5 and 4, respectively). The analytical method category received an average score of 9. The category assigned the highest score was sample handling (average score of 20). These results show that researchers need to be aware of the importance of the number and plan of evaluated samples and of the complete description and documentation of all the processes of methodology execution and analytical quality control. (C) 2010 Elsevier Inc. All rights reserved.
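The scoring scheme described (five categories, each scored 0-20, summed and mapped to a confidence code) can be sketched as follows. The cut-off values below are illustrative assumptions, not the exact USDA-DQES thresholds.

```python
# Sketch of deriving a confidence code (CC) from five DQES category
# scores (each 0-20, so the quality index ranges 0-100).  The cut-off
# thresholds are illustrative assumptions, not the official USDA-DQES
# values.

def confidence_code(scores, thresholds=((75, "A"), (50, "B"), (25, "C"))):
    if len(scores) != 5 or any(not 0 <= s <= 20 for s in scores):
        raise ValueError("expected five category scores in 0-20")
    quality_index = sum(scores)  # 0-100
    for cutoff, code in thresholds:
        if quality_index >= cutoff:
            return code
    return "D"

# Category order: number of samples, sampling plan, sample handling,
# analytical method, analytical quality control -- values echo the
# average scores reported in the abstract (2, 5, 20, 9, 4).
print(confidence_code([2, 5, 20, 9, 4]))
```

With these illustrative thresholds, the reported average category scores land in the "C" band, consistent with 99% of the evaluated data receiving that code.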
Abstract:
The Brazilian Network of Food Data Systems (BRASILFOODS) has maintained the Brazilian Food Composition Database-USP (TBCA-USP) (http://www.fcf.usp.br/tabela) since 1998. Besides the constant compilation, analysis and update work on the database, the network tries to innovate through the introduction of food information that may contribute to decreasing the risk of non-transmissible chronic diseases, such as the profile of carbohydrates and flavonoids in foods. In 2008, data on individually analyzed carbohydrates in 112 foods, and 41 data points related to the glycemic response produced by foods widely consumed in the country, were included in the TBCA-USP. Data (773) on the different flavonoid subclasses in 197 Brazilian foods were compiled, and the quality of each data point was evaluated according to the USDA's data quality evaluation system. In 2007, BRASILFOODS/USP and INFOODS/FAO organized the 7th International Food Data Conference "Food Composition and Biodiversity". This conference was a unique opportunity for interaction between renowned researchers and participants from several countries, and it allowed the discussion of aspects that may improve the food composition area. During the period, the LATINFOODS Regional Technical Compilation Committee and BRASILFOODS disseminated to Latin America the Form and Manual for Data Compilation, version 2009, taught a Food Composition Data Compilation course, and developed many activities related to data production and compilation. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Arriving in Brisbane some six years ago, I could not help being impressed by what may be prosaically described as its atmospheric amenity resources. Perhaps this in part was due to my recent experiences in major urban centres in North America, but since that time, that sparkling quality and the blue skies seem to have progressively diminished. Unfortunately, there is also objective evidence available to suggest that this apparent deterioration is not merely the result of habituation of the senses. Air pollution data for the city show trends of increasing concentrations of those very substances that have destroyed the attractiveness of major population centres elsewhere, with climates initially as salubrious. Indeed, present figures indicate that photochemical smog in unacceptably high concentrations is rapidly becoming endemic also over Brisbane. These regrettable developments should come as no surprise. The society at large has not been inclined to respond purposefully to warnings of impending environmental problems, despite the experiences and publicity from overseas and even from other cities within Australia. Nor, up to the present, have certain politicians and government officials displayed stances beyond those necessary for the maintenance of a decorum of concern. At this stage, there still exists the possibility for meaningful government action without the embarrassment of losing political favour with the electorate. To the contrary, there is every chance that such action may be turned to advantage with increased public enlightenment. It would be more than a pity to miss perhaps the final remaining opportunity: Queensland is one of the few remaining places in the world with sufficient resources to permit both rational development and high environmental quality. 
The choice appears to be one of making a relatively minor investment now for a large financial and social gain in the near future, or permitting Brisbane to degenerate gradually into just another stagnated Los Angeles or Sydney. The present monograph attempts to introduce the problem by reviewing the available research on air quality in the Brisbane area. It also tries to elucidate some seemingly obvious, but so far unapplied, management approaches. By necessity, such a broad treatment needs to make inroads into extensive ranges of subject areas, from political and legal practices to public perceptions, and from scientific measurement and statistical analysis to the dynamics of air flow. Clearly, it does not pretend to be definitive in any of these fields, but it does try to emphasize those adjustable facets of the human use system of natural resources, too often neglected in favour of air pollution control technology. The crossing of disciplinary boundaries, however, needs no apology: air quality problems are ubiquitous, touching upon space, time and human interaction.
Abstract:
With the proliferation of relational database programs for PCs and other platforms, many business end-users are creating, maintaining, and querying their own databases. More importantly, business end-users use the output of these queries as the basis for operational, tactical, and strategic decisions. Inaccurate data reduce the expected quality of these decisions. Implementing various input validation controls, including higher levels of normalisation, can reduce the number of data anomalies entering the databases. Even in well-maintained databases, however, data anomalies will still accumulate. To improve the quality of data, databases can be queried periodically to locate and correct anomalies. This paper reports the results of two experiments that investigated the effects of different data structures on business end-users' abilities to detect data anomalies in a relational database. The results demonstrate that both unnormalised structures and levels of normalisation higher than first normal form lower the effectiveness and efficiency of queries relative to first normal form. First normal form databases appear to provide the most effective and efficient data structure for business end-users formulating queries to detect data anomalies.
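The kind of periodic anomaly-locating query the paper discusses can be sketched over a first-normal-form table: find key values whose rows disagree on an attribute that should be functionally determined by the key. This is a generic illustration with invented table contents, not the experiments' materials.

```python
# Sketch of an anomaly-detecting query over a 1NF table: find
# customer ids that map to more than one distinct city, violating the
# functional dependency customer_id -> city.  Rows are invented.

rows = [
    {"customer_id": 1, "name": "Ana",   "city": "Lisbon"},
    {"customer_id": 2, "name": "Bruno", "city": "Porto"},
    {"customer_id": 1, "name": "Ana",   "city": "Faro"},   # anomaly
]

def fd_violations(rows, key, attr):
    """Return key values whose rows disagree on attr (key -> attr)."""
    seen = {}
    for r in rows:
        seen.setdefault(r[key], set()).add(r[attr])
    return sorted(k for k, vals in seen.items() if len(vals) > 1)

print(fd_violations(rows, "customer_id", "city"))
```

In SQL terms this is a GROUP BY on the key with HAVING COUNT(DISTINCT attr) > 1; the paper's point is that end-users formulate such queries most successfully against first-normal-form structures.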
Abstract:
Many municipal activities require updated large-scale maps that include both topographic and thematic information. For this purpose, the efficient use of very high spatial resolution (VHR) satellite imagery suggests the development of approaches that enable a timely discrimination, counting and delineation of urban elements according to legal technical specifications and quality standards. Therefore, the nature of this data source and expanding range of applications calls for objective methods and quantitative metrics to assess the quality of the extracted information which go beyond traditional thematic accuracy alone. The present work concerns the development and testing of a new approach for using technical mapping standards in the quality assessment of buildings automatically extracted from VHR satellite imagery. Feature extraction software was employed to map buildings present in a pansharpened QuickBird image of Lisbon. Quality assessment was exhaustive and involved comparisons of extracted features against a reference data set, introducing cartographic constraints from scales 1:1000, 1:5000, and 1:10,000. The spatial data quality elements subject to evaluation were: thematic (attribute) accuracy, completeness, and geometric quality assessed based on planimetric deviation from the reference map. Tests were developed and metrics analyzed considering thresholds and standards for the large mapping scales most frequently used by municipalities. Results show that values for completeness varied with mapping scales and were only slightly superior for scale 1:10,000. Concerning the geometric quality, a large percentage of extracted features met the strict topographic standards of planimetric deviation for scale 1:10,000, while no buildings were compliant with the specification for scale 1:1000.
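Two of the quality elements used in the assessment, completeness and planimetric deviation against a scale-dependent tolerance, reduce to simple ratios. The sketch below illustrates both; the counts, deviations, and tolerance values are invented for the example and are not the study's figures.

```python
# Sketch of two spatial data quality measures from the assessment:
# completeness (share of reference buildings that were extracted) and
# the share of extracted buildings whose planimetric deviation from
# the reference map falls within a scale-dependent tolerance.
# All numbers and tolerances are illustrative assumptions.

def completeness(n_matched, n_reference):
    """Fraction of reference buildings with a matching extraction."""
    return n_matched / n_reference

def within_tolerance(deviations_m, tolerance_m):
    """Share of buildings whose planimetric deviation is acceptable."""
    ok = sum(1 for d in deviations_m if d <= tolerance_m)
    return ok / len(deviations_m)

deviations = [0.4, 1.1, 2.5, 0.9, 3.2]   # metres, one per building

print(round(completeness(82, 100), 2))
# Stricter tolerance at larger scales: hypothetical 0.5 m at 1:1000
# versus 2.0 m at 1:10,000.
print(round(within_tolerance(deviations, 0.5), 2))
print(round(within_tolerance(deviations, 2.0), 2))
```

The pattern mirrors the study's finding: far more extracted buildings satisfy the looser 1:10,000 tolerance than the strict 1:1000 one.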
Abstract:
SUMMARY
Species distribution models (SDMs) nowadays represent an essential tool in the research fields of ecology and conservation biology. By combining observations of species occurrence or abundance with information on the environmental characteristics of the observation sites, they can provide information on the ecology of species, predict their distributions across the landscape, or extrapolate them to other spatial or time frames. The advent of SDMs, supported by geographic information systems (GIS), new developments in statistical models and constantly increasing computational capacities, has revolutionized the way ecologists can comprehend species distributions in their environment. SDMs have brought the tool that allows describing species' realized niches across a multivariate environmental space and predicting their spatial distribution. Predictions, in the form of probabilistic maps showing the potential distribution of the species, are an irreplaceable means of informing every single unit of a territory about its biodiversity potential. SDMs and the corresponding spatial predictions can be used to plan conservation actions for particular species, to design field surveys, to assess the risks related to the spread of invasive species, to select reserve locations and design reserve networks, and ultimately, to forecast distributional changes according to scenarios of climate and/or land use change. By assessing the effect of several factors on model performance and on the accuracy of spatial predictions, this thesis aims at improving the techniques and data available for distribution modelling and at providing the best possible information to conservation managers to support their decisions and action plans for the conservation of biodiversity in Switzerland and beyond.
Several monitoring programs have been put in place from the national to the global scale, and different sources of data now exist and are starting to become available to researchers who want to model species distributions. However, because of the lack of means, data are often not gathered at an appropriate resolution, are sampled only over limited areas, are not spatially explicit or do not provide sound biological information. A typical example of this is data on 'habitat' (sensu biota). Even though this is essential information for effective conservation planning, it often has to be approximated from land use, the closest available information. Moreover, data are often not sampled according to an established sampling design, which can lead to biased samples and consequently to spurious modelling results. Understanding the sources of variability linked to the different phases of the modelling process, and their importance, is crucial in order to evaluate the final distribution maps that are to be used for conservation purposes. The research presented in this thesis was essentially conducted within the framework of the Landspot Project, a project supported by the Swiss National Science Foundation. The main goal of the project was to assess the possible contribution of pre-modelled 'habitat' units to modelling the distribution of animal species, in particular butterfly species, across Switzerland. While pursuing this goal, different aspects of data quality, sampling design and the modelling process were addressed and improved, and implications for conservation discussed. The main 'habitat' units considered in this thesis are grassland and forest communities of natural and anthropogenic origin as defined in the typology of habitats for Switzerland. These communities are mainly defined at the phytosociological level of the alliance. For the time being, no comprehensive map of such communities is available at the national scale and at fine resolution.
As a first step, it was therefore necessary to create distribution models and maps for these communities across Switzerland and thus to gather and collect the necessary data. In order to reach this first objective, several new developments were necessary, such as the definition of expert models, the classification of the Swiss territory into environmental domains, the design of an environmentally stratified sampling of the target vegetation units across Switzerland, the development of a database integrating a decision-support system assisting in the classification of the relevés, and the downscaling of the land use/cover data from 100 m to 25 m resolution. The main contributions of this thesis to the discipline of species distribution modelling (SDM) are assembled in four main scientific papers. In the first, published in the Journal of Biogeography, different issues related to the modelling process itself are investigated. First, the effect of five different stepwise selection methods on model performance, stability and parsimony is assessed, using data from the forest inventory of the State of Vaud. The same paper also assesses: the effect of weighting absences to ensure a prevalence of 0.5 prior to model calibration; the effect of limiting absences beyond the environmental envelope defined by presences; four different methods for incorporating spatial autocorrelation; and finally, the effect of integrating predictor interactions. The results made it possible to specifically enhance the GRASP tool (Generalized Regression Analysis and Spatial Predictions), which now incorporates new selection methods and the possibility of dealing with interactions among predictors as well as spatial autocorrelation. The contribution of different sources of remotely sensed information to species distribution models was also assessed.
The second paper (to be submitted) explores the combined effects of sample size and data post-stratification on the accuracy of models, using data on grassland distribution across Switzerland collected within the framework of the Landspot project and supplemented with other important vegetation databases. For the stratification of the data, different spatial frameworks were compared. In particular, environmental stratification by Swiss Environmental Domains was compared to geographical stratification either by biogeographic regions or by political states (cantons). The third paper (to be submitted) assesses the contribution of pre-modelled vegetation communities to the modelling of fauna. It is a two-step approach that combines the disciplines of community ecology and spatial ecology and integrates their corresponding concepts of habitat. Vegetation communities are first modelled per se, and these 'habitat' units are then used to model animal species' habitat. A case study is presented with grassland communities and butterfly species. Different ways of integrating vegetation information into the models of butterfly distribution were also evaluated. Finally, a glimpse of climate change is given in the fourth paper, recently published in Ecological Modelling. This paper proposes a conceptual framework for analysing range shifts, namely a catalogue of the possible patterns of change in the distribution of a species along elevational or other environmental gradients, and an improved quantitative methodology to identify and objectively describe these patterns. The methodology was developed using data from the Swiss national common breeding bird survey, and the article presents results concerning the observed shifts in the elevational distribution of breeding birds in Switzerland. The overall objective of this thesis is to improve species distribution models as potential inputs for different conservation tools (e.g.
red lists, ecological networks, risk assessment of the spread of invasive species, vulnerability assessment in the context of climate change). While no conservation issues or tools are directly tested in this thesis, the importance of the proposed improvements made in species distribution modelling is discussed in the context of the selection of reserve networks.
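One of the calibration choices assessed in the first paper, weighting absences so that the weighted prevalence equals 0.5, amounts to a one-line weighting rule. A minimal sketch, with toy presence/absence counts:

```python
# Weighting absences to ensure a prevalence of 0.5 before model
# calibration: give every presence weight 1 and scale absence weights
# so both classes contribute equally.  Counts are toy values.

presences = 30
absences = 120

absence_weight = presences / absences  # each absence counts this much

weighted_prevalence = presences / (presences + absences * absence_weight)
print(absence_weight, weighted_prevalence)
```

The resulting weights are then passed to the regression fitting routine so that the abundant absences do not dominate the likelihood.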
By evaluating the effect of several factors on model performance and on the accuracy of spatial predictions, this thesis aims to improve the techniques and data available for species distribution modelling and to provide the best possible information to managers in support of their decisions and action plans for the conservation of biodiversity in Switzerland and beyond. Several monitoring programmes have been set up, from the national to the global scale, and different data sources are now available to researchers wishing to model species distributions. However, owing to a lack of resources, data are often collected at an inappropriate resolution, are sampled over limited areas, are not spatially explicit, or do not provide sufficient ecological information. A typical example is data on 'habitat' (sensu biota). Even though this is essential information for effective conservation measures, it is often approximated by land use, the closest available information. Moreover, data are often not collected according to an established sampling design, which biases the samples and, consequently, the modelling results. Understanding the sources of variability associated with the different phases of the modelling process is therefore crucial in order to evaluate the use of predicted distribution maps for conservation purposes. The research presented in this thesis was essentially conducted within the framework of the Landspot project, a project supported by the Swiss National Science Foundation. The main objective of this project was to evaluate the contribution of pre-modelled 'habitat' units to modelling the distribution of animal species, in particular butterflies, across Switzerland. 
While pursuing this objective, various aspects relating to data quality, sampling design, and the modelling process are addressed and improved, and their implications for species conservation are discussed. The main 'habitats' considered in this thesis are grassland and forest communities of natural and anthropogenic origin, as defined in the typology of Swiss habitats. These communities are mainly defined at the phytosociological level of the alliance. At present, no map of the distribution of these communities is available at the national scale and at fine resolution. As a first step, it was therefore necessary to build distribution models of these communities across Switzerland and, consequently, to collect the necessary data. To achieve this first objective, several new developments were required, such as the definition of expert models, the classification of Swiss territory into environmental domains, the design of an environmentally stratified sampling of the target vegetation units throughout Switzerland, the creation of a database integrating a decision-support system for the classification of relevés, and the downscaling of land-cover data from 100 m to 25 m resolution. The main contributions of this thesis to the discipline of species distribution modelling (SDM) are gathered in four scientific articles. In the first article, published in the Journal of Biogeography, different questions related to the modelling process are investigated using data from the forest inventory of the State of Vaud. First, the effects of five stepwise selection methods on model performance, stability, and parsimony are evaluated. 
The same article also evaluates: the effect of weighting absences to ensure a prevalence of 0.5 during model calibration; the effect of limiting absences to beyond the envelope defined by the presences; four different methods for integrating spatial autocorrelation; and, finally, the effect of integrating interactions between factors. The results presented in this article made it possible to improve the GRASP tool, which now integrates new selection methods and the ability to handle interactions between explanatory variables, as well as spatial autocorrelation. The contribution of different remote-sensing data sources was also evaluated. The second article (in preparation for submission) explores the combined effects of sample size and post-stratification on model accuracy. The data used here concern the distribution of Swiss grasslands, collected within the framework of the Landspot project and supplemented by other sources. For the stratification of the data, different spatial frameworks were compared. In particular, environmental stratification by the Swiss Environmental Domains was compared with geographical stratification by biogeographic regions or by cantons. The third article (in preparation for submission) evaluates the contribution of pre-modelled plant communities to the modelling of fauna. It is a two-step approach that combines the disciplines of community ecology and spatial ecology by integrating their respective concepts of 'habitat'. The plant communities are modelled first, and these 'habitat' units are then used to model animal species. A case study is presented with grassland communities and butterfly species. Different ways of integrating vegetation information into models of butterfly distribution are evaluated. 
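One of the modelling refinements evaluated in the first article is weighting absences so that the weighted prevalence at calibration equals 0.5. This can be sketched in a few lines; the sketch below is a minimal illustration under that definition, not the GRASP implementation, and the function name is invented:

```python
def prevalence_weights(presence):
    """Case weights that make the weighted prevalence equal 0.5.

    Presences keep weight 1; each absence is weighted by
    n_presences / n_absences, so presences and absences contribute
    equal total weight to model calibration.
    """
    n_pres = sum(presence)
    n_abs = len(presence) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need both presences and absences")
    w_abs = n_pres / n_abs
    return [1.0 if p else w_abs for p in presence]


# Example: 3 presences, 6 absences -> each absence gets weight 0.5,
# so the weighted prevalence is 3 / (3 + 6 * 0.5) = 0.5.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
w = prevalence_weights(y)
weighted_prev = sum(wi for wi, yi in zip(w, y) if yi) / sum(w)
```

Such weights would typically be passed to the model-fitting routine (e.g. as per-observation weights in a regression), so that a rare species does not bias the fitted probabilities downward.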
Finally, the last article, published in Ecological Modelling, offers a glimpse of climate change. This article proposes a conceptual framework for analysing changes in species distributions, notably comprising a catalogue of the different possible patterns of change along an elevational or other environmental gradient, and an improved quantitative method for identifying and describing these shifts. The methodology was developed using data from the monitoring of common breeding birds, and the article presents results concerning the observed shifts in the elevational distribution of breeding birds in Switzerland. The overall objective of this thesis is to improve species distribution models as a potential source of information for different conservation tools (e.g. red lists, ecological networks, risk assessment of the spread of invasive species, assessment of species' vulnerability in the context of climate change). Although these conservation issues are not directly tested in this thesis, the importance of the proposed improvements to species distribution modelling is discussed at the end of this work in the context of the selection of reserve networks.
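The elevational-shift analysis of the fourth paper can be illustrated, in a much-simplified form, by comparing the abundance-weighted mean elevation of a species between two survey periods. The data, function name, and years below are hypothetical, and the published methodology is considerably more elaborate (it fits and classifies whole response curves along the gradient):

```python
def elevational_optimum(elevations, abundances):
    """Abundance-weighted mean elevation (m) of a species' records."""
    total = sum(abundances)
    return sum(e * a for e, a in zip(elevations, abundances)) / total


# Hypothetical counts of a breeding bird in 500 m elevation bands,
# for two survey periods.
bands = [500, 1000, 1500, 2000, 2500]
counts_period1 = [40, 60, 30, 10, 2]
counts_period2 = [25, 55, 45, 20, 6]

shift = (elevational_optimum(bands, counts_period2)
         - elevational_optimum(bands, counts_period1))
# A positive shift indicates an upward displacement of the
# distribution's centre along the elevational gradient.
```

A single summary statistic like this cannot distinguish, for example, a uniform upward shift from an expansion of the upper range margin, which is precisely why the paper proposes a catalogue of distinct change patterns rather than one number.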
Resumo:
The EHLASS survey was set up in April 1986 as a five-year demonstration project. The objective was to monitor home and leisure accidents in a harmonised manner throughout the EU, to determine their causes, the circumstances of their occurrence, their consequences and, most importantly, to provide information on the consumer products involved. It was felt that, armed with accurate information, consumer policy could be directed at the most serious problems and the best use could be made of available resources. Systems were set up to collect EHLASS data in the casualty departments of selected hospitals in each of the member states. The information was subsequently gathered together by the European Commission in Brussels. Extensive analysis was undertaken on 778,838 accidents reported throughout the EU. Centralised analysis of EHLASS data proved problematic, however, owing to a lack of coordination in data quality. In 1989 it was decided that each member state should produce its own annual EHLASS report in a harmonised format specified by the European Commission. This report is the ninth such report for Ireland.