951 results for Spatial Data Quality
Abstract:
This dissertation documents the everyday lives and spaces of a population of youth typically constructed as out of place, and the broader urban context in which they are rendered as such. Thirty-three female and transgender street youth participated in the development of this youth-based participatory action research (YPAR) project utilizing geo-ethnographic methods, auto-photography, and archival research throughout a six-phase, eighteen-month research process in Bogotá, Colombia. This dissertation details the participatory writing process that enabled the YPAR research team to destabilize dominant representations of both street girls and urban space, and the participatory mapping process that enabled the development of a youth vision of the city through cartographic images. The maps display individual and aggregate spatial data indicating trends within, and making comparisons between, three subgroups of the research population according to nine spatial variables. These spatial data, coupled with photographic and ethnographic data, substantiate that street girls' mobilities and activity spaces intersect with and are altered by state-sponsored urban renewal projects and paramilitary-led social cleansing killings, both efforts to clean up Bogotá by purging the city center of deviant populations and places. Advancing an ethical approach to conducting research with excluded populations, this dissertation argues for the enactment of critical field praxis and care ethics within a YPAR framework to incorporate young people as principal research actors rather than merely voices represented in adultist academic discourse. The interjection of considerations of space, gender, and participation into the study of street youth produces new ways of envisioning the city and the role of young people in research.
Instead of seeing the city from a panoptic view, Bogotá is revealed through the eyes of street youth who participated in the construction and feminist visualization of a new cartography and counter-map of the city grounded in embodied, situated praxis. This dissertation presents a socially responsible approach to conducting action research with high-risk youth by documenting how street girls reclaim their right to the city on paper and in practice: through maps of their everyday exclusion in Bogotá, followed by activism to fight against it.
Abstract:
Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on both spatial and aspatial data gives GIS users the ability to perform more interesting spatial analyses, and lets applications support composite location-aware searches; for example, in a real estate database: "Find the homes for sale nearest to my current location that have a backyard and whose prices are between $50,000 and $80,000". Efficient processing of such queries requires combined indexing strategies for multiple types of data. Existing spatial query engines commonly apply a two-filter approach (a spatial filter followed by a nonspatial filter, or vice versa), which can incur large performance overheads. More recently, the amount of geolocation data in databases has grown rapidly, due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data with objects or events. This poses potential data-ingestion challenges for practical GIS databases handling large data volumes. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled with MapReduce, a widely adopted parallel programming model for data-intensive problems. The evaluation of our algorithms on a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial data together with textual and numeric data are developed to that end.
Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, the previous results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and the adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.
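The two-filter approach that the abstract contrasts with combined indexing can be sketched as follows; the listings, coordinates, and prices are hypothetical, and a linear scan stands in for an actual spatial index:

```python
# Hypothetical real-estate listings: (id, x, y, price, has_backyard)
listings = [
    (1, 2.0, 3.0, 60000, True),
    (2, 2.5, 3.5, 90000, True),
    (3, 8.0, 8.0, 70000, False),
    (4, 2.2, 2.8, 75000, True),
]

def two_filter_query(items, center, radius, lo, hi):
    """Spatial filter first, then aspatial filter -- the two-pass
    strategy whose overhead motivates the combined indexes above."""
    cx, cy = center
    # Filter 1: spatial -- keep items within `radius` of `center`
    spatial = [it for it in items
               if (it[1] - cx) ** 2 + (it[2] - cy) ** 2 <= radius ** 2]
    # Filter 2: aspatial -- price range and backyard predicate
    return [it for it in spatial if lo <= it[3] <= hi and it[4]]

result = two_filter_query(listings, center=(2.0, 3.0), radius=1.0,
                          lo=50000, hi=80000)
print([it[0] for it in result])  # → [1, 4]
```

A combined index would instead prune on all three predicates in a single traversal, avoiding the intermediate candidate list that the first filter materializes.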
Abstract:
Construction organizations typically deal with large volumes of project data containing valuable information, yet these organizations do not use these data effectively for planning and decision-making. There are two reasons. First, the information systems in construction organizations are designed to support day-to-day construction operations. The data stored in these systems are often non-validated and non-integrated, and are available in a format that makes it difficult for decision makers to use them to make timely decisions. Second, the organizational structure and the IT infrastructure are often not compatible with the information systems, resulting in higher operational costs and lower productivity. These two issues were investigated in this research with the objective of developing systems that are structured for effective decision-making. A framework was developed to guide the storage and retrieval of validated and integrated data for timely decision-making, and to enable construction organizations to redesign their organizational structure and IT infrastructure to match their information system capabilities. The research focused on construction owner organizations continuously involved in multiple construction projects. Action research and data warehousing techniques were used to develop the framework. One hundred and sixty-three construction owner organizations were surveyed to assess their data needs, data management practices, and extent of use of information systems in planning and decision-making. For in-depth analysis, Miami-Dade Transit (MDT), which is in charge of all transportation-related construction projects in Miami-Dade County, was selected. A functional model and a prototype system were developed to test the framework.
The results revealed significant improvements in data management and decision-support operations, examined through various qualitative (ease of data access, data quality, response time, productivity improvement, etc.) and quantitative (time savings and operational cost savings) measures. The research results were validated first by MDT and then by a representative group of twenty construction owner organizations involved in various types of construction projects.
Abstract:
This dissertation comprised two papers, both related to mortality in Brazil. The first article, "The context of mortality according to the three broad groups of causes of death in Brazilian capitals, 2000 and 2010", analyzed mortality according to the three major groups of causes of death in Brazilian capitals. The second article, "Typology and characteristics of mortality from external causes in the municipalities in the Northeast of Brazil, 2000 and 2010", built a typology of Northeastern municipalities from information on mortality from external causes together with a set of indicators covering socioeconomic, demographic, and infrastructure aspects of those municipalities; both articles cover the years 2000 and 2010. We used data from the Mortality Information System of the Ministry of Health, complemented by information from the Demographic Censuses for those years. The socioeconomic and demographic variables used in this study were those available on the website of the United Nations Development Programme. Article 1 used the pro-rata distribution method to redistribute ill-defined causes of death, and cluster analysis to group capitals with similar proportions of deaths from ill-defined causes. Article 2 used Empirical Bayesian estimation, spatial statistics techniques, and the Grade of Membership method to derive types of municipalities from information on mortality from external causes associated with socioeconomic, demographic, and infrastructure variables.
Among the main results, Article 1 found, with respect to data quality, four groups of capitals with similar proportions of deaths from ill-defined causes. Regarding mortality according to the three major groups of causes of death, in both 2000 and 2010 noncommunicable diseases predominated for both sexes, although rates declined in some capitals. Communicable diseases stood out as the second cause of death among women, while external causes were the second cause of death among men and increased among women. Article 2 showed not only an extension of mortality from external causes across municipalities but also an enlargement of the area affected by external-cause deaths over the whole Northeast. Regarding the typology of municipalities, three extreme profiles were built: profile 1, comprising municipalities with high external-cause mortality rates and the best social indicators; profile 2, composed of municipalities with low external-cause mortality rates and the worst social indicators; and profile 3, bringing together municipalities with intermediate mortality rates and median social indicators. Although the characteristics of the profiles did not change, the proportion of municipalities belonging to extreme profile 3 increased when mixed profiles are taken into account.
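The grouping step in Article 1 can be illustrated with a minimal one-dimensional clustering sketch; the capital labels, proportions, and gap threshold are invented for illustration and are not the study's actual data or algorithm:

```python
# Hypothetical proportions (%) of deaths from ill-defined causes per capital.
proportions = {"A": 2.1, "B": 2.4, "C": 8.0, "D": 8.5, "E": 15.0}

def cluster_1d(values, gap=3.0):
    """Group capitals whose sorted proportions differ by less than `gap`
    -- a crude stand-in for the cluster analysis used in Article 1."""
    ordered = sorted(values.items(), key=lambda kv: kv[1])
    groups, current = [], [ordered[0][0]]
    for (pa, va), (pb, vb) in zip(ordered, ordered[1:]):
        if vb - va < gap:
            current.append(pb)   # close enough: same group
        else:
            groups.append(current)
            current = [pb]       # large jump: start a new group
    groups.append(current)
    return groups

print(cluster_1d(proportions))  # → [['A', 'B'], ['C', 'D'], ['E']]
```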
Abstract:
An array of Bio-Argo floats equipped with radiometric sensors has recently been deployed in various open-ocean areas representative of the diversity of trophic and bio-optical conditions prevailing in the so-called Case 1 waters. Around solar noon and almost every day, each float acquires 0-250 m vertical profiles of Photosynthetically Available Radiation and downward irradiance at three wavelengths (380, 412 and 490 nm). To date, more than 6500 profiles for each radiometric channel have been acquired. As these radiometric data are collected outside the operator's control and regardless of meteorological conditions, specific and automatic data-processing protocols have to be developed. Here, we present a data quality-control procedure aimed at verifying profile shapes and providing near-real-time data distribution. This procedure is specifically developed to: 1) identify the main measurement issues (i.e. dark signal, atmospheric clouds, spikes and wave-focusing occurrences); 2) validate the final data with a hierarchy of tests to ensure their scientific utility. The procedure, adapted to each of the four radiometric channels, is designed to flag each profile in a way compliant with the data management procedure used by the Argo program. The main perturbations in the light field are identified by the new protocols with good performance over the whole dataset, which highlights their potential applicability at the global scale. Finally, comparison with modeled surface irradiances allows the accuracy of quality-controlled measured irradiance values to be assessed, and any possible evolution over the float lifetime due to biofouling and instrumental drift to be identified.
Abstract:
We present a new method for ecologically sustainable land use planning within multiple land use schemes. Our aims were (1) to develop a method that can be used to locate important areas based on their ecological values; (2) to evaluate the quality, quantity, availability, and usability of existing ecological data sets; and (3) to demonstrate the use of the method in Eastern Finland, where there are requirements for the simultaneous development of nature conservation, tourism, and recreation. We compiled all available ecological data sets from the study area, complemented the missing data using habitat suitability modeling, calculated the total ecological score (TES) for each 1 ha grid cell in the study area, and finally demonstrated the use of TES in assessing how well nature conservation covers ecologically valuable areas and in locating ecologically sustainable areas for tourism and recreational infrastructure. The method operated quite well at the level required for regional- and local-scale planning. The quality, quantity, availability, and usability of existing data sets were generally high, and they could be further complemented by modeling. Some constraints still limit the use of the method in practical land use planning. However, as more data become openly available and modeling tools improve, the usability and applicability of the method will increase.
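As a rough illustration of the per-cell scoring step, assuming hypothetical layer names and unweighted integer scores (the study's actual data layers and any weighting are not specified here):

```python
# Hypothetical ecological-value layers for three 1-ha grid cells.
cells = {
    "cell_a": {"old_growth": 3, "herpetofauna": 2, "waterbirds": 0},
    "cell_b": {"old_growth": 1, "herpetofauna": 0, "waterbirds": 1},
    "cell_c": {"old_growth": 0, "herpetofauna": 0, "waterbirds": 0},
}

def total_ecological_score(layer_scores):
    """TES sketched as a simple unweighted sum over a cell's value layers."""
    return sum(layer_scores.values())

tes = {cell: total_ecological_score(scores) for cell, scores in cells.items()}
print(tes)  # → {'cell_a': 5, 'cell_b': 2, 'cell_c': 0}
```

High-TES cells would then be candidates for conservation, and low-TES cells candidates for siting tourism or recreational infrastructure.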
Abstract:
The speed with which data has moved from being scarce, expensive, and valuable, thus justifying detailed and careful verification and analysis, to a situation where streams of detailed data are almost too large to handle has caused a series of shifts. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. Until recently, Big Data applications were driven by the capacity to harness massive quantities of existing data. Now real-time data flows are rising swiftly, becoming more invasive, and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as 'theirs'. There have been many changes in ICT that have led to stresses in resolving the conflicts between IP exploiters and their customers, but this one is of a different scale, owing to the wide potential for individual customisation of pricing and identification and the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift in the current confusion over who owns what data into alignments that better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer/subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits to all that are clearly available if these issues can be positively resolved. The customers need to make these data flows contestable in some form. These big data flows are only going to grow and become ever more instructive.
A better balance is necessary. For the first time, these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown. Yet the data gathered is not available to the subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons. This Web extra is the audio part of a video in which author Marcus Wigan expands on his article "Big Data's Big Unintended Consequences" and discusses these issues.
Abstract:
This document does NOT address the issue of oxygen data quality control (either real-time or delayed mode). As a preliminary step towards that goal, this document seeks to ensure that all countries deploying floats equipped with oxygen sensors properly document the data and metadata related to these floats. We produced this document in response to action item 14 from the AST-10 meeting in Hangzhou (March 22-23, 2009): Denis Gilbert to work with Taiyo Kobayashi and Virginie Thierry to ensure DACs are processing oxygen data according to recommendations. If the recommendations contained herein are followed, we will end up with a more uniform set of oxygen data within the Argo data system, allowing users to begin analysing not only their own oxygen data but also those of others, in the true spirit of Argo data sharing. The indications provided in this document are valid as of the date of writing; it is very likely that changes in sensors, calibrations, and conversion equations will occur in the future. Please contact V. Thierry (vthierry@ifremer.fr) about any inconsistencies or missing information. A dedicated webpage on the Argo Data Management website (www) contains all information regarding Argo oxygen data management: current and previous versions of this cookbook, oxygen sensor manuals, calibration sheet examples, examples of Matlab code to process oxygen data, test data, etc.
Abstract:
By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as "networks on wheels", can greatly enhance traffic safety, traffic efficiency, and the driving experience in intelligent transportation systems (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, pose critical efficiency and reliability challenges for the implementation of VANETs. This dissertation is motivated by the great application potential of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination, and data collection, this dissertation research aims to enhance traffic safety and traffic efficiency, as well as to develop novel commercial applications based on VANETs, along four directions: 1) accurate and efficient message aggregation to detect on-road safety-relevant events; 2) reliable data dissemination to notify remote vehicles; 3) efficient and reliable spatial data collection from vehicular sensors; and 4) novel applications that exploit the commercial potential of VANETs. Specifically, to enable cooperative detection of safety-relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The relative-position-based message dissemination (RPB-MD) scheme is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone of relevance under varying traffic density. Given the large volume of vehicular sensor data available in VANETs, the compressive-sampling-based data collection (CS-DC) scheme is proposed to efficiently collect spatially relevant data at a large scale, especially in dense traffic.
In addition, with novel and efficient solutions proposed for the application specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit the commercial potentials of VANETs, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability in in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further to the stage of massive deployment.
Abstract:
Recent marine long-offset transient electromagnetic (LOTEM) measurements yielded the offshore delineation of a fresh groundwater body beneath the seafloor in the region of Bat Yam, Israel. The LOTEM application was effective in detecting this freshwater body underneath the Mediterranean Sea and allowed an estimation of its seaward extent. However, the measured data set was insufficient to understand the hydrogeological configuration and the mechanism controlling the occurrence of this fresh groundwater discovery. In particular, the lateral geometry of the freshwater boundary, important for hydrogeological modelling, could not be resolved. Without such an understanding, rational management of this unexploited groundwater reservoir is not possible. Two new high-resolution marine time-domain electromagnetic methods are theoretically developed to derive the hydrogeological structure of the western aquifer boundary. The first is called the Circular Electric Dipole (CED). It is the land-based analogue of the Vertical Electric Dipole (VED), which is commonly applied to detect resistive structures in the subsurface. Although the CED shows exceptional detectability characteristics in the step-off signal towards the sub-seafloor freshwater body, an actual application was not carried out within the scope of this study. It was found that the method suffers from insufficient signal strength to adequately delineate the resistive aquifer under realistic noise conditions. Moreover, modelling studies demonstrated that severe signal distortions are caused by the slightest geometrical inaccuracies. As a result, a successful application of the CED in Israel proved rather doubtful. A second method, called the Differential Electric Dipole (DED), is developed as an alternative to the intended CED method.
Compared to the conventional marine time-domain electromagnetic system, which commonly applies a horizontal electric dipole transmitter, the DED is composed of two horizontal electric dipoles in an in-line configuration that share a common central electrode. Theoretically, the DED has detectability/resolution characteristics similar to those of the conventional LOTEM system. However, its superior lateral resolution towards multi-dimensional resistivity structures makes an application desirable. Furthermore, the method is less susceptible to geometrical errors, making an application in Israel feasible. Within the scope of this thesis, the novel marine DED method is substantiated using several one-dimensional (1D) and multi-dimensional (2D/3D) modelling studies. The main emphasis lies on the application in Israel. Preliminary resistivity models are derived from the previous marine LOTEM measurement and tested for a DED application. The DED method is effective in locating the two-dimensional resistivity structure at the western aquifer boundary. Moreover, a prediction regarding the hydrogeological boundary conditions is feasible, provided a brackish water zone exists at the head of the interface. A seafloor-based DED transmitter/receiver system was designed and built at the Institute of Geophysics and Meteorology at the University of Cologne. The first DED measurements were carried out in Israel in April 2016. The acquired data set is the first of its kind. The measured data were processed and subsequently interpreted using 1D inversion. The intended aim of interpreting both step-on and step-off signals failed due to the insufficient data quality of the latter. Yet, the 1D inversion models of the DED step-on signals clearly detect the freshwater body for receivers located close to the Israeli coast. Additionally, a lateral resistivity contrast observable in the 1D inversion models allows the seaward extent of this freshwater body to be constrained.
A large-scale 2D modelling study followed the 1D interpretation. In total, 425,600 forward calculations were conducted to find a sub-seafloor resistivity distribution that adequately explains the measured data. The results indicate that the western aquifer boundary is located 3600-3700 m off the coast. Moreover, a brackish water zone of 3 Ω·m to 5 Ω·m with a lateral extent of less than 300 m is likely located at the head of the freshwater aquifer. Based on these results, it is predicted that the sub-seafloor freshwater body is indeed open to the sea and may be vulnerable to seawater intrusion.
Abstract:
The term Artificial Intelligence has acquired a lot of baggage since its introduction, and in its current incarnation it is synonymous with Deep Learning (DL). The sudden availability of data and computing resources has opened the gates to myriad applications. Not all are created equal, though, and problems may arise especially in fields not closely related to the tasks of the tech companies that spearheaded DL. The perspective of practitioners seems to be changing, however. Human-Centric AI emerged in the last few years as a new way of thinking about DL and AI applications from the ground up, with special attention to their relationship with humans. The goal is to design systems that integrate gracefully into established workflows, as in many real-world scenarios AI may not be good enough to completely replace humans; often such replacement may even be unneeded or undesirable. Another important perspective comes from Andrew Ng, a DL pioneer, who recently started shifting the focus of development from "better models" towards better, and smaller, data; he calls this approach Data-Centric AI. Without downplaying the importance of pushing the state of the art in DL, we must recognize that if the goal is creating a tool for humans to use, more raw performance may not translate into more utility for the final user. A Human-Centric approach is compatible with a Data-Centric one, and we find that the two overlap nicely when human expertise is used as the driving force behind data quality. This thesis documents a series of case studies where these approaches were employed, to different extents, to guide the design and implementation of intelligent systems. We found that human expertise proved crucial in improving datasets and models. The last chapter includes a slight deviation, with studies on the pandemic, still preserving the human- and data-centric perspective.
Abstract:
Artificial Intelligence (AI) and Machine Learning (ML) are novel data analysis techniques providing very accurate prediction results. They are widely adopted in a variety of industries to improve efficiency and decision-making, and they are also being used to develop intelligent systems. Their success is grounded in complex mathematical models whose decisions and rationale are usually difficult for human users to comprehend, to the point of being dubbed black boxes. This is particularly relevant in sensitive and highly regulated domains. To mitigate and possibly solve this issue, the Explainable AI (XAI) field has become prominent in recent years. XAI consists of models and techniques that enable understanding of the intricate patterns discovered by black-box models. In this thesis, we consider model-agnostic XAI techniques applicable to tabular data, with a particular focus on the credit scoring domain. Special attention is dedicated to the LIME framework, for which we propose several modifications to the vanilla algorithm, in particular: a pair of complementary Stability Indices that accurately measure LIME stability, and the OptiLIME policy, which helps the practitioner find the proper balance between explanation stability and reliability. We subsequently put forward GLEAMS, a model-agnostic interpretable surrogate model that needs to be trained only once while providing both local and global explanations of the black-box model. GLEAMS produces feature attributions and what-if scenarios from both the dataset and the model perspective. Finally, we argue that synthetic data are an emerging trend in AI, increasingly used to train complex models instead of original data. To explain the outcomes of such models, we must guarantee that synthetic data are reliable enough for their explanations to translate to real-world individuals. To this end we propose DAISYnt, a suite of tests to measure the quality and privacy of synthetic tabular data.
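A stability measure of this general flavour can be sketched as the mean pairwise Jaccard similarity of the top-k features across repeated explanations of the same instance; the feature names are hypothetical, and this is a generic illustration rather than the thesis's actual index definition:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def stability_index(runs, k=3):
    """Mean pairwise Jaccard similarity of the top-k features selected by
    repeated explanations of one instance; 1.0 = perfectly stable."""
    tops = [run[:k] for run in runs]
    pairs = list(combinations(tops, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Three hypothetical repeated explanations of one credit-scoring instance,
# each a feature ranking by absolute attribution weight.
runs = [
    ["income", "debt_ratio", "age", "tenure"],
    ["income", "debt_ratio", "tenure", "age"],
    ["income", "age", "debt_ratio", "tenure"],
]
print(round(stability_index(runs, k=3), 3))  # → 0.667
```

Instability of this kind arises because LIME-style explainers fit a local surrogate on randomly perturbed samples, so repeated runs on the same instance can select different features.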
Abstract:
OBJECTIVE: To assess the quality of hospitalization data on external causes in São José dos Campos, São Paulo, Brazil. METHOD: Hospitalizations under the Unified Health System (Sistema Único de Saúde) for injuries from external causes in the first half of 2003 at the Municipal Hospital, the trauma referral center for the municipality, were studied by comparing the records in the Hospital Information System (Sistema de Informações Hospitalares) with the medical charts of 990 hospitalizations. Agreement of the variables concerning the victim, the hospitalization, and the injury was assessed by the crude agreement rate and the Kappa coefficient. Injuries and external causes were coded according to the 10th revision of the International Classification of Diseases, chapters XIX and XX respectively. RESULTS: The crude agreement rate was good for the variables concerning the victim and the hospitalization, ranging from 89.0% to 99.2%. Agreement for injuries was excellent, except for neck injuries (k=0.73), multiple injuries (k=0.67), and chest fractures (k=0.49). External causes showed excellent agreement for transport accidents (k=0.90) and falls (k=0.83). Reliability was lower for assaults (k=0.50), undetermined causes (k=0.37), and complications of medical care (k=0.03). Agreement was excellent for transport accidents involving pedestrians, cyclists, and motorcyclists. CONCLUSION: Most study variables had good quality at the aggregation level analyzed. Some variables concerning the victim and some types of external causes require improvement in data quality. The hospital morbidity profile found confirmed transport accidents as an important external cause of hospitalization in the municipality.