904 results for INFORMATION EXTRACTION FROM DOCUMENTS


Relevance: 100.00%

Abstract:

Text mining has opened a vast array of possibilities for automatic information retrieval from large collections of text documents. A variety of themes and document types can be easily analyzed. More complex features, such as those used in forensic linguistics, can yield a deeper understanding of the documents, making it possible to perform difficult tasks such as author identification. In this work we explore the capabilities of simpler text mining approaches to author identification in unstructured documents, in particular the ability to distinguish the poetic works of two of Fernando Pessoa's heteronyms: Álvaro de Campos and Ricardo Reis. Several processing options were tested and accuracies of 97% were reached, which encourages further development.
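As an illustration of what a "simpler" text mining approach to author identification can look like, the following is a minimal sketch of a bag-of-words Naive Bayes classifier. It is not the method used in the work summarized above, whose processing options are not described here. The class name, the author labels `author_a` and `author_b`, and the training sentences are all hypothetical placeholders rather than real poems.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


class NaiveBayesAuthorModel:
    """Multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {}        # author -> Counter of word frequencies
        self.doc_counts = Counter()  # author -> number of training texts
        self.vocabulary = set()

    def fit(self, texts, authors):
        for text, author in zip(texts, authors):
            tokens = tokenize(text)
            self.word_counts.setdefault(author, Counter()).update(tokens)
            self.doc_counts[author] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for author, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[author] / total_docs)
            for token in tokens:
                smoothed = counts[token] + self.alpha
                score += math.log(smoothed / (total_words + self.alpha * vocab_size))
            scores[author] = score
        return max(scores, key=scores.get)


# Hypothetical placeholder texts, invented for this sketch.
training_texts = [
    "the engine roars and the city burns with steel",
    "machines sing loud in the electric night",
    "calm river flows while the gods remain silent",
    "accept the quiet hour and crown yourself with roses",
]
training_authors = ["author_a", "author_a", "author_b", "author_b"]

model = NaiveBayesAuthorModel().fit(training_texts, training_authors)
```

Calling `model.predict("the electric engine sings in the city")` scores the text under each author's word distribution and returns the label with the higher log-probability. A real experiment would need a held-out test set to estimate accuracy.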

Relevance: 100.00%

Abstract:

Much of the information in historical documents about territory and property is recorded in textual form. This information is mostly geographic and defines territorial areas, their limits and boundaries. To handle these data, we defined an information system in which the treatment of documentary references for the study of settlement and landscape involves systematization, normalization and integration of the information, together with its graphic and cartographic representation. This methodology was applied to the case study of the boundary of the monastery-diocese of Dume, in Braga, Portugal, a site for which there are countless documents and references but where urban pressure has significantly altered the landscape, making the identification of territorial limits quite difficult. The work carried out to give spatial and cartographic expression to the data, by defining viewing criteria according to the recorded information, proved to be a central working tool in the boundary study and in understanding the dynamics of the sites across the various cultural periods.

Relevance: 100.00%

Abstract:

This work covers two aspects. First, it compares and summarizes the similarities and differences of state-of-the-art feature detectors and descriptors; second, it presents a novel approach to detecting intestinal content (in particular bubbles) in capsule endoscopy images. Feature detectors and descriptors that provide invariance to changes of perspective, scale, signal-to-noise ratio and lighting conditions are an important and active topic in current research, and the number of possible applications seems almost unlimited. After analysing a selection of approaches presented in the literature, this work investigates their suitability for information extraction in capsule endoscopy images. Finally, a well-performing detector of intestinal content in capsule endoscopy images is presented. Accurate detection of intestinal content is crucial for all kinds of machine learning approaches and other analyses of capsule endoscopy studies, because intestinal content occludes the field of view of the capsule camera and the affected frames therefore need to be excluded from analysis. As a by-product of this investigation, a Feature Analysis Tool with a graphical user interface is presented; it executes and compares the discussed feature detectors and descriptors on arbitrary images, with configurable parameters, and visualizes their output. The presented bubble classifier is also part of this tool, and if a ground truth is available (it can also be generated using the tool), a detailed visualization of the validation result is produced.

Relevance: 100.00%

Abstract:

The aim of this project is to study how susceptible to plagiarism the submissions for the Continuous Assessment Tests (Proves d'Avaluació Continuada) and practical assignments of UOC students are, and to study the different means of preventing it.

Relevance: 100.00%

Abstract:

The Institute of Public Health in Ireland (IPH) produces population prevalence estimates and forecasts for a number of chronic conditions among adults. IPH has now applied the methodology to examine health conditions among young children across the island of Ireland. This report uses information collected from parents in the Millennium Cohort Study (MCS) along with population data collected in the 2011 Northern Ireland Census to estimate the prevalence of any longstanding condition, asthma, eczema, sight problems and hearing problems among seven-year-olds in Northern Ireland in 2011. The analysis identifies risk factors associated with each condition and provides estimates of the prevalence of these conditions for each of the 11 Local Government Districts. A report on health conditions among three-year-olds in the Republic of Ireland has previously been published by the IPH. See the Chronic Conditions Hub for more details.

Relevance: 100.00%

Abstract:

We show for the first time that the ventral diverticulum of the mosquito gut (an impermeable sugar storage organ) harbors microorganisms. The gut diverticulum from newly emerged and non-fed Aedes aegypti was dissected under aseptic conditions, homogenized and plated on BHI medium. Microbial isolates were identified by sequencing of 16S rDNA for bacteria and 28S rDNA for yeast. A direct DNA extraction from the Ae. aegypti gut diverticulum was also performed. The bacterial isolates were Bacillus sp., Bacillus subtilis and Serratia sp. The latter was the predominant bacterium found in our isolations. The yeast species identified was Pichia caribbica.

Relevance: 100.00%

Abstract:

Introduction The Andalusian Public Health System Virtual Library (Biblioteca Virtual del Sistema Sanitario Público de Andalucía, BV-SSPA) was set up in June 2006. It is a regional government initiative that aims to democratize health professionals' access to quality scientific information, regardless of their workplace. Andalusia is a region with more than 8 million inhabitants and 100,000 health professionals across 41 hospitals, 1,500 primary healthcare centres, and 28 centres for non-medical purposes (research, management, and educational centres). Objectives The Department of Development, Research and Investigation (R+D+i) of the Andalusian Regional Government has, among its duties, the task of evaluating the hospitals and centres of the Andalusian Public Health System (SSPA) in order to distribute its funding. One of the criteria used is scientific output, which is measured using bibliometrics. It is well known that bibliometrics has a series of limitations and problems that should be taken into account, especially when it is used for purposes outside the information sciences, such as career evaluation, funding, etc. Until a few years ago, bibliometric reports were produced separately in each centre, without preset and well-defined criteria, which are essential when the results of the reports need to be compared. Some hospitals included Meeting Abstracts in their figures while others did not, the same happened with Errata, and there were many other differences. Therefore, the main problem the Department of R+D+i faced when evaluating the health system was that the bibliometric data were not accurate and the reports were not comparable. 
With the aim of having unified criteria for the whole system, the Department of R+D+i commissioned the BV-SSPA to carry out the yearly analysis of the scientific output of the system, using well-defined criteria and indicators, among which the Impact Factor stands out. Materials and Methods As the Impact Factor is the bibliometric indicator that the virtual library is asked to consider, it is necessary to use the Web of Science (WoS) database, since it is the owner and publisher of that indicator. The WoS includes the databases Science Citation Index (SCI), Social Sciences Citation Index (SSCI) and Arts & Humanities Citation Index. SCI and SSCI are used to gather all the documents; the Journal Citation Reports (JCR) are used to obtain the Impact Factor and quartiles. Unlike other bibliographic databases, such as MEDLINE, the bibliometric database WoS includes the addresses of all the authors. In order to retrieve all the scientific output of the SSPA, we ran general searches, whose results are afterwards processed by a tool developed by our library. We ran nine different searches using the field ‘address’: eight of them combine ‘Spain’ with each one of the eight Andalusian Regions, and the ninth combines ‘Spain’ with all the cities where there are health centres, since we detected that some authors do not use the region in their signatures. These are some of the search strategies: AD=Malaga and AD=Spain; AD=Sevill* and AD=Spain; AD=SPAIN AND (AD=GUADIX OR AD=BAZA OR AD=MOTRIL). Furthermore, the field ‘year’ is used to determine the period. To exploit the data, the BV-SSPA has developed a tool called Impactia. It is a web application that uses a database to store the information about the documents generated by the SSPA. Impactia allows the user to automatically process the retrieved documents, assigning them to their corresponding centres. 
In order to classify documents automatically, it was necessary to identify the huge variability in the names of the centres that authors use in their signatures. Thus, Impactia knows that an author who signs as “Hospital Universitario Virgen Macarena”, “HVM” or “Hosp. Virgin Macarena” belongs to the same centre. The figure attached shows the variability found for the Empresa Publica Hospital de Poniente. Besides the documents from WoS, Impactia includes the documents indexed in Scopus and in other databases, where we run bibliographic searches using strategies similar to those above. Aware that health centres and hospitals produce a lot of grey literature that is not gathered in databases, Impactia allows the centres to feed the application with these documents, so that all the SSPA scientific output is gathered and organised in a centralized place. The librarians of each centre are responsible for locating this grey literature. They can also submit comments on the documents and indicators that are collected and calculated by Impactia. The bulk upload of documents from WoS and Scopus into Impactia is done monthly. One of the main issues that we found during the development of Impactia was the need to deal with duplicate documents obtained from different sources. Since titles are sometimes written differently, with slashes, commas, and so on, Impactia detects duplicates using the field ‘DOI’ if it is available, or otherwise by comparing the fields page start, page end and ISSN. This makes it possible to guarantee the absence of duplicates. 
Results The data gathered in Impactia become available to the administrative teams and hospital managers through a simple web page that lets them see at any moment, with just one click, detailed information on the scientific output of their hospitals, including useful graphs such as the percentage of document types, the journals where their scientists usually publish, annual comparisons, bibliometric indicators and so on. They can also compare the different centres of the SSPA. Impactia allows users to download the data from the application, so that they can work with this information or include it in their centres’ reports. This application saves the health system many working hours. The work was previously done manually by forty-one librarians, while it is now done by only one person in the BV-SSPA over two days a month. To sum up, the benefits of Impactia are: it has shown its effectiveness in the automatic classification, treatment and analysis of the data; it has become an essential tool for all managers to evaluate quickly and easily the scientific production of their centres; it optimizes the human resources of the SSPA, saving time and money; and it is the reference point for the Department of R+D+i for the scientific evaluation of health staff.
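The duplicate-detection rule described in this abstract (compare the DOI when available, otherwise compare page start, page end and ISSN) can be sketched as follows. This is only an illustration of the stated rule, not Impactia's actual code. The record layout (plain dictionaries), the field names, the function names and the sample records are all assumptions made for the example.

```python
# Hypothetical field names; the real Impactia schema is not described.
FALLBACK_FIELDS = ("page_start", "page_end", "issn")


def _norm(value):
    """Normalize a field value to a trimmed, lowercase string."""
    return "" if value is None else str(value).strip().lower()


def is_duplicate(a, b):
    """Apply the rule from the abstract to two bibliographic records."""
    doi_a, doi_b = _norm(a.get("doi")), _norm(b.get("doi"))
    if doi_a and doi_b:
        # Both records carry a DOI: the DOI alone decides.
        return doi_a == doi_b
    # Otherwise fall back to page start, page end and ISSN.
    key_a = tuple(_norm(a.get(field)) for field in FALLBACK_FIELDS)
    key_b = tuple(_norm(b.get(field)) for field in FALLBACK_FIELDS)
    # Require all three fields to be present, so that two records with
    # missing values are not merged by accident.
    return all(key_a) and key_a == key_b


def deduplicate(records):
    """Keep the first occurrence of each document, in input order."""
    unique = []
    for record in records:
        if not any(is_duplicate(record, kept) for kept in unique):
            unique.append(record)
    return unique


# Invented sample records from two hypothetical sources.
records = [
    {"title": "Study A", "doi": "10.1000/EXAMPLE.1", "issn": "1234-5678",
     "page_start": 1, "page_end": 9},
    {"title": "Study A.", "doi": " 10.1000/example.1", "issn": "1234-5678",
     "page_start": 1, "page_end": 9},
    {"title": "Study B / part 1", "doi": None, "issn": "2222-3333",
     "page_start": 10, "page_end": 20},
    {"title": "Study B, part 1", "doi": None, "issn": "2222-3333",
     "page_start": 10, "page_end": 20},
    {"title": "Study C", "doi": None, "issn": "2222-3333",
     "page_start": 21, "page_end": 30},
]

unique_records = deduplicate(records)
```

In this sketch the titles are deliberately ignored, which mirrors the abstract's observation that titles may be written differently across sources. The pairwise comparison is quadratic in the number of records; a production system would more likely index records by key.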

Relevance: 100.00%

Abstract:

This study was initiated to investigate partial melting within the high-grade metamorphic rocks beneath the Little Cottonwood contact aureole (Utah, USA), in order to understand the melt generation, melt migration, and geometry of initial melt distribution on grain scale during crustal anatexis. The emplacement of the Little Cottonwood stock produced a contact aureole in the pelitic host rocks of the Big Cottonwood formation (BC). Metamorphic isograds in pelitic rocks range from biotite to 2nd sillimanite grade as a function of distance from the contact. Migmatites are restricted to the highest grade and resulted from partial melting of the BC formation rocks. First melt was produced by a combined muscovite/biotite dehydration reaction in the sillimanite + K-feldspar stability field. Melt extraction from the pelites resulted in restites (magnetite + cordierite + alumosilicate ± biotite) surrounded by feldspar-enriched quartzite zones. This texture is the result of gradual infiltration of partial melts into the quartzite. Larger, discrete melt accumulation occurred in extensional or transpressional domains such as boudin necks, veins, and ductile shear zones. Melt compositions are SiO2-rich; the melts crystallized as pegmatites and apparently were very mobile. They were able to infiltrate the quartzite pervasively. These melts are similar in composition to first melts produced in the hydrothermal partial melt experiments at 2 kbar between 700 and 800 °C on fine-grained high-grade metamorphic rocks (andalusite-cordierite-biotite zone) of the BC formation. The experimental melts are water-rich and in disequilibrium with the melting rock. Initial melt composition is heterogeneous for short run durations, reflecting a lack of chemical equilibrium between individual melt pools. Rock core scale heterogeneity decreased with time, indicating partial homogenization of melt compositions. A simultaneous shift of melt composition to higher silica content with time was observed. 
The silica content of the melt increased due to local melt/mineral reactions. Melt textures indicate that reactive melt transport is most efficient along grain boundaries rimmed by dissimilar grains. Melt heterogeneity resulted in chemical potential gradients, which are major driving forces for initial melt migration and govern melt distribution during initial melting. An additional subject of the thesis is the crystal size distribution of opaque minerals in a fine-grained, high-grade meta-pelite of the Big Cottonwood formation, which was obtained from 3D X-ray tomography (µCT) and 2D thin section analysis. µCT delivers accurate size distributions within a restricted range (~ a factor of 20 in size in a single 3D image), while the absolute number of crystals is difficult to obtain from these sparsely distributed, small crystals on the basis of 2D images. Crystal size distributions obtained from both methods are otherwise similar. - This research was undertaken to study partial melting processes in the high-grade metamorphic rocks of the Little Cottonwood contact aureole (Utah, USA), in order to understand melt generation, melt migration, and the geometry of the initial melt distribution at grain scale during crustal anatexis. The emplacement of the small Little Cottonwood intrusive body produced a contact aureole in the pelitic host rocks of the Big Cottonwood Formation (BC). Metamorphic isograds in the pelitic rocks range from the biotite isograd to the second sillimanite isograd as a function of distance from the intrusion. Migmatites are restricted to the zones of highest metamorphic grade and result from partial melting of the rocks of the BC Formation. 
The first melt was produced by the combined dehydration reaction of muscovite and biotite in the stability field of potassium feldspar and sillimanite. Melt extraction from the pelites forms restites (magnetite + cordierite + aluminosilicate ± biotite) surrounded by feldspar-enriched quartzite zones. This texture is the result of gradual infiltration of partial melt into the quartzites. Larger, discrete melt accumulations occur in extensional or transpressional domains such as boudins, veins, and ductile shear zones. The composition of the melts is similar to that of pegmatoidal liquids, and these melts are apparently very mobile and able to infiltrate the quartzites. These melts have the same composition as the first melts produced in hydrothermal partial melting experiments at 2 kbar and between 700 and 800 °C on the fine-grained, high-grade metamorphic rocks (andalusite-cordierite-biotite zone) of the BC Formation. The experimentally obtained melts are water-rich and in disequilibrium with the melting rock. The initial melt composition is heterogeneous in short-duration experiments and reflects the absence of chemical equilibrium between the different melt pools. Heterogeneity at core scale fades with time, indicating homogenization of the melt compositions. In parallel, a shift of the melt compositions toward more silica-rich compositions is observed over time. The silica content of the melts evolves toward a pegmatitic liquid because of melt/mineral reactions. 
Melt textures indicate that melt transport is most efficient along grain boundaries rimmed by dissimilar grains. No apparent change in total volume is visible. Melt heterogeneity is accompanied by a chemical potential gradient, which is the main driving force for melt migration and melt distribution during melting. A complementary subject of this thesis is the study of the crystal size distribution of opaque minerals in the fine-grained, high-grade metamorphic pelites of the Big Cottonwood Formation. The size distributions were obtained from the analysis of 3D images acquired by tomography and from thin section analysis. X-ray microtomography provides an accurate size distribution within a restricted range (~ a factor of 20 in size in a single 3D image), whereas the absolute number of crystals is difficult to obtain from 2D images because of the small size and low abundance of these crystals. The size distributions obtained by the two methods are otherwise similar. Abstract: Chemical differentiation of the primitive Earth was due to melting and separation of melts. Today, melt generation and emplacement are still the dominant process for the growth of the crust. Most granite formation is due to partial melting of the lower crust, followed by transport of magma through the crust to the shallow crust, where it is emplaced. Partial melting and melt segregation are essential steps before such a granitic magma can ascend through the crust. The chemistry and physics of partial melting and segregation are complex. Hence detailed studies, in which field observations yield critical information that can be compared to experimental observations, are crucial to the understanding of these fundamental processes that led and are leading to the chemical stratification of the Earth. 
The research presented in this thesis is a combined field and experimental study of partial melting of high-grade meta-pelitic rocks of the Little Cottonwood contact aureole (Utah, USA). Contact metamorphic rocks are ideal for textural studies of melt generation, since the relatively short duration of the metamorphic event prevents much of the recrystallization that plagues textural studies of lower crustal rocks. The purpose of the study is to characterize melt generation, to identify melting reactions, and to constrain melt formation, segregation and migration mechanisms. In parallel, an experimental study was undertaken to investigate melting in the high-grade meta-pelitic rocks, to confirm melt composition, and to compare textures of the partially molten rock cores in the absence of deformation. Results show that a pegmatoidal melt is produced by partial melting of the pelitic rocks. This melt is highly mobile. It is capable of pervasive infiltration of the adjacent quartzite. Infiltration results in rounded quartz grains bordered by a thin feldspar rim. Using computed micro X-ray tomography, these melt networks can be imaged. The infiltrated melt leads to rheological weakening and to a decompaction of the solid quartzite. Such decompaction can explain the recent discovery of abundant xenocrysts in many magmas, since it favors the isolation of mineral grains. Pervasive infiltration is apparently strongly influenced by melt viscosity and melt-crystal wetting behavior, both of which depend on the water content of the melt and the temperature. In all experiments the first melt is produced on grain boundaries, dominantly by the local minerals. Grain scale heterogeneity of a melting rock thus leads to chemical concentration gradients in the melt, which are the driving force for initial melt migration. Pervasive melt films along grain boundaries, leading to an interconnected network, are immediately established. The initial chemical heterogeneities in the melt diminish with time. 
Summary for the general public: The chemical differentiation of the primitive Earth is the consequence of the melting of rocks and the separation of the resulting liquids. Today, the production of magmatic liquid is still the dominant mechanism for the growth of the Earth's crust. Thus the formation of most granites is a process that involves the production of magma by partial melting of the lower crust, the migration of these magmas through the crust, and finally their emplacement in the shallow levels of the Earth's crust. During this evolution, partial melting and segregation are indispensable steps for the ascent of granites through the crust. The physico-chemical conditions required for partial melting and for the extraction of these liquids are complex. This is why detailed studies of partial melting processes are crucial for understanding these fundamental mechanisms responsible for the chemical stratification of the Earth. Among these studies, field observations in particular provide decisive information that can be compared with experimental data. The research presented in this thesis combines field studies and experimental data on the partial melting of high-grade metamorphic pelitic rocks from the Little Cottonwood contact aureole (Utah, USA). Contact metamorphic rocks are ideal for studying the formation of melt. Indeed, the relatively short duration of this type of metamorphic event largely prevents the recrystallization that disturbs textural studies of rocks in the lower crust. The aim of this study is to characterize melt generation, to identify the reactions responsible for the melting of these rocks, and to constrain the formation of these melts and their segregation and migration mechanisms. 
In parallel, experimental work was undertaken to reproduce the partial melting of these rocks in the laboratory. This study was carried out to confirm the chemical composition of the melts and to compare the textures obtained in the absence of deformation. The results show that a pegmatoidal melt is produced by partial melting of the pelitic rocks. The high mobility of this melt allows pervasive infiltration into the quartzites. This infiltration appears as rounded quartz grains surrounded by a thin feldspar rim. X-ray tomography made it possible to image this melt network. Melt infiltration leads to rheological weakening of the rock and to decompaction of the massive quartzites. Such decompaction can explain the recent discovery of abundant xenocrysts in many magmas, since it favors the isolation of mineral grains. Pervasive infiltration is apparently strongly influenced by the viscosity of the melt and by the surface-tension behavior between crystals and melt, both of which depend on the water content of the melt and on the temperature. In all experiments, the first melt is produced on grain boundaries, mainly by the local minerals. Grain-scale heterogeneity of a melting rock thus leads to a chemical concentration gradient in the melt, which drives the onset of melt migration. Thin melt films along grain boundaries, forming an interconnected network, are established immediately. The initial chemical heterogeneities in the melt fade with time.

Relevance: 100.00%

Abstract:

In the first part of this research, three stages were set out for a program to increase the information extracted from ink evidence and maximise its usefulness to the criminal and civil justice system. These stages are (a) to develop a standard methodology for analysing ink samples by high-performance thin layer chromatography (HPTLC) in a reproducible way when ink samples are analysed at different times, at different locations and by different examiners; (b) to compare ink samples automatically and objectively; and (c) to define and evaluate a theoretical framework for the use of ink evidence in a forensic context. This report focuses on the second of the three stages. Using the calibration and acquisition process described in the previous report, mathematical algorithms are proposed to compare ink samples automatically and objectively. The performance of these algorithms is systematically studied for various chemical and forensic conditions using standard performance tests commonly used in biometrics studies. The results show that different algorithms are best suited for different tasks. Finally, this report demonstrates how modern analytical and computer technology can be used in the field of ink examination and how tools developed and successfully applied in other fields of forensic science can help maximise its impact within the field of questioned documents.

Relevance: 100.00%

Abstract:

Reliable information is a crucial factor influencing decision-making and, thus, fitness in all animals. A common source of information comes from inadvertent cues produced by the behavior of conspecifics. Here we use a system of experimental evolution with robots foraging in an arena containing a food source to study how communication strategies can evolve to regulate information provided by such cues. The robots could produce information by emitting blue light, which the other robots could perceive with their cameras. Over the first few generations, the robots quickly evolved to successfully locate the food, while emitting light randomly. This behavior resulted in a high intensity of light near food, which provided social information allowing other robots to more rapidly find the food. Because robots were competing for food, they were quickly selected to conceal this information. However, they never completely ceased to produce information. Detailed analyses revealed that this somewhat surprising result was due to the strength of selection on suppressing information declining concomitantly with the reduction in information content. Accordingly, a stable equilibrium with low information and considerable variation in communicative behaviors was attained by mutation selection. Because a similar coevolutionary process should be common in natural systems, this may explain why communicative strategies are so variable in many animal species.

Relevance: 100.00%

Abstract:

The project aims at advancing the state of the art in the use of context information for the classification of image and video data. The use of context in image classification has been shown to be of great importance for improving the performance of current object recognition systems. In our project we proposed the concept of Multi-scale Feature Labels as a general and compact method to exploit local and global context. Extracting features from the discriminative probability or classification confidence label field is highly novel. Moreover, the use of a multi-scale representation of the feature labels leads to a compact and efficient description of the context. A further goal of the project has been to provide a general-purpose method and to prove its suitability for different image and video analysis problems. The two-year project generated 5 journal publications (plus 2 under submission), 10 conference publications (plus 2 under submission) and one patent (plus 1 pending). Of these publications, a significant number make use of the main result of this project to improve the results in detection and/or segmentation of objects.

Relevance: 100.00%

Abstract:

T cells belong to two mutually exclusive lineages expressing either alpha beta or gamma delta T-cell receptors (TCR). Although alpha beta and gamma delta cells are known to share a common precursor the role of TCR rearrangement and specificity in the lineage commitment process is controversial. Instructive lineage commitment models endow the alpha beta or gamma delta TCR with a deterministic role in lineage choice, whereas separate lineage models invoke TCR-independent lineage commitment followed by TCR-dependent selection and maturation of alpha beta and gamma delta cells. Here we review the published data pertaining to the role of the TCR in alpha beta/gamma delta lineage commitment and provide some additional information obtained from recent intracellular TCR staining studies. We conclude that a variant of the separate lineage model is best able to accommodate all of the available experimental results.

Relevance: 100.00%

Abstract:

Hemorrhage comprises a set of causes that affect women during pregnancy and the puerperal period and that, without proper care, result in death. The authors aimed to analyze maternal deaths related to hemorrhage that occurred in the state of Santa Catarina, Brazil. The data were obtained from the Mortality Information System and the Live Births Information System of the Brazilian Ministry of Health. This was a descriptive study in which 491 maternal deaths that occurred in the period 1997-2010 were analyzed. Of these, 61 (12.42%) were related to hemorrhage; postpartum hemorrhage was the most prevalent cause, with 26 deaths, followed by placental abruption with 15, together representing 67.21% of the hemorrhage-related cases. Maternal mortality from hemorrhage is a public health problem in the state of Santa Catarina, due to its high prevalence and the fact that its underlying causes are preventable.
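The two percentages quoted in this abstract follow directly from the reported counts. The short check below recomputes them; the variable names are chosen for this sketch only, and the counts are the ones stated in the abstract.

```python
# Counts as reported in the abstract (Santa Catarina, 1997-2010).
total_maternal_deaths = 491
hemorrhage_deaths = 61
postpartum_hemorrhage_deaths = 26
placental_abruption_deaths = 15

# Share of all maternal deaths that were related to hemorrhage.
hemorrhage_share = round(100 * hemorrhage_deaths / total_maternal_deaths, 2)

# Share of hemorrhage-related deaths due to the two leading causes combined.
leading_causes = postpartum_hemorrhage_deaths + placental_abruption_deaths
leading_causes_share = round(100 * leading_causes / hemorrhage_deaths, 2)

print(hemorrhage_share, leading_causes_share)
```

The first figure uses all 491 maternal deaths as its denominator, while the second uses only the 61 hemorrhage-related deaths, which is why the two percentages are not directly comparable.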

Relevance: 100.00%

Abstract:

The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in the exploratory data analysis and the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed when dealing with the special characteristics of this data, for instance its complex structures, large quantity of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes. The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes. I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis, the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, allow the integration of space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context. The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require an a-priori setup of the number of clusters. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets the processes can be distributed, reducing the computational cost. 
(3) Only three parameters are necessary to set up the algorithm.¦In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm¦lies in its ability to create a structure that allows the visual exploratory data analysis¦of large high-dimensional datasets. This algorithm creates a hierarchical structure¦of Self-Organizing Map Component Planes, arranging similar variables' projections in¦the same branches of the tree. Hence, similarities on variables' behavior can be easily¦detected (e.g. local correlations, maximal and minimal values and outliers).¦Both FGHSON and the Tree-structured SOM Component Planes were applied in¦several agroecological problems proving to be very efficient in the exploratory analysis¦and clustering of spatio-temporal datasets.¦In this thesis I also tested three soft competitive learning algorithms. Two of them¦well-known non supervised soft competitive algorithms, namely the Self-Organizing¦Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); and the¦third was our original contribution, the FGHSON. Although the algorithms presented¦here have been used in several areas, to my knowledge there is not any work applying¦and comparing the performance of those techniques when dealing with spatiotemporal¦geospatial data, as it is presented in this thesis.¦I propose original methodologies to explore spatio-temporal geo-referenced datasets¦through time. Our approach uses time windows to capture temporal similarities and¦variations by using the FGHSON clustering algorithm. The developed methodologies¦are used in two case studies. 
In the first, the objective was to find similar agroecozones¦through time and in the second one it was to find similar environmental patterns¦shifted in time.¦Several results presented in this thesis have led to new contributions to agroecological¦knowledge, for instance, in sugar cane, and blackberry production.¦Finally, in the framework of this thesis we developed several software tools: (1)¦a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called¦BIS (Bio-inspired Identification of Similar agroecozones) an interactive graphical user¦interface tool which integrates the FGHSON algorithm with Google Earth in order to¦show zones with similar agroecological characteristics.
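
The time-window idea mentioned in the abstract can be illustrated with a small sketch. It is not the thesis method: the abstract states that windows are clustered with FGHSON, whereas this example swaps in a brute-force Euclidean comparison purely to show how fixed-length windows expose a pattern that recurs later in time. The function names, the window width, and the temperature values are all hypothetical.

```python
from math import sqrt


def time_windows(series, width, step=1):
    """Return (start_index, window) pairs cut from one time series."""
    return [(start, series[start:start + width])
            for start in range(0, len(series) - width + 1, step)]


def euclidean(a, b):
    """Euclidean distance between two equally long windows."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_shifted_match(reference, candidate, width):
    """Find the most similar pair of windows, one from each series.

    Stand-in for the clustering step: every window of `reference` is
    compared with every window of `candidate`, and the first pair with
    the smallest distance is returned as (ref_start, cand_start, distance).
    """
    best = None
    for ref_start, ref_win in time_windows(reference, width):
        for cand_start, cand_win in time_windows(candidate, width):
            d = euclidean(ref_win, cand_win)
            if best is None or d < best[2]:
                best = (ref_start, cand_start, d)
    return best


# Hypothetical monthly temperatures for two zones (invented data):
# zone_b follows the same pattern as zone_a, two months later.
zone_a = [10, 12, 15, 19, 22, 24, 23, 20, 16, 12]
zone_b = [9, 9, 10, 12, 15, 19, 22, 24, 23, 20]

ref_start, cand_start, distance = best_shifted_match(zone_a, zone_b, width=4)
shift = cand_start - ref_start  # how many time steps zone_b lags zone_a
print(ref_start, cand_start, distance, shift)
```

With real data, each window would typically hold several environmental variables per time step, and a clustering algorithm would group the windows instead of comparing them pairwise.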