861 results for "multiple data sources"
Abstract:
This paper presents a poverty profile for Brazil, based on three different sources of household data for 1996. We use PPV consumption data to estimate poverty and indigence lines. “Contagem” data are used to allow an unprecedented refinement of the country’s poverty map. Poverty measures and shares are also presented for a wide range of population subgroups, based on the PNAD 1996, with new adjustments for imputed rents and spatial differences in cost of living. Robustness of the profile is verified with respect to different poverty lines, spatial price deflators, and equivalence scales. Overall poverty incidence ranges from 23% with respect to an indigence line to 45% with respect to a more generous poverty line. More importantly, however, poverty is found to vary significantly across regions and city sizes, with rural areas, small and medium towns and the metropolitan peripheries of the North and Northeast regions being the poorest.
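The robustness checks above (alternative poverty lines, spatial price deflators and equivalence scales) can be illustrated with a small headcount-ratio calculation. This is a minimal sketch, not the paper's procedure: the `Household` record, the deflator table, the child weight of 0.5 in the equivalence scale and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Household:
    consumption: float   # nominal monthly household consumption (hypothetical units)
    n_adults: int
    n_children: int
    region: str          # key into a spatial price deflator table
    weight: float = 1.0  # survey expansion factor for the household


def adult_equivalents(h: Household, child_weight: float = 0.5) -> float:
    # Hypothetical equivalence scale: each adult counts 1, each child counts child_weight.
    return h.n_adults + child_weight * h.n_children


def welfare(h: Household, deflators: dict, child_weight: float = 0.5) -> float:
    # Deflate to reference-region prices, then express per adult equivalent.
    return h.consumption / deflators[h.region] / adult_equivalents(h, child_weight)


def headcount_ratio(households, deflators, line, child_weight=0.5) -> float:
    # Share of individuals (not households) living below the poverty line.
    poor = total = 0.0
    for h in households:
        people = h.weight * (h.n_adults + h.n_children)
        total += people
        if welfare(h, deflators, child_weight) < line:
            poor += people
    return poor / total


deflators = {"metro_se": 1.00, "rural_ne": 0.70}  # hypothetical cost-of-living index
households = [
    Household(300, 2, 2, "metro_se"),
    Household(120, 1, 2, "rural_ne"),
    Household(500, 2, 0, "metro_se"),
    Household(90, 2, 1, "rural_ne"),
]

for line in (60.0, 90.0):  # a lower "indigence" line and a higher poverty line
    print(line,
          headcount_ratio(households, deflators, line),
          headcount_ratio(households, deflators, line, child_weight=1.0))
```

With these toy figures, switching from adult equivalents to per-capita welfare (`child_weight=1.0`) doubles measured incidence at the lower line, the kind of sensitivity a robustness check is meant to expose.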
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple data sources is used appropriately and effectively, knowledge discovery can be achieved more effectively than is possible with a single source alone.
Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they motivate building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.
The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.
The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.
The third part proposes a new approach to effectively incorporating incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm uses incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database PlasmoTFBM, which focuses on gene regulation in Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provides a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.
The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
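The idea of feeding incomplete information into K-means as constraints can be illustrated with the well-known COP-KMeans scheme (must-link and cannot-link constraints), shown below as a minimal sketch. It is a stand-in chosen for illustration only: the dissertation's GCC algorithm and its constraint-extraction methods are not described in this abstract, so the function names, the deterministic initial centres and the toy points are assumptions.

```python
import math


def _adjacency(pairs):
    # Symmetric lookup table: point index -> set of constrained partners.
    table = {}
    for a, b in pairs:
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)
    return table


def _violates(i, k, assign, must, cannot):
    # A must-link partner already placed in another cluster forbids cluster k.
    for j in must.get(i, ()):
        if j in assign and assign[j] != k:
            return True
    # A cannot-link partner already placed in cluster k also forbids it.
    for j in cannot.get(i, ()):
        if assign.get(j) == k:
            return True
    return False


def cop_kmeans(points, init_centers, must_link=(), cannot_link=(), max_iter=100):
    """COP-KMeans-style clustering; returns (labels, centers)."""
    must, cannot = _adjacency(must_link), _adjacency(cannot_link)
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        assign = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest and take the first feasible one.
            for k in sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c])):
                if not _violates(i, k, assign, must, cannot):
                    assign[i] = k
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        new_labels = [assign[i] for i in range(len(points))]
        # Standard K-means update: move each centre to the mean of its members.
        for k in range(len(centers)):
            members = [points[i] for i, lab in enumerate(new_labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
init = [(0, 0), (5, 5)]
print(cop_kmeans(points, init)[0])
print(cop_kmeans(points, init, cannot_link=[(0, 1)])[0])
print(cop_kmeans(points, init, must_link=[(1, 2)])[0])
```

The sketch checks constraints only pairwise against points already assigned in the current pass; it does not propagate chains of must-links (transitive closure), so long constraint chains may be reported as infeasible where a preprocessing step would help.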
Abstract:
Effective management of invasive fishes depends on the availability of up-to-date information about their distribution and spatial dispersion. A forensic analysis was performed using online and published data on the European catfish, Silurus glanis L., a recent invader in the Tagus catchment (Iberian Peninsula). Eighty records were obtained, mainly from anglers’ fora and blogs and, more recently, from www.youtube.com. Since the first record in 1998, S. glanis has expanded its geographic range by 700 km of river network, occurring mainly in reservoirs and high-order reaches. Both human-mediated and natural dispersal events were identified; the former occurred during the first years of the invasion and involved movements of >50 km. Downstream dispersal predominated. The analysis of online data from anglers was found to provide useful information on the distribution and dispersal patterns of this non-native fish, and is potentially applicable as a preliminary, exploratory assessment tool for other non-native fishes.
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions opens important perspectives in Systems Biology, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources into a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. the two-hybrid system); (ii) the Search module, which enables the user to search for processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and to add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating automatically annotated tables that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases, in response to the need for appropriate tools for the systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it can also be extended to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Abstract:
Background: With the decrease in DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole-genome sequencing approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used to visualize the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other available epidemiological data. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
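The Minimum Spanning Tree view mentioned above links allelic profiles by the number of loci at which they differ. Below is a minimal sketch using Kruskal's algorithm over pairwise Hamming distances. It breaks ties simply by identifier order and does not implement goeBURST's tie-breaking rules; the function names and the sequence-type profiles are invented for the example.

```python
from itertools import combinations


def hamming(a, b):
    # Number of loci (or SNP positions) at which two profiles differ.
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm; profiles: {id: tuple of alleles}. Returns [(id_a, id_b, distance)]."""
    parent = {i: i for i in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = sorted(profiles)
    # Ties are broken by identifier order, NOT by goeBURST's tie-break rules.
    edges = sorted((hamming(profiles[a], profiles[b]), a, b) for a, b in combinations(ids, 2))
    tree = []
    for d, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # joining two different components cannot create a cycle
            parent[ra] = rb
            tree.append((a, b, d))
    return tree


# Invented seven-locus profiles for four sequence types.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 2, 2),
    "ST4": (3, 3, 3, 3, 1, 1, 1),
}
for a, b, d in minimum_spanning_tree(profiles):
    print(a, b, d)
```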
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
Magdeburg, Univ., Faculty of Computer Science, Dissertation, 2012
Abstract:
A challenge when running applications on a cluster is to improve performance while using resources efficiently, and the challenge is greater in a distributed environment. With this challenge in mind, we propose a set of rules for carrying out the computation on each node, based on an analysis of the applications' computation and communication. We analyse a cell-mapping scheme and a method for scheduling the execution order that uses priority-based execution, in which boundary cells have higher priority than interior cells. The experiments show interior computation overlapping with the communication of boundary cells: speedup increases, efficiency stays above 85%, and execution times improve. We conclude that it is indeed possible to design an overlapping scheme that allows SPMD applications to run efficiently on a cluster.
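The priority scheme described above (boundary cells before interior cells, so that the boundary exchange overlaps interior computation) can be sketched in a few functions. This is an illustrative model under stated assumptions, not the authors' implementation: it assumes a 1-D block mapping of grid rows to nodes, a fixed cost per phase, and a toy step-time model in which communication is hidden behind interior computation. The names `block_rows`, `execution_order`, `step_time` and `speedup_efficiency` are hypothetical.

```python
def block_rows(n_rows, n_nodes):
    """Hypothetical 1-D block mapping: contiguous row ranges [lo, hi) per node."""
    base, extra = divmod(n_rows, n_nodes)
    ranges, start = [], 0
    for rank in range(n_nodes):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def execution_order(rank, ranges, n_cols):
    """Cells of one node: boundary rows first (priority 0), interior rows after (priority 1)."""
    lo, hi = ranges[rank]
    boundary = set()
    if rank > 0:
        boundary.add(lo)          # row adjacent to the previous node's block
    if rank < len(ranges) - 1:
        boundary.add(hi - 1)      # row adjacent to the next node's block
    cells = [(r, c) for r in range(lo, hi) for c in range(n_cols)]
    return sorted(cells, key=lambda rc: (0 if rc[0] in boundary else 1, rc))


def step_time(t_boundary, t_interior, t_comm, overlap=True):
    """Toy cost model: with overlap, the boundary exchange runs while interior cells compute."""
    if overlap:
        return t_boundary + max(t_interior, t_comm)
    return t_boundary + t_interior + t_comm


def speedup_efficiency(t_serial, t_parallel, n_nodes):
    speedup = t_serial / t_parallel
    return speedup, speedup / n_nodes


ranges = block_rows(10, 3)
print(ranges)
print(execution_order(1, ranges, 2))
t_serial = 3 * (1 + 6)  # hypothetical: all three nodes' work on one node, no communication
for overlap in (False, True):
    print(overlap, speedup_efficiency(t_serial, step_time(1, 6, 4, overlap), 3))
```

In this toy model, overlap shortens a step from 11 to 7 time units because interior work (6) exceeds communication (4); once communication costs more than the interior work, part of it is no longer hidden and efficiency drops.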
Abstract:
The All-Ireland Health Data Inventory, Part 1, is a catalogue of key sources of health data in the Republic and Northern Ireland. It includes relevant datasets from the major information reviews conducted in the North and South in the past few years. Information is essential for informed decision-making and service provision, and this inventory draws together information sources to facilitate such decision-making. The inventory is intended as a resource for health professionals, researchers and the general public, providing the first phase of a ‘one-stop’ catalogue of health data. The datasets have been catalogued using an expandable numbering system that allows future resources to be included. The Institute of Public Health in Ireland is in the process of expanding the Inventory to include further data sources.
Abstract:
The number of people suffering from dementia will triple in the next 40 years, according to a new study by the World Health Organization, leading to catastrophic social and financial costs. Dementia, a brain illness that affects memory, behavior and the ability to perform even common tasks, mostly affects older people; Alzheimer's disease causes many cases. Read the report: Global burden of dementia in the year 2050: summary of methods and data sources
Abstract:
Links to data sources and methods as used in the production of erpho's 2008 Health Inequalities Profiles. This year's profiles cover the same indicators as previous profiles. Changes since last year:
- A fifth time period: 2005-07
- Updated populations
- IMD 2007
- Standardised against European Standard Population
- Added comparator area 'All but most deprived' (80/20)
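Standardising against the European Standard Population refers to direct age standardisation: each age-specific rate is weighted by the size of the standard population in that age band. The minimal sketch below uses three invented age bands, counts and weights; they are not the real European Standard Population weights or erpho's data, and the function names are hypothetical.

```python
def crude_rate(events, population, per=100_000):
    # Unstandardised rate: all events over all person-years.
    return per * sum(events.values()) / sum(population.values())


def directly_standardised_rate(events, population, standard_weights, per=100_000):
    """Weight each age-specific rate by the standard population in that band."""
    total_weight = sum(standard_weights.values())
    weighted = sum(events[band] / population[band] * w for band, w in standard_weights.items())
    return per * weighted / total_weight


# Invented local data and invented standard weights (three broad age bands).
events = {"0-44": 10, "45-64": 30, "65+": 90}
population = {"0-44": 50_000, "45-64": 20_000, "65+": 10_000}
standard_weights = {"0-44": 60, "45-64": 25, "65+": 15}

print(crude_rate(events, population))
print(directly_standardised_rate(events, population, standard_weights))
```

Because the same standard weights are applied to every area, the standardised rates of two areas with different age structures can be compared directly, which the crude rate does not allow.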
Abstract:
Whereas numerical modeling using finite-element methods (FEM) can provide the transient temperature distribution in the component with sufficient accuracy, it is of the utmost importance to develop compact dynamic thermal models that can be used for electrothermal simulation. While in most cases single power sources are considered, here we focus on the simultaneous presence of multiple sources. The thermal model takes the form of a thermal impedance matrix containing the thermal impedance transfer functions between two arbitrary ports. Each individual transfer function element (Z_ij) is obtained from the analysis of the temperature transient at node 'i' after a power step at node 'j'. Different options for multiexponential transient analysis are detailed and compared. Among the options explored, small thermal models can be obtained by constrained nonlinear least squares (NLSQ) methods if the order is selected properly using validation signals. The methods are applied to the extraction of dynamic compact thermal models for a new ultrathin chip stack technology (UTCS).
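As a hedged illustration of how a thermal impedance matrix is used with multiple simultaneous sources, the sketch below represents each element (written here as Z_ij, the temperature response at node i to a unit power step at node j) by a Foster-type multiexponential sum of terms R_k (1 - exp(-t / tau_k)), and superposes the contributions of power steps at all nodes: dT_i(t) = sum over j of P_j Z_ij(t - t_on_j). The two-node values are invented, the Foster form is an assumption about the multiexponential model, and fitting R_k and tau_k (for example by bounded nonlinear least squares) is not shown.

```python
import math


def foster_response(terms, t):
    """Z(t) = sum_k R_k * (1 - exp(-t / tau_k)) for t > 0, else 0. terms: [(R_k [K/W], tau_k [s])]."""
    if t <= 0:
        return 0.0
    return sum(r * (1.0 - math.exp(-t / tau)) for r, tau in terms)


def temperature_rise(z_matrix, power_steps, t):
    """Superposition: dT_i(t) = sum_j P_j * Z_ij(t - t_on_j).

    z_matrix[i][j] holds the Foster terms of Z_ij; power_steps[j] = (P_j [W], t_on_j [s]).
    """
    return [
        sum(p * foster_response(row[j], t - t_on) for j, (p, t_on) in enumerate(power_steps))
        for row in z_matrix
    ]


# Invented two-node example; Z_12 = Z_21 is assumed here (reciprocity), not taken from the paper.
Z = [
    [[(2.0, 0.01), (1.0, 0.1)], [(0.5, 0.05)]],
    [[(0.5, 0.05)], [(2.5, 0.02)]],
]
steps = [(1.0, 0.0), (2.0, 0.05)]  # 1 W at node 1 from t = 0, 2 W at node 2 from t = 50 ms
for t in (0.0, 0.05, 0.2, 10.0):
    print(t, temperature_rise(Z, steps, t))
```

At long times each element settles to its steady-state thermal resistance (the sum of its R_k), so the final temperature rises reduce to a matrix-vector product of those resistances with the applied powers.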