984 resultados para Data linking
Resumo:
During the exploration and mapping of new caves in Serra do Ramalho karst area, southern Bahia state, cavers from the Grupo Bambuí de Pesquisas Espeleológicas - GBPE (Belo Horizonte) noticed the presence of troglomorphic catfishes (species with reduced eyes and/or melanic pigmentation), which we intensively investigated with regards to their ecology and behavior since 2005. Non-troglomorphic fishes regularly found in the studied caves were included in this investigation. We present here data on the natural history of two troglobitic (exclusively subterranean troglomorphic species) fishes - Rhamdia enfurnada Bichuette & Trajano, 2005 (Heptapteridae; Gruna do Enfurnado) and Trichomycterus undescribed species (Trichomycteridae; Lapa dos Peixes and Gruna da Água Clara), and non-troglomorphic Hoplias cf. malabaricus, probably a troglophile (able to form populations both in epigean and subterranean habitats) in the Gruna do Enfurnado, and Pimelodella sp., a species with a sink population in the Lapa dos Peixes.
Resumo:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.
Resumo:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
Resumo:
OBJECTIVE: To estimate the spatial intensity of urban violence events using wavelet-based methods and emergency room data. METHODS: Information on victims attended at the emergency room of a public hospital in the city of São Paulo, Southeastern Brazil, from January 1, 2002 to January 11, 2003 were obtained from hospital records. The spatial distribution of 3,540 events was recorded and a uniform random procedure was used to allocate records with incomplete addresses. Point processes and wavelet analysis technique were used to estimate the spatial intensity, defined as the expected number of events by unit area. RESULTS: Of all georeferenced points, 59% were accidents and 40% were assaults. There is a non-homogeneous spatial distribution of the events with high concentration in two districts and three large avenues in the southern area of the city of São Paulo. CONCLUSIONS: Hospital records combined with methodological tools to estimate intensity of events are useful to study urban violence. The wavelet analysis is useful in the computation of the expected number of events and their respective confidence bands for any sub-region and, consequently, in the specification of risk estimates that could be used in decision-making processes for public policies.
Resumo:
The mature larva and pupa of Fulgeochlizus bruchi (Candèze, 1896) are described and illustrated. Bioluminescent patterns are also given. Comments, new data on the first instar larva and natural history data are presented. The first instar larvae differ from the mature larvae mainly in their chaetotaxy, which is sparse and more symmetrically distributed.
Resumo:
OBJETIVO: Estimar a prevalência de defeitos congênitos (DC) em uma coorte de nascidos vivos (NV) vinculando-se os bancos de dados do Sistema de Informação de Mortalidade (SIM) e do Sistema de Informação sobre Nascidos Vivos (SINASC). MÉTODOS: Estudo descritivo para avaliar as declarações de nascido vivo como fonte de informação sobre DC. A população de estudo é uma coorte de NV hospitalares do 1º semestre de 2006 de mães residentes e ocorridos no Município de São Paulo no período de 01/01/2006 a 30/06/2006, obtida por meio da vinculação dos bancos de dados das declarações de nascido vivo e óbitos neonatais provenientes da coorte. RESULTADOS: Os DC mais prevalentes segundo o SINASC foram: malformações congênitas (MC) e deformidades do aparelho osteomuscular (44,7%), MC do sistema nervoso (10,0%) e anomalias cromossômicas (8,6%). Após a vinculação, houve uma recuperação de 80,0% de indivíduos portadores de DC do aparelho circulatório, 73,3% de DC do aparelho respiratório e 62,5% de DC do aparelho digestivo. O SINASC fez 55,2% das notificações de DC e o SIM notificou 44,8%, mostrando-se importante para a recuperação de informações de DC. Segundo o SINASC, a taxa de prevalência de DC na coorte foi de 75,4%00 NV; com os dados vinculados com o SIM, essa taxa passou para 86,2%00 NV. CONCLUSÕES: A complementação de dados obtida pela vinculação SIM/SINASC fornece um perfil mais real da prevalência de DC do que aquele registrado pelo SINASC, que identifica os DC mais visíveis, enquanto o SIM identifica os mais letais, mostrando a importância do uso conjunto das duas fontes de dados.
Resumo:
The objective of this study was to estimate the regressions calibration for the dietary data that were measured using the quantitative food frequency questionnaire (QFFQ) in the Natural History of HPV Infection in Men: the HIM Study in Brazil. A sample of 98 individuals from the HIM study answered one QFFQ and three 24-hour recalls (24HR) at interviews. The calibration was performed using linear regression analysis in which the 24HR was the dependent variable and the QFFQ was the independent variable. Age, body mass index, physical activity, income and schooling were used as adjustment variables in the models. The geometric means between the 24HR and the calibration-corrected QFFQ were statistically equal. The dispersion graphs between the instruments demonstrate increased correlation after making the correction, although there is greater dispersion of the points with worse explanatory power of the models. Identification of the regressions calibration for the dietary data of the HIM study will make it possible to estimate the effect of the diet on HPV infection, corrected for the measurement error of the QFFQ.
Resumo:
Information on fruits and vegetables consumption in Brazil in the three levels of dietary data was analyzed and compared. Data about national supply came from Food Balance Sheets compiled by the FAO; household availability information was obtained from the Brazilian National Household Budget Survey (HBS); and actual intake information came from a large individual dietary intake survey that was representative of the adult population of São Paulo city. All sources of information were collected between 2002 and 2003. A subset of the HBS, representative of São Paulo city, was used in our analysis in order to improve the quality of the comparison with actual intake data. The ratio of national supply to household availability of fruits and vegetables was 2.6 while the ratio of national supply to actual intake was 4.0. The discrepancy ratio in the comparison between household availability and actual intake was smaller, 1.6. While the use of supply and availability data has advantages, as lower cost, must be taken into account that these sources tend to overestimate actual intake of fruits and vegetables.
Resumo:
study-specific results, their findings should be interpreted with caution
Resumo:
Bovine rumen protein with two levels of residual lipids (1.9 per cent or 3.8 per cent) was subjected to thermoplastic extrusion under different temperatures and moisture contents. Protein solubility in different buffers, disulphide cross-linking and molecular weight distribution were determined on the extrudates. After extrusion, samples with 1.9 per cent residual lipids content had a higher concentration of protein insoluble by undetermined forces, irrespective of feed moisture and processing temperature used. Lipid content of 3.8 per cent in the feed material resulted in more protein participating in the extrudate network through non-covalent interactions (hydrophobic and electrostatic) and disulphide bonds. A small dependency of the extrusion process on moisture and temperature and a marked dependency on lipid content, especially phospholipid, was observed, Electrophoresis under non-reducing conditions showed that protein extrusion with low feed moisture promoted high molecular breakdown inside the barrel, probably due to intense shear force, and further protein aggregation at the die end
Resumo:
Diagnostic methods have been an important tool in regression analysis to detect anomalies, such as departures from error assumptions and the presence of outliers and influential observations with the fitted models. Assuming censored data, we considered a classical analysis and Bayesian analysis assuming no informative priors for the parameters of the model with a cure fraction. A Bayesian approach was considered by using Markov Chain Monte Carlo Methods with Metropolis-Hasting algorithms steps to obtain the posterior summaries of interest. Some influence methods, such as the local influence, total local influence of an individual, local influence on predictions and generalized leverage were derived, analyzed and discussed in survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that, by removing the most influential observations, the decision about which model best fits the data is changed.
Resumo:
We consider a nontrivial one-species population dynamics model with finite and infinite carrying capacities. Time-dependent intrinsic and extrinsic growth rates are considered in these models. Through the model per capita growth rate we obtain a heuristic general procedure to generate scaling functions to collapse data into a simple linear behavior even if an extrinsic growth rate is included. With this data collapse, all the models studied become independent from the parameters and initial condition. Analytical solutions are found when time-dependent coefficients are considered. These solutions allow us to perceive nontrivial transitions between species extinction and survival and to calculate the transition's critical exponents. Considering an extrinsic growth rate as a cancer treatment, we show that the relevant quantity depends not only on the intensity of the treatment, but also on when the cancerous cell growth is maximum.
Resumo:
Background: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of ""what if'' situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Resumo:
Introduction: Work disability is a major consequence of rheumatoid arthritis (RA), associated not only with traditional disease activity variables, but also more significantly with demographic, functional, occupational, and societal variables. Recent reports suggest that the use of biologic agents offers potential for reduced work disability rates, but the conclusions are based on surrogate disease activity measures derived from studies primarily from Western countries. Methods: The Quantitative Standard Monitoring of Patients with RA (QUEST-RA) multinational database of 8,039 patients in 86 sites in 32 countries, 16 with high gross domestic product (GDP) (>24K US dollars (USD) per capita) and 16 low-GDP countries (<11K USD), was analyzed for work and disability status at onset and over the course of RA and clinical status of patients who continued working or had stopped working in high-GDP versus low-GDP countries according to all RA Core Data Set measures. Associations of work disability status with RA Core Data Set variables and indices were analyzed using descriptive statistics and regression analyses. Results: At the time of first symptoms, 86% of men (range 57%-100% among countries) and 64% (19%-87%) of women <65 years were working. More than one third (37%) of these patients reported subsequent work disability because of RA. Among 1,756 patients whose symptoms had begun during the 2000s, the probabilities of continuing to work were 80% (95% confidence interval (CI) 78%-82%) at 2 years and 68% (95% CI 65%-71%) at 5 years, with similar patterns in high-GDP and low-GDP countries. Patients who continued working versus stopped working had significantly better clinical status for all clinical status measures and patient self-report scores, with similar patterns in high-GDP and low-GDP countries. However, patients who had stopped working in high-GDP countries had better clinical status than patients who continued working in low-GDP countries. The most significant identifier of work disability in all subgroups was Health Assessment Questionnaire (HAQ) functional disability score. Conclusions: Work disability rates remain high among people with RA during this millennium. In low-GDP countries, people remain working with high levels of disability and disease activity. Cultural and economic differences between societies affect work disability as an outcome measure for RA.
Resumo:
Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.