124 results for clustered binary data
Abstract:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.
Abstract:
Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes how effectively the investigated techniques remove noisy data, measured by the accuracy obtained by different Machine Learning classifiers on the pre-processed data.
Abstract:
This paper deals with the emission of gravitational radiation in the context of a previously studied metric nonsymmetric theory of gravitation. The part coming from the symmetric part of the metric coincides with the mass quadrupole moment result of general relativity. The part associated with the antisymmetric part of the metric involves the dipole moment of the fermionic charge of the system. The results are applied to binary star systems, and the decrease of the period of the elliptical motion is calculated.
Abstract:
OBJECTIVE: To estimate the spatial intensity of urban violence events using wavelet-based methods and emergency room data. METHODS: Information on victims attended at the emergency room of a public hospital in the city of São Paulo, Southeastern Brazil, from January 1, 2002 to January 11, 2003 was obtained from hospital records. The spatial distribution of 3,540 events was recorded and a uniform random procedure was used to allocate records with incomplete addresses. Point processes and wavelet analysis techniques were used to estimate the spatial intensity, defined as the expected number of events per unit area. RESULTS: Of all georeferenced points, 59% were accidents and 40% were assaults. The spatial distribution of the events is non-homogeneous, with high concentration in two districts and three large avenues in the southern area of the city of São Paulo. CONCLUSIONS: Hospital records combined with methodological tools to estimate the intensity of events are useful for studying urban violence. Wavelet analysis is useful for computing the expected number of events and their respective confidence bands for any sub-region and, consequently, for specifying risk estimates that could be used in decision-making processes for public policies.
Abstract:
The mature larva and pupa of Fulgeochlizus bruchi (Candèze, 1896) are described and illustrated. Bioluminescent patterns are also given. Comments, new data on the first instar larva and natural history data are presented. The first instar larvae differ from the mature larvae mainly in their chaetotaxy, which is sparse and more symmetrically distributed.
Abstract:
OBJECTIVE: To estimate the proportion of self-medication among low-income adults and to identify associated factors. METHODS: Data came from a population survey carried out in the city of São Paulo in 2005, whose sampling plan included two domains, slum (favela) and non-slum, with two-stage cluster sampling, totaling 3,226 eligible individuals. In addition to sociodemographic and economic characteristics, the following were analyzed in multiple logistic regression: medication use in the 15 days before the interview, type of access to the medications (free, purchased, or other), and the types of morbidity treated (chronic or acute). RESULTS: The proportion of self-medication ranged from 27% to 32%. Self-medication was strongly associated with acute morbidity, access to medication by purchase, age under 47 years, and medications from the therapeutic group acting on the central nervous system. The group acting on the central nervous system was the most used in self-medication. CONCLUSIONS: Free access to medications proved to be a protective factor against self-medication. Medication distribution and adequate care should be considered in order to provide guidance and reduce the health risks that irrational use of medications can cause.
Abstract:
OBJECTIVE: To analyze the spatial and seasonal distribution of leptospirosis, identifying possible ecological and social components of its transmission. METHODS: A total of 2,490 cases registered from 1998 to 2006 were georeferenced by district of the city of São Paulo, SP. The data were obtained from the Sistema de Informação de Agravos de Notificação (the notifiable diseases information system). Thematic maps were produced for the variables incidence rate, case fatality, literacy rate, mean monthly income, number of residents per household, water supply, and sewage network. To identify the spatial pattern (dispersed, clustered, or random), the variables were analyzed using the global and local Moran's index. Spearman's correlation coefficient was used to test associations between the variables with a clustered spatial pattern. RESULTS: A clustered spatial pattern was observed for the variables leptospirosis incidence rate, literacy rate, mean monthly income, number of residents per household, water supply, and sewage network. A total of 773 cases were reported in the dry season and 1,717 in the wet season. Incidence and case fatality are correlated with the socioeconomic conditions of the population, regardless of the season. CONCLUSIONS: Leptospirosis is distributed throughout the city of São Paulo and its incidence increases in the rainy season. In the dry season, the locations where cases appear coincide with the areas with the worst housing conditions; during the wet season, incidence also increases in other districts, probably owing to their proximity to rivers and streams.
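The global Moran's index mentioned in the methods above can be illustrated with a minimal sketch. It assumes a simple binary adjacency matrix between districts and made-up values; the function name `global_morans_i` and the toy data are hypothetical and are not taken from the study, which does not state its weighting scheme.

```python
def global_morans_i(values, weights):
    """Global Moran's I for one variable observed in n areas.

    values  -- list of n numbers (e.g. an incidence rate per district)
    weights -- n x n list of lists; weights[i][j] > 0 when areas i and j
               are neighbours (binary adjacency is an assumption here)
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between neighbours.
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    sum_sq = sum(d * d for d in deviations)
    return (n / total_weight) * cross / sum_sq


# Hypothetical example: four districts arranged in a line (0-1-2-3).
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = global_morans_i([1, 2, 3, 4], ADJACENCY)   # similar neighbours
checkerboard = global_morans_i([1, 4, 1, 4], ADJACENCY)      # dissimilar neighbours
```

Positive values indicate that neighbouring areas tend to have similar values (a clustered pattern), negative values indicate a dispersed pattern, and values near the expectation of -1/(n-1) suggest a random pattern.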
Abstract:
OBJECTIVE: To investigate the association between self-reported arterial hypertension (HAr) and anthropometric indicators of body and abdominal fat in older adults in the city of São Paulo. METHODS: Data on 1,894 older adults came from the Saúde, Bem-Estar e Envelhecimento (SABE) survey, 2000. The anthropometric indicators used were body mass index (BMI), waist circumference (WC), waist-to-hip ratio (WHR), and waist-to-height ratio (WHtR). Binary logistic regression, stratified by sex, was used. RESULTS: Arterial hypertension was associated with the anthropometric indicators. In the final model (adjusted for age, schooling, smoking, physical activity, and diabetes), BMI showed the strongest statistical association in both sexes, although in women it was similar to the other indicators. Except for WHtR in men, HAr was positively and independently associated with the other indicators. CONCLUSION: The results suggest that these indicators are relevant for detecting the risk of developing this disease early and for intervening in its prevention and control.
Abstract:
The objective of this study was to estimate calibration regressions for the dietary data measured using the quantitative food frequency questionnaire (QFFQ) in the Natural History of HPV Infection in Men: the HIM Study in Brazil. A sample of 98 individuals from the HIM study answered one QFFQ and three 24-hour recalls (24HR) at interviews. The calibration was performed using linear regression analysis in which the 24HR was the dependent variable and the QFFQ was the independent variable. Age, body mass index, physical activity, income and schooling were used as adjustment variables in the models. The geometric means of the 24HR and the calibration-corrected QFFQ were statistically equal. Scatter plots of the two instruments show increased correlation after the correction, although the points are more dispersed where the models have lower explanatory power. Identifying the calibration regressions for the dietary data of the HIM study will make it possible to estimate the effect of diet on HPV infection, corrected for the measurement error of the QFFQ.
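The calibration step described above (24HR as the dependent variable, QFFQ as the independent variable) can be sketched with a one-predictor least-squares fit. This is only an illustration under simplifying assumptions: the adjustment variables used in the study (age, body mass index, physical activity, income, schooling) are omitted, and the function names and numbers are hypothetical.

```python
def fit_calibration(qffq, recall_24h):
    """Ordinary least squares of 24HR intake on QFFQ intake.

    Returns (intercept, slope) of recall_24h = intercept + slope * qffq.
    Adjustment covariates are left out of this simplified sketch.
    """
    n = len(qffq)
    mean_q = sum(qffq) / n
    mean_r = sum(recall_24h) / n
    covariance = sum((q - mean_q) * (r - mean_r) for q, r in zip(qffq, recall_24h))
    variance = sum((q - mean_q) ** 2 for q in qffq)
    slope = covariance / variance
    intercept = mean_r - slope * mean_q
    return intercept, slope


def calibrate(qffq_value, intercept, slope):
    """Calibration-corrected intake predicted from a QFFQ measurement."""
    return intercept + slope * qffq_value


# Made-up intakes for four hypothetical participants.
QFFQ = [10.0, 20.0, 30.0, 40.0]
RECALL = [12.0, 17.0, 22.0, 27.0]
INTERCEPT, SLOPE = fit_calibration(QFFQ, RECALL)
corrected = [calibrate(q, INTERCEPT, SLOPE) for q in QFFQ]
```

A slope below 1, as in this toy data, shrinks extreme QFFQ values toward the mean, which is the usual way regression calibration attenuates measurement error.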
Abstract:
Information on fruit and vegetable consumption in Brazil at three levels of dietary data was analyzed and compared. Data on national supply came from Food Balance Sheets compiled by the FAO; household availability information was obtained from the Brazilian National Household Budget Survey (HBS); and actual intake information came from a large individual dietary intake survey that was representative of the adult population of São Paulo city. All sources of information were collected between 2002 and 2003. A subset of the HBS, representative of São Paulo city, was used in our analysis in order to improve the quality of the comparison with actual intake data. The ratio of national supply to household availability of fruits and vegetables was 2.6, while the ratio of national supply to actual intake was 4.0. The discrepancy ratio in the comparison between household availability and actual intake was smaller, 1.6. While the use of supply and availability data has advantages, such as lower cost, it must be taken into account that these sources tend to overestimate actual intake of fruits and vegetables.
Abstract:
Fifty bursae of Fabricius (BF) were examined by conventional optical microscopy, and digital images were acquired and processed using Matlab® 6.5 software. The Artificial Neural Network (ANN) was generated using Neuroshell® Classifier software, and the optical and digital data were compared. The ANN was able to make a comparable classification of digital and optical scores. The ANN correctly classified the majority of the follicles, reaching a sensitivity and specificity of 89% and 96%, respectively. When the follicles were scored and grouped in a binary fashion, the sensitivity increased to 90% and the specificity reached a maximum value of 92%. These results demonstrate that digital image analysis combined with an ANN is a useful tool for the pathological classification of BF lymphoid depletion. In addition, it provides objective results that allow the size of the error in diagnosis and classification to be measured, thereby making comparison between databases feasible.
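Sensitivity and specificity for a binary grouping, as reported above, follow directly from the counts of true and false positives and negatives. The sketch below uses made-up labels and a hypothetical function name; it does not reproduce the study's data or its ANN.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels (1 = positive).

    truth     -- reference labels (e.g. optical scores grouped as 0/1)
    predicted -- classifier labels for the same items
    """
    pairs = list(zip(truth, predicted))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical labels for ten follicles (not data from the study).
TRUTH = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
PREDICTED = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
SENS, SPEC = sensitivity_specificity(TRUTH, PREDICTED)
```

Sensitivity is the share of truly positive items that the classifier flags, and specificity is the share of truly negative items that it leaves unflagged.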
Abstract:
study-specific results, their findings should be interpreted with caution
Abstract:
Diagnostic methods have been an important tool in regression analysis for detecting anomalies, such as departures from error assumptions and the presence of outliers and influential observations in the fitted models. Assuming censored data, we considered a classical analysis and a Bayesian analysis assuming noninformative priors for the parameters of the model with a cure fraction. The Bayesian approach used Markov Chain Monte Carlo methods with Metropolis-Hastings algorithm steps to obtain the posterior summaries of interest. Some influence methods, such as the local influence, total local influence of an individual, local influence on predictions and generalized leverage, were derived, analyzed and discussed for survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that removing the most influential observations changes the decision about which model best fits the data.
Abstract:
Background: Population antimicrobial use may influence resistance emergence. Resistance is an ecological phenomenon due to potential transmissibility. We investigated spatial and temporal patterns of ciprofloxacin (CIP) population consumption related to E. coli resistance emergence and dissemination in a major Brazilian city. A total of 4,372 urinary tract infection E. coli cases, of which 723 were CIP resistant, were identified in 2002 from two outpatient centres. Case addresses were geocoded in a digital map. Raw CIP consumption data were transformed into usage density in DDDs by determining the influence zones of CIP selling points. A stochastic model coupled with a Geographical Information System was applied to relate resistance to usage density and to detect city areas of high or low resistance risk. Results: The emergence of clusters of CIP-resistant E. coli was detected and was significantly related to usage density at a level of 5 to 9 CIP DDDs. There were clustered hot-spots and a significant global spatial variation in the residual resistance risk after allowing for usage density. Conclusions: The usage density of 5-9 CIP DDDs per 1,000 inhabitants within the same influence zone was the resistance triggering level. This level led to E. coli resistance clustering, proving that individual resistance emergence and dissemination was affected by antimicrobial population consumption.
Abstract:
We consider a nontrivial one-species population dynamics model with finite and infinite carrying capacities. Time-dependent intrinsic and extrinsic growth rates are considered in these models. Through the model per capita growth rate we obtain a heuristic general procedure to generate scaling functions to collapse data into a simple linear behavior even if an extrinsic growth rate is included. With this data collapse, all the models studied become independent from the parameters and initial condition. Analytical solutions are found when time-dependent coefficients are considered. These solutions allow us to perceive nontrivial transitions between species extinction and survival and to calculate the transition's critical exponents. Considering an extrinsic growth rate as a cancer treatment, we show that the relevant quantity depends not only on the intensity of the treatment, but also on when the cancerous cell growth is maximum.