927 results for missing data recovery


Relevance:

20.00%

Publisher:

Abstract:

Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing from 25% up to 99% with regard to SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of the increase in data volume on the performance. The increase did not impair the performance of the SB-index, which highly improved the elapsed time in query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction of at most 0.20% of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80 to 91% for redundant GDW schemas.

Relevance:

20.00%

Publisher:

Abstract:

Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes the effectiveness of the investigated techniques in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
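The abstract does not say which distance-based techniques were evaluated. As an illustration only, the sketch below uses one well-known member of that family, an edited-nearest-neighbour style filter: an instance is flagged as noisy when its label disagrees with the majority label of its k nearest neighbours. The function name, the toy data and the choice of Euclidean distance are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter

def flag_noisy_instances(X, y, k=3):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbours (Euclidean distance).
    Hypothetical helper, not the paper's implementation."""
    noisy = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from instance i to every other instance, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label != yi:
            noisy.append(i)
    return noisy

# Toy data: two well-separated clusters, with one instance (index 3)
# carrying the label of the other cluster.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
y = ["a", "a", "a", "b", "b", "b", "b"]

noisy = flag_noisy_instances(X, y, k=3)
clean_X = [x for i, x in enumerate(X) if i not in noisy]
clean_y = [label for i, label in enumerate(y) if i not in noisy]
```

A classifier would then be trained on `clean_X` and `clean_y`, and its accuracy compared with the accuracy obtained on the unfiltered data, which is the kind of comparison the abstract describes.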

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVE: To estimate the spatial intensity of urban violence events using wavelet-based methods and emergency room data. METHODS: Information on victims attended at the emergency room of a public hospital in the city of São Paulo, Southeastern Brazil, from January 1, 2002 to January 11, 2003 was obtained from hospital records. The spatial distribution of 3,540 events was recorded, and a uniform random procedure was used to allocate records with incomplete addresses. Point processes and wavelet analysis techniques were used to estimate the spatial intensity, defined as the expected number of events per unit area. RESULTS: Of all georeferenced points, 59% were accidents and 40% were assaults. The spatial distribution of the events is non-homogeneous, with high concentration in two districts and along three large avenues in the southern area of the city of São Paulo. CONCLUSIONS: Hospital records combined with methodological tools for estimating the intensity of events are useful for studying urban violence. Wavelet analysis is useful for computing the expected number of events and their respective confidence bands for any sub-region and, consequently, for specifying risk estimates that could be used in decision-making processes for public policies.

Relevance:

20.00%

Publisher:

Abstract:

The aim of this study was to test the hypothesis of differences in performance including differences in ST-T wave changes between healthy men and women submitted to an exercise stress test. Two hundred (45.4%) men and 241 (54.6%) women (mean age: 38.7 ± 11.0 years) were submitted to an exercise stress test. Physiologic and electrocardiographic variables were compared by the Student t-test and the chi-square test. To test the hypothesis of differences in ST-segment changes, data were ranked with functional models based on weighted least squares. To evaluate the influence of gender and age on the diagnosis of ST-segment abnormality, a logistic model was adjusted; P < 0.05 was considered to be significant. Rate-pressure product, duration of exercise and estimated functional capacity were higher in men (P < 0.05). Sixteen (6.7%) women and 9 (4.5%) men demonstrated ST-segment upslope ≥0.15 mV or downslope ≥0.10 mV; the difference was not statistically significant. Age increase of one year added 4% to the chance of upsloping of segment ST ≥0.15 mV or downsloping of segment ST ≥0.1 mV (P = 0.03; risk ratio = 1.040, 95% confidence interval (CI) = 1.002-1.080). Heart rate recovery was higher in women (P < 0.05). The chance of women showing an increase of systolic blood pressure ≤30 mmHg was 85% higher (P = 0.01; risk ratio = 1.85, 95%CI = 1.1-3.05). No significant difference in the frequency of ST-T wave changes was observed between men and women. Other differences may be related to different physical conditioning.

Relevance:

20.00%

Publisher:

Abstract:

Results obtained in a pilot-scale unit designed for COD removal and p-TBC (p-tert-butylcatechol) recovery from a butadiene washing stream (pH 14, 200,000 mg COD L⁻¹, highly toxic) at a petrochemical plant are presented. By adding H₃PO₄, phase separation is achieved and p-TBC is successfully recovered (88 g L⁻¹ of washing stream). Information (time for phase separation and organic phase characterization) was gathered for designing a future industrial unit. The estimated heat generation rate was 990 kJ min⁻¹, and 15 min were enough to promote phase separation for a liquid column of approximately 1.15 m.

Relevance:

20.00%

Publisher:

Abstract:

A catalogue is provided with the type material of four superfamilies of "Acalyptrate" (Conopoidea, Diopsoidea, Nerioidea and Tephritoidea) held in the collection of the Museu de Zoologia da Universidade de São Paulo (MZUSP), São Paulo, Brazil. Concerning the taxa dealt with herein, the Diptera collection of MZUSP held 77 holotypes, 4 "allotypes" and 194 paratypes. In this paper, information about data labels, preservation and missing structures of the type specimens is given.

Relevance:

20.00%

Publisher:

Abstract:

The mature larva and pupa of Fulgeochlizus bruchi (Candèze, 1896) are described and illustrated. Bioluminescent patterns are also given. Comments, new data on the first instar larva and natural history data are presented. The first instar larvae differ from the mature larvae mainly in their chaetotaxy, which is sparse and more symmetrically distributed.

Relevance:

20.00%

Publisher:

Abstract:

OBJECTIVE: To estimate the prevalence of congenital defects (CD) in a cohort of live births (LB) by linking the databases of the Mortality Information System (SIM) and the Live Births Information System (SINASC). METHODS: Descriptive study to assess live birth certificates as a source of information on CD. The study population is a cohort of hospital live births from the first semester of 2006, to mothers residing in the Municipality of São Paulo and occurring there between January 1 and June 30, 2006, obtained by linking the databases of live birth certificates and of neonatal deaths from the cohort. RESULTS: The most prevalent CD according to SINASC were congenital malformations (CM) and deformities of the musculoskeletal system (44.7%), CM of the nervous system (10.0%) and chromosomal anomalies (8.6%). After linkage, 80.0% of individuals with CD of the circulatory system were recovered, as were 73.3% of those with CD of the respiratory system and 62.5% of those with CD of the digestive system. SINASC accounted for 55.2% of CD notifications and SIM for 44.8%, which shows the importance of SIM for recovering information on CD. According to SINASC, the prevalence rate of CD in the cohort was 75.4 per 10,000 LB; with the data linked to SIM, this rate rose to 86.2 per 10,000 LB. CONCLUSIONS: The data complementation obtained by linking SIM and SINASC gives a more realistic profile of CD prevalence than that recorded by SINASC alone. SINASC identifies the most visible CD, while SIM identifies the most lethal ones, which shows the importance of using the two data sources together.

Relevance:

20.00%

Publisher:

Abstract:

The objective of this study was to estimate the regression calibration for the dietary data measured using the quantitative food frequency questionnaire (QFFQ) in the Natural History of HPV Infection in Men: the HIM Study in Brazil. A sample of 98 individuals from the HIM study answered one QFFQ and three 24-hour recalls (24HR) at interviews. The calibration was performed using linear regression analysis in which the 24HR was the dependent variable and the QFFQ was the independent variable. Age, body mass index, physical activity, income and schooling were used as adjustment variables in the models. The geometric means of the 24HR and the calibration-corrected QFFQ were statistically equal. The dispersion graphs between the instruments show increased correlation after the correction, although the points are more dispersed for the models with worse explanatory power. Identifying the regression calibration for the dietary data of the HIM study will make it possible to estimate the effect of diet on HPV infection, corrected for the measurement error of the QFFQ.
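The calibration step described above can be sketched as a simple least-squares fit of the 24HR values on the QFFQ values, followed by replacing each QFFQ value with its fitted value. This is a minimal illustration only: the function names and intake numbers are invented for the example, and the adjustment variables used in the study (age, body mass index, physical activity, income and schooling) are deliberately left out to keep the sketch to a single predictor.

```python
from statistics import mean

def fit_calibration(qffq, recall_24h):
    """Ordinary least squares of 24HR (dependent) on QFFQ (independent).
    Returns (intercept, slope). Hypothetical helper for illustration."""
    mx, my = mean(qffq), mean(recall_24h)
    sxx = sum((x - mx) ** 2 for x in qffq)
    sxy = sum((x - mx) * (y - my) for x, y in zip(qffq, recall_24h))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def calibrate(qffq, intercept, slope):
    """Calibration-corrected QFFQ: the 24HR value predicted from each QFFQ value."""
    return [intercept + slope * x for x in qffq]

# Invented intakes (arbitrary units); the 24HR values here follow
# 80 + 0.5 * QFFQ exactly so the fit is easy to check by hand.
qffq = [100.0, 200.0, 300.0, 400.0]
recall_24h = [130.0, 180.0, 230.0, 280.0]

intercept, slope = fit_calibration(qffq, recall_24h)
corrected = calibrate(qffq, intercept, slope)
```

In a full analysis the corrected values, rather than the raw QFFQ values, would enter the model relating diet to the outcome.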

Relevance:

20.00%

Publisher:

Abstract:

Information on fruit and vegetable consumption in Brazil at three levels of dietary data was analyzed and compared. Data on national supply came from Food Balance Sheets compiled by the FAO; household availability information was obtained from the Brazilian National Household Budget Survey (HBS); and actual intake information came from a large individual dietary intake survey that was representative of the adult population of São Paulo city. All sources of information were collected between 2002 and 2003. A subset of the HBS, representative of São Paulo city, was used in our analysis in order to improve the quality of the comparison with actual intake data. The ratio of national supply to household availability of fruits and vegetables was 2.6, while the ratio of national supply to actual intake was 4.0. The discrepancy ratio in the comparison between household availability and actual intake was smaller, 1.6. While the use of supply and availability data has advantages, such as lower cost, it must be taken into account that these sources tend to overestimate actual intake of fruits and vegetables.

Relevance:

20.00%

Publisher:

Abstract:

Three comparative assays were performed seeking to improve the sensitivity of the diagnosis of Bordetella bronchiseptica infection based on swine nasal swabs. An initial assay compared the recovery of B. bronchiseptica from swabs simultaneously inoculated with B. bronchiseptica and some interfering bacteria, immersed in three transport formulations (Amies with charcoal, trypticase soy broth and Soerensen phosphate buffer supplemented with 5% fetal bovine serum) and submitted to different temperatures (10ºC and 27ºC) and periods of incubation (24, 72 and 120 hours). A subsequent assay compared three selective media (MacConkey agar, modified selective medium G20G and a ceftiofur medium) for their recovery capabilities from clinical specimens. A final assay compared the polymerase chain reaction with the three selective media. In the first assay, the recovery of B. bronchiseptica from transport systems was better at 27ºC and the three formulations performed well at this temperature, but the qualitative and quantitative analyses taken together indicated the advantage of Amies medium for transporting nasal swabs. The second assay indicated that MacConkey agar and modified G20G had similar results and were superior to the ceftiofur medium. In the final assay, polymerase chain reaction detected B. bronchiseptica better than the culture procedures.

Relevance:

20.00%

Publisher:

Abstract:

The accuracy, precision and robustness of the markers cutin, acid detergent lignin and chromic oxide, and of total fecal collection, were evaluated for estimating the apparent digestibility of organic matter in diets for horses. Four male horses, approximately 10 months old and with a mean weight of 197 kg (170 to 216 kg), were used. The experiment was carried out in four periods of 11 days each, the first eight days being used for adaptation to the diets and the following three for sample collection. The experimental design was a 4×4 Latin square. The organic matter digestibility coefficients obtained with the markers were weighted by means of the bias. Accuracy and precision were determined by comparing predicted and observed data, and robustness by comparing the biases with other factors studied. Cutin was not efficient as an internal marker, since it overestimated the apparent digestibility of organic matter and resulted in lower accuracy and precision. Chromic oxide showed low fecal recovery and underestimated the apparent digestibility of organic matter, although it was the most precise. Acid detergent lignin was the marker with the best fecal recovery and was the most accurate, and was therefore the most efficient marker.

Relevance:

20.00%

Publisher:

Abstract:

study-specific results, their findings should be interpreted with caution

Relevance:

20.00%

Publisher:

Abstract:

Bovine rumen protein with two levels of residual lipids (1.9 per cent or 3.8 per cent) was subjected to thermoplastic extrusion under different temperatures and moisture contents. Protein solubility in different buffers, disulphide cross-linking and molecular weight distribution were determined on the extrudates. After extrusion, samples with 1.9 per cent residual lipid content had a higher concentration of protein insoluble by undetermined forces, irrespective of the feed moisture and processing temperature used. A lipid content of 3.8 per cent in the feed material resulted in more protein participating in the extrudate network through non-covalent interactions (hydrophobic and electrostatic) and disulphide bonds. A small dependency of the extrusion process on moisture and temperature and a marked dependency on lipid content, especially phospholipid, were observed. Electrophoresis under non-reducing conditions showed that protein extrusion with low feed moisture promoted high molecular breakdown inside the barrel, probably due to intense shear force, and further protein aggregation at the die end.

Relevance:

20.00%

Publisher:

Abstract:

Diagnostic methods have been an important tool in regression analysis for detecting anomalies, such as departures from error assumptions and the presence of outliers and influential observations in the fitted models. Assuming censored data, we considered a classical analysis and a Bayesian analysis assuming non-informative priors for the parameters of the model with a cure fraction. The Bayesian approach used Markov Chain Monte Carlo methods with Metropolis-Hastings algorithm steps to obtain the posterior summaries of interest. Some influence methods, such as the local influence, total local influence of an individual, local influence on predictions and generalized leverage, were derived, analyzed and discussed for survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that removing the most influential observations changes the decision about which model best fits the data.