980 results for Missing values structures
Abstract:
We examine the problem of combining Mexican inflation predictions or projections provided by a biweekly survey of professional forecasters. Consumer price inflation in Mexico is measured twice a month. We consider several combining methods and advocate the use of dimension reduction techniques, whose performance is compared with different benchmark methods, including the simplest average prediction. Missing values in the database are imputed by two different data-based methods. The results obtained are basically robust to the choice of the imputation method. A preliminary analysis of the data was based on its panel data structure and showed the potential usefulness of dimension reduction techniques for combining the experts' predictions. The main findings are: the first monthly predictions are best combined by way of the first principal component of the predictions available; the best second monthly prediction is obtained by calculating the median prediction and is more accurate than the first one.
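The two benchmark combinations named in this abstract, the simple average and the median, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `combine_forecasts`, the period labels and the numbers are hypothetical, and missing forecasts are simply skipped here rather than imputed as in the study. The principal-component combination is not shown.

```python
from statistics import mean, median


def combine_forecasts(panel):
    """Combine each period's forecasts by simple average and by median.

    `panel` maps a period label to the list of individual forecasts for
    that period; None marks a forecaster who did not respond.
    Assumption: missing forecasts are skipped, not imputed.
    """
    combined = {}
    for period, forecasts in panel.items():
        available = [f for f in forecasts if f is not None]
        if not available:
            # No forecaster responded in this period.
            combined[period] = {"mean": None, "median": None}
            continue
        combined[period] = {
            "mean": mean(available),
            "median": median(available),
        }
    return combined


# Hypothetical biweekly inflation forecasts (percent); not survey data.
panel = {
    "first_half": [0.20, 0.25, None, 0.60],
    "second_half": [0.10, None, 0.15, 0.20],
}
result = combine_forecasts(panel)
```

In the first period the median (0.25) is less affected by the single high forecast than the average is, which is the usual reason for considering the median as a combination rule.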
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
During the process of knowledge discovery in databases, some problems may be encountered, such as the absence of a given instance of an attribute. This problem can harm the final results of the process, because it directly affects the quality of the data submitted to a machine learning algorithm. Several proposals in the literature aim to mitigate this damage, among them data imputation, which estimates a plausible value to replace the missing one. Following this line of solution to the missing values problem, several studies were analysed and some observations were made, such as the scarce use of synthetic datasets that simulate the main missing-data mechanisms and a recent trend toward using bio-inspired algorithms to treat the problem. Against this background, this dissertation presents a data imputation method based on particle swarm optimization, little explored in the area, and applies it to synthetically generated datasets that cover the main missing-data mechanisms: MAR, MCAR and NMAR. The results obtained when comparing different configurations of the method with two other well-known methods in the area (KNNImpute and SVMImpute) are promising for its use in treating missing values, since it achieved the best values in most of the experiments performed.
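The three missing-data mechanisms named in this abstract (MCAR, MAR, NMAR) can be illustrated on a synthetic two-column dataset. The sketch below is an assumption-laden illustration, not the dissertation's generator: the function `make_missing`, its parameters and the deterministic threshold rules for MAR and NMAR are hypothetical simplifications (real mechanisms are usually probabilistic).

```python
import random


def make_missing(rows, mechanism, threshold=0.5, p=0.3, seed=0):
    """Return a copy of `rows` ((x1, x2) pairs) with some x2 set to None.

    MCAR: x2 is dropped with probability p, independent of the data.
    MAR:  x2 is dropped when the always-observed x1 exceeds the threshold.
    NMAR: x2 is dropped when x2 itself exceeds the threshold.
    """
    rng = random.Random(seed)
    out = []
    for x1, x2 in rows:
        if mechanism == "MCAR":
            drop = rng.random() < p
        elif mechanism == "MAR":
            drop = x1 > threshold
        elif mechanism == "NMAR":
            drop = x2 > threshold
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x1, None if drop else x2))
    return out


# Hypothetical synthetic data: 200 rows of two uniform(0, 1) attributes.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
mar = make_missing(rows, "MAR")
nmar = make_missing(rows, "NMAR")
```

Under MAR the missingness can be explained from the observed column `x1`; under NMAR it depends on the very values that are removed, which is what makes that mechanism the hardest to correct by imputation.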
Abstract:
Introduction: interferon (IFN), used to eradicate the hepatitis C virus, induces side effects, including psychiatric ones. Data on psychiatric adverse events of the new antiviral drugs (DAA) are limited. The aim of this study is to assess the development of psychiatric side effects during two distinct treatment regimens: pegylated IFN and ribavirin [dual therapy (standard, or SOC)]; DAA in combination with pegylated IFN and ribavirin (triple therapy). Methods: consecutive HCV+ patients followed at the Chronic Hepatitis Outpatient Clinic of the Semeiotica Medica unit of the Department of Medical and Surgical Sciences of the University of Bologna, who were about to start IFN-based antiviral treatment, underwent a psychodiagnostic examination consisting of a semi-structured clinical interview and self-administered tests: BDI, STAXI-2, Hamilton Anxiety Scale, MMPI-2. Results: 84 patients were enrolled, 57/84 (67.9%) in the triple-therapy group and 27/84 in the SOC group. Almost all enrolled patients completed the initial clinical interview (82/84; 97.6%), whereas adherence to the tests was poor (missing values >50%). With the exception of anxiety, the prevalence of all other disorders (irritability, asthenia, neurocognitive dysfunction, dyssomnia) increased during treatment. During antiviral therapy, 43/84 (51.2%) needed to use the psychiatric consultation service and 48/84 (57.1%) received supportive psychopharmacological therapy, with no significant differences between the two treatment groups. Conclusions: one of the most salient findings of the study was the poor adherence to the psychodiagnostic tests, despite the high prevalence of psychiatric symptoms. Besides highlighting the importance of psychiatric symptoms during treatment and the relevance of psychological and psychiatric consultation in enabling patients to complete the planned course of therapy (improving its efficacy), the results of this study also showed that the diagnostic tools need to be rethought, probably by adapting them to this specific target population.
Abstract:
Objectives To compare different ways of measuring partner notification (PN) outcomes with published audit standards, examine variability between clinics and examine factors contributing to variation in PN outcomes in genitourinary medicine (GUM) clinics in the UK. Methods Reanalysis of the 2007 BASHH national chlamydia audit. The primary outcome was the number of partners per index case tested for chlamydia, as verified by a healthcare worker or, if missing, reported by the patient. Control charts were used to examine variation between clinics considering missing values as zero or excluding missing values. Hierarchical logistic regression was used to investigate factors contributing to variation in outcomes. Results Data from 4616 individuals in 169 genitourinary medicine clinics were analysed. There was no information about the primary outcome in 41% of records. The mean number of partners tested for chlamydia ranged from 0 to 1.5 per index case per clinic. The median across all clinics was 0.47 when missing values were assumed to be zero and 0.92 per index case when missing values were excluded. Men who have sex with men were less likely than heterosexual men and patients with symptoms (4-week look-back period) were less likely than asymptomatic patients (6-month look-back) to report having one or more partners tested for chlamydia. There was no association between the primary outcome and the type of the health professional giving the PN advice. Conclusions The completeness of PN outcomes recorded in clinical notes needs to improve. Further research is needed to identify auditable measures that are associated with successful PN that prevents repeated chlamydia in index cases.
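The gap between the two medians reported in this abstract (0.47 versus 0.92) comes from how missing outcomes are treated before averaging. A minimal sketch of the two conventions follows; the function `clinic_median`, the clinic labels and the counts are hypothetical and are not audit data.

```python
from statistics import mean, median


def clinic_median(clinics, missing_as_zero):
    """Median across clinics of the mean number of partners tested.

    `clinics` maps a clinic label to a list of per-index-case counts,
    with None where the outcome was not recorded.
    """
    per_clinic = []
    for counts in clinics.values():
        if missing_as_zero:
            # Convention 1: an unrecorded outcome counts as zero partners.
            values = [0 if c is None else c for c in counts]
        else:
            # Convention 2: unrecorded outcomes are excluded.
            values = [c for c in counts if c is not None]
        if values:
            per_clinic.append(mean(values))
    return median(per_clinic)


# Hypothetical clinics; not data from the audit.
clinics = {"A": [1, None, 2], "B": [0, None, None, 1], "C": [2, 1]}
median_zero = clinic_median(clinics, missing_as_zero=True)
median_excluded = clinic_median(clinics, missing_as_zero=False)
```

With these illustrative counts the median is 1 when missing values are treated as zero and 1.5 when they are excluded, so the first convention can only lower (or leave unchanged) each clinic's mean.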
Abstract:
The purposes of this study were to examine (1) the relationship between selected components of the content of prenatal care and spontaneous preterm birth; and (2) the degree of comparability between maternal and caregivers' responses regarding the number of prenatal care visits, selected components of the content of prenatal care, and gestational age, based on analyses of the 1988 National Maternal and Infant Health Survey conducted by the National Center for Health Statistics. Spontaneous preterm birth was subcategorized into very preterm and moderately preterm births, with term births as the controls. The study population was limited to non-Hispanic Anglo- and African-American mothers. Racial differences in birth outcomes were also compared. This study concluded that: (1) there was not a high degree of comparability (less than 80%) between maternal and prenatal care providers' responses regarding the number of prenatal care visits and the content of prenatal care; (2) there was a low degree of comparability (less than 50%) between maternal and infant's hospital of delivery responses regarding gestational age at birth; (3) there were differences in selected components of the content of prenatal care between the cases and controls, overall and stratified by ethnicity (i.e., hemoglobin/hematocrit test, weight measurement, and breast-feeding counseling), but they were confounded with missing values and associated preterm delivery bias; (4) there were differences in selected components of the content of prenatal care between Anglo- and African-American cases (i.e., vitamin/mineral supplement advice, weight measurement, smoking cessation and drug abuse counseling), but they, too, were difficult to interpret definitively due to item nonresponse and preterm delivery biases; (5) no significant predictive association between selected components of the content of prenatal care and spontaneous preterm birth was found; and (6) inadequate/intermediate prenatal care and birth out of wedlock were found to be associated with moderately preterm birth. Future research is needed to examine the validity of maternal and prenatal care providers' responses and identify the sources of disagreement between their responses. In addition, further studies are needed to examine the relationship between the quality of prenatal care and preterm birth. Finally, the completeness and quality of patient and provider data on the utilization and content of prenatal care need to be strengthened in subsequent studies.
Abstract:
Ethnic violence appears to be the major source of violence in the world. Ethnic hostilities are potentially all-pervasive because most countries in the world are multi-ethnic. Public health's focus on violence documents its increasing role in this issue. The present study is based on a secondary analysis of a dataset of responses by 272 individuals from four ethnic groups (Anglo, African, Mexican, and Vietnamese Americans) who answered questions regarding variables related to ethnic violence from a general questionnaire which was distributed to ethnically diverse purposive, nonprobability, self-selected groups of individuals in Houston, Texas, in 1993. One goal was psychometric: learning about issues in analysis of datasets with modest numbers, comparison of two approaches to dealing with missing observations not missing at random (conducting analysis on two datasets), transformation analysis of continuous variables for logistic regression, and logistic regression diagnostics. Regarding the psychometric goal, it was concluded that measurement model analysis was not possible with a relatively small dataset with nonnormal variables, such as Likert-scaled variables; therefore, exploratory factor analysis was used. The two approaches to dealing with missing values resulted in comparable findings. Transformation analysis suggested that the continuous variables were in the correct scale, and diagnostics suggested that the model fit was adequate. The substantive portion of the analysis included the testing of four hypotheses. Hypothesis One proposed that attitudes/efficacy regarding alternative approaches to resolving grievances from the general questionnaire represented underlying factors: nonpunitive social norms and strategies for addressing grievances--using the political system, organizing protests, using the system to punish offenders, and personal mediation. Evidence was found to support all but one factor, nonpunitive social norms. Hypothesis Two proposed that the factor variables and the other independent variables--jail, grievance, male, young, and membership in a particular ethnic group--were associated with (non)violence. Jail, grievance, and not using the political system to address grievances were associated with a greater likelihood of intergroup violence. No evidence was found to support Hypotheses Three and Four, which proposed that grievance and ethnic group membership would interact with other variables (i.e., age, gender, etc.) to produce variant levels of subgroup (non)violence. The generalizability of the results of this study is constrained by the purposive self-selected nature of the sample and small sample size (n = 272). Suggestions for future research include incorporating other possible variables or factors predictive of intergroup violence in models of the kind tested here, and the development and evaluation of interventions that promote electoral and nonelectoral political participation as means of reducing interethnic conflict.
Abstract:
Sequence analysis and optimal matching are useful heuristic tools for the descriptive analysis of heterogeneous individual pathways such as educational careers, job sequences or patterns of family formation. However, to date it remains unclear how to handle the inevitable problems caused by missing values with regard to such analysis. Multiple Imputation (MI) offers a possible solution for this problem but it has not been tested in the context of sequence analysis. Against this background, we contribute to the literature by assessing the potential of MI in the context of sequence analyses using an empirical example. Methodologically, we draw upon the work of Brendan Halpin and extend it to additional types of missing value patterns. Our empirical case is a sequence analysis of panel data with substantial attrition that examines the typical patterns and the persistence of sex segregation in school-to-work transitions in Switzerland. The preliminary results indicate that MI is a valuable methodology for handling missing values due to panel mortality in the context of sequence analysis. MI is especially useful in facilitating a sound interpretation of the resulting sequence types.
Abstract:
Hindcast climatic data are widely used in many applications. However, this approach needs the support of a validation process to allow its drawbacks and, therefore, confidence levels to be assessed. In this work, the strategy relies on an hourly wind database resulting from a dynamical downscaling experiment, with a spatial resolution of 10 km, covering the Iberian Peninsula (IP), driven by the ERA40 reanalysis (1959–2001) extended by European Centre for Medium-Range Weather Forecasts (ECMWF) analysis (2002–2007), and comprises two main steps. Initially, the skill of the simulation is evaluated by comparison with the quality-tested observational database (Lorente-Plazas et al., 2014) at local and regional scales. The results show that the model is able to portray the main features of the wind over the IP: annual cycles, wind roses, spatial and temporal variability, as well as the response to different circulation types. In addition, there is a significant added value of the simulation with respect to driving conditions, especially in regions with a complex orography. However, some problems are evident, the major drawback being the systematic overestimation of the wind speed, which is mainly attributed to a misrepresentation of frictional forces. The model skill is also lower along the Mediterranean coast and for the Pyrenees. In a second phase, the high spatio-temporal resolution of the pseudo-real wind database is used to explore the limitations of the observational database. It is shown that missing values do not affect the characterisation of the wind climate over the IP, while the length of the observational period (6 years) is sufficient for most regions, with only a few exceptions. The spatial distribution of the observational sampling schemes should be enhanced to improve the correct assessment of all IP wind regimes, particularly in some mountainous areas.
Abstract:
METHODS Spirometry datasets from South-Asian children were collated from four centres in India and five within the UK. Records with transcription errors, missing values for height or spirometry, and implausible values were excluded (n = 110). RESULTS Following exclusions, cross-sectional data were available from 8,124 children (56.3% male; 5-17 years). When compared with GLI-predicted values from White Europeans, forced expired volume in 1 s (FEV1) and forced vital capacity (FVC) in South-Asian children were on average 15% lower, ranging from 4-19% between centres. By contrast, proportional reductions in FEV1 and FVC within all but two datasets meant that the FEV1/FVC ratio remained independent of ethnicity. The 'GLI-Other' equation fitted data from North India reasonably well while 'GLI-Black' equations provided a better approximation for South-Asian data than the 'GLI-White' equation. However, marked discrepancies in the mean lung function z-scores between centres, especially when examined according to socio-economic conditions, precluded derivation of a single South-Asian GLI-adjustment. CONCLUSION Until improved and more robust prediction equations can be derived, we recommend the use of 'GLI-Black' equations for interpreting most South-Asian data, although 'GLI-Other' may be more appropriate for North Indian data. Prospective data collection using standardised protocols to explore potential sources of variation due to socio-economic circumstances, secular changes in growth/predictors of lung function and ethnicities within the South-Asian classification is urgently required.
Abstract:
Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias. Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates including partially observed binary covariates. The data were assumed missing at random (MAR). One method (PD) used the predictive distribution as a weight to average the logistic regressions performed over all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
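The first two comparison methods in this abstract, complete-case analysis and mean substitution, are simple data-preparation steps that can be sketched as follows. This is an illustration only: the function names `complete_cases` and `mean_impute` and the sample rows are hypothetical, and neither the logistic regression fit nor the proposed PD and RS methods are shown.

```python
from statistics import mean


def complete_cases(rows):
    """Method (1): keep only rows with no missing entry (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Method (2): replace each missing entry by its column's observed mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [
        tuple(col_means[j] if row[j] is None else row[j] for j in range(n_cols))
        for row in rows
    ]


# Hypothetical covariate rows; None marks a missing value.
rows = [(1.0, 2.0), (None, 4.0), (3.0, None)]
kept = complete_cases(rows)
filled = mean_impute(rows)
```

Here complete-case analysis keeps one row out of three, which shows the loss of information the abstract refers to, while mean substitution keeps all rows at the cost of understating the variability of the imputed covariate.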
Abstract:
Eocene-Oligocene volcanic rocks drilled at Site 786 in the Izu-Bonin forearc cover a wide range of compositions from primitive boninites to highly evolved rhyolites. K-Ar dating reveals at least two distinct episodes of magmatism; one at 41 Ma and a later one at 35 Ma. The early episode produced low-Ca boninites and bronzite andesites that form an oceanic basement of pillow lavas and composite intrusive sheets, overlain by flows and intrusive sheets of intermediate-Ca boninites and bronzite-andesites and a fractionated series of andesites, dacites, and rhyolites. The later episode produced high-Ca boninites and intermediate-Ca boninites, exclusively as intrusive sheets.
Abstract:
The first data set contains the mean and coefficient of variation (standard deviation divided by mean) of a multi-frequency indicator I derived from ER60 acoustic information collected at five frequencies (18, 38, 70, 120, and 200 kHz) in the Bay of Biscay in May of the years 2006, 2008, 2009 and 2010 (Pelgas surveys). The multi-frequency indicator was first calculated per voxel (20 m long × 5 m deep sampling unit) and then averaged on a spatial grid (approx. 20 nm × 20 nm) for five 5-m depth layers in the surface waters (10-15m, 15-20m, 20-25m, 25-30m below sea surface); there are missing values, in particular in the shallowest layer. The second data set provides for each grid cell and depth layer the proportion of voxels for which the multi-frequency indicator I was indicative of a certain group of organisms. For this the following interpretation was used: I < 0.39 swim bladder fish or large gas bubbles, I = 0.39-0.58 small resonant bubbles present in gas bearing organisms such as larval fish and phytoplankton, I = 0.7-0.8 fluidlike zooplankton such as copepods and euphausiids, and I > 0.8 mackerel. These proportions can be interpreted as a relative abundance index for each of the four organism groups.
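The interpretation thresholds and the coefficient of variation described in this abstract can be written down as a short sketch. Several points are assumptions rather than facts from the data set description: the function names, the treatment of boundary values (0.39, 0.58, 0.7 and 0.8 are taken as inclusive of the named range), leaving values between 0.58 and 0.7 unclassified, keeping unclassified voxels in the denominator, and using the population standard deviation.

```python
from collections import Counter
from statistics import mean, pstdev


def classify(i):
    """Map an indicator value I to an organism group, or None if unassigned."""
    if i < 0.39:
        return "swim bladder fish or large gas bubbles"
    if i <= 0.58:
        return "small resonant bubbles"
    if 0.7 <= i <= 0.8:
        return "fluidlike zooplankton"
    if i > 0.8:
        return "mackerel"
    return None  # 0.58 < I < 0.7 has no interpretation in the description


def group_proportions(voxel_values):
    """Proportion of voxels per group (unclassified voxels stay in the total)."""
    labels = Counter(classify(v) for v in voxel_values)
    total = len(voxel_values)
    return {g: n / total for g, n in labels.items() if g is not None}


def coefficient_of_variation(values):
    """Standard deviation divided by mean (population standard deviation)."""
    return pstdev(values) / mean(values)


# Hypothetical indicator values for the voxels of one grid cell and layer.
props = group_proportions([0.1, 0.5, 0.75, 0.9])
cv = coefficient_of_variation([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

With the four illustrative voxels each group receives a proportion of 0.25; a voxel falling in the unassigned gap would lower the proportions of all groups without being reported itself.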
Abstract:
Snow height was measured by the Snow Depth Buoy 2015S22, an autonomous platform, drifting on Arctic sea ice, deployed during the Norwegian Young sea ICE cruise (N-ICE 2015) project. The resulting time series describes the evolution of snow depth as a function of place and time between 2015-03-01 and 2015-05-06 in sample intervals of 1 hour. The Snow Depth Buoy consists of four independent sonar measurements representing the area (approx. 10 m**2) around the buoy. The buoy was installed on first year ice. In addition to snow depth, geographic position (GPS), barometric pressure, air temperature, and ice surface temperature were measured. Negative values of snow depth occur if surface ablation continues into the sea ice. Thus, these measurements describe the position of the sea ice surface relative to the original snow-ice interface. Differences between single sensors indicate small-scale variability of the snow pack around the buoy. The data set has been processed, including the removal of obvious inconsistencies (missing values). Records without any snow depth may still be used for sea ice drift analyses.
Abstract:
Snow height was measured by the Snow Depth Buoy 2015S26, an autonomous platform, drifting on Arctic sea ice, deployed during the Norwegian Young sea ICE cruise (N-ICE 2015) project. The resulting time series describes the evolution of snow depth as a function of place and time between 2015-01-24 and 2015-02-21 in sample intervals of 1 hour. The Snow Depth Buoy consists of four independent sonar measurements representing the area (approx. 10 m**2) around the buoy. The buoy was installed on first year ice. In addition to snow depth, geographic position (GPS), barometric pressure, air temperature, and ice surface temperature were measured. Negative values of snow depth occur if surface ablation continues into the sea ice. Thus, these measurements describe the position of the sea ice surface relative to the original snow-ice interface. Differences between single sensors indicate small-scale variability of the snow pack around the buoy. The data set has been processed, including the removal of obvious inconsistencies (missing values). Records without any snow depth may still be used for sea ice drift analyses.