986 results for Missing values


Relevance: 60.00%

Abstract:

Multi-factor models are a useful tool for explaining cross-sectional covariance in equity returns. In this paper we propose using irregularly spaced returns in multi-factor model estimation and provide an empirical example with the 389 most liquid equities in the Brazilian market. The market index proves significant in explaining equity returns, while the US$/Brazilian Real exchange rate and the Brazilian standard interest rate do not. The example shows the usefulness of the estimation method for subsequently using the model to fill in missing values and to provide interval forecasts.

Relevance: 60.00%

Abstract:

In this work we study the survival cure rate model proposed by Yakovlev (1993), considered in a competing risks setting. Covariates are introduced to model the cure rate, and we allow some covariates to have missing values. We consider only cases in which the missing covariates are categorical and implement the EM algorithm via the method of weights for maximum likelihood estimation. We present a Monte Carlo simulation experiment to compare the properties of the estimators based on this method with those of the estimators under the complete-case scenario. In this experiment we also evaluate the impact on the parameter estimates of increasing the proportion of immune and censored individuals among the non-immune ones. We demonstrate the proposed methodology with a real data set on the time to graduation in the undergraduate Statistics course at the Universidade Federal do Rio Grande do Norte.

Relevance: 60.00%

Abstract:

To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially data mining. Such problems occur at both the instance level and the schema level and include missing values, null values, duplicate tuples and out-of-domain values, among others. Several algorithms have been developed to perform the cleaning step in databases; some were designed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, the original contribution of this work is an optimization of an algorithm for detecting duplicate tuples in databases through phonetics, based on multithreading and requiring no training data, together with a language-independent environment to support it. © 2011 IEEE.
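The abstract does not name the phonetic algorithm it optimizes, so the following is only a minimal sketch of the general idea, assuming American Soundex as the phonetic key and a thread pool for computing keys in parallel. The function names `soundex` and `find_phonetic_duplicates` are hypothetical, and Soundex is English-oriented, so it does not reproduce the language independence the authors describe.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Soundex digit for each consonant; vowels, y, h and w have no digit.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex key of a word (assumed key choice)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(key) + "000")[:4]


def find_phonetic_duplicates(names, workers=4):
    """Group names that share a phonetic key; keys are computed by a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(soundex, names))
    groups = defaultdict(list)
    for name, key in zip(names, keys):
        groups[key].append(name)
    return [group for group in groups.values() if len(group) > 1]


candidates = find_phonetic_duplicates(["Robert", "Rupert", "Smith", "Smyth", "Jones"])
```

In CPython, threads give little speed-up for CPU-bound key computation like this; the pool is shown only to mirror the multithreaded structure mentioned in the abstract.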

Relevance: 60.00%

Abstract:

We examine the problem of combining Mexican inflation predictions or projections provided by a biweekly survey of professional forecasters. Consumer price inflation in Mexico is measured twice a month. We consider several combining methods and advocate the use of dimension reduction techniques, whose performance is compared with different benchmark methods, including the simple average prediction. Missing values in the database are imputed by two different data-based methods. The results obtained are largely robust to the choice of imputation method. A preliminary analysis of the data was based on its panel structure and showed the potential usefulness of dimension reduction techniques for combining the experts' predictions. The main findings are: the first monthly predictions are best combined by way of the first principal component of the available predictions; the best second monthly prediction is obtained by calculating the median prediction and is more accurate than the first one.

Relevance: 60.00%

Abstract:

During knowledge discovery in databases, several problems can arise, such as the absence of a value for a given attribute in some instance. Such gaps can harm the final results of the process, because they directly affect the quality of the data submitted to a machine learning algorithm. The literature offers several proposals for mitigating this harm, among them data imputation, which estimates a plausible value to replace the missing one. Within this approach to the missing-values problem, several studies were reviewed and some observations were made, such as the limited use of synthetic datasets that simulate the main missing-data mechanisms and a recent trend towards bio-inspired algorithms for treating the problem. Against this background, this dissertation presents a data imputation method based on particle swarm optimization, which has been little explored in the area, and applies it to synthetically generated datasets that cover the main missing-data mechanisms: MAR, MCAR and NMAR. The results obtained when comparing different configurations of the method with two others well known in the area (KNNImpute and SVMImpute) are promising for its use in the treatment of missing values, since it achieved the best values in most of the experiments performed.
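The three missing-data mechanisms named in this abstract (MCAR, MAR and NMAR) can be illustrated with a small synthetic generator. This is only a sketch under simple assumptions: the function names, the threshold rule and the `rate` parameter are hypothetical and are not taken from the dissertation, which does not describe how its synthetic datasets were built.

```python
import random


def make_mcar(values, rate, rng):
    """MCAR: every value is removed with the same probability."""
    return [None if rng.random() < rate else v for v in values]


def make_mar(values, observed, threshold, rate, rng):
    """MAR: removal depends on another, fully observed column."""
    return [None if (o > threshold and rng.random() < rate) else v
            for v, o in zip(values, observed)]


def make_nmar(values, threshold, rate, rng):
    """NMAR: removal depends on the (unobserved) value itself."""
    return [None if (v > threshold and rng.random() < rate) else v
            for v in values]


rng = random.Random(0)  # fixed seed so a run can be repeated
income = [12, 48, 95, 30, 70]
age = [25, 61, 44, 19, 58]

mcar = make_mcar(income, rate=0.3, rng=rng)
mar = make_mar(income, observed=age, threshold=50, rate=0.8, rng=rng)
nmar = make_nmar(income, threshold=60, rate=0.8, rng=rng)
```

An imputation method can then be scored by comparing its filled-in values with the original `income` list, which is what makes synthetic datasets attractive for this kind of evaluation.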

Relevance: 60.00%

Abstract:

Introduction: interferon (IFN), used to eradicate the hepatitis C virus, induces side effects that include psychiatric ones. Data on psychiatric adverse events with the new antiviral drugs (DAA) are limited. The aim of this study is to evaluate the development of psychiatric side effects during two distinct treatment regimens: pegylated IFN and ribavirin [dual therapy (standard, or SOC)]; DAA combined with pegylated IFN and ribavirin (triple therapy). Methods: consecutive HCV+ patients followed at the Chronic Hepatitis Outpatient Clinic of the Semeiotica Medica unit, Department of Medical and Surgical Sciences, University of Bologna, who were about to start IFN-based antiviral treatment underwent a psychodiagnostic assessment consisting of a semi-structured clinical interview and self-administered tests: BDI, STAXI-2, Hamilton Anxiety Scale, MMPI-2. Results: 84 patients were enrolled, 57/84 (67.9%) in the triple-therapy group and 27/84 in the SOC group. Almost all enrolled patients completed the initial clinical interview (82/84; 97.6%), whereas adherence to the tests was poor (missing values > 50%). With the exception of anxiety, the prevalence of all other disorders (irritability, asthenia, neurocognitive dysfunction, dyssomnia) increased during treatment. During antiviral therapy, 43/84 (51.2%) needed to use the psychiatric consultation service and 48/84 (57.1%) received supportive psychopharmacological therapy, with no significant differences between the two treatment groups. Conclusions: one of the most salient findings of the study was the poor adherence to the psychodiagnostic tests, despite the high prevalence of psychiatric symptoms. Besides highlighting the importance of psychiatric symptoms during treatment and the relevance of psychological and psychiatric consultation in enabling patients to complete the planned course of therapy (thereby improving its efficacy), the results also showed that the diagnostic tools need to be rethought, probably by adapting them to this specific target population.

Relevance: 60.00%

Abstract:

Objectives: To compare different ways of measuring partner notification (PN) outcomes with published audit standards, examine variability between clinics and examine factors contributing to variation in PN outcomes in genitourinary medicine (GUM) clinics in the UK. Methods: Reanalysis of the 2007 BASHH national chlamydia audit. The primary outcome was the number of partners per index case tested for chlamydia, as verified by a healthcare worker or, if missing, reported by the patient. Control charts were used to examine variation between clinics, either treating missing values as zero or excluding them. Hierarchical logistic regression was used to investigate factors contributing to variation in outcomes. Results: Data from 4616 individuals in 169 genitourinary medicine clinics were analysed. There was no information about the primary outcome in 41% of records. The mean number of partners tested for chlamydia ranged from 0 to 1.5 per index case per clinic. The median across all clinics was 0.47 per index case when missing values were assumed to be zero and 0.92 when missing values were excluded. Men who have sex with men were less likely than heterosexual men, and patients with symptoms (4-week look-back period) were less likely than asymptomatic patients (6-month look-back), to report having one or more partners tested for chlamydia. There was no association between the primary outcome and the type of health professional giving the PN advice. Conclusions: The completeness of PN outcomes recorded in clinical notes needs to improve. Further research is needed to identify auditable measures that are associated with successful PN that prevents repeated chlamydia in index cases.
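The gap between the two medians reported above (0.47 versus 0.92) follows directly from how missing outcomes enter the per-clinic mean. A minimal sketch with made-up records for one clinic, assuming `None` marks a missing outcome; the function name `clinic_mean` and the data are illustrative only and do not come from the audit.

```python
from statistics import mean


def clinic_mean(partners_tested, missing_as_zero):
    """Mean partners tested per index case under one of two missing-value rules."""
    if missing_as_zero:
        # Pessimistic rule: an unrecorded outcome counts as no partner tested.
        values = [0 if v is None else v for v in partners_tested]
    else:
        # Available-case rule: unrecorded outcomes are dropped.
        values = [v for v in partners_tested if v is not None]
    return mean(values) if values else None


# Hypothetical clinic: five index cases, two with no recorded outcome.
records = [1, None, 0, 2, None]

as_zero = clinic_mean(records, missing_as_zero=True)    # 3 partners / 5 cases
excluded = clinic_mean(records, missing_as_zero=False)  # 3 partners / 3 cases
```

With 41% of records missing the primary outcome, the two rules bracket the plausible range rather than estimate it, which is why the abstract reports both.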

Relevance: 60.00%

Abstract:

Questionnaire data may contain missing values because certain questions do not apply to all respondents. For instance, questions addressing particular attributes of a symptom, such as frequency, triggers or seasonality, are only applicable to those who have experienced the symptom, while for those who have not, responses to these items will be missing. This missing information does not fall into the category 'missing by design', rather the features of interest do not exist and cannot be measured regardless of survey design. Analysis of responses to such conditional items is therefore typically restricted to the subpopulation in which they apply. This article is concerned with joint multivariate modelling of responses to both unconditional and conditional items without restricting the analysis to this subpopulation. Such an approach is of interest when the distributions of both types of responses are thought to be determined by common parameters affecting the whole population. By integrating the conditional item structure into the model, inference can be based both on unconditional data from the entire population and on conditional data from subjects for whom they exist. This approach opens new possibilities for multivariate analysis of such data. We apply this approach to latent class modelling and provide an example using data on respiratory symptoms (wheeze and cough) in children. Conditional data structures such as that considered here are common in medical research settings and, although our focus is on latent class models, the approach can be applied to other multivariate models.

Relevance: 60.00%

Abstract:

The purposes of this study were to examine (1) the relationship between selected components of the content of prenatal care and spontaneous preterm birth; and (2) the degree of comparability between maternal and caregivers' responses regarding the number of prenatal care visits, selected components of the content of prenatal care, and gestational age, based on analyses of the 1988 National Maternal and Infant Health Survey conducted by the National Center for Health Statistics. Spontaneous preterm birth was subcategorized into very preterm and moderately preterm births, with term births as the controls. The study population was limited to non-Hispanic Anglo- and African-American mothers. Racial differences in birth outcomes were also compared. This study concluded that: (1) there was not a high degree of comparability (less than 80%) between maternal and prenatal care providers' responses regarding the number of prenatal care visits and the content of prenatal care; (2) there was a low degree of comparability (less than 50%) between maternal and infant's hospital of delivery responses regarding gestational age at birth; (3) there were differences in selected components of the content of prenatal care between the cases and controls, overall and stratified by ethnicity (i.e., hemoglobin/hematocrit test, weight measurement, and breast-feeding counseling), but they were confounded with missing values and associated preterm delivery bias; (4) there were differences in selected components of the content of prenatal care between Anglo- and African-American cases (i.e., vitamin/mineral supplement advice, weight measurement, smoking cessation and drug abuse counseling), but they, too, were difficult to interpret definitively due to item nonresponse and preterm delivery biases; (5) no significant predictive association between selected components of the content of prenatal care and spontaneous preterm birth was found; and (6) inadequate/intermediate prenatal care and birth out of wedlock were found to be associated with moderately preterm birth. Future research is needed to examine the validity of maternal and prenatal care providers' responses and identify the sources of disagreement between their responses. In addition, further studies are needed to examine the relationship between the quality of prenatal care and preterm birth. Finally, the completeness and quality of patient and provider data on the utilization and content of prenatal care need to be strengthened in subsequent studies.

Relevance: 60.00%

Abstract:

Ethnic violence appears to be the major source of violence in the world. Ethnic hostilities are potentially all-pervasive because most countries in the world are multi-ethnic. Public health's focus on violence documents its increasing role in this issue. The present study is based on a secondary analysis of a dataset of responses by 272 individuals from four ethnic groups (Anglo, African, Mexican, and Vietnamese Americans) who answered questions regarding variables related to ethnic violence from a general questionnaire which was distributed to ethnically diverse purposive, nonprobability, self-selected groups of individuals in Houston, Texas, in 1993. One goal was psychometric: learning about issues in analysis of datasets with modest numbers, comparison of two approaches to dealing with missing observations not missing at random (conducting analysis on two datasets), transformation analysis of continuous variables for logistic regression, and logistic regression diagnostics. Regarding the psychometric goal, it was concluded that measurement model analysis was not possible with a relatively small dataset with nonnormal variables, such as Likert-scaled variables; therefore, exploratory factor analysis was used. The two approaches to dealing with missing values resulted in comparable findings. Transformation analysis suggested that the continuous variables were in the correct scale, and diagnostics suggested that the model fit was adequate. The substantive portion of the analysis included the testing of four hypotheses. Hypothesis One proposed that attitudes/efficacy regarding alternative approaches to resolving grievances from the general questionnaire represented underlying factors: nonpunitive social norms and strategies for addressing grievances (using the political system, organizing protests, using the system to punish offenders, and personal mediation). Evidence was found to support all but one factor, nonpunitive social norms. Hypothesis Two proposed that the factor variables and the other independent variables (jail, grievance, male, young, and membership in a particular ethnic group) were associated with (non)violence. Jail, grievance, and not using the political system to address grievances were associated with a greater likelihood of intergroup violence. No evidence was found to support Hypotheses Three and Four, which proposed that grievance and ethnic group membership would interact with other variables (i.e., age, gender, etc.) to produce variant levels of subgroup (non)violence. The generalizability of the results of this study is constrained by the purposive, self-selected nature of the sample and the small sample size (n = 272). Suggestions for future research include incorporating other possible variables or factors predictive of intergroup violence in models of the kind tested here, and the development and evaluation of interventions that promote electoral and nonelectoral political participation as means of reducing interethnic conflict.

Relevance: 60.00%

Abstract:

Sequence analysis and optimal matching are useful heuristic tools for the descriptive analysis of heterogeneous individual pathways such as educational careers, job sequences or patterns of family formation. However, to date it remains unclear how to handle the inevitable problems caused by missing values with regard to such analysis. Multiple Imputation (MI) offers a possible solution for this problem but it has not been tested in the context of sequence analysis. Against this background, we contribute to the literature by assessing the potential of MI in the context of sequence analyses using an empirical example. Methodologically, we draw upon the work of Brendan Halpin and extend it to additional types of missing value patterns. Our empirical case is a sequence analysis of panel data with substantial attrition that examines the typical patterns and the persistence of sex segregation in school-to-work transitions in Switzerland. The preliminary results indicate that MI is a valuable methodology for handling missing values due to panel mortality in the context of sequence analysis. MI is especially useful in facilitating a sound interpretation of the resulting sequence types.

Relevance: 60.00%

Abstract:

Hindcast climatic data are widely used in many applications. However, this approach needs the support of a validation process so that its drawbacks and, therefore, its confidence levels can be assessed. In this work, the strategy relies on an hourly wind database resulting from a dynamical downscaling experiment with a spatial resolution of 10 km, covering the Iberian Peninsula (IP) and driven by the ERA40 reanalysis (1959–2001) extended by European Centre for Medium-Range Weather Forecasts (ECMWF) analysis (2002–2007); it comprises two main steps. Initially, the skill of the simulation is evaluated by comparison with the quality-tested observational database (Lorente-Plazas et al., 2014) at local and regional scales. The results show that the model is able to portray the main features of the wind over the IP: annual cycles, wind roses, spatial and temporal variability, as well as the response to different circulation types. In addition, the simulation adds significant value with respect to the driving conditions, especially in regions with complex orography. However, some problems are evident, the major drawback being the systematic overestimation of the wind speed, which is mainly attributed to a misrepresentation of frictional forces. The model skill is also lower along the Mediterranean coast and for the Pyrenees. In a second phase, the high spatio-temporal resolution of the pseudo-real wind database is used to explore the limitations of the observational database. It is shown that missing values do not affect the characterisation of the wind climate over the IP, while the length of the observational period (6 years) is sufficient for most regions, with only a few exceptions. The spatial distribution of the observational sampling schemes should be enhanced to improve the assessment of all IP wind regimes, particularly in some mountainous areas.

Relevance: 60.00%

Abstract:

METHODS: Spirometry datasets from South-Asian children were collated from four centres in India and five within the UK. Records with transcription errors, missing values for height or spirometry, and implausible values were excluded (n = 110). RESULTS: Following exclusions, cross-sectional data were available from 8,124 children (56.3% male; 5-17 years). When compared with GLI-predicted values from White Europeans, forced expired volume in 1 s (FEV1) and forced vital capacity (FVC) in South-Asian children were on average 15% lower, ranging from 4% to 19% between centres. By contrast, proportional reductions in FEV1 and FVC within all but two datasets meant that the FEV1/FVC ratio remained independent of ethnicity. The 'GLI-Other' equation fitted data from North India reasonably well, while 'GLI-Black' equations provided a better approximation for South-Asian data than the 'GLI-White' equation. However, marked discrepancies in the mean lung function z-scores between centres, especially when examined according to socio-economic conditions, precluded derivation of a single South-Asian GLI-adjustment. CONCLUSION: Until improved and more robust prediction equations can be derived, we recommend the use of 'GLI-Black' equations for interpreting most South-Asian data, although 'GLI-Other' may be more appropriate for North Indian data. Prospective data collection using standardised protocols to explore potential sources of variation due to socio-economic circumstances, secular changes in growth/predictors of lung function and ethnicities within the South-Asian classification is urgently required.

Relevance: 60.00%

Abstract:

Logistic regression is one of the most important tools in the analysis of epidemiological and clinical data. Such data often contain missing values for one or more variables. Common practice is to eliminate all individuals for whom any information is missing. This deletion approach does not make efficient use of available information and often introduces bias. Two methods were developed to estimate logistic regression coefficients for mixed dichotomous and continuous covariates, including partially observed binary covariates. The data were assumed to be missing at random (MAR). One method (PD) used the predictive distribution as a weight to calculate the average of the logistic regressions performed on all possible values of the missing observations, and the second method (RS) used a variant of a resampling technique. Seven additional methods were compared with these two approaches in a simulation study: (1) analysis based on only the complete cases, (2) substituting the mean of the observed values for the missing value, (3) an imputation technique based on the proportions of observed data, (4) regressing the partially observed covariates on the remaining continuous covariates, (5) regressing the partially observed covariates on the remaining continuous covariates conditional on the response variable, (6) regressing the partially observed covariates on the remaining continuous covariates and the response variable, and (7) the EM algorithm. Both proposed methods showed smaller standard errors (s.e.) for the coefficient involving the partially observed covariate and for the other coefficients as well. However, both methods, especially PD, are computationally demanding; thus, for analysis of large data sets with partially observed covariates, further refinement of these approaches is needed.
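The first two comparison methods in this abstract, complete-case analysis and mean substitution, are simple enough to sketch as data-preparation steps. The following is an illustration only, assuming records stored as dictionaries with `None` for a missing value; the function names, column names and data are hypothetical, and neither the proposed PD and RS methods nor the logistic regression fit itself is shown.

```python
def complete_cases(rows):
    """Method (1): keep only the records with no missing value in any column."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mean_substitute(rows, column):
    """Method (2): replace missing entries of one column by the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        return [dict(row) for row in rows]  # nothing to estimate the mean from
    fill = sum(observed) / len(observed)
    return [{**row, column: fill if row[column] is None else row[column]}
            for row in rows]


# Hypothetical data: 'smoker' is a partially observed binary covariate.
rows = [
    {"age": 30, "smoker": 1, "disease": 0},
    {"age": 40, "smoker": None, "disease": 1},
    {"age": 50, "smoker": 0, "disease": 1},
]

cc = complete_cases(rows)                 # the second record is discarded
imputed = mean_substitute(rows, "smoker")  # the second record gets smoker = 0.5
```

The sketch makes the trade-off visible: deletion discards the observed age and outcome of the second record, while mean substitution keeps them but assigns a binary covariate the value 0.5, which no individual can actually have.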