983 results for Incomplete Data


Relevance:

30.00%

Publisher:

Abstract:

DUE TO INCOMPLETE PAPERWORK, ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance:

30.00%

Publisher:

Abstract:

Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which was originally developed for complete continuous data, can be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.

Relevance:

30.00%

Publisher:

Abstract:

Stefanka Chukova, Her Guan Teo - In this study we review and extend our previous work on the censoring typical of automotive warranty data. To resolve the problem of incomplete mileage information, we use a linear approach in a nonparametric framework. We estimate the mean cumulative warranty cost (per vehicle) and its standard error as a function of age, of mileage, and of actual (calendar) time.

Relevance:

30.00%

Publisher:

Abstract:

A rough set approach for attribute reduction is an important research subject in data mining and machine learning. However, most attribute reduction methods are performed on a complete decision system table. In this paper, we propose methods for attribute reduction in static incomplete decision systems and in dynamic incomplete decision systems with dynamically increasing and decreasing conditional attributes. Our methods use a generalized discernibility matrix and function in tolerance-based rough sets.
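The tolerance idea behind this abstract can be illustrated with a minimal sketch. It assumes the common convention that a missing value (written `*` here) is compatible with any value, and that a discernibility entry lists the attributes on which two objects have known, different values. The abstract does not define its generalized matrix, so the table, attribute names and helper functions below are hypothetical and are not the authors' method.

```python
MISSING = "*"  # assumed marker for an unknown attribute value


def tolerant(x, y, attrs):
    """True if x and y cannot be told apart on attrs; MISSING acts as a wildcard."""
    return all(x[a] == y[a] or MISSING in (x[a], y[a]) for a in attrs)


def discernibility(x, y, attrs):
    """Attributes on which x and y both have known values that differ."""
    return {a for a in attrs if MISSING not in (x[a], y[a]) and x[a] != y[a]}


# Hypothetical incomplete table with three objects and three condition attributes.
table = {
    "u1": {"price": "high", "size": "full", "engine": MISSING},
    "u2": {"price": "high", "size": MISSING, "engine": "diesel"},
    "u3": {"price": "low", "size": "full", "engine": "diesel"},
}
attrs = ["price", "size", "engine"]


def tolerance_class(name, attrs):
    """All objects that are tolerant with the named object on attrs."""
    return {other for other, row in table.items() if tolerant(table[name], row, attrs)}


print(tolerance_class("u1", attrs))
print(discernibility(table["u1"], table["u3"], attrs))
```

In this toy table, `u1` and `u2` fall into the same tolerance class because every disagreement between them involves a missing value, while `price` alone separates `u1` from `u3`.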

Relevance:

30.00%

Publisher:

Abstract:

General knowledge of the hydrographic structure of the Southern Ocean is still rather incomplete, since observations, particularly in the ice-covered regions, are cumbersome to carry out. But we know from the available information that thermohaline processes have large amplitudes and cover a wide range of scales in this part of the world ocean. The modification of water masses around Antarctica indeed has a worldwide impact; these processes ultimately determine the cold state of the present climate in the world ocean. We have combined the efforts of the German and Russian polar research institutions to collect and validate the presently available temperature, salinity and oxygen data of the ocean south of 30°S latitude. We have carried out this work in spite of the fact that the hydrographic programme of the World Ocean Circulation Experiment (WOCE) will provide more new information in due time, since its contribution to the high latitudes of the Southern Ocean is quite sparse. The modified picture of the hydrographic structure of the Southern Ocean presented in this atlas may serve the oceanographic community in many ways and help to unravel the role of this ocean in the global climate system. This atlas could only be prepared with the altruistic assistance of many colleagues from various institutions worldwide who have provided us with their data and their advice. Their generous help is gratefully acknowledged. For two years, scientists from the Arctic and Antarctic Research Institute in St. Petersburg and the Alfred Wegener Institute for Polar and Marine Research in Bremerhaven have cooperated fruitfully to establish the atlas and the archive of about 38749 validated hydrographic stations. We hope that both sources of information will be widely applied in future ocean studies and will serve as a reference state for global change considerations.

Relevance:

30.00%

Publisher:

Abstract:

Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, so that the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum them up according to different cells of features. In this thesis, I present novel methods of multiple imputation (MI) that can be applied to impute missing values and synthesize confidential values for continuous and magnitude data.

The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
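One way to see how synthetic counts can be forced to reproduce a fixed total is the standard result that independent Poisson counts, conditioned on their sum, follow a multinomial distribution with probabilities proportional to the Poisson rates. The sketch below uses that fact with a single set of rates rather than the mixture model described in the thesis. The rates, the total and the function name are hypothetical.

```python
import random


def synthesize_fixed_total(rates, total, rng):
    """Draw non-negative integer counts that sum exactly to `total`.

    Each of the `total` units is assigned to a cell with probability
    proportional to that cell's Poisson rate (a multinomial draw).
    """
    cells = range(len(rates))
    draws = rng.choices(cells, weights=rates, k=total)
    counts = [0] * len(rates)
    for cell in draws:
        counts[cell] += 1
    return counts


rng = random.Random(0)          # seeded only to make the sketch repeatable
rates = [5.0, 20.0, 75.0]       # hypothetical Poisson rates for three establishments
original_total = 1200           # hypothetical published marginal sum

synthetic = synthesize_fixed_total(rates, original_total, rng)
print(synthetic, sum(synthetic))
```

Whatever values are drawn, the synthetic counts are non-negative integers whose sum equals the original total by construction.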

The second method is for releasing synthetic continuous micro data by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on the protective intervals. Its basic idea is to estimate the model parameters from these intervals rather than the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.

The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence. However, the new method separates the variables into non-focused (e.g., almost-fully-observed) and focused (e.g., missing-a-lot) ones. The sub-model structure of focused variables is more complex than that of non-focused ones. At the same time, their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving the strongly associated non-focused variables to the side of the focused ones can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: We analyzed patients with hairy cell leukemia (HCL) to achieve a better understanding of the differentiation stage reached by HCL cells and to define the key role of the diversification of cell surface markers, especially CD25 expression. PATIENTS AND METHODS: We analyzed 38 previously untreated patients with HCL to characterize their complete (VDJ(H)) and incomplete (DJ(H)) immunoglobulin (Ig) heavy chain (IgH) rearrangements, including somatic hypermutation pattern and gene segment use. RESULTS: A correlation between immunophenotypic profile and molecular data was seen. All 38 cases showed monoclonal amplifications: VDJ(H) in 97%, DJ(H) in 42%, and both in 39%. Segments from the D(H)3 family were used more in complete than in incomplete rearrangements (45% vs. 12%; P <.005). Furthermore, comparison between molecular and immunophenotypic characteristics disclosed differences in the expression of CD25 antigen; CD25(-) cases, a phenotype associated with HCL variant, showed complete homology to the germline in 3 of 5 cases (60%), whereas this characteristic was never observed in CD25(+) cases (P <.005). Moreover, V(H)4-34, V(H)1-08, and J(H)3 segments appeared in 2, 1, and 2 CD25(-) cases, respectively, whereas they were absent in all CD25(+) cases. CONCLUSION: These results support the view that HCL is a heterogeneous entity including subgroups with different molecular characteristics, which reinforces the need for additional studies with a larger number of patients to clarify the real role of gene rearrangements in HCL.

Relevance:

30.00%

Publisher:

Abstract:

DH-JH rearrangements of the Ig heavy-chain gene (IGH) occur early during B-cell development. Consequently, they are detected in precursor-B-cell acute lymphoblastic leukemias both at diagnosis and at relapse. Incomplete DJH rearrangements have also been occasionally reported in mature B-cell lymphoproliferative disorders, but their frequency and immunobiological characteristics have not been studied in detail. We have investigated the frequency and characteristics of incomplete DJH as well as complete VDJH rearrangements in a series of 84 untreated multiple myeloma (MM) patients. The overall detection rate of clonality by amplifying VDJH and DJH rearrangements using family-specific primers was 94%. Interestingly, we found a high frequency (60%) of DJH rearrangements in this group. As expected from an immunological point of view, the vast majority of DJH rearrangements (88%) were unmutated. To the best of our knowledge, this is the first systematic study describing the incidence of incomplete DJH rearrangements in a series of unselected MM patients. These results strongly support the use of DJH rearrangements as PCR targets for clonality studies and, particularly, for quantification of minimal residual disease by real-time quantitative PCR using consensus JH probes in MM patients. The finding of hypermutation in a small proportion of incomplete DJH rearrangements (six out of 50) suggests important biological implications concerning the process of somatic hypermutation. Moreover, our data offer new insight into the regulatory development model of IGH rearrangements.

Relevance:

30.00%

Publisher:

Abstract:

Supply Chain Simulation (SCS) is applied to acquire information to support outsourcing decisions but obtaining enough detail in key parameters can often be a barrier to making well informed decisions.
One aspect of SCS that has been relatively unexplored is the impact of inaccurate data around delays within the SC. The impact of the magnitude and variability of process cycle time on typical performance indicators in a SC context is studied.
System cycle time, WIP levels and throughput are more sensitive to the magnitude of deterministic deviations in process cycle time than variable deviations. Manufacturing costs are not very sensitive to these deviations.
Future opportunities include investigating the impact of process failure or product defects, including logistics and transportation between SC members and using alternative costing methodologies.

Relevance:

30.00%

Publisher:

Abstract:

This thesis comprises two separate studies aimed at further understanding the role of incomplete combustion products in atmospheric chemistry. The first explores the sensitivity of black carbon (BC) forcing to aerosol vertical location, since BC has an increased forcing per unit mass when it is located above reflective clouds. We used a column radiative transfer model to produce globally-averaged values of normalized direct radiative forcing (NDRF) for BC over and under different types of clouds. We developed a simple column-weighting scheme based on the mass fractions of BC that are over and under clouds in measured vertical profiles. The resulting NDRF is in good agreement with global 3-D model estimates, supporting the column-weighted model as a tool for exploring uncertainties due to diversity in vertical distribution. BC above low clouds accounts for about 20% of the global burden but 50% of the forcing. We estimate maximum-minimum spread in NDRF due to modeled profiles as about 40% and uncertainty as about 25%. Models overestimate BC in the upper troposphere compared with measurements; modeled NDRF might need to be reduced by about 15%. Redistributing BC within the lowest 4 km of the atmosphere affects modeled NDRF by only about 5% and cannot account for very high forcing estimates. The second study estimated global year 2000 carbon monoxide (CO) emissions using a traditional bottom-up inventory. We applied literature-derived emission factors to a variety of fuel and technology combinations. Combining these with regional fuel use and production data, we produced CO emissions estimates that were separable by sector, fuel type, technology, and region. We estimated year 2000 stationary source emissions of 685.9 Tg/yr and 885 Tg/yr if we included adopted mobile sources from EDGAR v3.2FT2000. Open/biomass burning contributed most significantly to global CO burden, while the residential sector, primarily in Asia and Africa, was the largest contributor with respect to contained combustion sources. Industry production in Asia, including brick, cement, iron and steel-making, also contributed significantly to CO emissions. Our estimates of biofuel emissions are lower than most previously published bottom-up estimates, while our other fuel emissions are generally in good agreement. Our values are also universally lower than recently estimated CO emissions from models using top-down methods.
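A bottom-up inventory of the kind described in the second study multiplies an activity level (fuel use) by an emission factor for each fuel and technology combination, then sums the products by sector or region. The sketch below shows only that bookkeeping. Every record, fuel quantity and emission factor in it is a hypothetical placeholder and not a value from the thesis.

```python
from collections import defaultdict

# Hypothetical records: (sector, fuel, region, fuel use in Tg/yr, emission factor in g CO per kg fuel)
records = [
    ("residential", "wood", "Asia", 100.0, 60.0),
    ("residential", "wood", "Africa", 50.0, 60.0),
    ("industry", "coal", "Asia", 200.0, 10.0),
]


def co_emissions_tg(fuel_use_tg, ef_g_per_kg):
    """CO emitted in Tg/yr: Tg fuel times (g CO per kg fuel), with g/kg equal to 1e-3."""
    return fuel_use_tg * ef_g_per_kg / 1000.0


def totals_by(records, key_index):
    """Sum emissions over all records, grouped by the field at key_index."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += co_emissions_tg(rec[3], rec[4])
    return dict(totals)


by_sector = totals_by(records, 0)   # group by sector
by_region = totals_by(records, 2)   # group by region
print(by_sector)
print(by_region)
```

Keeping one record per sector, fuel and region combination is what makes the totals separable along any of those dimensions, as the abstract describes.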

Relevance:

30.00%

Publisher:

Abstract:

Electoral researchers are so much accustomed to analyzing the choice of the single most preferred party as the left-hand side variable of their models of electoral behavior that they often ignore revealed preference data. Drawing on random utility theory, their models predict electoral behavior at the extensive margin of choice. Since the seminal work of Luce and others on individual choice behavior, however, many social science disciplines (consumer research, labor market research, travel demand, etc.) have extended their inventory of observed preference data with, for instance, multiple paired comparisons, complete or incomplete rankings, and multiple ratings. Eliciting (voter) preferences using these procedures and applying appropriate choice models is known to considerably increase the efficiency of estimates of causal factors in models of (electoral) behavior. In this paper, we demonstrate the efficiency gain when adding additional preference information to first preferences, up to full ranking data. We do so for multi-party systems of different sizes. We use simulation studies as well as empirical data from the 1972 German election study. Comparing the practical considerations for using ranking and single preference data results in suggestions for choice of measurement instruments in different multi-candidate and multi-party settings.
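To make the contrast between first preferences and ranking data concrete, the sketch below evaluates the log-likelihood of a complete or partial ranking under the rank-ordered (exploded) logit, which follows from Luce's choice model. The abstract does not say which choice model the paper uses, so this model, the party labels and the utility values are assumptions for illustration only.

```python
import math


def ranking_log_likelihood(utilities, ranking):
    """Log-probability of a ranking under the rank-ordered logit.

    utilities: dict mapping each party to its systematic utility.
    ranking: parties ordered from most to least preferred; it may be a
             top-k prefix, in which case the unranked parties are only
             known to be less preferred than the ranked ones.
    """
    remaining = set(utilities)
    loglik = 0.0
    for party in ranking:
        # Probability of choosing `party` among the parties not yet ranked.
        denom = sum(math.exp(utilities[p]) for p in remaining)
        loglik += utilities[party] - math.log(denom)
        remaining.remove(party)
    return loglik


# Hypothetical three-party system with equal utilities.
utilities = {"A": 0.0, "B": 0.0, "C": 0.0}

first_choice_only = math.exp(ranking_log_likelihood(utilities, ["A"]))
full_ranking = math.exp(ranking_log_likelihood(utilities, ["A", "B", "C"]))
print(first_choice_only, full_ranking)
```

With a single-element ranking the expression reduces to the ordinary multinomial logit for the most preferred party. Each additional ranked position adds one more choice among the remaining parties to the likelihood, which is where an efficiency gain would come from.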

Relevance:

20.00%

Publisher:

Abstract:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives to the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and produce a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We have developed IIS by integrating diverse databases, in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.

Relevance:

20.00%

Publisher:

Abstract:

The article investigates patterns of performance in grip strength, gait speed and self-rated health, and the relationships between them, considering the variables of gender, age and family income. The study was conducted in a probabilistic sample of community-dwelling elderly people aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-rated health was assessed using a 5-point scale. Males and the younger elderly individuals scored significantly higher on grip strength and gait speed than females and the oldest individuals did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.

Relevance:

20.00%

Publisher:

Abstract:

Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric analysis can be a valuable method for evaluating patients with this syndrome. The aim of this study was to correlate cephalometric data with the apnea-hypopnea index. We performed a retrospective, cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for understanding obstructive sleep apnea syndrome.

Relevance:

20.00%

Publisher:

Abstract:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assays. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust censored linear model based on the multivariate Student's t-distribution. To account for the autocorrelation among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
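The damped exponential correlation structure mentioned above can be sketched for irregular measurement times. The sketch assumes the usual parameterization, in which the correlation between measurements taken at times t_j and t_k is phi1 raised to the power |t_j - t_k|^phi2, with 0 <= phi1 < 1 and phi2 >= 0. The abstract does not spell out the formula, so this form, the function name and the example times are assumptions.

```python
def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix for irregular times.

    Off-diagonal entry (j, k) is phi1 ** (abs(t_j - t_k) ** phi2).
    phi2 = 1 gives a continuous-time AR(1) pattern; phi2 = 0 gives
    compound symmetry, where every off-diagonal entry equals phi1.
    """
    n = len(times)
    return [
        [
            1.0 if j == k else phi1 ** (abs(times[j] - times[k]) ** phi2)
            for k in range(n)
        ]
        for j in range(n)
    ]


# Hypothetical visit times (e.g. in months) with unequal spacing.
times = [0.0, 1.0, 3.0]

ar1_like = dec_correlation(times, phi1=0.5, phi2=1.0)
compound = dec_correlation(times, phi1=0.5, phi2=0.0)
for row in ar1_like:
    print(row)
```

With `phi2 = 1` the correlation decays with the actual time gap, so the two visits that are three units apart are less correlated than the two that are one unit apart. This is the property that makes the structure suitable for irregularly spaced measurements.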