14 results for INCOMPLETE-DATA
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
This paper addresses the estimation of parameters of a Bayesian network from incomplete data. The task is usually tackled by running the Expectation-Maximization (EM) algorithm several times in order to obtain a high log-likelihood estimate. We argue that choosing the maximum log-likelihood estimate (as well as the maximum penalized log-likelihood and the maximum a posteriori estimate) has severe drawbacks, being affected both by overfitting and model uncertainty. Two ideas are discussed to overcome these issues: a maximum entropy approach and a Bayesian model averaging approach. Both ideas can be easily applied on top of EM, while the entropy idea can also be implemented in a more sophisticated way, through a dedicated non-linear solver. A vast set of experiments shows that these ideas produce significantly better estimates and inferences than the traditional and widely used maximum (penalized) log-likelihood and maximum a posteriori estimates. In particular, if EM is adopted as the optimization engine, the model averaging approach is the best performing one; its performance is matched by the entropy approach when implemented using the non-linear solver. The results suggest that the applicability of these ideas is immediate (they are easy to implement and to integrate in currently available inference engines) and that they constitute a better way to learn Bayesian network parameters.
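As a rough illustration of the contrast drawn in this abstract between keeping only the highest-likelihood EM run and averaging over several runs, the sketch below weights each run's predicted probability by its normalized likelihood. The weighting scheme (uniform prior over runs, likelihood used as the evidence), the function names, and the inputs are assumptions made for illustration; the abstract does not state how the paper computes its averaging weights.

```python
import math

def averaging_weights(log_likelihoods):
    """Normalized weights proportional to exp(log-likelihood) for each EM run.

    Assumption: a uniform prior over runs, so each weight is the run's
    likelihood divided by the sum of likelihoods. The maximum is subtracted
    before exponentiating to avoid numerical underflow.
    """
    top = max(log_likelihoods)
    unnormalized = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def averaged_prediction(log_likelihoods, predictions):
    """Average the per-run predicted probabilities using the weights above."""
    weights = averaging_weights(log_likelihoods)
    return sum(w * p for w, p in zip(weights, predictions))

def max_likelihood_prediction(log_likelihoods, predictions):
    """Baseline: keep only the prediction of the highest-likelihood run."""
    best = max(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    return predictions[best]
```

When two runs reach nearly the same log-likelihood but disagree on a query, the baseline returns one run's answer while the averaged version returns a compromise between them, which is the model-uncertainty effect the abstract refers to.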
Abstract:
Retrospective clinical datasets are often characterized by a relatively small sample size and a large amount of missing data. In this case, a common way of handling the missingness consists in discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missing data allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods that require complete data to be applied later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
Abstract:
Systematic Review and Meta-Analysis: Effects of Walking Exercise in Chronic Musculoskeletal Pain
O'Connor S.R.¹, Tully M.A.², Ryan B.³, Baxter D.G.³, Bradley J.M.¹, McDonough S.M.¹
¹ University of Ulster, Health & Rehabilitation Sciences Research Institute, Newtownabbey, United Kingdom; ² Queen's University, UKCRC Centre of Excellence for Public Health (NI), Belfast, United Kingdom; ³ University of Otago, Centre for Physiotherapy Research, Dunedin, New Zealand
Purpose: To examine the effects of walking exercise on pain and self-reported function in adults with chronic musculoskeletal pain.
Relevance: Chronic musculoskeletal pain is a major cause of morbidity, exerting a substantial influence on long-term health status and overall quality of life. Current treatment recommendations advocate various aerobic exercise interventions for such conditions. Walking may represent an ideal form of exercise due to its relatively low impact. However, there is currently limited evidence for its effectiveness.
Participants: Not applicable.
Methods: A comprehensive search strategy was undertaken by two independent reviewers according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) and the recommendations of the Cochrane Musculoskeletal Review Group. Six electronic databases (Medline, CINAHL, PsycINFO, PEDro, SPORTDiscus and the Cochrane Central Register of Controlled Trials) were searched for relevant papers published up to January 2010 using MeSH terms. All randomised or non-randomised studies published in full were considered for inclusion. Studies were required to include adults aged 18 years or over with a diagnosis of chronic low back pain, osteoarthritis or fibromyalgia. Studies were excluded if they involved peri-operative or post-operative interventions or did not include a comparative non-exercise or non-walking exercise control group. The U.S. Preventive Services Task Force system was used to assess methodological quality. Data for pain and self-reported function were extracted and converted to a score out of 100.
Analysis: Data were pooled and analyzed using RevMan (v.5.0.24). Statistical heterogeneity was assessed using the χ² and I² test statistics. A random effects model was used to calculate the mean differences and 95% CIs. Data were analyzed by length of final follow-up, which was categorized as short (≤8 weeks post randomisation), mid (2-12 months) or long-term (>12 months).
Results: A total of 4324 articles were identified and twenty studies (1852 participants) meeting the inclusion criteria were included in the review. Overall, studies were judged to be of at least fair methodological quality. The most common sources of likely bias were identified as lack of concealed allocation and failure to adequately address incomplete data. Data from 12 studies were suitable for meta-analysis. Walking led to reductions in pain at short (<8 weeks post randomisation) (-8.44 [-14.54, -2.33]) and mid-term (>8 weeks - 12 months) follow-up (-9.28 [-16.34, -2.22]). No effect was observed for long-term (>12 months) data (-2.49 [-7.62, 2.65]). For function, between-group differences were observed for short (-11.57 [-16.06, -7.08]) and mid-term data (-13.26 [-16.91, -9.62]). A smaller effect was also observed at long-term follow-up (-5.60 [-7.70, -3.50]).
Conclusions: Walking interventions were associated with statistically significant improvements in pain and function at short and mid-term follow-up. Long-term data were limited but indicated that these effects do not appear to be maintained beyond twelve months.
Implications: Walking may be an effective form of exercise for individuals with chronic musculoskeletal pain. However, further research is required which examines longer term follow-up and dose-response issues in this population.
Keywords: walking exercise; musculoskeletal pain; systematic review.
Funding acknowledgements: Department of Employment and Learning, Northern Ireland.
Ethics approval: Not applicable.
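The pooling step described in this abstract (a random effects model giving a mean difference with a 95% CI, plus heterogeneity statistics) can be sketched as follows. This is a minimal sketch assuming the DerSimonian-Laird estimator of between-study variance; the function name and return structure are hypothetical, and it is not a reproduction of the review's RevMan analysis or its data.

```python
import math

def random_effects_pool(effects, std_errors):
    """Pool study-level mean differences with a DerSimonian-Laird random effects model.

    effects:    per-study mean differences
    std_errors: per-study standard errors of those differences
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q (chi-squared heterogeneity statistic) on k - 1 degrees of freedom.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se_pooled = math.sqrt(1.0 / sum_w_star)
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci": ci, "q": q, "tau2": tau2, "i2": i2}
```

A confidence interval that excludes zero corresponds to the "statistically significant" wording used in the conclusions; an interval that spans zero, as reported for long-term pain, corresponds to "no effect was observed".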
Abstract:
The trophic link density and the stability of food webs are thought to be related, but the nature of this relation is controversial. This article introduces a method for estimating the link density from diet tables which do not cover the complete food web and do not resolve all diet items to species level. A simple formula for the error of this estimate is derived. Link density is determined as a function of a threshold diet fraction below which diet items are ignored (
Abstract:
TAP pulse responses are normally analysed using moments, which are integrals of the full TAP pulse response. However, in some cases the entire pulse response may not be recorded for technical reasons, which compromises any data analysis based on moments generated from incomplete pulse responses. The current work describes the development of a function which mathematically expands the tail of a TAP pulse response, so that the TAP data analysis can be accurately conducted. This newly developed analysis method has been applied to the oxidative dehydrogenation of ethane over Co–Cr–Sn–WOx/α-Al2O3 and Co–Cr–Sn–WOx/α-Al2O3 catalysts as a case study.
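To make the idea of expanding a truncated tail concrete, the sketch below computes the zeroth moment (the area under the pulse response) from the recorded samples and then adds the analytic area of an extrapolated tail. The exponential-decay tail, the log-linear fit over the last few samples, and the function name are assumptions for illustration only; the abstract does not give the form of the function developed in the paper.

```python
import math

def zeroth_moment_with_tail(times, signal, fit_points=5):
    """Return (recorded_area, tail_corrected_area) for a truncated pulse response.

    Assumption: beyond the last recorded time T the response decays as
    s(T) * exp(-k * (t - T)), with k fitted to the last `fit_points` samples.
    The fitted samples must be strictly positive.
    """
    # Trapezoidal integral over the recorded part of the response.
    recorded = sum(
        0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    # Least-squares fit of log(signal) against time on the last samples.
    ts = times[-fit_points:]
    ys = [math.log(s) for s in signal[-fit_points:]]
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    k = -slope
    if k <= 0:
        raise ValueError("tail is not decaying; exponential extrapolation does not apply")
    # Integral of s(T) * exp(-k * (t - T)) from T to infinity.
    tail = signal[-1] / k
    return recorded, recorded + tail
```

For a response that really is exponential in its tail, the corrected area recovers the full integral; higher moments would need the corresponding analytic tail integrals, which are omitted here.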
Abstract:
Background: Pedigree reconstruction using genetic analysis provides a useful means to estimate fundamental population biology parameters relating to population demography, trait heritability and individual fitness when combined with other sources of data. However, there remain limitations to pedigree reconstruction in wild populations, particularly in systems where parent-offspring relationships cannot be directly observed, there is incomplete sampling of individuals, or molecular parentage inference relies on low quality DNA from archived material. While much can still be inferred from incomplete or sparse pedigrees, it is crucial to evaluate the quality and power of available genetic information before testing specific biological hypotheses. Here, we used microsatellite markers to reconstruct a multi-generation pedigree of wild Atlantic salmon (Salmo salar L.) using archived scale samples collected with a total trapping system within a river over a 10-year period. Using a simulation-based approach, we determined the optimal microsatellite marker number for accurate parentage assignment, and evaluated the power of the resulting partial pedigree to investigate important evolutionary and quantitative genetic characteristics of salmon in the system.
Results: We show that at least 20 microsatellites (average 12 alleles/locus) are required to maximise parentage assignment and to improve the power to estimate reproductive success and heritability in this study system. We also show that 1.5-fold differences can be detected between groups simulated to have differing reproductive success, and that it is possible to detect moderate heritability values for continuous traits (h² ≈ 0.40) with more than 80% power when using 28 moderately to highly polymorphic markers.
Conclusion: The methodologies and work flow described provide a robust approach for evaluating archived samples for pedigree-based research, even where only a proportion of the total population is sampled. The results demonstrate the feasibility of pedigree-based studies to address challenging ecological and evolutionary questions in free-living populations, where genealogies can be traced only using molecular tools, and that significant increases in pedigree assignment power can be achieved by using higher numbers of markers.
Abstract:
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today’s MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise and the additional entity information extracted from text should not impact the trustworthiness of MDM data.
In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
Abstract:
Perfect information is seldom available to man or machines due to uncertainties inherent in real world problems. Uncertainties in geographic information systems (GIS) stem from either vague/ambiguous or imprecise/inaccurate/incomplete information and it is necessary for GIS to develop tools and techniques to manage these uncertainties. There is a widespread agreement in the GIS community that although GIS has the potential to support a wide range of spatial data analysis problems, this potential is often hindered by the lack of consistency and uniformity. Uncertainties come in many shapes and forms, and processing uncertain spatial data requires a practical taxonomy to aid decision makers in choosing the most suitable data modeling and analysis method. In this paper, we: (1) review important developments in handling uncertainties when working with spatial data and GIS applications; (2) propose a taxonomy of models for dealing with uncertainties in GIS; and (3) identify current challenges and future research directions in spatial data analysis and GIS for managing uncertainties.
Abstract:
PURPOSE: We analyzed patients with hairy cell leukemia (HCL) to achieve a better understanding of the differentiation stage reached by HCL cells and to define the key role of the diversification of cell surface markers, especially CD25 expression. PATIENTS AND METHODS: We analyzed 38 previously untreated patients with HCL to characterize their complete (VDJ(H)) and incomplete (DJ(H)) immunoglobulin (Ig) heavy chain (IgH) rearrangements, including somatic hypermutation pattern and gene segment use. RESULTS: A correlation between immunophenotypic profile and molecular data was seen. All 38 cases showed monoclonal amplifications: VDJ(H) in 97%, DJ(H) in 42%, and both in 39%. Segments from the D(H)3 family were used more in complete compared with incomplete rearrangements (45% vs. 12%; P < .005). Furthermore, comparison between molecular and immunophenotypic characteristics disclosed differences in the expression of CD25 antigen; CD25(-) cases, a phenotype associated with HCL variant, showed complete homology to the germline in 3 of 5 cases (60%), whereas this characteristic was never observed in CD25(+) cases (P < .005). Moreover, V(H)4-34, V(H)1-08, and J(H)3 segments appeared in 2, 1, and 2 CD25(-) cases, respectively, whereas they were absent in all CD25(+) cases. CONCLUSION: These results support that HCL is a heterogeneous entity including subgroups with different molecular characteristics, which reinforces the need for additional studies with a larger number of patients to clarify the real role of gene rearrangements in HCL.
Abstract:
DH-JH rearrangements of the Ig heavy-chain gene (IGH) occur early during B-cell development. Consequently, they are detected in precursor-B-cell acute lymphoblastic leukemias both at diagnosis and relapse. Incomplete DJH rearrangements have also been occasionally reported in mature B-cell lymphoproliferative disorders, but their frequency and immunobiological characteristics have not been studied in detail. We have investigated the frequency and characteristics of incomplete DJH as well as complete VDJH rearrangements in a series of 84 untreated multiple myeloma (MM) patients. The overall detection rate of clonality by amplifying VDJH and DJH rearrangements using family-specific primers was 94%. Interestingly, we found a high frequency (60%) of DJH rearrangements in this group. As expected from an immunological point of view, the vast majority of DJH rearrangements (88%) were unmutated. To the best of our knowledge, this is the first systematic study describing the incidence of incomplete DJH rearrangements in a series of unselected MM patients. These results strongly support the use of DJH rearrangements as PCR targets for clonality studies and, particularly, for quantification of minimal residual disease by real-time quantitative PCR using consensus JH probes in MM patients. The finding of hypermutation in a small proportion of incomplete DJH rearrangements (six out of 50) suggests important biological implications concerning the process of somatic hypermutation. Moreover, our data offer a new insight in the regulatory development model of IGH rearrangements.
Abstract:
Supply Chain Simulation (SCS) is applied to acquire information to support outsourcing decisions but obtaining enough detail in key parameters can often be a barrier to making well informed decisions.
One aspect of SCS that has been relatively unexplored is the impact of inaccurate data around delays within the SC. The impact of the magnitude and variability of process cycle time on typical performance indicators in a SC context is studied.
System cycle time, WIP levels and throughput are more sensitive to the magnitude of deterministic deviations in process cycle time than variable deviations. Manufacturing costs are not very sensitive to these deviations.
Future opportunities include investigating the impact of process failure or product defects, including logistics and transportation between SC members and using alternative costing methodologies.