561 results for Data portal
Abstract:
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today’s MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise and the additional entity information extracted from text should not impact the trustworthiness of MDM data.
In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
Abstract:
The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications across areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that have considered the problem of identifying regions where data objects exhibit a behavior that is atypical of the general dataset. The spatial scan statistic and methods that build upon it mostly adopt the framework of defining a character for regions of objects (e.g., circular or elliptical) and repeatedly sampling regions of such character, followed by applying a statistical test for anomaly detection. In the past decade, there have been efforts from the statistics community to enhance the efficiency of scan statistics as well as to enable discovery of arbitrarily shaped anomalous regions. On the other hand, the data mining community has started to look at determining anomalous regions whose behavior diverges from that of their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions on spatial data from across the data mining and statistics communities while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them appropriately to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities to help advance the frontier in anomaly detection.
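The scan-statistic framework described in this abstract (fix a region shape, enumerate candidate regions, apply a statistical test) can be illustrated with a minimal sketch. It assumes the Poisson form of Kulldorff's spatial scan statistic with circular regions centered on the data locations and a Monte Carlo test; the abstract does not commit to any of these choices. The function names, the radii, and the toy grid are all hypothetical.

```python
import math
import random


def poisson_llr(c, e, total):
    """Log-likelihood ratio for a zone with c observed and e expected cases.

    Only zones with an elevated rate (c > e) score above zero.
    """
    if e <= 0 or e >= total or c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # The outside term vanishes when every case falls inside the zone.
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def best_circle(points, cases, pops, radii):
    """Return (llr, center, radius) of the highest-scoring circular zone."""
    total_cases = sum(cases)
    total_pop = sum(pops)
    best = (0.0, None, None)
    for cx, cy in points:
        for r in radii:
            idx = [i for i, (x, y) in enumerate(points)
                   if math.hypot(x - cx, y - cy) <= r]
            c = sum(cases[i] for i in idx)
            # Expected cases if risk were uniform across the population.
            e = total_cases * sum(pops[i] for i in idx) / total_pop
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, (cx, cy), r)
    return best


def scan(points, cases, pops, radii, n_sim=19, seed=0):
    """Most likely cluster plus a Monte Carlo p-value for its score."""
    rng = random.Random(seed)
    observed = best_circle(points, cases, pops, radii)
    total_cases = sum(cases)
    exceed = 0
    for _ in range(n_sim):
        # Null hypothesis: cases fall on locations in proportion to population.
        sim = [0] * len(points)
        for i in rng.choices(range(len(points)), weights=pops, k=total_cases):
            sim[i] += 1
        if best_circle(points, sim, pops, radii)[0] >= observed[0]:
            exceed += 1
    return observed, (exceed + 1) / (n_sim + 1)


# Toy data: a 5x5 grid of equally populated locations with a planted cluster.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
pops = [100] * len(points)
cases = [15 if math.hypot(x - 1, y - 1) <= 1 else 2 for x, y in points]

(llr, center, radius), p_value = scan(points, cases, pops, radii=[0.5, 1.0, 1.5])
print(center, radius, round(llr, 1), p_value)
```

On this toy grid the planted five-cell cluster around (1, 1) is recovered as the most likely zone. Restricting candidates to circles is what makes exhaustive enumeration cheap, and it is also the limitation that motivates the arbitrarily shaped region methods mentioned in the abstract.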
Abstract:
Perfect information is seldom available to man or machines due to uncertainties inherent in real-world problems. Uncertainties in geographic information systems (GIS) stem from either vague/ambiguous or imprecise/inaccurate/incomplete information, and it is necessary for GIS to develop tools and techniques to manage these uncertainties. There is widespread agreement in the GIS community that although GIS has the potential to support a wide range of spatial data analysis problems, this potential is often hindered by the lack of consistency and uniformity. Uncertainties come in many shapes and forms, and processing uncertain spatial data requires a practical taxonomy to aid decision makers in choosing the most suitable data modeling and analysis method. In this paper, we: (1) review important developments in handling uncertainties when working with spatial data and GIS applications; (2) propose a taxonomy of models for dealing with uncertainties in GIS; and (3) identify current challenges and future research directions in spatial data analysis and GIS for managing uncertainties.
Abstract:
Current data-intensive image processing applications push traditional embedded architectures to their limits. FPGA-based hardware acceleration is a potential solution, but the programmability gap and the time-consuming HDL design flow are significant obstacles. The proposed research approach, an FPGA-based programmable hardware acceleration platform that uses a large number of Streaming Image processing Processors (SIPPro), potentially addresses these issues. SIPPro is a pipelined, in-order soft-core processor architecture with specific optimisations for image processing applications. Each SIPPro core uses 1 DSP48, 2 Block RAMs and 370 slice registers, making the processor as compact as possible whilst maintaining flexibility and programmability. It is an area-efficient, scalable and high-performance soft-core architecture capable of delivering 530 MIPS per core on a Xilinx Zynq SoC (ZC7Z020-3). To evaluate the feasibility of the proposed architecture, a Traffic Sign Recognition (TSR) algorithm has been prototyped on a Zedboard with the color and morphology operations accelerated using multiple SIPPros. Simulation and experimental results demonstrate that the processing platform is able to achieve speedups of 15 and 33 times for color filtering and morphology operations respectively, with significantly reduced design effort and time.
Abstract:
Importance: The natural history of patients with newly diagnosed high-risk nonmetastatic (M0) prostate cancer receiving hormone therapy (HT) either alone or with standard-of-care radiotherapy (RT) is not well documented. Furthermore, no clinical trial has assessed the role of RT in patients with node-positive (N+) M0 disease. The STAMPEDE Trial includes such individuals, allowing an exploratory multivariate analysis of the impact of radical RT.
Objective: To describe survival and the impact on failure-free survival of RT by nodal involvement in these patients.
Design, Setting, and Participants: Cohort study using data collected for patients allocated to the control arm (standard-of-care only) of the STAMPEDE Trial between October 5, 2005, and May 1, 2014. Outcomes are presented as hazard ratios (HRs) with 95% CIs derived from adjusted Cox models; survival estimates are reported at 2 and 5 years. Participants were high-risk, hormone-naive patients with newly diagnosed M0 prostate cancer starting long-term HT for the first time. Radiotherapy is encouraged in this group, but mandated for patients with node-negative (N0) M0 disease only since November 2011.
Exposures: Long-term HT either alone or with RT, as per local standard. Planned RT use was recorded at entry.
Main Outcomes and Measures: Failure-free survival (FFS) and overall survival.
Results: A total of 721 men with newly diagnosed M0 disease were included: median age at entry, 66 (interquartile range [IQR], 61-72) years, median (IQR) prostate-specific antigen level of 43 (18-88) ng/mL. There were 40 deaths (31 owing to prostate cancer) with 17 months' median follow-up. Two-year survival was 96% (95% CI, 93%-97%) and 2-year FFS, 77% (95% CI, 73%-81%). Median (IQR) FFS was 63 (26 to not reached) months. Time to FFS was worse in patients with N+ disease (HR, 2.02 [95% CI, 1.46-2.81]) than in those with N0 disease. Failure-free survival outcomes favored planned use of RT for patients with both N0M0 (HR, 0.33 [95% CI, 0.18-0.61]) and N+M0 disease (HR, 0.48 [95% CI, 0.29-0.79]).
Conclusions and Relevance: Survival for men entering the cohort with high-risk M0 disease was higher than anticipated at study inception. These nonrandomized data were consistent with previous trials that support routine use of RT with HT in patients with N0M0 disease. Additionally, the data suggest that the benefits of RT extend to men with N+M0 disease.
Trial Registration: clinicaltrials.gov Identifier: NCT00268476; ISRCTN78818544.
Abstract:
This paper proposes a probabilistic principal component analysis (PCA) approach to islanding detection based on wide-area PMU data. The increasing probability of uncontrolled islanding operation is, according to many power system operators, one of the biggest concerns with a large penetration of distributed renewable generation. Traditional islanding detection methods, such as RoCoF and vector shift, are however extremely sensitive and may result in many unwanted trips. The proposed probabilistic PCA aims to improve islanding detection accuracy and reduce the risk of unwanted tripping based on PMU measurements, while addressing the practical issue of missing data. The reliability and accuracy of the proposed probabilistic PCA approach are demonstrated using real data recorded in the UK power system by the OpenPMU project. The results show that the proposed methods can detect islanding accurately, without being falsely triggered by generation trips, even in the presence of missing values.
Abstract:
Type 1 diabetes (T1DM) is associated with increased risk of macrovascular complications. We examined longitudinal associations of serum conventional lipids and nuclear magnetic resonance (NMR)-determined lipoprotein subclasses with carotid intima-media thickness (IMT) in adults with T1DM (n=455) enrolled in the Diabetes Control and Complications Trial (DCCT). Data on serum lipids and lipoproteins were collected at DCCT baseline (1983-89) and were correlated with common and internal carotid IMT determined by ultrasonography during the observational follow-up of the DCCT, the Epidemiology of Diabetes Interventions and Complications (EDIC) study, at EDIC 'Year 1' (199-1996) and EDIC 'Year 6' (1998-2000). This article contains data on the associations of DCCT baseline lipoprotein profiles (NMR-based VLDL & chylomicrons, IDL/LDL and HDL subclasses and 'conventional' total, LDL-, HDL-, non-HDL-cholesterol and triglycerides) with carotid IMT at EDIC Years 1 and 6, stratified by gender. The data are supplemental to our original research article describing detailed associations of DCCT baseline lipids and lipoprotein profiles with EDIC Year 12 carotid IMT (Basu et al. in press) [1].
Abstract:
The popularity of tri-axial accelerometer data loggers to quantify animal activity through the analysis of signature traces is increasing. However, there is no consensus on how to process the large data sets that these devices generate when recording at the necessary high sample rates. In addition, there have been few attempts to validate accelerometer traces with specific behaviours in non-domesticated terrestrial mammals.
Abstract:
Objective: To determine the prevalence of systemic corticosteroid-induced morbidity in severe asthma.
Design: Cross-sectional observational study.
Setting: The primary care Optimum Patient Care Research Database and the British Thoracic Society Difficult Asthma Registry.
Participants: Optimum Patient Care Research Database (7195 subjects in three age- and gender-matched groups)—severe asthma (Global Initiative for Asthma (GINA) treatment step 5 with four or more prescriptions/year of oral corticosteroids, n=808), mild/moderate asthma (GINA treatment step 2/3, n=3975) and non-asthma controls (n=2412). 770 subjects with severe asthma from the British Thoracic Society Difficult Asthma Registry (442 receiving daily oral corticosteroids to maintain disease control).
Main outcome measures: Prevalence rates of morbidities associated with systemic steroid exposure were evaluated and reported separately for each group.
Results: 748/808 (93%) subjects with severe asthma had one or more conditions linked to systemic corticosteroid exposure (mild/moderate asthma 3109/3975 (78%), non-asthma controls 1548/2412 (64%); p<0.001 for severe asthma versus non-asthma controls). Compared with mild/moderate asthma, morbidity rates for severe asthma were significantly higher for conditions associated with systemic steroid exposure (type II diabetes 10% vs 7%, OR=1.46 (95% CI 1.11 to 1.91), p<0.01; osteoporosis 16% vs 4%, OR=5.23 (95% CI 3.97 to 6.89), p<0.001; dyspeptic disorders (including gastric/duodenal ulceration) 65% vs 34%, OR=3.99 (95% CI 3.37 to 4.72), p<0.001; cataracts 9% vs 5%, OR=1.89 (95% CI 1.39 to 2.56), p<0.001). In the British Thoracic Society Difficult Asthma Registry, similar prevalence rates were found, although high rates of osteopenia (35%) and obstructive sleep apnoea (11%) were additionally identified.
Conclusions: Oral corticosteroid-related adverse events are common in severe asthma. New treatments which reduce exposure to oral corticosteroids may reduce the prevalence of these conditions and this should be considered in cost-effectiveness analyses of these new treatments.