25 results for Data anonymization and sanitization


Relevance:

100.00%

Publisher:

Abstract:

Access to demographic data that are complete, accurate and up-to-date is fundamental to many aspects of public health, government and academic work and for accurate interpretation of other databases. Health registration data are the prime source of demographic information for health and social care systems; for example, as an indicator of need, as a source of denominators to convert number of events into rates, or in the case of the residential address information as the basis for generating the call-recall invitation letters that are used for most screening programs (e.g. breast, colo-rectal and AAA screening). However, list inflation (ghosts, duplicates or emigrants) and a degree of address inaccuracy are recognised caveats with the health registration data and a recent NILS-related study on breast screening suggests that improved address accuracy might be a fast and efficient means of increasing screening uptake rates in cities and amongst deprived populations. In NI these data are collated by the BSO who uniquely in the UK also have access to data relating to prescribing, dental registrations and use of A&E services. These can be used to supplement the standard demographic and address information by (i) indicating patients who are alive and resident in NI and (ii) providing an independent source of probably improved address information. This study will use the NI Unique Property Reference Number (UPRN), rather than the addresses per se which are difficult to work with, to compare the addresses registered in the BSO with those addresses in the enumerated 2011 census. 
Assuming that the census is a more accurate source of address information for individuals, the proposed study will compare the health registration addresses with those recorded at the census. Its aims are (i) to characterise the amount and distribution of these differences, (ii) to see what proportion of those who do not attend for screening did not actually receive an invitation letter because the address was incorrect, (iii) to determine how much of the social gradient (and urban/rural differences) in screening uptake is due to address inaccuracies, and (iv) to use a comparison of the timing of address changes at the BSO to provide information on the delays in updating addresses.
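The address comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the study's method: the person identifiers, the dictionary layout and the function name `compare_addresses` are all assumptions, and real linkage would take place on protected data inside a secure environment.

```python
def compare_addresses(health_uprn, census_uprn):
    """Classify each person by whether their two recorded UPRNs agree.

    Both arguments map a (hypothetical) person ID to a UPRN,
    or to None when no UPRN could be assigned.
    """
    summary = {"match": 0, "mismatch": 0, "missing": 0}
    for person_id, health_value in health_uprn.items():
        census_value = census_uprn.get(person_id)
        if health_value is None or census_value is None:
            summary["missing"] += 1      # cannot be compared
        elif health_value == census_value:
            summary["match"] += 1        # same property in both sources
        else:
            summary["mismatch"] += 1     # candidate address inaccuracy
    return summary


# Made-up records, for illustration only.
health = {"p1": 1001, "p2": 1002, "p3": None, "p4": 1004}
census = {"p1": 1001, "p2": 2002, "p3": 1003}
result = compare_addresses(health, census)
print(result)
```

Comparing on the UPRN rather than on free-text addresses reduces the check to an equality test, which is the reason the abstract gives for preferring it.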

Relevance:

100.00%

Publisher:

Abstract:

Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.

Relevance:

100.00%

Publisher:

Abstract:

Background. The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10^3 to ca. 10^4 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 × 10^5 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions. Our study takes the red algae one step closer to meaningful inclusion in the tree of life.
In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.

Relevance:

100.00%

Publisher:

Abstract:

A rapidly increasing number of Web databases have now become accessible via
their HTML form-based query interfaces. Query result pages are dynamically generated
in response to user queries; they encode structured data and are displayed for human
use. Query result pages usually contain other types of information in addition to query
results, e.g., advertisements, navigation bars, etc. The problem of extracting structured data
from query result pages is critical for web data integration applications, such as comparison
shopping, meta-search engines, etc., and has been intensively studied. A number of approaches
have been proposed. As the structures of Web pages become more and more complex, the
existing approaches start to fail, and most of them do not remove irrelevant contents which
may affect the accuracy of data record extraction. We propose an automated approach for
Web data extraction. First, it makes use of visual features and query terms to identify data
sections and extracts data records in these sections. We also represent several content and
visual features of visual blocks in a data section, and use them to filter out noisy blocks.
Second, it measures similarity between data items in different data records based on their
visual and content features, and aligns them into different groups so that the data in the
same group have the same semantics. The results of our experiments with a large set of
Web query result pages in different domains show that our proposed approaches are highly
effective.

Relevance:

100.00%

Publisher:

Abstract:

Objective: Several surveillance definitions of influenza-like illness (ILI) have been proposed, based on the presence of symptoms. Symptom data can be obtained from patients, medical records, or both. Past research has found that agreements between health record data and self-report are variable depending on the specific symptom. Therefore, we aimed to explore the implications of using data on influenza symptoms extracted from medical records, similar data collected prospectively from outpatients, and the combined data from both sources as predictors of laboratory-confirmed influenza. Methods: Using data from the Hutterite Influenza Prevention Study, we calculated: 1) the sensitivity, specificity and predictive values of individual symptoms within surveillance definitions; 2) how frequently surveillance definitions correlated to laboratory-confirmed influenza; and 3) the predictive value of surveillance definitions. Results: Of the 176 participants with reports from participants and medical records, 142 (81%) were tested for influenza and 37 (26%) were PCR positive for influenza. Fever (alone) and fever combined with cough and/or sore throat were highly correlated with being PCR positive for influenza for all data sources. ILI surveillance definitions, based on symptom data from medical records only or from both medical records and self-report, were better predictors of laboratory-confirmed influenza with higher odds ratios and positive predictive values. Discussion: The choice of data source to determine ILI will depend on the patient population, outcome of interest, availability of data source, and use for clinical decision making, research, or surveillance. © Canadian Public Health Association, 2012.
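The measures named in the Methods (sensitivity, specificity and predictive values) all follow from a 2x2 table of symptom status against the laboratory result. The sketch below shows that arithmetic. The function name `diagnostic_metrics` and the counts are invented for illustration and are not taken from the study; the code also assumes every denominator is non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard accuracy measures for a symptom-based case definition.

    tp: definition met, laboratory positive
    fp: definition met, laboratory negative
    fn: definition not met, laboratory positive
    tn: definition not met, laboratory negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the definition catches
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # chance a flagged person is a true case
        "npv": tn / (tn + fn),          # chance an unflagged person is a non-case
    }


# Hypothetical counts, not data from the study.
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Predictive values depend on how common influenza is among those tested, so the same definition can yield different values in different populations.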

Relevance:

100.00%

Publisher:

Abstract:

1) Executive Summary
Legislation (Autism Act NI, 2011), a cross-departmental strategy (Autism Strategy 2013-2020) and a first action plan (2013-2016) have been developed in Northern Ireland in order to support individuals and families affected by Autism Spectrum Disorder (ASD) without a prior thorough baseline assessment of need. At the same time, there are large existing data sets about the population in NI that had never been subjected to a secondary data analysis with regards to data on ASD. This report covers the first comprehensive secondary data analysis and thereby aims to inform future policy and practice.
Following a search of all existing, large-scale, regional or national data sets that were relevant to the lives of individuals and families affected by Autism Spectrum Disorder (ASD) in Northern Ireland, extensive secondary data analyses were carried out. The focus of these secondary data analyses was to distill any ASD related data from larger generic data sets. The findings are reported for each data set and follow a lifespan perspective, i.e., data related to children is reported first before data related to adults.
Key findings:
Autism Prevalence:
Of children born in 2000 in the UK,
• 0.9% (1:109) were reported to have ASD when they were 5 years old in 2005;
• 1.8% (1:55) were reported to have ASD when they were 7 years old in 2007;
• 3.5% (1:29) were reported to have ASD when they were 11 years old in 2011.
In mainstream schools in Northern Ireland
• 1.2% of the children were reported to have ASD in 2006/07;
• 1.8% of the children were reported to have ASD in 2012/13.

Economic Deprivation:
• Families of children with autism (CWA) were 9%-18% worse off per week than families of children not on the autism spectrum (COA).
• Between 2006 and 2013, deprivation of CWA compared to COA nearly doubled, as measured by eligibility for free school meals (from nearly 20% to 37%).
• In 2006, CWA and COA experienced similar levels of deprivation (approx. 20%); by 2013, a considerable deprivation gap had developed, with CWA experiencing 6% more deprivation than COA.
• Nearly 1/3 of primary school CWA lived in the most deprived areas in Northern Ireland.
• Nearly ½ of children with Asperger’s Syndrome who attended special school lived in the most deprived areas.

Unemployment:
• Mothers of CWA were 6% less likely to be employed than mothers of COA.
• Mothers of CWA earned 35%-56% less than mothers of COA.
• CWA were 9% less likely to live in two income families than COA.

Health:
• Pre-diagnosis, CWA were more likely than COA to have physical health problems, including problems with walking on level ground, speech and language, hearing and eyesight, as well as asthma.
• At 3 years of age, CWA experienced poorer emotional and social health than COA; this difference had increased significantly by the time they were 7 years of age.
• Mothers of young CWA had lower levels of life satisfaction and poorer mental health than mothers of young COA.

Education:
• In mainstream education, children with ASD aged 11-16 years reported less satisfaction with their social relationships than COA.
• Younger children with ASD (aged 5 and 7 years) were less likely to enjoy school, were bullied more, and were more reluctant to attend school than COA.
• CWA attended school 2-3 weeks less than COA.
• Children with Asperger’s Syndrome in special schools missed the equivalent of 8-13 school days more than children with Asperger’s Syndrome in mainstream schools.
• Children with ASD attending mainstream schooling were less likely to gain 5+ GCSEs A*-C or subsequently attend university.



Further and Higher Education:
• Enrolment rates for students with ASD have risen in Further Education (FE), from 0% to 0.7%.
• Enrolment rates for students with ASD have risen in Higher Education (HE), from 0.28% to 0.45%.
• Students with ASD chose to study different subjects than students without ASD, although other factors, e.g., gender, age etc. may have played a part in subject selection.
• Students with ASD from NI were more likely than students without ASD to choose Northern Irish HE Institutions rather than study outside NI.

Participation in adult life and employment:
• A small number of adults with ASD (n=99) have benefitted from DES employment provision over the past 12 years.
• It is unknown how many adults with ASD have received employment support elsewhere (e.g. Steps to Work).

Awareness and Attitudes in the General Population:
• In both the 2003 and 2012 NI Life and Times Survey (NILTS), the NI public reported positive attitudes towards the inclusion of children with ASD in mainstream education (see also BASE Project Vol. 2).

Gap Analysis Recommendations:
This was the first comprehensive secondary analysis with regards to ASD of existing large-scale data sets in Northern Ireland. Data gaps were identified and further replications would benefit from the following data inclusion:
• ASD should be recorded routinely in the following datasets:
o Census;
o Northern Ireland Survey of Activity Limitation (NISALD);
o Training for Success/Steps to work; Steps to Success;
o Travel survey;
o Hate crime; and
o Labour Force Survey.
• Data should be collected on the destinations/qualifications of special school leavers.
• NILT Survey autism module should be repeated in 5 years' time (2017) (see full report of 1st NILT Survey autism module 2012 in BASE Project Report Volume 2).
• General public attitudes and awareness should be assessed for children and young people, using the Young Life and Times Survey (YLT) and the Kids Life and Times Survey (KLT); (this work is underway, Dillenburger, McKerr, Schubolz, & Lloyd, 2014-2015).

Relevance:

100.00%

Publisher:

Abstract:

This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
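For orientation, the sketch below shows identification by plain cosine similarity between feature vectors. It is the standard measure, not the modified similarity or the new representation introduced in the paper, neither of which is specified in the abstract. The enrolled names, the feature values and the helper `identify` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def identify(probe, gallery):
    """Return the enrolled name whose vector is most similar to the probe."""
    return max(gallery, key=lambda name: cosine_similarity(probe, gallery[name]))


# Hypothetical combined audio-visual feature vectors, one per enrolled person.
gallery = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
probe = [0.8, 0.2, 0.25]
best = identify(probe, gallery)
print(best)
```

Cosine similarity depends only on the angle between vectors, so it is insensitive to overall feature scale. That property is one plausible reason to start from it when two modalities have very different data rates and feature sizes.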

Relevance:

100.00%

Publisher:

Abstract:

Using new biomarker data from the 2010 pilot round of the Longitudinal Aging Study in India (LASI), we investigate education, gender, and state-level disparities in health. We find that hemoglobin level, a marker for anemia, is lower for respondents with no schooling (0.7 g/dL less in the adjusted model) compared to those with some formal education and is also lower for females than for males (2.0 g/dL less in the adjusted model). In addition, we find that about one third of respondents in our sample aged 45 or older have high C-reactive protein (CRP) levels (>3 mg/L), an indicator of inflammation and a risk factor for cardiovascular disease. We find no evidence of educational or gender differences in CRP, but there are significant state-level disparities, with Kerala residents exhibiting the lowest CRP levels (a mean of 1.96 mg/L compared to 3.28 mg/L in Rajasthan, the state with the highest CRP). We use the Blinder–Oaxaca decomposition approach to explain group-level differences, and find that state-level disparities in CRP are mainly due to heterogeneity in the association of the observed characteristics of respondents with CRP, rather than differences in the distribution of endowments across the sampled state populations.
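As background, one common two-fold form of the Blinder–Oaxaca decomposition is given below. The notation is an assumption introduced here, since the abstract does not say which variant was used: groups A and B, mean outcomes \bar{Y}, mean covariate vectors \bar{X}, and coefficient vectors \hat{\beta} from separate linear regressions that include an intercept, with group B's coefficients taken as the reference.

```latex
\bar{Y}_A - \bar{Y}_B
  = \underbrace{(\bar{X}_A - \bar{X}_B)^{\top} \hat{\beta}_B}_{\text{endowments (explained)}}
  + \underbrace{\bar{X}_A^{\top} (\hat{\beta}_A - \hat{\beta}_B)}_{\text{coefficients (unexplained)}}
```

In these terms, the abstract's finding is that the second component, which reflects how observed characteristics are associated with CRP in each state, accounts for most of the state-level gap, rather than the first.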

Relevance:

100.00%

Publisher:

Abstract:

Perfect information is seldom available to man or machines due to uncertainties inherent in real world problems. Uncertainties in geographic information systems (GIS) stem from either vague/ambiguous or imprecise/inaccurate/incomplete information and it is necessary for GIS to develop tools and techniques to manage these uncertainties. There is widespread agreement in the GIS community that although GIS has the potential to support a wide range of spatial data analysis problems, this potential is often hindered by the lack of consistency and uniformity. Uncertainties come in many shapes and forms, and processing uncertain spatial data requires a practical taxonomy to aid decision makers in choosing the most suitable data modeling and analysis method. In this paper, we: (1) review important developments in handling uncertainties when working with spatial data and GIS applications; (2) propose a taxonomy of models for dealing with uncertainties in GIS; and (3) identify current challenges and future research directions in spatial data analysis and GIS for managing uncertainties.