48 results for heterogeneous data sources
in University of Queensland eSpace - Australia
Abstract:
Objective: To compare mortality burden estimates based on direct measurement of levels and causes in communities with indirect estimates based on combining health-facility cause-specific mortality structures with community measurement of mortality levels. Methods: Data from sentinel vital registration (SVR) with verbal autopsy (VA) were used to determine the cause-specific mortality burden at the community level in two areas of the United Republic of Tanzania. Proportional cause-specific mortality structures from health facilities were applied to counts of deaths obtained by SVR to produce modelled estimates. The burden was expressed in years of life lost. Findings: A total of 2884 deaths were recorded from health facilities and 2167 from SVR/VAs. In the perinatal and neonatal age group, cause-specific mortality rates were dominated by perinatal conditions and stillbirths in both the community and the facility data. The modelled estimates for chronic causes were very similar to those from SVR/VA. Acute febrile illnesses were coded more specifically in the facility data than in the VA data. Injuries were more prevalent in the SVR/VA data than in the facility data. Conclusion: In this setting, improving coding practices under the International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10), and applying facility-based cause structures to community death counts derived from SVR appears to produce reasonable estimates of the cause-specific mortality burden in those aged 5 years and older, comparable to estimates determined directly from VA. For the perinatal and neonatal age group, VA appears to be required. Use of this approach in a nationally representative sample of facilities may produce reliable national estimates of the cause-specific mortality burden for leading causes of death in adults.
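To make the indirect method concrete, the sketch below applies a facility-derived proportional cause-of-death structure to a community death count from SVR, which is the modelling step the abstract describes. All cause fractions are hypothetical illustrations, not the study's data.

```python
# Sketch of the indirect estimation step: apply a health-facility proportional
# cause-of-death structure to community death counts from sentinel vital
# registration (SVR). All numbers below are hypothetical illustrations.

facility_cause_fractions = {        # proportional mortality from facility records
    "malaria": 0.30,
    "acute febrile illness": 0.20,
    "injuries": 0.10,
    "chronic causes": 0.25,
    "other": 0.15,
}

svr_death_count = 2167              # total deaths counted by SVR in the community

modelled_deaths = {
    cause: fraction * svr_death_count
    for cause, fraction in facility_cause_fractions.items()
}

for cause, deaths in modelled_deaths.items():
    print(f"{cause}: {deaths:.0f} modelled deaths")
```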
Abstract:
Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to response latency, the lack of direct memory access to data-serving resources, and the need for fault tolerance. This paper describes a design pattern for handling results from queries that return large amounts of data. Typically these queries are made by a client process across a wide area network (or the Internet), through one or more middle tiers, to a relational database residing on a remote server. The solution combines several data retrieval strategies: iterators for traversing data sets while presenting an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
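The abstract names the strategies but does not show the Oracle implementation; below is a minimal Python sketch of how iterators, double-buffering, background prefetching and query slicing can combine. fetch_slice() is a hypothetical stand-in for the remote query layer, faking a 2500-row table.

```python
# An iterator that presents a large remote result set as a simple stream while
# a background thread prefetches the next slice (double-buffering), using
# query slicing (LIMIT/OFFSET-style windows) for each fetch.
import threading
from queue import Queue

SLICE_SIZE = 1000

def fetch_slice(offset, limit):
    # Hypothetical remote call, e.g. SELECT ... LIMIT :limit OFFSET :offset
    return list(range(offset, min(offset + limit, 2500)))  # fake 2500-row table

class SlicedResultIterator:
    def __init__(self):
        self._queue = Queue(maxsize=2)   # at most two buffers in flight
        threading.Thread(target=self._producer, daemon=True).start()

    def _producer(self):
        offset = 0
        while True:
            rows = fetch_slice(offset, SLICE_SIZE)
            self._queue.put(rows)        # blocks when both buffers are full
            if len(rows) < SLICE_SIZE:   # a short slice means end of data
                self._queue.put(None)
                return
            offset += SLICE_SIZE

    def __iter__(self):
        while True:
            rows = self._queue.get()
            if rows is None:
                return
            yield from rows

print(sum(1 for _ in SlicedResultIterator()))  # 2500
```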
Abstract:
A data warehouse is a data repository that collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views to materialize. The objective is to select an appropriate set of views that minimizes the total query response time, under the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is fundamentally different from view selection under a disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated, and two efficient heuristic algorithms are proposed. The key to devising these algorithms is to define good heuristic functions and to reduce the problem to well-solved optimization problems, so that an approximate solution of the known optimization problem yields a feasible solution of the original problem.
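The paper's two heuristics are not reproduced in the abstract; as a generic illustration of the problem's shape, the sketch below greedily selects views by query-time benefit per unit of maintenance time under a maintenance-time bound. All numbers are hypothetical, and this is not the paper's algorithm.

```python
# Generic greedy sketch of view selection under a maintenance-time constraint:
# choose materialized views to maximize saved query time subject to a bound on
# total maintenance time, ranking by benefit density.

views = [
    # (name, query time saved if materialized, maintenance time)
    ("sales_by_region", 120.0, 30.0),
    ("sales_by_month",   90.0, 25.0),
    ("top_customers",    40.0,  5.0),
    ("inventory_rollup", 60.0, 40.0),
]
MAINTENANCE_BOUND = 60.0

def greedy_select(views, bound):
    chosen, used = [], 0.0
    # Rank by benefit density: saved query time per unit of maintenance time.
    for name, benefit, cost in sorted(views, key=lambda v: v[1] / v[2], reverse=True):
        if used + cost <= bound:
            chosen.append(name)
            used += cost
    return chosen, used

selected, total_maintenance = greedy_select(views, MAINTENANCE_BOUND)
print(selected, total_maintenance)
# ['top_customers', 'sales_by_region', 'sales_by_month'] 60.0
```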
Abstract:
Integrating information in the molecular biosciences involves more than the cross-referencing of sequences or structures. Experimental protocols, results of computational analyses, annotations and links to relevant literature form integral parts of this information, and impart meaning to sequence or structure. In this review, we examine some existing approaches to integrating information in the molecular biosciences. We consider not only technical issues concerning the integration of heterogeneous data sources and the corresponding semantic implications, but also the integration of analytical results. Within the broad range of strategies for integration of data and information, we distinguish between platforms and developments. We discuss two current platforms and six current developments, and identify what we believe to be their strengths and limitations. We identify key unsolved problems in integrating information in the molecular biosciences, and discuss possible strategies for addressing them including semantic integration using ontologies, XML as a data model, and graphical user interfaces as integrative environments.
Abstract:
Objective: To illustrate methodological issues involved in estimating dietary trends in populations using data obtained from various sources in Australia in the 1980s and 1990s. Methods: Estimates of absolute and relative change in consumption of selected food items were calculated using national data published annually on the national food supply for 1982-83 to 1992-93 and responses to food frequency questions in two population based risk factor surveys in 1983 and 1994 in the Hunter Region of New South Wales, Australia. The validity of estimated food quantities obtained from these inexpensive sources at the beginning of the period was assessed by comparison with data from a national dietary survey conducted in 1983 using 24 h recall. Results: Trend estimates from the food supply data and risk factor survey data were in good agreement for increases in consumption of fresh fruit, vegetables and breakfast food and decreases in butter, margarine, sugar and alcohol. Estimates for trends in milk, eggs and bread consumption, however, were inconsistent. Conclusions: Both data sources can be used for monitoring progress towards national nutrition goals based on selected food items provided that some limitations are recognized. While data collection methods should be consistent over time they also need to allow for changes in the food supply (for example the introduction of new varieties such as low-fat dairy products). From time to time the trends derived from these inexpensive data sources should be compared with data derived from more detailed and quantitative estimates of dietary intake.
Abstract:
There are two main types of data sources of income distributions in China: household survey data and grouped data. Household survey data are typically available for isolated years and individual provinces. In comparison, aggregate or grouped data are typically available more frequently and usually have national coverage. In principle, grouped data allow investigation of the change of inequality over longer, continuous periods of time, and the identification of patterns of inequality across broader regions. Nevertheless, a major limitation of grouped data is that only mean (average) income and income shares of quintile or decile groups of the population are reported. Directly using grouped data reported in this format is equivalent to assuming that all individuals in a quintile or decile group have the same income. This potentially distorts the estimate of inequality within each region. The aim of this paper is to apply an improved econometric method designed to use grouped data to study income inequality in China. A generalized beta distribution is employed to model income inequality in China at various levels and periods of time. The generalized beta distribution is more general and flexible than the lognormal distribution that has been used in past research, and also relaxes the assumption of a uniform distribution of income within quintile and decile groups of populations. The paper studies the nature and extent of inequality in rural and urban China over the period 1978 to 2002. Income inequality in the whole of China is then modeled using a mixture of province-specific distributions. The estimated results are used to study the trends in national inequality, and to discuss the empirical findings in the light of economic reforms, regional policies, and globalization of the Chinese economy.
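For readers unfamiliar with the distribution named above, here is the generalized beta of the second kind (GB2) density written out and checked numerically. The parameter values are hypothetical illustrations, not the paper's estimates.

```python
# GB2 density: f(y) = a*y^(a*p-1) / (b^(a*p) * B(p,q) * (1+(y/b)^a)^(p+q)),
# defined for income y > 0. It nests the lognormal-like shapes used in earlier
# work while allowing more flexible tails.
import numpy as np
from scipy.special import beta as beta_fn
from scipy.integrate import quad

def gb2_pdf(y, a, b, p, q):
    return (a * y ** (a * p - 1)) / (
        b ** (a * p) * beta_fn(p, q) * (1 + (y / b) ** a) ** (p + q)
    )

# Hypothetical parameters: a controls overall tail behaviour, b is a scale,
# p and q shape the lower and upper tails respectively.
a, b, p, q = 2.0, 10_000.0, 1.5, 2.0

total, _ = quad(gb2_pdf, 0, np.inf, args=(a, b, p, q))
print(f"density integrates to {total:.4f}")  # ~1.0000
```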
Abstract:
In the wake of findings from the Bundaberg Hospital and Forster inquiries in Queensland, periodic public release of hospital performance reports has been recommended. A process for developing and releasing such reports is being established by Queensland Health, overseen by an independent expert panel. This recommendation presupposes that public reports based on routinely collected administrative data are accurate; that the public can access, correctly interpret and act upon report contents; that reports motivate hospital clinicians and managers to improve quality of care; and that there are no unintended adverse effects of public reporting. Available research suggests that primary data sources are often inaccurate and incomplete, that reports have low predictive value in detecting outlier hospitals, and that users experience difficulty in accessing and interpreting reports and tend to distrust their findings.
Abstract:
In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push it to a central stream processor. In such environments, transmitting rapid, high-volume, time-varying data streams induces significant communication costs and imposes computing overhead at the central processor. In this paper, we develop a novel filter approach, called the DTFilter approach, for evaluating windowed distinct queries in such a distributed system. The DTFilter approach is based on a search algorithm over a data structure composed of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thereby saving substantial network resources. In addition, we provide a theoretical analysis of the time spent performing the search and of the amount of memory needed. Extensive experiments show that the DTFilter approach achieves high performance.
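The core idea is filtering duplicates at the source so they never cross the network; the sketch below illustrates that windowed distinct filtering. It substitutes a hash set plus a deque for the paper's two height-balanced trees, so it shows the filtering effect rather than the paper's exact data structure.

```python
# A remote source forwards an item only if it has not already been sent within
# the current window, so duplicates never reach the central stream processor.
from collections import deque

class WindowedDistinctFilter:
    def __init__(self, window_size):
        self.window_size = window_size
        self.sent = set()          # items already forwarded in the window
        self.history = deque()     # (timestamp, item) pairs for expiry

    def offer(self, timestamp, item):
        """Return True if the item should be transmitted, False if filtered."""
        # Expire items that have fallen out of the window.
        while self.history and self.history[0][0] <= timestamp - self.window_size:
            _, old = self.history.popleft()
            self.sent.discard(old)
        if item in self.sent:
            return False           # duplicate within the window: do not send
        self.sent.add(item)
        self.history.append((timestamp, item))
        return True

f = WindowedDistinctFilter(window_size=10)
stream = [(1, "a"), (2, "b"), (3, "a"), (12, "a"), (13, "b")]
print([item for t, item in stream if f.offer(t, item)])  # ['a', 'b', 'a', 'b']
```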
Abstract:
Objective: This paper examines trends in the rate of suicide among young Australians aged 15-24 years from 1964 to 1997 and presents an age-period-cohort analysis of these trends. Method: The study design consisted of an age-period-cohort analysis of suicide mortality in Australian youth aged between 15 and 24 for the years 1964-1997 inclusive. Data sources were Australian Bureau of Statistics data on: numbers of deaths due to suicide by gender and age at death; and population at risk in each of eight birth cohorts (1940-1944, 1945-1949, 1950-1954, 1955-1959, 1960-1964, 1965-1969, 1970-1974, and 1975-1979). Main outcome measures were population rates of deaths among males and females in each birth cohort attributed to suicide in each year 1964-1997. Results: The rate of suicide deaths among Australian males aged 15-24 years increased from 8.7 per 100 000 in 1964 to 30.9 per 100 000 in 1997, with the rate among females changing little over the period, from 5.2 per 100 000 in 1964 to 7.1 per 100 000 in 1997. While the rate of deaths attributed to suicide increased over the birth cohorts, analyses revealed that these increases were largely due to period effects, with suicide twice as likely among those aged 15-24 years in 1985-1997 as in 1964-1969. Conclusions: The rate of youth suicide in Australia has increased since 1964, particularly among males. This increase can largely be attributed to period effects rather than to a cohort effect and has been paralleled by an increased rate of youth suicides internationally and by an increase in other psychosocial problems including psychiatric illness, criminal offending and substance use disorders.
Abstract:
Objective To determine the accuracy of the whispered voice test in detecting hearing impairment in adults and children. Design Systematic review of studies of test accuracy. Data sources Medline, Embase, Science Citation Index, unpublished theses, manual searching of bibliographies of known primary and review articles, and contact with authors. Study selection Two reviewers independently selected and extracted data on study characteristics, quality, and accuracy of studies. Studies were included if they had cross sectional designs, at least one of the index tests was the whispered voice test, and the reference test (audiometry) was performed on at least 80% of the participants. Data extraction Data were used to form 2x2 contingency tables with hearing impairment by audiometry as the reference standard. Data synthesis The eight studies that were found used six different techniques. The sensitivity in the four adult studies was 90% or 100% and the specificity was 70% to 87%. The sensitivity in the four childhood studies ranged from 80% to 96% and specificity ranged from 90% to 98%. Conclusion The whispered voice test is a simple and accurate test for detecting hearing impairment. There is some concern regarding the lower sensitivity in children and the overall reproducibility of the test, particularly in primary care settings. Further studies should be conducted in primary care settings to explore the influence of components of the testing procedure to optimise test sensitivity and to promote standardisation of the testing procedure.
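For reference, the accuracy figures quoted above come from 2x2 contingency tables with audiometry as the reference standard; the sketch below shows the computation with hypothetical counts, not data from the review.

```python
# Sensitivity and specificity from a 2x2 contingency table, with hearing
# impairment by audiometry as the reference standard. Counts are hypothetical.

tp, fn = 45, 5     # impaired by audiometry: whispered voice positive/negative
fp, tn = 13, 87    # not impaired:           whispered voice positive/negative

sensitivity = tp / (tp + fn)   # proportion of truly impaired detected
specificity = tn / (tn + fp)   # proportion of unimpaired correctly cleared
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
# sensitivity = 90%, specificity = 87%
```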
Abstract:
Allergies represent a significant medical and industrial problem. Molecular and clinical data on allergens are growing exponentially and in this article we have reviewed nine specialized allergen databases and identified data sources related to protein allergens contained in general purpose molecular databases. An analysis of allergens contained in public databases indicates a high level of redundancy of entries and a relatively low coverage of allergens by individual databases. From this analysis we identify current database needs for allergy research and, in particular, highlight the need for a centralized reference allergen database.
Abstract:
Objective To assess how well B-type natriuretic peptide (BNP) predicts prognosis in patients with heart failure. Design Systematic review of studies assessing BNP for prognosis in patients with heart failure or asymptomatic patients. Data sources Electronic searches of Medline and Embase from January 1994 to March 2004 and reference lists of included studies. Study selection and data extraction We included all studies that estimated the relation between BNP measurement and the risk of death, cardiac death, sudden death, or cardiovascular event in patients with heart failure or asymptomatic patients, including initial values and changes in values in response to treatment. Multivariable models that included both BNP and left ventricular ejection fraction as predictors were used to compare the prognostic value of each variable. Two reviewers independently selected studies and extracted data. Data synthesis 19 studies used BNP to estimate the relative risk of death or cardiovascular events in heart failure patients and five studies in asymptomatic patients. In heart failure patients, each 100 pg/ml increase was associated with a 35% increase in the relative risk of death. BNP was used in 35 multivariable models of prognosis. In nine of the models, it was the only variable to reach significance, that is, other variables contained no prognostic information beyond that of BNP. Even allowing for the scale of the variables, it seems to be a strong indicator of risk. Conclusion Although systematic reviews of prognostic studies have inherent difficulties, including the possibility of publication bias, the results of the studies in this review show that BNP is a strong prognostic indicator both for asymptomatic patients and for patients with heart failure at all stages of disease.
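As a worked reading of the headline figure, and assuming the 35% per 100 pg/ml association compounds multiplicatively across increments (a log-linear assumption made here for illustration, not stated in the abstract):

```python
# Relative risk of death as BNP rises, assuming the 35% increase per
# 100 pg/ml compounds multiplicatively (an illustrative assumption).
for increase_pg_ml in (100, 200, 300):
    rr = 1.35 ** (increase_pg_ml / 100)
    print(f"+{increase_pg_ml} pg/ml -> relative risk x{rr:.2f}")
# +100 pg/ml -> relative risk x1.35
# +200 pg/ml -> relative risk x1.82
# +300 pg/ml -> relative risk x2.46
```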
Abstract:
Objective To determine the costs and benefits of interventions for maternal and newborn health to assess the appropriateness of current strategies and guide future plans to attain the millennium development goals. Design Cost effectiveness analysis. Setting Two regions classified by the World Health Organization according to their epidemiological grouping: Afr-E, those countries in sub-Saharan Africa with very high adult and high child mortality, and Sear-D, comprising countries in South East Asia with high adult and high child mortality. Data sources Effectiveness data from several sources, including trials, observational studies, and expert opinion. For resource inputs, quantities came from WHO guidelines, literature, and expert opinion, and prices from the WHO Choosing Interventions that are Cost Effective (WHO-CHOICE) database. Main outcome measures Cost per disability adjusted life year (DALY) averted in year 2000 international dollars. Results The most cost effective mix of interventions was similar in Afr-E and Sear-D. These were the community based newborn care package, followed by antenatal care (tetanus toxoid, screening for pre-eclampsia, screening and treatment of asymptomatic bacteriuria and syphilis); skilled attendance at birth, offering first level maternal and neonatal care around childbirth; and emergency obstetric and neonatal care around and after birth. Screening and treatment of maternal syphilis, community based management of neonatal pneumonia, and steroids given during the antenatal period were relatively less cost effective in Sear-D. Scaling up all of the included interventions to 95% coverage would halve neonatal and maternal deaths. Conclusion Preventive interventions at the community level for newborn babies and at the primary care level for mothers and newborn babies are extremely cost effective, but the millennium development goals for maternal and child health will not be achieved without universal access to clinical services as well.
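The outcome measure above, cost per DALY averted, is incremental cost divided by incremental health gain; a sketch with hypothetical figures, not the paper's results:

```python
# Cost per DALY averted: incremental programme cost over incremental DALYs
# averted, in year 2000 international dollars (I$). Figures are hypothetical.

intervention_cost = 1_200_000
comparator_cost   =   400_000
dalys_averted_intervention = 9_000
dalys_averted_comparator   = 2_000

cost_per_daly_averted = (intervention_cost - comparator_cost) / (
    dalys_averted_intervention - dalys_averted_comparator
)
print(f"I${cost_per_daly_averted:.0f} per DALY averted")  # I$114 per DALY averted
```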
Abstract:
Objective. To examine possible risk factors in post-stroke depression (PSD) other than site of lesion in the brain. Data sources. 191 first-ever stroke patients were examined physically shortly after their stroke and examined psychiatrically and physically 4 months post-stroke. Setting. A geographically defined segment of the metropolitan area of Perth, Western Australia, from which all strokes over a course of 18 months were examined (the Perth Community Stroke Study). Measures. Psychiatric Assessment Schedule, Mini Mental State Examination, Barthel Index, Frenchay Activities Index, physical illness and sociodemographic data were collected. PSD included both major depression and minor depression (dysthymia without the 2-year time stipulation) according to DSM-III (American Psychiatric Association) criteria. Patients depressed at the time of the stroke were excluded. Patients. 191 first-ever stroke patients (111 male, 80 female), of whom 28% had PSD: 17% major and 11% minor depression. Results. Significant associations with PSD at 4 months were major functional impairment, living in a nursing home, being divorced and having a high pre-stroke alcohol intake (males only). There was no significant association with age, sex, social class, cognitive impairment or pre-stroke physical illness. Conclusion. Results favoured the hypothesis that depression in an unselected group of stroke patients is no more common, and of no more specific aetiology, than it is among elderly patients with other physical illness.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has successfully been applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of Formal Concept Analysis to virtual filesystems [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree index structure, based on the Generalized Index Search Tree, to better support the application of Formal Concept Analysis to large data sources.
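As a minimal illustration of the documents-as-objects, keywords-as-attributes formulation, the sketch below enumerates the formal concepts of a tiny hypothetical context by closing attribute subsets. It illustrates the clustering idea only; it does not implement the paper's RD-Tree/GiST index.

```python
# Naive Formal Concept Analysis: a concept is a maximal pair (document set,
# keyword set) where each determines the other. Context is hypothetical.
from itertools import chain, combinations

context = {  # document -> keywords it contains
    "doc1": {"database", "index"},
    "doc2": {"database", "warehouse"},
    "doc3": {"database", "index", "warehouse"},
}

def extent(keywords):
    """Documents containing every keyword in the set."""
    return {d for d, kws in context.items() if keywords <= kws}

def intent(docs):
    """Keywords shared by every document in the set."""
    kw_sets = [context[d] for d in docs]
    return set.intersection(*kw_sets) if kw_sets else set(
        chain.from_iterable(context.values())
    )

all_keywords = set(chain.from_iterable(context.values()))
concepts = set()
for r in range(len(all_keywords) + 1):
    for kws in combinations(sorted(all_keywords), r):
        docs = extent(set(kws))
        concepts.add((frozenset(docs), frozenset(intent(docs))))

for docs, kws in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(docs), sorted(kws))
```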