980 results for Data Linkage
Abstract:
In the context of learning paradigms of identification in the limit, we address the question: why is uncertainty sometimes desirable? We use mind change bounds on the output hypotheses as a measure of uncertainty, and interpret ‘desirable’ as reduction in data memorization, also defined in terms of mind change bounds. The resulting model is closely related to iterative learning with bounded mind change complexity, but the dual use of mind change bounds (for hypotheses and for data) is a key distinctive feature of our approach. We show that situations exist where the more mind changes the learner is willing to accept, the less data it needs to remember in order to converge to the correct hypothesis. We also investigate relationships between our model and learning from good examples, set-driven, monotonic and strong-monotonic learners, as well as class-comprising versus class-preserving learnability.
Abstract:
Keyword spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
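As a rough illustration of the "dynamic sequence matching" idea mentioned for Dynamic Match Lattice Spotting, the sketch below scores a target phone sequence against an observed (possibly erroneous) phone sequence using plain minimum edit distance. This is an assumption made for illustration: the abstract does not state which matching algorithm or cost function the thesis uses, and the function names `edit_distance` and `matches` and the threshold `max_cost` are hypothetical.

```python
def edit_distance(target, observed):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `observed` into `target` (unit costs, assumed)."""
    # prev[j] holds the distance between the first i-1 target symbols
    # and the first j observed symbols.
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, 1):
        curr = [i]
        for j, o in enumerate(observed, 1):
            substitution = 0 if t == o else 1
            curr.append(min(
                prev[j] + 1,                 # target symbol missing from observed
                curr[j - 1] + 1,             # extra symbol in observed
                prev[j - 1] + substitution,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def matches(target, observed, max_cost=1):
    """Accept an observed sequence if it is within `max_cost` edits
    of the target (hypothetical threshold)."""
    return edit_distance(target, observed) <= max_cost
```

With a tolerance of one edit, a hypothetical lattice path `["k", "eh", "t"]` would still be accepted as a putative match for the target `["k", "ae", "t"]`, which is the kind of robustness to recognition errors the abstract describes.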
Abstract:
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical/bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.
Abstract:
Data breach notification laws require organisations to notify affected persons or regulatory authorities when an unauthorised acquisition of personal data occurs. Most laws provide a safe harbour from this obligation if the acquired data has been encrypted. There are three types of safe harbour: an exemption, a rebuttable presumption, and a factor-based analysis. We demonstrate, using three condition-based scenarios, that the broad formulation of most encryption safe harbours is based on the flawed assumption that encryption is the silver bullet for personal information protection. We then contend that reliance upon an encryption safe harbour should be dependent upon a rigorous and competent risk-based review that is required on a case-by-case basis. Finally, we recommend the use of both an encryption safe harbour and a notification trigger as our preferred choice for a data breach notification regulatory framework.
Abstract:
The advent of data breach notification laws in the United States (US) has unearthed a significant problem involving the mismanagement of personal information by a range of public and private sector organisations. At present, there is no statutory obligation under Australian law requiring public or private sector organisations to report a data breach of personal information to law enforcement agencies or affected persons. However, following a comprehensive review of Australian privacy law, the Australian Law Reform Commission (ALRC) has recommended the introduction of a mandatory data breach notification scheme. The issue of data breach notification has ignited fierce debate amongst stakeholders, especially larger private sector entities. The purpose of this article is to document the perspectives of key industry and government representatives in order to identify their standpoints regarding an appropriate regulatory approach to data breach notification in Australia.
Abstract:
Public and private sector organisations are now able to capture and utilise data on a vast scale, thus heightening the importance of adequate measures for protecting against the unauthorised disclosure of personal information. In this respect, data breach notification has emerged as an issue of increasing importance throughout the world. It has been the subject of law reform in the United States and in other jurisdictions. This article reviews US, Australian and EU legal developments regarding the mandatory notification of data breaches. The authors highlight areas of concern based on the extant US experience that require further consideration in Australia and in the EU.
Abstract:
Estimates of potential and actual C sequestration require areal information about various types of management activities. Forest surveys, land use data, and agricultural statistics contribute information enabling calculation of the impacts of current and historical land management on C sequestration in biomass (in forests) or in soil (in agricultural systems). Unfortunately, little information exists on the distribution of various management activities that can impact soil C content in grassland systems. Limited information of this type restricts our ability to carry out bottom-up estimates of the current C balance of grasslands or to assess the potential for grasslands to act as C sinks with changes in management. Here we review currently available information about grassland management, how that information could be related to information about the impacts of management on soil C stocks, information that may be available in the future, and needs that remain to be filled before in-depth assessments may be carried out. We also evaluate constraints induced by variability in information sources within and between countries. It is readily apparent that activity data for grassland management is collected less frequently and on a coarser scale than data for forest or agricultural inventories, and that grassland activity data cannot be directly translated into IPCC-type factors as is done for IPCC inventories of agricultural soils. However, those management data that are available can serve to delineate broad-scale differences in management activities within regions in which soil C is likely to change in response to changes in management. This, coupled with the distinct possibility of more intensive surveys planned in the future, may enable more accurate assessments of grassland C dynamics with higher resolution both spatially and in the number of management activities.
Abstract:
We propose a digital rights management approach for sharing electronic health records in a health research facility and argue for the advantages of the approach. We also outline the system under development and our implementation of its security features, and discuss the challenges that we faced and future directions.
Abstract:
This paper provides a review of the state-of-the-art work on the use of public mobile data networks for aircraft telemetry and control purposes. Moreover, it describes the characterisation, for airborne uses, of the public mobile data communication systems known broadly as 3G. The motivation for this study was to explore how these mature public communication systems could be used for aviation purposes. An experimental system was fitted to a light aircraft to record communication latency, line speed, RF level, packet loss and cell tower identifier. Communications were established using internet protocols and a connection was made to a local server. The aircraft was flown in both remote and populous areas at altitudes up to 8500 ft in a region located in South East Queensland, Australia. Results show that the average airborne RF levels are better than those on the ground by 21% and in the order of -77 dBm. Latencies were in the order of 500 ms (half the latency of Iridium), with an average download speed of 0.48 Mb/s, an average uplink speed of 0.85 Mb/s, and a packet loss of 6.5%. The maximum communication range was also observed to be 70 km from a single cell station. The paper also describes possible limitations and the utility of using such a communications architecture for both manned and unmanned aircraft systems.
Abstract:
Many developing countries are afflicted by persistent inequality in the distribution of income. While a growing body of literature emphasizes differential fertility as a channel through which income inequality persists, this paper investigates differential child mortality – differences in the incidence of child mortality across socioeconomic groups – as a critical link in this regard. Using evidence from cross-country data to evaluate this linkage, we find that differential child mortality serves as a stronger channel than differential fertility in the transmission of income inequality over time. We use random effects and generalized estimating equations techniques to account for temporal correlation within countries. The results are robust to the use of an alternate definition of fertility that reflects parental preference for children instead of realized fertility.
Abstract:
Now in its second edition, this book describes tools that are commonly used in transportation data analysis. The first part of the text provides statistical fundamentals while the second part presents continuous dependent variable models. With a focus on count and discrete dependent variable models, the third part features new chapters on mixed logit models, logistic regression, and ordered probability models. The last section provides additional coverage of Bayesian statistical modeling, including Bayesian inference and Markov chain Monte Carlo methods. Data sets are available online to use with the modeling techniques discussed.
Abstract:
Maintenance activities in a large-scale engineering system are usually scheduled according to the lifetimes of various components in order to ensure the overall reliability of the system. Lifetimes of components can be deduced from the corresponding probability distributions, with parameters estimated from past failure data. When failure data for the components is not readily available, engineers have to be content with only the basic information from the manufacturers, such as the mean and standard deviation of lifetime, to plan the maintenance activities. In this paper, the moment-based piecewise polynomial model (MPPM) is proposed to estimate the parameters of the reliability probability distribution of the products when only the mean and standard deviation of the product lifetime are known. This method employs a group of polynomial functions to estimate the two parameters of the Weibull distribution according to the mathematical relationship between the shape parameter of the two-parameter Weibull distribution and the ratio of the mean to the standard deviation. Tests are carried out to evaluate the validity and accuracy of the proposed method, with discussion of its suitability for applications. The proposed method is particularly useful for reliability-critical systems, such as railway and power systems, in which the maintenance activities are scheduled according to the expected lifetimes of the system components.
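The relationship this abstract relies on can be made concrete with a small sketch. For a two-parameter Weibull distribution with shape k and scale λ, the mean is λΓ(1 + 1/k) and the ratio of standard deviation to mean depends on k alone, so k can be recovered from that ratio and λ then follows from the mean. The code below solves for k by bisection; it is a generic method-of-moments illustration and not the paper's MPPM, which approximates this relationship with piecewise polynomials instead. The function names and the search bounds `lo` and `hi` are assumptions chosen for the example.

```python
import math


def weibull_cv(shape):
    """Ratio std / mean of a two-parameter Weibull distribution.
    It depends only on the shape parameter."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


def weibull_from_moments(mean, std, lo=0.1, hi=50.0, tol=1e-10):
    """Estimate (shape, scale) from a known mean and standard deviation.

    The bounds lo and hi on the shape are assumed for this sketch.
    """
    if mean <= 0 or std <= 0:
        raise ValueError("mean and std must be positive")
    target = std / mean
    if not (weibull_cv(hi) <= target <= weibull_cv(lo)):
        raise ValueError("std / mean is outside the searchable range")
    # std / mean decreases monotonically as the shape grows,
    # so bisection on the shape converges to the unique solution.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > target:
            lo = mid  # ratio still too large: the shape must be larger
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale
```

For example, when the standard deviation equals the mean, the routine returns a shape close to 1, which is the exponential special case of the Weibull distribution.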