803 resultados para data driven approach
Resumo:
This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
Resumo:
As our world becomes increasingly interconnected, diseases can spread at a faster and faster rate. Recent years have seen large-scale influenza, cholera and ebola outbreaks and failing to react in a timely manner to outbreaks leads to a larger spread and longer persistence of the outbreak. Furthermore, diseases like malaria, polio and dengue fever have been eliminated in some parts of the world but continue to put a substantial burden on countries in which these diseases are still endemic. To reduce the disease burden and eventually move towards countrywide elimination of diseases such as malaria, understanding human mobility is crucial for both planning interventions as well as estimation of the prevalence of the disease. In this talk, I will discuss how various data sources can be used to estimate human movements, population distributions and disease prevalence as well as the relevance of this information for intervention planning. Particularly anonymised mobile phone data has been shown to be a valuable source of information for countries with unreliable population density and migration data and I will present several studies where mobile phone data has been used to derive these measures.
Resumo:
Vehicle activated signs (VAS) display a warning message when drivers exceed a particular threshold. VAS are often installed on local roads to display a warning message depending on the speed of the approaching vehicles. VAS are usually powered by electricity; however, battery and solar powered VAS are also commonplace. This thesis investigated devel-opment of an automatic trigger speed of vehicle activated signs in order to influence driver behaviour, the effect of which has been measured in terms of reduced mean speed and low standard deviation. A comprehen-sive understanding of the effectiveness of the trigger speed of the VAS on driver behaviour was established by systematically collecting data. Specif-ically, data on time of day, speed, length and direction of the vehicle have been collected for the purpose, using Doppler radar installed at the road. A data driven calibration method for the radar used in the experiment has also been developed and evaluated. Results indicate that trigger speed of the VAS had variable effect on driv-ers’ speed at different sites and at different times of the day. It is evident that the optimal trigger speed should be set near the 85th percentile speed, to be able to lower the standard deviation. In the case of battery and solar powered VAS, trigger speeds between the 50th and 85th per-centile offered the best compromise between safety and power consump-tion. Results also indicate that different classes of vehicles report differ-ences in mean speed and standard deviation; on a highway, the mean speed of cars differs slightly from the mean speed of trucks, whereas a significant difference was observed between the classes of vehicles on lo-cal roads. A differential trigger speed was therefore investigated for the sake of completion. A data driven approach using Random forest was found to be appropriate in predicting trigger speeds respective to types of vehicles and traffic conditions. The fact that the predicted trigger speed was found to be consistently around the 85th percentile speed justifies the choice of the automatic model.
Resumo:
A word may have many potential meanings, but its actual meaning in any authentic written or spoken text is determined by its context: its collocations, structural patterns, and pragmatic functions. Large language corpora offer access to words in a wide range of natural contexts, which can improve and enrich both language learning and teaching.
Resumo:
Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS derived t-values, indicate that this data-driven approach offers improvement in terms of parcellation accuracy over GLM based techniques.
Resumo:
A commitment in 2010 by the Australian Federal Government to spend $466.7 million dollars on the implementation of personally controlled electronic health records (PCEHR) heralded a shift to a more effective and safer patient centric eHealth system. However, deployment of the PCEHR has met with much criticism, emphasised by poor adoption rates over the first 12 months of operation. An indifferent response by the public and healthcare providers largely sceptical of its utility and safety speaks to the complex sociotechnical drivers and obstacles inherent in the embedding of large (national) scale eHealth projects. With government efforts to inflate consumer and practitioner engagement numbers giving rise to further consumer disillusionment, broader utilitarian opportunities available with the PCEHR are at risk. This paper discusses the implications of establishing the PCEHR as the cornerstone of a holistic eHealth strategy for the aggregation of longitudinal patient information. A viewpoint is offered that the real value in patient data lies not just in the collection of data but in the integration of this information into clinical processes within the framework of a commoditised data-driven approach. Consideration is given to the eHealth-as-a-Service (eHaaS) construct as a disruptive next step for co-ordinated individualised healthcare in the Australian context.
Resumo:
Decision-making is such an integral aspect in health care routine that the ability to make the right decisions at crucial moments can lead to patient health improvements. Evidence-based practice, the paradigm used to make those informed decisions, relies on the use of current best evidence from systematic research such as randomized controlled trials. Limitations of the outcomes from randomized controlled trials (RCT), such as “quantity” and “quality” of evidence generated, has lowered healthcare professionals’ confidence in using EBP. An alternate paradigm of Practice-Based Evidence has evolved with the key being evidence drawn from practice settings. Through the use of health information technology, electronic health records (EHR) capture relevant clinical practice “evidence”. A data-driven approach is proposed to capitalize on the benefits of EHR. The issues of data privacy, security and integrity are diminished by an information accountability concept. Data warehouse architecture completes the data-driven approach by integrating health data from multi-source systems, unique within the healthcare environment.
Resumo:
This paper addresses the problem of fully-automatic localization and segmentation of 3D intervertebral discs (IVDs) from MR images. Our method contains two steps, where we first localize the center of each IVD, and then segment IVDs by classifying image pixels around each disc center as foreground (disc) or background. The disc localization is done by estimating the image displacements from a set of randomly sampled 3D image patches to the disc center. The image displacements are estimated by jointly optimizing the training and test displacement values in a data-driven way, where we take into consideration both the training data and the geometric constraint on the test image. After the disc centers are localized, we segment the discs by classifying image pixels around disc centers as background or foreground. The classification is done in a similar data-driven approach as we used for localization, but in this segmentation case we are aiming to estimate the foreground/background probability of each pixel instead of the image displacements. In addition, an extra neighborhood smooth constraint is introduced to enforce the local smoothness of the label field. Our method is validated on 3D T2-weighted turbo spin echo MR images of 35 patients from two different studies. Experiments show that compared to state of the art, our method achieves better or comparable results. Specifically, we achieve for localization a mean error of 1.6-2.0 mm, and for segmentation a mean Dice metric of 85%-88% and a mean surface distance of 1.3-1.4 mm.
Resumo:
Most current 3D landscape visualisation systems either use bespoke hardware solutions, or offer a limited amount of interaction and detail when used in realtime mode. We are developing a modular, data driven 3D visualisation system that can be readily customised to specific requirements. By utilising the latest software engineering methods and bringing a dynamic data driven approach to geo-spatial data visualisation we will deliver an unparalleled level of customisation in near-photo realistic, realtime 3D landscape visualisation. In this paper we show the system framework and describe how this employs data driven techniques. In particular we discuss how data driven approaches are applied to the spatiotemporal management aspect of the application framework, and describe the advantages these convey.
Resumo:
Most current 3D landscape visualisation systems either use bespoke hardware solutions, or offer a limited amount of interaction and detail when used in realtime mode. We are developing a modular, data driven 3D visualisation system that can be readily customised to specific requirements. By utilising the latest software engineering methods and bringing a dynamic data driven approach to geo-spatial data visualisation we will deliver an unparalleled level of customisation in near-photo realistic, realtime 3D landscape visualisation. In this paper we show the system framework and describe how this employs data driven techniques. In particular we discuss how data driven approaches are applied to the spatiotemporal management aspect of the application framework, and describe the advantages these convey. © Springer-Verlag Berlin Heidelberg 2006.
Resumo:
Cancer and cardio-vascular diseases are the leading causes of death world-wide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of profound disturbance of normal cellular homeostasis. People suffering or at high risk for these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, development of effective therapies is hindered by the challenges in identifying genetic and molecular determinants of the onset of diseases; and in cases where therapies already exist, the main challenge is to identify molecular determinants that drive resistance to the therapies. Due to the progress in sequencing technologies, the access to a large genome-wide biological data is now extended far beyond few experimental labs to the global research community. The unprecedented availability of the data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address the long standing problems from many different perspectives. Likewise, this thesis tackles the two main public health related challenges using data driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited due to their inability to distinguish causal variants from associated variants. In the presented thesis, we first propose a simple scheme that improves association studies in supervised fashion and has shown its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach -- eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory genomic variants that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate that eQTeL accurately detects causal regulatory SNPs by simulation, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. The challenge of identifying molecular determinants of cancer resistance so far could only be dealt with labor intensive and costly experimental studies, and in case of experimental drugs such studies are infeasible. Here we take a fundamentally different data driven approach to understand the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions termed synthetic rescues (SR) in cancer, which denotes a functional interaction between two genes where a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but subsequently altered activity of its partner rescuer gene restores cell viability. Next we describe a comprehensive computational framework --termed INCISOR-- for identifying SR underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes. We show that SRs can be utilized to successfully predict patients' survival and response to the majority of current cancer drugs, and importantly, for predicting the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever increasing high throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel molecular mechanistic insights that will advance the progress in early disease prevention and personalized therapeutics. Our analyses sheds light on the fundamental biological understanding of gene regulation and interaction, and opens up exciting avenues of translational applications in risk prediction and therapeutics.
Resumo:
Online Social Networks (OSNs) facilitate to create and spread information easily and rapidly, influencing others to participate and propagandize. This work proposes a novel method of profiling Influential Blogger (IB) based on the activities performed on one's blog documents who influences various other bloggers in Social Blog Network (SBN). After constructing a social blogging site, a SBN is analyzed with appropriate parameters to get the Influential Blog Power (IBP) of each blogger in the network and demonstrate that profiling IB is adequate and accurate. The proposed Profiling Influential Blogger (PIB) Algorithm survival rate of IB is high and stable. (C) 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Resumo:
In this paper, a data driven orthogonal basis function approach is proposed for non-parametric FIR nonlinear system identification. The basis functions are not fixed a priori and match the structure of the unknown system automatically. This eliminates the problem of blindly choosing the basis functions without a priori structural information. Further, based on the proposed basis functions, approaches are proposed for model order determination and regressor selection along with their theoretical justifications. © 2008 IEEE.