10 results for Statistical Language Model
at Duke University
Abstract:
Brain-computer interfaces (BCI) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electro-physiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electro-physiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel.
This research focuses on P300-based BCIs, which rely predominantly on event-related potentials (ERP) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially in individuals with ALS, who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs are slower than other commercial assistive communication devices, which limits their adoption by the target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication.
In this work, it is hypothesised that building adaptive capabilities into the BCI framework can potentially give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research in this work addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed in order to maximise the information content that is presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed, an approach which is independent of stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing.
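For illustration only, the sketch below shows one common way such an adaptive speller can be organized: per-flash classifier scores are fused with a language-model prior over characters, and data collection stops once the posterior is confident. The function names, the Gaussian score model, and the fixed threshold are assumptions made for this sketch, not the specific algorithms developed in this work.

```python
import numpy as np

def update_posterior(prior, flash_groups, scores, sigma=1.0):
    """One round of Bayesian evidence accumulation for a P300 speller.

    prior        : (n_chars,) probabilities over characters, e.g. from an
                   n-gram language model conditioned on the text typed so far
    flash_groups : list of index arrays; the characters illuminated in each flash
    scores       : per-flash classifier score (higher = more target-like)
    sigma        : assumed noise scale of the scores (illustrative)
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    for group, s in zip(flash_groups, scores):
        flashed = np.zeros(log_post.shape, dtype=bool)
        flashed[group] = True
        ll_target = -0.5 * ((s - 1.0) / sigma) ** 2   # score model if the target was flashed
        ll_nontarget = -0.5 * (s / sigma) ** 2        # score model otherwise
        log_post += np.where(flashed, ll_target, ll_nontarget)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

def select_character(prior, rounds, threshold=0.9):
    """Dynamic stopping: keep collecting flash rounds until one character's
    posterior exceeds the threshold, then return its index and the posterior."""
    post = np.asarray(prior, dtype=float)
    for flash_groups, scores in rounds:
        post = update_posterior(post, flash_groups, scores)
        if post.max() >= threshold:
            break
    return int(post.argmax()), post
```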
Abstract:
With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.
The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
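For reference, the standard quadratic-time dynamic-programming recurrence for DTW, which the new algorithm improves upon, can be sketched as follows; the thesis's faster algorithm and its new matching model are not reproduced here.

```python
import numpy as np

def dtw(P, Q):
    """Textbook O(mn) dynamic-programming DTW between point sequences
    P (m x d) and Q (n x d); returns the cost of the optimal warping path."""
    m, n = len(P), len(Q)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(P[i - 1] - Q[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]
```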
The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an electric vehicle (EV) charging station placement problem. More specifically, a subset of vertices in the road network is selected to place charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state-of-the-art algorithms for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this shortest-path cover problem.
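As background, the classic greedy approximation for set cover, sketched below, illustrates the covering formulation (candidate vertices cover the shortest paths they serve); the near-linear-time geometric technique developed in the thesis is substantially different and is not shown.

```python
def greedy_set_cover(universe, candidates):
    """Classic greedy ln(n)-approximation for set cover.

    universe   : set of elements to cover (here, shortest paths that need a station)
    candidates : dict mapping a candidate vertex to the set of elements it covers
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered elements.
        best = max(candidates, key=lambda v: len(candidates[v] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("remaining elements cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen
```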
Abstract:
BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
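A minimal sketch of the core idea, under a simplified independent-error assumption: Phred-style quality scores are converted to error probabilities and combined across overlapping trace reads into a posterior over the true base at a position. The paper's full probabilistic error model for gene-family assembly is considerably richer than this.

```python
import numpy as np

BASES = "ACGT"

def phred_to_error_prob(q):
    """Phred quality q -> probability that the called base is wrong."""
    return 10.0 ** (-q / 10.0)

def base_posterior(calls, quals, prior=None):
    """Posterior over the true base at one position, combining several
    overlapping trace reads (calls) with their Phred qualities (quals)."""
    if prior is None:
        prior = {b: 0.25 for b in BASES}
    log_post = {b: np.log(prior[b]) for b in BASES}
    for call, q in zip(calls, quals):
        e = phred_to_error_prob(q)
        for b in BASES:
            # Correct call with probability 1-e; errors assumed uniform over the other bases.
            log_post[b] += np.log(1.0 - e) if b == call else np.log(e / 3.0)
    z = np.logaddexp.reduce(list(log_post.values()))
    return {b: float(np.exp(lp - z)) for b, lp in log_post.items()}

# Example: three reads agree on 'A' with moderate quality, one says 'G' with low quality.
print(base_posterior(['A', 'A', 'G', 'A'], [30, 25, 10, 20]))
```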
Abstract:
Angelman syndrome (AS) is a neurobehavioral disorder associated with mental retardation, absence of language development, characteristic electroencephalography (EEG) abnormalities and epilepsy, happy disposition, movement or balance disorders, and autistic behaviors. The molecular defects underlying AS are heterogeneous, including large maternal deletions of chromosome 15q11-q13 (70%), paternal uniparental disomy (UPD) of chromosome 15 (5%), imprinting mutations (rare), and mutations in the E6-AP ubiquitin ligase gene UBE3A (15%). Although patients with UBE3A mutations have a wide spectrum of neurological phenotypes, their features are usually milder than those of AS patients with deletions of 15q11-q13. Using a chromosomal engineering strategy, we generated mutant mice with a 1.6-Mb chromosomal deletion from Ube3a to Gabrb3, which inactivated the Ube3a and Gabrb3 genes and deleted the Atp10a gene. Homozygous deletion mutant mice died in the perinatal period due to a cleft palate resulting from the null mutation in the Gabrb3 gene. Mice with a maternal deletion (m-/p+) were viable and did not have any obvious developmental defects. Expression analysis of the maternal and paternal deletion mice confirmed that the Ube3a gene is maternally expressed in brain, and showed that the Atp10a and Gabrb3 genes are biallelically expressed in all brain sub-regions studied. Maternal (m-/p+), but not paternal (m+/p-), deletion mice had increased spontaneous seizure activity and abnormal EEG. Extensive behavioral analyses revealed significant impairment in motor function, learning and memory tasks, and anxiety-related measures assayed in the light-dark box in maternal deletion but not paternal deletion mice. Ultrasonic vocalization (USV) recording in newborns revealed that maternal deletion pups emitted significantly more USVs than wild-type littermates. The increased USV in maternal deletion mice suggests abnormal signaling behavior between mothers and pups that may reflect abnormal communication behaviors in human AS patients. Thus, mutant mice with a maternal deletion from Ube3a to Gabrb3 provide an AS mouse model that is molecularly more similar to the contiguous gene deletion form of AS in humans than mice with Ube3a mutation alone. These mice will be valuable for future comparative studies with mice that have a maternal deficiency of Ube3a alone.
Abstract:
Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.
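A toy sketch of the multilevel idea, assuming per-model marginal likelihoods are already available for an enumerable set of SNP models: a Beta-Binomial prior on model size supplies the intrinsic multiplicity correction, and model-averaged posterior inclusion probabilities follow by summation. MISA's actual search strategy, genetic parametrization handling, and Bayes-factor machinery are more elaborate; all names and hyperparameters below are illustrative.

```python
import numpy as np
from scipy.special import betaln

def log_model_prior(k, p, a=1.0, b=1.0):
    """Beta-Binomial prior on including k of p SNPs; integrating out the
    inclusion probability yields an automatic multiplicity correction."""
    return betaln(k + a, p - k + b) - betaln(a, b)

def posterior_inclusion(log_marglik, p):
    """Model-averaged posterior inclusion probability for each SNP.

    log_marglik : dict mapping a tuple of included SNP indices to that
                  model's log marginal likelihood (assumed precomputed).
    """
    models = list(log_marglik)
    log_post = np.array([log_marglik[m] + log_model_prior(len(m), p) for m in models])
    log_post -= np.logaddexp.reduce(log_post)
    incl = np.zeros(p)
    for m, lp in zip(models, log_post):
        incl[list(m)] += np.exp(lp)
    return incl
```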
Abstract:
A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a statistical model replaces the standard sparsity model of classical compressive sensing. We propose within this framework optimal task-specific sensing protocols specifically and jointly designed for classification and reconstruction. A two-step adaptive sensing paradigm is developed, where online sensing is applied to detect the signal class in the first step, followed by a reconstruction step adapted to the detected class and the observed samples. The approach is based on information theory, here tailored for Gaussian mixture models (GMMs), where an information-theoretic objective relationship between the sensed signals and a representation of the specific task of interest is maximized. Experimental results using synthetic signals, Landsat satellite attributes, and natural images of different sizes and with different noise levels show the improvements achieved using the proposed framework when compared to more standard sensing protocols. The underlying formulation can be applied beyond GMMs, at the price of higher mathematical and computational complexity.
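As an illustration of the reconstruction step adapted to a detected class, the sketch below computes the standard MMSE estimate of a GMM-distributed signal from noisy linear measurements. It follows textbook GMM/Wiener filtering rather than the paper's specific task-driven sensing design, and the variable names are assumptions for this sketch.

```python
import numpy as np

def gmm_reconstruct(y, A, weights, means, covs, noise_var):
    """MMSE reconstruction of x from y = A x + n, with x drawn from a GMM
    and n white Gaussian noise of variance noise_var."""
    m = len(y)
    log_post, estimates = [], []
    for w, mu, C in zip(weights, means, covs):
        S = A @ C @ A.T + noise_var * np.eye(m)      # covariance of y given this class
        Sinv = np.linalg.inv(S)
        resid = y - A @ mu
        # log p(y, class) up to a constant shared by all classes
        log_post.append(np.log(w) - 0.5 * (np.linalg.slogdet(S)[1] + resid @ Sinv @ resid))
        # Per-class posterior-mean (Wiener) estimate of x
        estimates.append(mu + C @ A.T @ Sinv @ resid)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    x_hat = sum(p * xh for p, xh in zip(post, estimates))
    return x_hat, post
```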
Abstract:
In the mnemonic model of posttraumatic stress disorder (PTSD), the current memory of a negative event, not the event itself, determines symptoms. The model is an alternative to the current event-based etiology of PTSD represented in the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; American Psychiatric Association, 2000). The model accounts for important and reliable findings that are often inconsistent with the current diagnostic view and that have been neglected by theoretical accounts of the disorder, including the following observations. The diagnosis needs objective information about the trauma and peritraumatic emotions but uses retrospective memory reports that can have substantial biases. Negative events and emotions that do not satisfy the current diagnostic criteria for a trauma can be followed by symptoms that would otherwise qualify for PTSD. Predisposing factors that affect the current memory have large effects on symptoms. The inability-to-recall-an-important-aspect-of-the-trauma symptom does not correlate with other symptoms. Loss or enhancement of the trauma memory affects PTSD symptoms in predictable ways. Special mechanisms that apply only to traumatic memories are not needed, increasing parsimony and the knowledge that can be applied to understanding PTSD.
Abstract:
Behavior, neuropsychology, and neuroimaging suggest that episodic memories are constructed from interactions among the following basic systems: vision, audition, olfaction, other senses, spatial imagery, language, emotion, narrative, motor output, explicit memory, and search and retrieval. Each system has its own well-documented functions, neural substrates, processes, structures, and kinds of schemata. However, the systems have not been considered as interacting components of episodic memory, as is proposed here. Autobiographical memory and oral traditions are used to demonstrate the usefulness of the basic-systems model in accounting for existing data and predicting novel findings, and to argue that the model, or one similar to it, is the only way to understand episodic memory for complex stimuli routinely encountered outside the laboratory.
Abstract:
X-ray crystallography is the predominant method for obtaining atomic-scale information about biological macromolecules. Despite the success of the technique, obtaining well diffracting crystals still critically limits going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved, hence little guidance is available to systematically identify solution conditions that promote crystallization. To help determine relationships between macromolecular properties and their crystallization propensity, we have trained statistical models on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.
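A minimal sketch of the modelling setup, assuming scikit-learn and a matrix of physico-chemical descriptors (the features and labels below are synthetic placeholders, not the Northeast Structural Genomics data): a Gaussian-process classifier with an ARD RBF kernel can capture the kind of nonlinear trends described above.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score

# Placeholder data: 182 "proteins" with 5 physico-chemical descriptors each
# (e.g. predicted side-chain entropy, surface charge); y = crystallized or not.
rng = np.random.default_rng(0)
X = rng.normal(size=(182, 5))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=182) > 0).astype(int)

# An ARD RBF kernel learns a separate length scale per descriptor, capturing
# nonlinear trends beyond the reach of linear statistical models.
kernel = 1.0 * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
gpc = GaussianProcessClassifier(kernel=kernel, random_state=0)
print(cross_val_score(gpc, X, y, cv=5).mean())
```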