841 resultados para Natural language techniques, Semantic spaces, Random projection, Documents
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Typical properties of sparse random matrices over finite (Galois) fields are studied, in the limit of large matrices, using techniques from the physics of disordered systems. For the case of a finite field GF(q) with prime order q, we present results for the average kernel dimension, average dimension of the eigenvector spaces and the distribution of the eigenvalues. The number of matrices for a given distribution of entries is also calculated for the general case. The significance of these results to error-correcting codes and random graphs is also discussed.
Resumo:
This thesis attempts a psychological investigation of hemispheric functioning in developmental dyslexia. Previous work using neuropsychological methods with developmental dyslexics is reviewed ,and original work is presented both of a conventional psychometric nature and also utilising a new means of intervention. At the inception of inquiry into dyslexia, comparisons were drawn between developmental dyslexia and acquired alexia, promoting a model of brain damage as the common cause. Subsequent investigators found developmental dyslexics to be neurologically intact, and so an alternative hypothesis was offered, namely that language is abnormally localized (not in the left hemisphere). Research in the last decade, using the advanced techniques of modern neuropsychology, has indicated that developmental dyslexics are probably left hemisphere dominant for language. The development of a new type of pharmaceutical prep~ration (that appears to have a left hemisphere effect) offers an oppertunity to test the experimental hypothesis. This hypothesis propounds that most dyslexics are left hemisphere language dominant, but some of these language related operations are dysfunctioning. The methods utilised are those of psychological assessment of cognitive function, both in a traditional psychometric situation, and with a new form of intervention (Piracetam). The information resulting from intervention will be judged on its therapeutic validity and contribution to the understanding of hemispheric functioning in dyslexics. The experimental studies using conventional psychometric evaluation revealed a dyslexic profile of poor sequencing and name coding ability, with adequate spatial and verbal reasoning skills. Neuropsychological information would tend to suggest that this profile was indicative of adequate right hemsiphere abilities and deficits in some left hemsiphere abilities. When an intervention agent (Piracetam) was used with young adult dyslexics there were improvements in both the rate of acquisition and conservation of verbal learning. An experimental study with dyslexic children revealed that Piracetam appeared to improve reading, writing and sequencing, but did not influence spatial abilities. This would seem to concord with other recent findings, that deve~mental dyslexics may have left hemisphere language localisation, although some of these language related abilities are dysfunctioning.
Resumo:
Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL) of small selected subsets of genes from very large potential candidates as available in DNA microarray experiments is now widely acknowledged 1. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, the Stochastic Neighbor Embedding(SNE) and the Locally Linear Embedding(LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70 dimensional PGLs per patient, classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques and investigate whether a-posteriori two prognosis groups are separable on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between two prognosis patients. Uncertainty and diversity across multiple gene expressions prevents unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty precludes any attempts at constructing discriminative classifiers. However a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.
Resumo:
In this chapter we present the relevant mathematical background to address two well defined signal and image processing problems. Namely, the problem of structured noise filtering and the problem of interpolation of missing data. The former is addressed by recourse to oblique projection based techniques whilst the latter, which can be considered equivalent to impulsive noise filtering, is tackled by appropriate interpolation methods.
Resumo:
Spoken language comprehension is known to involve a large left-dominant network of fronto-temporal brain regions, but there is still little consensus about how the syntactic and semantic aspects of language are processed within this network. In an fMRI study, volunteers heard spoken sentences that contained either syntactic or semantic ambiguities as well as carefully matched low-ambiguity sentences. Results showed ambiguity-related responses in the posterior left inferior frontal gyrus (pLIFG) and posterior left middle temporal regions. The pLIFG activations were present for both syntactic and semantic ambiguities suggesting that this region is not specialised for processing either semantic or syntactic information, but instead performs cognitive operations that are required to resolve different types of ambiguity irrespective of their linguistic nature, for example by selecting between possible interpretations or reinterpreting misparsed sentences. Syntactic ambiguities also produced activation in the posterior middle temporal gyrus. These data confirm the functional relationship between these two brain regions and their importance in constructing grammatical representations of spoken language.
Resumo:
Early, lesion-based models of language processing suggested that semantic and phonological processes are associated with distinct temporal and parietal regions respectively, with frontal areas more indirectly involved. Contemporary spatial brain mapping techniques have not supported such clear-cut segregation, with strong evidence of activation in left temporal areas by both processes and disputed evidence of involvement of frontal areas in both processes. We suggest that combining spatial information with temporal and spectral data may allow a closer scrutiny of the differential involvement of closely overlapping cortical areas in language processing. Using beamforming techniques to analyze magnetoencephalography data, we localized the neuronal substrates underlying primed responses to nouns requiring either phonological or semantic processing, and examined the associated measures of time and frequency in those areas where activation was common to both tasks. Power changes in the beta (14-30 Hz) and gamma (30-50 Hz) frequency bandswere analyzed in pre-selected time windows of 350-550 and 500-700ms In left temporal regions, both tasks elicited power changes in the same time window (350-550 ms), but with different spectral characteristics, low beta (14-20 Hz) for the phonological task and high beta (20-30 Hz) for the semantic task. In frontal areas (BA10), both tasks elicited power changes in the gamma band (30-50 Hz), but in different time windows, 500-700ms for the phonological task and 350-550ms for the semantic task. In the left inferior parietal area (BA40), both tasks elicited changes in the 20-30 Hz beta frequency band but in different time windows, 350-550ms for the phonological task and 500-700ms for the semantic task. Our findings suggest that, where spatial measures may indicate overlapping areas of involvement, additional beamforming techniques can demonstrate differential activation in time and frequency domains. © 2012 McNab, Hillebrand, Swithenby and Rippon.
Resumo:
Humans consciously and subconsciously establish various links, emerge semantic images and reason in mind, learn linking effect and rules, select linked individuals to interact, and form closed loops through links while co-experiencing in multiple spaces in lifetime. Machines are limited in these abilities although various graph-based models have been used to link resources in the cyber space. The following are fundamental limitations of machine intelligence: (1) machines know few links and rules in the physical space, physiological space, psychological space, socio space and mental space, so it is not realistic to expect machines to discover laws and solve problems in these spaces; and, (2) machines can only process pre-designed algorithms and data structures in the cyber space. They are limited in ability to go beyond the cyber space, to learn linking rules, to know the effect of linking, and to explain computing results according to physical, physiological, psychological and socio laws. Linking various spaces will create a complex space — the Cyber-Physical-Physiological-Psychological-Socio-Mental Environment CP3SME. Diverse spaces will emerge, evolve, compete and cooperate with each other to extend machine intelligence and human intelligence. From multi-disciplinary perspective, this paper reviews previous ideas on various links, introduces the concept of cyber-physical society, proposes the ideal of the CP3SME including its definition, characteristics, and multi-disciplinary revolution, and explores the methodology of linking through spaces for cyber-physical-socio intelligence. The methodology includes new models, principles, mechanisms, scientific issues, and philosophical explanation. The CP3SME aims at an ideal environment for humans to live and work. Exploration will go beyond previous ideals on intelligence and computing.
Resumo:
A detailed knowledge of the mapping between sequence and structure spaces in populations of RNA molecules is essential to better understand their present-day functional properties, to envisage a plausible early evolution of RNA in a prebiotic chemical environment and to improve the design of in vitro evolution experiments, among others. Analysis of natural RNAs, as well as in vitro and computational studies, show that certain RNA structural motifs are much more abundant than others, pointing out a complex relation between sequence and structure. Within this framework, we have investigated computationally the structural properties of a large pool (10 molecules) of single-stranded, 35 nt-long, random RNA sequences. The secondary structures obtained are ranked and classified into structure families. The number of structures in main families is analytically calculated and compared with the numerical results. This permits a quantification of the fraction of structure space covered by a large pool of sequences. We further show that the number of structural motifs and their frequency is highly unbalanced with respect to the nucleotide composition: simple structures such as stem-loops and hairpins arise from sequences depleted in G, while more complex structures require an enrichment of G. In general, we observe a strong correlation between subfamilies-characterized by a fixed number of paired nucleotides-and nucleotide composition. Our results are compared to the structural repertoire obtained in a second pool where isolated base pairs are prohibited. © 2008 Elsevier Ltd. All rights reserved.
Resumo:
The Semantic Web relies on carefully structured, well defined, data to allow machines to communicate and understand one another. In many domains (e.g. geospatial) the data being described contains some uncertainty, often due to incomplete knowledge; meaningful processing of this data requires these uncertainties to be carefully analysed and integrated into the process chain. Currently, within the SemanticWeb there is no standard mechanism for interoperable description and exchange of uncertain information, which renders the automated processing of such information implausible, particularly where error must be considered and captured as it propagates through a processing sequence. In particular we adopt a Bayesian perspective and focus on the case where the inputs / outputs are naturally treated as random variables. This paper discusses a solution to the problem in the form of the Uncertainty Markup Language (UncertML). UncertML is a conceptual model, realised as an XML schema, that allows uncertainty to be quantified in a variety of ways i.e. realisations, statistics and probability distributions. UncertML is based upon a soft-typed XML schema design that provides a generic framework from which any statistic or distribution may be created. Making extensive use of Geography Markup Language (GML) dictionaries, UncertML provides a collection of definitions for common uncertainty types. Containing both written descriptions and mathematical functions, encoded as MathML, the definitions within these dictionaries provide a robust mechanism for defining any statistic or distribution and can be easily extended. Universal Resource Identifiers (URIs) are used to introduce semantics to the soft-typed elements by linking to these dictionary definitions. The INTAMAP (INTeroperability and Automated MAPping) project provides a use case for UncertML. This paper demonstrates how observation errors can be quantified using UncertML and wrapped within an Observations & Measurements (O&M) Observation. The interpolation service uses the information within these observations to influence the prediction outcome. The output uncertainties may be encoded in a variety of UncertML types, e.g. a series of marginal Gaussian distributions, a set of statistics, such as the first three marginal moments, or a set of realisations from a Monte Carlo treatment. Quantifying and propagating uncertainty in this way allows such interpolation results to be consumed by other services. This could form part of a risk management chain or a decision support system, and ultimately paves the way for complex data processing chains in the Semantic Web.
Resumo:
Formulating complex queries is hard, especially when users cannot understand all the data structures of multiple complex knowledge bases. We see a gap between simplistic but user friendly tools and formal query languages. Building on an example comparison search, we propose an approach in which reusable search components take an intermediary role between the user interface and formal query languages.
Resumo:
We introduce a flexible visual data mining framework which combines advanced projection algorithms from the machine learning domain and visual techniques developed in the information visualization domain. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection algorithms, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates and billboarding, to provide a visual data mining framework. Results on a real-life chemoinformatics dataset using GTM are promising and have been analytically compared with the results from the traditional projection methods. It is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework. Copyright 2006 ACM.
Resumo:
In this paper we study a nonlinear evolution inclusion of subdifferential type in Hilbert spaces. The perturbation term is Hausdorff continuous in the state variable and has closed but not necessarily convex values. Our result is a stochastic generalization of an existence theorem proved by Kravvaritis and Papageorgiou in [6].