874 results for Cross-Lingual Information Retrieval
Abstract:
Negative co-occurrence is a common phenomenon in many signal processing applications. In some cases the signals involved are sparse, and this information can be exploited to recover them. In this paper, we present a sparse learning approach that explicitly takes into account negative co-occurrence. This is achieved by adding a novel penalty term to the LASSO cost function based on the cross-products between the reconstruction coefficients. Although the resulting optimization problem is non-convex, we develop a new and efficient method for solving it based on successive convex approximations. Results on synthetic data, for both complete and overcomplete dictionaries, are provided to validate the proposed approach.
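As a rough illustration of the idea, the sketch below evaluates a LASSO cost with an extra cross-product term. The abstract does not give the exact form of the penalty, so the term used here (a weight `gamma` times the sum of `|x_i| * |x_j|` over a chosen list of index pairs) is an assumption, as are the names `penalized_lasso_cost`, `lam`, `gamma`, and `pairs`. It only evaluates the cost; it does not implement the successive convex approximation solver.

```python
import numpy as np

def penalized_lasso_cost(y, D, x, lam, gamma, pairs):
    """LASSO cost plus an ASSUMED cross-product penalty (hypothetical form).

    y     : observed signal, shape (m,)
    D     : dictionary, shape (m, n)
    x     : reconstruction coefficients, shape (n,)
    lam   : weight of the usual l1 (sparsity) term
    gamma : weight of the assumed negative co-occurrence term
    pairs : index pairs (i, j) whose joint activation is discouraged
    """
    residual = y - D @ x
    data_fit = 0.5 * float(residual @ residual)      # least-squares fit
    l1 = lam * float(np.sum(np.abs(x)))              # standard LASSO term
    # Assumed penalty: grows only when both coefficients of a pair are nonzero.
    cross = gamma * sum(abs(x[i]) * abs(x[j]) for i, j in pairs)
    return data_fit + l1 + cross
```

With this form, a solution that activates both coefficients of a penalized pair costs more than one that activates only one of them, which is the behavior the abstract describes. The product term is also what makes the overall problem non-convex.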
Abstract:
This work develops a novel Cross-Entropy (CE) optimization-based fuzzy controller for an Unmanned Aerial Monocular Vision-IMU System (UAMVIS) to solve the see-and-avoid problem using its accurate autonomous localization information. The fuzzy controller regulates the heading of the system to avoid an obstacle, e.g. a wall. In the Matlab Simulink-based training stages, the Scaling Factor (SF) is first adjusted according to the specified task, and the Membership Function (MF) is then tuned based on the optimized Scaling Factor to further improve collision avoidance performance. After the optimal SF and MF were obtained, the rule base was reduced by 64% (from 125 rules to 45 rules), and a large number of real flight tests with a quadcopter were carried out. The experimental results show that this approach precisely navigates the system to avoid the obstacle. To the best of our knowledge, this is the first work to present an optimized fuzzy controller for UAMVIS using the Cross-Entropy method for Scaling Factor and Membership Function optimization.
Abstract:
This paper presents an adaptation of the Cross-Entropy (CE) method to optimize fuzzy logic controllers. The CE is a recently developed optimization method based on a general Monte-Carlo approach to combinatorial and continuous multi-extremal optimization and importance sampling. This work applies this optimization method to the input gains, the location and size of the membership function sets of each variable, and the weight of each rule in the rule base of a fuzzy logic controller (FLC). The control system approach presented in this work was designed to command the orientation of an unmanned aerial vehicle (UAV) to modify its trajectory for avoiding collisions. An onboard forward-looking camera was used to sense the environment of the UAV. The information extracted by the image processing algorithm is the only input of the fuzzy control approach to avoid the collision with a predefined object. Real tests with a quadrotor have been done to corroborate the improved behavior of the optimized controllers at different stages of the optimization process.
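The general shape of a Cross-Entropy optimizer for continuous parameters can be sketched as follows. This is a generic Gaussian variant under stated assumptions, not the authors' controller-tuning code: the function name `cross_entropy_minimize`, the sample and elite counts, and the use of an independent normal distribution per parameter are all illustrative choices. In the setting of the abstract, the parameter vector would hold quantities such as input gains or membership function positions, and `cost` would score a simulated flight.

```python
import numpy as np

def cross_entropy_minimize(cost, mean, std, n_samples=100, n_elite=10,
                           n_iters=50, seed=0):
    """Generic Gaussian Cross-Entropy minimization (illustrative sketch).

    cost : callable mapping a parameter vector to a scalar to minimize
    mean : initial mean of the sampling distribution
    std  : initial standard deviation of the sampling distribution
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        # 1. Sample candidate parameter vectors from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        # 2. Score every candidate.
        costs = np.array([cost(s) for s in samples])
        # 3. Keep the lowest-cost (elite) candidates.
        elite = samples[np.argsort(costs)[:n_elite]]
        # 4. Refit the sampling distribution to the elite set.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-12  # small floor avoids a zero-width distribution
    return mean
```

A toy usage, with a quadratic cost standing in for a flight simulation: `cross_entropy_minimize(lambda p: np.sum((p - np.array([1.0, -2.0])) ** 2), [0.0, 0.0], [2.0, 2.0])` returns a vector close to `[1, -2]`.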
Abstract:
In recent years, new fields of information technology have emerged that explore how to process the large amount of existing digital data and transform it into explicit knowledge. Natural Language Processing (NLP) techniques can extract information from digital texts written in narrative form. In addition, machine learning techniques classify instances, or examples, into categories according to their attributes by learning from previously classified ones. Clinical texts are a large source of unstructured information and, consequently, of information that is not fully exploited. Some terms used in clinical texts appear in an affirmed, negated, hypothetical, or historical context. Detecting this context is necessary for structuring the information, but it is also highly complex. By extracting linguistic features of the elements, or tokens, of the texts through NLP, and by transforming these tokens into instances and the features into attributes, machine learning techniques can classify them to detect whether they are affirmed, negated, hypothetical, or historical. The selection of the attributes that each token must have for its classification and the choice of the machine learning algorithm are crucial to the classification; they are, in fact, the elements that make up the classification model. Consequently, this work addresses the process of feature extraction, attribute selection, and machine learning algorithm selection for the detection of negation in clinical texts in Spanish.
We present a classification model that, using the J48 algorithm and 35 attributes derived from linguistic features (morphological and syntactic) and negation triggers, detects whether a token is negated in 465 sentences from clinical texts, with an F-score of 73%, a recall of 66%, and a precision of 81% under 10-fold cross-validation.
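The three reported figures can be cross-checked against each other. The sketch below assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly. The helper name `f_score` is illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the balanced F1 (harmonic mean)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 81% and recall 66%, as reported in the abstract.
print(round(f_score(0.81, 0.66), 2))  # prints 0.73
```

Under that assumption, the reported 73% is consistent with the stated precision and recall.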
Abstract:
In the past decades, online learning has transformed the educational landscape with the emergence of new ways to learn. This fact, together with recent changes in educational policy in Europe aiming to facilitate the incorporation of graduate students into the labor market, has provoked a shift in the delivery of instruction and in the role played by teachers and students, stressing the need for development of both basic and cross-curricular competencies. In parallel, recent years have witnessed the emergence of new educational disciplines, such as learning analytics, that can take advantage of the information retrieved by technology-based online education in order to improve instruction. This study explores the applicability of learning analytics for predicting the development of two cross-curricular competencies, teamwork and commitment, based on the analysis of Moodle interaction data logs in a Master's Degree program at Universidad a Distancia de Madrid (UDIMA) where the students were education professionals. The results of the study question the suitability of a general interaction-based approach and show no relation between online activity indicators and teamwork and commitment acquisition. The discussion of results includes multiple recommendations for further research on this topic.
Abstract:
A number of neuroimaging findings have been interpreted as evidence that the left inferior frontal gyrus (IFG) subserves retrieval of semantic knowledge. We provide a fundamentally different interpretation, that it is not retrieval of semantic knowledge per se that is associated with left IFG activity but rather selection of information among competing alternatives from semantic memory. Selection demands were varied across three semantic tasks in a single group of subjects. Functional magnetic resonance imaging signal in overlapping regions of left IFG was dependent on selection demands in all three tasks. In addition, the degree of semantic processing was varied independently of selection demands in one of the tasks. The absence of left IFG activity for this comparison counters the argument that the effects of selection can be attributed solely to variations in degree of semantic retrieval. Our findings suggest that it is selection, not retrieval, of semantic knowledge that drives activity in the left IFG.
Abstract:
Neuronal models predict that retrieval of specific event information reactivates brain regions that were active during encoding of this information. Consistent with this prediction, this positron-emission tomography study showed that remembering that visual words had been paired with sounds at encoding activated some of the auditory brain regions that were engaged during encoding. After word-sound encoding, activation of auditory brain regions was also observed during visual word recognition when there was no demand to retrieve auditory information. Collectively, these observations suggest that information about the auditory components of multisensory event information is stored in auditory responsive cortex and reactivated at retrieval, in keeping with classical ideas about “redintegration,” that is, the power of part of an encoded stimulus complex to evoke the whole experience.
Abstract:
Although neuronal synchronization has been shown to exist in primary motor cortex (MI), very little is known about its possible contribution to coding of movement. By using cross-correlation techniques from multi-neuron recordings in MI, we observed that activity of neurons commonly synchronized around the time of movement initiation. For some cell pairs, synchrony varied with direction in a manner not readily predicted by the firing of either neuron. Information theoretic analysis demonstrated quantitatively that synchrony provides information about movement direction beyond that expected by simple rate changes. Thus, MI neurons are not simply independent encoders of movement parameters but rather engage in mutual interactions that could potentially provide an additional coding dimension in cortex.
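A cross-correlogram between two spike trains can be sketched as below. The abstract does not describe the authors' exact procedure, so this is only a minimal illustration under assumptions: spike trains are given as equal-length arrays of 0/1 values per time bin, and the function name `cross_correlogram` and the `max_lag` parameter are hypothetical.

```python
import numpy as np

def cross_correlogram(a, b, max_lag):
    """Count coincidences between binned spike trains a and b at each lag.

    A positive lag means spikes in b follow spikes in a by that many bins.
    Returns (lags, counts).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    counts = []
    for lag in lags:
        if lag >= 0:
            # Pair a[t] with b[t + lag].
            counts.append(np.sum(a[:len(a) - lag] * b[lag:]))
        else:
            # Pair a[t] with b[t + lag] for negative lag.
            counts.append(np.sum(a[-lag:] * b[:len(b) + lag]))
    return lags, np.array(counts)
```

A peak near lag 0 that exceeds what the two firing rates alone would produce is the kind of signature usually read as synchrony. A real analysis would also need a correction for rate covariation (for example a shuffle predictor), which this sketch omits.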
Abstract:
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
Abstract:
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Abstract:
The BioKnowledge Library is a relational database and web site (http://www.proteome.com) composed of protein-specific information collected from the scientific literature. Each Protein Report on the web site summarizes and displays published information about a single protein, including its biochemical function, role in the cell and in the whole organism, localization, mutant phenotype and genetic interactions, regulation, domains and motifs, interactions with other proteins and other relevant data. This report describes four species-specific volumes of the BioKnowledge Library, concerned with the model organisms Saccharomyces cerevisiae (YPD), Schizosaccharomyces pombe (PombePD) and Caenorhabditis elegans (WormPD), and with the fungal pathogen Candida albicans (CalPD™). Protein Reports of each species are unified in format, easily searchable and extensively cross-referenced between species. The relevance of these comprehensively curated resources to analysis of proteins in other species is discussed, and is illustrated by a survey of model organism proteins that have similarity to human proteins involved in disease.
Abstract:
The iProClass database is an integrated resource that provides comprehensive family relationships and structural and functional features of proteins, with rich links to various databases. It is extended from ProClass, a protein family database that integrates PIR superfamilies and PROSITE motifs. The iProClass currently consists of more than 200 000 non-redundant PIR and SWISS-PROT proteins organized with more than 28 000 superfamilies, 2600 domains, 1300 motifs, 280 post-translational modification sites and links to more than 30 databases of protein families, structures, functions, genes, genomes, literature and taxonomy. Protein and family summary reports provide rich annotations, including membership information with length, taxonomy and keyword statistics, full family relationships, comprehensive enzyme and PDB cross-references and graphical feature display. The database facilitates classification-driven annotation for protein sequence databases and complete genomes, and supports structural and functional genomic research. The iProClass is implemented in the Oracle 8i object-relational system and is available for sequence search and report retrieval at http://pir.georgetown.edu/iproclass/.
Abstract:
The Homeodomain Resource is an annotated collection of non-redundant protein sequences, three-dimensional structures and genomic information for the homeodomain protein family. Release 3.0 contains 795 full-length homeodomain-containing sequences, 32 experimentally-derived structures and 143 homeobox loci implicated in human genetic disorders. Entries are fully hyperlinked to facilitate easy retrieval of the original records from source databases. A simple search engine with a graphical user interface is provided to query the component databases and assemble customized data sets. A new feature for this release is the addition of DNA recognition sites for all human homeodomain proteins described in the literature. The Homeodomain Resource is freely available through the World Wide Web at http://genome.nhgri.nih.gov/homeodomain.
Abstract:
Comparative genomics offers unparalleled opportunities to integrate historically distinct disciplines, to link disparate biological kingdoms, and to bridge basic and applied science. Cross-species, cross-genera, and cross-kingdom comparisons are proving key to understanding how genes are structured, how gene structure relates to gene function, and how changes in DNA have given rise to the biological diversity on the planet. The application of genomics to the study of crop species offers special opportunities for innovative approaches for combining sequence information with the vast reservoirs of historical information associated with crops and their evolution. The grasses provide a particularly well developed system for the development of tools to facilitate comparative genetic interpretation among members of a diverse and evolutionarily successful family. Rice provides advantages for genomic sequencing because of its small genome and its diploid nature, whereas each of the other grasses provides complementary genetic information that will help extract meaning from the sequence data. Because of the importance of the cereals to the human food chain, developments in this area can lead directly to opportunities for improving the health and productivity of our food systems and for promoting the sustainable use of natural resources.
Abstract:
Remembering an event involves not only what happened, but also where and when it occurred. We measured regional cerebral blood flow by positron emission tomography during initial encoding and subsequent retrieval of item, location, and time information. Multivariate image analysis showed that left frontal brain regions were always activated during encoding, and right superior frontal regions were always activated at retrieval. Pairwise image subtraction analyses revealed information-specific activations at (i) encoding, item information in left hippocampal, location information in right parietal, and time information in left fusiform regions; and (ii) retrieval, item in right inferior frontal and temporal, location in left frontal, and time in anterior cingulate cortices. These results point to the existence of general encoding and retrieval networks of episodic memory whose operations are augmented by unique brain areas recruited for processing specific aspects of remembered events.