981 results for Semantic Text Analysis
Abstract:
To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interactions, regulatory relationships, metabolic relationships, genetic relationships, and many more. With advances in science and technology, high-throughput technologies have been developed that can simultaneously detect tens of thousands of pairwise protein-protein and protein-DNA interactions. However, the data generated by high-throughput methods are prone to noise. Furthermore, these technologies have their own limitations and cannot detect every kind of relationship between genes and their products. There is thus a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, that is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate the support for the data on these interactions. 
While protein-protein interactions and regulatory relationships can be detected by high-throughput biological techniques, another type of relationship, the semantic relationship, cannot be detected by any single technique but can be inferred from multiple sources of biological data. The contributions of this thesis are the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These include (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to the functional annotation of genes, (iii) a novel method for inferring functional networks of genes, and (iv) techniques for clustering genes using multi-source data.
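The abstract names an EM-based algorithm for predicting regulatory elements but does not describe it or say how the comparative genomics data enter it. Purely as a generic illustration of expectation-maximization for motif discovery, the sketch below implements a MEME-style "one occurrence per sequence" (OOPS) model on plain DNA strings. The function names, the seed-based initialisation, and the pseudocount values are our own assumptions, and no comparative genomics data are used.

```python
ALPHABET = "ACGT"


def background(seqs):
    """Pooled base frequencies (with +1 pseudocounts), held fixed during EM."""
    counts = {b: 1.0 for b in ALPHABET}
    for s in seqs:
        for ch in s:
            counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def em_motif(seqs, seed, iterations=20, pseudo=0.1):
    """OOPS-style EM (illustrative): assume exactly one motif site per sequence."""
    w = len(seed)
    bg = background(seqs)
    # Initial position weight matrix (PWM) biased towards the seed word.
    pwm = [{b: (0.7 if b == c else 0.1) for b in ALPHABET} for c in seed]
    for _ in range(iterations):
        counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
        for s in seqs:
            # E-step: likelihood ratio (motif vs background) of a site at each start p.
            ratios = []
            for p in range(len(s) - w + 1):
                r = 1.0
                for j in range(w):
                    r *= pwm[j][s[p + j]] / bg[s[p + j]]
                ratios.append(r)
            z = sum(ratios)
            # Accumulate expected base counts, weighting each window by its posterior.
            for p, r in enumerate(ratios):
                for j in range(w):
                    counts[j][s[p + j]] += r / z
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]
    return pwm


def consensus(pwm):
    """Most probable base in each PWM column."""
    return "".join(max(col, key=col.get) for col in pwm)
```

For example, starting from the imperfect seed `"TATGAT"` on a few toy sequences that each contain `TATAAT`, the posterior weight concentrates on the planted sites and `consensus(em_motif(seqs, "TATGAT"))` recovers `TATAAT`.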
Abstract:
This thesis describes the design and implementation of a Semantic Geographic Information System (GIS) and the creation of its spatial database. The database schema is designed and created, and all textual and spatial data are loaded into the database with the help of the Binary Database Interface of the Semantic DBMS currently being developed at FIU's High Performance Database Research Center (HPDRC). A user-friendly graphical user interface is created together with the system's other main components: display, data animation, and data retrieval. All these components are tightly integrated to form a novel and practical semantic GIS that has facilitated the interpretation, manipulation, analysis, and display of spatial data such as ocean temperature, ozone (TOMS), and simulated SeaWiFS data. The system has also played a major role in testing HPDRC's high-performance, efficient parallel Semantic DBMS.
Abstract:
The Semantic Annotation component is a software application that supports automated text classification, a process grounded in a cohesion-centered representation of discourse that facilitates topic extraction. The component enables the semantic meta-annotation of text resources, including automated classification, thus facilitating information retrieval within the RAGE ecosystem. It is available in the ReaderBench framework (http://readerbench.com/), which integrates advanced Natural Language Processing (NLP) techniques. The component uses Cohesion Network Analysis (CNA) to build an in-depth representation of discourse, which is useful for mining keywords and performing automated text categorization. Our component automatically classifies documents into the categories of the ACM Computing Classification System (http://dl.acm.org/ccs_flat.cfm), as well as into the categories of a high-level serious games categorization provisionally developed by RAGE. English and French are already supported by the provided web service, and the framework can be extended to support additional languages.
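The abstract does not say how ReaderBench measures cohesion. As a rough, hypothetical illustration of the idea behind a cohesion network, the sketch below links text segments whose bag-of-words cosine similarity exceeds a threshold, and assigns a text to the category profile it is most similar to. Plain word counts stand in for the richer semantic models a real CNA implementation would use; the function names, the threshold, and the keyword profiles are our assumptions, not ReaderBench's API.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cohesion_edges(sentences, threshold=0.2):
    """Link every pair of segments whose lexical similarity exceeds a threshold."""
    bags = [Counter(s.lower().split()) for s in sentences]
    edges = []
    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            sim = cosine(bags[i], bags[j])
            if sim > threshold:
                edges.append((i, j, round(sim, 3)))
    return edges


def classify(text, category_profiles):
    """Pick the category whose (hypothetical) keyword profile is most similar to the text."""
    bag = Counter(text.lower().split())
    return max(category_profiles,
               key=lambda c: cosine(bag, Counter(category_profiles[c].lower().split())))
```

The edge list is the skeleton of a cohesion graph; keyword mining and categorization would then operate on that graph rather than on isolated sentences.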
Abstract:
A strategy for document analysis is presented that uses Portable Document Format (PDF, the underlying file format of Adobe Acrobat software) as its starting point. This strategy examines the appearance and geometric position of text and image blocks distributed over an entire document. A blackboard system is used to tag the blocks as a first stage in deducing the fundamental relationships between them. PDF is shown to be a useful intermediate stage in the bottom-up analysis of document structure. Its information on line spacing and font usage gives important clues for bridging the semantic gap between the scanned bitmap page and its fully analysed, block-structured form. Analysis of PDF can yield not only accurate page decomposition but also sufficient document information for the later stages of structural analysis and document understanding.
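To make the blackboard idea concrete, here is a minimal sketch, assuming the blocks have already been extracted from the PDF. Independent "knowledge sources" read a shared board and post tags to it, and a trivial controller runs them in a fixed order. The `Block` fields, the tag names, and the 1.2/0.9 font-size thresholds are hypothetical, and only font size is used as a cue, not line spacing or geometry.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Block:
    """A page region recovered from a PDF (hypothetical, simplified fields)."""
    kind: str         # "text" or "image"
    font_size: float  # dominant font size in points; 0 for images


def figure_source(board):
    """Knowledge source: every untagged image block is a figure."""
    for i, block in enumerate(board["blocks"]):
        if block.kind == "image" and i not in board["tags"]:
            board["tags"][i] = "figure"


def font_source(board):
    """Knowledge source: compare each text block's font size with the body-text size."""
    sizes = [b.font_size for b in board["blocks"] if b.kind == "text"]
    if not sizes:
        return
    body = median(sizes)
    for i, block in enumerate(board["blocks"]):
        if block.kind != "text" or i in board["tags"]:
            continue
        if block.font_size >= 1.2 * body:
            board["tags"][i] = "heading"
        elif block.font_size < 0.9 * body:
            board["tags"][i] = "caption_or_footnote"
        else:
            board["tags"][i] = "body"


def run_blackboard(blocks, sources=(figure_source, font_source)):
    """Minimal controller: let each knowledge source post tags to a shared board."""
    board = {"blocks": blocks, "tags": {}}
    for source in sources:
        source(board)
    return [board["tags"].get(i, "unknown") for i in range(len(blocks))]
```

Because each knowledge source only reads and writes the shared board, further sources (for example, one using line spacing or block position) could be added without changing the existing ones.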
Abstract:
The Czech composer Petr Eben (1927-2007) wrote music in all genres except the symphony, but he is best known for his organ and choral compositions, his preferred genres. His vocal works include choral songs and vocal-instrumental works spanning a wide range of difficulty, from simple pedagogical songs to very advanced and technically challenging compositions. This study examines two of Eben's vocal-instrumental compositions. The oratorio Apologia Sokratus (1967) is a three-movement work whose libretto is based on Plato's Apology of Socrates. The ballet Curses and Blessings (1983) has a libretto compiled from numerous texts from the thirteenth to the twentieth centuries. The formal design of the ballet is unusual: it is a three-movement composition in which the first movement is choral, the second is orchestral, and the third combines the previous two played simultaneously. Eben assembled the libretti for both compositions, and both address the contrasting sides of the human soul, evil and good, and the everlasting fight between them. This unity and contrast is the philosophical foundation of both compositions. The dissertation discusses the multilevel meanings behind the text settings and musical style of the oratorio and the ballet in analyses focusing on the text, melodic and harmonic construction, and symbolism. Additional brief analyses of other vocal and vocal-instrumental compositions by Eben lay the groundwork for examining the oratorio and ballet and for understanding features of the composer's musical style. While the oratorio Apologia Sokratus was discussed in short articles in the 1970s, the ballet Curses and Blessings has never previously been addressed in Eben scholarship. The dissertation examines the significant features of Eben's music. His melodic style incorporates influences as diverse as Gregorian chant and folk tunes on the one hand, and modern vocal techniques such as Sprechgesang and vocal aleatoricism on the other. 
His harmonic language includes bitonality and polytonality, used to augment the tonal legacy of earlier times, together with pitch collections, limited serial procedures, and various secundal and quartal harmonic sonorities derived from them. His music features the vibrant rhythms of folk music and incorporates other folk devices such as ostinato, repetitive patterns, and improvisation.
Abstract:
In this paper we use concepts from graph theory and from cellular biology, represented as ontologies, to carry out semantic mining tasks on signaling pathway networks. Specifically, the paper describes the semantic enrichment of signaling pathway networks. A cell signaling network describes basic cellular activities and their interactions. The main contribution of this paper lies in signaling pathway research: it proposes a new technique to analyze and understand how changes in these networks may affect the transmission and flow of information, changes that can lead to diseases such as cancer and diabetes. Our approach is based on three concepts from graph theory (modularity, clustering, and centrality) frequently used in social network analysis. It consists of two phases: the first uses these graph theory concepts to determine the cellular groups in the network, which we call communities; the second uses ontologies for the semantic enrichment of these communities. The graph theory measures allow us to determine the set of cells that are close (for example, in a disease) and the main cells in each community. We evaluate our approach on two cases: the TGF-β pathway and Alzheimer's disease.
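The abstract does not give its exact formulas or its community-detection procedure. As an illustration of two of the named measures, the sketch below scores a given partition of an undirected network with Newman modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes, and then picks the highest-degree node of each community as its "main" node. Node names and the partition are hypothetical; degree centrality is only one of several possible centrality choices.

```python
from collections import defaultdict


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c/m - (d_c/2m)^2] of a node->community map."""
    m = len(edges)
    deg = degrees(edges)
    intra = defaultdict(int)       # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        if community[u] == community[v]:
            intra[community[u]] += 1
    for node, k in deg.items():
        degree_sum[community[node]] += k
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def hubs(edges, community):
    """Highest-degree node in each community (ties broken by node name)."""
    deg = degrees(edges)
    best = {}
    for node, k in sorted(deg.items()):
        c = community[node]
        if c not in best or k > deg[best[c]]:
            best[c] = node
    return best
```

For two triangles joined by a single edge, split into the two obvious communities, Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357, and the bridging nodes are the hubs.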
Abstract:
Double Degree
Abstract:
Part 19: Knowledge Management in Networks
Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, creating high demand for multimedia services and applications in various areas that let people easily access and manage multimedia data. In response to this demand, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework can explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) that produces the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to handle imbalanced datasets. In addition to multimedia semantic representation and class imbalance, lack of organization is another critical issue in multimedia big data analysis. 
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a clean, well-organized structure. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
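HIGA itself is not specified in the abstract. The sketch below shows only the plain information-gain measure such a method would build on: how much knowing a discretised feature reduces the entropy of an event label, IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v). The feature values and labels are hypothetical, and no hierarchy or feature grouping is modelled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discretised feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that perfectly separates event shots from non-event shots gains the full 1 bit on a balanced label set, while a feature independent of the label gains nothing, which is why such scores can serve as importance weights for features.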
Abstract:
Artificial Intelligence (AI) is gaining ever more ground in every sphere of human life, to the point that it is now even used to pass sentences in courts. The use of AI in the field of law is, however, quite controversial: it could provide more objectivity, but it could also entail an abuse of power, given that bias in the algorithms behind AI may reduce accuracy. As a product of AI, machine translation is increasingly used in the legal field as well, to translate laws, judgements, contracts, etc. between different languages and legal systems. In the legal setting of Company Law, accuracy of content and suitability of terminology play a crucial role in a translation task, as any addition or omission of content or mistranslation of terms could have legal consequences for companies. The purpose of the present study is first to assess which neural machine translation system, DeepL or ModernMT, produces the more suitable translation from Italian into German of the atto costitutivo (deed of incorporation) of an Italian s.r.l. in terms of content accuracy and terminological correctness, and then to assess which translation is closer to a human reference translation. To achieve these aims, a human evaluation based on the MQM taxonomy and an automatic evaluation based on the BLEU metric are carried out. Results of both evaluations show an overall better performance by ModernMT in terms of content accuracy, suitability of terminology, and closeness to a human translation. In the MQM-based evaluation, its accuracy and terminology errors account for just 8.43% (as opposed to DeepL's 9.22%), while it obtains an overall BLEU score of 29.14 (against DeepL's 27.02). The overall performances, however, show that machines still face barriers in overcoming semantic complexity, tackling polysemy, and choosing domain-specific terminology, which suggests that the gap with human translation may still be considerable.
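To show what a BLEU score such as 29.14 measures, here is a minimal, unsmoothed corpus-level BLEU (Papineni et al., 2002) with one reference per segment and naive whitespace tokenization: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. The study does not say which BLEU implementation or tokenization it used, and standard tools differ in both, so scores from this sketch are not directly comparable with the reported figures.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match; the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

A hypothesis identical to its reference scores 100, and dropping a single word already lowers both the higher-order precisions and the brevity penalty. This is one reason BLEU alone was complemented by an MQM-based human evaluation.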
Abstract:
This paper explains and explores the concept of "semantic molecules" in the NSM (Natural Semantic Metalanguage) methodology of semantic analysis. A semantic molecule is a complex lexical meaning that functions as an intermediate unit in the structure of other, more complex concepts. The paper surveys different kinds of semantic molecules, showing how they enter into more complex meanings and how they themselves can be explicated. It shows that four levels of "nesting" of molecules within molecules are attested, and it argues that while some molecules, such as 'hands' and 'make', may well be language-universal, many others are language-specific.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has been successfully applied to document organisation by treating documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the Formal Concept Analysis application of a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an index structure based on the RD-Tree Generalized Search Tree (GiST), to better support the application of Formal Concept Analysis to large data sources.
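The document/keyword view of Formal Concept Analysis can be illustrated on a toy context. The sketch below enumerates every formal concept (extent, intent) by closing the set of document keyword sets under intersection: every intent is an intersection of object intents, and the extent is read back as the documents that carry all of its keywords. This naive approach is exponential in the worst case and is exactly the kind of computation that does not scale. It does not model the paper's RD-tree/GiST index, and the document and keyword names are hypothetical.

```python
def concepts(context):
    """Enumerate all formal concepts of a small context {object: set_of_attributes}."""
    all_attrs = frozenset().union(*context.values())
    # Intents are exactly the intersections of object intents (the empty
    # intersection being the full attribute set), so close under intersection.
    intents = {all_attrs}
    for obj_attrs in context.values():
        obj_attrs = frozenset(obj_attrs)
        intents |= {obj_attrs & i for i in intents}
    result = []
    for intent in intents:
        # Extent: every object that has all attributes of the intent.
        extent = frozenset(o for o, a in context.items() if intent <= a)
        result.append((extent, intent))
    return sorted(result, key=lambda c: (len(c[0]), sorted(c[0])))
```

For example, with three documents tagged `{fca, lattice}`, `{fca, index}` and `{index, rdtree}`, the concept with intent `{fca}` groups the first two documents: a keyword "view" of the collection in the sense described above.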
Abstract:
We used event-related functional magnetic resonance imaging (fMRI) to investigate neural responses associated with the semantic interference (SI) effect in the picture-word task. Independent stage models of word production assume that the locus of the SI effect is at the conceptual processing level (Levelt et al. [1999]: Behav Brain Sci 22:1-75), whereas interactive models postulate that it occurs at phonological retrieval (Starreveld and La Heij [1996]: J Exp Psychol Learn Mem Cogn 22:896-918). In both types of model, the SI effect is resolved through competitive, spreading activation without the involvement of inhibitory links. These assumptions were tested by randomly presenting participants with trials from semantically related and lexical control distractor conditions and acquiring image volumes coincident with the estimated peak hemodynamic response for each trial. Overt vocalization of picture names occurred in the absence of scanner noise, allowing reaction time (RT) data to be collected. Analysis of the RT data confirmed the SI effect. Regions showing differential hemodynamic responses during the SI effect included the left mid section of the middle temporal gyrus, left posterior superior temporal gyrus, left anterior cingulate cortex, and bilateral orbitomedial prefrontal cortex. Additional responses were observed in the frontal eye fields, left inferior parietal lobule, and right anterior temporal and occipital cortex. The results are interpreted as indirectly supporting interactive models that allow spreading activation between both the conceptual processing and phonological retrieval levels of word production. In addition, the data confirm that selective attention/response suppression plays a role in resolving the SI effect, similar to the way in which Stroop interference is resolved. 
We conclude that neuroimaging studies can provide information about the neuroanatomical organization of the lexical system that may prove useful for constraining theoretical models of word production. (C) 2001 Wiley-Liss, Inc.
Abstract:
The synthesis of helium in the early Universe depends on many input parameters, including the value of the gravitational coupling during the period when nucleosynthesis takes place. We compute the primordial abundance of helium as a function of the gravitational coupling, using a semi-analytical method, in order to track the influence of G on primordial nucleosynthesis. Specifically, we construct a cosmological model with varying G, using the Brans-Dicke theory. The greater the value of G during nucleosynthesis, the greater the predicted abundance of helium. Using observational data on the abundance of primordial helium, we establish constraints on the time variation of G.
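Why a larger G yields more helium can be seen from a standard textbook freeze-out estimate. This is not the paper's semi-analytical Brans-Dicke calculation: it assumes radiation domination, a fixed Fermi constant G_F, and the usual approximation that essentially all surviving neutrons end up in helium-4.

```latex
% Helium mass fraction from the neutron-to-proton ratio at the onset of nucleosynthesis
Y_p \simeq \frac{2\,(n/p)}{1 + n/p},
\qquad
\left.\frac{n}{p}\right|_{T_f} \simeq \exp\!\left(-\frac{Q}{k_B T_f}\right),
\quad Q = m_n - m_p \approx 1.293\ \mathrm{MeV}

% Weak-interaction rate versus expansion rate in the radiation era (\rho \propto T^4)
\Gamma_{n \leftrightarrow p} \sim G_F^2\, T^5,
\qquad
H = \sqrt{\frac{8\pi G \rho}{3}} \;\Rightarrow\; H \propto G^{1/2}\, T^2

% Freeze-out when \Gamma \simeq H
G_F^2\, T_f^5 \propto G^{1/2}\, T_f^2
\;\Rightarrow\; T_f^3 \propto \frac{G^{1/2}}{G_F^2}
\;\Rightarrow\; T_f \propto G^{1/6}
```

A larger G therefore raises the freeze-out temperature, leaving a larger n/p and hence a larger Y_p. The faster expansion also shortens the time available for free-neutron decay before nucleosynthesis, which pushes Y_p in the same direction.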
Abstract:
This research takes a multi-method, qualitative/quantitative approach and aims to investigate how strategies for reconciling sports training and schooling are constructed among elite athletes who play for the Brazilian men's U17 and U19 national basketball teams. The study is organized into three chapters. The first chapter, a "state of knowledge" review, maps the academic literature on reconciling schooling and sports training. It uses the Scielo database for the national search and the Portal Periódicos Capes for the international search. Seventeen articles were found, spread across 13 journals. The data were classified and analyzed using bibliometric indicators such as annual distribution, distribution by journal, authorship relations, and demographic origin. The analysis also considered one doctoral thesis, three master's dissertations, and three conference papers, as well as a special journal issue not indexed in the selected databases. The chapter shows that concern with the topic arose in Europe and the United States in the 1970s, whereas in Brazil the issue began to be addressed in the 2000s. It describes attempts to reconcile the two forms of training in European countries, the United States, and Brazil, and highlights the importance of family and social class membership in whether one form of training is prioritized over the other. The second chapter, qualitative-quantitative in nature, investigates the strategies used by the athletes called up in 2013 to the Brazilian men's U17 and U19 youth basketball teams to reconcile sports training and schooling. It also seeks to understand how national team call-ups affect these elite athletes' schooling indicators, such as dropout, age-grade lag, and grade repetition. 
The research shows that this group of elite athletes has higher rates of grade repetition, dropout, and age-grade lag than the national averages. The third chapter analyzes how this group of young athletes understands schooling, and whether any lack of interest in the current school model stems solely from their being elite athletes. To this end, it draws on the second half of the questionnaire used as the research instrument, which adopts a directed free word association methodology based on four inductor words (semantic structures): “treinar” (to train), “estudar” (to study), “ir a escola” (to go to school), and “competir” (to compete). This kind of free association is commonly used as theoretical and methodological support in research on social representations (ACOSTA, 2005). Bringing these issues to light shows that these athletes' stance toward school does not differ from that found in other studies of young people in secondary school. Because what they learn at school seems unrelated to the work they hope to pursue, they see school as monotonous but, at the same time, necessary in case their sports careers do not work out.