903 results for Probabilistic latent semantic analysis (PLSA)


Relevance:

30.00%

Publisher:

Abstract:

Two important characteristics of science are "reproducibility" and "clarity". Through rigorous practices, scientists explore aspects of the world that they can reproduce under carefully controlled experimental conditions. Clarity, complementing reproducibility, provides unambiguous descriptions of results in a mechanical or mathematical form. Both pillars depend on well-structured and accurate descriptions of scientific practices, which are normally recorded in experimental protocols, scientific workflows, etc. Here we present SMART Protocols (SP), our ontology-based approach for representing experimental protocols and our contribution to clarity and reproducibility. SP delivers an unambiguous description of the processes by which data is produced; by doing so, we argue, it facilitates reproducibility. Moreover, SP is intended to be part of e-science infrastructures. SP results from the analysis of 175 protocols; from this dataset, we extracted common elements. From our analysis, we identified document, workflow and domain-specific aspects in the representation of experimental protocols. The ontology is available at http://purl.org/net/SMARTprotocol

Relevance:

30.00%

Publisher:

Abstract:

Sentiment and Emotion Analysis strongly depend on quality language resources, especially sentiment dictionaries. These resources are usually scattered, heterogeneous and limited to specific domains of application by simple algorithms. The EUROSENTIMENT project addresses these issues by 1) developing a common language resource representation model for sentiment analysis, and APIs for sentiment analysis services based on established Linked Data formats (lemon, Marl, NIF and ONYX), and 2) creating a Language Resource Pool (LRP) that makes existing scattered language resources and services for sentiment analysis available to the community in an interoperable way. In this paper we describe the available language resources and services in the LRP and some sample applications that can be developed on top of the EUROSENTIMENT LRP.

Relevance:

30.00%

Publisher:

Abstract:

Reliability analyses provide an adequate tool to consider the inherent uncertainties that exist in geotechnical parameters. This dissertation develops a simple linearization-based approach, which uses first- or second-order approximations, to efficiently evaluate the system reliability of geotechnical problems. First, reliability methods are employed to analyze the reliability of two tunnel design aspects: face stability and performance of support systems. Several reliability approaches are employed: the first order reliability method (FORM), the second order reliability method (SORM), the response surface method (RSM) and importance sampling (IS). The results show that the assumed distribution types and correlation structures for all random variables have a significant effect on the reliability results, which emphasizes the importance of an adequate characterization of geotechnical uncertainties for practical applications. Results also show that both FORM and SORM can be used to estimate the reliability of tunnel-support systems, and that SORM can outperform FORM with an acceptable additional computational effort. A linearization approach is then developed to evaluate the system reliability of series geotechnical problems. The approach only needs information provided by FORM: the vector of reliability indices of the limit state functions (LSFs) composing the system, and their correlation matrix. Two common geotechnical problems (the stability of a slope in layered soil and a circular tunnel in rock) are employed to demonstrate the simplicity, accuracy and efficiency of the suggested procedure. Advantages of the linearization approach with respect to alternative computational tools are discussed. It is also found that, if necessary, SORM, which approximates the true LSF better than FORM, can be employed to compute better estimates of the system's reliability. Finally, a new approach using Genetic Algorithms (GAs) is presented to identify the fully specified representative slip surfaces (RSSs) of layered soil slopes, and such RSSs are then employed to estimate the system reliability of slopes, using our proposed linearization approach. Three typical benchmark slopes with layered soils are adopted to demonstrate the efficiency, accuracy and robustness of the suggested procedure, and advantages of the proposed method with respect to alternative methods are discussed. Results show that the proposed approach provides reliability estimates that improve on previously published results, emphasizing the importance of finding good RSSs (and, especially, good probabilistic critical slip surfaces that might be non-circular) to obtain good estimates of the reliability of soil slope systems.
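
As a rough illustration of how FORM reliability indices relate to the failure probability of a series system, the sketch below converts each index β into a component failure probability Φ(−β) and computes simple bounds on the system failure probability. This is not the linearization approach of the dissertation, which also uses the correlation matrix of the limit state functions; it is the elementary bounding technique, and the upper bound assumes non-negatively correlated failure modes. The function names and the β values are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def component_failure_probabilities(betas):
    """FORM estimate of each component failure probability: p_i = Phi(-beta_i)."""
    return [normal_cdf(-beta) for beta in betas]


def series_system_bounds(betas):
    """Simple bounds on the failure probability of a series system.

    Lower bound: the largest component failure probability
    (reached when the failure modes are fully dependent).
    Upper bound: 1 - prod(1 - p_i), the value for independent modes
    (an upper bound only if the modes are non-negatively correlated).
    """
    probabilities = component_failure_probabilities(betas)
    lower = max(probabilities)
    survival = 1.0
    for p in probabilities:
        survival *= 1.0 - p
    upper = 1.0 - survival
    return lower, upper


# Hypothetical reliability indices for two limit state functions.
betas = [2.5, 3.0]
lower, upper = series_system_bounds(betas)
print(f"{lower:.5f} <= P_f <= {upper:.5f}")
```

The gap between the two bounds shows why the correlation structure matters: a method that uses the correlation matrix can place the estimate between these two extremes.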

Relevance:

30.00%

Publisher:

Abstract:

Although highly active antiretroviral therapy (HAART) in the form of triple combinations of drugs including protease inhibitors can reduce the plasma viral load of some HIV-1-infected individuals to undetectable levels, it is unclear what the effects of these regimens are on latently infected CD4+ T cells and what role these cells play in the persistence of HIV-1 infection in individuals receiving such treatment. The present study demonstrates that highly purified CD4+ T cells from 13 of 13 patients receiving HAART with an average treatment time of 10 months and with undetectable (<500 copies HIV RNA/ml) plasma viremia by a commonly used bDNA assay carried integrated proviral DNA and were capable of producing infectious virus upon cellular activation in vitro. Phenotypic analysis of HIV-1 produced by activation of latently infected CD4+ T cells revealed the presence in some patients of syncytium-inducing virus. In addition, the presence of unintegrated HIV-1 DNA in infected resting CD4+ T cells from patients receiving HAART, even those with undetectable plasma viremia, suggests persistent active virus replication in vivo.

Relevance:

30.00%

Publisher:

Abstract:

This paper describes a variety of statistical methods for obtaining precise quantitative estimates of the similarities and differences in the structures of semantic domains in different languages. The methods include comparing mean correlations within and between groups, principal components analysis of interspeaker correlations, and analysis of variance of speaker by question data. Methods for graphical displays of the results are also presented. The methods give convergent results that are mutually supportive and equivalent under suitable interpretation. The methods are illustrated on the semantic domain of emotion terms in a comparison of the semantic structures of native English and native Japanese speaking subjects. We suggest that, in comparative studies concerning the extent to which semantic structures are universally shared or culture-specific, both similarities and differences should be measured and compared rather than placing total emphasis on one or the other polar position.
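
The first method mentioned, comparing mean correlations within and between groups, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: each speaker is represented by a vector of numeric responses, the group labels and toy data are hypothetical, and Pearson correlation is assumed as the interspeaker measure.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long response vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_within_between(responses, groups):
    """Average interspeaker correlation for same-group and cross-group pairs."""
    within, between = [], []
    for i, j in combinations(range(len(responses)), 2):
        r = pearson(responses[i], responses[j])
        if groups[i] == groups[j]:
            within.append(r)
        else:
            between.append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: four speakers, four questions each.
responses = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [8, 6, 4, 2],
]
groups = ["EN", "EN", "JA", "JA"]

within, between = mean_within_between(responses, groups)
print(within, between)
```

A within-group mean well above the between-group mean points to group-specific structure, while similar values point to shared structure. In this toy data the two groups were deliberately built to disagree.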

Relevance:

30.00%

Publisher:

Abstract:

The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners.
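
To make the idea of a probabilistic segmentation model concrete, the sketch below computes the likelihood of a sequence as the sum, over every way of cutting it into dictionary words, of the product of the word probabilities. It is a minimal dynamic-programming illustration under assumptions, not the MobyDick implementation: the dictionary and its probabilities are made up, and the steps that fit word probabilities and test longer words for significance are omitted.

```python
def segmentation_likelihood(sequence, word_probs):
    """Sum over all segmentations of `sequence` into dictionary words.

    z[i] holds the total probability of the prefix sequence[:i];
    each word that ends at position i contributes p(word) * z[i - len(word)].
    """
    z = [0.0] * (len(sequence) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, len(sequence) + 1):
        for word, prob in word_probs.items():
            start = i - len(word)
            if start >= 0 and sequence[start:i] == word:
                z[i] += prob * z[start]
    return z[len(sequence)]


# Hypothetical dictionary: four single letters plus one two-letter word.
word_probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.2, "AT": 0.1}

# "AT" can be read as A|T (0.3 * 0.2) or as the single word AT (0.1).
print(segmentation_likelihood("AT", word_probs))
```

Because a longer word competes with the concatenation of its shorter parts, a word such as AT earns its place in the dictionary only if it occurs more often than the single-letter frequencies alone would predict.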

Relevance:

30.00%

Publisher:

Abstract:

Electroencephalographic (EEG) signals of the human brain represent electrical activity recorded over the scalp on a number of channels. The main purpose of this thesis is to investigate the interactions and causality between different parts of the brain using EEG signals recorded while subjects performed verbal fluency tasks. Subjects who have Parkinson's Disease (PD) have difficulties with mental tasks, such as switching between one behavior task and another. The behavior tasks include phonemic fluency, semantic fluency, category semantic fluency and reading fluency. This method uses verbal generation skills, activating Broca's area (Brodmann areas BA44 and BA45). Advanced signal processing techniques are used to determine the activated frequency bands in the Granger causality for verbal fluency tasks. A graph learning technique for channel strength is used to characterize the complex graph of Granger causality. Also, the support vector machine (SVM) method is used to train a classifier to distinguish two subjects with PD from two healthy controls. Neural data for the study was recorded at the Colorado Neurological Institute (CNI). The study reveals a significant difference between PD subjects and healthy controls in terms of brain connectivity in Broca's area (BA44 and BA45), as measured at the corresponding EEG electrodes. The results in this thesis also demonstrate the possibility of classifying subjects based on the flow of information and causality in the brain during verbal fluency tasks. These methods have the potential to be applied in the future to identify pathological information flow and causality in neurological diseases.

Relevance:

30.00%

Publisher:

Abstract:

In this paper we present the enrichment of the Integration of Semantic Resources based on WordNet (ISR-WN Enriched). This new proposal improves on the previous one, in which several semantic resources such as SUMO, WordNet Domains and WordNet Affects were related, by adding other semantic resources such as Semantic Classes and SentiWordNet. First, the paper describes the architecture of this proposal, explaining the particularities of each integrated resource. After that, we analyze some problems related to the mappings between different versions and how we solve them. Moreover, we show the advantages that this kind of tool can provide to different applications of Natural Language Processing. In this regard, we demonstrate that the integration of semantic resources provides a multidimensional view in the analysis of natural language.

Relevance:

30.00%

Publisher:

Abstract:

Paper presented at the VII Symposium Nacional de Reconocimiento de Formas y Análisis de Imágenes (SNRFAI), Barcelona, April 1997.

Relevance:

30.00%

Publisher:

Abstract:

This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving error reductions of 59% and 21% respectively, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also found that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.

Relevance:

30.00%

Publisher:

Abstract:

One of the main challenges to be addressed in text summarization concerns the detection of redundant information. This paper presents a detailed analysis of three methods for achieving this goal. The proposed methods rely on different levels of language analysis: lexical, syntactic and semantic. Moreover, they are also analyzed for detecting relevance in texts. The results show that semantic-based methods are able to detect up to 90% of redundancy, compared to only 19% for lexical-based ones. This is also reflected in the quality of the generated summaries: better summaries are obtained when syntactic- or semantic-based approaches are used to remove redundancy.

Relevance:

30.00%

Publisher:

Abstract:

In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarities and sentiment analysis. Once the texts are semantically enriched, we are able to recommend similar content or even to rate texts according to different dimensions. First, we describe the main characteristics of the integrated semantic resources, with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.

Relevance:

30.00%

Publisher:

Abstract:

The geographical proximity to and socioeconomic dependence on the United States brought about a deep-rooted anglicization of the Cuban Spanish lexis and social strata, especially throughout the Neocolonial period (1902–1959). This study is based on a review of a renowned newspaper of that time, Diario de la Marina, and the corresponding compilation of a corpus of English-induced loanwords. Diario de la Marina particularly targeted the upper social class, and only crónicas sociales (society pages’ columns) and print advertising were reviewed because of their fully descriptive texts, which encoded the ruling class ideology and consumerism. The findings show that there was a high number of lexical and cultural anglicisms in the sociolect in question, and that the sociolinguistic anglicization was openly embraced by the upper socioeconomic stratum, serving as a differentiating sign of sophistication and social stratification. Likewise, a number of the anglicisms collected, particularly those related to social events, are unused in contemporary Cuban Spanish, which suggests a major semantic shift in this sociolect after 1959.

Relevance:

30.00%

Publisher:

Abstract:

Final project for the Integrated Master's Degree in Medicine, Faculdade de Medicina, Universidade de Lisboa, 2014

Relevance:

30.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06