992 results for Latent Semantic Indexing


Relevance:

20.00% 20.00%

Publisher:

Abstract:

[EN] Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). It has many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation and Summarization, as well as related tasks such as Information Retrieval and Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
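
As a point of reference for what computing similarity between two textual items can look like, the following is a minimal sketch of a bag-of-words cosine similarity baseline. It is not the knowledge-based method evaluated in the thesis; the function names `tokenize` and `cosine_similarity` and the simple tokenization rule are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the token-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))

    # Only tokens present in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so define its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("a painting of a ship", "a drawing of a ship"))
```

A purely lexical baseline like this scores texts by shared words only, so it cannot see that "painting" and "drawing" are related; that gap is what knowledge-based and distributional approaches aim to close.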

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The brain is perhaps the most complex system to have ever been subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine, to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, to understand and interpret these data will also require substantial strides in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis.

It is divided into two parts. The first begins with an exposition of the general techniques of latent variable modeling. A new, extremely general, optimization algorithm is proposed - called Relaxation Expectation Maximization (REM) - that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to local, sub-optimal, likelihood maxima. REM leads to a natural framework for model size selection; in combination with standard model selection techniques the quality of fits may be further improved, while the appropriate model size is automatically and efficiently determined. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.

The second part brings the technology of part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting; this is the problem of separating the spikes from different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which then yields the first powerful probabilistic solution. The second problem addressed is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed. Inference and learning in this model lead to new principled algorithms for smoothing and clustering of spike data.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

[ES] This work defines semantic change, analyzes the causes that bring it about, and specifies its types in Ancient Greek.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Part I

The latent heat of vaporization of n-decane is measured calorimetrically at temperatures between 160° and 340°F. The internal energy change upon vaporization, and the specific volume of the vapor at its dew point are calculated from these data and are included in this work. The measurements are in excellent agreement with available data at 77° and also at 345°F, and are presented in graphical and tabular form.

Part II

Simultaneous material and energy transport from a one-inch adiabatic porous cylinder is studied as a function of free stream Reynolds Number and turbulence level. Experimental data is presented for Reynolds Numbers between 1600 and 15,000 based on the cylinder diameter, and for apparent turbulence levels between 1.3 and 25.0 per cent. n-heptane and n-octane are the evaporating fluids used in this investigation.

Gross Sherwood Numbers are calculated from the data and are in substantial agreement with existing correlations of the results of other workers. The Sherwood Numbers, characterizing mass transfer rates, increase approximately as the 0.55 power of the Reynolds Number. At a free stream Reynolds Number of 3700 the Sherwood Number showed a 40% increase as the apparent turbulence level of the free stream was raised from 1.3 to 25 per cent.

Within the uncertainties involved in the diffusion coefficients used for n-heptane and n-octane, the Sherwood Numbers are comparable for both materials. A dimensionless Frössling Number is computed which characterizes either heat or mass transfer rates for cylinders on a comparable basis. The calculated Frössling Numbers based on mass transfer measurements are in substantial agreement with Frössling Numbers calculated from the data of other workers in heat transfer.
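
The reported scaling of the Sherwood Number with approximately the 0.55 power of the Reynolds Number can be illustrated with a short calculation. This is a sketch that assumes a pure power law (Sherwood Number proportional to Reynolds Number raised to 0.55) with an unspecified constant prefactor, which cancels in a ratio; the function name `sherwood_ratio` is hypothetical, and no measured values from the thesis are reproduced.

```python
def sherwood_ratio(reynolds_from, reynolds_to, exponent=0.55):
    """Ratio Sh(reynolds_to) / Sh(reynolds_from) under an assumed power law.

    Assumes Sh = C * Re**exponent at a fixed turbulence level, so the
    unknown constant C cancels and only the Reynolds Number ratio matters.
    """
    if reynolds_from <= 0 or reynolds_to <= 0:
        raise ValueError("Reynolds Numbers must be positive")
    return (reynolds_to / reynolds_from) ** exponent


if __name__ == "__main__":
    # Span of Reynolds Numbers quoted in the abstract: 1600 to 15,000.
    print(sherwood_ratio(1600, 15000))
```

Under this assumption, doubling the Reynolds Number raises the mass transfer rate by a factor of 2 to the power 0.55, noticeably less than a doubling.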

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Background: Two distinct trends are emerging with respect to how data is shared, collected, and analyzed within the bioinformatics community. First, Linked Data, exposed as SPARQL endpoints, promises to make data easier to collect and integrate by moving towards the harmonization of data syntax, descriptive vocabularies, and identifiers, as well as providing a standardized mechanism for data access. Second, Web Services, often linked together into workflows, normalize data access and create transparent, reproducible scientific methodologies that can, in principle, be re-used and customized to suit new scientific questions. Constructing queries that traverse semantically rich Linked Data requires substantial expertise, yet traditional RESTful or SOAP Web Services cannot adequately describe the content of a SPARQL endpoint. We propose that content-driven Semantic Web Services can enable facile discovery of Linked Data, independent of their location.

Results: We use a well-curated Linked Dataset - OpenLifeData - and utilize its descriptive metadata to automatically configure a series of more than 22,000 Semantic Web Services that expose all of its content via the SADI set of design principles. The OpenLifeData SADI services are discoverable via queries to the SHARE registry and easy to integrate into new or existing bioinformatics workflows and analytical pipelines. We demonstrate the utility of this system through comparison of Web Service-mediated data access with traditional SPARQL, and note that this approach not only simplifies data retrieval, but simultaneously provides protection against resource-intensive queries.

Conclusions: We show, through a variety of different clients and examples of varying complexity, that data from the myriad OpenLifeData can be recovered without any need for prior knowledge of the content or structure of the SPARQL endpoints. We also demonstrate that, via clients such as SHARE, the complexity of federated SPARQL queries is dramatically reduced.