Biblioteca Digital

We address the problem of mining interesting phrases from subsets of a text corpus, where the subset is specified by a set of features, such as keywords, that form a query. Previous algorithms for this problem sift through either a phrase-dictionary-based index or a document-based index, so their cost is linear in either the phrase dictionary size or the size of the document subset. We propose an independence assumption between query keywords given the top correlated phrases, under which pre-processing reduces to discovering phrases from among the top phrases for each feature in the query. We then outline an indexing mechanism in which per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join can be adapted to do the scoring in real time and identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (over 90%) are achieved against the results of exact algorithms. Due to the simplified list aggregation, we are also able to provide response times that are orders of magnitude better than those of state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.
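
The list aggregation described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: the index layout, the per-keyword score (taken here to be a positive probability-like weight), and the function name `top_phrases` are all hypothetical. Under the independence assumption, a phrase's score for a multi-keyword query is taken to be the product of its per-keyword scores, computed as a sum of logarithms, and the per-keyword lists are combined with a sort-merge join.

```python
import heapq
import math
from typing import Dict, List, Tuple

# Hypothetical index layout (an assumption, not taken from the paper):
# keyword -> list of (phrase, score) pairs sorted by phrase, where each
# score is a positive weight such as an estimate of P(keyword | phrase).
PhraseList = List[Tuple[str, float]]


def top_phrases(index: Dict[str, PhraseList], query: List[str], k: int) -> List[Tuple[str, float]]:
    """Sort-merge join over per-keyword phrase lists, returning the k best phrases."""
    lists = [index[keyword] for keyword in query]
    positions = [0] * len(lists)
    scored: List[Tuple[str, float]] = []
    # Stop as soon as any list is exhausted: no further phrase can match all keywords.
    while all(pos < len(lst) for pos, lst in zip(positions, lists)):
        heads = [lst[pos][0] for pos, lst in zip(positions, lists)]
        largest = max(heads)
        if all(head == largest for head in heads):
            # Independence assumption: multiply per-keyword scores (sum of logs).
            log_score = sum(math.log(lst[pos][1]) for pos, lst in zip(positions, lists))
            scored.append((largest, log_score))
            positions = [pos + 1 for pos in positions]
        else:
            # Advance every list whose head phrase sorts before the largest head.
            positions = [pos + 1 if head < largest else pos
                         for pos, head in zip(positions, heads)]
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

Because each list is scanned once from front to back, the same loop would work over lists streamed from disk, which is in the spirit of the disk-based variant mentioned above.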

Two-part Segmentation of Text Documents

Abstract:

We consider the problem of segmenting text documents that have a two-part structure, such as a problem part and a solution part. Documents of this genre include incident reports, which typically describe events relating to a problem followed by events pertaining to the solution that was tried. Segmenting such documents into their two component parts would render them usable in knowledge reuse frameworks such as Case-Based Reasoning. This segmentation problem presents a hard case for traditional text segmentation due to the lexical inter-relatedness of the segments. We develop a two-part segmentation technique that can harness a corpus of similar documents to model the behavior of the two segments and their inter-relatedness using language models and translation models respectively. In particular, we use separate language models for the problem and solution segment types, whereas the inter-relatedness between segment types is modeled using an IBM Model 1 translation model. We model documents as being generated starting from the problem part, which comprises words sampled from the problem language model, followed by the solution part, whose words are sampled either from the solution language model or from a translation model conditioned on the words already chosen in the problem part. We show, through an extensive set of experiments on real-world data, that our approach outperforms state-of-the-art text segmentation algorithms in segmentation accuracy, and that this improved accuracy translates well to improved usability in Case-Based Reasoning systems. We also analyze the robustness of our technique to varying amounts and types of noise and empirically illustrate that our technique is quite noise tolerant and degrades gracefully with increasing amounts of noise.
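
The generative story in this abstract can be illustrated with a small sketch that scores every candidate split point and keeps the most likely one. This is a simplified reading under loud assumptions, not the authors' method: the mixing weight `mix`, the smoothing `floor`, the dictionary-based models, and the function names are hypothetical, and the translation term averages translation probabilities over the problem words in the spirit of IBM Model 1 (without a NULL word).

```python
import math
from typing import Dict, List

Lm = Dict[str, float]                      # word -> probability (assumed given)
Translation = Dict[str, Dict[str, float]]  # problem word -> {solution word: probability}


def split_log_likelihood(words: List[str], split: int, problem_lm: Lm, solution_lm: Lm,
                         translation: Translation, mix: float = 0.5,
                         floor: float = 1e-6) -> float:
    """Log-likelihood of the document if the solution part starts at index `split`."""
    problem, solution = words[:split], words[split:]
    # Problem words are drawn from the problem language model.
    total = sum(math.log(problem_lm.get(w, floor)) for w in problem)
    for w in solution:
        # Model 1 style term: average translation probability over the problem words.
        trans = sum(translation.get(p, {}).get(w, floor) for p in problem) / len(problem)
        # Solution words come from the solution LM or the translation model (assumed mixture).
        total += math.log(mix * solution_lm.get(w, floor) + (1.0 - mix) * trans)
    return total


def best_split(words: List[str], problem_lm: Lm, solution_lm: Lm,
               translation: Translation) -> int:
    """Return the split index (1 .. len(words) - 1) with the highest log-likelihood."""
    return max(range(1, len(words)),
               key=lambda s: split_log_likelihood(words, s, problem_lm, solution_lm, translation))
```

For example, with a toy document `["printer", "jam", "clear", "tray"]` and models that favour the first two words as problem vocabulary, `best_split` would place the boundary at index 2.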

Session organiser/chair: Architecture and the changing construction of national identity in image, text and building

Applying Machine Learning Methods to Text Corpora and Case Bases

Public health risk communication by text message in response to a cluster of invasive meningococcal infection in a primary school

Abstract:

Public health risk communication during emergencies should be rapid and accurate in order to allow the audience to take steps to prevent adverse outcomes. Delays to official communications may cause unnecessary anxiety due to uncertainty or inaccurate information circulating within the at-risk group. Modern electronic communications present opportunities for rapid, targeted public health risk communication. We present a case report of a cluster of invasive meningococcal disease in a primary school in which we used the school's mass short message service (SMS) text message system to inform parents and guardians of pupils about the incident, to tell them that chemoprophylaxis would be offered to all pupils and staff, and to advise them when to attend the school to obtain further information and antibiotics. Following notification to public health on a Saturday, an incident team met on Sunday, sent the SMS messages that afternoon, and administered chemoprophylaxis to 93% of 404 pupils on Monday. The use of mass SMS messages enabled rapid communication from an official source and greatly aided the public health response to the cluster.

Learning about probability from text and tables: Do color coding and labeling through an interactive-user interface help?

Abstract:

Learning from visual representations is enhanced when learners appropriately integrate corresponding visual and verbal information. This study examined the effects of two methods of promoting integration, color coding and labeling, on learning about probabilistic reasoning from a table and text. Undergraduate students (N = 98) were randomly assigned to learn about probabilistic reasoning from one of four computer-based lessons generated from a 2 (color coding/no color coding) by 2 (labeling/no labeling) between-subjects design. Learners added the labels or color coding at their own pace by clicking buttons in a computer-based lesson. Participants' eye movements were recorded while they viewed the lesson. Labeling was beneficial for learning, but color coding was not. In addition, labeling, but not color coding, increased attention to important information in the table and time spent with the lesson. Both labeling and color coding increased looks between the text and corresponding information in the table. The findings provide support for the multimedia principle, and they suggest that providing labeling enhances learning about probabilistic reasoning from text and tables.

Staging the Alphabet: Text, Performance and the Feminine

Abstract:

My central concern in this thesis is to develop an artistic language that arises from the use of the Greek and Latin alphabets as well as from Greek and English words. In my native country, Greece, there is a tradition of attaching great symbolic significance to letters and numbers. By examining the visual, semiological, and psychological aspects of symbolism, I created artistic works based on the use of type and text in contemporary fine arts, from the perspective of female subjectivity.

Stars in Your Eyes due date card

Abstract:

This is a due date card for the book titled Stars in Your Eyes with handwritten names and stamped dates at bottom from 1942.

Detecting Adverse Events in Clinical Trial Free Text

Abstract:

Thesis (Master's)--University of Washington, 2013

‘Lighter-than-air: the law of the anthropocene’, Art catalogue text for Tomas Saraceno’s Monument to the Anthropocene

Technical Vocabulary and Medieval Text Types: A Semantic Field Approach

Abstract:

This paper seeks to discover in what sense we can classify vocabulary items as technical terms in the later medieval period. In order to arrive at a principled categorization of technicality, distribution is taken as a diagnostic factor: vocabulary shared across the widest range of text types may be assumed to be both prototypical for the semantic field and the most general, and therefore least technical, since lexical items derive at least part of their meaning from context, and a wider range of contexts implies a wider range of senses. A further way of addressing the question of technicality is tested through the classification of the lexis into semantic hierarchies: in the terms of componential analysis, having more components of meaning puts a term lower in the semantic hierarchy and flags it as having a greater specificity of sense, and thus as more technical. The various text types are interrogated through comparison of the number of levels in their hierarchies and the number of lexical items at each level within the hierarchies. Focusing on the vocabulary of a single semantic field, DRESS AND TEXTILES, this paper investigates how four medieval text types (wills, sumptuary laws, petitions, and romances) employ technical terminology in the establishment of the conventions of their genres.

Automatic syllabification for Danish text-to-speech systems

Abstract:

This paper describes a rule-based automatic syllabifier for Danish that uses the Maximal Onset Principle. The work builds on the prior success of rule-based methods applied to Portuguese and Catalan syllabification modules. The system was implemented and tested using a very small set of rules. It achieved word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, since Danish is a language with a complex syllabic structure and was therefore expected to be difficult to handle with rules. Comparison with a data-driven syllabification system using artificial neural networks showed a higher accuracy rate for the former system.
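
To make the Maximal Onset Principle concrete, here is a minimal sketch that assigns as many intervocalic consonants as possible to the onset of the following syllable, provided the result is a permitted onset. Everything specific in it is an assumption for illustration: the vowel set, the toy onset inventory, and the function name `syllabify` are hypothetical and are not the Danish rules used in the paper. Treating each run of vowels as a single nucleus is also a simplification.

```python
from typing import List, Set

# Toy inventories for illustration only; NOT the rule set from the paper.
VOWELS: Set[str] = set("aeiouy")
LEGAL_ONSETS: Set[str] = {"", "k", "n", "r", "s", "t", "st", "tr", "str"}


def syllabify(word: str, legal_onsets: Set[str] = LEGAL_ONSETS,
              vowels: Set[str] = VOWELS) -> List[str]:
    """Split a lowercase word into syllables using the Maximal Onset Principle."""
    # Locate syllable nuclei as maximal runs of vowels (a simplification).
    nuclei = []
    i = 0
    while i < len(word):
        if word[i] in vowels:
            j = i
            while j < len(word) and word[j] in vowels:
                j += 1
            nuclei.append((i, j))
            i = j
        else:
            i += 1
    if not nuclei:
        return [word]

    boundaries = [0]
    for (_, prev_end), (next_start, _) in zip(nuclei, nuclei[1:]):
        cluster = word[prev_end:next_start]
        # Try the longest suffix of the cluster first; the empty onset always matches.
        for cut in range(len(cluster) + 1):
            if cluster[cut:] in legal_onsets:
                boundaries.append(prev_end + cut)
                break
    boundaries.append(len(word))
    return [word[a:b] for a, b in zip(boundaries, boundaries[1:])]
```

With this toy inventory, `syllabify("kontra")` gives `["kon", "tra"]`: the cluster `ntr` is not a permitted onset, so only `tr` moves to the second syllable.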

906 results for Handwritten text