62 results for Text summarization


Relevance: 20.00%

Abstract:

Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large representative collection of newspaper text is fed through a prototype system. In contrast to previous work, the output is subjected to human testing to verify that the text has not been significantly compromised by the information hiding procedure, yielding a success rate of 96% and bandwidth of 0.3 bits per sentence. © 2007 SPIE-IS&T.
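As a toy illustration of the idea (not the paper's system, which applies automated syntactic transforms over full parser output), a single reversible, meaning-preserving rewrite such as adverb fronting can carry one bit per eligible sentence. The adverb list and sentence handling below are simplified assumptions for the sketch:

```python
# Toy sketch of syntactic information hiding: one bit per carrier
# sentence, encoded by choosing between two syntactic variants.
# Hypothetical closed class of movable adverbs (an assumption; the
# paper's transforms are broader and parser-driven).
ADVERBS = {"quietly", "yesterday", "suddenly"}

def embed(sentence, bit):
    """Encode one bit by choosing a variant:
    bit 0 -> adverb-final (canonical), bit 1 -> adverb-fronted."""
    words = sentence.split()
    if words and words[-1] in ADVERBS:
        core, adv = words[:-1], words[-1]
    elif words and words[0] in ADVERBS:
        core, adv = words[1:], words[0]
    else:
        return sentence  # not a carrier: sentence holds no bit
    return " ".join([adv] + core) if bit else " ".join(core + [adv])

def extract(sentence):
    """Recover the bit, or None if the sentence is not a carrier."""
    words = sentence.split()
    if words and words[0] in ADVERBS:
        return 1
    if words and words[-1] in ADVERBS:
        return 0
    return None
```

Because the transform is reversible, the cover text can be restored exactly by re-embedding bit 0, which is what makes the low bandwidth (here, at most one bit per eligible sentence) acceptable in exchange for reliability.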

Relevance: 20.00%

Abstract:

We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically insignificant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended. © 2007 SPIE-IS&T.
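A minimal sketch of the relativiser-switching strategy (toy code, not the authors' system): where "that" and "which" are both acceptable, the choice of relativiser carries one watermark bit. The naive regex below treats every "that"/"which" as switchable; a real implementation would use shallow parsing to exclude demonstratives, complementisers, and restrictive clauses where switching changes meaning:

```python
import re

# Naive pattern: matches every "that"/"which" token. This is an
# oversimplification; the paper's approach checks structural and
# semantic fit before accepting a candidate mark.
RELATIVISER = re.compile(r"\b(?:that|which)\b")

def mark(text, bits):
    """Rewrite each switchable relativiser to encode the next bit
    (0 -> "that", 1 -> "which"); leftover sites are left unchanged."""
    bit_iter = iter(bits)
    def repl(match):
        try:
            b = next(bit_iter)
        except StopIteration:
            return match.group(0)
        return "which" if b else "that"
    return RELATIVISER.sub(repl, text)

def read_mark(text):
    """Recover the bit sequence from the marked text."""
    return [1 if m == "which" else 0 for m in RELATIVISER.findall(text)]
```

The human-judgement evaluation in the abstract is exactly the check such a scheme needs: each switch must preserve grammatical acceptability and meaning, which the naive version above cannot guarantee on its own.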

Relevance: 20.00%

Abstract:

Over the past four decades, immigration to France from the Francophone countries of North Africa has changed in character. For much of the twentieth century, migrants who crossed the Mediterranean to France were men seeking work, who frequently undertook manual labour, working long hours in difficult conditions. Recent decades have seen an increase in family reunification: the arrival of women and children from North Africa, either accompanying their husbands or joining them in France. Contemporary creative representations of migration are shaped by this shift in gender and generation, from a solitary, mostly male experience to one that includes women and children. Just as the shift made new demands of the 'host' society, it made new demands of authors and filmmakers as they sought to represent migration. This study reveals how text and film present new ways of thinking about migration, moving away from the configuration of the migrant as man and worker to take women, children and the ties between them into account.

Relevance: 20.00%

Abstract:

This casebook, the result of the collaborative efforts of a panel of experts from various EU Member States, is the latest in the Ius Commune Casebook series developed at the Universities of Maastricht and Leuven. The book provides a comprehensive and skilfully designed resource for students, practitioners, researchers, public officials, NGOs, consumer organisations and the judiciary. In common with earlier books in the series, this casebook presents cases and other materials (legislative materials, international and European materials, excerpts from books or articles). As non-discrimination law is a comparatively new subject, the chapters search for and develop the concepts of discrimination law on the basis of a wide variety of young and often still emerging case law and legislation. The result is a comprehensive textbook with materials from a wide variety of EU Member States. The book is entirely in English (i.e. materials are translated where not available in English). At the end of each chapter a comparative overview ties the material together, with emphasis, where appropriate, on existing or emerging general principles in the legal systems within Europe.
The book illustrates the distinct relationship between international, European and national legislation in the field of non-discrimination law. It covers the grounds of discrimination addressed in the Racial Equality and Employment Equality Directives, as well as non-discrimination law relating to gender. In so doing, it covers the law of a large number of EU Member States, alongside some international comparisons.
The Ius Commune Casebook on Non-Discrimination Law
- provides practitioners with ready access to primary and secondary legal material needed to assist them in crafting test case strategies.
- provides the judiciary with the tools needed to respond sensitively to such cases.
- provides material for teaching non-discrimination law to law and other students.
- provides a basis for ongoing research on non-discrimination law.
- provides an up-to-date overview of the implementation of the Directives and of the state of the law.
This Casebook is the result of a project which has been supported by a grant from the European Commission's Anti-Discrimination Programme.

Relevance: 20.00%

Abstract:

Vector space models (VSMs) represent word meanings as points in a high-dimensional space. VSMs are typically created from large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words, and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.
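The abstract does not give JNNSE's details, but the general shape of such a model can be sketched as a joint non-negative factorization: one shared word embedding A reconstructs both a corpus matrix and a brain-activation matrix through view-specific dictionaries. The objective, solver (projected gradient), and dimensions below are all assumptions made for illustration, not the authors' algorithm:

```python
import numpy as np

def joint_embed(X_text, X_brain, k, iters=300, lr=0.01, seed=0):
    """Sketch of a joint non-negative embedding: minimize
    ||X_text - A @ Dt||^2 + ||X_brain - A @ Db||^2 with A, Dt, Db >= 0,
    by projected gradient descent. A is shared across both data views."""
    rng = np.random.default_rng(seed)
    n = X_text.shape[0]                      # one row per word
    A = rng.random((n, k))                   # shared word embedding
    Dt = rng.random((k, X_text.shape[1]))    # corpus-view dictionary
    Db = rng.random((k, X_brain.shape[1]))   # brain-view dictionary
    for _ in range(iters):
        # gradient step on A from both reconstruction errors, then clip
        Et, Eb = A @ Dt - X_text, A @ Db - X_brain
        A = np.maximum(A - lr * (Et @ Dt.T + Eb @ Db.T), 0.0)
        # refresh errors, then update each dictionary
        Et, Eb = A @ Dt - X_text, A @ Db - X_brain
        Dt = np.maximum(Dt - lr * (A.T @ Et), 0.0)
        Db = np.maximum(Db - lr * (A.T @ Eb), 0.0)
    return A, Dt, Db
```

Because A is shared, words with brain data but sparse corpus statistics (or vice versa) borrow strength from the other view, which is the intuition behind predicting corpus data for unseen words.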

Relevance: 20.00%

Abstract:

The densely textured surfaces of Aran knitting seem to invite interpretation. They have been ‘read’ as identity documents, family trees, references to natural and spiritual phenomena, or even maps. This paper traces the search for meaning in Aran knitting, examining how these stitch patterns have been ‘read’ in the contexts of tourism, fine art and fashion. As Jo Turney (2013:55) argues, the idea of knitted textiles as communicative media in non-literate societies ‘consigns the garments to a preindustrial era of more rural and simple times’, situating them in an imagined state of ‘stasis’. Thus the ways in which Aran stitches are ‘read’ sometimes obscure the processes through which they are ‘written’, whether in terms of individual authorship and creativity, or in terms of their manufacture. Regardless of the historical veracity of claims that particular Aran stitch patterns index features of the social, natural or spiritual worlds, analysing the ways they have been ‘read’ in the context of comparable textile traditions, other crafts that have taken on ‘heritage’ souvenir status, and Irish national identity reveals how Aran knitting has performed broader communicative functions (see Sonja Andrew 2008). These functions continue to be subverted and elaborated by fine artists, and translated into couture and mass-market fashion products.

Relevance: 20.00%

Abstract:

We address the problem of mining interesting phrases from subsets of a text corpus, where the subset is specified using a set of features, such as keywords, that form a query. Previous algorithms for the problem sift through either a phrase-dictionary-based index or a document-based index, so their cost is linear in either the phrase dictionary size or the size of the document subset. We propose using an independence assumption between query keywords given the top correlated phrases, whereby pre-processing reduces to discovering phrases from among the top phrases per feature in the query. We then outline an indexing mechanism in which per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-Merge Join may be adapted to score phrases at query time and identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (over 90%) are achieved against the results of exact algorithms. Due to the simplified list aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.
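A toy sketch of the per-keyword index idea (the data, the additive scoring, and the exhaustive aggregation below are illustrative assumptions; the paper adapts threshold-style algorithms such as No Random Access to avoid reading whole lists):

```python
from collections import defaultdict
import heapq

# Hypothetical precomputed per-keyword phrase lists: each keyword maps
# to (phrase, score) pairs, sorted by descending score as NRA expects.
index = {
    "battery": [("battery drain", 0.75), ("replace battery", 0.5)],
    "drain":   [("battery drain", 0.5), ("slow drain", 0.25)],
}

def top_phrases(query_keywords, index, k=2):
    """Aggregate per-keyword scores for each phrase (here by simple
    summation, justified by the independence assumption between query
    keywords) and return the k highest-scoring phrases."""
    scores = defaultdict(float)
    for kw in query_keywords:
        for phrase, s in index.get(kw, []):
            scores[phrase] += s
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
```

For the query ["battery", "drain"], "battery drain" appears in both lists and so aggregates the highest score. The sorted lists are what makes threshold algorithms applicable: they can stop scanning once no unseen phrase can overtake the current top k, which is where the claimed response-time gains come from.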

Relevance: 20.00%

Abstract:

We consider the problem of segmenting text documents that have a two-part structure, such as a problem part and a solution part. Documents of this genre include incident reports, which typically describe events relating to a problem followed by those pertaining to the solution that was tried. Segmenting such documents into their two component parts would render them usable in knowledge reuse frameworks such as Case-Based Reasoning. This segmentation problem presents a hard case for traditional text segmentation due to the lexical inter-relatedness of the segments. We develop a two-part segmentation technique that harnesses a corpus of similar documents to model the behavior of the two segments and their inter-relatedness, using language models and translation models respectively. In particular, we use separate language models for the problem and solution segment types, whereas the inter-relatedness between segment types is modeled using an IBM Model 1 translation model. We model documents as being generated starting from the problem part, which comprises words sampled from the problem language model, followed by the solution part, whose words are sampled either from the solution language model or from a translation model conditioned on the words already chosen in the problem part. We show, through an extensive set of experiments on real-world data, that our approach outperforms state-of-the-art text segmentation algorithms in segmentation accuracy, and that this improved accuracy translates to improved usability in Case-Based Reasoning systems. We also analyze the robustness of our technique to varying amounts and types of noise and empirically illustrate that our technique is quite noise tolerant and degrades gracefully with increasing amounts of noise.
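The generative story above suggests a simple split search, sketched below with unigram language models and a toy translation table standing in for the paper's trained models (the mixture weight, smoothing floor, and max-over-problem-words approximation of IBM Model 1 are assumptions made for illustration):

```python
import math

def best_split(words, p_problem, p_solution, t_table, mix=0.5, floor=1e-6):
    """Return the index that splits `words` into problem/solution parts
    with the highest log-likelihood: problem words come from the problem
    LM; each solution word comes from a mixture of the solution LM and a
    translation model conditioned on the chosen problem words."""
    best, best_ll = 1, float("-inf")
    for cut in range(1, len(words)):
        problem, solution = words[:cut], words[cut:]
        ll = sum(math.log(p_problem.get(w, floor)) for w in problem)
        for w in solution:
            # best translation source among the problem words (a crude
            # stand-in for the full IBM Model 1 alignment sum)
            trans = max((t_table.get((p, w), 0.0) for p in problem),
                        default=0.0)
            ll += math.log(mix * p_solution.get(w, floor)
                           + (1 - mix) * trans + floor)
        if ll > best_ll:
            best, best_ll = cut, ll
    return best
```

The translation term is what handles the lexical inter-relatedness that defeats traditional segmenters: a solution word like "roller" that never appears in solution training text can still score well if it translates a problem word like "jams".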