Biblioteca Digital

Latent topics derived by topic models such as Latent Dirichlet Allocation (LDA) are the result of hidden thematic structures which provide further insights into the data. The automatic labelling of such topics derived from social media poses new challenges, however, since topics may characterise novel events happening in the real world. Existing automatic topic labelling approaches which depend on external knowledge sources become less applicable here, since relevant articles/concepts of the extracted topics may not exist in external sources. In this paper we propose to address the problem of automatic labelling of latent topics learned from Twitter as a summarisation problem. We introduce a framework which applies summarisation algorithms to generate topic labels. These algorithms are independent of external sources and rely only on the identification of dominant terms in documents related to the latent topic. We compare the efficiency of existing state-of-the-art summarisation algorithms. Our results suggest that summarisation algorithms generate better topic labels, which capture event-related context, compared to the top-n terms returned by LDA. © 2014 Association for Computational Linguistics.
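The idea of labelling a topic from dominant terms alone can be illustrated with a minimal frequency-based sketch. This is not the framework from the paper: the function names `tokenize` and `label_topic`, the tokenisation rule, and the scoring (average relative term frequency per document) are all assumptions chosen for illustration. Given the documents associated with one topic, it returns the document whose terms are on average the most frequent across that set.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens (assumed rule)."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def label_topic(documents, stopwords=frozenset()):
    """Pick one document as the topic label, using term frequencies only.

    Each document is scored by the average relative frequency of its terms
    within the topic's documents; no external knowledge source is consulted.
    """
    tokenized = [[t for t in tokenize(d) if t not in stopwords] for d in documents]
    counts = Counter(t for doc in tokenized for t in doc)
    total = sum(counts.values())

    def score(tokens):
        if not tokens or total == 0:
            return 0.0
        return sum(counts[t] / total for t in tokens) / len(tokens)

    # max() keeps the first document on ties
    best = max(range(len(documents)), key=lambda i: score(tokenized[i]))
    return documents[best]
```

A label chosen this way is a full tweet-like sentence rather than a bag of top-n terms, which is the contrast the abstract draws.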


Technology of Storage and Processing of Electronic Documents with Intellectual Search Properties

Relevance: 30.00%

Abstract:

A technology for recording, storing and processing texts, based on the creation of integer index cycles, is discussed. Algorithms for exact-match search and for similarity search based on natural-language queries are considered. The software implementing the proposed approaches is described, and examples of electronic archives with intellectual search properties are given.


Using Covariance as a Similarity Measure for Document Language Identification in Hard Contexts

Relevance: 30.00%

Abstract:

2000 Mathematics Subject Classification: C2P99.


Language policy and governmentality in businesses in Wales: a continuum of empowerment and regulation

Relevance: 30.00%

Abstract:

In this paper, I examine how language policy acts as a means of both empowering the Welsh language and the minority language worker and as a means of exerting power over them. For this purpose, the study focuses on a particular site: private sector businesses in Wales. Therein, I trace two major discursive processes: first, the Welsh Government’s national language policy documents that promote corporate bilingualism and bilingual employees as value-added resources; second, the practice and discourse of company managers who sustain or appropriate such promotional discourses for creating and promoting their own organisational values. By drawing on concepts from governmentality, critical language policy and discourse studies, I show that promoting bilingualism in business is characterised by local and global governmentalities. These not only bring about critical shifts in valuing language as symbolic entities attached to ethnonational concerns or as promotional objects that bring material gain. Language governmentalities also appear to shape new forms of ‘languaging’ the minority language worker as self-governing and yet governed subjects who are ultimately made responsible for ‘owning’ Welsh.


Ranked search on data graphs

Relevance: 30.00%

Abstract:

Graph-structured databases are prevalent, and the problem of effective search and retrieval from such graphs has received much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers, where each answer is a substructure of the graph that contains all query keywords and illustrates the relationships between the keywords present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy the user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is authority-flow-based ranking. Given a top-k keyword search query on a graph, an authority-flow-based search finds the top-k answers, where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved authority-flow-based searches on data graphs by creating a framework to explain and reformulate them, taking into consideration user preferences and feedback.
We also applied the proposed graph search techniques to information discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
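Authority-flow ranking of the kind described above can be sketched as a personalised-PageRank-style power iteration. This is a generic illustration, not the dissertation's algorithm: the function name `authority_flow_scores`, the damping value, the iteration count, and the choice to treat the nodes that contain the query keywords as the "base set" are all assumptions.

```python
def authority_flow_scores(edges, base_set, damping=0.85, iterations=50):
    """Score graph nodes by authority flowing out of a query's base set.

    edges     -- iterable of (source, target) pairs of a directed graph
    base_set  -- non-empty set of nodes assumed to contain the query keywords
    Returns a dict mapping each node to its score; scores sum to 1.
    """
    base_set = set(base_set)
    if not base_set:
        raise ValueError("base_set must not be empty")
    nodes = sorted({n for edge in edges for n in edge} | base_set)
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    # Random jumps land only on base-set nodes, which ties scores to the query.
    jump = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    score = dict(jump)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * score[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:
                # Dangling node: hand its authority back to the base set.
                for m in base_set:
                    nxt[m] += damping * score[n] * jump[m]
        score = nxt
    return score
```

Sorting the nodes by score in descending order and keeping the first k gives a top-k answer list in the sense used in the abstract.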


Improving cross language information retrieval using corpus based query suggestion approach

Relevance: 30.00%

Abstract:

Users seeking information may not find relevant information pertaining to their information need in a specific language. The information may be available in a language different from their own, but users may not know that language. Thus users may experience difficulty in accessing the information present in different languages. Since the retrieval process depends on the translation of the user query, there are many issues in getting the right translation of the user query. For a pair of languages chosen by a user, the available resources, such as an incomplete dictionary or an inaccurate machine translation system, may be insufficient to map the query terms in one language to their equivalent terms in another language. Also, for a given query, there might exist multiple correct translations. The underlying corpus evidence may suggest a clue to select a probable set of translations that could eventually yield better information retrieval. In this paper, we present a cross-language information retrieval approach to effectively retrieve information present in a language other than the language of the user query, using a corpus-driven query suggestion approach. The idea is to utilize the corpus-based evidence of one language to improve the retrieval and re-ranking of news documents in the other language. We use the FIRE corpora (Tamil and English news collections) in our experiments and illustrate the effectiveness of the proposed cross-language information retrieval approach.


National Cinemas and Non-Hegemonic Languages. The Invisibility of Galician Language Facing the Policies for Cultural Diversity

Relevance: 30.00%

Abstract:

In the spirit of the proposals of the Agenda 2020 about the structural role of cinema in the configuration of European identities, this article highlights the significance of national cinemas in non-hegemonic languages in shaping a diverse European culture. Following this perspective, we use Galician cinema as a case study in which we analyze the presence (or, more precisely, the absence) of the Galician language in the original version of the feature films released between 2008 and 2012. This proposal is hosted by the I+D+I project eDCINEMA: “Towards the European Digital Space. The role of small cinemas in original version” (Ref. CSO2012-35784), financed by the Ministry of Economy and Competitiveness of Spain.


From labelling to social aid for immigrants in the 21st Century

Relevance: 30.00%

Abstract:

Migration is as old as humanity, but since the 1990s migration flows in Western Europe have led to societies that are not just multicultural but so-called «super-diverse». As a result, Western towns now have very complex social structures, with, amongst others, large numbers of small immigrant communities that are in constant change. In this paper we argue that for social workers to be able to offer adequate professional help to non-native residents in town, they will need a balanced view of ‘culture’ and of the role culture plays in social aid. Culture is never static, but is continually changing. By teaching social workers how to look at the cultural backgrounds of immigrant groups and about the limitations of the role that culture plays in communication, they will be better equipped to provide adequate aid, will contribute to making various groups grow towards each other, and will help people avoid thinking in terms of ‘out-group homogeneity’. Nowadays, inclusion is a priority in social work that almost every social worker supports. Social workers should have an open attitude that allows them to approach every individual as a unique person. They will see the other person as the person they are, and not as a part of a specific cultural group. Knowledge about others makes them see the cultural heterogeneity in every group. The social sector, though, must be careful not to fall into the trap of ‘inclusion mania’, which causes the social deprivation of a particular group to be forgotten. An inclusive policy requires an inclusive society. Otherwise, this could result in even more deprivation of other groups that are already discriminated against. Emancipation of deprived people demands a certain degree of target-group policymaking. Categorized aid will raise the efficiency of working with immigrants and of acknowledging the cultural identity of the non-native group. It will also create the possibility to work on fighting social deprivation, in which most immigrants can be found.


Digitizing dissent: cyborg politics and fluid networks in contemporary Cuban activism

Relevance: 30.00%

Abstract:

Communication technologies shape how political activist networks are produced and maintain themselves. In Cuba, despite ideologically and physically oppressive practices by the state, a severe lack of Internet access, and extensive government surveillance, a small network of bloggers and cyberactivists has achieved international visibility and recognition for its critiques of the Cuban government. This qualitative study examines the blogger collective known as Voces Cubanas in Havana, Cuba in 2012, advancing a new approach to the study of transnational activism and the role of technology in the construction of political narrative. Voces Cubanas is analyzed as a network of connections between human and non-human actors that produces and sustains powerful political alliances. Voces Cubanas and its allies work collectively to co-produce contentious political discourses, confronting the dominant ideologies and knowledges produced by the Cuban state. Transnational alliances, the act of translation, and a host of unexpected and improvised technologies play central roles in the production of these narratives, indicating a new breed of cyborg sociopolitical action reliant upon fluid and flexible networks and the act of writing.


Rethinking Copyright: History, Theory, Language

Relevance: 30.00%

Abstract:

This book provides the reader with a critical insight into the history and theory of copyright within contemporary legal and cultural discourse. It exposes as myth the orthodox history of the development of copyright law in eighteenth-century Britain and explores the way in which that myth became entrenched throughout the nineteenth and early twentieth centuries. To this historical analysis are added two theoretical approaches to copyright not otherwise found in mainstream contemporary texts. Rethinking Copyright introduces the reader to copyright through the prism of the public domain before considering how best to locate copyright within the parameters of traditional property discourse. Underpinning these various historical and theoretical strands, the book explores the constitutive power of legal writing and the place of rhetoric in framing and determining contemporary copyright policy and discourse.


Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments

Relevance: 30.00%

Abstract:

Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22 – 40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple-choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed.


987 results for HTML (Language for Labelling Documents)
