905 results for Text categorisation


Relevance:

20.00% 20.00%

Publisher:

Abstract:

XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities that have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but they can also be confused with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content, including titles, captions, and references, is associated with the indexed full-text. Experimental results show that, at least for certain types of document collections, the proposed methods help us find the relevant answers.
Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
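
The idea of relating character content to XML tags can be illustrated with a minimal sketch. The abstract does not state which measure the authors use, so the characters-per-element ratio, the function names and the threshold of 20 below are all assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET


def text_to_tag_ratio(element):
    """Return the number of text characters per element in a fragment."""
    n_chars = sum(len(t.strip()) for t in element.itertext())
    n_elements = sum(1 for _ in element.iter())  # includes the root element
    return n_chars / n_elements


def classify_fragment(element, threshold=20.0):
    """Label a fragment 'full-text' or 'data'; the threshold is hypothetical."""
    return "full-text" if text_to_tag_ratio(element) >= threshold else "data"


doc = ET.fromstring(
    "<doc>"
    "<record><id>17</id><year>2005</year><lang>en</lang></record>"
    "<section><p>XML is used for many kinds of content, ranging from "
    "records of data fields to <em>narrative full-text</em>.</p></section>"
    "</doc>"
)

# Densely tagged fragments with little text look like data records;
# sparsely tagged fragments with long text runs look like full-text.
for child in doc:
    print(child.tag, round(text_to_tag_ratio(child), 1), classify_fragment(child))
```

A real collection would presumably need a threshold tuned on sample documents, and possibly a different measure altogether.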

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak adopts a novel approach to feature extraction using edge detection and image compression, which provides the input to the Artificial Neural Network that recognizes the gesture. This technique also handles blurred images. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper describes an approach based on Zernike moments and Delaunay triangulation for localization of hand-written text in machine-printed text documents. The Zernike moments of the image are first evaluated, and we classify the text as hand-written using the nearest neighbor classifier. These features are independent of size, slant, orientation, translation and other variations in handwritten text. We then use Delaunay triangulation to reclassify the misclassified text regions. After imposing Delaunay triangulation on the centroid points of the connected components, we extract features based on the triangles and reclassify the text. We remove the noise components in the document as part of the preprocessing step, so this method works well on noisy documents. The success rate of the method is found to be 86%. For specific hand-written elements such as signatures or similar text, the accuracy is found to be even higher, at 93%.
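
The nearest neighbor step named in this abstract can be sketched as follows. This is only an illustration: the feature vectors are made-up numbers standing in for Zernike moment magnitudes, the Euclidean distance is an assumption, and neither the moment computation nor the Delaunay reclassification is shown.

```python
import math


def nearest_neighbour(sample, training_set):
    """Return the label of the training vector closest to `sample`."""
    best_label, _ = min(
        ((label, math.dist(sample, features)) for features, label in training_set),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical feature vectors; real ones would come from Zernike moments.
training_set = [
    ((0.90, 0.10, 0.05), "printed"),
    ((0.85, 0.15, 0.10), "printed"),
    ((0.40, 0.55, 0.60), "handwritten"),
    ((0.35, 0.60, 0.50), "handwritten"),
]

print(nearest_neighbour((0.38, 0.58, 0.55), training_set))
```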

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper presents an overview of the 6th ALTA shared task that ran in 2015. The task was to identify in English texts all the potential cognates from the perspective of the French language. In other words, identify all the words in the English text that would acceptably translate into a similar word in French. We present the motivations for the task, the description of the data and the results of the 4 participating teams. We discuss the results against a baseline and prior work.
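
One simple way to approach such a task is to compare each English word with a French word list by string similarity. The sketch below is an assumption for illustration only: the abstract does not describe the baseline or the participants' systems, and the tiny lexicon, the `difflib` similarity measure and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity between two words, in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cognate_candidates(english_words, french_lexicon, threshold=0.6):
    """Map each English word to its most similar French word, if close enough."""
    candidates = {}
    for word in english_words:
        best = max(french_lexicon, key=lambda fr: similarity(word, fr))
        if similarity(word, best) >= threshold:
            candidates[word] = best
    return candidates


# Toy data; a real system would use a full French lexicon.
french_lexicon = ["nation", "liberté", "chien", "table", "musique"]
english_words = ["nation", "liberty", "dog", "table", "music"]

found = cognate_candidates(english_words, french_lexicon)
print(found)
```

Surface similarity alone cannot tell whether the French word is an acceptable translation, so a sketch like this would also accept false friends.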

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Advertising and marketing institutions produce categorisations of different groups of population. These categorisations serve as tools for addressing the potential consumers. This research is about how and what kind of categorisations of consumerhood are produced and how they are used as governing patterns within the institutions of advertising. My goal is to shed light on methods, cultural patterns and discourses for making people become consumers, objects for marketing measures. The data consists of 23 qualitative thematic interviews with Finnish advertising professionals. Moreover, examples are drawn from professional magazines and brochures of media agencies and marketing research organisations. First, I present some of the official consumer categories in the consumer monitors produced by research organisations. Then, I analyse the unofficial consumer categories which are produced by advertising professionals in the interviews. The methodological framework is based on discourse theory and especially on Michel Foucault's ideas on power, governmentality, and discourses. Discursive categorisation of the population is one of the means of governmentality used in marketing and advertising. Knowledge from consumer research is used as a tool for governing the potential consumers, even though real consumers can always behave against the marketer's wishes. The marketers cannot make people buy certain products or services, but they aim at influencing people in such a way that they want to buy the products and start to govern themselves. As a result, I present six unofficial discursive consumer categories, which are used by the advertising professionals. The consumerhood may be represented as rational, self-fulfilling, indifferent, whimsical, manipulated or sovereign. However, the discursive consumer categorisations are overlapping and controversial.
The interviewed advertising professionals construct their own particular position in relation to the consumers, who are viewed as 'others'. On the other hand, the interviewees may talk about themselves as consumers. Finally, I maintain that the consumers and target groups of advertising are viewed as commodities in advertising institutions. The end product of product development is not only the product; the aim is also to produce the consumer of the product. Research on the ways advertising professionals aim to govern consumers gives knowledge of the networks of power in which people act within consumer culture.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The study analyses the reaction of urban residents to problems, i.e. disturbing factors, in their living environment, and also their ways of doing something about these problems. It is based on urban-sociological theory on everyday life in a modern metropolis. On this theoretical basis, problems in the urban living environment are analysed in terms of a policy of everyday interference: when urban citizens become aware of a problem in their environment, they face a pattern of behaviour where the norm is polite indifference and negative solidarity. They may feel they ought to do something about the problem, but at the same time, an implicit rule of urban life is not to interfere with other people's lives so they won't interfere with yours. For example, it is not that easy for someone disturbed by littering to complain directly to those who litter the streets. Or if you complain about tobacco smoke from the neighbour's balcony, your neighbours might get cross. Direct interference with a problem in the environment usually implies an encounter with a hitherto unknown counterpart and their possible counter-reaction. The risk is either to lose face or to get into downright conflict. Therefore, an easier way may be to complain to the city authorities. The Helsinki City Environment Centre is currently working on solutions for all the various kinds of problems that occur in a dense urban structure. Various ways of conceptualising the problems in the living environment are analysed empirically using theme interviews made with citizens who had contacted the Helsinki City Environment Centre. A phenomenographic approach and a theory-based categorisation are applied to the analysis of the theme interviews.
On the grounds of the analysis, the ways of conceptualising are determined by 1) the difficulty of interfering and convincing other people, which in practice means meddling in other people's business, 2) a territorial struggle for space and a place in a dense urban structure, 3) breaches of rules and norms for social routines in urban life, and 4) a crumbling of the urban identity and all that goes along with that. The analysis of the ways of conceptualisation is deepened using a cultural risk theory. The final outcome of the analysis is four types of behaviour among urban residents with regard to interference with everyday problems in the living environment. They have been called 'yard police', 'fence builder', 'park warden' and 'environmental caretaker'. The study combines an urban-sociological approach with the theoretical tradition of urban research and with research on municipal environmental policy.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes inverse filtered glottal flow and all-pole modelling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, the features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Despite the acknowledged importance of strategic planning in business and other organizations, there are few studies focusing on strategy texts and the related processes of their production and consumption. In this paper, we attempt to partially fill this research gap by examining the institutionalized aspects of strategy discourse: what strategy is as a genre. Combining textual analysis and analysis of conversation, the article focuses on the official strategy of the City of Lahti in Finland. Our analysis shows how specific communicative purposes and lexico-grammatical features characterize the genre of strategy and how the actual negotiations over strategy text involve particular kinds of intersubjectivity and intertextuality.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose two texture-based approaches, one involving Gabor filters and the other employing log-polar wavelets, for separating text from non-text elements in a document image. Both of the proposed algorithms compute local energy at some information-rich points, which are marked by Harris' corner detector. The advantage of this approach is that the algorithm calculates the local energy at selected points and not throughout the image, which saves considerable computation time. The algorithm has been tested on a large set of scanned text pages, and the results are better than those of existing algorithms. Among the proposed schemes, the Gabor-filter-based scheme marginally outperforms the wavelet-based scheme.
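
Computing local energy only at selected points can be sketched as below. This is a rough illustration under several assumptions: the points are chosen by hand rather than by Harris' corner detector, only the real (cosine) part of the Gabor filter is used, and the kernel size, wavelength, orientations and sigma are hypothetical values not taken from the paper.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel as a size x size list of lists; size is odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * x_rot / wavelength))
        kernel.append(row)
    return kernel


def local_energy(image, point, kernels):
    """Sum of squared filter responses at a single (row, col) point."""
    r0, c0 = point
    energy = 0.0
    for kernel in kernels:
        half = len(kernel) // 2
        response = 0.0
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                r, c = r0 + dy, c0 + dx
                if 0 <= r < len(image) and 0 <= c < len(image[0]):
                    response += image[r][c] * kernel[dy + half][dx + half]
        energy += response * response
    return energy


# Toy 16 x 16 image: vertical stripes on the left, empty background on the right.
image = [[1 if c < 8 and c % 4 < 2 else 0 for c in range(16)] for _ in range(16)]
kernels = [gabor_kernel(7, 4.0, theta, 2.0) for theta in (0.0, math.pi / 2)]

# Energy is evaluated at two hand-picked points only, not at every pixel.
striped = local_energy(image, (8, 4), kernels)
empty = local_energy(image, (8, 12), kernels)
print(striped > empty)
```

Each evaluation costs one kernel-sized sum per filter, so the saving over filtering the whole image grows with the ratio of pixels to selected points.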

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations in carrying out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
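
A horizontal projection profile can be computed as in the sketch below. The abstract does not define "length-normalized"; dividing each row sum by the block width is one plausible reading and is an assumption here, as is the toy binary block.

```python
def horizontal_projection_profile(block):
    """Fraction of ink pixels in each row of a binary block (1 = ink)."""
    width = len(block[0])
    return [sum(row) / width for row in block]


# Toy block: two text lines of two rows each, separated by blank rows.
text_block = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

profile = horizontal_projection_profile(text_block)
print(profile)
```

The alternation of dense rows and empty rows is the kind of regularity the abstract attributes to printed text blocks.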

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper proposes and compares four methods of binarizing text images captured using a camera mounted on a cell phone. The advantages and disadvantages (image clarity and computational complexity) of each method over the others are demonstrated through binarized results. The images are of VGA or lower resolution.
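
The abstract does not name the four methods, so the sketch below is not one of them. It shows Otsu's global threshold, a widely known binarization technique, only to illustrate what binarizing a grayscale image involves; the pixel values are made up.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximises the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue  # no pixels at or below t yet
        if w_fg == 0:
            break  # every pixel is already in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        variance = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if variance > best_var:
            best_t, best_var = t, variance
    return best_t


def binarize(pixels):
    """Map pixels above the Otsu threshold to 1 and the rest to 0."""
    t = otsu_threshold(pixels)
    return [1 if p > t else 0 for p in pixels]


# Made-up gray levels: four dark pixels and four bright pixels.
pixels = [20, 25, 30, 22, 200, 210, 205, 198]
print(binarize(pixels))
```

A single global threshold is cheap to compute, but it tends to struggle with the uneven lighting common in camera-captured images.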

Relevance:

20.00% 20.00%

Publisher:

Abstract:

At the heart of this study is the dual concern of how the nation is represented as a categorical entity and how this is put to use in everyday social interactions. This can be seen as a reaction to the general approach to categorisation and identity functions that tend to be reified and essentialized within the social sciences. The empirical focus of this study is the Isle of Man, a crown dependency situated geographically central within the British Isles while remaining politically outside the United Kingdom. This site was chosen explicitly because ‘notions of nation’ expressed on the island can be seen as being contested and ephemerally unstable. To get at these ‘notions of nation’ it was necessary to choose specific theoretical tools that were able to capture the wider cultural and representational domain while being capable of addressing the nuanced and functional aspects of interaction. As such, the main theoretical perspective used within this study was that of critical discursive psychology, which incorporates the specific theoretical tools of interpretative repertoires, ideological dilemmas and subject positions. To supplement these tools, a discursive approach to place was taken in tandem to address the form and function of place attached to nationhood. Two methods of data collection were utilized: computer-mediated communication and acquaintance interviews. From the data a number of interpretative repertoires were proposed, namely essential rights, economic worth, heritage claims, conflict orientation, people-as-nation and place-as-nation. Attached to these interpretative repertoires were the ideological dilemmas region vs. country, people vs. place and individualism vs. collectivism. The subject positions found are much more difficult to condense, but the most significant ones were gender, age and parentage.
The final focus of the study, that of place, was shown to be more than just an unreflected-upon ‘container’ of people; the rhetorical construction of such places was significant for how people saw themselves and for the discursive function of the particular interaction. As such, certain forms of place construction included size, community, temporal, economic, safety, political and recognition. A number of conclusions were drawn from the above, including that, when looking at nation categories, we should take into account the specific meanings that people attach to such concepts and be aware of the particular uses they are put to in interaction. Also, it is impossible to separate concepts neatly, but it is necessary to be aware of the intersections where concepts cross, and clash, when looking at nationhood.