997 results for Computer linguistics


Relevance:

70.00%

Publisher:

Abstract:

It is important to help researchers find valuable papers in a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that uses both intra- and inter-network information to rank papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that, in ranking papers, MutualRank greatly outperforms state-of-the-art competitors, including PageRank, HITS, CoRank, FutureRank, and P-Rank, both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
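For orientation, the simplest of the baselines named in this abstract, PageRank, can be sketched by power iteration over a small citation graph. This is not MutualRank, whose update rules the abstract does not give; the toy graph, the damping factor of 0.85, and the function name `pagerank` are assumptions made for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: [nodes it cites]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's damped score evenly among the papers it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (cites nothing): spread its score uniformly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical citation graph: paper -> papers it cites.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = pagerank(citations)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the paper at the end of the citation chain (`D`, which cites nothing and would typically be the oldest) ranks first. That is one plausible reading of the time-related ranking bias the abstract describes.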

Relevance:

60.00%

Publisher:

Abstract:

This paper presents our system for the CogALex-IV 2014 shared task of identifying the single word most semantically related to a group of 5 words (queries). Our system uses an implementation of a neural language model and identifies the answer word by finding the word representation most semantically similar to the sum of the query representations. It is a fully unsupervised system trained on around 20% of the ukWaC corpus. Out of 2,000 queries, it identifies 85 exact targets, and 285 approximate targets in lists of 5 suggestions.
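The retrieval step described here (sum the query vectors, then pick the nearest word representation) can be sketched as follows. The 3-dimensional toy embeddings, the use of cosine similarity, the exclusion of query words from the candidates, and the function names are all assumptions; the abstract does not say which similarity measure or model settings the system used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_related(queries, embeddings, k=5):
    """Return the k vocabulary words closest to the sum of the query vectors."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word in queries:
        total = [t + x for t, x in zip(total, embeddings[word])]
    # Assumption: the query words themselves are not valid answers.
    candidates = [w for w in embeddings if w not in queries]
    candidates.sort(key=lambda w: cosine(total, embeddings[w]), reverse=True)
    return candidates[:k]


# Hypothetical embeddings; a real system would learn them from a corpus.
toy_embeddings = {
    "wheel": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "road": [0.7, 0.3, 0.0],
    "car": [0.85, 0.2, 0.05],
    "banana": [0.0, 0.1, 0.9],
}
suggestions = most_related(["wheel", "engine", "road"], toy_embeddings, k=2)
```

Returning a list of `k` suggestions rather than a single word mirrors the two evaluation settings in the abstract: the exact target, and the target appearing within a list of 5 suggestions.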

Relevance:

60.00%

Publisher:

Abstract:

Experience shows that developing business applications based on text analysis normally requires a lot of time and expertise in the field of computer linguistics. Several approaches to integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach that would enable building scalable and flexible text analysis applications in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experience with building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components, and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text-analytics-enabled business application that addresses a concrete business scenario. © 2010 IEEE.

Relevance:

30.00%

Publisher:

Abstract:

Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.

Relevance:

30.00%

Publisher:

Abstract:

Bibliography of American linguistics, 1926-1928 in v. 6, p. 69-75.

Relevance:

30.00%

Publisher:

Abstract:

This study presents a detailed contrastive description of the textual functioning of connectives in English and Arabic. Particular emphasis is placed on the organisational force of connectives and their role in sustaining cohesion. The description is intended as a contribution to a better understanding of the variations in the dominant tendencies for text organisation in each language. The findings are expected to be utilised for pedagogical purposes, particularly in improving EFL teaching of writing at the undergraduate level. The study is based on an empirical investigation of the phenomenon of connectivity and, for optimal efficiency, employs computer-aided procedures, particularly those adopted in corpus linguistics, for investigatory purposes. One important methodological requirement is the establishment of two comparable and statistically adequate corpora, as well as the design of software and the use of existing packages to achieve the basic analysis. Each corpus comprises ca 250,000 words of newspaper material sampled in accordance with a specific set of criteria and assembled in machine-readable form prior to the computer-assisted analysis. A suite of programmes has been written in SPITBOL to accomplish a variety of analytical tasks, and in particular to perform a battery of measurements intended to quantify the textual functioning of connectives in each corpus. Concordances and some word lists are produced using OCP. The results of this research confirm the existence of fundamental differences in text organisation in Arabic in comparison to English. These differences manifest themselves in the way textual operations of grouping and sequencing are performed, and in the intensity of the textual role of connectives in imposing linearity and continuity and in maintaining overall stability. Furthermore, computation of connective functionality and range of operationality has identified fundamental differences in the way favourable choices for text organisation are made and implemented.
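The two kinds of measurement mentioned here, connective frequency counts and concordances, can be illustrated with a short sketch. It is written in Python rather than the SPITBOL and OCP tools the study used, and the tokenizer, the connective list, the sample sentence, and the function names are assumptions; it handles single-word English connectives only, and Arabic text would need its own tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def connective_rates(text, connectives, per=1000):
    """Occurrences of each connective per `per` tokens of running text."""
    tokens = tokenize(text)
    counts = Counter(t for t in tokens if t in connectives)
    return {c: counts[c] * per / len(tokens) for c in connectives}


def concordance(text, keyword, width=3):
    """Keyword-in-context lines with `width` tokens on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical sample; the study used two corpora of ca 250,000 words each.
sample = "The talks failed, and the delegates left. However, the minister stayed and spoke."
rates = connective_rates(sample, {"and", "however", "but"})
kwic = concordance(sample, "however", width=2)
```

Normalising counts per 1,000 tokens makes figures comparable across corpora of slightly different sizes, which matters when two languages are being contrasted.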

Relevance:

30.00%

Publisher:

Abstract:

This study investigates plagiarism detection, with an application in forensic contexts. Two types of data were collected for the purposes of this study. Written texts were obtained from two Portuguese universities and from a Portuguese newspaper. These data are analysed linguistically to identify instances of verbatim, morpho-syntactical, lexical and discursive overlap. Survey data were obtained from two higher education institutions in Portugal and another two in the United Kingdom. These data are analysed using a 2 by 2 between-groups univariate analysis of variance (ANOVA) to reveal cross-cultural divergences in the perceptions of plagiarism. The study discusses the legal and social circumstances that may contribute to adopting a punitive approach to plagiarism or, conversely, to rejecting punishment. The research adopts a critical approach to plagiarism detection. On the one hand, it describes the linguistic strategies adopted by plagiarists when borrowing from other sources, and, on the other hand, it discusses the relationship between these instances of plagiarism and the context in which they appear. A focus of this study is whether plagiarism involves an intention to deceive, and, if so, whether forensic linguistic evidence can provide clues to this intentionality. It also evaluates current computational approaches to plagiarism detection, and identifies strategies that these systems fail to detect. Specifically, a method is proposed to detect translingual plagiarism. The findings indicate that, although cross-cultural aspects influence the different perceptions of plagiarism, a distinction needs to be made between intentional and unintentional plagiarism. The linguistic analysis demonstrates that linguistic elements can contribute to finding clues for the plagiarist’s intentionality. Furthermore, the findings show that translingual plagiarism can be detected by using the method proposed, and that plagiarism detection software can be improved using existing computer tools.
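As a rough illustration of the simplest category named above, verbatim overlap, the sketch below measures how many word trigrams of a suspect text also occur in a source text. This is a common baseline technique, not the method of the study, whose analysis was linguistic and whose translingual method the abstract does not specify; the example sentences, the choice of n = 3, and the function names are assumptions.

```python
import re


def word_ngrams(text, n=3):
    """Set of lowercase word n-grams found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(suspect, source, n=3):
    """Share of the suspect text's word n-grams that also occur in the source."""
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0, set()
    shared = suspect_grams & word_ngrams(source, n)
    return len(shared) / len(suspect_grams), shared


# Hypothetical texts: the suspect reorders two stretches of the source.
source_text = "The committee approved the new budget after a long debate."
suspect_text = "After a long debate, the committee approved the proposal."
score, shared = verbatim_overlap(suspect_text, source_text)
```

A measure of this kind only catches copied word sequences. Lexical substitution, syntactic reordering within a phrase, and translation all lower the score, which is in line with the abstract's point that current systems miss some plagiarism strategies.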

Relevance:

30.00%

Publisher:

Abstract:

One purpose of this study was to demonstrate the practical linguistic study and evaluation of dissertations using two examples of the latest technology: the microcomputer and the optical scanner. That involved developing efficient methods for data entry and creating computer algorithms appropriate for personal linguistic studies. The goal was to develop a prototype investigation that demonstrated practical solutions for maximizing the linguistic potential of the dissertation database. Text was entered with a Dest PC Scan 1000 optical scanner, whose function was to copy the complete stack of education dissertations from the Florida Atlantic University Library into an IBM XT microcomputer. The optical scanner demonstrated its practical value by copying 15,900 pages of dissertation text directly into the microcomputer. A total of 199 dissertations, or 72% of the entire stack of education dissertations (277), were successfully copied into the microcomputer's word processor, where each dissertation was analyzed for a variety of syntax frequencies. The results of the study demonstrated the practical use of the optical scanner for data entry, the microcomputer for data and statistical analysis, and the availability of the college library as a natural setting for text studies. A supplemental benefit was the establishment of a computerized dissertation corpus that could be used for future research and study. The final step was to build a linguistic model of the differences in dissertation writing styles by creating 7 factors from 55 dependent variables through principal components factor analysis. The 7 factors (textual components) were then named and described on a hypothetical construct defined as a continuum from a conversational, interactional style to a formal, academic writing style. The 7 factors were then grouped through discriminant analysis to create discriminant functions for each of the 7 independent variables. The results indicated that a conversational, interactional writing style was associated with more recent dissertations (1972-1987), older authors, female authors, and the department of Curriculum and Instruction. A formal, academic writing style was associated with older dissertations (1972-1987), younger authors, male authors, and the department of Administration and Supervision. It was concluded that there were no significant differences in writing style between community college studies and other subject matter. It was also concluded that there were no significant differences in writing style due to the location of dissertation origin (Florida Atlantic University, University of Central Florida, Florida International University).