967 results for Natural language techniques, Semantic spaces, Random projection, Documents
Abstract:
2000 Mathematics Subject Classification: C2P99.
Abstract:
This paper introduces a quantitative method for identifying newly emerging word forms in large time-stamped corpora of natural language. It then describes an analysis of lexical emergence in American social media that applies this method to a multi-billion word corpus of Tweets collected between October 2013 and November 2014. In total, 29 emerging word forms, which represent various semantic classes, grammatical parts of speech, and word formation processes, were identified through this analysis. These 29 forms are then examined from various perspectives in order to begin to better understand the process of lexical emergence.
Abstract:
The semantic model described in this paper is based on ones developed for arithmetic (e.g. McCloskey et al. 1985, Cohen and Dehaene 1995), natural language processing (Fodor 1975, Chomsky 1981) and work by the author on how learners parse mathematical structures. The semantic model highlights the importance of the parsing process and the relationship between this process and the mathematical lexicon/grammar. It concludes by demonstrating that, for a learner to become an efficient, competent mathematician, a process of top-down parsing is essential.
Abstract:
One of the leading motivations behind the multilingual semantic web is to make resources accessible digitally in an online global multilingual context. Consequently, it is fundamental for knowledge bases to find a way to manage multilingualism and thus be equipped with those procedures for its conceptual modelling. In this context, the goal of this paper is to discuss how common-sense knowledge and cultural knowledge are modelled in a multilingual framework. More particularly, multilingualism and conceptual modelling are dealt with from the perspective of FunGramKB, a lexico-conceptual knowledge base for natural language understanding. This project argues for a clear division between the lexical and the conceptual dimensions of knowledge. Moreover, the conceptual layer is organized into three modules, which result from a strong commitment towards capturing semantic knowledge (Ontology), procedural knowledge (Cognicon) and episodic knowledge (Onomasticon). Cultural mismatches are discussed and formally represented at the three conceptual levels of FunGramKB.
Abstract:
The current study builds upon a previous study, which examined the degree to which the lexical properties of students’ essays could predict their vocabulary scores. We expand on this previous research by incorporating new natural language processing (NLP) indices related to both the surface and discourse levels of students’ essays. Additionally, we investigate the degree to which these NLP indices can be used to account for variance in students’ reading comprehension skills. We calculated linguistic essay features using our framework, ReaderBench, which is an automated text analysis tool that calculates indices related to linguistic and rhetorical features of text. University students (n = 108) produced timed (25 minutes), argumentative essays, which were then analyzed by ReaderBench. Additionally, they completed the Gates-MacGinitie Vocabulary and Reading Comprehension tests. The results of this study indicated that two indices were able to account for 32.4% of the variance in vocabulary scores and 31.6% of the variance in reading comprehension scores. Follow-up analyses revealed that these models further improved when only considering essays that contained multiple paragraphs (R² values = .61 and .49, respectively). Overall, the results of the current study suggest that natural language processing techniques can help to inform models of individual differences among student writers.
Abstract:
Text cohesion is an important element of discourse processing. This paper presents a new approach to modeling, quantifying, and visualizing text cohesion using automated cohesion flow indices that capture semantic links among paragraphs. Cohesion flow is calculated by applying Cohesion Network Analysis, a combination of semantic distances, Latent Semantic Analysis, and Latent Dirichlet Allocation, as well as Social Network Analysis. Experiments performed on 315 timed essays indicated that cohesion flow indices are significantly correlated with human ratings of text coherence and essay quality. Visualizations of the global cohesion indices are also included to support a more facile understanding of how cohesion flow impacts coherence in terms of semantic dependencies between paragraphs.
Abstract:
This paper addresses the problem of colorectal tumour segmentation in complex real world imagery. For efficient segmentation, a multi-scale strategy is developed for extracting the potentially cancerous region of interest (ROI) based on colour histograms while searching for the best texture resolution. To achieve better segmentation accuracy, we apply a novel bag-of-visual-words method based on rotation invariant raw statistical features and random projection based l2-norm sparse representation to classify tumour areas in histopathology images. Experimental results on 20 real world digital slides demonstrate that the proposed algorithm results in better recognition accuracy than several state of the art segmentation techniques.
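As a rough illustration of the random projection step mentioned in this abstract, the sketch below maps a high-dimensional feature vector to a lower-dimensional one with a Gaussian random matrix. It is a generic textbook construction under assumed names (`random_projection_matrix`, `project`) and assumed dimensions, not the authors' sparse-representation pipeline.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """Build a d_out x d_in matrix with i.i.d. Gaussian entries.

    Entries are scaled by 1/sqrt(d_out) so that squared lengths are
    preserved in expectation (the usual Johnson-Lindenstrauss scaling).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]


def project(matrix, vector):
    """Multiply the projection matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    # Hypothetical sizes: a 50-dimensional descriptor reduced to 8 dimensions.
    R = random_projection_matrix(50, 8, seed=1)
    feature = [float(i % 5) for i in range(50)]
    print(len(project(R, feature)))
```

The projection is linear and data-independent, which is why it is cheap to apply before a more expensive classification step.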
Abstract:
Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, MixKMeans, composes question and answer space similarities in a way that the space on which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) could get localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
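The idea of letting the better-matching space dominate can be sketched in its simplest possible form by taking the maximum of the question-space and answer-space cosine similarities. This is only an assumed stand-in: the abstract does not give the actual MixKMeans composition formula, and the function names below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def composed_similarity(q1, a1, q2, a2):
    """Compare two QA pairs; the space with the higher match decides the score.

    q1, q2 are question vectors and a1, a2 are answer vectors.
    Using max() is an illustrative assumption, not the published formula.
    """
    return max(cosine(q1, q2), cosine(a1, a2))


if __name__ == "__main__":
    # Same question, unrelated answers: the question space dominates.
    print(composed_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))
```

A score like this could replace the single-space similarity in the assignment step of a k-means-style loop; a smoother weighting would behave less abruptly when the two similarities are close.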
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
The need to insert the capital of Rio Grande do Norte into the worldwide commercial scene, together with its claim to be the seat of political power, determined the direction of the urban interventions undertaken by the government to restructure the city at the end of the nineteenth and beginning of the twentieth century. In this context, several improvement and embellishment works were carried out in Natal. Their starting point was the adaptation of the port, located in the Ribeira quarter, with the aim of ending the physical isolation that reinforced the city's economic stagnation. Besides the problems faced at the bar at the mouth of the Potengi River, other barriers, whose removal would complement the required improvements, demonstrate the tension between the physical-geographic environment and man: the flooded area and the slope that connected Cidade Alta and Ribeira, the first two quarters of the city. The execution of these works demanded knowledge whose mastery and application belonged to engineering. But how did the actions of engineers, aimed at transforming natural areas into built spaces, make possible the intentional shaping of the Ribeira quarter into a commercial and political-administrative center between the middle of the nineteenth century and the beginning of the twentieth? The objective of this work is therefore to understand the effects of the use of technology on the physical-geographic environment of Ribeira. It uses the theoretical and methodological procedures of Urban Environmental History, analyzing the relationship between the environment and man as mediated by knowledge and the use of technologies. The documentary research used, as primary sources, the Messages of the Provincial Assembly Government, which later became the Legislative Assembly of Rio Grande do Norte, as well as reports and articles in specialized publications and local newspapers. The work is structured in five chapters. 
First come some comments on Urban Environmental History (Chapter 1), supplemented by an analysis of the conceptual construction of nature in the Contemporary Era and its application to the city (Chapter 2). The following chapters (3 and 4) deal with the rise of engineers as an active group within Brazilian government structures and their view of nature within the urban environment, and study how these technical professionals dealt with the harbor improvement works and the clash with natural forces. Other works that would complement this "project" of modernization and that had natural obstacles to remove, namely the Ribeira flooded area and slope, are the subject of the fifth chapter. Finally, some concluding considerations return to the initial discussions, aiming to associate technique and nature as linked elements in the process of constituting a modern Natal.
Abstract:
Since the mid-2000s, a new approach to machine learning, deep learning, has been gaining popularity. This approach has proven effective on a variety of problems, improving on the results obtained by other techniques that were then considered the state of the art. This is the case for object recognition as well as for speech recognition. Given this, using deep networks in Natural Language Processing (NLP; in French, Traitement Automatique du Langage Naturel, TALN) is a logical next step. This thesis explores different neural network structures for modeling written text, focusing on models that are simple, powerful, and fast to train.
Abstract:
The current study is a post-hoc analysis of data from the original randomized control trial of the Play and Language for Autistic Youngsters (PLAY) Home Consultation program, a parent-mediated, DIR/Floortime based early intervention program for children with ASD (Solomon, Van Egeren, Mahone, Huber, & Zimmerman, 2014). We examined 22 children from the original RCT who received the PLAY program. Children were split into two groups (high and lower functioning) based on the ADOS module administered prior to intervention. Fifteen-minute parent-child video sessions were coded through the use of CHILDES transcription software. Child and maternal language, communicative behaviors, and communicative functions were assessed in the natural language samples both pre- and post-intervention. Results demonstrated significant improvements in both child and maternal behaviors following intervention. There was a significant increase in child verbal and non-verbal initiations and verbal responses in whole group analysis. Total number of utterances, word production, and grammatical complexity all significantly improved when viewed across the whole group of participants; however, lexical growth did not reach significance. Changes in child communicative function were especially noteworthy, and demonstrated a significant increase in social interaction and a significant decrease in non-interactive behaviors. Further, mothers demonstrated an increase in responsiveness to the child’s conversational bids, increased ability to follow the child’s lead, and a decrease in directiveness. When separated for analyses within groups, trends emerged for child and maternal variables, suggesting greater gains in use of communicative function in both high and low groups over changes in linguistic structure. 
Additional analysis also revealed a significant inverse relationship between maternal responsiveness and child non-interactive behaviors: as mothers became more responsive, children’s non-engagement decreased. These results further suggest that changes in learned skills following PLAY parent training may result in improvements in child social interaction and language abilities.
Abstract:
Image (video) retrieval is the problem of retrieving images (videos) similar to a query. Images (videos) are represented in an input (feature) space, and similar images (videos) are obtained by finding nearest neighbors in that representation space. Numerous input representations, in both real-valued and binary spaces, have been proposed for faster retrieval. In this thesis, we present techniques that obtain improved input representations for retrieval in both supervised and unsupervised settings for images and videos. Supervised retrieval is the well-known problem of retrieving images of the same class as the query. In the first part, we address the practical aspects of achieving faster retrieval with binary codes as input representations for the supervised setting, where binary codes are used as addresses into hash tables. In practice, using binary codes as addresses does not guarantee fast retrieval, as similar images are not mapped to the same binary code (address). We address this problem by presenting an efficient supervised hashing (binary encoding) method that aims to explicitly map all the images of the same class, ideally, to a unique binary code. We refer to the binary codes of the images as 'Semantic Binary Codes' and to the unique code for all same-class images as the 'Class Binary Code'. We also propose a new class-based Hamming metric that dramatically reduces retrieval times for larger databases, since the Hamming distance is computed only to the class binary codes. We also propose a deep semantic binary code model, obtained by replacing the output layer of a popular convolutional neural network (AlexNet) with the class binary codes, and show that the hashing functions learned in this way outperform the state of the art while providing fast retrieval times. In the second part, we also address the problem of supervised retrieval by taking into account the relationship between classes. 
For a given query image, we want to retrieve images in a way that preserves the relative order: all same-class images first, then images of related classes, and only then images of different classes. We learn such relationship-aware binary codes by minimizing the discrepancy between the inner product of the binary codes and the similarity between the classes. We calculate the similarity between classes using output embedding vectors, which are vector representations of classes. Our method deviates from other supervised binary encoding schemes, as it is the first to use output embeddings for learning hashing functions. We also introduce new performance metrics that take into account the related-class retrieval results and show significant gains over the state of the art. High-dimensional descriptors such as Fisher Vectors or Vectors of Locally Aggregated Descriptors have been shown to improve the performance of many computer vision applications, including retrieval. In the third part, we discuss an unsupervised technique for compressing high-dimensional vectors into high-dimensional binary codes to reduce storage complexity. In this approach, we deviate from traditional hyperplane hashing functions and instead learn hyperspherical hashing functions. The proposed method overcomes the computational challenges of directly applying the spherical hashing algorithm, which is intractable for compressing high-dimensional vectors. A practical hierarchical model that uses divide-and-conquer techniques with the Random Select and Adjust (RSA) procedure to compress such high-dimensional vectors is presented. We show that our proposed high-dimensional binary codes outperform the binary codes obtained using traditional hyperplane methods at higher compression ratios. In the last part of the thesis, we propose a retrieval-based solution to the zero-shot event classification problem, a setting where no training videos are available for the event. 
To do this, we learn a generic set of concept detectors and represent both videos and query events in the concept space. We then compute similarity between the query event and the video in the concept space and videos similar to the query event are classified as the videos belonging to the event. We show that we significantly boost the performance using concept features from other modalities.
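The class-based Hamming lookup described in the first part of this abstract can be illustrated with a small sketch: the query code is compared only against one code per class, and the images stored under the nearest class are returned. The 4-bit codes, class labels, and file names are made-up assumptions; the thesis's learned hashing functions are not reproduced here.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def nearest_class(query_code, class_codes):
    """Return the label whose class binary code is closest to the query code."""
    return min(class_codes, key=lambda label: hamming(query_code, class_codes[label]))


def retrieve(query_code, class_codes, buckets):
    """Look up the images stored under the nearest class binary code.

    Only len(class_codes) distances are computed, independent of database size.
    """
    return buckets.get(nearest_class(query_code, class_codes), [])


if __name__ == "__main__":
    # Hypothetical 4-bit class codes and image buckets.
    class_codes = {"cat": 0b1100, "dog": 0b0011}
    buckets = {"cat": ["cat_01.jpg", "cat_02.jpg"], "dog": ["dog_01.jpg"]}
    print(retrieve(0b1101, class_codes, buckets))
```

Because the number of comparisons scales with the number of classes rather than the number of images, retrieval time in this sketch stays flat as the database grows.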
Abstract:
Master's dissertation, Natural Language Processing and Language Industries, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014