968 results for Short-text clustering


Relevance:

80.00% 80.00%

Publisher:

Abstract:

Interpretation of utterances affects an interrogator’s determination of human from machine during live Turing tests. Here, we consider transcripts from a series of practical Turing tests that were held on 23 June 2012 at Bletchley Park, England. The focus of this paper is the effect on the human judges of lying and truth-telling by the hidden entities, whether human or machine. Turing test transcripts provide a glimpse into short text communication, the type that occurs in emails: how does the reader determine truth from the content of a stranger’s textual message? Different types of lying in the conversations are explored, and the judge’s attribution of human or machine is investigated in each test.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The extended flight of the Airborne Ionospheric Observatory during the Geospace Environment Modeling (GEM) Pilot program on January 16, 1990, allowed continuous all-sky monitoring of the two-dimensional ionospheric footprint of the northward interplanetary magnetic field (IMF) cusp in several wavelengths. Especially important in determining the locus of magnetosheath electron precipitation was the 630.0-nm red line emission. The most striking morphological change in the images was the transient appearance of zonally elongated regions of enhanced 630.0-nm emission which resembled “rays” emanating from the centroid of the precipitation. The appearance of these rays was strongly correlated with the Y component of the IMF: when the magnitude of By was large compared to Bz, the rays appeared; otherwise, the distribution was relatively unstructured. Late in the flight the field of view of the imager included the field of view of flow measurements from the European incoherent scatter radar (EISCAT). The rays visible in 630.0-nm emission exactly aligned with the position of strong flow jets observed by EISCAT. We attribute this correspondence to the requirement of quasi-neutrality; namely, the soft electrons have their largest precipitating fluxes where the bulk of the ions precipitate. The ions, in regions of strong convective flow, are spread out farther along the flow path than in regions of weaker flow. The occurrence and direction of these flow bursts are controlled by the IMF in a manner consistent with newly opened flux tubes; i.e., when |By| > |Bz|, tension in the reconnected field lines produces east-west flow regions downstream of the ionospheric projection of the x line. We interpret the optical rays (flow bursts), which typically last between 5 and 15 min, as evidence of periods of enhanced dayside (or lobe) reconnection when |By| > |Bz|.
The length of the reconnection pulse is difficult to determine, however, since strong zonal flows would be expected to persist until the tension force in the field line has decayed, even if the duration of the enhanced reconnection was relatively short.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Recent developments in sensor networks and cloud computing have seen the emergence of a new platform called sensor-clouds. While the proposition of such a platform is to virtualise the management of physical sensor devices, we are seeing novel applications being created based on a new class of social sensors. Social sensors are effectively a human-device combination that sends a torrent of data as a result of social interactions and social events. The data generated appear in different formats such as photographs, videos and short text messages. Unlike other sensor devices, social sensors operate under the control of individuals via their mobile devices such as a phone or a laptop. And unlike other sensors that generate data at a constant rate or format, social sensors generate data that are spurious and varied, often in response to events as individual as a dinner outing, or a news announcement of interest to the public. This collective presence of social data creates opportunities for novel applications never experienced before. This paper discusses such applications as a result of utilising social sensors within a sensor-cloud environment. The associated research problems are also presented.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This doctoral thesis offers a critical reading of the house at Belvederestraße 60, built by the architect Oswald Mathias Ungers (Kaisersesch, 12 July 1926 – Köln, 30 September 2007) in 1958-59 in Köln-Müngersdorf as a studio for himself and a home for his family. This first object of the research is regarded as a clear expression of the architect's formal and compositional convictions in the 1950s and 1960s. Unlike other contemporary and earlier residential projects, the product of an autonomous elaboration, the first house he built for himself reflects a greater freedom of thought, dictated by the coincidence of the roles of designer and client; added to this is a precise declarative and ideological intent. This last aspect introduces the second object of the thesis: the "ideological" manifesto Zu einer neuen Architektur, written by Oswald Mathias Ungers himself and Reinhard Gieselmann at the end of 1960; a short text that sets out, in peremptory and unappealable tones, the two architects' point of view on an architectural and critical landscape marked by a widespread sterility of thought, owing to the hegemony of functionalist construction. The research therefore investigates the strong reciprocities between the two works, house and text, seen in terms of "written manifesto and built manifesto". The first link between the two subjects is undoubtedly their temporal coincidence (between 1958 and 1960), associated with a cause-and-effect relationship whereby the manifesto was drafted in defence against the harsh criticism that followed the publication of the house in the journal Bauwelt. The second link is the possibility of understanding the actual meanings of the terms used in the text through the forms of one of the architect's most personal works, extracting their sense and giving it an architectural image.
The aim is thus to create a two-way relationship of translatability, of the architecture into the written text and of Ungers's semantics into compositional actions.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

According to the colophon (f. 117v), copy completed in the hand of ʻAbd al-Razzāq ibn Muḥammad Ḥusayn al-Yazdī in 1240 AH [December 1824-5 AD].

Relevance:

80.00% 80.00%

Publisher:

Abstract:

As microblog services such as Twitter become a fast and convenient means of communication, identification of trendy topics in microblog services has great academic and business value. However, detecting trendy topics is very challenging due to the huge number of users and short-text posts in microblog diffusion networks. In this paper we introduce a trendy topics detection system under computation and communication resource constraints. In stark contrast to retrieving and processing the whole microblog contents, we develop an idea of selecting a small set of microblog users and processing their posts to achieve an overall acceptable trendy topic coverage, without exceeding the resource budget for detection. We formulate the selection of this subset of users as mixed-integer optimization problems, and develop heuristic algorithms to compute their approximate solutions. The proposed system is evaluated with real-time test data retrieved from Sina Weibo, the dominant microblog service provider in China. The results show that by monitoring 500 out of 1.6 million microblog users and tracking their microposts (about 15,000 daily) with our system, nearly 65% of trendy topics can be detected, on average 5 hours before they appear in Sina Weibo's official trends.
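The abstract does not describe its heuristic algorithms, so the following is only a minimal sketch of one common way to approximate a budgeted selection problem of this kind: a greedy rule that repeatedly picks the user who adds the most not-yet-covered topics per unit of monitoring cost. The function name `greedy_select_users`, the per-user cost model and the toy data are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Set


def greedy_select_users(
    user_topics: Dict[str, Set[str]],
    user_cost: Dict[str, float],
    budget: float,
) -> List[str]:
    """Greedily pick users by newly covered topics per unit cost.

    Hypothetical model: each user has a positive monitoring cost and a
    known set of topics they posted about in some training window.
    """
    selected: List[str] = []
    covered: Set[str] = set()
    remaining = budget
    candidates = set(user_topics)
    while candidates:
        best_user = None
        best_ratio = 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for user in sorted(candidates):
            cost = user_cost[user]
            if cost > remaining:
                continue  # cannot afford this user any more
            gain = len(user_topics[user] - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best_user, best_ratio = user, ratio
        if best_user is None:
            break  # no affordable user adds new coverage
        selected.append(best_user)
        covered |= user_topics[best_user]
        remaining -= user_cost[best_user]
        candidates.remove(best_user)
    return selected


if __name__ == "__main__":
    topics = {
        "a": {"t1", "t2", "t3"},
        "b": {"t3", "t4"},
        "c": {"t1"},
        "d": {"t5", "t6", "t7", "t8"},
    }
    costs = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 10.0}
    print(greedy_select_users(topics, costs, budget=4.0))
```

A greedy rule like this trades optimality for speed: it never revisits a choice, which is what makes it plausible at the scale of millions of candidate users, but it can miss a better combination that an exact mixed-integer solver would find.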

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Short text messages, a.k.a. Microposts (e.g. Tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to Disaster (e.g. Hurricane Sandy) to those related to Violence (e.g. the Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals, allowing them to respond immediately. In this work we study the problem of topic classification (TC) of Microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of Microposts, however, is a challenging task, since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to Microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of Microposts with features extracted only from the Microposts content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called the category meta-graph. This novel meta-graph provides a more fine-grained categorisation of concepts, providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of Microposts. Furthermore, our goal is also to understand which semantic features contribute to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of the accuracy loss of a topic classifier on new, unseen Microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and Microposts at a conceptual level, considering the enriched representation of these documents.
Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches that use a single KS without linked data, or Twitter data only, by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore, our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures. © 2014 Elsevier B.V. All rights reserved.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Topic classification (TC) of short text messages offers an effective and fast way to reveal events happening around the world, ranging from those related to Disaster (e.g. Hurricane Sandy) to those related to Violence (e.g. the Egyptian revolution). Previous approaches to TC have mostly focused on exploiting individual knowledge sources (KSs) (e.g. DBpedia or Freebase) without considering the graph structures that surround concepts present in KSs when detecting the topics of Tweets. In this paper we introduce a novel approach for harnessing such graph structures from multiple linked KSs, by: (i) building a conceptual representation of the KSs, (ii) leveraging contextual information about concepts by exploiting semantic concept graphs, and (iii) providing a principled way for the combination of KSs. Experiments evaluating our TC classifier in the context of Violence Detection (VD) and Emergency Response (ER) show promising results that significantly outperform various baseline models, including an approach using a single KS without linked data and an approach using only Tweets. Copyright 2013 ACM.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

With the development of information technology, the theory and methodology of complex networks have been introduced into language research, transforming the system of language into a complex network composed of nodes and edges for the quantitative analysis of language structure. The development of dependency grammar provides theoretical support for the construction of a treebank corpus, making possible a statistical analysis of complex networks. This paper introduces the theory and methodology of complex networks and builds dependency syntactic networks based on the treebank of speeches from the EEE-4 oral test. By analysing the overall characteristics of the networks, including the number of edges, the number of nodes, the average degree, the average path length, the network centrality and the degree distribution, it aims to find potential differences and similarities in the networks between various grades of speaking performance. Through clustering analysis, this research intends to demonstrate the discriminating power of the network parameters and provide a potential reference for scoring speaking performance.
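To make a few of the network parameters named above concrete, here is a minimal sketch that computes the average degree and average path length of a small network. It assumes, purely for illustration, that each dependency relation (head, dependent) becomes an undirected edge between two word nodes; the function names and the toy sentence are hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (head, dependent) pairs."""
    graph: Dict[str, Set[str]] = {}
    for head, dependent in edges:
        if head == dependent:
            continue  # ignore self-loops
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    return graph


def average_degree(graph: Dict[str, Set[str]]) -> float:
    """Mean number of neighbours per node."""
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)


def average_path_length(graph: Dict[str, Set[str]]) -> float:
    """Mean shortest-path length over all connected ordered node pairs."""
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy dependency edges for "she likes music": likes -> she, likes -> music.
    g = build_graph([("likes", "she"), ("likes", "music")])
    print(len(g), average_degree(g), average_path_length(g))
```

Per-speaker values of such parameters could then serve as feature vectors for a clustering step, which is the role the abstract assigns to them.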

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Descriptions of a patient's injuries are recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspelling correction is carried out by finding and replacing each misspelt word with a sound-alike word. Meaningful phrases have been identified and kept, instead of removing part of a phrase as a stop word. The abbreviations appearing in many forms of entry are manually identified and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically, from about 28,000 to 5000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non Negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers using these reduced feature spaces have been built. In experiments, a set of tests is conducted to determine which classification method is best for medical text classification.
The Non Negative Matrix Factorization with Support Vector Machine method can achieve 93% precision, which is higher than that of all the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as their removal affects the classification performance.
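The contrast between binary and TF/IDF weighting can be illustrated with a minimal sketch. It assumes one common TF/IDF variant (raw term count times log(N / document frequency)); the abstract does not say which variant was used, and the function names and toy injury narratives are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def binary_weights(tokens: List[str]) -> Dict[str, float]:
    """Binary weighting: 1.0 for every term present, whatever its count."""
    return {term: 1.0 for term in set(tokens)}


def tfidf_weights(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF/IDF weighting, assumed variant: count * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append(
            {
                term: count * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weighted


if __name__ == "__main__":
    docs = [["fell", "off", "ladder"], ["fell", "on", "ice"]]
    print(binary_weights(docs[0]))
    print(tfidf_weights(docs)[0])
```

In very short documents almost every term occurs once, so the term-frequency factor carries little information; in this toy corpus TF/IDF also drives the shared term "fell" to zero weight, while binary weighting keeps it as a feature.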

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This paper investigates the clustering pattern in the Finnish stock market. Using trading volume and time as factors capturing the clustering pattern in the market, the Keim and Madhavan (1996) and the Engle and Russell (1998) models provide the framework for the analysis. The descriptive and the parametric analyses provide evidence that an important determinant of the famous U-shape pattern in the market is the rate of information arrivals, as measured by large trading volumes and durations at the market open and close. Specifically: 1) the larger the trading volume, the greater the impact on prices both in the short and the long run, thus prices will differ across quantities; 2) large trading volume is a non-linear function of price changes in the long run; 3) arrival times are positively autocorrelated, indicating a clustering pattern; and 4) information arrivals as approximated by durations are negatively related to trading flow.
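Positive autocorrelation of durations, the clustering signal mentioned in point 3, can be checked with a plain sample autocorrelation. The sketch below is not the Engle and Russell (1998) model, only a descriptive statistic; the function name and the toy duration series are hypothetical.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int = 1) -> float:
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


if __name__ == "__main__":
    # Toy durations (seconds between trades): short ones follow short ones.
    clustered = [1, 1, 1, 5, 5, 5]
    alternating = [1, 5, 1, 5]
    print(autocorrelation(clustered), autocorrelation(alternating))
```

A value above zero at lag 1 means short durations tend to follow short ones and long follow long, which is what clustering of trade arrivals looks like in the data.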

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Computer Assisted Assessment (CAA) has existed for several years now. While some forms of CAA do not require sophisticated text understanding (e.g., multiple choice questions), there are also student answers that consist of free text and require analysis of the text in the answer. Research on the latter to date has concentrated on two main sub-tasks: (i) grading of essays, which is done mainly by checking the style, correctness of grammar, and coherence of the essay, and (ii) assessment of short free-text answers. In this paper, we present a structured view of relevant research in automated assessment techniques for short free-text answers. We review papers spanning the last 15 years of research, with emphasis on recent papers. Our main objectives are twofold. First, we present the survey in a structured way by segregating information on dataset, problem formulation, techniques, and evaluation measures. Second, we present a discussion of some of the potential future directions in this domain, which we hope will be helpful for researchers.