Digital Library

929 results for Chinese bug textual data

COMP6043 Analysis of Textual Data

Relevance:

100.00%

Abstract:

Class exercise in analysing qualitative data, based on a set of transcripts augmented by videos from a web site. Discussion covers not only how the data is coded but also interview bias and dimensions of analysis. Designed as an introduction.

Tariff reductions and labor demand elasticities : evidence from Chinese firm-level data

Relevance:

100.00%

Abstract:

International production fragmentation has been a global trend for decades, becoming especially important in Asia, where the manufacturing process is fragmented into stages and dispersed around the region. This paper examines the effects of input and output tariff reductions on labor demand elasticities at the firm level. For this purpose, we consider a simple heterogeneous firm model in which firms are allowed to export their products and to use imported intermediate inputs. The model predicts that only productive firms can use imported intermediate inputs (outsourcing) and that they tend to have larger constant-output labor demand elasticities. Input tariff reductions would lower the factor shares of labor for these productive firms and raise conditional labor demand elasticities further. We test these predictions empirically, constructing Chinese firm-level panel data over the 2000–2006 period. Controlling for potential tariff endogeneity with instruments, our empirical studies generally support these predictions.

A semantic framework for textual data enrichment

Relevance:

100.00%

Abstract:

In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarities and sentiment analysis. After obtaining the textual semantic enrichment, we would be able to recommend similar content or even to rate texts according to different dimensions. First of all, we describe the main characteristics of the semantic integrated resources with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to serve as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.

Content-aware compression for big textual data analysis

Relevance:

100.00%

Abstract:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. 
In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
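
The abstract says that Approximated Huffman Compression is based on the Huffman algorithm but gives no further detail. As background only, here is a minimal sketch of classic Huffman coding in Python. It is not the thesis's scheme: it does not attempt searching in compressed data, splittable output, or Hadoop integration. The function names `huffman_code`, `encode`, and `decode` are hypothetical and chosen for illustration.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code table {symbol: bitstring} for the symbols in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, table):
    """Concatenate the code of every symbol in text."""
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Invert encode(); this works because no code is a prefix of another."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For example, `huffman_code("abracadabra")` assigns the most frequent symbol `a` a one-bit code, and the encoded string is 23 bits long, compared with 88 bits at one byte per character. The bit string is kept as Python text here purely for readability; a real implementation would pack bits into bytes.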

Integrating technology to improve the efficiency of qualitative data analysis - A note on methods

Relevance:

100.00%

Abstract:

Qualitative data analysis (QDA) is often a time-consuming and laborious process usually involving the management of large quantities of textual data. Recently developed computer programs offer great advances in the efficiency of the processes of QDA. In this paper we report on an innovative use of a combination of extant computer software technologies to further enhance and simplify QDA. Used in appropriate circumstances, we believe that this innovation greatly enhances the speed with which theoretical and descriptive ideas can be abstracted from rich, complex, and chaotic qualitative data. © 2001 Human Sciences Press, Inc.

Textual autocorrelation : formalism and illustrations

Relevance:

100.00%

Abstract:

Textual autocorrelation is a broad and pervasive concept, referring to the similarity between nearby textual units: lexical repetitions along consecutive sentences, semantic association between neighbouring lexemes, persistence of discourse types (narrative, descriptive, dialogal...) and so on. Textual autocorrelation can also be negative, as illustrated by alternating phonological or morpho-syntactic categories, or the succession of word lengths. This contribution proposes a general Markov formalism for textual navigation, inspired by spatial statistics. The formalism can express well-known constructs in textual data analysis, such as term-document matrices, reference and hyperlink navigation, (web) information retrieval, and in particular textual autocorrelation, as measured by Moran's I relative to the exchange matrix associated with neighbourhoods of various possible types. Four case studies (word length alternation, lexical repulsion, part-of-speech autocorrelation, and semantic autocorrelation) illustrate the theory. In particular, one observes a short-range repulsion between nouns together with a short-range attraction between verbs, both at the lexical and semantic levels.
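
To make the idea of positive and negative textual autocorrelation concrete, the following Python sketch computes the classic Moran's I for a sequence of numeric token features such as word lengths. It assumes the simplest possible neighbourhood, binary symmetric weights between tokens exactly `lag` positions apart. This is an illustrative simplification, not the exchange-matrix formalism the paper develops, and the names `morans_i` and `word_lengths` are hypothetical.

```python
def word_lengths(text):
    """Whitespace-tokenize text and return the length of each token."""
    return [len(word) for word in text.split()]


def morans_i(values, lag=1):
    """Moran's I with weights w_ij = 1 if |i - j| == lag, else 0.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    if n <= lag:
        raise ValueError("sequence must be longer than the lag")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant sequence")
    # Each unordered pair (i, i + lag) appears twice in the symmetric sum.
    cross = 2 * sum(dev[i] * dev[i + lag] for i in range(n - lag))
    total_weight = 2 * (n - lag)
    return (n / total_weight) * (cross / denom)
```

A strictly alternating sequence such as `[1, 5, 1, 5, 1, 5]` gives I = -1.0 (negative autocorrelation, as in the word-length alternation case), while a clustered sequence such as `[1, 1, 1, 5, 5, 5]` gives I = 0.6 (positive autocorrelation).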

WAIS seminar: Alan Walks Wales: Data and Challenges

Relevance:

100.00%

Abstract:

This seminar is a research discussion around a very interesting problem, which may be a good basis for a WAISfest theme. A little over a year ago Professor Alan Dix came to tell us of his plans for a magnificent adventure: to walk all the way round Wales, 1000 miles, as 'Alan Walks Wales'. The walk was a personal journey, but also a technological and community one, exploring the needs of the walker and the people along the way. Whilst walking he recorded his thoughts in an audio diary, took lots of photos, wrote a blog and collected data from the tech instruments he was wearing. As a result Alan has extensive quantitative data (bio-sensing and location) and qualitative data (text, images and some audio). There are challenges in analysing individual kinds of data, including merging similar data streams, entity identification, time-series and textual data mining, dealing with provenance, and ontologies for paths and journeys. There are also challenges for author and third-party annotation, linking the data-sets and visualising the merged narrative or facets of it.

Family language policy in the Chinese community in Singapore: a question of balance?

Relevance:

100.00%

Abstract:

This paper focuses on the language shift phenomenon in Singapore as a consequence of top-down policies. By looking at bilingual family language policies, it examines the characteristics of Singapore’s multilingual nature and cultural diversity. Specifically, it looks at what languages are practiced and how family language policies are enacted in Singaporean English-Chinese bilingual families, and to what extent macro language policies (i.e. national and educational language policies) influence and interact with family language policies. Involving 545 families and including parents and grandparents as participants, the study traces the trajectory of the policy history. Data sources include two parts: 1) a prescribed linguistic practices survey; and 2) participant observation of actual negotiation of family language policy (FLP) in face-to-face social interaction in bilingual English-Chinese families. The data provide valuable information on how family language policy is enacted and language practices are negotiated, and what linguistic practices have been changed and abandoned against the background of the Speaking Mandarin Campaign and the current bilingual policy implemented in the 1970s. Importantly, the detailed face-to-face interactions and linguistic practices enhance our understanding of the subtleties and processes of language (dis)continuity in relation to policy interventions. The study also discusses the reality of language management measures in contrast to the government’s ‘separate bilingualism’ (Creese & Blackledge, 2011) expectations with regard to ‘striking a balance’ between Asian and Western culture (Curdt-Christiansen & Silver 2013; Shepherd, 2005) and between English and mother tongue languages (Curdt-Christiansen, 2014).
Demonstrating how parents and children negotiate their family language policy through translanguaging or heteroglossia practices (Canagarajah, 2013; Garcia & Li Wei, 2014), this paper argues that ‘striking a balance’ as a political ideology places emphasis on discrete and separate notions of cultural and linguistic categorization and thus downplays the significant influences from historical, political and sociolinguistic contexts in which people find themselves. This simplistic view of culture and linguistic code will inevitably constrain individuals’ language expression as it regards code switching and translanguaging as delimited and incompetent language behaviour.

Frame-driven Extraction of Linked Data and Ontologies from Text

Relevance:

100.00%

Abstract:

Ontology design and population, core aspects of semantic technologies, have recently become fields of great interest due to the increasing need for domain-specific knowledge bases that can boost the use of the Semantic Web. For building such knowledge resources, state-of-the-art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so if the task consists of modelling knowledge at web scale. The primary aim of this work is to investigate a novel and flexible methodology for automatically learning ontologies from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, and speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automatically identifying facts in natural language and extracting frames of relations among recognized entities, to producing linked data with which to extend existing knowledge bases or create new ones. In the state of the art, automatic ontology learning systems are mainly based on plain pipelined linguistic classifiers performing tasks such as Named Entity recognition, Entity resolution, and Taxonomy and Relation extraction [11]. These approaches present some weaknesses, especially in capturing the structures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In the literature, these structures have been called Semantic Frames by Fillmore [20], or more recently Knowledge Patterns [23]. Some NLP studies have recently shown the possibility of performing more accurate deep parsing with the ability to logically understand the structure of discourse [7].
In this work, some of these technologies have been investigated and employed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by humans to organize their knowledge.

Selectivity of diacylhydrazine insecticides to the predatory bug Orius laevigatus: In vivo and modeling/docking experiments

Relevance:

100.00%

Abstract:

BACKGROUND: Knowledge of pesticide selectivity to natural enemies is necessary for a successful implementation of biological and chemical control methods in integrated pest management (IPM) programs. Diacylhydrazine (DAH)-based ecdysone agonists also known as molting-accelerating compounds (MACs) are considered a selective group of insecticides, and their compatibility with predatory Heteroptera, which are used as biological control agents, is known. However, their molecular mode of action has not been explored in beneficial insects such as Orius laevigatus (Fieber) (Hemiptera: Anthocoridae). RESULTS: In this project in vivo toxicity assays demonstrated that the DAH-based RH-5849, tebufenozide and methoxyfenozide have no toxic effect against O. laevigatus. The ligand-binding domain (LBD) of the ecdysone receptor (EcR) of O. laevigatus was sequenced and a homology protein model was constructed which confirmed a cavity structure with 12 α-helices, harboring the natural insect molting hormone 20-hydroxyecdysone. However, docking studies showed that a steric clash occurred for the DAH-based insecticides due to a restricted extent of the ligand-binding cavity of the EcR of O. laevigatus. CONCLUSIONS: The insect toxicity assays demonstrated that MACs are selective for O. laevigatus. The modeling/docking experiments are indications that these pesticides do not bind with the LBD-EcR of O. laevigatus and support that they show no biological effects in the predatory bug. These data help in explaining the compatible use of MACs together with predatory bugs in IPM programs. Keywords: Orius laevigatus, selectivity, diacylhydrazine insecticides, ecdysone receptor, homology modelling, docking studies.

An examination of the relationship between social locations of Chinese immigrants and their ethnic newspaper dependency

Relevance:

100.00%

Abstract:

This study drew upon media system dependency theory (MSD) and social identity theory to examine the relationship between social locations of Chinese immigrants and their dependency on Chinese ethnic newspapers. Data were obtained from a survey of 265 respondents of Chinese origin currently residing in Australia. Results indicated that among the three indicators of social location, age appeared to be a strong positive predictor of dependency on ethnic newspapers for information. Respondents who had stayed longer in the host country also tended to be more frequent readers of ethnic newspapers. Education did not appear as a significant predictor of ethnic newspaper dependency. These findings suggest the need to further investigate the impact of ethnic print media on ethnic minorities in an age of varied information sources offered by new technologies.

On stopwords, filtering and data sparsity for sentiment analysis of Twitter

Relevance:

100.00%

Abstract:

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method for maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space.
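
The dynamic method singled out in the abstract, dropping terms that appear only once in the corpus, can be sketched in a few lines of Python. This is a minimal illustration assuming tweets have already been tokenized into lists of strings. The helper names `singleton_terms`, `remove_terms`, and `vocabulary_size` are hypothetical, and the paper's actual preprocessing, datasets, and classifiers are not reproduced here.

```python
from collections import Counter


def singleton_terms(tokenized_docs):
    """Return the set of terms whose total corpus frequency is exactly one."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {term for term, count in counts.items() if count == 1}


def remove_terms(tokenized_docs, stoplist):
    """Filter every document, keeping only tokens not in the stoplist."""
    return [[tok for tok in doc if tok not in stoplist] for doc in tokenized_docs]


def vocabulary_size(tokenized_docs):
    """Number of distinct terms, i.e. the size of a bag-of-words feature space."""
    return len({tok for doc in tokenized_docs for tok in doc})
```

On the toy corpus `[["good", "movie"], ["bad", "movie"], ["good", "plot", "twist"]]`, the singleton stoplist is `{"bad", "plot", "twist"}` and the vocabulary shrinks from 5 terms to 2. The toy corpus only shows the mechanics: at this size the filter also drops the sentiment-bearing word "bad", whereas in a large corpus singletons are more often typos and one-off tokens.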

Examining argumentative coherence in essays by undergraduate students of English as a foreign language in Mainland China and their English speaking peers in the United States

Relevance:

100.00%

Abstract:

I conducted this study to deepen understanding of the association between culture and writing by building, assessing, and refining a conceptual model of second language writing. To do this, I examined culture and coherence, as well as the relationship between them, through a mixed methods research design. Coherence has been an important and complex concept in ESL/EFL writing. I studied the concept of coherence in the research context of contrastive rhetoric, comparing the coherence quality of argumentative essays written by undergraduates in Mainland China and their U.S. peers. In order to analyze the complex concept of coherence, I synthesized five linguistic theories of coherence: Halliday and Hasan's cohesion theory, Carroll's theory of coherence, Enkvist's theory of coherence, Topical Structure Analysis, and Toulmin's Model. Based upon the synthesis, 16 variables were generated. Across these 16 variables, a Hotelling's T-squared analysis was conducted to test for differences in argumentative coherence between essays written by the two groups of participants. To complement the statistical analysis, I conducted 30 interviews with the writers in the study. Participants' responses were analyzed with open and axial coding. By analyzing the empirical data, I refined the conceptual model by adding more categories and establishing associations among them. The study found that U.S. students made more use of pronominal reference. Chinese students adopted more lexical devices of reiteration and extended parallel progression. The interview data implied that the difference may be associated with differences in linguistic features and rhetorical conventions between Chinese and English. As far as Toulmin's Model is concerned, Chinese students scored higher on data than their U.S. peers.
According to the interview data, this may be because Toulmin's Model, modified as three elements of argument, has long been widely taught in Chinese writing instruction, while U.S. interview participants said that they were not taught to write essays according to Toulmin's Model. Implications were generated from the process of textual data analysis and the formulation of the structural model defining coherence. These implications were aimed at informing writing instruction, assessment, peer review, and self-revision.

Data Mining in Promoting Flight Safety

Relevance:

100.00%

Abstract:

The rapid growth of air travel to huge volumes, driven mainly by the jet airliners that appeared in the skies in the 1950s, created the need for systematic aviation safety research and for collecting data about air traffic. Structured data can be analysed easily by querying databases and running the results through graphic tools. However, analysing narratives, which often give more accurate information about a case, requires mining tools. Computer analysis of textual data was not possible until data mining tools were developed, and their use, at least in aviation, is still at a moderate level. The research aims at discovering lethal trends in flight safety reports. The narratives of 1,200 flight safety reports in Finnish from the years 1994–1996 were processed with three text mining tools. One of them was totally language independent, the second had a specific configuration for Finnish, and the third was originally created for English; because encouraging results had been achieved with it for Spanish, a Finnish test was undertaken as well. The global accident rate is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase if flight safety is not improved. Data collection and reporting systems have reached their top level; the focal point in increasing flight safety is now analysis. Air traffic has generally been forecast to grow 5–6 per cent annually over the next two decades. During this period, global air travel will probably double even under relatively conservative expectations of economic growth. This development confronts airline management with growing pressure from increasing competition, a significant rise in fuel prices, and the need to reduce the incident rate in the face of expected growth in air traffic volumes.
All this emphasises the urgent need for new tools and methods. All three systems provided encouraging results, while also revealing challenges still to be overcome. Flight safety can be improved through the development and use of sophisticated analysis tools and methods, such as data mining, whose results can support executives' decision-making.

Using text-mining-assisted analysis to examine the applicability of unstructured data in the context of customer complaint management

Relevance:

100.00%

Abstract:

Double Degree
