884 results for Chinese bug textual data


Relevance:

100.00%

Publisher:

Abstract:

University of Southern California, Center for Systems and Software Engineering; ABB; Microsoft Research; IEEE; ACM SIGSOFT; North Carolina State University, Computer Science

Relevance:

100.00%

Publisher:

Abstract:

A class exercise in analysing qualitative data, based on a set of transcripts and augmented by videos from the web site. Discussion covers not only how the data is coded, but also interview bias and the dimensions of analysis. Designed as an introduction.

Relevance:

100.00%

Publisher:

Abstract:

International production fragmentation has been a global trend for decades, becoming especially important in Asia, where the manufacturing process is fragmented into stages and dispersed around the region. This paper examines the effects of input and output tariff reductions on labor demand elasticities at the firm level. For this purpose, we consider a simple heterogeneous-firm model in which firms are allowed to export their products and to use imported intermediate inputs. The model predicts that only productive firms can use imported intermediate inputs (outsourcing), and that these firms tend to have larger constant-output labor demand elasticities. Input tariff reductions would lower the factor shares of labor for these productive firms and raise conditional labor demand elasticities further. We test these empirical predictions by constructing Chinese firm-level panel data over the 2000–2006 period. Controlling for potential tariff endogeneity with instruments, our empirical studies generally support these predictions.

Relevance:

100.00%

Publisher:

Abstract:

In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarity and sentiment analysis. After obtaining this textual semantic enrichment, we are able to recommend similar content or even to rate texts along different dimensions. First, we describe the main characteristics of the integrated semantic resources with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.
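As a minimal sketch of the recommendation step this abstract describes (not the authors' actual semantic framework, which relies on integrated semantic resources rather than surface word counts), similar texts can be retrieved by cosine similarity over bag-of-words vectors. All names and the toy catalogue below are hypothetical:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term frequencies for a lower-cased text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two Counter term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(query, catalogue, top_n=2):
    """Rank catalogue texts by similarity to the query text."""
    q = bow(query)
    scored = [(cosine_similarity(q, bow(t)), t) for t in catalogue]
    return [t for _, t in sorted(scored, reverse=True)[:top_n]]

catalogue = [
    "a space opera movie with alien worlds",
    "a romantic comedy set in Paris",
    "a science fiction series about alien contact",
]
print(recommend("alien science fiction movie", catalogue))
```

The authors' framework would replace the raw token overlap with semantically enriched representations (after WSD and domain classification), but the ranking step has this shape.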

Relevance:

100.00%

Publisher:

Abstract:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content, and only the informational content is compressed. Thus, the compressed data remains transparent to existing software libraries, which often rely on the functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
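To make the informational/functional distinction concrete, here is a toy sketch of the idea behind content-aware partial compression (not the thesis's CaPC implementation): informational tokens (words) are replaced by short dictionary codes, while functional content (field and record delimiters) is left untouched, so software that splits on commas and newlines still sees valid record structure:

```python
import re

def partial_compress(text):
    """Toy content-aware partial compression: replace each distinct
    word (informational content) with a short code, while leaving
    delimiters such as commas and newlines (functional content)
    untouched so record structure stays visible without decompression."""
    codebook, coded = {}, []
    for token in re.findall(r"\w+|\W+", text):
        if token.strip() and token.strip().isalnum():
            code = codebook.setdefault(token, f"#{len(codebook)}")
            coded.append(code)
        else:
            coded.append(token)  # functional content kept as-is
    return "".join(coded), codebook

def partial_decompress(coded, codebook):
    """Invert the codebook to restore the original text."""
    inverse = {v: k for k, v in codebook.items()}
    return re.sub(r"#\d+", lambda m: inverse[m.group()], coded)

record = "alice,reading,london\nalice,music,london\n"
coded, book = partial_compress(record)
print(coded)  # '#0,#1,#2\n#0,#3,#2\n' -- commas and newlines intact
assert partial_decompress(coded, book) == record
```

A CSV parser can still count fields and records in the coded form, which is the transparency property the thesis exploits; the real scheme additionally targets compact binary codes rather than ASCII placeholders.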

Relevance:

100.00%

Publisher:

Abstract:

Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched included PubMed, CINAHL, Medline, Google Scholar and ProQuest. The bibliographies of all relevant articles were examined and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods, used either to predict injury categories or to extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models, and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, involvement of computer scientists in the injury prevention field, and more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement in knowledge of text mining in the injury field.
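A minimal sketch of the Bayesian classification idea the review describes, here a multinomial naive Bayes classifier over injury narratives (toy data and categories invented for illustration, not any of the reviewed systems):

```python
import math
from collections import Counter, defaultdict

def train(narratives):
    """Multinomial naive Bayes with add-one smoothing, trained on
    (narrative text, injury category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    cat_counts = Counter()              # category -> document count
    vocab = set()
    for text, cat in narratives:
        words = text.lower().split()
        word_counts[cat].update(words)
        cat_counts[cat] += 1
        vocab.update(words)
    return word_counts, cat_counts, vocab

def classify(text, model):
    """Return the most probable injury category for a narrative."""
    word_counts, cat_counts, vocab = model
    total_docs = sum(cat_counts.values())
    best, best_score = None, -math.inf
    for cat in cat_counts:
        score = math.log(cat_counts[cat] / total_docs)  # prior
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best, best_score = cat, score
    return best

model = train([
    ("worker fell from ladder at site", "fall"),
    ("slipped on wet floor and fell", "fall"),
    ("hand caught in press machine", "machinery"),
    ("finger crushed by machine guard", "machinery"),
])
print(classify("fell from scaffolding", model))  # -> 'fall'
```

This mirrors the review's finding that such models work well for a small number of categories; real surveillance systems train on thousands of coded narratives and often use richer Bayesian network structures.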

Relevance:

100.00%

Publisher:

Abstract:

The explosive growth in the development of Traditional Chinese Medicine (TCM) has resulted in a continued increase in clinical and research data. The lack of standardised terminology, together with flaws in the data quality planning and management of TCM informatics, is hindering clinical decision-making, drug discovery and education. This paper argues that introducing data warehousing technologies to enhance the effectiveness and durability of TCM is paramount. To showcase the role of data warehousing in the improvement of TCM, the paper presents a practical model for data warehousing, explained in detail and based on structured electronic records, for TCM clinical research and medical knowledge discovery.
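As an illustration of the kind of star schema such a warehouse might use (a hypothetical sketch; the paper's actual model is not reproduced here, and all table and column names are invented), a fact table of prescriptions can reference dimension tables for patients, herbs and syndromes:

```python
import sqlite3

# Hypothetical star schema for TCM clinical records: one fact table
# of prescriptions referencing patient, herb and syndrome dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient  (patient_id INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
CREATE TABLE dim_herb     (herb_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_syndrome (syndrome_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_prescription (
    patient_id  INTEGER REFERENCES dim_patient(patient_id),
    herb_id     INTEGER REFERENCES dim_herb(herb_id),
    syndrome_id INTEGER REFERENCES dim_syndrome(syndrome_id),
    dose_grams  REAL
);
INSERT INTO dim_patient  VALUES (1, 'F', 42), (2, 'M', 55);
INSERT INTO dim_herb     VALUES (1, 'ginseng'), (2, 'liquorice');
INSERT INTO dim_syndrome VALUES (1, 'qi deficiency');
INSERT INTO fact_prescription VALUES (1, 1, 1, 9.0), (2, 1, 1, 6.0), (2, 2, 1, 3.0);
""")

# A typical warehouse query: total dose of each herb per syndrome.
rows = conn.execute("""
    SELECT s.name, h.name, SUM(f.dose_grams)
    FROM fact_prescription f
    JOIN dim_herb h     ON h.herb_id = f.herb_id
    JOIN dim_syndrome s ON s.syndrome_id = f.syndrome_id
    GROUP BY s.name, h.name
    ORDER BY s.name, h.name
""").fetchall()
print(rows)  # [('qi deficiency', 'ginseng', 15.0), ('qi deficiency', 'liquorice', 3.0)]
```

The separation of a narrow fact table from descriptive dimensions is what enables the cross-record aggregation and knowledge discovery the paper motivates.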

Relevance:

100.00%

Publisher:

Abstract:

This research compares Chinese HRM with Western HRM, particularly in the areas of development of HR information systems (HRIS) and HR measurement systems and their relation to HR's involvement as a strategic partner in firms. The research uses a three-stage model of HRIS (workforce profiling, business insight, and strategic driver), based on the studies of Irmer and Ellerby (2005) and Boudreau and Ramstad (2003), to compare the relative stages of development of Chinese and Western HRM. The quantitative aspect of the study comprises a survey of senior HR practitioners from 171 Chinese firms, whose data is compared with data from Irmer and Ellerby's study of Australian and U.S. HRM (2005) and Lawler et al.'s series of studies of U.S. firms (1995, 1998, 2001, 2004). The main results of the comparison are that Chinese HRM generally lags behind Western HRM. In particular, Chinese HR professionals allocate less time to strategic activities, and their roles are less strategic than those of Western HR professionals. The HR measurement systems of Chinese firms are more limited in function, and the HR information systems of Chinese companies are less automated and integrated. However, there is also evidence of a "two speed" HR system in China, with a small proportion of firms having highly sophisticated HR systems but a much larger proportion of Chinese firms than in the West having only the most basic HR information systems. This "two speed" system is in part attributable to a split between the relatively advanced HR systems of large State Owned Enterprises and the basic systems that predominate in smaller, growing local private firms. The survey study is complemented by a series of interviews with senior Chinese HR practitioners, who provide richer insights into their experiences and the challenges they face in contemporary Chinese firms.

Relevance:

100.00%

Publisher:

Abstract:

This thesis took a novel approach to examining factors associated with risky attitudes and risky road use in China by investigating the economic and political background of a sample of young Chinese drivers. Using data from an online survey, significant relationships were found between some family background factors and road safety variables. Correlation analysis, ANOVA, hierarchical regression analysis and structural equation modelling were applied in this study, with culture, personality and demographic variables included as additional factors for a better understanding of the key findings. The findings are discussed in light of China's political management system and potential education opportunities for young drivers.

Relevance:

100.00%

Publisher:

Abstract:

The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is multi-view learning, where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements, such as gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach for exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications, such as matching metabolites between humans and mice and matching sentences between documents in two languages.
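To make the multi-view matching idea concrete, here is a toy sketch (not the thesis algorithm): when the same samples appear in two views in unknown order, a one-to-one correspondence can be recovered by choosing the assignment that minimises total pairwise distance. Brute force over permutations is only feasible for tiny data; real systems would use the Hungarian algorithm:

```python
import math
from itertools import permutations

def match_views(view_a, view_b):
    """Brute-force one-to-one matching between two views: find the
    permutation of view_b samples minimising total Euclidean distance
    to the view_a samples (feasible only for very small n)."""
    def dist(x, y):
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    best_perm, best_cost = None, math.inf
    for perm in permutations(range(len(view_b))):
        cost = sum(dist(view_a[i], view_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm

# The same three samples measured in two views, listed in different order.
view_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
view_b = [(5.1, 5.0), (0.1, 0.0), (1.0, 0.9)]
print(match_views(view_a, view_b))  # (1, 2, 0): a[0]~b[1], a[1]~b[2], a[2]~b[0]
```

The thesis's data-driven matching additionally has to learn a shared representation first (the two views may not even share a feature space, e.g. human versus mouse metabolites); the assignment step above is the final, simplest piece of that pipeline.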

Relevance:

100.00%

Publisher:

Abstract:

Doctoral thesis, Linguistics (Applied Linguistics), Universidade de Lisboa, Faculdade de Letras, 2015

Relevance:

100.00%

Publisher:

Abstract:

This seminar is a research discussion around a very interesting problem, which may be a good basis for a WAISfest theme. A little over a year ago Professor Alan Dix came to tell us of his plans for a magnificent adventure: to walk all of the way round Wales, 1000 miles, in 'Alan Walks Wales'. The walk was a personal journey, but also a technological and community one, exploring the needs of the walker and the people along the way. Whilst walking he recorded his thoughts in an audio diary, took lots of photos, wrote a blog and collected data from the tech instruments he was wearing. As a result, Alan has extensive quantitative data (bio-sensing and location) and qualitative data (text, images and some audio). There are challenges in analysing the individual kinds of data, including merging similar data streams, entity identification, time-series and textual data mining, dealing with provenance, and ontologies for paths and journeys. There are also challenges for author and third-party annotation, linking the data sets, and visualising the merged narrative or facets of it.

Relevance:

100.00%

Publisher:

Abstract:

This paper focuses on the language shift phenomenon in Singapore as a consequence of top-down policies. By looking at bilingual family language policies, it examines the characteristics of Singapore's multilingual nature and cultural diversity. Specifically, it looks at what languages are practised and how family language policies are enacted in Singaporean English-Chinese bilingual families, and at the extent to which macro language policies (i.e. national and educational language policies) influence and interact with family language policies. Involving 545 families and including parents and grandparents as participants, the study traces the trajectory of the policy history. Data sources comprise two parts: 1) a survey of prescribed linguistic practices; and 2) participant observation of the actual negotiation of family language policy (FLP) in face-to-face social interaction in bilingual English-Chinese families. The data provide valuable information on how family language policy is enacted and language practices are negotiated, and on what linguistic practices have been changed or abandoned against the background of the Speak Mandarin Campaign and the current bilingual policy implemented in the 1970s. Importantly, the detailed face-to-face interactions and linguistic practices are able to enhance our understanding of the subtleties and processes of language (dis)continuity in relation to policy interventions. The study also discusses the reality of language management measures in contrast to the government's 'separate bilingualism' (Creese & Blackledge, 2011) expectations with regard to 'striking a balance' between Asian and Western culture (Curdt-Christiansen & Silver, 2013; Shepherd, 2005) and between English and the mother tongue languages (Curdt-Christiansen, 2014).
Demonstrating how parents and children negotiate their family language policy through translanguaging or heteroglossic practices (Canagarajah, 2013; Garcia & Li Wei, 2014), this paper argues that 'striking a balance' as a political ideology places emphasis on discrete and separate notions of cultural and linguistic categorization, and thus downplays the significant influences of the historical, political and sociolinguistic contexts in which people find themselves. This simplistic view of culture and linguistic code will inevitably constrain individuals' language expression, as it regards code switching and translanguaging as delimited and incompetent language behaviour.

Relevance:

100.00%

Publisher:

Abstract:

Ontology design and population (core aspects of semantic technologies) have recently become fields of great interest due to the increasing need for domain-specific knowledge bases that can boost the use of the Semantic Web. For building such knowledge resources, the state-of-the-art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so if the task consists in modelling knowledge at web scale. The primary aim of this work is to investigate a novel and flexible methodology for automatically learning an ontology from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, thereby speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automatically identifying facts in natural language and extracting frames of relations among recognized entities, to producing linked data with which to extend existing knowledge bases or create new ones. In the state of the art, automatic ontology learning systems are mainly based on plain pipelined linguistic classifiers performing tasks such as named entity recognition, entity resolution, taxonomy extraction and relation extraction [11]. These approaches present some weaknesses, especially in capturing the structures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In the literature, these structures have been called Semantic Frames by Fillmore [20], or more recently Knowledge Patterns [23]. Some NLP studies have recently shown the possibility of performing more accurate deep parsing with the ability to logically understand the structure of discourse [7].
In this work, some of these technologies have been investigated and employed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by humans to organize their knowledge.
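A crude sketch of the fact-extraction step this abstract describes, using a single pattern over a fixed verb list in place of the NER, relation extraction and deep parsing pipeline (all example sentences and verbs are hypothetical illustrations, not from the thesis):

```python
import re

# Toy subject-verb-object triple extractor: capitalised phrases act as
# entities, and a small closed verb list acts as the relation vocabulary.
VERBS = r"(?:directed|wrote|founded|acquired)"
PATTERN = re.compile(rf"([A-Z][\w ]*?)\s+({VERBS})\s+([A-Z][\w ]*)")

def extract_triples(text):
    """Return (subject, relation, object) triples from simple sentences."""
    triples = []
    for sentence in re.split(r"[.!?]", text):
        m = PATTERN.search(sentence.strip())
        if m:
            triples.append(tuple(m.groups()))
    return triples

text = "Ridley Scott directed Blade Runner. Larry Page founded Google."
print(extract_triples(text))
# [('Ridley Scott', 'directed', 'Blade Runner'), ('Larry Page', 'founded', 'Google')]
```

Such surface patterns are exactly the "plain pipelined" baseline the abstract criticises: they miss the frame structure (participants, roles) that semantic-frame and knowledge-pattern approaches aim to capture, but they show the input/output shape of the ontology population task.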