991 results for text analytics


Relevance:

100.00%

Publisher:

Abstract:

Experience shows that developing business applications based on text analysis typically requires substantial time and expertise in computational linguistics. Several approaches to integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach that would enable building scalable and flexible text analysis applications in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experiences building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text-analytics-enabled business application that addresses a concrete business scenario. © 2010 IEEE.
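The abstract gives no implementation details, so the following is only an illustrative sketch of the service-oriented idea it describes: independent, reusable text-processing services chained into a pipeline. All class names, the document structure, and the stopword list are invented for the example, not taken from the paper.

```python
# Hypothetical sketch: reusable text-processing services composed into a
# pipeline. Each service works on a shared document dict and can be reused
# across different business applications.

class TokenizerService:
    def process(self, doc):
        doc["tokens"] = doc["text"].lower().split()
        return doc

class StopwordFilterService:
    def __init__(self, stopwords):
        self.stopwords = set(stopwords)
    def process(self, doc):
        doc["tokens"] = [t for t in doc["tokens"] if t not in self.stopwords]
        return doc

class Pipeline:
    """Chains independent services; new applications reuse existing ones."""
    def __init__(self, services):
        self.services = services
    def run(self, text):
        doc = {"text": text}
        for service in self.services:
            doc = service.process(doc)
        return doc

pipeline = Pipeline([TokenizerService(), StopwordFilterService(["the", "a"])])
result = pipeline.run("The invoice exceeds the agreed limit")
print(result["tokens"])  # → ['invoice', 'exceeds', 'agreed', 'limit']
```

The design point, as in the paper, is that each component is independent of any one business application, so a new scenario is served by recomposing existing services rather than rebuilding them.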

Relevance:

100.00%

Publisher:

Abstract:

We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents. Techniques are organized considering their target input material (either single texts or collections of texts) and their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users to handle results from a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models, discuss how they extract the information required to produce representative visualizations, the tasks they intend to support and the interaction issues involved, and their strengths and limitations. Finally, we show a summary of techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.

Relevance:

100.00%

Publisher:

Abstract:

In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes necessary to develop methods and tools for analysing this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment in text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for this new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based summarization framework and a learning-to-rank-based summarization framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
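The dissertation's exact dominating-set formulation is not given in the abstract, but the general idea can be sketched as a standard greedy approximation: link posts that are sufficiently similar, then pick posts until every post is either selected or adjacent to a selected one. The Jaccard similarity, the threshold, and the example posts below are invented for illustration.

```python
# Generic greedy sketch of dominating-set summarization over short posts.
# Not the dissertation's exact method; similarity and threshold are invented.

def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def dominating_set_summary(posts, threshold=0.2):
    n = len(posts)
    # adjacency: two posts are linked if their word overlap is high enough
    adj = {i: {j for j in range(n)
               if j != i and jaccard(posts[i], posts[j]) >= threshold}
           for i in range(n)}
    uncovered, summary = set(range(n)), []
    while uncovered:
        # greedily take the post that covers the most uncovered posts
        best = max(uncovered, key=lambda i: len((adj[i] | {i}) & uncovered))
        summary.append(posts[best])
        uncovered -= adj[best] | {best}
    return summary

posts = ["goal scored by the home team",
         "amazing goal by the home team",
         "rain delay announced"]
# one goal post covers both goal posts; the rain post stands alone
print(dominating_set_summary(posts))
```

Because every post is covered by some selected post, the summary spans all sub-topics in the stream, which is why the framework generalizes across summarization problems.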

Relevance:

100.00%

Publisher:

Abstract:

A version of the Course Experience Questionnaire (CEQ) has been included in the Graduate Careers Council of Australia's national survey of university graduates from 1993 onward. In addition to its quantitative response items, the CEQ also invites respondents to write open-ended comments on the best aspects (BA) of their university course experience and those aspects most needing improvement (NI).

Relevance:

100.00%

Publisher:

Abstract:

In the undergraduate engineering program at Griffith University in Australia, the unit 1006ENG Design and Professional Skills aims to provide an introduction to engineering design and professional practice through a project-based learning (PBL) approach to problem solving. It provides students with an experience of PBL in the first year of their programme. The unit comprises an underpinning lecture series, design work including group project activities, an individual computer-aided drawing exercise/s and an oral presentation. Griffith University employs a ‘Student Experience of Course’ (SEC) online survey as part of its student evaluation of teaching, quality improvement and staff performance management processes. As well as numerical response scale items, it includes the following two questions inviting open-ended text responses from students: i) What did you find particularly good about this course? and ii) How could this course be improved? The collection of textual data in student surveys is commonplace, due to the rich descriptions of respondent experiences such data can provide at relatively low cost. However, historically these data have been underutilised because they are time-consuming to analyse manually, and there has been a lack of automated tools to exploit them efficiently. Text analytics approaches offer analysis methods that produce visual representations of comment data, highlighting key themes and the relationships between those themes. We present a text analytics-based evaluation of the SEC open-ended comments received in the first two years of offer of the PBL unit 1006ENG, and discuss the results in detail. The method developed and documented here is a practical and useful approach to analysing and visualising open-ended comment data that could be applied by others with similar data sets.

Relevance:

100.00%

Publisher:

Abstract:

In product design and engineering, identifying customer needs is the foundation for designing and producing a successful product. Traditionally, a range of techniques have been employed to elicit customer needs. A relatively new technique for identifying customer needs is ‘crowdsourcing’. An emerging area of research is the crowdsourcing of customer needs from online product review sites. This paper proposes a simple process for crowdsourcing customer needs for product design using text analytics. The analysis/visualization method is presented in detail. The text content of online customer reviews for a popular product is collected and processed using text analytics software. A published case study identifying expressed customer needs for the same generic product, collected via conventional means, is used to successfully validate the findings from the text analytics method.
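As a toy illustration of the general idea (not the paper's actual software, data, or validated method), frequently mentioned terms in review text can surface candidate need areas. The stopword list and the reviews below are invented for the example.

```python
# Minimal sketch: count frequent non-stopword terms in reviews as rough
# candidate indicators of customer needs. Real studies use dedicated text
# analytics software and human interpretation on top of counts like these.

from collections import Counter

STOPWORDS = {"the", "is", "a", "and", "it", "too", "i", "but"}

def candidate_need_terms(reviews, top_n=3):
    words = Counter()
    for review in reviews:
        words.update(w for w in review.lower().split() if w not in STOPWORDS)
    return [w for w, _ in words.most_common(top_n)]

reviews = ["The battery life is too short",
           "Battery drains fast and it is heavy",
           "I love the screen but the battery is weak"]
print(candidate_need_terms(reviews))  # 'battery' ranks first
```

A term that recurs across many independent reviews (here, "battery") is a cheap signal of a shared concern, which is the premise behind crowdsourcing needs from review sites.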

Relevance:

60.00%

Publisher:

Abstract:

Background: The past decade has seen rapid change in the climate system, with an increased risk of extreme weather events. On and following 3 January 2013, Tasmania experienced three catastrophic bushfires, which led to the evacuation of several communities, the loss of many properties, and a financial cost of approximately AUD$80 million. Objective: To explore the impacts of the 2012/2013 Tasmanian bushfires on community pharmacies. Method: Qualitative research methods were employed, using semi-structured telephone interviews with a purposive sample of seven Tasmanian pharmacists. The interviews were recorded and transcribed, and two different methods were used to analyse the text. The first utilised Leximancer® text analytics software to provide a bird's-eye view of the conceptual structure of the text. The second involved manual, open and axial coding, conducted independently by the two researchers for inter-rater reliability, to identify key themes in the discourse. Results: Two main themes were identified, ‘people’ and ‘supply’, from which six key concepts were derived: ‘patients’, ‘pharmacists’, ‘local doctor’, ‘pharmacy operations’, ‘disaster management planning’, and ‘emergency supply regulation’. Conclusion: This study identified challenges faced by community pharmacists during the Tasmanian bushfires. Interviewees highlighted the need for both the Tasmanian State Government and the Australian Federal Government to recognise the important primary care role that community pharmacists play during natural disasters, and therefore to involve pharmacists in disaster management planning. They called for greater support and guidance for community pharmacists from regulatory and other government bodies during these events. Their comments also highlighted the need for a review of Tasmania’s 3-day emergency supply regulation, which allows pharmacists to provide a three-day supply of a patient’s medication without a doctor’s prescription in an emergency.

Relevance:

60.00%

Publisher:

Abstract:

Information portals are seen as an appropriate platform for personalised healthcare and wellbeing information provision. Efficient content management is a core capability of a successful smart health information portal (SHIP) and domain expertise is a vital input to content management when it comes to matching user profiles with the appropriate resources. The rate of generation of new health-related content far exceeds the numbers that can be manually examined by domain experts for relevance to a specific topic and audience. In this paper we investigate automated content discovery as a plausible solution to this shortcoming that capitalises on the existing database of expert-endorsed content as an implicit store of knowledge to guide such a solution. We propose a novel content discovery technique based on a text analytics approach that utilises an existing content repository to acquire new and relevant content. We also highlight the contribution of this technique towards realisation of smart content management for SHIPs.
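The paper's actual technique is not specified in the abstract; one hedged sketch of the underlying idea is to treat the expert-endorsed repository as an implicit relevance model and score new candidate documents by cosine similarity to the centroid of endorsed content. The bag-of-words representation and all documents below are invented for illustration.

```python
# Sketch: score candidate content against the centroid of expert-endorsed
# documents. Endorsed and candidate texts are invented examples.

import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(docs):
    total = Counter()
    for d in docs:
        total.update(bow(d))
    return total

endorsed = ["breast cancer treatment options",
            "coping with cancer diagnosis"]
profile = centroid(endorsed)
candidates = ["new cancer treatment guidelines", "stock market update"]
scores = {c: cosine(bow(c), profile) for c in candidates}
best = max(scores, key=scores.get)
print(best)  # → 'new cancer treatment guidelines'
```

The endorsed repository thus acts as the "implicit store of knowledge" the abstract mentions: no explicit topic rules are written, yet off-topic candidates score near zero.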

Relevance:

60.00%

Publisher:

Abstract:

Background: Continuous content management of a health information portal is vital for its sustainability and widespread acceptance. The knowledge and experience of a domain expert is essential for content management in the health domain. Online health resources are generated at an exponential rate, so manual examination for relevance to a specific topic and audience is a formidable challenge for domain experts. Intelligent content discovery for effective content management is a little-researched topic. An existing expert-endorsed content repository can provide the necessary leverage to automatically identify relevant resources and evaluate qualitative metrics. Objective: This paper reports on design research towards an intelligent technique for automated content discovery and ranking for health information portals. The proposed technique aims to improve the efficiency of the current, mostly manual process of portal content management by utilising an existing expert-endorsed content repository as a supporting base and a benchmark to evaluate the suitability of new content. Methods: A model for content management was established based on a field study of potential users. The proposed technique is integral to this content management model and executes in several phases (i.e., query construction, content search, text analytics and fuzzy multi-criteria ranking). The construction of multi-dimensional search queries with input from WordNet, the use of multi-word and single-word terms as representative semantics for text analytics, and the use of fuzzy multi-criteria ranking for subjective evaluation of quality metrics are the original contributions reported in this paper. Results: The feasibility of the proposed technique was examined with experiments conducted on an actual health information portal, the BCKOnline portal. Both intermediary and final results generated by the technique are presented in the paper, and these help to establish the benefits of the technique and its contribution towards effective content management. Conclusions: The prevalence of large numbers of online health resources is a key obstacle for domain experts involved in content management of health information portals and websites. The proposed technique proved successful at searching for and identifying resources and measuring their relevance. It can be used to support the domain expert in content management and thereby ensure the health portal remains current.
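The paper's fuzzy multi-criteria ranking is not detailed in the abstract, so the following is only a generic sketch of the idea: each resource receives fuzzy membership scores in [0, 1] for several quality criteria, which are aggregated by weighted sum and used to rank. The criteria names, weights, and resources are invented for the example.

```python
# Generic weighted aggregation of fuzzy criterion scores for ranking.
# Criteria, weights, and resources are illustrative, not from the paper.

def fuzzy_rank(resources, weights):
    def score(criteria):
        # weighted sum of fuzzy membership degrees, one per criterion
        return sum(weights[c] * criteria[c] for c in weights)
    return sorted(resources, key=lambda r: score(r[1]), reverse=True)

weights = {"relevance": 0.5, "currency": 0.3, "readability": 0.2}
resources = [
    ("resource_a", {"relevance": 0.9, "currency": 0.4, "readability": 0.8}),
    ("resource_b", {"relevance": 0.6, "currency": 0.9, "readability": 0.9}),
]
ranked = fuzzy_rank(resources, weights)
print([name for name, _ in ranked])  # → ['resource_b', 'resource_a']
```

The appeal of a fuzzy formulation here is that subjective judgements ("fairly current", "quite readable") map naturally onto graded memberships rather than hard yes/no relevance.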

Relevance:

60.00%

Publisher:

Abstract:

Design of a Social Intelligence and Sentiment Analysis system for a company in the consumer goods sector.

Relevance:

40.00%

Publisher:

Abstract:

Reflective writing is an important learning task that helps foster reflective practice, but even when assessed it is rarely analysed or critically reviewed, due to its subjective and affective nature. We propose a process for capturing subjective and affective analytics based on the identification and recontextualisation of anomalous features within reflective text. We evaluate two human-supervised trials of the process, demonstrating the potential for an automated Anomaly Recontextualisation process for Learning Analytics.
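The abstract does not define what counts as an "anomalous feature"; as a loose toy stand-in for the idea only, one could flag words in a reflection that are rare relative to a background corpus of the student's other writing. The background text, reflection, and rarity threshold are invented.

```python
# Toy stand-in for anomaly identification in reflective text: flag words
# that are rare in a background corpus. The paper's actual Anomaly
# Recontextualisation process is not specified here.

from collections import Counter

def anomalous_words(text, background_counts, max_count=1):
    return [w for w in text.lower().split()
            if background_counts[w] <= max_count]

background = Counter("i learned about the design process "
                     "the design was hard i learned a lot".split())
reflection = "i felt overwhelmed by the design"
print(anomalous_words(reflection, background))  # → ['felt', 'overwhelmed', 'by']
```

Here the rare words are exactly the affective ones ("felt", "overwhelmed"), hinting at why anomalous features can be a useful entry point for subjective and affective analytics.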

Relevance:

30.00%

Publisher:

Abstract:

Background: A major challenge in assessing students’ conceptual understanding of STEM subjects is the capacity of assessment tools to reliably and robustly evaluate student thinking and reasoning. Multiple-choice tests are typically used to assess student learning and are designed to include distractors that can indicate a student’s incomplete understanding of a topic or concept, based on which distractor the student selects. However, these tests fail to provide the critical information uncovering the how and why of students’ reasoning for their multiple-choice selections. Open-ended or structured response questions are one method for capturing higher-level thinking, but are often costly in terms of the time and attention needed to properly assess student responses. Purpose: The goal of this study is to evaluate methods for automatically assessing open-ended responses, e.g. students’ written explanations and reasoning for their multiple-choice selections. Design/Method: We incorporated an open response component into an online signals and systems multiple-choice test to capture written explanations of students’ selections. The effectiveness of an automated approach for identifying and assessing student conceptual understanding was evaluated by comparing the results of lexical analysis software packages (Leximancer and NVivo) to expert human analysis of student responses. To understand and delineate the process for effectively analysing text provided by students, the researchers evaluated the strengths and weaknesses of both the human and automated approaches. Results: Human and automated analyses revealed both correct and incorrect associations for certain conceptual areas, some of which were not anticipated or included in the distractor selections, showing how multiple-choice questions alone fail to capture a comprehensive picture of student understanding. The comparison of textual analysis methods revealed the capability of automated lexical analysis software to assist in identifying concepts and their relationships in large textual data sets. We also identified several challenges in using automated analysis, as well as in manual and computer-assisted analysis. Conclusions: This study highlighted the usefulness of incorporating and analysing students’ reasoning or explanations in understanding how students think about certain conceptual ideas. The ultimate value of automating the evaluation of written explanations is that it can be applied more frequently and at various stages of instruction to formatively evaluate conceptual understanding and engage students in reflective practice.

Relevance:

30.00%

Publisher:

Abstract:

Traditional text classification technology based on machine learning and data mining techniques has made significant progress. However, it remains difficult to draw an exact decision boundary between relevant and irrelevant objects in binary classification, due to the uncertainty produced by traditional algorithms. The proposed model, CTTC (Centroid Training for Text Classification), aims to build an uncertainty boundary that absorbs as many indeterminate objects as possible, so as to increase the certainty of the relevant and irrelevant groups, through a centroid clustering and training process. The clustering starts from two training subsets, labelled relevant and irrelevant respectively, which are used to create two principal centroid vectors; all training samples are then separated into three groups, POS, NEG and BND, with the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are trained and optimized through a subsequent iterative multi-learning process, and together they are used to predict the polarity of incoming objects thereafter. For the assessment of the proposed model, F1 and Accuracy were chosen as the key evaluation measures. We stress the F1 measure because it displays the overall performance improvement of the final classifier better than Accuracy. A large number of experiments were completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), an important standard dataset in the field. The experimental results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.