937 results for text analysis


Relevance:

100.00%

Abstract:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.
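
For readers unfamiliar with the term-based baselines named above, the following is a minimal sketch of Okapi BM25 scoring. It is not the RFD method and not the thesis's experimental setup: the function name `bm25_scores`, the whitespace tokenizer, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions chosen for illustration, and the IDF uses a commonly used smoothed variant that stays non-negative.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumptions for this sketch: lowercase whitespace tokenization and
    the smoothed IDF ln((N - n + 0.5) / (n + 0.5) + 1).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy documents (hypothetical, not from Reuters Corpus Volume 1).
docs = [
    "oil prices rise as markets react",
    "football season opens this weekend",
    "oil exports and oil prices fall",
]
scores = bm25_scores("oil prices", docs)
print(scores)
```

Because the score depends only on individual term counts, two documents that use different words for the same idea are not matched, which is the synonymy problem the abstract attributes to term-based approaches.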

Relevance:

100.00%

Abstract:

Background: A major challenge for assessing students’ conceptual understanding of STEM subjects is the capacity of assessment tools to reliably and robustly evaluate student thinking and reasoning. Multiple-choice tests are typically used to assess student learning and are designed to include distractors that can indicate students’ incomplete understanding of a topic or concept based on which distractor the student selects. However, these tests fail to provide the critical information uncovering the how and why of students’ reasoning for their multiple-choice selections. Open-ended or structured response questions are one method for capturing higher-level thinking, but are often costly in terms of the time and attention needed to properly assess student responses. Purpose: The goal of this study is to evaluate methods for automatically assessing open-ended responses, e.g. students’ written explanations and reasoning for multiple-choice selections. Design/Method: We incorporated an open response component for an online signals and systems multiple-choice test to capture written explanations of students’ selections. The effectiveness of an automated approach for identifying and assessing student conceptual understanding was evaluated by comparing results of lexical analysis software packages (Leximancer and NVivo) to expert human analysis of student responses. In order to understand and delineate the process for effectively analysing text provided by students, the researchers evaluated strengths and weaknesses of both the human and automated approaches. Results: Human and automated analyses revealed both correct and incorrect associations for certain conceptual areas. For some questions, these associations were not anticipated or included in the distractor selections, showing how multiple-choice questions alone fail to capture the comprehensive picture of student understanding.
The comparison of textual analysis methods revealed the capability of automated lexical analysis software to assist in the identification of concepts and their relationships for large textual data sets. We also identified several challenges to using automated analysis as well as the manual and computer-assisted analysis. Conclusions: This study highlighted the usefulness of incorporating and analysing students’ reasoning or explanations in understanding how students think about certain conceptual ideas. The ultimate value of automating the evaluation of written explanations is that it can be applied more frequently and at various stages of instruction to formatively evaluate conceptual understanding and engage students in reflective

Relevance:

100.00%

Abstract:

Assessing students’ conceptual understanding of technical content is important for instructors as well as students to learn content and apply knowledge in various contexts. Concept inventories that identify possible misconceptions through validated multiple-choice questions are helpful in identifying a misconception that may exist, but do not provide a meaningful assessment of why it exists or the nature of the students’ understanding. We conducted a case study with undergraduate students in an electrical engineering course by testing a validated multiple-choice response concept inventory that we augmented with a component for students to provide written explanations for their multiple-choice selection. Results revealed that correctly chosen multiple-choice selections did not always match correct conceptual understanding for questions testing a specific concept. The addition of a text response to multiple-choice concept inventory questions provided an enhanced and meaningful assessment of students’ conceptual understanding and highlighted variables associated with current concept inventories or multiple-choice questions.

Relevance:

100.00%

Abstract:

In many scientific fields, the analysis of complex networks has led to numerous recent discoveries. In this thesis we applied this approach to human language, in particular written language, where words do not interact randomly. We first presented measures capable of extracting important topological structures from linguistic networks (Degree, Strength, Entropy, ...) and examined the software used to represent and visualize the graphs (Gephi). We then analyzed the different statistical properties of the same text in several forms (shuffled, without stopwords, and without low-frequency words); our database contains five books by five authors who lived in the 19th century. Finally, we showed how certain measures are important for distinguishing a real text from its modified versions, and why the Degree distribution of a normal text and that of a shuffled one follow the same trend. These results may be useful in the increasingly active analysis of linguistic phenomena such as authorship attribution and the recognition of shuffled texts.
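
The Degree and Strength measures mentioned above can be sketched on a toy word-adjacency network. This is only an illustration under stated assumptions: the abstract does not say how the thesis builds its networks, so the choice to link words that appear next to each other, the helper names `build_network`, `degree`, and `strength`, and the sample sentence are all hypothetical.

```python
import random
from collections import defaultdict


def build_network(tokens):
    """Undirected word-adjacency network.

    weights[a][b] counts how often words a and b appear next to each
    other. Linking adjacent words is an assumption of this sketch.
    """
    weights = defaultdict(lambda: defaultdict(int))
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue  # ignore self-loops
        weights[a][b] += 1
        weights[b][a] += 1
    return weights


def degree(weights):
    """Degree: number of distinct neighbours of each word."""
    return {w: len(nbrs) for w, nbrs in weights.items()}


def strength(weights):
    """Strength: sum of the edge weights attached to each word."""
    return {w: sum(nbrs.values()) for w, nbrs in weights.items()}


tokens = "the cat sat on the mat and the dog sat on the rug".split()
net = build_network(tokens)
deg = degree(net)
stg = strength(net)
print(deg["the"], stg["the"])  # 6 distinct neighbours, total edge weight 7

# A shuffled version keeps the word frequencies but destroys word order.
shuffled = tokens[:]
random.Random(0).shuffle(shuffled)
net_shuffled = build_network(shuffled)
deg_shuffled = degree(net_shuffled)
```

Frequent words such as "the" tend to remain high-degree nodes after shuffling because their frequency is preserved, which is one intuition for why the two Degree distributions can follow a similar trend.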

Relevance:

100.00%

Abstract:

This work is devoted to the development of a computer-aided system for semantic text analysis of a technical specification. Its purpose is to increase the efficiency of software engineering by automating the semantic text analysis of a technical specification. The work proposes and investigates a model for analyzing the text of a technical project; constructs an attribute grammar of a technical specification, intended to formalize a restricted subset of Russian for the purpose of analyzing the sentences of the specification text; considers the stylistic features of the technical project as a class of documents; and formulates recommendations for preparing the text of a technical specification for automated processing. The computer-aided system of semantic text analysis of a technical specification is then considered. This system consists of the following subsystems: preliminary text processing, syntactic and semantic analysis and construction of software models, storage of documents, and the interface.

Relevance:

100.00%

Abstract:

This work is devoted to the development of a computer-aided system for semantic text analysis of a technical specification. Its purpose is to increase the efficiency of software engineering by automating the semantic text analysis of a technical specification. The work proposes and investigates a technique for the text analysis of a technical specification; constructs an extended fuzzy attribute grammar of a technical specification, intended to formalize a restricted subset of the Russian language for the purpose of analyzing the sentences of the specification text; considers the stylistic features of the technical specification as a class of documents; and formulates recommendations for preparing the text of a technical specification for automated processing. The computer-aided system of semantic text analysis of a technical specification is then considered. This system consists of the following subsystems: preliminary text processing, syntactic and semantic analysis and construction of software models, storage of documents, and the interface.

Relevance:

100.00%

Abstract:

A mobile phone application placed in front of the negotiating parties forecasts the competitiveness of the conversation partners from competitiveness indicators and offers suggestions for the further course of the negotiation. This vision is obviously still futuristic, but drawing conclusions about the competitiveness orientations of the represented organizations from the hidden textual content of top management statements is already possible today. Using the culture research methodology of the GLOBE project together with text analysis methods, it was possible to reveal textual signals of power distance, which forecasts competitiveness, and of institutional collectivism. These findings may serve as a tool for professionals working in, among other fields, organizational development, intelligence services, and HR management.

Relevance:

100.00%

Abstract:

Introduction: According to the Declaration of Helsinki and other guidelines, clinical studies should be approved by a research ethics committee and seek valid informed consent from the participants. Editors of medical journals are encouraged by the ICMJE and COPE to include requirements for these principles in the journal's instructions for authors. This study assessed the editorial policies of psychiatry journals regarding ethics review and informed consent. Methods and Findings: The information given on ethics review and informed consent and the mentioning of the ICMJE and COPE recommendations were assessed within the instructions for authors and online submission procedures of all 123 eligible psychiatry journals. While 54% and 58% of editorial policies required ethics review and informed consent, respectively, only 14% and 19% demanded the reporting of these issues in the manuscript. The top 10 psychiatry journals (ranked by impact factor) performed similarly in this regard. Conclusions: Only every second psychiatry journal adheres to the ICMJE's recommendation to inform authors about requirements for informed consent and ethics review. Furthermore, we argue that even the ICMJE's recommendations in this regard are insufficient, at least for ethically challenging clinical trials. At the same time, ideal scientific design sometimes even needs to be compromised for ethical reasons. We suggest that features of clinical studies that make them morally controversial, but not necessarily unethical, are analogous to methodological limitations and should thus be reported explicitly. Editorial policies as well as reporting guidelines such as CONSORT should be extended to support a meaningful reporting of ethical research.

Relevance:

80.00%

Abstract:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis of this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevance:

80.00%

Abstract:

This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.

Relevance:

70.00%

Abstract:

Internet services are an important part of daily activities for most of us. These services come with sophisticated authentication requirements which may not be handled by average Internet users. The management of secure passwords, for example, creates an extra overhead which is often neglected for usability reasons. Furthermore, password-based approaches are applicable only for initial logins and do not protect against unlocked workstation attacks. In this paper, we provide a non-intrusive identity verification scheme based on behavioral biometrics, in which keystroke dynamics based on free text are used continuously to verify the identity of a user in real time. We improved existing keystroke dynamics based verification schemes in four aspects. First, we improve the scalability by using a constant number of users instead of the whole user space to verify the identity of the target user. Second, we provide an adaptive user model which enables our solution to take changes in user behavior into consideration in the verification decision. Third, we identify a new distance measure which enables us to verify the identity of a user with shorter text. Fourth, we decrease the number of false results. Our solution is evaluated on a data set which we collected from users while they were interacting with their mailboxes during their daily activities.
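
To make the idea of free-text keystroke dynamics concrete, here is a minimal sketch of one common feature, the latency between consecutive key presses (a digraph latency), and a simple way to compare two typing samples. It does not reproduce the paper's distance measure, which the abstract does not specify. The helper names `digraph_profile`, `profile_distance`, and `type_text`, the mean absolute difference used as the distance, the synthetic timings, and the 20 ms threshold are all assumptions.

```python
from collections import defaultdict


def digraph_profile(events):
    """Mean press-to-press latency (ms) for each pair of consecutive keys.

    events: list of (key, press_time_ms) tuples in typing order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        sums[(k1, k2)] += t2 - t1
        counts[(k1, k2)] += 1
    return {dg: sums[dg] / counts[dg] for dg in sums}


def profile_distance(p, q):
    """Mean absolute latency difference over the digraphs both samples share.

    This is an illustrative distance, not the measure from the paper.
    """
    shared = p.keys() & q.keys()
    if not shared:
        return float("inf")  # nothing to compare
    return sum(abs(p[d] - q[d]) for d in shared) / len(shared)


def type_text(text, gap_ms):
    """Synthetic typing sample: every key is pressed gap_ms after the last."""
    return [(ch, i * gap_ms) for i, ch in enumerate(text)]


THRESHOLD_MS = 20.0  # hypothetical acceptance threshold

enrolled = digraph_profile(type_text("the quick brown fox", 120))
same_user = digraph_profile(type_text("the brown fox", 125))
other_user = digraph_profile(type_text("the brown fox", 200))

d_same = profile_distance(enrolled, same_user)
d_other = profile_distance(enrolled, other_user)
print(d_same < THRESHOLD_MS, d_other < THRESHOLD_MS)
```

Because the comparison uses only digraphs that occur in both samples, it works on free text rather than a fixed passphrase, at the cost of needing enough shared digraphs to be reliable on short inputs.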

Relevance:

70.00%

Abstract:

This thesis addressed issues that have prevented qualitative researchers from using thematic discovery algorithms. The central hypothesis evaluated whether allowing qualitative researchers to interact with thematic discovery algorithms and incorporate domain knowledge improved their ability to address research questions and trust the derived themes. Non-negative Matrix Factorisation and Latent Dirichlet Allocation find latent themes within document collections but these algorithms are rarely used, because qualitative researchers do not trust and cannot interact with the themes that are automatically generated. The research determined the types of interactivity that qualitative researchers require and then evaluated interactive algorithms that matched these requirements. Theoretical contributions included the articulation of design guidelines for interactive thematic discovery algorithms, the development of an Evaluation Model and a Conceptual Framework for Interactive Content Analysis.

Relevance:

70.00%

Abstract:

Background Prescription medicine samples provided by pharmaceutical companies are predominantly newer and more expensive products. The range of samples provided to practices may not represent the drugs that the doctors desire to have available. Few studies have used a qualitative design to explore the reasons behind sample use. Objective The aim of this study was to explore the opinions of a variety of Australian key informants about prescription medicine samples, using a qualitative methodology. Methods Twenty-three organizations involved in quality use of medicines in Australia were identified, based on the authors' previous knowledge. Each organization was invited to nominate 1 or 2 representatives to participate in semistructured interviews utilizing seeding questions. Each interview was recorded and transcribed verbatim. Leximancer v2.25 text analysis software (Leximancer Pty Ltd., Jindalee, Queensland, Australia) was used for textual analysis. The top 10 concepts from each analysis group were interrogated back to the original transcript text to determine the main emergent opinions. Results A total of 18 key interviewees representing 16 organizations participated. Samples, patient, doctor, and medicines were the major concepts among general opinions about samples. The concept drug became more frequent and the concept companies appeared when marketing issues were discussed. The Australian Pharmaceutical Benefits Scheme and cost were more prevalent in discussions about alternative sample distribution models, indicating interviewees were cognizant of budgetary implications. Key interviewee opinions added richness to the single-word concepts extracted by Leximancer. Conclusions Participants recognized that prescription medicine samples have an influence on quality use of medicines and play a role in the marketing of medicines. They also believed that alternative distribution systems for samples could provide benefits. 
The cost of a noncommercial system for distributing samples or starter packs was a concern. These data will be used to design further research investigating alternative models for distribution of samples.

Relevance:

70.00%

Abstract:

Engineers must have a deep and accurate conceptual understanding of their field, and concept inventories (CIs) are one method of assessing conceptual understanding and providing formative feedback. Current CI tests use multiple-choice questions (MCQs) to identify misconceptions and have undergone reliability and validity testing to assess conceptual understanding. However, they do not readily provide diagnostic information about students’ reasoning and therefore do not effectively point to specific actions that can be taken to improve student learning. We piloted the textual component of our diagnostic CI on electrical engineering students using items from the signals and systems CI. We then analysed the textual responses using automated lexical analysis software to test the effectiveness of these types of software and interviewed the students regarding their experience using the textual component. Results from the automated text analysis revealed that students held both incorrect and correct ideas for certain conceptual areas and provided indications of student misconceptions. User feedback also revealed that the inclusion of the textual component is helpful to students in assessing and reflecting on their own understanding.