988 results for Text Analysis
Abstract:
In many scientific fields the analysis of complex networks has led to several recent discoveries: in this thesis we applied this approach to human language, in particular written language, where words do not interact at random. We first presented measures capable of extracting important topological structures from linguistic networks (degree, strength, entropy, ...) and examined the software used to represent and visualize the graphs (Gephi). We then analysed the different statistical properties of the same text in several of its forms (shuffled, without stopwords, and without low-frequency words): our database contains five books by five authors who lived in the nineteenth century. Finally, we showed how certain measures are important for distinguishing a real text from its modified versions, and why the degree distributions of a normal text and of a shuffled one follow the same trend. These results may prove useful in the increasingly active analysis of linguistic phenomena such as authorship attribution and the recognition of shuffled texts.
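The topological measures this abstract lists (degree, strength, entropy) are easy to illustrate on a toy word-adjacency network. The sketch below is a simplification, not the thesis's actual pipeline: it links consecutive words, takes degree as the number of distinct neighbours, strength as the total co-occurrence weight, and entropy as the Shannon entropy of the degree sequence.

```python
from collections import defaultdict
import math

def cooccurrence_network(words):
    """Link each word to its immediate neighbour (one common, simple choice)."""
    weights = defaultdict(int)
    for a, b in zip(words, words[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1  # undirected, weighted edge
    return weights

def degree_and_strength(weights):
    degree = defaultdict(int)    # number of distinct neighbours
    strength = defaultdict(int)  # total co-occurrence weight
    for pair, w in weights.items():
        for node in pair:
            degree[node] += 1
            strength[node] += w
    return degree, strength

def degree_entropy(degree):
    """Shannon entropy of the normalized degree sequence."""
    total = sum(degree.values())
    return -sum((d / total) * math.log2(d / total) for d in degree.values())

text = "the cat sat on the mat and the dog sat on the rug".split()
deg, stren = degree_and_strength(cooccurrence_network(text))
```

On a real corpus these statistics would be computed per book; here the point is only how each measure is derived from the network.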
Abstract:
This work is devoted to the development of a computer-aided system for semantic text analysis of technical specifications. Its purpose is to increase the efficiency of software engineering by automating the semantic analysis of the text of a technical specification. The work proposes and investigates a model for analysing the text of a technical specification; constructs an attribute grammar of the technical specification, intended to formalize a restricted subset of Russian for the purpose of analysing its sentences; considers the stylistic features of the technical specification as a class of documents; and formulates recommendations on preparing the text of a technical specification for automated processing. The computer-aided system itself consists of the following subsystems: preliminary text processing, syntactic and semantic analysis with construction of software models, document storage, and the interface.
Abstract:
This work is devoted to the development of a computer-aided system for semantic text analysis of technical specifications. Its purpose is to increase the efficiency of software engineering by automating the semantic analysis of the text of a technical specification. The work proposes and investigates a technique for analysing the text of a technical specification; constructs an expanded fuzzy attribute grammar of the technical specification, intended to formalize a restricted subset of the Russian language for the purpose of analysing its sentences; considers the stylistic features of the technical specification as a class of documents; and formulates recommendations on preparing the text of a technical specification for automated processing. The computer-aided system itself consists of the following subsystems: preliminary text processing, syntactic and semantic analysis with construction of software models, document storage, and the interface.
Abstract:
The use of a mobile telephone laid in front of the negotiators during their conversation could one day forecast their competitiveness indicators and suggest how the negotiation should proceed. This is obviously a futuristic vision, but drawing conclusions about the competitive cultural orientations of the organizations represented from the hidden content of top-management narratives is already possible today. Using the culture research methodology of the GLOBE project together with text analysis methods, it was possible to reveal narrative markers both of power distance, which forecasts competitiveness, and of institutional collectivism. These findings may be useful tools for professionals working in, among other fields, organizational development, intelligence and HR management.
Abstract:
Introduction: According to the Declaration of Helsinki and other guidelines, clinical studies should be approved by a research ethics committee and seek valid informed consent from the participants. Editors of medical journals are encouraged by the ICMJE and COPE to include requirements for these principles in the journal's instructions for authors. This study assessed the editorial policies of psychiatry journals regarding ethics review and informed consent. Methods and Findings: The information given on ethics review and informed consent, and any mention of the ICMJE and COPE recommendations, were assessed within the authors' instructions and online submission procedures of all 123 eligible psychiatry journals. While 54% and 58% of editorial policies required ethics review and informed consent, respectively, only 14% and 19% demanded the reporting of these issues in the manuscript. The top 10 psychiatry journals (ranked by impact factor) performed similarly in this regard. Conclusions: Only every second psychiatry journal adheres to the ICMJE's recommendation to inform authors about requirements for informed consent and ethics review. Furthermore, we argue that even the ICMJE's recommendations in this regard are insufficient, at least for ethically challenging clinical trials. At the same time, ideal scientific design sometimes even needs to be compromised for ethical reasons. We suggest that features of clinical studies that make them morally controversial, but not necessarily unethical, are analogous to methodological limitations and should thus be reported explicitly. Editorial policies as well as reporting guidelines such as CONSORT should be extended to support a meaningful reporting of ethical research.
Abstract:
In the last decade, large numbers of social media services have emerged and become widely used in people's daily lives as important tools for sharing and acquiring information. With a substantial amount of user-contributed text data on social media, it becomes necessary to develop methods and tools for analysing this emerging kind of text, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics over the last several decades has mainly focused on traditional types of text such as emails, news, and academic literature, and several issues critical to text data on social media have not been well explored: 1) how to detect sentiment in text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment in text on social media, we propose a dual active supervision method based on non-negative matrix tri-factorization (tri-NMF) to minimize human labeling effort for this new type of data. Second, to exploit social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based framework and a learning-to-rank-based framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
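The dominating-set idea for summarization can be illustrated with a standard greedy approximation: treat sentences as nodes, connect pairs whose similarity exceeds a threshold, and keep picking sentences until every node is adjacent to a pick. The Jaccard similarity and the 0.2 threshold below are illustrative assumptions, not the dissertation's actual settings.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def greedy_dominating_summary(sentences, threshold=0.2):
    """Greedily pick sentences until every sentence is itself picked
    or adjacent (similar enough) to a picked one."""
    n = len(sentences)
    adj = [{j for j in range(n)
            if i != j and jaccard(sentences[i], sentences[j]) >= threshold}
           for i in range(n)]
    uncovered, summary = set(range(n)), []
    while uncovered:
        # pick the sentence that covers the most still-uncovered sentences
        best = max(range(n), key=lambda i: len(({i} | adj[i]) & uncovered))
        summary.append(best)
        uncovered -= {best} | adj[best]
    return [sentences[i] for i in sorted(summary)]
```

The greedy rule gives the usual logarithmic approximation guarantee for minimum dominating set, which is why it is a popular stand-in for the NP-hard exact problem.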
Abstract:
This dissertation applies statistical methods to the evaluation of automatic summarization using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An over-arching theme of this dissertation is the application of standard statistical methods to data that does not conform to the usual testing assumptions.
Abstract:
This class introduces the basics of web mining and information retrieval, including, for example, an introduction to the Vector Space Model and Text Mining. Guest Lecturer: Dr. Michael Granitzer. Optional: Modeling the Internet and the Web: Probabilistic Methods and Algorithms, Pierre Baldi, Paolo Frasconi, Padhraic Smyth, Wiley, 2003 (Chapter 4, Text Analysis)
Abstract:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex-network concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that translations of different quality generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in-degree (ID), out-degree (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved in manual translations, but are in good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
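A minimal version of the directed word networks and two of the metrics mentioned here (out-degree and shortest paths) can be sketched as follows. Linking each word to its immediate successor is one common modelling choice, not necessarily the exact construction used in the paper.

```python
from collections import defaultdict, deque

def directed_word_network(text):
    """Directed edge from each word to the word that follows it."""
    edges = defaultdict(set)
    words = text.lower().split()
    for a, b in zip(words, words[1:]):
        if a != b:
            edges[a].add(b)
    return edges

def average_out_degree(edges):
    """Mean number of distinct successors per node."""
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    return sum(len(vs) for vs in edges.values()) / len(nodes)

def shortest_path_length(edges, src, dst):
    """BFS over directed edges; returns None if dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

Comparing `average_out_degree` for a source text, its manual translation, and an MT output is the kind of contrast the abstract describes, here reduced to its bare mechanics.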
Abstract:
Many countries have recognized the potential of medical tourism as an alternative source of economic growth. Especially after the economic crisis, many Asian countries entered medical tourism in the hope of escaping severe financial difficulty. However, only a few countries have so far managed to become famous medical tourism destinations. With a growing number of competitors, countries that have newly joined the medical tourism market face difficulty in presenting themselves as attractive destinations. South Korea, as a new medical tourism destination, should consider what to offer medical tourists in order to attract them. The aim of the thesis was to investigate the aspects influencing the participation of medical tourists, in order to discover how South Korea could develop into an attractive medical tourism destination. After examining the case study and the results of the text analysis, the researcher reached the conclusion that quality, cost, and accessibility of treatment are the major reasons for participating in medical tourism. In such fierce competition, it is also important to develop offers differentiated from those of other destinations. Therefore, Korea should concentrate on specialized treatments and its ICT system to become an attractive medical tourism destination.
Abstract:
The paper presents an approach to extracting facts from the texts of documents. The approach is based on using knowledge about the subject domain, a specialized dictionary, and schemes of facts that describe fact structures, taking into consideration both the semantic and the syntactic compatibility of the elements of facts. The extracted facts combine into one structure the dictionary lexical objects found in the text and match them against the concepts of the subject-domain ontology.
Abstract:
The principal feature of an ontology developed for text processing is a wider knowledge representation of the external world, achieved by introducing a three-level hierarchy. This allows the semantic interpretation of natural language texts to be improved.
Abstract:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment an author had when writing a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how to improve the quality of a search engine by optimizing the search results for particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; and analysis of blogs or scientific publications in the context of a social network can facilitate the discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, it is very important to incorporate it into analyzing and mining text data. Modeling the context in text and discovering contextual patterns of language units and topics from text, a general task which we refer to as contextual text mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, a new paradigm of text mining that treats context information as a "first-class citizen." We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve text mining problems in many real-world applications. We further introduce general components of contextual topic analysis: adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with the dependency structure of context, and postprocessing contextual patterns to extract refined patterns. These refinements of the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model prove effective in a variety of real applications involving topics and explicit, implicit, and complex contexts. We then introduce a postprocessing procedure for contextual patterns that generates meaningful labels for multinomial context models, providing a general way to interpret text mining results for real users. By applying contextual text mining in the "context" of other text information management tasks, including ad hoc text retrieval and web search, we further demonstrate the effectiveness of contextual text mining techniques in a quantitative way on large-scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
Abstract:
This article undertakes a text analysis of the promotional materials generated by two educational brokers, the British Council’s Education Counselling Service (ECS) and Australia’s International Development Programme (IDP-Education Australia). By focusing on the micropractices of branding, the constructions of the "international student" and "international education" are examined to uncover the relations between international education and globalisation. The conclusion reached here is that the dominant marketing messages used to brand and sell education are unevenly weighted in favour of the economic imperative. International education remains fixed in modernist spatiotemporal contexts that ignore the challenges presented by globalisation. Developing new notions of international education will require a more critical engagement with the geopolitics of knowledge and with issues of subjectivity, difference, and power. Ultimately, a more sustained and comprehensive engagement with the noneconomic dimensions of globalisation will be necessary to achieve new visions of international education.