981 resultados para Semantic Text Analysis
Resumo:
In this paper, we compare a well-known semantic spacemodel, Latent Semantic Analysis (LSA) with another model, Hyperspace Analogue to Language (HAL) which is widely used in different area, especially in automatic query refinement. We conduct this comparative analysis to prove our hypothesis that with respect to ability of extracting the lexical information from a corpus of text, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearsonrsquos correlation analysis to the outputs of these two black boxes, we conclude that LSA highly co-relates with HAL and thus there is a justification that LSA and HAL can potentially play a similar role in the area of facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.
Resumo:
Sociolinguists have documented the substrate influence of various languages on the formation of dialects in numerous ethnic-regional setting throughout the United States. This literature shows that while phonological and grammatical influences from other languages may be instantiated as durable dialect features, lexical phenomena often fade over time as ethnolinguistic communities assimilate with contiguous dialect groups. In preliminary investigations of emerging Miami Latino English, we have observed that lexical forms based on Spanish lexical forms are not only ubiquitous among the speech of the first generation Cuban Americans but also of the second. Examples, observed in field work, casual observation, and studied formally in an experimental context include the following: “get down from the car,” which derives from the Spanish equivalent, bajar del carro instead of “get out of the car”. The translation task administered to thirty-one participants showed a variety lexical phenomena are still maintained at equal or higher frequencies.
Resumo:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
Resumo:
Mestrado em Engenharia Informática
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
telligence applications for the banking industry. Searches were performed in relevant journals resulting in 219 articles published between 2002 and 2013. To analyze such a large number of manuscripts, text mining techniques were used in pursuit for relevant terms on both business intelligence and banking domains. Moreover, the latent Dirichlet allocation modeling was used in or- der to group articles in several relevant topics. The analysis was conducted using a dictionary of terms belonging to both banking and business intelli- gence domains. Such procedure allowed for the identification of relationships between terms and topics grouping articles, enabling to emerge hypotheses regarding research directions. To confirm such hypotheses, relevant articles were collected and scrutinized, allowing to validate the text mining proce- dure. The results show that credit in banking is clearly the main application trend, particularly predicting risk and thus supporting credit approval or de- nial. There is also a relevant interest in bankruptcy and fraud prediction. Customer retention seems to be associated, although weakly, with targeting, justifying bank offers to reduce churn. In addition, a large number of ar- ticles focused more on business intelligence techniques and its applications, using the banking industry just for evaluation, thus, not clearly acclaiming for benefits in the banking business. By identifying these current research topics, this study also highlights opportunities for future research.
Resumo:
BACKGROUND: Shared Decision Making (SDM) is increasingly advocated as a model for medical decision making. However, there is still low use of SDM in clinical practice. High impact factor journals might represent an efficient way for its dissemination. We aimed to identify and characterize publication trends of SDM in 15 high impact medical journals. METHODS: We selected the 15 general and internal medicine journals with the highest impact factor publishing original articles, letters and editorials. We retrieved publications from 1996 to 2011 through the full-text search function on each journal website and abstracted bibliometric data. We included publications of any type containing the phrase "shared decision making" or five other variants in their abstract or full text. These were referred to as SDM publications. A polynomial Poisson regression model with logarithmic link function was used to assess the evolution across the period of the number of SDM publications according to publication characteristics. RESULTS: We identified 1285 SDM publications out of 229,179 publications in 15 journals from 1996 to 2011. The absolute number of SDM publications by journal ranged from 2 to 273 over 16 years. SDM publications increased both in absolute and relative numbers per year, from 46 (0.32% relative to all publications from the 15 journals) in 1996 to 165 (1.17%) in 2011. This growth was exponential (P < 0.01). We found fewer research publications (465, 36.2% of all SDM publications) than non-research publications, which included non-systematic reviews, letters, and editorials. The increase of research publications across time was linear. Full-text search retrieved ten times more SDM publications than a similar PubMed search (1285 vs. 119 respectively). CONCLUSION: This review in full-text showed that SDM publications increased exponentially in major medical journals from 1996 to 2011. This growth might reflect an increased dissemination of the SDM concept to the medical community.
Analysis and evaluation of techniques for the extraction of classes in the ontology learning process
Resumo:
This paper analyzes and evaluates, in the context of Ontology learning, some techniques to identify and extract candidate terms to classes of a taxonomy. Besides, this work points out some inconsistencies that may be occurring in the preprocessing of text corpus, and proposes techniques to obtain good terms candidate to classes of a taxonomy.
Resumo:
BACKGROUND: Shared Decision Making (SDM) is increasingly advocated as a model for medical decision making. However, there is still low use of SDM in clinical practice. High impact factor journals might represent an efficient way for its dissemination. We aimed to identify and characterize publication trends of SDM in 15 high impact medical journals. METHODS: We selected the 15 general and internal medicine journals with the highest impact factor publishing original articles, letters and editorials. We retrieved publications from 1996 to 2011 through the full-text search function on each journal website and abstracted bibliometric data. We included publications of any type containing the phrase "shared decision making" or five other variants in their abstract or full text. These were referred to as SDM publications. A polynomial Poisson regression model with logarithmic link function was used to assess the evolution across the period of the number of SDM publications according to publication characteristics. RESULTS: We identified 1285 SDM publications out of 229,179 publications in 15 journals from 1996 to 2011. The absolute number of SDM publications by journal ranged from 2 to 273 over 16 years. SDM publications increased both in absolute and relative numbers per year, from 46 (0.32% relative to all publications from the 15 journals) in 1996 to 165 (1.17%) in 2011. This growth was exponential (P < 0.01). We found fewer research publications (465, 36.2% of all SDM publications) than non-research publications, which included non-systematic reviews, letters, and editorials. The increase of research publications across time was linear. Full-text search retrieved ten times more SDM publications than a similar PubMed search (1285 vs. 119 respectively). CONCLUSION: This review in full-text showed that SDM publications increased exponentially in major medical journals from 1996 to 2011. This growth might reflect an increased dissemination of the SDM concept to the medical community.
Resumo:
Tässä pro gradu -tutkielmassa käsittelen lähde- ja kohdetekstikeskeisyyttä näytelmäkääntämisessä. Tutkimuskohteina olivat käännösten sanasto, syntaksi, näyttämötekniikka, kielikuvat, sanaleikit, runomitta ja tyyli. Tutkimuksen tarkoituksena oli selvittää, näkyykö teoreettinen painopisteen siirtyminen lähdetekstikeskeisyydestä kohdetekstikeskeisyyteen suomenkielisessä näytelmäkääntämisessä. Oletuksena oli, että siirtyminen näkyy käytetyissä käännösstrategioissa. Tutkimuksen teoriaosuudessa käsitellään ensin lähde- ja kohdetekstikeskeisiä käännösteorioita. Ensin esitellään kaksi lähdetekstikeskeistä teoriaa, jotka ovat Catfordin (1965) muodollinen vastaavuus ja Nidan (1964) dynaaminen ekvivalenssi. Kohdetekstikeskeisistä teorioista käsitellään Touryn (1980) ja Newmarkin (1981) teoreettisia näkemyksiä sekä Reiss ja Vermeerin (1986) esittelemää skopos-teoriaa. Vieraannuttamisen ja kotouttamisen periaatteet esitellään lyhyesti. Teoriaosuudessa käsitellään myös näytelmäkääntämistä, William Shakespearen kieltä ja siihen liittyviä käännösongelmia. Lisäksi esittelen lyhyesti Shakespearen kääntämistä Suomessa ja Julius Caesarin neljä kääntäjää. Tutkimuksen materiaalina oli neljä Shakespearen Julius Caesar –näytelmän suomennosta, joista Paavo Cajanderin käännös on julkaistu vuonna 1883, Eeva-Liisa Mannerin vuonna 1983, Lauri Siparin vuonna 2006 ja Jarkko Laineen vuonna 2007. Analyysissa käännöksiä verrattiin lähdetekstiin ja toisiinsa ja vertailtiin kääntäjien tekemiä käännösratkaisuja. Tulokset olivat oletuksen mukaisia. Lähdetekstikeskeisiä käännösstrategioita oli käytetty uusissa käännöksissä vähemmän kuin vanhemmissa. Kohdetekstikeskeiset strategiat erosivat huomattavasti toisistaan ja uusinta käännöstä voi sanoa adaptaatioksi. Jatkotutkimuksissa tulisi materiaali laajentaa koskemaan muitakin Shakespearen näytelmien suomennoksia. Eri aikakausien käännöksiä tulisi verrata keskenään ja toisiinsa, jotta voitaisiin luotettavasti kuvata muutosta lähde- ja kohdetekstikeskeisten käännösstrategioiden käytössä ja eri aikakausien tyypillisten strategioiden kartoittamiseksi.
Resumo:
Choice of industrial development options and the relevant allocation of the research funds become more and more difficult because of the increasing R&D costs and pressure for shorter development period. Forecast of the research progress is based on the analysis of the publications activity in the field of interest as well as on the dynamics of its change. Moreover, allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters become more and more difficult to identify, and their evolution hard to follow. The existing approaches of research field structuring and identification of its development are very limited. They do not identify the thematic clusters with adequate precision while the identified trends are often ambiguous. Therefore, there is a clear need to develop methods and tools, which are able to identify developing fields of research. The main objective of this Thesis is to develop tools and methods helping in the identification of the promising research topics in the field of separation processes. Two structuring methods as well as three approaches for identification of the development trends have been proposed. The proposed methods have been applied to the analysis of the research on distillation and filtration. The results show that the developed methods are universal and could be used to study of the various fields of research. The identified thematic clusters and the forecasted trends of their development have been confirmed in almost all tested cases. It proves the universality of the proposed methods. The results allow for identification of the fast-growing scientific fields as well as the topics characterized by stagnant or diminishing research activity.