Digital Library

899 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining

Proceedings of LOUHI '08. The First Conference on Text and Data Mining of Clinical Documents

Relevance: 40.00%

Opinion Mining

Relevance: 40.00%

Abstract:

In this thesis we study the field of opinion mining through a comprehensive review of the research published on this topic. Building on this knowledge, we present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications, and the different scientific approaches that have been used to solve the challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. We then discuss the relation between opinion mining and computational linguistics, which provides a pre-processing step that is crucial for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions: we describe the different ways of collecting lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part about collecting lists of opinion words, we describe manual, semi-manual and automatic ways to do so and review the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining, we divide the task into three levels: the document, sentence and feature levels. The techniques presented at the document and sentence levels are divided into supervised and unsupervised approaches that determine the subjectivity and polarity of texts and sentences. At the feature level we describe the techniques available for finding the opinion targets, the polarity of the opinions about these targets, and the opinion holders.
Also at the feature level, we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free-form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of this field, we propose a theoretical multilevel opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on previous research, we suggest that such a system could support many of the laborious market research tasks carried out by the sales force that uses this sales management system, improving their insight into their partners and thereby the quality of their sales services and their overall results.

Informed recommender: basing recommendations on consumer product reviews

Relevance: 40.00%

Abstract:

Recommender systems attempt to predict items in which a user might be interested, given some information about the user's and items' profiles. Most existing recommender systems use content-based or collaborative filtering methods or hybrid methods that combine both techniques (see the sidebar for more details). We created Informed Recommender to address the problem of using consumer opinion about products, expressed online in free-form text, to generate product recommendations. Informed Recommender uses prioritized consumer product reviews to make recommendations. Using text-mining techniques, it maps each piece of each review comment automatically into an ontology.

Data mining techniques for identification of spectrally homogeneous areas using NDVI temporal profiles of soybean crop

Relevance: 40.00%

Abstract:

The aim of this study was to group, by similarity, temporal profiles of the 10-day composite NDVI product obtained by the SPOT Vegetation sensor for municipalities with high soybean production in the state of Paraná, Brazil, in the 2005/2006 cropping season. Data mining is a valuable tool that allows extracting knowledge from a database, identifying valid, new, potentially useful and understandable patterns. Clusters were therefore generated with the K-Means, MAXVER and DBSCAN algorithms, as implemented in the WEKA software package. Clusters were created based on the average temporal NDVI profiles of the 277 municipalities with high soybean production in the state. The best results were found with the K-Means algorithm, which grouped the municipalities into six clusters over the period from the beginning of October until the end of March, which is equivalent to the crop vegetative cycle. Half of the generated clusters presented a spectro-temporal pattern characteristic of soybean and lay mostly within the soybean belt of the state of Paraná, which indicates that the proposed methodology performs well in identifying homogeneous areas. These results will be useful for the creation of regional soybean "masks" to estimate the planted area for this crop.
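
The clustering step can be illustrated with a minimal K-Means sketch in Python. The study itself used the WEKA implementations, which this code does not reproduce. The NDVI values, the number of profiles, and the helper names (`kmeans`, `mean_profile`, `squared_distance`) are assumptions made up for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def kmeans(profiles, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_profile(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels

# Hypothetical NDVI profiles (one value per 10-day composite), not real data.
crop_like = [[0.30, 0.50, 0.80, 0.85, 0.60, 0.35],
             [0.32, 0.55, 0.82, 0.80, 0.58, 0.30]]
flat = [[0.60, 0.60, 0.62, 0.61, 0.60, 0.60],
        [0.58, 0.60, 0.60, 0.62, 0.61, 0.59]]
profiles = crop_like + flat
centroids, labels = kmeans(profiles, k=2)
```

With these toy profiles, the two profiles that peak mid-season end up in one cluster and the two flat profiles in the other. Real municipal profiles would have more time steps and would call for a larger `k`, such as the six clusters reported in the abstract.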

Vocalization data mining for estimating swine stress conditions

Relevance: 40.00%

Abstract:

This study aimed to identify differences in swine vocalization patterns according to animal gender and different stress conditions. A total of 150 barrows and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. Pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI exceeding 74). For the control treatment, animals were kept under comfortable conditions (full access to food and water, with environmental THI lower than 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. Afterwards, the audio recordings were analyzed with Praat® 5.1.19 software, generating a sound spectrum. To determine stress conditions, data were processed with WEKA® 3.5 software, using the decision tree algorithm C4.5, known as J48 in the software environment, evaluated with 10-fold cross-validation (each fold holding out 10% of the samples). According to the decision tree, the most important acoustic attribute for the classification of stress conditions was sound intensity (root node). It was not possible to identify the animal gender from the vocal register using the tested attributes. A decision tree was generated for recognition of swine hunger, thirst, and heat stress from records of sound intensity, pitch frequency, and Formant 1.
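
In 10-fold cross-validation the data are split into ten parts, and each part serves once as the held-out test set while the remaining nine parts are used for training. The Python sketch below illustrates only this splitting. The function name `k_fold_indices` and the sample count of 60 are assumptions, and WEKA's own procedure is not reproduced here.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set: every index that is not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 60 is a made-up sample count used only for illustration.
splits = list(k_fold_indices(60, k=10))
```

Each sample is tested exactly once, so the accuracy averaged over the ten folds uses every recording while never testing a model on data it was trained on.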

Identification of research trends in the field of separation processes. Application of epidemiological model, citation analysis, text mining, and technical analysis of the financial markets

Relevance: 40.00%

Abstract:

Choosing among industrial development options and allocating research funds accordingly is becoming more and more difficult because of increasing R&D costs and pressure for shorter development periods. Forecasts of research progress are based on analysis of publication activity in the field of interest and on the dynamics of its change. Moreover, allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters are becoming more and more difficult to identify, and their evolution hard to follow. The existing approaches to structuring a research field and identifying its development are very limited: they do not identify thematic clusters with adequate precision, and the identified trends are often ambiguous. Therefore, there is a clear need to develop methods and tools that are able to identify developing fields of research. The main objective of this thesis is to develop tools and methods that help identify promising research topics in the field of separation processes. Two structuring methods and three approaches for identifying development trends have been proposed. The proposed methods have been applied to the analysis of research on distillation and filtration. The results show that the developed methods are universal and could be used to study various fields of research. The identified thematic clusters and the forecasted trends of their development were confirmed in almost all tested cases, which supports the universality of the proposed methods. The results allow the identification of fast-growing scientific fields as well as topics characterized by stagnant or diminishing research activity.

Envia – a repository for environmental information access and discovery

Relevance: 40.00%

Abstract:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Biomedical Event Extraction with Machine Learning

Relevance: 40.00%

Abstract:

Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system performs well, reaching first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks and showed competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well but is also generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth demonstrates the application of the system to PubMed-scale text mining.
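
The nested event structure CAUSE(A, BIND(B, C)) quoted in the abstract can be sketched as a small recursive data type in Python. This is not the TEES graph format. The class names `Entity` and `Event`, the `depth` helper, and the omission of typed argument roles are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein."""
    name: str

    def __str__(self) -> str:
        return self.name

@dataclass(frozen=True)
class Event:
    """An event with a trigger word and arguments that may be events."""
    trigger: str
    arguments: Tuple[Union["Event", Entity], ...]

    def __str__(self) -> str:
        args = ", ".join(str(a) for a in self.arguments)
        return f"{self.trigger}({args})"

def depth(node) -> int:
    """Nesting depth: 0 for an entity, 1 for a flat event, and so on."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(a) for a in node.arguments), default=0)

# "Protein A causes protein B to bind protein C"
bind = Event("BIND", (Entity("B"), Entity("C")))
cause = Event("CAUSE", (Entity("A"), bind))
print(cause)  # prints "CAUSE(A, BIND(B, C))"
```

Because an argument can itself be an `Event`, the same type covers both simple pairwise relations (depth 1) and nested structures such as the CAUSE example (depth 2).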

Differential expression of AMPA-type glutamate receptor subunits during development of the chick optic tectum

Relevance: 40.00%

Abstract:

Glutamate receptors have been often associated with developmental processes. We used immunohistochemical techniques to evaluate the expression of the AMPA-type glutamate receptor (GluR) subunits in the chick optic tectum (TeO). Chick embryos from the 5th through the 20th embryonic day (E5-E20) and one-day-old (P1) chicks were used. The three types of immunoreactivity evaluated (GluR1, GluR2/3, and GluR4) had different temporal and spatial expression patterns in the several layers of the TeO. The GluR1 subunit first appeared as moderate staining on E7 and then increased on E9. The mature GluR1 pattern included intense staining only in layer 5 of the TeO. The GluR2/3 subunits presented low expression on E5, which became intense on E7. The staining for GluR2/3 changed to very intense on E14 in tectal layer 13. Staining of layer 13 neurons is the most prominent feature of GluR immunoreactivity in the adult TeO. The GluR4 subunit generally presented the lowest expression starting on E7, which was similar to the adult pattern. Some instances of transient expression of GluR subunits were observed in specific cell populations from E9 through E20. These results demonstrate a differential expression of the GluR subunits in the embryonic TeO, adding information about their possible functions in the developmental processes of the visual system.

Epidemiological studies in the information and genomics era: experience of the Clinical Genome of Cancer Project in São Paulo, Brazil

Relevance: 40.00%

Abstract:

Genomics is expanding the horizons of epidemiology, providing a new dimension for classical epidemiological studies and inspiring the development of large-scale multicenter studies with the statistical power necessary for the assessment of gene-gene and gene-environment interactions in cancer etiology and prognosis. This paper describes the methodology of the Clinical Genome of Cancer Project in São Paulo, Brazil (CGCP), which includes patients with nine types of tumors and controls. Three major epidemiological designs were used to reach specific objectives: cross-sectional studies to examine gene expression, case-control studies to evaluate etiological factors, and follow-up studies to analyze genetic profiles in prognosis. The clinical groups included patients' data in the electronic database through the Internet. Two approaches were used for data quality control: continuous data evaluation and data entry consistency. A total of 1749 cases and 1509 controls were entered into the CGCP database from the first trimester of 2002 to the end of 2004. Continuous evaluation showed that, for all tumors taken together, only 0.5% of the general form fields still included potential inconsistencies by the end of 2004. Regarding data entry consistency, the highest percentage of errors (11.8%) was observed for the follow-up form, followed by 6.7% for the clinical form, 4.0% for the general form, and only 1.1% for the pathology form. Good data quality is required for their transformation into useful information for clinical application and for preventive measures. The use of the Internet for communication among researchers and for data entry is perhaps the most innovative feature of the CGCP. The monitoring of patients' data guaranteed their quality.

Liber fostering Open Science and Knowledge Discovery

Relevance: 40.00%

Abstract:

Presentation by Kristiina Hormia-Poutanen at the 25th Anniversary Conference of The National Repository Library of Finland, Kuopio, 22nd of May 2015.

A novel polar-based human face recognition computational model

Relevance: 40.00%

Abstract:

Motivated by a recently proposed biologically inspired face recognition approach, we investigated the relation between human behavior and a computational model based on Fourier-Bessel (FB) spatial patterns. We measured human recognition performance of FB filtered face images using an 8-alternative forced-choice method. Test stimuli were generated by converting the images from the spatial to the FB domain, filtering the resulting coefficients with a band-pass filter, and finally taking the inverse FB transformation of the filtered coefficients. The performance of the computational models was tested using a simulation of the psychophysical experiment. In the FB model, face images were first filtered by simulated V1-type neurons and later analyzed globally for their content of FB components. In general, there was a higher human contrast sensitivity to radially than to angularly filtered images, but both functions peaked at the 11.3-16 frequency interval. The FB-based model presented similar behavior with regard to peak position and relative sensitivity, but had a wider frequency bandwidth and a narrower response range. The response pattern of two alternative models, based on local FB analysis and on raw luminance, strongly diverged from the human behavior patterns. These results suggest that human performance can be constrained by the type of information conveyed by polar patterns, and consequently that humans might use FB-like spatial patterns in face processing.

Sleep pattern and learning in knockdown mice with reduced cholinergic neurotransmission

Relevance: 40.00%

Abstract:

Impaired cholinergic neurotransmission can affect memory formation and influence sleep-wake cycles (SWC). In the present study, we describe the SWC in mice with a deficient vesicular acetylcholine transporter (VAChT) system, previously characterized as presenting reduced acetylcholine release and cognitive and behavioral dysfunctions. Continuous, chronic ECoG and EMG recordings were used to evaluate the SWC pattern during light and dark phases in VAChT knockdown heterozygous (VAChT-KDHET, n=7) and wild-type (WT, n=7) mice. SWC were evaluated for sleep efficiency, total amount and mean duration of slow-wave, intermediate and paradoxical sleep, as well as the number of awakenings from sleep. After recording SWC, contextual fear-conditioning tests were used as an acetylcholine-dependent learning paradigm. The results showed that sleep efficiency in VAChT-KDHET animals was similar to that of WT mice, but that the SWC was more fragmented. Fragmentation was characterized by an increase in the number of awakenings, mainly during intermediate sleep. VAChT-KDHET animals performed poorly in the contextual fear-conditioning paradigm (mean freezing time: 34.4±3.1 and 44.5±3.3 s for WT and VAChT-KDHET animals, respectively), which was followed by a 45% reduction in the number of paradoxical sleep episodes after the training session. Taken together, the results show that reduced cholinergic transmission led to sleep fragmentation and learning impairment. We discuss the results on the basis of cholinergic plasticity and its relevance to sleep homeostasis. We suggest that VAChT-KDHET mice could be a useful model to test cholinergic drugs used to treat sleep dysfunction in neurodegenerative disorders.

Ripening pattern of guava cv. Pedro Sato

Relevance: 40.00%

Abstract:

Guava is a fruit with high respiration rates and a very short shelf life. Since information on its respiration pattern is contradictory, the objective was to study the changes occurring in the fruit during ripening and to relate them to the respiration behavior of this fruit. Guavas were picked at the half-ripe stage and stored for 8 days at 22 ± 1 ºC and 78 ± 1% relative humidity. The analyses conducted were: peel and pulp coloration, firmness, total soluble solids (TSS), total titratable acidity (TTA), and ethylene production. According to the results, the parameters analyzed apparently do not coincide and are ethylene-independent. There was an accentuated ethylene production during ripening, starting from the 4th day. Ethylene synthesis continued to increase up to the 8th day, when the fruits were already decomposing. Firmness decreased sharply in the first three days of ripening, and the skin and pulp color changed during ripening. TSS and TTA remained practically unchanged during ripening, even with the increased ethylene production. It can be concluded that guava is a fruit that presents characteristics of both climacteric and non-climacteric fruits.

Feature Selection and Classification Using Age Layered Population Structure Genetic Programming

Relevance: 40.00%

Abstract:

The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as genetic programming (GP) and genetic algorithms (GA) are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat. FSALPS significantly outperformed canonical GP, ALPS, and some feature selection strategies reported in related literature on dimensionality reduction.
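
The idea of ranking features by how often they occur in the evolved population and turning those counts into selection probabilities can be sketched in Python as follows. The abstract does not give the FSALPS formula, so this sketch assumes add-one smoothing and proportional normalization, and it represents each GP tree only as the list of feature names it uses. All function and variable names are hypothetical.

```python
import random
from collections import Counter

def feature_probabilities(population, all_features, smoothing=1.0):
    """Map each feature to a selection probability based on its frequency.

    Assumption: smoothing keeps unused features selectable; the thesis
    may weight features differently.
    """
    counts = Counter(f for tree in population for f in tree)
    weights = {f: counts[f] + smoothing for f in all_features}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}

def pick_terminal(probabilities, rng):
    """Sample one feature to use as a terminal symbol in a new GP tree."""
    features = list(probabilities)
    weights = [probabilities[f] for f in features]
    return rng.choices(features, weights=weights, k=1)[0]

# Toy population: each tree is reduced to the features it references.
population = [["x1", "x3"], ["x1", "x1", "x2"], ["x3", "x1"]]
probs = feature_probabilities(population, ["x1", "x2", "x3", "x4"])
terminal = pick_terminal(probs, random.Random(0))
```

In this toy population `x1` appears most often and therefore receives the highest probability, while the unused `x4` keeps a small non-zero chance of being selected.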
