899 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining


Relevance:

50.00%

Abstract:

We review recent visualization techniques aimed at supporting tasks that require the analysis of text documents, from approaches targeted at visually summarizing the relevant content of a single document to those aimed at assisting exploratory investigation of whole collections of documents. Techniques are organized according to their target input material (either single texts or collections of texts) and their focus, which may be on displaying content, emphasizing relevant relationships, highlighting the temporal evolution of a document or collection, or helping users handle results from a query posed to a search engine. We describe the approaches adopted by distinct techniques and briefly review the strategies they employ to obtain meaningful text models; we also discuss how they extract the information required to produce representative visualizations, the tasks they intend to support, the interaction issues involved, and their strengths and limitations. Finally, we present a summary of techniques, highlighting their goals and distinguishing characteristics. We also briefly discuss some open problems and research directions in the fields of visual text mining and text analytics.

Relevance:

50.00%

Abstract:

Background: The integration of sequencing and gene interaction data, and the subsequent generation of pathways and networks contained in databases such as KEGG Pathway, is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, which is important for applying this knowledge to other organisms. Results: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group, called SeedServer, to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. Conclusions: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as “seeds” for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes, together with the determination of their ancestry, leads to a better understanding of the evolution of this process.

Relevance:

50.00%

Abstract:

Advances in biomedical signal acquisition systems for motion analysis have led to low-cost and ubiquitous wearable sensors that can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and extract the clinically relevant information from this large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools that can provide quantitative information to clinicians in the field of movement disorders. This thesis focuses on motor impairment in Parkinson's disease (PD). Several databases related to subjects with PD at different stages of the disease were considered. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques used in this thesis are feature selection (used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify subjects at high risk of PD, to characterize the differences between early PD subjects and healthy ones, to characterize PD subtypes, and to automatically assess symptom severity in the home setting.

Relevance:

50.00%

Abstract:

The problem of prediction, that is, the search for predictive patterns within data, has been studied extensively. Many robust and efficient methodologies have been developed, all relying on the analysis of structured numerical information. Textual information, on the other hand, is highly unstructured. An immediate conclusion would therefore be that predictive analysis of textual data requires methods completely different from the well-known data mining techniques. A prediction problem can instead be solved with the same methods: textual data and documents can be transformed into numerical values, for example by recording the presence or absence of terms, which makes it possible to apply the techniques already developed efficiently. Text mining makes it possible to bring together concepts from extremely heterogeneous application fields. Given the immense and continuously growing amount of textual data available, on the World Wide Web for instance, driven by the pervasive use of smartphones and computers, the possible applications of text analysis become countless. The advent and spread of social networks and of microblogging enable people to share opinions and moods, creating a textual corpus of incalculable size that is updated daily. The new techniques of Sentiment Analysis, or Opinion Mining, analyze the emotional state or the type of opinion expressed in a text document. Through these disciplines one can, for example, extract indicators of the mood of an individual or of a group of individuals, building a representation of the collective social mood. Can the trend of the social mood influence the course of global events on a macroscopic scale?
Studies in Behavioral Economics and Behavioral Finance assert a link between emotional state, decision-making ability, and economic indicators. Thanks to the available techniques and to the mass of continuously updated textual data on the mood of millions of individuals, it becomes possible to analyze such correlations. In this study, a system is built to predict changes in stock market indices from textual data extracted from the microblogging platform Twitter in the form of public tweets; the system includes techniques for improving the prediction based on the study of text similarity, categorizing the actual contribution of each text to the prediction.
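The abstract notes that documents can be turned into numbers by recording the presence or absence of terms. A minimal sketch of that binary bag-of-words encoding follows; the simple regex tokenizer and the function names `tokenize`, `build_vocabulary`, and `binary_vectors` are illustrative assumptions, not the study's actual pipeline.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    # Sorted vocabulary so that vector positions are deterministic.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def binary_vectors(docs: list[str], vocab: list[str]) -> list[list[int]]:
    # 1 if the term occurs in the document, 0 otherwise (presence/absence).
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in set(tokenize(doc)):
            if tok in index:
                vec[index[tok]] = 1
        vectors.append(vec)
    return vectors
```

For the two toy texts "Markets feel great today" and "feel terrible about markets", the vocabulary is `about, feel, great, markets, terrible, today`, and the vectors are `[0, 1, 1, 1, 0, 1]` and `[1, 1, 0, 1, 1, 0]`. Any standard classifier or regressor can consume vectors of this form.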

Relevance:

50.00%

Abstract:

The tropical freshwater snail Biomphalaria glabrata belongs to the family Planorbidae, the only gastropod taxon that uses hemoglobin as its oxygen transport protein. As the intermediate host of the schistosomiasis pathogen Schistosoma mansoni, B. glabrata is of interest to tropical medicine. The extracellular BgHb is the main protein of the hemolymph, with a share of 95%. It is composed of polypeptide chains of 240 kDa each. These in turn can be divided into 13 heme domains and a considerably smaller N-terminal non-heme domain. The sequencing of two of the three BgHb subunits (BgHb1, BgHb2) enabled the recombinant expression of complete subunits in insect cells and the expression of some BgHb2 constructs in E. coli cells. In the course of my work, I succeeded in expressing BgHb1 in biologically active form in insect cells. The recombinant BgHb1 purified from the insect cell supernatant showed immunological identity with native BgHb. Structural analyses also showed that recombinant BgHb1 assembles into a quaternary structure resembling that of the native protein. My work thus demonstrated that a single isoform is capable of assembling into the quaternary structure. In addition, oxygen-binding analyses showed that recombinant BgHb1 can bind oxygen reversibly.
The remaining 5% of B. glabrata hemolymph contains a rudimentary hemocyanin, which appears to play no role in oxygen transport, and a rosette-shaped protein that remained to be characterized. Peptide fragments obtained by mass spectrometry showed high sequence similarity to the soluble acetylcholine-binding proteins (AChBPs) of other molluscs. These AChBPs show high sequence similarity to the ligand-binding domain of receptors of the Cys-loop protein family.
Database searches revealed the existence of two isoforms

Relevance:

50.00%

Abstract:

We propose a method that robustly combines color and feature buffers to denoise Monte Carlo renderings. On the one hand, feature buffers, such as per-pixel normals, textures, or depth, are effective for determining denoising filters because features are highly correlated with rendered images. Filters based solely on features, however, are prone to blurring image details that are not well represented by the features. On the other hand, color buffers represent all details, but they may be less effective for determining filters because they are contaminated by the very noise that is supposed to be removed. We propose to obtain filters using a combination of color and feature buffers in an NL-means and cross-bilateral filtering framework. We determine a robust weighting of colors and features using a SURE-based error estimate. We show significant improvements in both subjective quality and quantitative error compared to the previous state of the art. We also demonstrate adaptive sampling and space-time filtering for animations.
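To illustrate how a feature buffer can guide the filtering of a noisy color buffer, here is a deliberately simplified cross-bilateral filter on a single 1D scanline. Each neighbor's weight is the product of a spatial, a color, and a feature Gaussian. This sketch does not implement the NL-means component or the SURE-based weighting described above; the function name and the `sigma_*` parameters are hypothetical choices for illustration.

```python
import math


def cross_bilateral_1d(color, feature, radius=2, sigma_s=1.0, sigma_c=0.2, sigma_f=0.1):
    """Filter a 1D scanline of noisy colors, guided by a feature buffer.

    Hypothetical simplification: one channel, one feature, fixed sigmas.
    Neighbors whose feature differs strongly (e.g. across a geometric edge)
    receive near-zero weight, so the edge is preserved while noise on each
    side is averaged out.
    """
    n = len(color)
    out = []
    for i in range(n):
        acc, wsum = 0.0, 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            d_s = (i - j) ** 2 / (2 * sigma_s ** 2)                 # spatial distance
            d_c = (color[i] - color[j]) ** 2 / (2 * sigma_c ** 2)    # color distance
            d_f = (feature[i] - feature[j]) ** 2 / (2 * sigma_f ** 2)  # feature distance
            w = math.exp(-(d_s + d_c + d_f))
            acc += w * color[j]
            wsum += w  # always >= 1 because the center pixel has weight 1
        out.append(acc / wsum)
    return out
```

With `color = [0.1, 0.3, 0.2, 0.9, 0.8, 1.0]` and a feature buffer `[0, 0, 0, 1, 1, 1]` marking an edge, the three left pixels are smoothed toward each other while staying far below the right-hand values. A color-only filter would rely on color differences alone to find that edge, which is exactly what noise makes unreliable.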

Relevance:

50.00%

Abstract:

Much of the research on visual hallucinations (VHs) has been conducted in the context of eye disease and neurodegenerative conditions, but little is known about these phenomena in psychiatric and nonclinical populations. The purpose of this article is to bring together current knowledge regarding VHs in the psychosis phenotype and to contrast these data with the literature drawn from neurodegenerative disorders and eye disease. The evidence challenges the traditional view that VHs are atypical or uncommon in psychosis. The weighted mean prevalence of VHs is 27% in schizophrenia, 15% in affective psychosis, and 7.3% in the general community. VHs are linked to a more severe psychopathological profile and a less favorable outcome in psychosis and neurodegenerative conditions. VHs typically co-occur with auditory hallucinations, suggesting a common etiological cause. VHs in psychosis are also remarkably complex and negative in content, and they are interpreted as having personal relevance. The cognitive mechanisms of VHs in psychosis have rarely been investigated, but existing studies point to source-monitoring deficits and distortions in top-down mechanisms, although evidence for visual processing deficits, which feature strongly in the organic literature, is lacking. Brain imaging studies point to the activation of visual cortex during hallucinations against a background of structural and connectivity changes within wider brain networks. The relationship between VHs in psychosis, eye disease, and neurodegeneration remains unclear, although the pattern of similarities and differences described in this review suggests that comparative studies may have important clinical and theoretical implications.

Relevance:

50.00%

Abstract:

Large amounts of animal health care data are present in veterinary electronic medical records (EMRs), and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely free text, without clinical coding or a fixed vocabulary. Text mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from the veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text miner retrieved cases with enteric signs with a sensitivity of 87.6% (95% CI, 80.4-92.9%) and a specificity of 99.3% (95% CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals.
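Sensitivity and specificity compare the text miner's labels with the human reviewers' labels. The sketch below computes both from confusion-matrix counts and attaches a Wilson score interval. The abstract does not say which interval method the authors used, so the Wilson interval is an assumption here, and the counts in the usage note are hypothetical, not the study's data.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    # Fraction of reviewer-positive cases that the text miner retrieved.
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    # Fraction of reviewer-negative cases that the text miner correctly rejected.
    return tn / (tn + fp)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    # Wilson score interval for a binomial proportion (about 95% for z = 1.96).
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half
```

For hypothetical counts of 90 true positives, 10 false negatives, 2380 true negatives, and 20 false positives (2500 cases in total), sensitivity is 0.90 and specificity is about 0.992; `wilson_interval(90, 100)` gives a range of roughly 0.83 to 0.94 around the sensitivity estimate.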

Relevance:

50.00%

Abstract:

Background Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-increasing DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance, but their application to non-model species is still modest. Methods Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes, Arabidopsis and Oryza, for EST-SSRs and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by aligning the SSR-containing sequences against the Oryza sativa and Arabidopsis thaliana reference genomes with the Basic Local Alignment Search Tool (BLAST) on the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. Results We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types, but the abundance of the various repeat types differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions, while dinucleotide repeats were largely associated with untranslated regions.
Our results from the empirical test revealed considerable amplification success and transferability between congeners. Conclusions The present work represents the first large-scale study developing SSRs from publicly accessible EST databases in threatened plants. Here we provide a very large number (17 102) of ready-to-test EST-SSRs for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants.
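Mining ESTs for SSRs amounts to scanning sequences for short motifs repeated in tandem. The study used QDD for this; the sketch below is not QDD and does not design primers. It only finds perfect di- and trinucleotide repeats with a regular-expression backreference. The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds chosen for illustration.

```python
import re

# Hypothetical minimum number of tandem copies per motif length;
# real SSR-mining tools such as QDD apply their own criteria.
MIN_REPEATS = {2: 5, 3: 4}


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_n in min_repeats.items():
        # ([ACGT]{k}) captures a motif; \1{min_n-1,} demands further tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as "AA" or "AAA"
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

For example, `find_ssrs("ggACACACACACttAGCAGCAGCAGCtt")` reports an (AC)5 dinucleotide repeat starting at position 2 and an (AGC)4 trinucleotide repeat starting at position 14. Imperfect or compound repeats, which dedicated tools handle, are out of scope for this sketch.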

Relevance:

50.00%

Abstract:

In this paper, we describe NewsCATS (news categorization and trading system), a system implemented to predict stock price trends for the period immediately after the publication of press releases. NewsCATS consists mainly of three components. The first component retrieves relevant information from press releases through the application of text preprocessing techniques. The second component sorts the press releases into predefined categories. Finally, the third component derives appropriate trading strategies from this categorization. The findings indicate that a categorization of press releases can provide additional information for forecasting stock price trends, but that an adequate trading strategy is essential to fully exploit the results of the categorization.
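The three-component architecture (preprocessing, categorization, trading strategy) can be sketched as a small pipeline. Everything concrete below is hypothetical: the category names, the keyword lists standing in for a trained categorizer, and the category-to-action mapping are placeholders for illustration, not NewsCATS's actual categories or strategies.

```python
import re

# Hypothetical keyword lists standing in for a trained text categorizer.
CATEGORY_KEYWORDS = {
    "good_news": {"record", "profit", "growth", "exceeds"},
    "bad_news": {"loss", "lawsuit", "recall", "warning"},
}

# Hypothetical mapping from category to trading action.
STRATEGY = {"good_news": "buy", "bad_news": "sell", "no_impact": "hold"}


def preprocess(release: str) -> set[str]:
    # Component 1: lowercase, tokenize, and drop very short tokens.
    return {t for t in re.findall(r"[a-z]+", release.lower()) if len(t) > 2}


def categorize(tokens: set[str]) -> str:
    # Component 2: choose the category with the most keyword hits;
    # no hits or a tie falls back to "no_impact".
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return "no_impact"
    return max(scores, key=scores.get)


def trade_signal(release: str) -> str:
    # Component 3: derive a trading action from the category.
    return STRATEGY[categorize(preprocess(release))]
```

Keeping the stages separate mirrors the abstract's point that categorization quality and trading strategy are distinct concerns: the same categories can be paired with different strategies without retraining the categorizer.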

Relevance:

50.00%

Abstract:

The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting, and relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. A second-order combination was also tested, mainly by averaging or by selectively combining the documents retrieved by different approaches for a particular query.
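A second-order combination by averaging can be illustrated with a simple score-fusion routine: each run's scores are min-max normalized and then averaged per document. The normalization step and the rule that a document missing from a run contributes 0 for that run are assumptions made for this sketch; the abstract does not specify MIRACLE's exact averaging scheme.

```python
def min_max_normalize(run: dict[str, float]) -> dict[str, float]:
    # Rescale one run's scores to [0, 1] so that different runs are comparable.
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def average_fusion(runs: list[dict[str, float]]) -> list[tuple[str, float]]:
    # Average normalized scores across runs; a missing document scores 0 in that run.
    normalized = [min_max_normalize(r) for r in runs]
    docs = set().union(*normalized)
    fused = {d: sum(r.get(d, 0.0) for r in normalized) / len(runs) for d in docs}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

For instance, fusing a hypothetical stemming run `{"d1": 12.0, "d2": 8.0, "d3": 4.0}` with a hypothetical n-gram run `{"d2": 0.9, "d3": 0.6, "d4": 0.3}` ranks `d2` first, since it scores well in both runs, followed by `d1`, `d3`, and `d4`.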