853 resultados para Data stream mining
Resumo:
Data Mining, Learning from data, graphical models, possibility theory
Resumo:
Magdeburg, Univ., Fak. für Informatik, Diss., 2012
Resumo:
Online Data Mining, Data Streams, Classification, Clustering
Resumo:
Data Mining, Vision Restoration, Treatment outcome prediction, Self-Organising-Map
Resumo:
Magdeburg, Univ., Fak. für Maschinenbau, Diss., 2009
Resumo:
Magdeburg, Univ., Fak. für Informatik, Diss., 2013
Resumo:
Consider a model with parameter phi, and an auxiliary model with parameter theta. Let phi be a randomly sampled from a given density over the known parameter space. Monte Carlo methods can be used to draw simulated data and compute the corresponding estimate of theta, say theta_tilde. A large set of tuples (phi, theta_tilde) can be generated in this manner. Nonparametric methods may be use to fit the function E(phi|theta_tilde=a), using these tuples. It is proposed to estimate phi using the fitted E(phi|theta_tilde=theta_hat), where theta_hat is the auxiliary estimate, using the real sample data. This is a consistent and asymptotically normally distributed estimator, under certain assumptions. Monte Carlo results for dynamic panel data and vector autoregressions show that this estimator can have very attractive small sample properties. Confidence intervals can be constructed using the quantiles of the phi for which theta_tilde is close to theta_hat. Such confidence intervals are found to have very accurate coverage.
Resumo:
Le "data mining", ou "fouille de données", est un ensemble de méthodes et de techniques attractif qui a connu une popularité fulgurante ces dernières années, spécialement dans le domaine du marketing. Le développement récent de l'analyse ou du renseignement criminel soulève des problèmatiques auxqwuelles il est tentant de d'appliquer ces méthodes et techniques. Le potentiel et la place du data mining dans le contexte de l'analyse criminelle doivent être mieux définis afin de piloter son application. Cette réflexion est menée dans le cadre du renseignement produit par des systèmes de détection et de suivi systématique de la criminalité répétitive, appelés processus de veille opérationnelle. Leur fonctionnement nécessite l'existence de patterns inscrits dans les données, et justifiés par les approches situationnelles en criminologie. Muni de ce bagage théorique, l'enjeu principal revient à explorer les possibilités de détecter ces patterns au travers des méthodes et techniques de data mining. Afin de répondre à cet objectif, une recherche est actuellement menée au Suisse à travers une approche interdisciplinaire combinant des connaissances forensiques, criminologiques et computationnelles.
Resumo:
The DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data are quickly becoming essential knowledge repositories of the research community. This present paper surveys several databases, which are considered "pillars" of research and important nodes in the network. This paper focuses on a generalized workflow scheme typical for microarray experiments using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Resumo:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
Resumo:
In this project a research both in finding predictors via clustering techniques and in reviewing the Data Mining free software is achieved. The research is based in a case of study, from where additionally to the KDD free software used by the scientific community; a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain as the data from where these predictors have to be inferred are student qualifications from different e-learning environments. Through our case of study not only clustering algorithms are tested but also additional goals are proposed.
Resumo:
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects five-10 million individuals throughout the world. Previously, we developed a database that annotates sequence data from GenBank and the present study aimed to describe the clinical, molecular and epidemiological scenarios of HTLV-1 infection through the stored sequences in this database. A total of 2,545 registered complete and partial sequences of HTLV-1 were collected and 1,967 (77.3%) of those sequences represented unique isolates. Among these isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of 1,091 sequences contained information about the geographic origin and viral subtype and 93% of these sequences were identified as subtype “a”. Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some inferences about specific aspects of HTLV-1 infection to be made, due to the relative scarcity of data of available sequences, it was not possible to delineate a global scenario of HTLV-1 infection.
Resumo:
O presente trabalho cujo Título é técnicas de Data e Text Mining para a anotação dum Arquivo Digital, tem como objectivo testar a viabilidade da utilização de técnicas de processamento automático de texto para a anotação das sessões dos debates parlamentares da Assembleia da República de Portugal. Ao longo do trabalho abordaram-se conceitos como tecnologias de descoberta do conhecimento (KDD), o processo da descoberta do conhecimento em texto, a caracterização das várias etapas do processamento de texto e a descrição de algumas ferramentas open souce para a mineração de texto. A metodologia utilizada baseou-se na experimentação de várias técnicas de processamento textual utilizando a open source R/tm. Apresentam-se, como resultados, a influência do pré-processamento, tamanho dos documentos e tamanhos dos corpora no resultado do processamento utilizando o algoritmo knnflex.