807 resultados para frequency based knowledge discovery
Resumo:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM in a different number of training samples.
Resumo:
We investigate the use of independent component analysis (ICA) for speech feature extraction in digits speech recognition systems. We observe that this may be true for recognition tasks based on Geometrical Learning with little training data. In contrast to image processing, phase information is not essential for digits speech recognition. We therefore propose a new scheme that shows how the phase sensitivity can be removed by using an analytical description of the ICA-adapted basis functions. Furthermore, since the basis functions are not shift invariant, we extend the method to include a frequency-based ICA stage that removes redundant time shift information. The digits speech recognition results show promising accuracy. Experiments show that the method based on ICA and Geometrical Learning outperforms HMM in a different number of training samples.
Resumo:
This study investigated the method of the focus identification in Chinese text discourse and the relationship between accent and focus, large corpus analysis and decision tree were used in the research. The main results are: 1. Based on the concept of the Focus and understanding of the discourse, Foci identification is consistent and steady; 2. Special Focus markers and specific Focus constructions have greater influence than special constituent order on identifying Focus in Chinese discourse; while information states also have great influence on focus identifying; part of speech,information state, the relative position in the sentence, focus-sensitive operator, specific Focus constructions, contrast relations, relations between the sentences are important factors to focus identifying; 3. Using multi-dimensional tagging and knowledge discovery, it is a feasible way to construct and employ decision trees by computing tagging results to identify Focus; 4. Focus predicting also depends on literal types and styles of the discourse, several types of decision trees should be constructed for different literal types; 5. In the monologue discourse, the most prominent accent is located on the Focus word or in the scope of the Focus; there are some kinds of rules on accent assignment in broad Focus; it is necessary to analyze and classify focus structure for the research of relations between accent and Focus.
Resumo:
R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.
Resumo:
Time-series and sequences are important patterns in data mining. Based on an ontology of time-elements, this paper presents a formal characterization of time-series and state-sequences, where a state denotes a collection of data whose validation is dependent on time. While a time-series is formalized as a vector of time-elements temporally ordered one after another, a state-sequence is denoted as a list of states correspondingly ordered by a time-series. In general, a time-series and a state-sequence can be incomplete in various ways. This leads to the distinction between complete and incomplete time-series, and between complete and incomplete state-sequences, which allows the expression of both absolute and relative temporal knowledge in data mining.
Resumo:
Temporal representation and reasoning plays an important role in Data Mining and Knowledge Discovery, particularly, in mining and recognizing patterns with rich temporal information. Based on a formal characterization of time-series and state-sequences, this paper presents the computational technique and algorithm for matching state-based temporal patterns. As a case study of real-life applications, zone-defense pattern recognition in basketball games is specially examined as an illustrating example. Experimental results demonstrate that it provides a formal and comprehensive temporal ontology for research and applications in video events detection.
Resumo:
This is a report on the 7th Annual Congress of International Drug Discovery Science and Technology held in Shanghai, China from 22–25 October, 2009. The conference, organized by BIT Life Sciences, comprised several parallel sessions, keynote presentations and a selection of selection of 20-minute presentations covering a range of therapeutic areas, including general medicinal chemistry, oncology, inflammation, receptors and ion channels, drug, metabolism and pharmokinetics, and fragment-based drug discovery. There were also sessions devoted to genomics, biomarkers, immunology, cell biology, molecular imaging and biochips. Supported by an exhibition of services/products and posters, the conference underlined the marked presence of Asian CROs.
Resumo:
It is noted that the determination of an oscillation frequency by used of the power spectrum of measured time series is susceptible to filtering of the signal. Similarly, frequency measurements made by period counting can yield different, results depending on how the signal is filtered for noise reduction. In an attempt to eliminate these ambiguities, a new measure of frequency, based on an approximate reconstruction of the phase-space trajectory of the oscillator from the signal, is introduced. This measure is shown to be invariant under linear filtering. For this reason, it is also inaccessible by spectral methods. The effect of filtering on frequency for weakly nonlinear, noisy oscillators, to which this definition applies only imperfectly, is quantified. This work provides the theoretical basis for frequency measurements employing MIRVA filtering.
Resumo:
This paper presents a preliminary study of developing a novel distributed adaptive real-time learning framework for wide area monitoring of power systems integrated with distributed generations using synchrophasor technology. The framework comprises distributed agents (synchrophasors) for autonomous local condition monitoring and fault detection, and a central unit for generating global view for situation awareness and decision making. Key technologies that can be integrated into this hierarchical distributed learning scheme are discussed to enable real-time information extraction and knowledge discovery for decision making, without explicitly accumulating and storing all raw data by the central unit. Based on this, the configuration of a wide area monitoring system of power systems using synchrophasor technology, and the functionalities for locally installed open-phasor-measurement-units (OpenPMUs) and a central unit are presented. Initial results on anti-islanding protection using the proposed approach are given to illustrate the effectiveness.
Resumo:
University classroom talk is a collaborative struggle to make meaning. Taking the perspectival nature of interaction as central, this paper presents an investigation of the genre of spoken academic discourse and in particular the types of activities which are orientated to the goal of collaborative ideas or tasks, such as seminars, tutorials, workshops. The purpose of the investigation was to identify examples of dialogicality through an examination of stance-taking. The data used in this study is a spoken corpus of academic English created from recordings of a range of subject discipline classrooms at a UK university. A frequency-based approach to recurrent word sequences (lexical bundles) was used to identify signals of epistemic and attitudinal stance and to initiate an exploration of the features of elaboration. Findings of quantitative and qualitative analyses reveal some similarities and differences between this study and those of US based classroom contexts in relation to the use and frequency of lexical bundles. Findings also highlight the process that elaboration plays in grounding perspectives and negotiating alignment of interactants. Elaboration seems to afford the space for the enactment of student stance in relation to the tutor embodiment of discipline knowledge.
Resumo:
The last decade has witnessed an unprecedented growth in availability of data having spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that demonstrate significantly different behavior from their neighbors could be of interest for various application scenarios such as – weather modeling, analyzing spread of disease outbreaks, monitoring traffic congestions, and so on. In this paper, we propose an automated approach of exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is recovered. Our approach differs significantly from traditional methods of spatial outlier detection, and employs two phases – i) discovering homogeneous regions, and ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.
Resumo:
A importância e preocupação dedicadas à autonomia e independência das pessoas idosas e dos pacientes que sofrem de algum tipo de deficiência tem vindo a aumentar significativamente ao longo das últimas décadas. As cadeiras de rodas inteligentes (CRI) são tecnologias que podem ajudar este tipo de população a aumentar a sua autonomia, sendo atualmente uma área de investigação bastante ativa. Contudo, a adaptação das CRIs a pacientes específicos e a realização de experiências com utilizadores reais são assuntos de estudo ainda muito pouco aprofundados. A cadeira de rodas inteligente, desenvolvida no âmbito do Projeto IntellWheels, é controlada a alto nível utilizando uma interface multimodal flexível, recorrendo a comandos de voz, expressões faciais, movimentos de cabeça e através de joystick. Este trabalho teve como finalidade a adaptação automática da CRI atendendo às características dos potenciais utilizadores. Foi desenvolvida uma metodologia capaz de criar um modelo do utilizador. A investigação foi baseada num sistema de recolha de dados que permite obter e armazenar dados de voz, expressões faciais, movimentos de cabeça e do corpo dos pacientes. A utilização da CRI pode ser efetuada em diferentes situações em ambiente real e simulado e um jogo sério foi desenvolvido permitindo especificar um conjunto de tarefas a ser realizado pelos utilizadores. Os dados foram analisados recorrendo a métodos de extração de conhecimento, de modo a obter o modelo dos utilizadores. Usando os resultados obtidos pelo sistema de classificação, foi criada uma metodologia que permite selecionar a melhor interface e linguagem de comando da cadeira para cada utilizador. A avaliação para validação da abordagem foi realizada no âmbito do Projeto FCT/RIPD/ADA/109636/2009 - "IntellWheels - Intelligent Wheelchair with Flexible Multimodal Interface". As experiências envolveram um vasto conjunto de indivíduos que sofrem de diversos níveis de deficiência, em estreita colaboração com a Escola Superior de Tecnologia de Saúde do Porto e a Associação do Porto de Paralisia Cerebral. Os dados recolhidos através das experiências de navegação na CRI foram acompanhados por questionários preenchidos pelos utilizadores. Estes dados foram analisados estatisticamente, a fim de provar a eficácia e usabilidade na adequação da interface da CRI ao utilizador. Os resultados mostraram, em ambiente simulado, um valor de usabilidade do sistema de 67, baseado na opinião de uma amostra de pacientes que apresentam os graus IV e V (os mais severos) de Paralisia Cerebral. Foi também demonstrado estatisticamente que a interface atribuída automaticamente pela ferramenta tem uma avaliação superior à sugerida pelos técnicos de Terapia Ocupacional, mostrando a possibilidade de atribuir automaticamente uma linguagem de comando adaptada a cada utilizador. Experiências realizadas com distintos modos de controlo revelaram a preferência dos utilizadores por um controlo compartilhado com um nível de ajuda associado ao nível de constrangimento do paciente. Em conclusão, este trabalho demonstra que é possível adaptar automaticamente uma CRI ao utilizador com claros benefícios a nível de usabilidade e segurança.
Resumo:
The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a fact that was also verified in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that intends to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. Such approach takes advantage of heterogenous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large amount of processing modules optimized for the biomedical domain. To offer on-demand heterogenous biomedical concept identification, we developed BeCAS, a web application, service and widget. We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
Resumo:
Presently power system operation produces huge volumes of data that is still treated in a very limited way. Knowledge discovery and machine learning can make use of these data resulting in relevant knowledge with very positive impact. In the context of competitive electricity markets these data is of even higher value making clear the trend to make data mining techniques application in power systems more relevant. This paper presents two cases based on real data, showing the importance of the use of data mining for supporting demand response and for supporting player strategic behavior.
Resumo:
Projecto para obtenção do grau de Mestre em Engenharia Informática e de computadores