980 resultados para pattern extraction
Resumo:
This study presents an automatic, computer-aided analytical method called Comparison Structure Analysis (CSA), which can be applied to different dimensions of music. The aim of CSA is first and foremost practical: to produce dynamic and understandable representations of musical properties by evaluating the prevalence of a chosen musical data structure through a musical piece. Such a comparison structure may refer to a mathematical vector, a set, a matrix or another type of data structure and even a combination of data structures. CSA depends on an abstract systematic segmentation that allows for a statistical or mathematical survey of the data. To choose a comparison structure is to tune the apparatus to be sensitive to an exclusive set of musical properties. CSA settles somewhere between traditional music analysis and computer aided music information retrieval (MIR). Theoretically defined musical entities, such as pitch-class sets, set-classes and particular rhythm patterns are detected in compositions using pattern extraction and pattern comparison algorithms that are typical within the field of MIR. In principle, the idea of comparison structure analysis can be applied to any time-series type data and, in the music analytical context, to polyphonic as well as homophonic music. Tonal trends, set-class similarities, invertible counterpoints, voice-leading similarities, short-term modulations, rhythmic similarities and multiparametric changes in musical texture were studied. Since CSA allows for a highly accurate classification of compositions, its methods may be applicable to symbolic music information retrieval as well. The strength of CSA relies especially on the possibility to make comparisons between the observations concerning different musical parameters and to combine it with statistical and perhaps other music analytical methods. The results of CSA are dependent on the competence of the similarity measure. New similarity measures for tonal stability, rhythmic and set-class similarity measurements were proposed. The most advanced results were attained by employing the automated function generation – comparable with the so-called genetic programming – to search for an optimal model for set-class similarity measurements. However, the results of CSA seem to agree strongly, independent of the type of similarity function employed in the analysis.
Resumo:
This paper proposes an efficient pattern extraction algorithm that can be applied on melodic sequences that are represented as strings of abstract intervallic symbols; the melodic representation introduces special “binary don’t care” symbols for intervals that may belong to two partially overlapping intervallic categories. As a special case the well established “step–leap” representation is examined. In the step–leap representation, each melodic diatonic interval is classified as a step (±s), a leap (±l) or a unison (u). Binary don’t care symbols are used to represent the possible overlapping between the various abstract categories e.g. *=s, *=l and #=-s, #=-l. We propose an O(n+d(n-d)+z)-time algorithm for computing all maximal-pairs in a given sequence x=x[1..n], where x contains d occurrences of binary don’t cares and z is the number of reported maximal-pairs.
Resumo:
This paper proposes an efficient pattern extraction algorithm that can be applied on melodic sequences that are represented as strings of abstract intervallic symbols; the melodic representation introduces special “binary don’t care” symbols for intervals that may belong to two partially overlapping intervallic categories. As a special case the well established “step–leap” representation is examined. In the step–leap representation, each melodic diatonic interval is classified as a step (±s), a leap (±l) or a unison (u). Binary don’t care symbols are used to represent the possible overlapping between the various abstract categories e.g. *=s, *=l and #=-s, #=-l. We propose an O(n+d(n-d)+z)-time algorithm for computing all maximal-pairs in a given sequence x=x[1..n], where x contains d occurrences of binary don’t cares and z is the number of reported maximal-pairs.
Resumo:
Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor requires deep analyses of the syntactic structure of sentences in tweets. We evaluate our approach with tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets by 2.19% at the tweet-level and 7.5% at the entity-level in average F-measure.
Resumo:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism with-in our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xi-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
Resumo:
Second language (L2) learning outcomes may depend on the structure of the input and learners’ cognitive abilities. This study tested whether less predictable input might facilitate learning and generalization of L2 morphology while evaluating contributions of statistical learning ability, nonverbal intelligence, phonological short-term memory, and verbal working memory. Over three sessions, 54 adults were exposed to a Russian case-marking paradigm with a balanced or skewed item distribution in the input. Whereas statistical learning ability and nonverbal intelligence predicted learning of trained items, only nonverbal intelligence also predicted generalization of case-marking inflections to new vocabulary. Neither measure of temporary storage capacity predicted learning. Balanced, less predictable input was associated with higher accuracy in generalization but only in the initial test session. These results suggest that individual differences in pattern extraction play a more sustained role in L2 acquisition than instructional manipulations that vary the predictability of lexical items in the input.
Resumo:
The cell phone has become a means of finding persuasive content, making friends and accessing information. For this reason it is of great interest in analyzing is use by young people during their formative years. Regarding this point, the objectives of the research are: to obtain percentage data on the distraction call phones cause among students; to define their habits re fun activities and their study responsibility in their personal and academic lives; and to identify leisure-expressive and referential trends. It was decided to analyze the Spanish discourse found on Twitter. To do so, a quantitative and qualitative methodology was used, with a linguistic pattern extraction tool which allows us to obtain categories of meaning. The sample was 10,000 tweets in Spanish, with key words such as “cell” and “study” and all possible derivatives. The data were gathered between 30 May and the 6 June 2014, which coincides with the start of the exam period in Spain. Finally, the potential applications of the research to the specific field of publicity and advertising is discussed as a possible solution to the current needs of the brands, which have to find participative formats taking into account the students’ leisure trends.
Resumo:
In present research, headspace solid-phase microextraction (HS-SPME) followed by gas chromatography–mass spectrometry (GC–qMS), was evaluated as a reliable and improved alternative to the commonly used liquid–liquid extraction (LLE) technique for the establishment of the pattern of hydrolytically released components of 7 Vitis vinifera L. grape varieties, commonly used to produce the world-famous Madeira wine. Since there is no data available on their glycosidic fractions, at a first step, two hydrolyse procedures, acid and enzymatic, were carried out using Boal grapes as matrix. Several parameters susceptible of influencing the hydrolytic process were studied. The best results, expressed as GC peak area, number of identified components and reproducibility, were obtained using ProZym M with b-glucosidase activity at 35 °C for 42 h. For the extraction of hydrolytically released components, HS-SPME technique was evaluated as a reliable and improved alternative to the conventional extraction technique, LLE (ethyl acetate). HS-SPME using DVB/CAR/PDMS as coating fiber displayed an extraction capacity two fold higher than LLE (ethyl acetate). The hydrolyzed fraction was mainly characterized by the occurrence of aliphatic and aromatic alcohols, followed by acids, esters, carbonyl compounds, terpenoids, and volatile phenols. Concerning to terpenoids its contribution to the total hydrolyzed fraction is highest for Malvasia Cândida (23%) and Malvasia Roxa (13%), and their presence according previous studies, even at low concentration, is important from a sensorial point of view (can impart floral notes to the wines), due to their low odor threshold (μg/L). According to the obtained data by principal component analysis (PCA), the sensorial properties of Madeira wines produced by Malvasia Cândida and Malvasia Roxa could be improved by hydrolysis procedure, since their hydrolyzed fraction is mainly characterized by terpenoids (e.g. linalool, geraniol) which are responsible for floral notes. Bual and Sercial grapes are characterized by aromatic alcohols (e.g. benzyl alcohol, 2-phenylethyl alcohol), so an improvement in sensorial characteristics (citrus, sweet and floral odors) of the corresponding wines, as result of hydrolytic process, is expected.
Resumo:
Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term patholog to mean a homolog of a human disease-related gene encoding a product ( transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity ( 70 - 85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool ( FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic ( 53%), hereditary ( 24%), immunological ( 5%), cardio-vascular (4%), or other (14%), disorders. Conclusions: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
Resumo:
Pattern recognition methods have been successfully applied in several functional neuroimaging studies. These methods can be used to infer cognitive states, so-called brain decoding. Using such approaches, it is possible to predict the mental state of a subject or a stimulus class by analyzing the spatial distribution of neural responses. In addition it is possible to identify the regions of the brain containing the information that underlies the classification. The Support Vector Machine (SVM) is one of the most popular methods used to carry out this type of analysis. The aim of the current study is the evaluation of SVM and Maximum uncertainty Linear Discrimination Analysis (MLDA) in extracting the voxels containing discriminative information for the prediction of mental states. The comparison has been carried out using fMRI data from 41 healthy control subjects who participated in two experiments, one involving visual-auditory stimulation and the other based on bimanual fingertapping sequences. The results suggest that MLDA uses significantly more voxels containing discriminative information (related to different experimental conditions) to classify the data. On the other hand, SVM is more parsimonious and uses less voxels to achieve similar classification accuracies. In conclusion, MLDA is mostly focused on extracting all discriminative information available, while SVM extracts the information which is sufficient for classification. (C) 2009 Elsevier Inc. All rights reserved.
Resumo:
The vision of the Internet of Things (IoT) includes large and dense deployment of interconnected smart sensing and monitoring devices. This vast deployment necessitates collection and processing of large volume of measurement data. However, collecting all the measured data from individual devices on such a scale may be impractical and time consuming. Moreover, processing these measurements requires complex algorithms to extract useful information. Thus, it becomes imperative to devise distributed information processing mechanisms that identify application-specific features in a timely manner and with a low overhead. In this article, we present a feature extraction mechanism for dense networks that takes advantage of dominance-based medium access control (MAC) protocols to (i) efficiently obtain global extrema of the sensed quantities, (ii) extract local extrema, and (iii) detect the boundaries of events, by using simple transforms that nodes employ on their local data. We extend our results for a large dense network with multiple broadcast domains (MBD). We discuss and compare two approaches for addressing the challenges with MBD and we show through extensive evaluations that our proposed distributed MBD approach is fast and efficient at retrieving the most valuable measurements, independent of the number sensor nodes in the network.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Biomédica
Resumo:
Based in internet growth, through semantic web, together with communication speed improvement and fast development of storage device sizes, data and information volume rises considerably every day. Because of this, in the last few years there has been a growing interest in structures for formal representation with suitable characteristics, such as the possibility to organize data and information, as well as the reuse of its contents aimed for the generation of new knowledge. Controlled Vocabulary, specifically Ontologies, present themselves in the lead as one of such structures of representation with high potential. Not only allow for data representation, as well as the reuse of such data for knowledge extraction, coupled with its subsequent storage through not so complex formalisms. However, for the purpose of assuring that ontology knowledge is always up to date, they need maintenance. Ontology Learning is an area which studies the details of update and maintenance of ontologies. It is worth noting that relevant literature already presents first results on automatic maintenance of ontologies, but still in a very early stage. Human-based processes are still the current way to update and maintain an ontology, which turns this into a cumbersome task. The generation of new knowledge aimed for ontology growth can be done based in Data Mining techniques, which is an area that studies techniques for data processing, pattern discovery and knowledge extraction in IT systems. This work aims at proposing a novel semi-automatic method for knowledge extraction from unstructured data sources, using Data Mining techniques, namely through pattern discovery, focused in improving the precision of concept and its semantic relations present in an ontology. In order to verify the applicability of the proposed method, a proof of concept was developed, presenting its results, which were applied in building and construction sector.
Resumo:
Evaluating leaf litter beetle data sampled by Winkler extraction from Atlantic forest sites in southern Brazil. To evaluate the reliability of data obtained by Winkler extraction in Atlantic forest sites in southern Brazil, we studied litter beetle assemblages in secondary forests (5 to 55 years after abandonment) and old-growth forests at two seasonally different points in time. For all regeneration stages, species density and abundance were lower in April compared to August; but, assemblage composition of the corresponding forest stages was similar in both months. We suggest that sampling of small litter inhabiting beetles at different points in time using the Winkler technique reveals identical ecological patterns, which are more likely to be influenced by sample incompleteness than by differences in their assemblage composition. A strong relationship between litter quantity and beetle occurrences indicates the importance of this variable for the temporal species density pattern. Additionally, the sampled beetle material was compared with beetle data obtained with pitfall traps in one old-growth forest. Over 60% of the focal species captured with pitfall traps were also sampled by Winkler extraction in different forest stages. Few beetles with a body size too large to be sampled by Winkler extraction were only sampled with pitfall traps. This indicates that the local litter beetle fauna is dominated by small species. Hence, being aware of the exclusion of large beetles and beetle species occurring during the wet season, the Winkler method reveals a reliable picture of the local leaf litter beetle community.