785 resultados para applicazione, business analysis, data mining, Facebook, PRIN, relazioni sociali, social network


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environment studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including: syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experiment results show that syllable features can achieve an average classification accuracy of 90.5% which outperforms Mel-frequency cepstral coefficients features (79.0%).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Over past few decades, frog species have been experiencing dramatic decline around the world. The reason for this decline includes habitat loss, invasive species, climate change and so on. To better know the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs: family, genus and species, including three families, eleven genera and eighty five species which are collected from Queensland, Australia. For each frog species, six instances are selected from which ten acoustic features are calculated. Then, the multicollinearity between ten features are studied for selecting non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to visually and explicitly determine which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machines (SVMs) classifier is used for the multi- level classification with three most important acoustic features respectively. Our experiment results indicate that using different acoustic feature sets can successfully classify frogs at different levels and the average classification accuracy can be up to 85.6%, 86.1% and 56.2% for family, genus and species respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Providentially in high dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named as loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the explosion of information resources, there is an imminent need to understand interesting text features or topics in massive text information. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance in two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular among machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options to reduce dimensionality, namely the principal component analysis and random projection are briefly examined. Random projection subject to a probabilistic length preserving transformation is explored further as a computationally light preprocessing step. The experimental results obtained demonstrate the effectiveness of the proposed training process for handling high dimensional large datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes, hours, days to years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and not therefore intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Migraine is a common neurological disorder with a genetically complex background. This paper describes a meta-analysis of genome-wide association (GWA) studies on migraine, performed by the Dutch-Icelandic migraine genetics (DICE) consortium, which brings together six population-based European migraine cohorts with a total sample size of 10,980 individuals (2446 cases and 8534 controls). A total of 32 SNPs showed marginal evidence for association at a P-value<10(-5). The best result was obtained for SNP rs9908234, which had a P-value of 8.00 x 10(-8). This top SNP is located in the nerve growth factor receptor (NGFR) gene. However, this SNP did not replicate in three cohorts from the Netherlands and Australia. Of the other 31 SNPs, 18 SNPs were tested in two replication cohorts, but none replicated. In addition, we explored previously identified candidate genes in the meta-analysis data set. This revealed a modest gene-based significant association between migraine and the metadherin (MTDH) gene, previously identified in the first clinic-based GWA study (GWAS) for migraine (Bonferroni-corrected gene-based P-value=0.026). This finding is consistent with the involvement of the glutamate pathway in migraine. Additional research is necessary to further confirm the involvement of glutamate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Helicobacter pylori infection is a risk factor for gastric cancer, which is a major health issue worldwide. Gastric cancer has a poor prognosis due to the unnoticeable progression of the disease and surgery is the only available treatment in gastric cancer. Therefore, gastric cancer patients would greatly benefit from identifying biomarker genes that would improve diagnostic and prognostic prediction and provide targets for molecular therapies. DNA copy number amplifications are the hallmarks of cancers in various anatomical locations. Mechanisms of amplification predict that DNA double-strand breaks occur at the margins of the amplified region. The first objective of this thesis was to identify the genes that were differentially expressed in H. pylori infection as well as the transcription factors and signal transduction pathways that were associated with the gene expression changes. The second objective was to identify putative biomarker genes in gastric cancer with correlated expression and copy number, and the last objective was to characterize cancers based on DNA copy number amplifications. DNA microarrays, an in vitro model and real-time polymerase chain reaction were used to measure gene expression changes in H. pylori infected AGS cells. In order to identify the transcription factors and signal transduction pathways that were activated after H. pylori infection, gene expression profiling data from the H. pylori experiments and a bioinformatics approach accompanied by experimental validation were used. Genome-wide expression and copy number microarray analysis of clinical gastric cancer samples and immunohistochemistry on tissue microarray were used to identify putative gastric cancer genes. Data mining and machine learning techniques were applied to study amplifications in a cross-section of cancers. FOS and various stress response genes were regulated by H. pylori infection. H. pylori regulated genes were enriched in the chromosomal regions that are frequently changed in gastric cancer, suggesting that molecular pathways of gastric cancer and premalignant H. pylori infection that induces gastritis are interconnected. 16 transcription factors were identified as being associated with H. pylori infection induced changes in gene expression. NF-κB transcription factor and p50 and p65 subunits were verified using elecrophoretic mobility shift assays. ERBB2 and other genes located in 17q12- q21 were found to be up-regulated in association with copy number amplification in gastric cancer. Cancers with similar cell type and origin clustered together based on the genomic localization of the amplifications. Cancer genes and large genes were co-localized with amplified regions and fragile sites, telomeres, centromeres and light chromosome bands were enriched at the amplification boundaries. H. pylori activated transcription factors and signal transduction pathways function in cellular mechanisms that might be capable of promoting carcinogenesis of the stomach. Intestinal and diffuse type gastric cancers showed distinct molecular genetic profiles. Integration of gene expression and copy number microarray data allowed the identification of genes that might be involved in gastric carcinogenesis and have clinical relevance. Gene amplifications were demonstrated to be non-random genomic instabilities. Cell lineage, properties of precursor stem cells, tissue microenvironment and genomic map localization of specific oncogenes define the site specificity of DNA amplifications, whereas labile genomic features define the structures of amplicons. These conclusions suggest that the definition of genomic changes in cancer is based on the interplay between the cancer cell and the tumor microenvironment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective To assess the impact of exercise referral schemes on physical activity and health outcomes. Design Systematic review and meta-analysis. Data sources Medline, Embase, PsycINFO, Cochrane Library, ISI Web of Science, SPORTDiscus, and ongoing trial registries up to October 2009. We also checked study references. Study selection - Design: randomised controlled trials or non-randomised controlled (cluster or individual) studies published in peer review journals. - Population: sedentary individuals with or without medical diagnosis. - Exercise referral schemes defined as: clear referrals by primary care professionals to third party service providers to increase physical activity or exercise, physical activity or exercise programmes tailored to individuals, and initial assessment and monitoring throughout programmes. - Comparators: usual care, no intervention, or alternative exercise referral schemes. Results Eight randomised controlled trials met the inclusion criteria, comparing exercise referral schemes with usual care (six trials), alternative physical activity intervention (two), and an exercise referral scheme plus a self determination theory intervention (one). Compared with usual care, follow-up data for exercise referral schemes showed an increased number of participants who achieved 90-150 minutes of physical activity of at least moderate intensity per week (pooled relative risk 1.16, 95% confidence intervals 1.03 to 1.30) and a reduced level of depression (pooled standardised mean difference −0.82, −1.28 to −0.35). Evidence of a between group difference in physical activity of moderate or vigorous intensity or in other health outcomes was inconsistent at follow-up. We did not find any difference in outcomes between exercise referral schemes and the other two comparator groups. None of the included trials separately reported outcomes in individuals with specific medical diagnoses. Substantial heterogeneity in the quality and nature of the exercise referral schemes across studies might have contributed to the inconsistency in outcome findings. Conclusions Considerable uncertainty remains as to the effectiveness of exercise referral schemes for increasing physical activity, fitness, or health indicators, or whether they are an efficient use of resources for sedentary people with or without a medical diagnosis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking, 3D scene understanding etc. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular, but simple bag-of-words approach. In our framework, first multiple instance SVM (mi-SVM) is used to identify positive features for each action category and the k-means algorithm is used to generate a codebook. Then locality-constrained linear coding is used to encode the features into the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets with varying complexity demonstrate significant performance improvement over the base-line bag-of-feature method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Network data packet capture and replay capabilities are basic requirements for forensic analysis of faults and security-related anomalies, as well as for testing and development. Cyber-physical networks, in which data packets are used to monitor and control physical devices, must operate within strict timing constraints, in order to match the hardware devices' characteristics. Standard network monitoring tools are unsuitable for such systems because they cannot guarantee to capture all data packets, may introduce their own traffic into the network, and cannot reliably reproduce the original timing of data packets. Here we present a high-speed network forensics tool specifically designed for capturing and replaying data traffic in Supervisory Control and Data Acquisition systems. Unlike general-purpose "packet capture" tools it does not affect the observed network's data traffic and guarantees that the original packet ordering is preserved. Most importantly, it allows replay of network traffic precisely matching its original timing. The tool was implemented by developing novel user interface and back-end software for a special-purpose network interface card. Experimental results show a clear improvement in data capture and replay capabilities over standard network monitoring methods and general-purpose forensics solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examined factors (e.g., ad values and social networking advertising characteristics) influencing consumers' attitudes and behavioural intention towards three types of social networking advertising (SNA) on Facebook – home page ad, social impression ad, and organic impression ad. Findings demonstrate that peer influence had the most significant impacts on attitude and behavioural intention across all types of SNA. The significant interaction term of invasiveness and privacy concern indicates that both attitude and behavioural intention were diminished, particularly when perceived invasiveness and privacy concern were high simultaneously. In addition, results suggest that attitudes towards the ad played a mediating role between SNA characteristics and behavioural intention. Lastly, among the types of SNA, consumers preferred organic impression ads that featured friends' names on their newsfeed more than paid ads located on the sidebar of their Facebook pages.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multi-document summarization addressing the problem of information overload has been widely utilized in the various real-world applications. Most of existing approaches adopt term-based representation for documents which limit the performance of multi-document summarization systems. In this paper, we proposed a novel pattern-based topic model (PBTMSum) for the task of the multi-document summarization. PBTMSum combining pattern mining techniques with LDA topic modelling could generate discriminative and semantic rich representations for topics and documents so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the data of document understanding conference (DUC) 2007. The results prove the effectiveness and efficiency of our proposed approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.