266 results for applicazione, business analysis, data mining, Facebook, PRIN, relazioni sociali, social network
Abstract:
With the explosion of information resources, there is a pressing need to understand interesting text features or topics in massive text collections. This thesis proposes a theoretical model to accurately weight specific text features, such as patterns and n-grams. The proposed model achieves impressive performance on two data collections, Reuters Corpus Volume 1 (RCV1) and Reuters 21578.
Abstract:
Acoustic recordings play an increasingly important role in monitoring terrestrial and aquatic environments. However, rapid advances in technology make it possible to accumulate thousands of hours of recordings, more than ecologists can ever listen to. Our approach to this big-data challenge is to visualize the content of long-duration audio recordings on multiple scales, from minutes and hours to days and years. The visualization should facilitate navigation and yield ecologically meaningful information prior to listening to the audio. To construct images, we calculate acoustic indices, statistics that describe the distribution of acoustic energy and reflect content of ecological interest. We combine various indices to produce false-color spectrogram images that reveal acoustic content and facilitate navigation. The technical challenge we investigate in this work is how to navigate recordings that are days or even months in duration. We introduce a method of zooming through multiple temporal scales, analogous to Google Maps. However, the “landscape” to be navigated is not geographical and therefore not intrinsically visual, but rather a graphical representation of the underlying audio. We describe solutions to navigating spectrograms that range over three orders of magnitude of temporal scale. We make three sets of observations: 1. We determine that at least ten intermediate scale steps are required to zoom over three orders of magnitude of temporal scale; 2. We determine that three different visual representations are required to cover the range of temporal scales; 3. We present a solution to the problem of maintaining visual continuity when stepping between different visual representations. Finally, we demonstrate the utility of the approach with four case studies.
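The first observation in the abstract above can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes the intermediate scales are spaced geometrically (the abstract does not say so), and the function name `zoom_ladder` and the endpoint values are hypothetical.

```python
def zoom_ladder(coarsest, finest, intermediate_steps):
    """Return geometrically spaced scales from coarsest to finest, inclusive.

    Scales are expressed as seconds of audio per pixel column, which is a
    hypothetical unit chosen for this illustration.
    """
    if finest <= 0 or coarsest <= finest:
        raise ValueError("need coarsest > finest > 0")
    if intermediate_steps < 0:
        raise ValueError("intermediate_steps must be non-negative")
    # n intermediate scales split the full range into n + 1 zoom jumps.
    intervals = intermediate_steps + 1
    ratio = (coarsest / finest) ** (1.0 / intervals)
    return [coarsest / ratio ** i for i in range(intervals + 1)]


# Hypothetical endpoints spanning three orders of magnitude
# (60 s per pixel down to 0.06 s per pixel), with ten intermediate scales.
scales = zoom_ladder(60.0, 0.06, intermediate_steps=10)
zoom_factor = scales[0] / scales[1]
print(len(scales), round(zoom_factor, 2))
```

Under these assumptions each zoom jump changes the temporal scale by a factor slightly below two, which is comparable to the doubling per zoom level used by typical web map tile schemes.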
Abstract:
Migraine is a common neurological disorder with a genetically complex background. This paper describes a meta-analysis of genome-wide association (GWA) studies on migraine, performed by the Dutch-Icelandic migraine genetics (DICE) consortium, which brings together six population-based European migraine cohorts with a total sample size of 10,980 individuals (2446 cases and 8534 controls). A total of 32 SNPs showed marginal evidence for association at a P-value < 10^-5. The best result was obtained for SNP rs9908234, which had a P-value of 8.00 × 10^-8. This top SNP is located in the nerve growth factor receptor (NGFR) gene. However, this SNP did not replicate in three cohorts from the Netherlands and Australia. Of the other 31 SNPs, 18 SNPs were tested in two replication cohorts, but none replicated. In addition, we explored previously identified candidate genes in the meta-analysis data set. This revealed a modest gene-based significant association between migraine and the metadherin (MTDH) gene, previously identified in the first clinic-based GWA study (GWAS) for migraine (Bonferroni-corrected gene-based P-value = 0.026). This finding is consistent with the involvement of the glutamate pathway in migraine. Additional research is necessary to further confirm the involvement of glutamate.
Abstract:
Objective: To assess the impact of exercise referral schemes on physical activity and health outcomes.
Design: Systematic review and meta-analysis.
Data sources: Medline, Embase, PsycINFO, Cochrane Library, ISI Web of Science, SPORTDiscus, and ongoing trial registries up to October 2009. We also checked study references.
Study selection:
- Design: randomised controlled trials or non-randomised controlled (cluster or individual) studies published in peer-reviewed journals.
- Population: sedentary individuals with or without medical diagnosis.
- Exercise referral schemes defined as: clear referrals by primary care professionals to third party service providers to increase physical activity or exercise, physical activity or exercise programmes tailored to individuals, and initial assessment and monitoring throughout programmes.
- Comparators: usual care, no intervention, or alternative exercise referral schemes.
Results: Eight randomised controlled trials met the inclusion criteria, comparing exercise referral schemes with usual care (six trials), alternative physical activity intervention (two), and an exercise referral scheme plus a self determination theory intervention (one). Compared with usual care, follow-up data for exercise referral schemes showed an increased number of participants who achieved 90-150 minutes of physical activity of at least moderate intensity per week (pooled relative risk 1.16, 95% confidence interval 1.03 to 1.30) and a reduced level of depression (pooled standardised mean difference −0.82, −1.28 to −0.35). Evidence of a between group difference in physical activity of moderate or vigorous intensity or in other health outcomes was inconsistent at follow-up. We did not find any difference in outcomes between exercise referral schemes and the other two comparator groups. None of the included trials separately reported outcomes in individuals with specific medical diagnoses. Substantial heterogeneity in the quality and nature of the exercise referral schemes across studies might have contributed to the inconsistency in outcome findings.
Conclusions: Considerable uncertainty remains as to the effectiveness of exercise referral schemes for increasing physical activity, fitness, or health indicators, or whether they are an efficient use of resources for sedentary people with or without a medical diagnosis.
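As a reading aid for the pooled relative risk reported above, the sketch below recovers an approximate standard error and z statistic from the published interval. It assumes the interval is symmetric on the log scale and uses the normal critical value 1.96, which is the usual convention but is not stated in the abstract. The function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(rr, lower, upper, z_crit=1.96):
    """Approximate SE, z and two-sided p for a ratio estimate from its CI.

    Assumes the confidence interval was built as exp(log(rr) +/- z_crit * SE),
    which is an assumption of this sketch, not a detail from the review.
    """
    log_rr = math.log(rr)
    # The full CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_rr / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values taken from the abstract: pooled RR 1.16, 95% CI 1.03 to 1.30.
se, z, p = log_scale_summary(1.16, 1.03, 1.30)
print(f"SE(log RR)={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Because the lower bound of the interval lies above 1, the approximate p-value comes out below 0.05, which matches the abstract's description of an increase relative to usual care.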
Abstract:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics, data mining and other domains in which all-to-all comparisons are a typical computing pattern.
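To illustrate the computing pattern named above, here is a minimal sketch of an all-to-all comparison workload and a naive round-robin assignment of comparison tasks to workers. This is a baseline for illustration only, not the thesis's data distribution strategy or scheduling policy. All function names are hypothetical.

```python
from itertools import combinations


def all_pairs(n_items):
    """Every unordered pair of item indices: n * (n - 1) / 2 comparison tasks."""
    return list(combinations(range(n_items), 2))


def assign_round_robin(pairs, n_workers):
    """Deal comparison tasks to workers in turn (balances task counts only)."""
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(pairs):
        buckets[k % n_workers].append(pair)
    return buckets


def items_needed(bucket):
    """Data items a worker must hold locally to run its assigned comparisons."""
    return sorted({item for pair in bucket for item in pair})


pairs = all_pairs(6)
buckets = assign_round_robin(pairs, n_workers=3)
for worker, bucket in enumerate(buckets):
    print(worker, len(bucket), items_needed(bucket))
```

Round-robin equalises the number of tasks per worker but ignores which data items each worker must store. The tension between load balancing on one side and storage usage and data locality on the other is the one the abstract lists as the focus of the study.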
Abstract:
This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking and 3D scene understanding. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular but simple bag-of-words approach. In our framework, multiple-instance SVM (mi-SVM) is first used to identify positive features for each action category, and the k-means algorithm is used to generate a codebook. Locality-constrained linear coding is then used to encode the features into the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets of varying complexity demonstrate significant performance improvement over the baseline bag-of-features method.
Abstract:
Network data packet capture and replay capabilities are basic requirements for forensic analysis of faults and security-related anomalies, as well as for testing and development. Cyber-physical networks, in which data packets are used to monitor and control physical devices, must operate within strict timing constraints, in order to match the hardware devices' characteristics. Standard network monitoring tools are unsuitable for such systems because they cannot guarantee to capture all data packets, may introduce their own traffic into the network, and cannot reliably reproduce the original timing of data packets. Here we present a high-speed network forensics tool specifically designed for capturing and replaying data traffic in Supervisory Control and Data Acquisition systems. Unlike general-purpose "packet capture" tools it does not affect the observed network's data traffic and guarantees that the original packet ordering is preserved. Most importantly, it allows replay of network traffic precisely matching its original timing. The tool was implemented by developing novel user interface and back-end software for a special-purpose network interface card. Experimental results show a clear improvement in data capture and replay capabilities over standard network monitoring methods and general-purpose forensics solutions.
Abstract:
This study examined factors (e.g., ad values and social networking advertising characteristics) influencing consumers' attitudes and behavioural intention towards three types of social networking advertising (SNA) on Facebook – home page ad, social impression ad, and organic impression ad. Findings demonstrate that peer influence had the most significant impacts on attitude and behavioural intention across all types of SNA. The significant interaction term of invasiveness and privacy concern indicates that both attitude and behavioural intention were diminished, particularly when perceived invasiveness and privacy concern were high simultaneously. In addition, results suggest that attitudes towards the ad played a mediating role between SNA characteristics and behavioural intention. Lastly, among the types of SNA, consumers preferred organic impression ads that featured friends' names on their newsfeed more than paid ads located on the sidebar of their Facebook pages.
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum can generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the data of the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of our proposed approach.
Abstract:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is inadequately formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph addresses the inability to represent other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide more precise results when compared with other representations.
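The relation types mentioned above can be pictured with a small data structure. The sketch below is only an illustration of a graph whose edges connect sets of tasks rather than single pairs. It is not the paper's flow graph formulation. The class, its methods and the task names are all hypothetical and are not taken from the Teleclaim process.

```python
class FlowGraph:
    """Data flows stored as (source set, target set) edges between tasks."""

    def __init__(self):
        self.flows = []

    def add_flow(self, sources, targets):
        """Record one data flow from a group of source tasks to a group of targets."""
        flow = (frozenset(sources), frozenset(targets))
        if not flow[0] or not flow[1]:
            raise ValueError("a flow needs at least one source and one target")
        self.flows.append(flow)
        return flow

    @staticmethod
    def kind(flow):
        """Classify a flow by the number of tasks on each side."""
        sources, targets = flow
        left = "one" if len(sources) == 1 else "many"
        right = "one" if len(targets) == 1 else "many"
        return f"{left}-to-{right}"


graph = FlowGraph()
# Hypothetical task names, used only to exercise each relation type.
graph.add_flow({"register"}, {"check"})
graph.add_flow({"check"}, {"assess_a", "assess_b"})
graph.add_flow({"assess_a", "assess_b"}, {"decide"})
kinds = [FlowGraph.kind(flow) for flow in graph.flows]
print(kinds)
```

A representation restricted to pairs of tasks would have to break the second and third flows into separate one-to-one edges, losing the fact that the tasks on one side act as a group.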
Abstract:
Identifying unusual or anomalous patterns in an underlying dataset is an important but challenging task in many applications. The focus of the unsupervised anomaly detection literature has mostly been on vectorised data. However, many applications are more naturally described using higher-order tensor representations. Approaches that vectorise tensorial data can destroy the structural information encoded in the high-dimensional space, and lead to the curse of dimensionality. In this paper we present the first unsupervised tensorial anomaly detection method, along with a randomised version of our method. Our anomaly detection method, the One-class Support Tensor Machine (1STM), is a generalisation of conventional one-class Support Vector Machines to higher-order spaces. 1STM preserves the multiway structure of tensor data, while achieving significant improvement in accuracy and efficiency over conventional vectorised methods. We then leverage the theory of nonlinear random projections to propose the Randomised 1STM (R1STM). Our empirical analysis on several real and synthetic datasets shows that our R1STM algorithm delivers accuracy comparable to or better than a state-of-the-art deep learning method and traditional kernelised approaches for anomaly detection, while being approximately 100 times faster in training and testing.