4 resultados para data movement problem

em Memorial University Research Repository


Relevância:

90.00% 90.00%

Publicador:

Resumo:

The social media classification problems draw more and more attention in the past few years. With the rapid development of Internet and the popularity of computers, there is astronomical amount of information in the social network (social media platforms). The datasets are generally large scale and are often corrupted by noise. The presence of noise in training set has strong impact on the performance of supervised learning (classification) techniques. A budget-driven One-class SVM approach is presented in this thesis that is suitable for large scale social media data classification. Our approach is based on an existing online One-class SVM learning algorithm, referred as STOCS (Self-Tuning One-Class SVM) algorithm. To justify our choice, we first analyze the noise-resilient ability of STOCS using synthetic data. The experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle big data classification problem for social media data, we introduce several budget driven features, which allow the algorithm to be trained within limited time and under limited memory requirement. Besides, the resulting algorithm can be easily adapted to changes in dynamic data with minimal computational cost. Compared with two state-of-the-art approaches, Lib-Linear and kNN, our approach is shown to be competitive with lower requirements of memory and time.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. One of the main challenges in data integration is reconciling semantic differences among data sources. Approaches that been used to solve this problem can be categorized as schema-based and attribute-based. Schema-based approaches use schema information to identify the semantic similarity in data; furthermore, they focus on reconciling types before reconciling attributes. In contrast, attribute-based approaches use statistical and structural information of attributes to identify the semantic similarity of data in different sources. This research examines an approach to semantic reconciliation based on integrating properties expressed at different levels of abstraction or granularity using the concept of property precedence. Property precedence reconciles the meaning of attributes by identifying similarities between attributes based on what these attributes represent in the real world. In order to use property precedence for semantic integration, we need to identify the precedence of attributes within and across data sources. The goal of this research is to develop and evaluate a method and algorithms that will identify precedence relations among attributes and build property precedence graph (PPG) that can be used to support integration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

My thesis examines fine-scale habitat use and movement patterns of age 1 Greenland cod (Gadus macrocephalus ogac) tracked using acoustic telemetry. Recent advances in tracking technologies such as GPS and acoustic telemetry have led to increasingly large and detailed datasets that present new opportunities for researchers to address fine-scale ecological questions regarding animal movement and spatial distribution. There is a growing demand for home range models that will not only work with massive quantities of autocorrelated data, but that can also exploit the added detail inherent in these high-resolution datasets. Most published home range studies use radio-telemetry or satellite data from terrestrial mammals or avian species, and most studies that evaluate the relative performance of home range models use simulated data. In Chapter 2, I used actual field-collected data from age-1 Greenland cod tracked with acoustic telemetry to evaluate the accuracy and precision of six home range models: minimum convex polygons, kernel densities with plug-in bandwidth selection and the reference bandwidth, adaptive local convex hulls, Brownian bridges, and dynamic Brownian bridges. I then applied the most appropriate model to two years (2010-2012) of tracking data collected from 82 tagged Greenland cod tracked in Newman Sound, Newfoundland, Canada, to determine diel and seasonal differences in habitat use and movement patterns (Chapter 3). Little is known of juvenile cod ecology, so resolving these relationships will provide valuable insight into activity patterns, habitat use, and predator-prey dynamics, while filling a knowledge gap regarding the use of space by age 1 Greenland cod in a coastal nursery habitat. By doing so, my thesis demonstrates an appropriate technique for modelling the spatial use of fish from acoustic telemetry data that can be applied to high-resolution, high-frequency tracking datasets collected from mobile organisms in any environment.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis stems from the project with real-time environmental monitoring company EMSAT Corporation. They were looking for methods to automatically ag spikes and other anomalies in their environmental sensor data streams. The problem presents several challenges: near real-time anomaly detection, absence of labeled data and time-changing data streams. Here, we address this problem using both a statistical parametric approach as well as a non-parametric approach like Kernel Density Estimation (KDE). The main contribution of this thesis is extending the KDE to work more effectively for evolving data streams, particularly in presence of concept drift. To address that, we have developed a framework for integrating Adaptive Windowing (ADWIN) change detection algorithm with KDE. We have tested this approach on several real world data sets and received positive feedback from our industry collaborator. Some results appearing in this thesis have been presented at ECML PKDD 2015 Doctoral Consortium.