6 resultados para Data cleaning

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the advent of peer to peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the complexity of the validation process, the developed solution maps the requirements of the application on a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast convergent adaptation process is deployed, data reduction rates are significantly improved. Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as ƒ-test is performed during each node's split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The promise of Wireless Sensor Networks (WSNs) is the autonomous collaboration of a collection of sensors to accomplish some specific goals which a single sensor cannot offer. Basically, sensor networking serves a range of applications by providing the raw data as fundamentals for further analyses and actions. The imprecision of the collected data could tremendously mislead the decision-making process of sensor-based applications, resulting in an ineffectiveness or failure of the application objectives. Due to inherent WSN characteristics normally spoiling the raw sensor readings, many research efforts attempt to improve the accuracy of the corrupted or "dirty" sensor data. The dirty data need to be cleaned or corrected. However, the developed data cleaning solutions restrict themselves to the scope of static WSNs where deployed sensors would rarely move during the operation. Nowadays, many emerging applications relying on WSNs need the sensor mobility to enhance the application efficiency and usage flexibility. The location of deployed sensors needs to be dynamic. Also, each sensor would independently function and contribute its resources. Sensors equipped with vehicles for monitoring the traffic condition could be depicted as one of the prospective examples. The sensor mobility causes a transient in network topology and correlation among sensor streams. Based on static relationships among sensors, the existing methods for cleaning sensor data in static WSNs are invalid in such mobile scenarios. Therefore, a solution of data cleaning that considers the sensor movements is actively needed. This dissertation aims to improve the quality of sensor data by considering the consequences of various trajectory relationships of autonomous mobile sensors in the system. First of all, we address the dynamic network topology due to sensor mobility. The concept of virtual sensor is presented and used for spatio-temporal selection of neighboring sensors to help in cleaning sensor data streams. This method is one of the first methods to clean data in mobile sensor environments. We also study the mobility pattern of moving sensors relative to boundaries of sub-areas of interest. We developed a belief-based analysis to determine the reliable sets of neighboring sensors to improve the cleaning performance, especially when node density is relatively low. Finally, we design a novel sketch-based technique to clean data from internal sensors where spatio-temporal relationships among sensors cannot lead to the data correlations among sensor streams.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the advent of peer to peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the complexity of the validation process, the developed solution maps the requirements of the application on a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast convergent adaptation process is deployed, data reduction rates are significantly improved. Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as f-test is performed during each node’s split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Online learning systems (OLS) have become center stage for corporations and educational institutions as a competitive tool in the knowledge economy. The satisfaction construct has received extensive coverage in information systems literature as an indicator of effectiveness but has been criticized for lack of validity; yet, the value construct has been largely ignored, although it has a long history in psychology, sociology, and behavioral science. The purpose of this dissertation is to investigate the value and satisfaction constructs in the context of OLS, and their perceived by learners relationship for implied effectiveness of OLS. ^ First, a qualitative phase is employed to gather OLS values from learners' focus groups, followed by a pilot phase to refine a proposed instrument, and a main phase to validate the survey. Responses were received from 75 students in four focus groups, 141 in the pilot, and 207 the main survey. Extensive data cleaning and exploratory factor analysis were done to identify factors of learners' perceived value and satisfaction of OLS. Then, Value-Satisfaction grids and the Learners' Value Index of Satisfaction (LeVIS) were developed as benchmarking tools of OLS. Moreover, Multicriteria Decision Analysis (MCDA) techniques were employed to impute value from satisfaction scores in order to reduce survey response time. ^ The results provided four satisfaction and four value factors with high reliability (Cronbach's α). Moreover, value and satisfaction were found to have low linear and nonlinear correlations, indicating that they are two distinct uncorrelated constructs. This is consistent with the literature. Value-Satisfaction grids and the LeVIS index indicated relatively high effectiveness for technology and support characteristics, relatively low effectiveness for professor's characteristics, while course and learner characteristics indicated average effectiveness. ^ The main contributions of this study include identifying, defining, and articulating the relationship between value and satisfaction constructs as assessment of users' implied IS effectiveness, as well as assessing the accuracy of MCDA procedures to predict value scores, thus reducing by half the survey questionnaire size. ^