854 results for data gathering algorithm


Relevance:

30.00%

Publisher:

Abstract:

We develop a simplified implementation of the Hoshen-Kopelman cluster counting algorithm adapted for honeycomb networks. In our implementation we assume that all nodes in the network are occupied and that links between nodes can be intact or broken. The algorithm counts how many clusters there are in the network and determines which nodes belong to each cluster. The network information is stored in two data sets: the first describes the connectivity of the nodes and the second the state of the links. The algorithm finds all clusters in a single scan across the network, after which cluster relabeling operates on a vector whose size is much smaller than the size of the network. By counting the number of clusters of each size, the algorithm determines the cluster size probability distribution, from which the mean cluster size parameter can be estimated. Although our implementation works only for networks with a honeycomb (hexagonal) structure, it can easily be adapted to networks with arbitrary connectivity between the nodes (triangular, square, etc.). The proposed adaptation is applied to studying the thermal degradation of a graphene-like honeycomb membrane by means of Molecular Dynamics simulation with a Langevin thermostat. ACM Computing Classification System (1998): F.2.2, I.5.3.
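As a rough illustration of the bookkeeping the abstract describes, the sketch below counts clusters with a union-find structure, which plays the role of the Hoshen-Kopelman label-collapsing vector. The honeycomb adjacency is abstracted into a list of (node, node, intact) links, and the mean size reported is the plain average over clusters; both simplifications are ours, not the paper's.

```python
from collections import Counter

def find(parent, i):
    # path halving keeps the relabeling vector short, as in Hoshen-Kopelman
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def count_clusters(n_nodes, links):
    """links: iterable of (node_a, node_b, intact) covering the network."""
    parent = list(range(n_nodes))          # connectivity data set
    for a, b, intact in links:             # link-state data set, single scan
        if intact:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra
    roots = [find(parent, i) for i in range(n_nodes)]
    cluster_sizes = Counter(roots)                # nodes per cluster label
    size_dist = Counter(cluster_sizes.values())   # clusters per size
    mean_size = sum(s * c for s, c in size_dist.items()) / len(cluster_sizes)
    return roots, size_dist, mean_size

# Four nodes on a ring with two broken links -> two clusters of size 2.
roots, dist, mean = count_clusters(4, [(0, 1, True), (1, 2, False),
                                       (2, 3, True), (3, 0, False)])
print(dist, mean)   # Counter({2: 2}) 2.0
```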

Relevance:

30.00%

Publisher:

Abstract:

Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of the sequences (their supports) but also in their importance (weights). When generating subsequence candidates we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns, which accelerates the process of candidate generation.
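A toy Apriori-style sketch of the idea: weighted support is the support times a normalized (average) weight, and candidates are pruned using the support times the maximum weight, an upper bound that preserves downward closure. The weighting scheme shown is an assumed form, not necessarily the paper's exact definition.

```python
def support(seqs, pattern):
    """Fraction of sequences containing `pattern` as a (gapped) subsequence."""
    def contains(seq, pat):
        it = iter(seq)                 # `in` consumes the iterator, so items
        return all(x in it for x in pat)  # must appear in order
    return sum(contains(s, pattern) for s in seqs) / len(seqs)

def mine(seqs, weights, min_wsup):
    max_w = max(weights.values())      # anti-monotone bound: sup * max_w
    items = sorted({i for s in seqs for i in s})
    patterns, frontier = {}, [(i,) for i in items]
    while frontier:
        nxt = []
        for pat in frontier:
            sup = support(seqs, pat)
            w = sum(weights[i] for i in pat) / len(pat)  # normalized weight
            if sup * max_w >= min_wsup:   # survives downward-closure pruning
                if sup * w >= min_wsup:   # actually weighted-frequent
                    patterns[pat] = sup * w
                nxt += [pat + (i,) for i in items]       # extend candidates
        frontier = nxt
    return patterns

seqs = [list("abca"), list("abc"), list("bca")]
weights = {"a": 1.0, "b": 0.5, "c": 0.8}
print(mine(seqs, weights, min_wsup=0.6))
```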

Relevance:

30.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 68T50, 62H30, 62J05.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a surrogate-model-based optimization of a doubly-fed induction generator (DFIG) machine winding design for maximizing power yield. Based on site-specific wind profile data and the machine's previous operational performance, the DFIG's stator and rotor windings are optimized to match maximum efficiency with operating conditions for rewinding purposes. Particle swarm optimization-based surrogate optimization techniques are used in conjunction with the finite element method to optimize the machine design using the limited available information on the site-specific wind profile and generator operating conditions. A response surface method in the surrogate model is developed to formulate the design objectives and constraints. In addition, the machine tests and efficiency calculations follow IEEE Standard 112-B. Numerical and experimental results validate the effectiveness of the proposed techniques.
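The loop below sketches the surrogate idea under stated assumptions: a quadratic response surface is fitted to a handful of samples of a hypothetical fem_efficiency function (a stand-in for the expensive finite-element evaluation), and a bare-bones particle swarm searches the surrogate. The real design variables, objective, and constraints come from the paper and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fem_efficiency(x):                     # hypothetical stand-in for a FEM call
    return -(x[0] - 0.6) ** 2 - 2 * (x[1] - 0.3) ** 2

def fit_quadratic(X, y):                   # response surface method (RSM) fit
    F = np.column_stack([np.ones(len(X)), X, X**2, X[:, :1] * X[:, 1:]])
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return lambda x: coef @ np.concatenate(([1.0], x, x**2, [x[0] * x[1]]))

def pso(f, dim=2, n=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    x = rng.uniform(0, 1, (n, dim)); v = np.zeros((n, dim))
    pb, pb_val = x.copy(), np.array([f(p) for p in x])  # personal bests
    gb = pb[pb_val.argmax()]                            # global best
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        x = np.clip(x + v, 0, 1)
        val = np.array([f(p) for p in x])
        better = val > pb_val
        pb[better], pb_val[better] = x[better], val[better]
        gb = pb[pb_val.argmax()]
    return gb

X = rng.uniform(0, 1, (30, 2))             # a small "design of experiments"
surrogate = fit_quadratic(X, np.array([fem_efficiency(p) for p in X]))
print(pso(surrogate))                      # candidate design, near (0.6, 0.3)
```

In practice the candidate returned by the swarm would be re-verified with the FEM model before any rewinding decision.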

Relevance:

30.00%

Publisher:

Abstract:

Lifelong surveillance is not cost-effective after endovascular aneurysm repair (EVAR), but is required to detect aortic complications which are fatal if untreated (type 1/3 endoleak, sac expansion, device migration). Aneurysm morphology determines the probability of aortic complications and therefore the need for surveillance, but existing analyses have proven incapable of identifying patients at sufficiently low risk to justify abandoning surveillance. This study aimed to improve the prediction of aortic complications through the application of machine-learning techniques. Patients undergoing EVAR at 2 centres were studied from 2004 to 2010. Aneurysm morphology had previously been studied to derive the SGVI Score for predicting aortic complications. Bayesian Neural Networks were designed using the same data, to dichotomise patients into groups at low or high risk of aortic complications. Network training was performed only on patients treated at centre 1. External validation was performed by assessing network performance, independently of network training, on patients treated at centre 2. Discrimination was assessed by Kaplan-Meier analysis comparing aortic complications in predicted low-risk versus predicted high-risk patients. 761 patients aged 75 ± 7 years underwent EVAR at the 2 centres. Mean follow-up was 36 ± 20 months. Neural networks were created incorporating neck angulation/length/diameter/volume; AAA diameter/area/volume/length/tortuosity; and common iliac tortuosity/diameter. A 19-feature network predicted aortic complications with excellent discrimination and external validation (5-year freedom from aortic complications in predicted low-risk vs. predicted high-risk patients: 97.9% vs. 63%; p < 0.0001). A Bayesian Neural Network algorithm can identify patients in whom it may be safe to abandon surveillance after EVAR. This proposal requires prospective study.
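The Bayesian network itself is not reproduced here; the sketch below only illustrates the Kaplan-Meier freedom-from-complication comparison used for the discrimination check, with invented follow-up data rather than the study's.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; events sort before censorings at tied times."""
    records = sorted(zip(times, events), key=lambda r: (r[0], not r[1]))
    s, curve, n = 1.0, [], len(records)
    for i, (t, event) in enumerate(records):
        if event:                      # complication observed at time t
            s *= 1 - 1 / (n - i)       # n - i patients still at risk
            curve.append((t, s))
    return curve

# Illustrative (time in months, event flag) for two predicted-risk groups.
low  = kaplan_meier([60, 48, 60, 59, 24], [0, 0, 0, 0, 0])
high = kaplan_meier([12, 30, 60, 18, 44], [1, 1, 0, 1, 0])
print(low)    # [] : no events, freedom from complications stays at 100%
print(high)   # [(12, 0.8), (18, 0.6), (30, 0.4)] : early, steep drop
```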

Relevance:

30.00%

Publisher:

Abstract:

This dissertation introduces an integrated algorithm for a new application dedicated to discriminating between electrodes leading to a seizure onset and those that do not, using interictal subdural EEG data. The significance of this study is in determining, among all of these channels, all containing interictal spikes, why some electrodes eventually lead to seizure while others do not. A first finding in the development of the algorithm is that these interictal spikes must be asynchronous and located in different regions of the brain before any consequential interpretations of EEG behavioral patterns are possible. A singular merit of the proposed approach is that even when the EEG data is randomly selected (independently of the onset of seizure), we are able to distinguish those channels that lead to seizure from those that do not. It is also revealed that the region of ictal activity does not necessarily evolve from the tissue located at the channels that present interictal activity, as commonly believed. The study is also significant in correlating clinical features of EEG with the patient's source of ictal activity, which comes from a specific subset of the channels that present interictal activity. The contributions of this dissertation emanate from (a) the choice made of the discriminating parameters used in the implementation, (b) the unique feature space that was used to optimize the delineation process of these two types of electrodes, (c) the development of a back-propagation neural network that automated the decision-making process, and (d) the establishment of mathematical functions that elicited the reasons for this delineation.
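The dissertation's discriminating parameters and feature space are not public, so the sketch below shows only a generic back-propagation classifier of the kind named in contribution (c), trained on random stand-in feature vectors for two classes of channels.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000):
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden); b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                   # hidden layer
        p = 1 / (1 + np.exp(-(h @ W2 + b2)))       # P(channel leads to seizure)
        g = (p - y) / len(y)                       # cross-entropy gradient
        W2 -= lr * h.T @ g; b2 -= lr * g.sum()
        gh = np.outer(g, W2) * (1 - h**2)          # back-propagate through tanh
        W1 -= lr * X.T @ gh; b1 -= lr * gh.sum(0)
    return lambda Z: 1 / (1 + np.exp(-(np.tanh(Z @ W1 + b1) @ W2 + b2)))

# Stand-in features: two synthetic clusters of "channels", not real EEG data.
X = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(2, 1, (40, 4))])
y = np.array([0] * 40 + [1] * 40)
clf = train_mlp(X, y)
print((clf(X).round() == y).mean())                # training accuracy of the sketch
```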

Relevance:

30.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple data sources is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focusing on gene regulation in Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
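A sketch in the spirit of constraint-based K-means (as in the well-known COP-KMeans variant): side information enters as must-link and cannot-link pairs rather than features. This is a generic illustration under our own simplifications, not the GCC algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(2)

def violates(i, c, labels, must, cannot):
    """Check assigning point i to cluster c against already-assigned partners."""
    partner = lambda a, b: b if a == i else a if b == i else None
    if any((j := partner(a, b)) is not None and j < i and labels[j] != c
           for a, b in must):
        return True
    return any((j := partner(a, b)) is not None and j < i and labels[j] == c
               for a, b in cannot)

def constrained_kmeans(X, k, must=(), cannot=(), iters=20):
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        for i, x in enumerate(X):
            order = np.argsort(((centers - x) ** 2).sum(1))   # nearest first
            labels[i] = next((c for c in order
                              if not violates(i, c, labels, must, cannot)),
                             order[0])   # fall back to nearest if all violate
        for c in range(k):               # recompute centroids
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    return labels

X = np.vstack([rng.normal(0, .3, (10, 2)), rng.normal(2, .3, (10, 2))])
print(constrained_kmeans(X, 2, must=[(0, 1)], cannot=[(0, 10)]))
```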

Relevance:

30.00%

Publisher:

Abstract:

The nation's freeway systems are becoming increasingly congested. A major contributor to traffic congestion on freeways is traffic incidents: non-recurring events, such as accidents or stranded vehicles, that cause a temporary roadway capacity reduction and can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. Multiple linear regression analysis and a decision tree based method were applied to develop the offline models, and a rule-based method and a tree algorithm called M5P were used to develop the online models. The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.
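A minimal sketch of the offline multiple-linear-regression idea: fit log incident duration against a few contributing factors and predict a point duration for a new incident. The features and data below are invented placeholders, not fields of the FDOT District 4 database.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented factor columns: lanes_blocked, severe_injury (0/1), peak_hour (0/1)
X = rng.integers(0, 3, (200, 1)).astype(float)
X = np.column_stack([X, rng.integers(0, 2, (200, 2))])
true_coef = np.array([0.4, 0.9, 0.2])
log_dur = 3.0 + X @ true_coef + rng.normal(0, 0.2, 200)   # synthetic durations

A = np.column_stack([np.ones(len(X)), X])                 # add intercept column
coef, *_ = np.linalg.lstsq(A, log_dur, rcond=None)        # least-squares fit

incident = np.array([1.0, 2, 1, 0])     # 2 lanes blocked, injury, off-peak
print(np.exp(incident @ coef), "minutes (synthetic point prediction)")
```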

Relevance:

30.00%

Publisher:

Abstract:

Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurements of topography over large areas. Airborne LIDAR systems usually return a 3-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information about different kinds of geometric objects, such as high-resolution digital terrain models (DTMs), buildings, and trees. In the past decade, LIDAR has attracted increasing interest from researchers in the fields of remote sensing and GIS. Compared with traditional data sources, such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow and relief displacement. However, the voluminous data pose a new challenge for automated extraction of geometric information from LIDAR measurements, because many raster image processing techniques cannot be directly applied to irregularly spaced LIDAR points. In this dissertation, a framework is proposed to automatically extract information about different kinds of geometric objects, such as terrain and buildings, from LIDAR. These objects are essential to numerous applications such as flood modeling, landslide prediction, and hurricane animation. The framework consists of several intuitive algorithms. First, a progressive morphological filter was developed to detect non-ground LIDAR measurements. By gradually increasing the window size and elevation difference threshold of the filter, the measurements of vehicles, vegetation, and buildings are removed, while ground data are preserved. Building measurements are then identified from the non-ground measurements using a region growing algorithm based on a plane-fitting technique. Raw footprints for segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove the noise caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, a 2D snake algorithm is proposed to adjust it, consisting of newly defined energy functions for topology adjustment and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas including large institutional, commercial, and small residential buildings were employed to test the proposed framework. The results demonstrate that the proposed framework achieves very good performance.
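The progressive morphological filter lends itself to a compact sketch. The 1-D version below applies a grayscale opening (erosion then dilation) with a growing window and a growing elevation threshold; points standing above the opened surface by more than the threshold are flagged as non-ground. Real LIDAR filtering operates on 2-D gridded points, and the parameters here are illustrative.

```python
import numpy as np

def opening(z, half):
    """Grayscale opening: erosion (min) then dilation (max) over 2*half+1."""
    n = len(z)
    eroded = np.array([z[max(0, i - half):i + half + 1].min() for i in range(n)])
    return np.array([eroded[max(0, i - half):i + half + 1].max() for i in range(n)])

def progressive_filter(z, windows=(1, 2, 4, 8), base_dh=0.3, slope=0.15, cell=1.0):
    ground = np.ones(len(z), dtype=bool)
    surface = z.copy()
    for half in windows:
        opened = opening(surface, half)
        dh = base_dh + slope * half * cell     # threshold grows with window size
        ground &= (surface - opened) <= dh     # tall residuals are non-ground
        surface = opened                       # next pass works on opened surface
    return ground

terrain = np.linspace(0, 2, 50)                # gentle slope
z = terrain.copy()
z[20:24] += 6.0                                # a "building" on the terrain
print(np.where(~progressive_filter(z))[0])     # flags indices 20..23 as non-ground
```

The window sequence removes progressively larger objects (vehicles, trees, buildings) while the growing threshold keeps sloped ground from being misclassified.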

Relevance:

30.00%

Publisher:

Abstract:

Wireless sensor networks are emerging as effective tools in the gathering and dissemination of data. They can be applied in many fields including health, environmental monitoring, home automation and the military. As with all other computing systems, it is necessary to include security features so that security-sensitive data traversing the network is protected. However, traditional security techniques cannot be applied to wireless sensor networks, owing to the constraints of battery power, memory, and the computational capacities of the miniature wireless sensor nodes. To address this need, it becomes necessary to develop new lightweight security protocols. This dissertation focuses on designing a suite of lightweight trust-based security mechanisms and a cooperation enforcement protocol for wireless sensor networks. It presents a trust-based cluster head election mechanism used to elect new cluster heads. This solution prevents a major security breach against the routing protocol, namely the election of malicious or compromised cluster heads. The dissertation also describes a location-aware, trust-based mechanism for detecting and isolating compromised nodes. Both of these mechanisms rely on the ability of a node to monitor its neighbors. Using neighbor monitoring techniques, the nodes are able to determine their neighbors' reputation and trust level through probabilistic modeling. The mechanisms were designed to mitigate internal attacks within wireless sensor networks, and the feasibility of the approach is demonstrated through extensive simulations. The dissertation also addresses non-cooperation problems in multi-user wireless sensor networks: a scalable lightweight enforcement algorithm using evolutionary game theory is designed, and its effectiveness is validated through mathematical analysis and simulation. This research has advanced the knowledge of wireless sensor network security and cooperation by developing new techniques based on mathematical models. In doing so, we have enabled others to build on our work towards the creation of highly trusted wireless sensor networks, which would facilitate their full utilization in many fields ranging from civilian to military applications.
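A toy sketch of the cluster head election idea, assuming a Beta-reputation trust model (a common choice for probabilistic neighbor monitoring; the dissertation's exact model may differ): each node scores its neighbors from monitored good and bad forwarding events, and the node with the highest aggregate trust is elected.

```python
def beta_trust(good, bad):
    """Expected trust under a Beta(good + 1, bad + 1) reputation model."""
    return (good + 1) / (good + bad + 2)

def elect_cluster_head(observations):
    """observations[watcher][subject] = (good_events, bad_events)."""
    trust = {}
    for watcher, seen in observations.items():
        for subject, (g, b) in seen.items():
            trust.setdefault(subject, []).append(beta_trust(g, b))
    # aggregate each node's reputation by averaging across its watchers
    return max(trust, key=lambda n: sum(trust[n]) / len(trust[n]))

obs = {
    "n1": {"n2": (9, 1), "n3": (2, 8)},
    "n2": {"n1": (7, 0), "n3": (1, 9)},
    "n3": {"n1": (8, 1), "n2": (9, 0)},
}
print(elect_cluster_head(obs))   # "n2"; the misbehaving n3 is never elected
```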

Relevance:

30.00%

Publisher:

Abstract:

Traffic incidents are non-recurring events that can cause a temporary reduction in roadway capacity. They have been recognized as a major contributor to traffic congestion on our nation's highway systems. To alleviate their impacts on capacity, automatic incident detection (AID) has been applied as an incident management strategy to reduce the total incident duration. AID relies on an algorithm to identify the occurrence of incidents by analyzing real-time traffic data collected from surveillance detectors. Significant research has been performed to develop AID algorithms for incident detection on freeways; however, similar research on major arterial streets remains largely at the initial stage of development and testing. This dissertation research aims to identify design strategies for the deployment of an Artificial Neural Network (ANN) based AID algorithm for major arterial streets. A section of the US-1 corridor in Miami-Dade County, Florida, was coded in the CORSIM microscopic simulation model to generate data for both model calibration and validation. To better capture the relationship between the traffic data and the corresponding incident status, Discrete Wavelet Transform (DWT) and data normalization were applied to the simulated data. Multiple ANN models were then developed for different detector configurations, historical data usage, and selections of traffic flow parameters. To assess the performance of different design alternatives, the model outputs were compared based on both detection rate (DR) and false alarm rate (FAR). The results show that the best models were able to achieve a high DR of between 90% and 95%, a mean time to detect (MTTD) of 55-85 seconds, and a FAR below 4%. The results also show that a detector configuration including only the mid-block and upstream detectors performs almost as well as one that also includes a downstream detector. In addition, DWT was found to improve model performance, and the use of historical data from previous time cycles improved the detection rate. Speed was found to have the most significant impact on the detection rate, while volume contributed the least. The results from this research provide useful insights into the design of AID for arterial street applications.
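The ANN detector itself is omitted below; the sketch only illustrates the DWT pre-processing step with a one-level Haar transform whose detail coefficients are discarded, smoothing a noisy simulated speed series so that a sustained incident-like drop stands out. The threshold and data are illustrative, not calibrated values from the study.

```python
import numpy as np

def haar_smooth(x):
    """One-level Haar DWT denoising: keep approximation, drop details."""
    x = np.asarray(x, dtype=float)              # length must be even
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    # zero the detail coefficients, then invert the transform
    return np.repeat(approx / np.sqrt(2), 2)

rng = np.random.default_rng(4)
speed = np.concatenate([rng.normal(45, 4, 30),   # free flow
                        rng.normal(15, 4, 30)])  # incident begins at cycle 30
smooth = haar_smooth(speed)
alarm = np.flatnonzero(smooth < 25)              # crude stand-in for the ANN
print("first alarm at cycle", alarm[0])          # ~30
```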

Relevance:

30.00%

Publisher:

Abstract:

Antenna design is an iterative process in which structures are analyzed and changed to comply with certain required performance parameters. The classic approach starts with analyzing a "known" structure, obtaining the value of its performance parameter, and changing this structure until the "target" value is achieved. This process relies on having an initial structure that follows some known or "intuitive" patterns already familiar to the designer. The purpose of this research was to develop a method of designing UWB antennas in which the design process is reversed: the designer starts with the target performance parameter and obtains a structure as the result of the design process. This method provides a new way to replicate and optimize existing performance parameters. The basis of the method is the use of a Genetic Algorithm (GA) adapted to the format of the chromosome that is evaluated by the Electromagnetic (EM) solver. For the electromagnetic study we used the XFDTD™ program, based on the Finite-Difference Time-Domain technique. The programming portion of the method was created in the MatLab environment, which serves as the interface for converting chromosomes and file formats and transferring data between XFDTD™ and the GA. A high level of customization had to be written into the code to work with the specific files generated by the XFDTD™ program. Two types of cost functions were evaluated: the first seeking broadband performance within the UWB band, and the second searching for curve replication of a reference geometry. The performance of the method was evaluated with respect to the speed provided by the computing resources used. A balance between accuracy, data file size, and speed of execution was achieved by tuning parameters in the GA code as well as the internal parameters of the XFDTD™ projects. The results showed that the GA produced geometries that were analyzed by the XFDTD™ program and changed following the search criteria until the target value of the cost function was reached. The results also showed how the parameters can change the search criteria and influence the running of the code to produce a variety of geometries.
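A stripped-down GA loop of the kind described: a bit-string chromosome encodes which cells of a geometry are metallized, and the cost function, here an arbitrary curve-replication stand-in, takes the place of the XFDTD™ evaluation that the real method reaches through the MatLab interface.

```python
import random

random.seed(5)
BITS, POP, GENS = 32, 40, 60
target = [random.randint(0, 1) for _ in range(BITS)]   # stand-in reference design

def cost(chrom):
    """Stand-in figure of merit: fraction of cells matching the reference."""
    return sum(a == b for a, b in zip(chrom, target)) / BITS

def evolve():
    pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
    for _ in range(GENS):
        pop.sort(key=cost, reverse=True)
        parents = pop[:POP // 2]           # truncation selection
        children = []
        while len(children) < POP - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, BITS)
            child = a[:cut] + b[cut:]      # one-point crossover
            if random.random() < 0.1:      # bit-flip mutation
                i = random.randrange(BITS)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=cost)

print(cost(evolve()))   # approaches 1.0 as the geometry matches the reference
```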

Relevance:

30.00%

Publisher:

Abstract:

With the exponentially increasing demands on and uses of GIS data visualization systems, such as urban planning, environmental and climate change monitoring, weather simulation, and hydrographic gauging, research on and applications of geospatial vector and raster data visualization have become prevalent. However, current web GIS techniques are suitable only for static vector and raster data with no dynamically overlaid layers. While it is desirable to enable visual exploration of large-scale dynamic vector and raster geospatial data in a web environment, improving the performance between backend datasets and the vector and raster applications remains a challenging technical issue. This dissertation implements these challenging and previously unimplemented capabilities: providing a large-scale dynamic vector and raster data visualization service with dynamically overlaid layers, accessible from various client devices through a standard web browser, and making that dynamic service as rapid as a static one. To accomplish this, a large-scale dynamic vector and raster data visualization geographic information system based on parallel map tiling, together with a comprehensive performance improvement solution, is proposed, designed, and implemented. The solution comprises: quadtree-based indexing and parallel map tiling; the Legend String; vector data visualization with dynamic layer overlaying; vector data time series visualization; an algorithm for vector data rendering; an algorithm for raster data re-projection; an algorithm for eliminating superfluous levels of detail; an algorithm for vector data gridding and re-grouping; and server-side cluster caching of vector and raster data.
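The abstract does not spell out the tile addressing, so the sketch below shows the standard quadtree tile indexing that parallel map tiling schemes typically build on: a lon/lat and zoom level map to an (x, y) tile and a quadkey string whose prefix property lets tiles be generated and cached independently.

```python
import math

def tile_for(lon, lat, zoom):
    """Web-Mercator lon/lat to tile (x, y) at the given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2 * n)
    return x, y

def quadkey(x, y, zoom):
    """Interleave x/y bits into a quadtree key; parents share the prefix."""
    key = ""
    for z in range(zoom, 0, -1):
        digit = ((x >> (z - 1)) & 1) | (((y >> (z - 1)) & 1) << 1)
        key += str(digit)
    return key

x, y = tile_for(-80.19, 25.76, 12)      # Miami at zoom 12
print(x, y, quadkey(x, y, 12))
```

Because every zoom-11 parent key is a prefix of its four zoom-12 children, tiles can be partitioned across parallel workers and cache servers by key prefix alone.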

Relevance:

30.00%

Publisher:

Abstract:

Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers creates challenges for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and fine-grained resource sharing. Further, the large number of servers and switches in the data center consumes significant amounts of energy. Even though servers become more energy efficient with various energy saving techniques, the DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both the host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable-length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining the switching capacities of multiple fabrics, and it further improves switch throughput by avoiding padding bits in SAR. Second, since certain resource demands of a VM are bursty and stochastic in nature, to satisfy both deterministic and stochastic demands in VM placement we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm. M3SBP calculates an equivalent deterministic value for the stochastic demands and maximizes the minimum resource utilization ratio of each server. Third, to provide the necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, since DCNs are typically provisioned with full bisection bandwidth while their traffic demonstrates fluctuating patterns, we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme uses a unified representation method that converts the VM placement problem into a routing problem, and employs depth-first and best-fit search to find efficient paths for flows.
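A greedy sketch in the spirit of the M3SBP step described above: each stochastic demand is collapsed to an equivalent deterministic value (here mean plus a multiple of the standard deviation, an assumed form), and each VM is placed so that per-dimension server utilization stays balanced. This is an illustration of the idea, not the paper's algorithm.

```python
import math

BETA = 2.0  # overflow-risk knob: larger beta reserves more headroom

def equivalent_demand(mean, std):
    """Collapse a stochastic per-dimension demand to a deterministic value."""
    return [m + BETA * s for m, s in zip(mean, std)]

def place(vms, capacity, n_servers):
    load = [[0.0] * len(capacity) for _ in range(n_servers)]
    assignment = []
    for mean, std in vms:
        d = equivalent_demand(mean, std)
        best, best_score = None, -math.inf
        for s in range(n_servers):
            util = [(l + x) / c for l, x, c in zip(load[s], d, capacity)]
            if max(util) <= 1.0:          # feasible placement only
                score = -max(util)        # prefer the most balanced server
                if score > best_score:
                    best, best_score = s, score
        if best is None:
            raise RuntimeError("no feasible server")
        for dim, x in enumerate(d):
            load[best][dim] += x
        assignment.append(best)
    return assignment

# (cpu, mem) demands: per-dimension means with stochastic std deviations
vms = [((.3, .2), (.05, .02)), ((.4, .1), (.1, .0)), ((.2, .5), (.0, .1))]
print(place(vms, capacity=(1.0, 1.0), n_servers=2))   # [0, 1, 1]
```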