9 resultados para in-domain data requirement

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation introduces an integrated algorithm for a new application dedicated at discriminating between electrodes leading to a seizure onset and those that do not, using interictal subdural EEG data. The significance of this study is in determining among all of these channels, all containing interictal spikes, why some electrodes eventually lead to seizure while others do not. A first finding in the development process of the algorithm is that these interictal spikes had to be asynchronous and should be located in different regions of the brain, before any consequential interpretations of EEG behavioral patterns are possible. A singular merit of the proposed approach is that even when the EEG data is randomly selected (independent of the onset of seizure), we are able to classify those channels that lead to seizure from those that do not. It is also revealed that the region of ictal activity does not necessarily evolve from the tissue located at the channels that present interictal activity, as commonly believed.^ The study is also significant in terms of correlating clinical features of EEG with the patient's source of ictal activity, which is coming from a specific subset of channels that present interictal activity. The contributions of this dissertation emanate from (a) the choice made on the discriminating parameters used in the implementation, (b) the unique feature space that was used to optimize the delineation process of these two type of electrodes, (c) the development of back-propagation neural network that automated the decision making process, and (d) the establishment of mathematical functions that elicited the reasons for this delineation process. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level-Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity.^ We specifically addressed two crucial data center operations. First, we precisely estimated capacity requirements of client virtual machines (VMs) while renting server space in cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are multifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs performance instead of their configured sizes. As a result, clients will pay exactly for the performance they are actually experiencing; on the other hand, administrators will be able to maximize their total revenue by utilizing application performance models and SLAs. ^ This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing by employing the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conceptual database design is an unusually difficult and error-prone task for novice designers. This study examined how two training approaches---rule-based and pattern-based---might improve performance on database design tasks. A rule-based approach prescribes a sequence of rules for modeling conceptual constructs, and the action to be taken at various stages while developing a conceptual model. A pattern-based approach presents data modeling structures that occur frequently in practice, and prescribes guidelines on how to recognize and use these structures. This study describes the conceptual framework, experimental design, and results of a laboratory experiment that employed novice designers to compare the effectiveness of the two training approaches (between-subjects) at three levels of task complexity (within subjects). Results indicate an interaction effect between treatment and task complexity. The rule-based approach was significantly better in the low-complexity and the high-complexity cases; there was no statistical difference in the medium-complexity case. Designer performance fell significantly as complexity increased. Overall, though the rule-based approach was not significantly superior to the pattern-based approach in all instances, it out-performed the pattern-based approach at two out of three complexity levels. The primary contributions of the study are (1) the operationalization of the complexity construct to a degree not addressed in previous studies; (2) the development of a pattern-based instructional approach to database design; and (3) the finding that the effectiveness of a particular training approach may depend on the complexity of the task.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level-Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated capacity requirements of client virtual machines (VMs) while renting server space in cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are multifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs performance instead of their configured sizes. As a result, clients will pay exactly for the performance they are actually experiencing; on the other hand, administrators will be able to maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing by employing the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation presents a unique research opportunity by using recordings which provide electrocardiogram (ECG) plus a reference breathing signal (RBS). ECG derived breathing (EDR) is measured and correlated against RBS. Standard deviations of multiresolution wavelet analysis coefficients (SDMW) are obtained from heart rate and classified using RBS. Prior works by others used select patients for sleep apnea scoring with EDR but no RBS. Another prior work classified select heart disease patients with SDMW but no RBS. This study used randomly chosen sleep disorder patient recordings; central and obstructive apneas, with and without heart disease.^ Implementation required creating an application because existing systems were limited in power and scope. A review survey was created to choose a development environment. The survey is presented as a learning tool and teaching resource. Development objectives were rapid development using limited resources (manpower and money). Open Source resources were used exclusively for implementation. ^ Results show: (1) Three groups of patients exist in the study. Grouping RBS correlations shows a response with either ECG interval or amplitude variation. A third group exists where neither ECG intervals nor amplitude variation correlate with breathing. (2) Previous work done by other groups analyzed SDMW. Similar results were found in this study but some subjects had higher SDMW, attributed to a large number of apneas, arousals and/or disconnects. SDMW does not need RBS to show apneic conditions exist within ECG recordings. (3) Results in this study support the assertion that autonomic nervous system variation was measured with SDMW. Measurements using RBS are not corrupted due to breathing even though respiration overlaps the same frequency band.^ Overall, this work becomes an Open Source resource which can be reused, modified and/or expanded. It might fast track additional research. In the future the system could also be used for public domain data. Prerecorded data exist in similar formats in public databases which could provide additional research opportunities. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers create challenges for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and find-grained resource sharing. Further, the large number of servers and switches in the data center consume significant amounts of energy. Even though servers become more energy efficient with various energy saving techniques, DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining switching capacities of multiple fabrics, and it further improves the switch throughput by avoiding padding bits in SAR. Second, since certain resource demands of the VM are bursty and demonstrate stochastic nature, to satisfy both deterministic and stochastic demands in VM placement, we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm. M3SBP calculates an equivalent deterministic value for the stochastic demands, and maximizes the minimum resource utilization ratio of each server. Third, to provide necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, while DCNs are typically provisioned with full bisection bandwidth, DCN traffic demonstrates fluctuating patterns, we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme utilizes a unified representation method that converts the VM placement problem to a routing problem and employs depth-first and best-fit search to find efficient paths for flows.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, energy efficiency or green IT has become a hot issue for many IT infrastructures as they attempt to utilize energy-efficient strategies in their enterprise IT systems in order to minimize operational costs. Networking devices are shared resources connecting important IT infrastructures, especially in a data center network they are always operated 24/7 which consume a huge amount of energy, and it has been obviously shown that this energy consumption is largely independent of the traffic through the devices. As a result, power consumption in networking devices is becoming more and more a critical problem, which is of interest for both research community and general public. Multicast benefits group communications in saving link bandwidth and improving application throughput, both of which are important for green data center. In this paper, we study the deployment strategy of multicast switches in hybrid mode in energy-aware data center network: a case of famous fat-tree topology. The objective is to find the best location to deploy multicast switch not only to achieve optimal bandwidth utilization but also to minimize power consumption. We show that it is possible to easily achieve nearly 50% of energy consumption after applying our proposed algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern data centers host hundreds of thousands of servers to achieve economies of scale. Such a huge number of servers create challenges for the data center network (DCN) to provide proportionally large bandwidth. In addition, the deployment of virtual machines (VMs) in data centers raises the requirements for efficient resource allocation and find-grained resource sharing. Further, the large number of servers and switches in the data center consume significant amounts of energy. Even though servers become more energy efficient with various energy saving techniques, DCN still accounts for 20% to 50% of the energy consumed by the entire data center. The objective of this dissertation is to enhance DCN performance as well as its energy efficiency by conducting optimizations on both host and network sides. First, as the DCN demands huge bisection bandwidth to interconnect all the servers, we propose a parallel packet switch (PPS) architecture that directly processes variable length packets without segmentation-and-reassembly (SAR). The proposed PPS achieves large bandwidth by combining switching capacities of multiple fabrics, and it further improves the switch throughput by avoiding padding bits in SAR. Second, since certain resource demands of the VM are bursty and demonstrate stochastic nature, to satisfy both deterministic and stochastic demands in VM placement, we propose the Max-Min Multidimensional Stochastic Bin Packing (M3SBP) algorithm. M3SBP calculates an equivalent deterministic value for the stochastic demands, and maximizes the minimum resource utilization ratio of each server. Third, to provide necessary traffic isolation for VMs that share the same physical network adapter, we propose the Flow-level Bandwidth Provisioning (FBP) algorithm. By reducing the flow scheduling problem to multiple stages of packet queuing problems, FBP guarantees the provisioned bandwidth and delay performance for each flow. Finally, while DCNs are typically provisioned with full bisection bandwidth, DCN traffic demonstrates fluctuating patterns, we propose a joint host-network optimization scheme to enhance the energy efficiency of DCNs during off-peak traffic hours. The proposed scheme utilizes a unified representation method that converts the VM placement problem to a routing problem and employs depth-first and best-fit search to find efficient paths for flows.