798 results for Data-Intensive Science
Abstract:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, which achieve lower generalization error than state-of-the-art techniques while being dramatically simpler and faster.
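A minimal sketch of the kind of relevance/redundancy filter ranking described above, assuming per-feature variance as the relevance score and absolute correlation as the redundancy measure; both choices are illustrative assumptions, not the paper's exact criteria:

```python
import numpy as np

def rank_features(X, k, redundancy_weight=0.5):
    """Greedy relevance/redundancy filter ranking (illustrative sketch).

    X : (n_samples, n_features) data matrix
    k : number of features to rank
    Relevance is approximated by per-feature variance (unsupervised);
    redundancy by mean absolute correlation with already-selected features.
    """
    relevance = X.var(axis=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        redundancy = corr[:, selected].mean(axis=1)
        score = relevance - redundancy_weight * redundancy
        score[selected] = -np.inf          # never pick a feature twice
        selected.append(int(np.argmax(score)))
    return selected

# Example: rank the 10 most promising of 1000 random features
X = np.random.rand(200, 1000)
print(rank_features(X, k=10))
```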
Abstract:
Dissertation submitted to obtain the degree of Master in Physics Engineering
Abstract:
Cloud data centers have been progressively adopted in different scenarios, as reflected in the execution of heterogeneous applications with diverse workloads and diverse quality of service (QoS) requirements. Virtual machine (VM) technology eases resource management in physical servers and helps cloud providers achieve goals such as optimization of energy consumption. However, the performance of an application running inside a VM is not guaranteed due to the interference among co-hosted workloads sharing the same physical resources. Moreover, the different types of co-hosted applications with diverse QoS requirements, as well as the dynamic behavior of the cloud, make efficient resource provisioning an even more difficult and challenging problem in cloud data centers. In this paper, we address the problem of resource allocation within a data center that runs different types of application workloads, particularly CPU- and network-intensive applications. To address these challenges, we propose an interference- and power-aware management mechanism that combines a performance deviation estimator and a scheduling algorithm to guide the resource allocation in virtualized environments. We conduct simulations by injecting synthetic workloads whose characteristics follow the latest version of the Google Cloud tracelogs. The results indicate that our performance-enforcing strategy is able to fulfill contracted SLAs of real-world environments while reducing energy costs by as much as 21%.
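A toy sketch of an interference- and power-aware placement heuristic in the spirit described above; the scoring function, thresholds, and data structures are illustrative assumptions, not the paper's actual mechanism:

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    cpu_capacity: float
    net_capacity: float
    cpu_used: float = 0.0
    net_used: float = 0.0
    vms: list = field(default_factory=list)

def estimated_interference(host, vm):
    """Crude performance-deviation estimate: contention grows with the
    utilisation of the resource the VM stresses most (CPU or network)."""
    if vm["type"] == "cpu":
        return host.cpu_used / host.cpu_capacity
    return host.net_used / host.net_capacity

def place_vm(hosts, vm, max_interference=0.7):
    """Pick a host that fits the VM, preferring already-active hosts under
    the interference threshold so idle hosts can stay powered down."""
    fits = [h for h in hosts
            if h.cpu_used + vm["cpu"] <= h.cpu_capacity
            and h.net_used + vm["net"] <= h.net_capacity]
    preferred = [h for h in fits
                 if h.vms and estimated_interference(h, vm) <= max_interference]
    candidates = preferred or fits
    if not candidates:
        return None                         # no host can take this VM
    best = min(candidates, key=lambda h: estimated_interference(h, vm))
    best.cpu_used += vm["cpu"]
    best.net_used += vm["net"]
    best.vms.append(vm)
    return best

hosts = [Host(cpu_capacity=16, net_capacity=10_000) for _ in range(3)]
print(place_vm(hosts, {"type": "cpu", "cpu": 4, "net": 100}))
```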
Abstract:
Barotrauma is identified as one of the leading diseases in ventilated patients. This type of problem is most common in Intensive Care Units. In order to prevent it, Data Mining (DM) can be useful for predicting its occurrence. The main goal is to predict the occurrence of Barotrauma in order to support health professionals in taking the necessary precautions. As a first step, intensivists identified Plateau Pressure values as a possible cause of Barotrauma. Through this study, DM models (classification) were induced for predicting the Plateau Pressure class (>=30 cm
Abstract:
Lecture Notes in Computer Science, 9273
Abstract:
The occurrence of Barotrauma is identified as a major concern for health professionals, since it can be fatal for patients. In order to support the decision process and to predict the risk of barotrauma occurring, Data Mining models were induced. Based on this principle, the present study addresses the Data Mining process aiming to provide the hourly probability of a patient having Barotrauma. The process of discovering implicit knowledge in data collected from Intensive Care Unit patients followed the standard Cross Industry Standard Process for Data Mining (CRISP-DM). With the goal of making predictions according to the classification approach, several DM techniques were selected: Decision Trees, Naive Bayes and Support Vector Machines. The study focused on assessing the validity and viability of predicting a composite variable. To predict Barotrauma, two classes were created: “risk” and “no risk”. This target comes from combining two variables: Plateau Pressure and PCO2. The best models presented a sensitivity between 96.19% and 100%. In terms of accuracy, the values varied between 87.5% and 100%. This study and the achieved results demonstrate the feasibility of predicting the risk of a patient having Barotrauma by presenting the associated probability.
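As an illustration of the classification setup described above (a composite "risk"/"no risk" target built from Plateau Pressure and PCO2, evaluated with Decision Trees, Naive Bayes and SVM), a minimal scikit-learn sketch on synthetic data; the threshold values and feature set are assumptions for demonstration only, not the study's actual criteria:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic hourly ICU records: [plateau_pressure (cmH2O), pco2 (mmHg)]
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(25, 5, 500),    # plateau pressure
                     rng.normal(45, 8, 500)])   # PCO2

# Composite target: "risk" if both variables exceed illustrative thresholds
y = ((X[:, 0] >= 30) & (X[:, 1] >= 50)).astype(int)

for model in (DecisionTreeClassifier(), GaussianNB(), SVC(probability=True)):
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(type(model).__name__, f"accuracy={acc:.3f}")
```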
Abstract:
Patient blood pressure is an important vital sign for physicians to make decisions and to better understand the patient's condition. In Intensive Care Units it is possible to monitor blood pressure continuously, since the patient is connected to bedside monitors and sensors. However, intensivists only have access to vital sign values when they look at the monitor or consult the values collected hourly. Most important is the sequence of the values collected, i.e., a run of very high or very low values can signify a critical event and bring future complications to a patient, such as Hypotension or Hypertension. These complications can trigger a set of dangerous diseases and side effects. The main goal of this work is to predict the probability of a patient having a blood pressure critical event in the next hours by combining a set of patient data collected in real time and using Data Mining classification techniques. As output, the models indicate the probability (%) of a patient having a Blood Pressure Critical Event in the next hour. The achieved results are very promising, with sensitivity around 95%.
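One way hourly vital-sign history can be turned into features for such a next-hour prediction is a sliding window over past readings. The sketch below assumes a window of four hours and an illustrative event definition (mean arterial pressure below 60 mmHg); neither is taken from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_windows(bp_series, window=4):
    """Turn an hourly blood-pressure series into (features, label) pairs:
    features = the previous `window` readings, label = critical event
    (illustrative hypotension threshold) in the following hour."""
    X, y = [], []
    for t in range(window, len(bp_series)):
        X.append(bp_series[t - window:t])
        y.append(int(bp_series[t] < 60))
    return np.array(X), np.array(y)

rng = np.random.default_rng(1)
series = rng.normal(80, 15, 2000)              # synthetic hourly MAP values
X, y = make_windows(series)
model = RandomForestClassifier().fit(X, y)
print(model.predict_proba(X[:1]))              # probability of an event next hour
```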
Abstract:
Objective. To study the acquisition and cross-transmission of Staphylococcus aureus in different intensive care units (ICUs). Methods. We performed a multicenter cohort study. Six ICUs in 6 countries participated. During a 3-month period at each ICU, all patients had nasal and perineal swab specimens obtained at ICU admission and during their stay. All S. aureus isolates that were collected were genotyped by spa typing and multilocus variable-number tandem-repeat analysis typing for cross-transmission analysis. A total of 629 patients were admitted to ICUs, and 224 of these patients were found to be colonized with S. aureus at least once during ICU stay (22% were found to be colonized with methicillin-resistant S. aureus [MRSA]). A total of 316 patients who had test results negative for S. aureus at ICU admission and had at least 1 follow-up swab sample obtained for culture were eligible for acquisition analysis. Results. A total of 45 patients acquired S. aureus during ICU stay (31 acquired methicillin-susceptible S. aureus [MSSA], and 14 acquired MRSA). Several factors that were believed to affect the rate of acquisition of S. aureus were analyzed in univariate and multivariate analyses, including the amount of hand disinfectant used, colonization pressure, number of beds per nurse, antibiotic use, length of stay, and ICU setting (private room versus open ICU treatment). Greater colonization pressure and a greater number of beds per nurse correlated with a higher rate of acquisition for both MSSA and MRSA. The type of ICU setting was related to MRSA acquisition only, and the amount of hand disinfectant used was related to MSSA acquisition only. In 18 (40%) of the cases of S. aureus acquisition, cross-transmission from another patient was possible. Conclusions. Colonization pressure, the number of beds per nurse, and the treatment of all patients in private rooms correlated with the number of S. aureus acquisitions on an ICU. The amount of hand disinfectant used was correlated with the number of cases of MSSA acquisition but not with the number of cases of MRSA acquisition. The number of cases of patient-to-patient cross-transmission was comparable for MSSA and MRSA.
Abstract:
The aim of this talk is to convince the reader that there are a lot of interesting statistical problems in present-day life science data analysis which seem ultimately connected with compositional statistics. Key words: SAGE, cDNA microarrays, (1D-)NMR, virus quasispecies
Abstract:
Introduction: Interprofessional collaborative practices are increasingly recognized as an effective way to deal with complex health problems. However, health sciences students continue to be trained in specialized programs and have few opportunities for learning in interdisciplinary contexts. Program Development: The project's purpose was to develop content and an educational design for new prelicensure interfaculty courses on interprofessional collaboration in patient- and family-centered care, embedding interprofessional education principles whereby participants learn with, from and about each other. Implementation: Intensive training was part of a 45-hour program, offered each semester, which was divided into three 15-hour courses given on weekends to enhance accessibility. Evaluation: A total of 215 students completed questionnaires following the courses to assess their satisfaction with the educational content. Pre/post measures assessed perceived skills acquisition and perceived benefits of interprofessional collaboration training. Results showed a significant increase, from the students' point of view, in the knowledge and benefits gained from interprofessional collaboration training.
Abstract:
One major methodological problem in analysis of sequence data is the determination of costs from which distances between sequences are derived. Although this problem is currently not optimally dealt with in the social sciences, it has some similarity with problems that have been solved in bioinformatics for three decades. In this article, the authors propose an optimization of substitution and deletion/insertion costs based on computational methods. The authors provide an empirical way of determining costs for cases, frequent in the social sciences, in which theory does not clearly promote one cost scheme over another. Using three distinct data sets, the authors tested the distances and cluster solutions produced by the new cost scheme in comparison with solutions based on cost schemes associated with other research strategies. The proposed method performs well compared with other cost-setting strategies, while it alleviates the justification problem of cost schemes.
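For readers unfamiliar with how substitution and deletion/insertion costs enter the distance computation, the sketch below shows the standard optimal-matching (edit-distance) recursion that such cost schemes feed into; the states and cost values are placeholders, not the authors' optimized scheme:

```python
def optimal_matching_distance(seq_a, seq_b, sub_cost, indel_cost):
    """Edit distance between two state sequences under a substitution-cost
    lookup `sub_cost[(x, y)]` and a single indel cost (dynamic programming)."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = sub_cost.get((seq_a[i - 1], seq_b[j - 1]), 0.0)
            d[i][j] = min(d[i - 1][j] + indel_cost,      # deletion
                          d[i][j - 1] + indel_cost,      # insertion
                          d[i - 1][j - 1] + sub)         # substitution / match
    return d[n][m]

# Example with placeholder costs for three hypothetical employment states
costs = {("E", "U"): 2.0, ("U", "E"): 2.0, ("E", "S"): 1.0, ("S", "E"): 1.0,
         ("U", "S"): 1.5, ("S", "U"): 1.5}
print(optimal_matching_distance("EEUUS", "EESSS", costs, indel_cost=1.0))
```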
Abstract:
This paper presents a process of mining research & development abstract databases to profile the current status of, and to project potential developments for, target technologies. The process is called "technology opportunities analysis." This article steps through the process using a sample data set of abstracts from the INSPEC database on the topic of "knowledge discovery and data mining." The paper offers a set of specific indicators suitable for mining such databases to understand innovation prospects. In illustrating the uses of such indicators, it offers some insights into the status of knowledge discovery research.
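A small sketch of the kind of bibliometric indicators such a process might compute from an abstract database, here publication counts per year and keyword co-occurrence; the record fields and keywords are assumptions for illustration, not the paper's actual indicator set:

```python
from collections import Counter
from itertools import combinations

# Hypothetical abstract records with the fields such an analysis might use
records = [
    {"year": 1994, "keywords": ["knowledge discovery", "databases"]},
    {"year": 1995, "keywords": ["data mining", "knowledge discovery"]},
    {"year": 1995, "keywords": ["data mining", "neural networks"]},
]

# Indicator 1: publication activity over time
per_year = Counter(r["year"] for r in records)

# Indicator 2: keyword co-occurrence, a rough map of how topics cluster
cooccurrence = Counter()
for r in records:
    for pair in combinations(sorted(r["keywords"]), 2):
        cooccurrence[pair] += 1

print(sorted(per_year.items()))
print(cooccurrence.most_common(3))
```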