889 results for support vector machine


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Hundreds of terabytes of CMS (Compact Muon Solenoid) data are accumulated for storage every day at the University of Nebraska-Lincoln, one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when storage space is required. CMS data is stored using HDFS (Hadoop Distributed File System), whose logs record file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time taken to classify data sets with this method depends on the size of the input HDFS log file, since the Hadoop MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost, which is calculated from the size of a data set and the time since its last access to help decide how useful the data set is. The accuracies of both were compared by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our methodology using SVMs proved to be more accurate than the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
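
The comparison criterion described in this abstract can be illustrated with a short sketch. It computes the percentage of data sets predicted for deletion that were accessed again at a later time. The function name, the two method labels and the data set names are hypothetical and are not taken from the thesis.

```python
def reaccessed_pct(predicted_for_deletion, accessed_later):
    """Percentage of data sets flagged for deletion that were accessed afterwards.

    Lower is better: a flagged data set that is accessed again was still useful.
    """
    flagged = set(predicted_for_deletion)
    if not flagged:
        return 0.0
    reaccessed = flagged & set(accessed_later)
    return 100.0 * len(reaccessed) / len(flagged)


# Hypothetical inputs, for illustration only (not results from the study).
method_a_flagged = ["ds_a", "ds_b", "ds_c", "ds_d"]
method_b_flagged = ["ds_a", "ds_e", "ds_f", "ds_g"]
later_accesses = ["ds_d", "ds_e", "ds_f"]

method_a_pct = reaccessed_pct(method_a_flagged, later_accesses)
method_b_pct = reaccessed_pct(method_b_flagged, later_accesses)
```

Under this criterion, the method with the lower percentage made fewer deletions that would later have been regretted.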

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Several recent studies in the literature have identified brain morphological alterations associated with Borderline Personality Disorder (BPD). These findings come from studies based on voxel-based morphometry analysis of structural MRI data, comparing mean gray-matter concentration between groups of BPD patients and healthy controls. However, mean differences between groups are not informative about the discriminative value of neuroimaging data for predicting the group of individual subjects. In this paper, we go beyond mean-difference analyses and explore to what extent individual BPD patients can be differentiated from controls (25 subjects in each group), using a combination of automated morphometric tools for regional cortical thickness/volumetric estimation and a Support Vector Machine (SVM) classifier. The approach included a feature selection step to identify the regions containing the most discriminative information. The accuracy of this classifier was evaluated using the leave-one-subject-out procedure. The brain regions indicated as containing relevant information to discriminate groups were the orbitofrontal, rostral anterior cingulate, posterior cingulate and middle temporal cortices, among others. These areas, which are distinctively involved in emotional and affect regulation of BPD patients, were the most informative regions to achieve both sensitivity and specificity values of 80% in SVM classification. The findings suggest that this new methodology can add clinical and potential diagnostic value to neuroimaging of psychiatric disorders. (C) 2012 Elsevier Ltd. All rights reserved.
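
The leave-one-subject-out procedure mentioned in this abstract can be sketched as follows. Each subject is held out once, a model is trained on the remaining subjects, and the held-out prediction is tallied into sensitivity and specificity. To keep the sketch dependency-free, a nearest-centroid rule stands in for the SVM; this is a substitution made here, not the paper's classifier, and the feature values are invented.

```python
def nearest_centroid_fit(features, labels):
    """Stand-in classifier (NOT an SVM): store the mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, sample):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, sample))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_subject_out(features, labels, positive):
    """Hold out each subject once, train on the rest, and tally the outcomes."""
    tp = tn = fp = fn = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = nearest_centroid_fit(train_x, train_y)
        predicted = nearest_centroid_predict(model, features[i])
        if labels[i] == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)  # (sensitivity, specificity)


# Invented two-feature measurements for four patients and four controls.
patients = [[2.0, 2.1], [2.2, 1.9], [1.8, 2.0], [2.1, 2.2]]
controls = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0], [3.1, 3.2]]
features = patients + controls
labels = ["BPD"] * 4 + ["control"] * 4

sens, spec = leave_one_subject_out(features, labels, positive="BPD")
```

With 25 subjects per group, as in the study, sensitivity and specificity of 80% each correspond to 20 of 25 subjects correctly classified in each group.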

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance: local sequence in the form of PSI-BLAST profiles; local sequence plus amino acid composition; local sequence plus molecular weight; local sequence plus secondary structure predicted by PSIPRED; local sequence plus molecular weight and amino acid composition; local sequence plus molecular weight and predicted secondary structure; and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55 and a root mean square error (RMSE) of 0.82, based on a well-defined dataset of 680 protein sequences.
Moreover, by incorporating global features such as molecular weight and amino acid composition, we could further improve the prediction performance to a CC of 0.57 and an RMSE of 0.79. In addition, adding the secondary structure predicted by PSIPRED was found to significantly improve the prediction performance and yielded the best prediction accuracy, with a CC of 0.60 and an RMSE of 0.78, which is at least comparable to the performance of other existing methods. Conclusion: The SVR method shows a prediction performance competitive with, or at least comparable to, the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the view that support vector regression is a powerful tool for extracting the protein sequence-structure relationship and for estimating protein structural profiles from amino acid sequences.
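
The two evaluation measures reported in this abstract, the Pearson correlation coefficient (CC) and the root mean square error (RMSE), can be computed as in the sketch below. The function names and the sample values are illustrative assumptions and are not data from the study.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Illustrative values only: predictions are exactly twice the observations.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]

cc = pearson_cc(observed, predicted)
error = rmse(observed, predicted)
```

The example shows why both measures are reported together: the predictions are perfectly correlated with the observations, yet the RMSE is far from zero because CC ignores scale and offset.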

Relevance:

100.00% 100.00%

Publisher:

Abstract:

PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty-two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive-Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), Ada Boost M1 (ADA), Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with those of MLCs using OCT+SAP data. RESULTS: Combining OCT and SAP data, MLCs' aROCs varied from 0.777 (CTREE) to 0.946 (RAN). The best OCT+SAP aROC, obtained with RAN (0.946), was significantly larger than that of the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.
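
The area under the ROC curve used to compare the classifiers in this abstract has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that calculation follows; the function name and the scores are hypothetical, and this is not the statistical software used in the study.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; a tie contributes one half.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores for glaucomatous and healthy eyes.
glaucoma_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]

auc = roc_auc(glaucoma_scores, healthy_scores)
```

In this toy case eight of the nine pairs are ordered correctly, so the area is 8/9.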

Relevance:

100.00% 100.00%

Publisher:

Abstract:

State of Sao Paulo Research Foundation (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

There is no specific test to diagnose Alzheimer's disease (AD). Its diagnosis should be based upon clinical history, neuropsychological and laboratory tests, neuroimaging and electroencephalography (EEG). Therefore, new approaches are necessary to enable earlier and more accurate diagnosis and to follow treatment results. In this study we used a Machine Learning (ML) technique, the Support Vector Machine (SVM), to search for patterns in EEG epochs that differentiate AD patients from controls. As a result, we developed a quantitative EEG (qEEG) processing method for automatic differentiation of patients with AD from normal individuals, as a complement to the diagnosis of probable dementia. We studied EEGs from 19 normal subjects (14 females/5 males, mean age 71.6 years) and 16 patients with probable AD and mild to moderate symptoms (14 females/2 males, mean age 73.4 years). The analysis of EEG epochs yielded an accuracy of 79.9% and a sensitivity of 83.2%. The analysis considering the diagnosis of each individual patient reached 87.0% accuracy and 91.7% sensitivity.
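
This abstract reports results both per EEG epoch and per patient but does not state how epoch-level predictions were combined into a patient-level diagnosis. The sketch below assumes a simple majority vote over each subject's epochs; that rule, the function name and the sample labels are assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def diagnose_subjects(epoch_predictions):
    """Collapse per-epoch labels into one label per subject by majority vote.

    epoch_predictions: iterable of (subject_id, predicted_label) pairs.
    Majority voting is an assumption; the abstract does not state the rule used.
    """
    votes = defaultdict(Counter)
    for subject_id, label in epoch_predictions:
        votes[subject_id][label] += 1
    return {sid: counts.most_common(1)[0][0] for sid, counts in votes.items()}


# Hypothetical epoch-level classifier outputs for two subjects.
epochs = [
    ("s1", "AD"), ("s1", "AD"), ("s1", "control"),
    ("s2", "control"), ("s2", "control"), ("s2", "AD"),
]

diagnoses = diagnose_subjects(epochs)
```

Under such a rule, isolated misclassified epochs are outvoted, which would be consistent with patient-level figures being higher than epoch-level ones.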

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: To develop a model to predict the bleeding source and identify the cohort amongst patients with acute gastrointestinal bleeding (GIB) who require urgent intervention, including endoscopy. Patients with acute GIB, an unpredictable event, are most commonly evaluated and managed by non-gastroenterologists. Rapid and consistently reliable risk stratification of patients with acute GIB for urgent endoscopy may potentially improve outcomes amongst such patients by targeting scarce health-care resources to those who need them the most. Design and methods: Using ICD-9 codes for acute GIB, 189 patients with acute GIB and all available data variables required to develop and test models were identified from a hospital medical records database. Data on 122 patients were used to develop the models and data on 67 patients were used to perform a comparative analysis of the models. Clinical data such as presenting signs and symptoms, demographic data, presence of co-morbidities, laboratory data and corresponding endoscopic diagnosis and outcomes were collected. The clinical data and endoscopic diagnosis collected for each patient were used to retrospectively ascertain optimal management for each patient. Clinical presentations and the corresponding treatments were used as training examples. Eight mathematical models, including artificial neural network (ANN), support vector machine (SVM), k-nearest neighbor, linear discriminant analysis (LDA), shrunken centroid (SC), random forest (RF), logistic regression, and boosting, were trained and tested. The performance of these models was compared using standard statistical analysis and ROC curves. Results: Overall, the random forest model best predicted the source, need for resuscitation, and disposition, with accuracies of approximately 80% or higher (accuracy for endoscopy was greater than 75%).
The area under the ROC curve for RF was greater than 0.85, indicating excellent performance by the random forest model. Conclusion: While most mathematical models are effective as a decision support system for evaluation and management of patients with acute GIB, in our testing, the RF model consistently demonstrated the best performance. Amongst patients presenting with acute GIB, mathematical models may facilitate the identification of the source of GIB and the need for intervention, and allow optimization of care and healthcare resource allocation; these, however, require further validation. (c) 2007 Elsevier B.V. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Energy systems worldwide are complex and challenging environments. The use of multi-agent based simulation platforms is increasing rapidly, as they have proven to be a good option for studying many issues related to these systems, as well as the players that act in this domain. In this scope, the authors’ research group has developed a multi-agent system, MASCEM (Multi-Agent System for Competitive Electricity Markets), which simulates the electricity markets environment. MASCEM is integrated with ALBidS (Adaptive Learning Strategic Bidding System), which works as a decision support system for market players. The ALBidS system allows MASCEM market negotiating players to take the best possible advantage of the market context. This paper presents the application of a Support Vector Machines (SVM) based approach to provide decision support to electricity market players. This strategy is tested and validated by being included in ALBidS and then compared with the application of an Artificial Neural Network, yielding promising results. The proposed approach is tested and validated using real electricity markets data from MIBEL, the Iberian market operator.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Wind speed forecasting has become an important field of research to support the electricity industry, mainly due to the increasing use of distributed energy sources, largely based on renewable sources. This type of electricity generation is highly dependent on the variability of weather conditions, particularly the variability of the wind speed. Therefore, accurate wind power forecasting models are required for the operation and planning of wind plants and power systems. A Support Vector Machines (SVM) model for short-term wind speed forecasting is proposed, and its performance is evaluated and compared with several artificial neural network (ANN) based approaches. A case study is presented, based on a real database covering 3 years, in which wind speed is predicted at 5-minute intervals.
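
A common way to prepare a time series for a regression model such as an SVM is to turn it into pairs of lagged inputs and a next-step target. The abstract does not describe the input encoding that was actually used, so the sketch below is only one plausible setup. The window length of three readings, the function name and the sample values are assumptions.

```python
def make_lagged_samples(series, n_lags):
    """Turn a series into (previous n_lags values, next value) training pairs."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])
        targets.append(series[t])
    return inputs, targets


# Hypothetical wind speed readings in m/s, one every 5 minutes.
wind_speed = [5.1, 5.4, 5.0, 4.8, 5.3, 5.9]

# n_lags=3 is an arbitrary choice: the last 15 minutes predict the next reading.
X, y = make_lagged_samples(wind_speed, n_lags=3)
```

Each row of `X` could then be passed to a regression model as one training example, with the matching entry of `y` as its target.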

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In the last two decades, small strain shear modulus became one of the most important geotechnical parameters to characterize soil stiffness. Finite element analyses have shown that the in-situ stiffness of soils and rocks is much higher than was previously thought and that the stress-strain behaviour of these materials is non-linear in most cases with small strain levels, especially in the ground around retaining walls, foundations and tunnels, typically in the order of 10^-2 to 10^-4 of strain. Although the best approach to estimate shear modulus seems to be based on measuring seismic wave velocities, deriving the parameter through correlations with in-situ tests is usually considered very useful for design practice. The use of Neural Networks for modeling systems has been widespread, in particular in areas where the large amount of available data and the complexity of the systems make the problem difficult to treat with traditional data analysis methodologies. In this work, the use of Neural Networks and Support Vector Regression is proposed to estimate small strain shear modulus for sedimentary soils from the basic or intermediate parameters derived from the Marchetti Dilatometer Test. The results are discussed and compared with some of the most common available methodologies for this evaluation.
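
For context on the seismic approach mentioned in this abstract, elastic wave theory relates the small strain shear modulus to the shear wave velocity by G0 = rho * Vs^2, where rho is the mass density of the soil. The sketch below applies that relation; the function name and the input values are illustrative assumptions and are not data from the paper.

```python
def small_strain_shear_modulus(density_kg_m3, shear_wave_velocity_m_s):
    """Return G0 in pascals from G0 = rho * Vs**2 (elastic wave theory)."""
    return density_kg_m3 * shear_wave_velocity_m_s ** 2


# Illustrative inputs: density 1800 kg/m^3, shear wave velocity 200 m/s.
g0_pa = small_strain_shear_modulus(1800.0, 200.0)
g0_mpa = g0_pa / 1e6  # convert Pa to MPa
```

For these inputs G0 is 72 MPa. A regression model of the kind proposed in the abstract would instead estimate G0 from dilatometer parameters when wave velocities are not measured.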

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The occurrence of barotrauma is a major concern for health professionals, since it can be fatal for patients. Data Mining (DM) models were induced to support the decision process and to predict the risk of barotrauma. The present study addresses the Data Mining process with the aim of providing the hourly probability of a patient having barotrauma. Implicit knowledge was discovered in data collected from Intensive Care Unit patients by following the Cross Industry Standard Process for Data Mining. To make predictions using a classification approach, several DM techniques were selected: Decision Trees, Naive Bayes and Support Vector Machine. The study focused on assessing the validity and viability of predicting a composite variable. To predict barotrauma, two classes were created: “risk” and “no risk”. This target comes from combining two variables: Plateau Pressure and PCO2. The best models presented a sensitivity between 96.19% and 100%. In terms of accuracy, the values varied between 87.5% and 100%. This study and the achieved results demonstrate the feasibility of predicting the risk of a patient having barotrauma by presenting the associated probability.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Drug delivery is one of the most common clinical routines in hospitals and is critical to patients' health and recovery. It includes a decision-making process in which a medical doctor decides the amount (dose) and frequency (dose interval) on the basis of a set of available patient feature data and the doctor's clinical experience (a priori adaptation). This process can be computerized to make the prescription procedure fast, objective, inexpensive, non-invasive and accurate. This paper proposes a Drug Administration Decision Support System (DADSS) to help clinicians/patients with the initial dose computation. The system is based on a Support Vector Machine (SVM) algorithm that estimates the potential drug concentration in a patient's blood, from which the best combination of dose and dose interval is selected at the level of a DSS. The addition of the RANdom SAmple Consensus (RANSAC) technique enhances the prediction accuracy by selecting inliers for SVM modeling. Experiments on an imatinib case study show more than 40% improvement in prediction accuracy compared with previous works. An important extension to the patient feature data is also proposed in this paper.
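
The inlier selection step attributed to RANSAC in this abstract can be sketched in a few lines. The version below repeatedly fits a straight line through two randomly drawn points and keeps the largest set of points that lie within a tolerance of that line. The straight-line model, the parameter values and the sample data are assumptions made here for illustration; the paper's own model and settings are not given in the abstract.

```python
import random


def fit_line(p, q):
    """Slope and intercept of the line through two points with distinct x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_inliers(points, n_iterations, tolerance, seed=0):
    """Return the largest set of points lying within `tolerance` of a sampled line."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best = []
    for _ in range(n_iterations):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical pair: slope undefined, draw again
        slope, intercept = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best):
            best = inliers
    return best


# Hypothetical measurements following y = 2x + 1, plus two gross outliers.
samples = [(float(x), 2.0 * x + 1.0) for x in range(10)]
samples += [(3.0, 40.0), (7.0, -20.0)]

clean = ransac_inliers(samples, n_iterations=200, tolerance=0.5)
```

In a pipeline like the one described, only the points in `clean` would be passed on to train the regression model, so that gross outliers do not distort it.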