3 results for Wavelet Packet and Support Vector Machine
in DigitalCommons@University of Nebraska - Lincoln
Abstract:
Hundreds of terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This important task is currently done manually and requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when storage space is needed. CMS data is stored using HDFS (Hadoop Distributed File System), and HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed the information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression, which is used in this thesis to develop a classifier. The time required to classify data sets with this method depends on the size of the input HDFS log file, since the Hadoop MapReduce algorithms used here have O(n) complexity. The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost, which is calculated from the size of a data set and the time since its last access to help decide how useful the data set is. The accuracy of each approach was measured by calculating the percentage of data sets predicted for deletion that were accessed at a later time. Our methodology using SVMs proved to be more accurate than the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
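The accuracy comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes each method outputs a list of data set names predicted for deletion and that a set of data sets accessed afterwards is available. The function name `later_access_rate`, the data set names, and all values are hypothetical and chosen only for illustration; they are not taken from the thesis. Under this reading, a lower percentage means fewer deletion candidates were needed again.

```python
def later_access_rate(predicted_for_deletion, accessed_later):
    """Return the percentage of data sets predicted for deletion
    that were accessed at a later time (lower is better)."""
    predicted = set(predicted_for_deletion)
    if not predicted:
        # No predictions were made, so nothing was wrongly flagged.
        return 0.0
    wrongly_flagged = predicted & set(accessed_later)
    return 100.0 * len(wrongly_flagged) / len(predicted)


# Hypothetical example data, not results from the study.
svm_predictions = ["ds_a", "ds_b", "ds_c", "ds_d"]
heuristic_predictions = ["ds_a", "ds_e", "ds_f", "ds_g"]
accessed_later = {"ds_b", "ds_e", "ds_f"}

svm_rate = later_access_rate(svm_predictions, accessed_later)
heuristic_rate = later_access_rate(heuristic_predictions, accessed_later)

print(f"SVM: {svm_rate:.1f}% of deletion candidates accessed later")
print(f"Retention Cost: {heuristic_rate:.1f}% of deletion candidates accessed later")
```

The same function can score any deletion list, so both approaches are judged against an identical record of later accesses.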
Abstract:
Child sexual abuse continues to be a prevalent and complex problem in today’s society, as it poses serious and pervasive mental health risks to child victims and their non-offending parents. The main objectives of this study were (a) to elucidate the psychological symptoms and support needs of parents of child sexual abuse victims as they present to group treatment, (b) to examine changes in psychological symptoms and support needs and their relationship with child functioning over the course of a parallel group treatment, and (c) to examine the impact of these factors on completion of group treatment. Participants included 104 sexually abused youth and their non-offending parents presenting to Project SAFE Group Intervention, a 12-session cognitive-behavioral group treatment for sexually abused children and their non-offending parents. This project had the unique advantage of utilizing a variety of demographic, parent-report, and child-report measures, allowing for a more comprehensive examination of change in symptomatology and needs over the course of treatment. Several significant findings were noted, including the identification of four clusters of youth at pre-treatment, which were maintained at post-treatment; elevations on the CTQ Sexual Abuse scale; significantly higher PSI-Restriction of Role subscale scores among parents of youth sexually abused by a non-family member; worse parental expectations of a negative impact on their child for older children; several parent characteristics that predicted client treatment retention (e.g., older parents, lower SCL-90-R GSI scores); and increased treatment retention with an early age of onset of abuse. Future directions, recommendations, and limitations were discussed.
Abstract:
The multiple-instance learning (MIL) model has been successful in areas such as drug discovery and content-based image retrieval. Recently, this model was generalized and a corresponding kernel was introduced to learn generalized MIL concepts with a support vector machine. While this kernel enjoyed empirical success, it has limitations in its representation. We extend this kernel by enriching its representation and empirically evaluate our new kernel on data from content-based image retrieval, biological sequence analysis, and drug discovery. We found that our new kernel generalized noticeably better than the old one in content-based image retrieval and biological sequence analysis and was slightly better than or on par with the old kernel in the other applications, showing that an SVM using this kernel does not overfit despite its richer representation.