11 results for Associative Classifiers
in Digital Commons at Florida International University
Abstract:
Increasing use of the term Strategic Human Resource Management (SHRM) reflects the recognition of the interdependencies between corporate strategy, organization and human resource management in the functioning of the firm. Dyer and Holder (1988) proposed a comprehensive Human Resource Strategic Typology consisting of three strategic types: inducement, investment and involvement. This research attempted to empirically validate their typology and also test the performance implications of the match between corporate strategy and HR strategy. Hypotheses were tested to determine the relationships between internal consistency in HRM sub-systems, match between corporate strategy and HR strategy, and firm performance. Data were collected by a mail survey of 998 senior HR executives, of whom 263 returned the completed questionnaire. Financial information on 909 firms was collected from secondary sources such as 10-K reports and CD-Disclosure. Profitability ratios were indexed to industry averages. Confirmatory Factor Analysis using LISREL provided support in favor of the six-factor HR measurement model; the six factors were staffing, training, compensation, appraisal, job design and corporate involvement. Support was also found for the presence of a second-order factor labeled "HR Strategic Orientation" explaining the variations among the six factors. LISREL analysis also supported the congruence hypothesis that HR Strategic Orientation significantly affects firm performance. There was a significant associative relationship between HR Strategy and Corporate Strategy. However, the contingency effects of the match between HR and Corporate strategies were not supported. Several tests were conducted to show that the survey results are affected neither by non-response bias nor by mono-method bias. Implications of these findings for both researchers and practitioners are discussed.
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from only a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
This research aims to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data from numerous patients, which were used as training sets to exploit the different physiological characteristics of the samples provided. Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and to provide information to medical doctors in the form of diagnostic references for a specific disease state, leukemia. The results obtained show that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm, the Density-based Adaptive Window Clustering (DAWC) algorithm, was designed to process large volumes of data and find the locations of high-density data clusters in real time. It reduces the computational load to ∼O(N) computations, making the algorithm faster and more attractive than current hierarchical algorithms.
Abstract:
Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method, Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation.
These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
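Two of the pooling techniques named in this abstract can be illustrated with a minimal sketch. It uses the textbook definitions of Fisher's inverse chi-square method and the (optionally weighted) Stouffer Z method, not the dissertation's own implementation; the function names are hypothetical, and the p-values are assumed to be independent, one-sided, and strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_pooled_p(p_values):
    """Fisher inverse chi-square method (textbook form).

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{j<k} (X/2)^j / j!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    return math.exp(-half_stat) * sum(
        half_stat ** j / math.factorial(j) for j in range(k)
    )


def stouffer_pooled_p(p_values, weights=None):
    """Stouffer Z transform; passing weights gives the weighted Z variant."""
    norm = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)  # unweighted Stouffer method
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    combined = sum(w * z for w, z in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - norm.cdf(combined)
```

Calling either function with the p-values that one gene obtained in several independent experiments returns a single pooled p-value; two moderately small p-values such as 0.05 and 0.05 pool to a value below 0.05 under both methods.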
Abstract:
The South American electric knifefish, Brachyhypopomus gauderio, uses weak electric fields to see and communicate in the dark. Only one study to date has investigated natural behavior in this species during the breeding season; this study proposed that B. gauderio has an exploded lek polygyny breeding system. To test this hypothesis, artificial marshes simulating the native vegetation, temperature, and water conductivities of the South American subtropics were created to study seasonal variation in associative behavior of B. gauderio during the breeding and non-breeding seasons. Mark/recapture methods were used to keep track of individual fish and their dispersion inside the experimental designs. The experimental design proved to be extremely successful at eliciting reproduction. Differences were found in seasonal variations of social behaviors between adult and juvenile populations. Although no apparent sex differences in movement patterns were found during the breeding season, a trend for male-male aversion was found, suggesting male-male avoidance as a possible strategy guiding aspects of social behaviors in this species. Further, movement may be a tactic for mate seeking, as the individuals who moved the most during the breeding season obtained the most opposite-sex interactions. These findings support the exploded lek polygyny model. Social interactions are subject to complex regulation by social, physiologic and ecological factors; the extent to which these associations are repeatable may provide novel insights on the evolution of sociality as it has been shaped by natural selection.
Abstract:
Human scent, or the volatile organic compounds (VOCs) produced by an individual, has been recognized as a biometric measurement because of the distinct variations in both the presence and abundance of these VOCs between individuals. In forensic science, human scent has been used as a form of associative evidence by linking a suspect to a scene/object through the use of human scent discriminating canines. The scent most often collected and used with these specially trained canines is from the hands because a majority of the evidence collected is likely to have been handled by the suspect. However, the scents from other biological specimens, especially those that are likely to be present at scenes of violent crimes, have yet to be explored. Hair, fingernails and saliva are examples of these types of specimens. In this work, a headspace solid phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS) technique was used for the identification of VOCs from hand odor, hair, fingernails and saliva. Sixty individuals were sampled and the profiles of the extracted VOCs were evaluated to assess whether they could be used for distinguishing individuals. Preliminary analysis of the biological specimens collected from an individual (intra-subject) showed that, though these materials have some VOCs in common, their overall chemical profile is different for each specimen type. Pair-wise comparisons, using Spearman Rank correlations, were made between the chemical profiles obtained from each subject, per specimen type. Greater than 98.8% of the collected samples were distinguished from the subjects for all of the specimen types, demonstrating that these specimens can be used for distinguishing individuals. Additionally, field trials were performed to determine the utility of these specimens as scent sources for human scent discriminating canines.
Three trials were conducted to evaluate hair, fingernails and saliva in comparison to hand odor, which was considered the standard source of human odor. It was revealed that canines perform similarly with these alternative human scent sources as they do with hand odor, implying that, though there are differences in the chemical profiles released by these specimens, they can still be used for the discrimination of individuals by trained canines.
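The pair-wise comparison step mentioned in this abstract rests on the Spearman rank correlation, which can be sketched as follows. This is an illustration of the standard statistic only: the representation of a chemical profile as a list of VOC abundances and the function names are assumptions, and the abstract does not state the correlation cut-off used to call two samples distinguishable.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(profile_a, profile_b):
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either profile is constant.
    """
    ra, rb = average_ranks(profile_a), average_ranks(profile_b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(ra, rb))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in ra))
    sd_b = math.sqrt(sum((b - mean_b) ** 2 for b in rb))
    return cov / (sd_a * sd_b)
```

Two profiles whose compounds have the same abundance ordering give a correlation near 1, while a reversed ordering gives a value near -1; a low correlation between two samples is what would mark them as coming from different individuals.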
Abstract:
Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment ("relaxation" vs. "stress") are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination.
The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to use in future automated real-time affective recognition systems, especially for detecting the "relaxation" vs. "stress" states.
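The on-line preprocessing described in this abstract (a hard threshold to remove eye blinks, then an averaging window that yields one representative value PDr per second) might look like the sketch below. The sampling rate, the threshold value, the use of non-overlapping one-second windows, and both function names are assumptions made for illustration; the abstract does not specify them.

```python
def remove_blinks(samples, min_valid):
    """Hard threshold: drop samples below min_valid, treated here as blinks."""
    return [s for s in samples if s >= min_valid]


def representative_values(samples, rate_hz, min_valid):
    """Return one PDr per one-second interval of the pupil diameter signal.

    Each value is the mean of the non-blink samples in that second, or None
    when every sample in the interval was rejected as a blink.
    """
    pdr = []
    for start in range(0, len(samples), rate_hz):
        window = remove_blinks(samples[start:start + rate_hz], min_valid)
        pdr.append(sum(window) / len(window) if window else None)
    return pdr
```

With an assumed rate of 3 samples per second and a threshold of 1.0, the signal `[4.0, 0.0, 6.0, 5.0, 5.0, 5.0]` yields `[5.0, 5.0]`: the zero reading in the first second is discarded as a blink before averaging.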
Abstract:
This study documented differences between substance-using adolescent participants who either completed or dropped out of a brief motivational intervention. Therapeutic alliance, working alliance and patient involvement were used to describe differences in treatment process ratings in a sample of majority Latino males who either (a) completed an adolescent substance abuse intervention called Alcohol Treatment Targeting Adolescents In Need (ATTAIN) or (b) dropped out after the first or second Guided Self-Change therapy session. Fifteen-minute segments were copied from the midpoint of previously recorded audio-tapes of Guided Self-Change therapy sessions. Raters were trained to a criterion level of interrater reliability for both the Working Alliance Inventory-Short and Vanderbilt Psychotherapy Process Scale. Correlations among Working Alliance Inventory-Short and Vanderbilt Psychotherapy Process Scale subscales reflected a general similarity in the assignment of ratings to client-therapist dyads. Findings underscore why these concepts are often used interchangeably in the treatment process literature. The Vanderbilt Psychotherapy Process Scale patient participation subscale demonstrated substantial empirical differentiation from overall therapeutic alliance. Discriminant function analysis demonstrated the Working Alliance Inventory-Short goal subscale and the Vanderbilt Psychotherapy Process Scale patient participation and therapist warmth and friendliness subscales as successful classifiers of groups of mostly Latino youth based on completion status. Follow-up logistic regression analyses confirmed major findings and successfully predicted group membership. Treatment process constructs can be used as clinical tools to identify participants who may be susceptible to dropping out of treatment services. Further investigation of treatment process may enhance understanding of the influence of alliance between clients and Guided Self-Change therapists.
Investigating the role of treatment process as a critical component of brief motivational interventions for substance-using adolescents will inform both practitioners and researchers regarding the effectiveness of community-based substance abuse interventions for adolescents.
Abstract:
Human scent, or the volatile organic compounds (VOCs) produced by an individual, has been recognized as a biometric measurement because of the distinct variations in both the presence and abundance of these VOCs between individuals. In forensic science, human scent has been used as a form of associative evidence by linking a suspect to a scene/object through the use of human scent discriminating canines. The scent most often collected and used with these specially trained canines is from the hands because a majority of the evidence collected is likely to have been handled by the suspect. However, the scents from other biological specimens, especially those that are likely to be present at scenes of violent crimes, have yet to be explored. Hair, fingernails and saliva are examples of these types of specimens. In this work, a headspace solid phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS) technique was used for the identification of VOCs from hand odor, hair, fingernails and saliva. Sixty individuals were sampled and the profiles of the extracted VOCs were evaluated to assess whether they could be used for distinguishing individuals. Preliminary analysis of the biological specimens collected from an individual (intra-subject) showed that, though these materials have some VOCs in common, their overall chemical profile is different for each specimen type. Pair-wise comparisons, using Spearman Rank correlations, were made between the chemical profiles obtained from each subject, per specimen type. Greater than 98.8% of the collected samples were distinguished from the subjects for all of the specimen types, demonstrating that these specimens can be used for distinguishing individuals. Additionally, field trials were performed to determine the utility of these specimens as scent sources for human scent discriminating canines.
Three trials were conducted to evaluate hair, fingernails and saliva in comparison to hand odor, which was considered the standard source of human odor. It was revealed that canines perform similarly with these alternative human scent sources as they do with hand odor, implying that, though there are differences in the chemical profiles released by these specimens, they can still be used for the discrimination of individuals by trained canines.
Abstract:
Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment (“relaxation” vs. “stress”) are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination.
The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to use in future automated real-time affective recognition systems, especially for detecting the “relaxation” vs. “stress” states.
Abstract:
Ensemble Stream Modeling and Data-cleaning are sensor information processing systems that have different training and testing methods by which their goals are cross-validated. This research examines a mechanism that seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate uncorrelated noise and to choose the most likely model without overfitting, thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble that has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction for an event such as a bush or natural forest fire, we take the burnt area (BA*), sensed ground truth obtained from logs, as our target variable. Even though this is an obvious model choice, the results are disappointing. There are two reasons for this: first, the histogram of fire activity is highly skewed; second, the measured sensor parameters are highly correlated. Since using non-descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use the F-measure score, which combines precision and recall, to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf nodes to learn higher order features. A sensitive variance measure such as the F-test is performed during each node's split to select the best attribute. The ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier.
The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensors led to the formation of streams for sensor-enabled applications. This further motivates the novelty of stream quality labeling and its importance in handling the vast amounts of real-time mobile streams generated today.
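For reference, the F-measure mentioned in this abstract is conventionally the harmonic mean of precision and recall, and the false alarm rate is a separate quantity computed from false positives and true negatives. The sketch below uses those standard definitions with hypothetical function names and confusion-matrix counts as inputs; how the study itself relates the two measures is not stated in the abstract.

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives and false
    negatives. Returns 0.0 when there are no true positives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def false_alarm_rate(fp, tn):
    """Fraction of actual non-events that were flagged as events."""
    return fp / (fp + tn) if (fp + tn) else 0.0
```

For example, 8 correctly detected fire events with 2 false alarms and 2 missed events give a precision and recall of 0.8 each, and therefore an F-measure of 0.8.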