6 resultados para selection methods

em Digital Commons at Florida International University


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Prior research has established that idiosyncratic volatility of the securities prices exhibits a positive trend. This trend and other factors have made the merits of investment diversification and portfolio construction more compelling. ^ A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: (a) increase the efficiency of the portfolio optimization process, (b) implement large-scale optimizations, and (c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach in the construction of a time-varying covariance matrix. This involves the application of a modified integrated dynamic conditional correlation GARCH (IDCC - GARCH) model to account for the dynamics of the conditional covariance matrices that are employed. ^ The stochastic aspects of the expected return of the securities are integrated into the technique through Monte Carlo simulations. Instead of representing the expected returns as deterministic values, they are assigned simulated values based on their historical measures. The time-series of the securities are fitted into a probability distribution that matches the time-series characteristics using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Employing the S&P500 securities as the base, 2000 simulated data sets are created using Monte Carlo simulation. In addition, the Russell 1000 securities are used to generate 50 sample data sets. ^ The results indicate an increase in risk-return performance. Choosing the Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercial product currently available on the market, as the comparison for benchmarking, the new greedy technique clearly outperforms others using a sample of the S&P500 and the Russell 1000 securities. The resulting improvements in performance are consistent among five securities selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH). ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. ^ In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment ("relaxation" vs. "stress") are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of the classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. ^ For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination. The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). ^ In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to involve in future automated real-time affective recognition systems, especially for detecting the "relaxation" vs. "stress" states.^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Prior research has established that idiosyncratic volatility of the securities prices exhibits a positive trend. This trend and other factors have made the merits of investment diversification and portfolio construction more compelling. A new optimization technique, a greedy algorithm, is proposed to optimize the weights of assets in a portfolio. The main benefits of using this algorithm are to: a) increase the efficiency of the portfolio optimization process, b) implement large-scale optimizations, and c) improve the resulting optimal weights. In addition, the technique utilizes a novel approach in the construction of a time-varying covariance matrix. This involves the application of a modified integrated dynamic conditional correlation GARCH (IDCC - GARCH) model to account for the dynamics of the conditional covariance matrices that are employed. The stochastic aspects of the expected return of the securities are integrated into the technique through Monte Carlo simulations. Instead of representing the expected returns as deterministic values, they are assigned simulated values based on their historical measures. The time-series of the securities are fitted into a probability distribution that matches the time-series characteristics using the Anderson-Darling goodness-of-fit criterion. Simulated and actual data sets are used to further generalize the results. Employing the S&P500 securities as the base, 2000 simulated data sets are created using Monte Carlo simulation. In addition, the Russell 1000 securities are used to generate 50 sample data sets. The results indicate an increase in risk-return performance. Choosing the Value-at-Risk (VaR) as the criterion and the Crystal Ball portfolio optimizer, a commercial product currently available on the market, as the comparison for benchmarking, the new greedy technique clearly outperforms others using a sample of the S&P500 and the Russell 1000 securities. The resulting improvements in performance are consistent among five securities selection methods (maximum, minimum, random, absolute minimum, and absolute maximum) and three covariance structures (unconditional, orthogonal GARCH, and integrated dynamic conditional GARCH).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment (“relaxation” vs. “stress”) are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of the classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination. The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to involve in future automated real-time affective recognition systems, especially for detecting the “relaxation” vs. “stress” states.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Trenchless methods have been considered to be a viable solution for pipeline projects in urban areas. Their applicability in pipeline projects is expected to increase with the rapid advancements in technology and emerging concerns regarding social costs related to trenching methods. Selecting appropriate project delivery system (PDS) is a key to the success of trenchless projects. To ensure success of the project, the selected project delivery should be tailored to trenchless project specific characteristics and owner needs, since the effectiveness of project delivery systems differs based on different project characteristics and owners requirements. Since different trenchless methods have specific characteristics such rate of installation, lengths of installation, and accuracy, the same project delivery systems may not be equally effective for different methods. The intent of this paper is to evaluate the appropriateness of different PDS for different trenchless methods. PDS are examined through a structured decision-making process called Fuzzy Delivery System Selection Model (FDSSM). The process of incorporating the impacts of: (a) the characteristics of trenchless projects and (b) owners’ needs in the FDSSM is performed by collecting data using questionnaires deployed to professionals involved in the trenchless industry in order to determine the importance of delivery systems selection attributes for different trenchless methods, and then analyzing this data. The sensitivity of PDS rankings with respect to trenchless methods is considered in order to evaluate whether similar project delivery systems are equally effective in different trenchless methods. The effectiveness of PDS with respect to attributes is defined as follows: a project delivery system is most effective with respect to an attribute (e.g., ability to control growth in costs ) if there is no project delivery system that is more effective than that PDS. The results of this study may assist trenchless project owners to select the appropriate PDS for the trenchless method selected.