977 results for gene selection


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs much prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of the ACA will deteriorate rapidly. In fact, setting this number correctly is very demanding because so little prior knowledge is available. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of Cooperative Competition, the number of clusters can be found in the process of clustering. Experimental results on synthetic and gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and performs excellently compared with competing methods.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper investigates the gene selection problem for microarray data with small samples and variant correlation. Most existing algorithms require expensive computational effort, especially when thousands of genes are involved. The main objective of this paper is to effectively select the most informative genes from microarray data, while keeping the computational expense affordable. This is achieved by proposing a novel forward gene selection algorithm (FGSA). To overcome the small-sample problem, the augmented data technique is first employed to produce an augmented data set. Taking inspiration from other gene selection methods, the L2-norm penalty is then introduced into the recently proposed fast regression algorithm to achieve the group selection ability. Finally, by defining a proper regression context, the proposed method can be implemented efficiently in software, which significantly reduces the computational burden. Both computational complexity analysis and simulation results confirm the effectiveness of the proposed algorithm in comparison with other approaches.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Real-time quantitative PCR (qPCR) is a highly sensitive and specific method which is used extensively for determining gene expression profiles in a variety of cell and tissue types. In order to obtain accurate and reliable gene expression quantification, qPCR data are generally normalised against so-called reference or housekeeping genes. Ideally, reference genes should have abundant and stable RNA transcriptomes under the experimental conditions employed. However, reference genes are often selected rather arbitrarily and indeed some have been shown to have variable expression in a variety of in vitro experimental conditions.
Objective: The objective of the current study was to investigate reference gene expression in human periodontal ligament (PDL) cells in response to treatment with lipopolysaccharide (LPS).
Method: Primary human PDL cells were grown in Dulbecco’s Modified Eagle Medium with L-glutamine supplemented with 10% fetal bovine serum, 100 IU/ml penicillin and 100 µg/ml streptomycin. RNA was isolated using the RNeasy Mini Kit (Qiagen) and reverse transcribed using the QuantiTect Reverse Transcription Kit (Qiagen). The expression of a total of 19 reference genes was studied in the presence and absence of LPS treatment using the Roche Reference Gene Panel. Data were analysed using the NormFinder and BestKeeper validation programs.
Results: Treatment of human PDL cells with LPS resulted in changes in expression of several commonly used reference genes, including GAPDH. On the other hand, the reference genes β-actin, G6PDH and 18S were identified as stable genes following LPS treatment.
Conclusion: Many of the reference genes studied were robust to LPS treatment (up to 100 ng/ml). However, several commonly employed reference genes, including GAPDH, varied with LPS treatment, suggesting they would not be ideal candidates for normalisation in qPCR gene expression studies.
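As a rough illustration of what "stable" means for a candidate reference gene, the sketch below ranks genes by the standard deviation of their quantification cycle (Cq) values across untreated and treated samples. This is a deliberately simplified proxy and not the NormFinder or BestKeeper algorithm used in the study; the gene names and Cq values are hypothetical.

```python
from statistics import stdev


def rank_by_cq_stability(cq_by_gene):
    """Return gene names sorted from most to least stable (smallest Cq SD first)."""
    spread = {gene: stdev(values) for gene, values in cq_by_gene.items()}
    return sorted(spread, key=spread.get)


# Hypothetical Cq values: first three samples untreated, last three LPS-treated.
cq_by_gene = {
    "gene_A": [18.1, 18.0, 18.2, 18.1, 18.0, 18.2],  # barely moves with treatment
    "gene_B": [20.0, 20.1, 19.9, 22.0, 22.1, 21.9],  # shifts by about 2 cycles
}

ranking = rank_by_cq_stability(cq_by_gene)
print(ranking)
```

A gene whose Cq shifts with treatment, like the hypothetical `gene_B`, would distort any expression value normalised against it.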

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists to focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection. It is based on a recently proposed genetic ensemble (GE) system. To enhance the generalization property of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used for initialization and mutation operation of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and the user preferences.
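The abstract does not specify how the scores from multiple filters are mapped and fused, so the following is only a minimal sketch of one plausible approach: rescale each filter's scores to [0, 1] and average them per gene. The function names and the score values are hypothetical.

```python
def min_max_scale(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_filter_scores(score_lists):
    """Average the min-max-scaled scores of several filters, gene by gene."""
    scaled = [min_max_scale(scores) for scores in score_lists]
    n_filters = len(scaled)
    return [sum(column) / n_filters for column in zip(*scaled)]


# Hypothetical scores for four genes from two filters on different scales.
filter_a = [0.0, 5.0, 10.0, 2.5]
filter_b = [1.0, 3.0, 2.0, 1.0]

fused = fuse_filter_scores([filter_a, filter_b])
print(fused)
```

Fused scores of this kind could, for example, bias which genes are included in the initial population of a genetic search, although whether MF-GE does exactly this is not stated in the abstract.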

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper introduces a novel method for gene selection based on a modification of analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal to noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance and ReliefF. Four benchmark microarray datasets (diffuse large B-cell lymphoma, leukemia cancer, prostate and colon) are utilized for experiments. As the number of samples in microarray datasets is limited, the leave-one-out cross-validation strategy is applied rather than the traditional cross-validation. Experimental results demonstrate the significant dominance of the proposed MAHP against the competing methods in terms of both accuracy and stability. With the benefit of inexpensive computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in real clinical practice.
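The leave-one-out cross-validation mentioned above can be sketched as follows, using a 1-nearest-neighbour classifier as a stand-in for the classifiers listed in the abstract. The sample values and labels are hypothetical, and the sketch assumes the genes have already been selected.

```python
import math


def nearest_neighbor_label(train_x, train_y, query):
    """Return the label of the training sample closest to `query` (Euclidean)."""
    distances = [math.dist(x, query) for x in train_x]
    return train_y[distances.index(min(distances))]


def loocv_accuracy(samples, labels):
    """Hold out each sample once, predict it from the rest, return the accuracy."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical expression values of two selected genes for six samples.
samples = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
           [3.0, 3.1], [2.9, 3.0], [3.1, 2.9]]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

accuracy = loocv_accuracy(samples, labels)
print(accuracy)
```

With very few samples, holding out one at a time keeps the training set as large as possible. An unbiased estimate would also repeat the gene selection step inside each fold, which this sketch omits.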

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach is therefore useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
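Two of the five ranking statistics named above can be sketched per gene as shown below. The signal-to-noise ratio is taken here as the difference of class means divided by the sum of class standard deviations, and the t-statistic uses the unequal-variance (Welch) form; both choices are assumptions, since the abstract does not give the exact formulas. Gene names and expression values are hypothetical.

```python
import math
from statistics import mean, stdev


def signal_to_noise(class1, class2):
    """(mean1 - mean2) / (sd1 + sd2) for one gene across two classes."""
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))


def welch_t(class1, class2):
    """Two-sample t-statistic without assuming equal variances."""
    se = math.sqrt(stdev(class1) ** 2 / len(class1)
                   + stdev(class2) ** 2 / len(class2))
    return (mean(class1) - mean(class2)) / se


def rank_genes(expression, score):
    """Sort gene names by the absolute value of `score`, largest first."""
    scores = {gene: abs(score(c1, c2)) for gene, (c1, c2) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical expression values: (class 1 samples, class 2 samples) per gene.
expression = {
    "gene_X": ([5.0, 5.2, 4.8], [2.0, 2.2, 1.8]),  # clearly separated classes
    "gene_Y": ([3.0, 4.0, 2.0], [3.1, 2.1, 4.1]),  # overlapping classes
}

print(rank_genes(expression, signal_to_noise))
print(rank_genes(expression, welch_t))
```

Each statistic yields its own gene ranking; a method such as the modified AHP then has to reconcile rankings that may disagree.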

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The dataset contains raw data (quantification cycle) for a study which determined the most suitable hepatic reference genes for normalisation of qPCR data originating from juvenile Atlantic salmon (14 days) exposed to 14 and 22 degrees C. These results will be useful for anyone wanting to study the effects of climate change/elevated temperature on reproductive physiology of fish (and perhaps other vertebrates).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The dataset contains raw data (quantification cycle) for a study which determined the most suitable hepatic reference genes for normalisation of qPCR data originating from adult (entire reproductive season) Atlantic salmon (14 days) exposed to 14 and 22 degrees C. These results will be useful for anyone wanting to study the effects of climate change/elevated temperature on reproductive physiology of fish (and perhaps other vertebrates). In addition, a target gene (vitellogenin) has been normalised using an inappropriate and an 'ideal' reference gene to demonstrate the consequences of using an unstable reference gene for normalisation. For the adult experiment, maiden and repeat adult females were held at the Salmon Enterprises of Tasmania (SALTAS) Wayatinah Hatchery (Tasmania, Australia) at ambient temperature and photoperiod in either 200 m³ (maidens) or 50 m³ (repeats) circular tanks at stocking densities of 12-18 and 24-36 kg m⁻³ for maidens and repeats, respectively, until transferred to the experimental tanks.
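The consequence of normalising against an unstable reference gene can be illustrated with the widely used 2^(-ΔΔCq) calculation. Whether the dataset's authors used this exact formula is not stated, so it is an assumption here, as is the premise of roughly 100% amplification efficiency. All Cq values are hypothetical.

```python
def relative_expression(target_ctrl, ref_ctrl, target_treat, ref_treat):
    """Fold change of the target gene (treated vs control) as 2^(-ddCq)."""
    delta_ctrl = target_ctrl - ref_ctrl      # dCq in the control group
    delta_treat = target_treat - ref_treat   # dCq in the treated group
    return 2.0 ** -(delta_treat - delta_ctrl)


# Hypothetical mean Cq values; the target Cq rises by 2 cycles with treatment.
# Stable reference: its Cq is unchanged by treatment.
fold_stable = relative_expression(25.0, 18.0, 27.0, 18.0)
# Unstable reference: its Cq also rises by 2 cycles with treatment.
fold_unstable = relative_expression(25.0, 18.0, 27.0, 20.0)

print(fold_stable, fold_unstable)
```

In this toy case the stable reference reports a fourfold decrease in the target gene, while the unstable reference masks the change entirely.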

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Today, quantitative real-time PCR is the method of choice for rapid and reliable quantification of mRNA transcription. However, for an exact comparison of mRNA transcription in different samples or tissues it is crucial to choose the appropriate reference gene. Recently, glyceraldehyde 3-phosphate dehydrogenase and β-actin have been used for that purpose. However, it has been reported that these genes as well as alternatives, like rRNA genes, are unsuitable references, because their transcription is significantly regulated in various experimental settings and variable in different tissues. Therefore, quantitative real-time PCR was used to determine the mRNA transcription profiles of 13 putative reference genes, comparing their transcription in 16 different tissues and in CCRF-HSB-2 cells stimulated with 12-O-tetradecanoylphorbol-13-acetate and ionomycin. Our results show that classical reference genes are indeed unsuitable, whereas the RNA polymerase II gene was the gene with the most constant expression in different tissues and following stimulation in CCRF-HSB-2 cells. © 2003 Elsevier Inc. All rights reserved.