980 results for Labeling hierarchical clustering


Relevance:

20.00%

Publisher:

Abstract:

Recently, much attention has been given to mass spectrometry (MS)-based disease classification, diagnosis, and protein biomarker identification. As with microarray-based investigations, proteomic data generated by such high-throughput experiments often have a high feature-to-sample ratio, and the biological signal is confounded by noise, redundancy, and outliers. The development of algorithms and procedures for analyzing and interpreting such data is therefore of paramount importance. In this paper, we propose a hybrid system for analyzing such high-dimensional data. The proposed method uses a k-means clustering-based feature extraction and selection procedure to bridge filter and wrapper selection methods. Potentially informative mass/charge (m/z) markers selected by filters are passed to the k-means clustering algorithm for correlation and redundancy reduction, and a multi-objective genetic algorithm selector then identifies discriminative m/z markers among those produced by the clustering step. Experimental results indicate that the proposed method is well suited to m/z biomarker selection and MS-based sample classification.
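As a sketch of the redundancy-reduction step, the filter-selected features can themselves be clustered with k-means and one representative marker kept per cluster. The following is an illustrative NumPy implementation under simplified assumptions (Euclidean distance, toy data), not the authors' code:

```python
import numpy as np

def kmeans_feature_reduction(X, k, n_iter=50, seed=0):
    """Cluster the columns (features) of X with k-means and keep one
    representative feature per cluster, reducing redundancy among
    correlated m/z markers. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    F = X.T                                   # one row per feature
    centers = F[rng.choice(len(F), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(F[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    labels = np.linalg.norm(F[:, None] - centers[None], axis=2).argmin(axis=1)
    reps = []                                 # feature closest to each centre
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            dj = np.linalg.norm(F[idx] - centers[j], axis=1)
            reps.append(int(idx[dj.argmin()]))
    return sorted(reps)

# toy spectra: 20 samples x 6 features; features 0-2 are near-duplicates
rng = np.random.default_rng(1)
base = rng.normal(size=(20, 1))
X = np.hstack([base, base * 1.01, base * 0.99, rng.normal(size=(20, 3))])
selected = kmeans_feature_reduction(X, k=4)
print(selected)
```

The wrapper stage (the multi-objective genetic algorithm) would then search over `selected` rather than the full feature set.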


Clustering with the agglomerative Information Bottleneck (aIB) algorithm suffers from sub-optimality: it cannot guarantee to preserve as much relevant information as possible. To handle this problem, we introduce a density connectivity chain, which considers not only the information between two data elements but also the information among the neighbors of a data element. Based on this idea, we propose DCIB, a Density Connectivity Information Bottleneck algorithm that applies the Information Bottleneck method to quantify relevant information during the clustering procedure. As a hierarchical algorithm, DCIB produces a pruned clustering tree and yields clusterings of different sizes in a single execution. Experimental results on document clustering indicate that DCIB preserves more relevant information and achieves higher precision than the aIB algorithm.
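The aIB merge criterion that DCIB builds on can be stated compactly: merging clusters i and j loses (p_i + p_j) times the Jensen-Shannon divergence between their conditionals p(y|c), with the clusters' relative weights as mixing proportions. A small sketch of that cost (the density-connectivity chain itself is not reproduced here):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence in bits, skipping zero-probability terms."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def aib_merge_cost(p_i, p_j, py_i, py_j):
    """Information loss of merging clusters i and j in agglomerative IB:
    (p_i + p_j) * JS divergence of their p(y|c) distributions."""
    w = p_i + p_j
    pi, pj = p_i / w, p_j / w
    m = pi * py_i + pj * py_j
    return w * (pi * kl(py_i, m) + pj * kl(py_j, m))

# merging two clusters with identical p(y|c) loses no information
a = np.array([0.5, 0.5])
print(aib_merge_cost(0.3, 0.2, a, a))        # 0.0
# dissimilar conditionals cost more
b = np.array([0.9, 0.1])
print(aib_merge_cost(0.3, 0.2, a, b) > 0)    # True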


The multifunctional polypeptide cyclosporin synthetase (CySyn) remains one of the most complex nonribosomal peptide synthetases described. In this study we used a highly specific photoaffinity labeling procedure with the natural cofactor S-adenosyl-L-methionine (AdoMet), 14C-labeled at the S-methyl group, to probe the concerted AdoMet-binding interaction of the N-methyltransferase (N-MTase) centers of CySyn. The binding stoichiometry of the enzyme–AdoMet complex was determined to be 1:7, in agreement with inferences from the complementary DNA sequence of the simA gene encoding the CySyn polypeptide. Photolabeling of the AdoMet-binding sites displayed homotropic negative cooperativity, characterized by a curvilinear Scatchard plot with upward concavity. Although N-methyl transfer is not a critical event for peptide elongation, the destabilizing homotropic interactions observed between N-MTase centers may represent a mechanism by which the enzyme preserves the proficiency of the substrate-channeling process of cyclosporin peptide assembly over a broad range of cofactor concentrations. We also demonstrated the utility of the photolabeling procedure for tracking the enzyme during purification.
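The upward concavity reported in the Scatchard analysis can be reproduced with a simple Hill-type binding model: a Hill coefficient below 1 mimics homotropic negative cooperativity. A numerical sketch with illustrative parameters, not a fit to the CySyn data:

```python
import numpy as np

def scatchard(free, K=1.0, n=1.0, sites=7):
    """Bound/free ratio vs. bound for a Hill-type binding model;
    n < 1 mimics homotropic negative cooperativity. Illustrative
    parameters only."""
    bound = sites * free**n / (K**n + free**n)
    return bound, bound / free

free = np.linspace(0.05, 5.0, 60)
b, bf = scatchard(free, n=0.5)            # negatively cooperative case
# secant slopes of bf(b) increase monotonically: upward concavity
slopes = np.diff(bf) / np.diff(b)
print(bool(np.all(np.diff(slopes) > 0)))  # True
```

With n = 1 (no cooperativity) the same plot is a straight line, which is what makes the curvature diagnostic.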


The high-throughput data generated by gene microarray technology have spurred numerous efforts to find effective ways of processing microarray data to reveal real biological relationships among genes. This work proposes an innovative data pre-processing approach to identify noise in the data sets and eliminate or reduce its impact on gene clustering. With the proposed algorithm, the pre-processed data sets make clustering results stable across clustering algorithms with different similarity metrics, important gene and feature information is retained, and clustering quality is improved. A preliminary evaluation on real microarray data sets has shown the effectiveness of the proposed algorithm.
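A minimal, illustrative stand-in for such a pre-processing step is to flag genes whose profiles carry little clustering signal, for example near-constant genes; the paper's actual noise-identification algorithm is more involved than this sketch:

```python
import numpy as np

def flag_noise_genes(X, var_quantile=0.1):
    """Flag near-constant genes (bottom variance quantile) whose
    expression profiles carry little clustering signal. A minimal,
    illustrative stand-in, not the paper's algorithm."""
    var = X.var(axis=1)
    return var <= np.quantile(var, var_quantile)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))          # 50 genes x 12 conditions
X[3] = 0.001 * rng.normal(size=12)     # inject one near-constant gene
noisy = flag_noise_genes(X)
print(bool(noisy[3]))                  # True: the injected gene is flagged
```

Flagged genes can then be removed or down-weighted before any clustering algorithm is run, which is what stabilizes results across similarity metrics.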


Artificial superhydrophobic surfaces with a hierarchical topography were fabricated by layer-by-layer assembly of polyelectrolytes and silica nanoparticles on microsphere-patterned polyimide precursor substrates, followed by thermal treatment and fluoroalkylsilane modification. In this hierarchical topography, micrometer-scale structures were provided by replica molding of polyamic acid using two-dimensional arrays of polystyrene latex spheres as templates, and nanosized silica particles were then assembled on these microspheres to construct finer structures at the nanoscale. Heat treatment induced chemical cross-linking between the polyelectrolytes and simultaneously converted the polyamic acid to polyimide. After surface modification with fluoroalkylsilane, the as-prepared, highly hydrophilic surface was endowed with superhydrophobicity through the bioinspired combination of low-surface-energy materials and hierarchical surface structures: a superhydrophobic surface with a static water contact angle of 160 degrees and a sliding angle of less than 10 degrees was obtained. Notably, the polyimide microspheres were integrated with the substrate and mechanically stable. In addition, the chemical and mechanical stability of the polyelectrolyte/silica nanoparticle multilayers was increased by heat-induced cross-linking between polyelectrolytes to form nylon-like films, as well as by the formation of interfacial chemical bonds.


Though serving as an effective means of damage identification, the quantitative prediction capability of an artificial neural network (ANN) depends substantially on the amount of training data. Using the concept of "Digital Damage Fingerprints" (DDF), a hierarchical approach to developing training databases is proposed for ANN-based damage identification. To exploit the ANN's capability to address the key questions "Is there damage?" and "Where is the damage?", the amount of training data (damage cases) was increased progressively. A relationship between the quantity of training data and the accuracy of the answers to these two questions was established, and was experimentally validated by identifying the position of actual damage in carbon-fibre-reinforced composite laminates. The results demonstrate that such a hierarchical approach can predict the presence and location of damage individually, with substantially reduced computational cost and effort in developing the ANN training database.


A few clustering techniques for categorical data exist to group objects with similar characteristics. Some can handle uncertainty in the clustering process, while others have stability issues; however, the performance of these techniques suffers from low accuracy and high computational complexity. This paper proposes a new technique, maximum dependency attributes (MDA), for selecting a clustering attribute. The proposed approach is based on rough set theory, taking into account the dependency of attributes in the database. We analyze and compare the performance of the MDA technique with the bi-clustering, total roughness (TR), and min–min roughness (MMR) techniques on four test cases. The results establish the better performance of the proposed approach.
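The dependency measure underlying MDA comes straight from rough set theory: the degree of dependency of attribute set D on C is the fraction of objects whose C-equivalence class fits inside a single D-equivalence class. A sketch on a toy categorical table (the full MDA procedure has more machinery than shown):

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes induced by the given attribute indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(rows, c_attrs, d_attrs):
    """Rough-set degree of dependency k = |POS_C(D)| / |U|: the fraction
    of objects whose C-class is contained in a single D-class."""
    d_blocks = partition(rows, d_attrs)
    pos = sum(len(c) for c in partition(rows, c_attrs)
              if any(c <= d for d in d_blocks))
    return pos / len(rows)

# toy categorical table: 3 attributes, 6 objects
rows = [("a", "x", "p"), ("a", "x", "p"), ("b", "y", "q"),
        ("b", "y", "q"), ("c", "x", "p"), ("c", "y", "q")]
# MDA-style selection: pick the attribute on which the others depend most
scores = {a: dependency(rows, [a], [b for b in range(3) if b != a])
          for a in range(3)}
print(max(scores, key=scores.get))   # 0: attribute 0 is selected
```
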


A long-standing question in immunology concerns the factors that contribute to Th cell epitope immunodominance. For a number of viral membrane proteins, Th cell epitopes are localized to exposed protein surfaces, often overlapping with Ab binding sites. It has therefore been proposed that Abs on B cell surfaces selectively bind and protect exposed protein fragments during Ag processing, and that this interaction helps shape the Th cell repertoire. While attractive in concept, this hypothesis has not been thoroughly tested. To test it, we compared Th cell peptide immunodominance in normal C57BL/6 mice with that in C57BL/6 μMT/μMT mice (which lack normal B cell activity). Animals were first vaccinated with DNA constructs expressing one of three different HIV envelope proteins, after which CD4 T cell response profiles toward overlapping peptides were characterized using an IFN-γ ELISPOT assay. We found a striking similarity between the peptide response profiles of the two mouse strains. The profiles also matched those of previous experiments in which different envelope vaccination regimens were used. Our results clearly demonstrate that normal Ab activity is not required for the establishment or maintenance of Th peptide immunodominance in the HIV envelope response. To explain the clustering of Th cell epitopes, we propose that localization of peptides on exposed envelope surfaces facilitates proteolytic activity and preferential peptide shuttling through the Ag processing pathway.


Cluster analysis has played a key role in data understanding. When this important data mining task is extended to data streams, it becomes more challenging, since the data arrive at the mining system in a one-pass manner. The problem is even more difficult in a sliding-window model, in which the elimination of outdated data must be handled properly. We propose SWEM, an algorithm that exploits the Expectation Maximization technique to address these challenges. SWEM not only processes the stream incrementally but also adapts to changes in the underlying stream distribution.
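One way to make EM-style clustering workable over a sliding window is to keep sufficient statistics per time bucket, so expired buckets can be dropped without revisiting raw data. A sketch of that idea for one-dimensional data (illustrative only; SWEM's actual machinery is not shown):

```python
from collections import deque

class WindowedStats:
    """Sliding-window sufficient statistics (n, sum, sum of squares)
    kept per time bucket so outdated data can be expired cheaply --
    the kind of summary an EM-style stream clusterer can update
    incrementally. A sketch of the idea, not the SWEM algorithm."""
    def __init__(self, window):
        self.window = window          # number of buckets kept
        self.buckets = deque()        # (n, s, ss) per bucket

    def add_bucket(self, values):
        n = len(values)
        s = sum(values)
        ss = sum(v * v for v in values)
        self.buckets.append((n, s, ss))
        if len(self.buckets) > self.window:
            self.buckets.popleft()    # expire outdated data

    def mean_var(self):
        n = sum(b[0] for b in self.buckets)
        s = sum(b[1] for b in self.buckets)
        ss = sum(b[2] for b in self.buckets)
        mean = s / n
        return mean, ss / n - mean * mean

ws = WindowedStats(window=2)
ws.add_bucket([0.0, 0.0, 0.0])        # old data
ws.add_bucket([1.0, 1.0])
ws.add_bucket([1.0, 1.0])             # oldest bucket slides out
mean, var = ws.mean_var()
print(mean, var)                      # 1.0 0.0
```

Because the statistics are additive, the same bookkeeping extends to per-cluster weighted statistics inside an EM iteration.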


Cluster analysis has played a key role in understanding data streams. The problem is difficult when the clustering task is considered in a sliding-window model, in which the requirement to eliminate outdated data must be handled properly. We propose SWEM, an algorithm designed around the Expectation Maximization technique to address these challenges. SWEM can compute clusters incrementally using a small number of statistics summarized over the stream, and can adapt to changes in the stream's distribution. The feasibility of SWEM has been verified through a number of experiments, and we show that it is superior to the CluStream algorithm on both synthetic and real datasets.


In enterprise grid computing environments, users have access to multiple resources that may be geographically distributed; resource allocation and scheduling are therefore fundamental to achieving high performance. Most current job scheduling systems for enterprise grids provide batch queuing support and focus solely on allocating processors to jobs. However, since I/O is also a critical resource for many jobs, the allocation of processor and I/O resources must be coordinated for the system to operate most effectively. To this end, we present a hierarchical scheduling policy that pays special attention to the I/O and service demands of parallel jobs in homogeneous and heterogeneous systems with background workload. The performance of the proposed scheduling policy is studied under various system and workload parameters through simulation, and compared with a static space–time sharing policy. The results show that the proposed policy performs substantially better than the static space–time sharing policy.
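The core idea of coordinating processor and I/O allocation can be illustrated with a greedy placement rule that scores each node by its bottleneck resource after placement. This is a deliberately simplified sketch, not the paper's hierarchical policy:

```python
def schedule(jobs, n_nodes):
    """Greedy placement coordinating CPU and I/O: each job goes to the
    node whose bottleneck resource (CPU or I/O load) stays lowest after
    placement. Illustrative only; the paper's hierarchical policy is
    more elaborate."""
    cpu = [0.0] * n_nodes
    io = [0.0] * n_nodes
    placement = []
    for c_dem, io_dem in jobs:
        best = min(range(n_nodes),
                   key=lambda n: max(cpu[n] + c_dem, io[n] + io_dem))
        cpu[best] += c_dem
        io[best] += io_dem
        placement.append(best)
    return placement, cpu, io

jobs = [(4, 1), (1, 4), (4, 1), (1, 4)]   # CPU-bound and I/O-bound mix
placement, cpu, io = schedule(jobs, n_nodes=2)
print(placement)                          # [0, 1, 1, 0]
```

A processor-only scheduler would balance CPU load alone and could pile all I/O-bound jobs onto one node; scoring the bottleneck resource interleaves the two job types.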


This paper presents a novel multi-label classification framework for domains with large numbers of labels. Automatic image annotation is such a domain, as the available semantic concepts typically number in the hundreds. The proposed framework comprises an initial clustering phase that breaks the original training set into several disjoint clusters of data, then trains a multi-label classifier on the data of each cluster. Given a new test instance, the framework first finds the nearest cluster and then applies the corresponding model. Empirical results using two clustering algorithms, four multi-label classification algorithms, and three image annotation data sets suggest that the proposed approach can improve the performance and reduce the training time of standard multi-label classification algorithms, particularly when the number of labels is large.
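The cluster-then-classify pipeline can be sketched in a few lines: partition the training set, fit one multi-label model per cluster (here a trivial label-frequency model stands in for a real classifier), and route each test instance to its nearest cluster's model. Illustrative assumptions throughout:

```python
import numpy as np

def fit(X, Y, k=2, iters=20, seed=0):
    """Cluster training data with a small k-means, then fit one
    multi-label model per cluster. The per-cluster 'model' here is just
    the majority label set -- a toy stand-in for a real classifier."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = X[lab == j].mean(0)
    lab = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
    models = [Y[lab == j].mean(0) >= 0.5 for j in range(k)]
    return centers, models

def predict(x, centers, models):
    """Route the instance to its nearest cluster's model."""
    j = np.linalg.norm(centers - x, axis=1).argmin()
    return models[j].astype(int)

# two clearly separated groups with distinct label sets
X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], float)
Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
centers, models = fit(X, Y)
print(predict(np.array([0.2, 0.5]), centers, models))   # [1 0]
```

Each per-cluster model only ever sees the labels present in its cluster, which is where the training-time savings for large label sets come from.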


Lung nodules can be detected by examining CT scans. This paper presents an automated lung nodule classification system that employs random forests as its base classifier, with a unique architecture for classification aided by clustering. Four experiments were conducted to study the performance of the system, using 5721 CT lung image slices from the LIDC database. According to the experimental results, the system achieved a highest sensitivity of 97.92% and specificity of 96.28%, outperforming its tested counterparts.
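The reported figures follow the standard definitions of sensitivity and specificity over the slice-level confusion matrix. For concreteness, with made-up counts rather than the paper's data:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN): fraction of nodule slices detected.
    Specificity = TN/(TN+FP): fraction of non-nodule slices correctly
    rejected. Counts below are hypothetical, for illustration only."""
    return tp / (tp + fn), tn / (tn + fp)

sens, spec = sensitivity_specificity(tp=90, fn=10, tn=95, fp=5)
print(sens, spec)   # 0.9 0.95
```
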


An automated lung nodule detection system can help spot lung abnormalities in CT lung images. Lung nodule detection can be achieved using template-based, segmentation-based, and classification-based methods; existing systems that include a classification component have demonstrated better performance than their counterparts. Ensemble learners combine the decisions of multiple classifiers into an integrated output. To improve the performance of automated lung nodule detection, an ensemble classification-aided-by-clustering (CAC) method is proposed. The method takes advantage of the random forest algorithm and offers a structure for hybrid random-forest-based lung nodule classification aided by clustering. Several experiments were carried out involving the proposed method and two existing methods, with classifier parameters varied to identify the best-performing configurations. The experiments used lung scans of 32 patients comprising 5721 images in which nodule locations were marked by expert radiologists. Overall, the best sensitivity of 98.33% and specificity of 97.11% were recorded for the proposed system, along with a high receiver operating characteristic (ROC) Az of 0.9786.