923 results for classification aided by clustering
Abstract:
Principally exercises.
Abstract:
Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns, benzodiazepines being one example, have shown broad binding capacity to multiple different receptors. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi 2, psi 2, phi 3 and psi 3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb), with Type IV representing unclassified beta-turns. Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics, as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions of these turns, as defined by C-alpha-C-beta vectors, have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cover 90% of the data, and the average intra-cluster RMSD of the four C-alpha-C-beta vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
Abstract:
In a first step toward understanding the molecular basis of pineapple fruit development, a sequencing project was initiated to survey a range of expressed sequences from green unripe and yellow ripe fruit tissue. A highly abundant metallothionein transcript was identified during library construction, and was estimated to account for up to 50% of all EST library clones. Library clones with metallothionein subtracted were sequenced, and 408 unripe green and 1140 ripe yellow edited EST clone sequences were retrieved. Clone redundancy was high, with the combined 1548 clone sequences clustering into just 634 contigs comprising 191 consensus sequences and 443 singletons. Half of the EST clone sequences clustered within 13.5% and 9.3% of contigs from the green unripe and yellow ripe libraries, respectively, indicating that a small subset of genes dominates the transcriptome. Furthermore, sequence cluster analysis, northern analysis, and functional classification revealed major differences between genes expressed in the unripe green and ripe yellow fruit tissues. Abundant genes identified from the green fruit include a fruit bromelain and a bromelain inhibitor. Abundant genes identified in the yellow fruit library include a MADS box gene and several genes normally associated with protein synthesis, including homologues of ribosomal L10 and the translation factors SUI1 and eIF5A. Both the green unripe and yellow ripe libraries contained high proportions of clones associated with oxidative stress responses and the detoxification of free radicals.
Abstract:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
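The completeness and efficiency trade-off mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual astronomical reading of the two terms, which the abstract itself does not define: completeness is the fraction of true matches that are selected, and efficiency is the fraction of selected candidates that are true matches. The function name, the probabilities, and the labels below are hypothetical and are not taken from the paper.

```python
def completeness_efficiency(probs, is_match, threshold):
    """Select candidates with match probability >= threshold and score them.

    Assumed definitions (not stated in the abstract):
      completeness = selected true matches / all true matches
      efficiency   = selected true matches / all selected candidates
    """
    selected = [p >= threshold for p in probs]
    true_selected = sum(1 for s, t in zip(selected, is_match) if s and t)
    n_selected = sum(selected)
    n_true = sum(is_match)
    completeness = true_selected / n_true if n_true else 0.0
    # With nothing selected there are no false matches; report 1.0 by convention.
    efficiency = true_selected / n_selected if n_selected else 1.0
    return completeness, efficiency


# Hypothetical classifier outputs and ground truth for five candidate matches.
probs = [0.9, 0.8, 0.6, 0.3, 0.1]
is_match = [True, True, False, True, False]

for threshold in (0.2, 0.5, 0.85):
    print(threshold, completeness_efficiency(probs, is_match, threshold))
```

Raising the threshold in this toy example trades completeness for efficiency, which is the kind of control a probabilistic output makes possible.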
Abstract:
The most common human cancers are malignant neoplasms of the skin(1,2). Incidence of cutaneous melanoma is rising especially steeply, with minimal progress in non-surgical treatment of advanced disease(3,4). Despite significant effort to identify independent predictors of melanoma outcome, no accepted histopathological, molecular or immunohistochemical marker defines subsets of this neoplasm(2,3). Accordingly, though melanoma is thought to present with different 'taxonomic' forms, these are considered part of a continuous spectrum rather than discrete entities(2). Here we report the discovery of a subset of melanomas identified by mathematical analysis of gene expression in a series of samples. Remarkably, many genes underlying the classification of this subset are differentially regulated in invasive melanomas that form primitive tubular networks in vitro, a feature of some highly aggressive metastatic melanomas(5). Global transcript analysis can identify unrecognized subtypes of cutaneous melanoma and predict experimentally verifiable phenotypic characteristics that may be of importance to disease progression.
Abstract:
In this work, a microchanneled chirped fiber Bragg grating (MCFBG) is proposed and fabricated through femtosecond laser-assisted chemical etching. The microchannel (~550 µm) gives access to the external index liquid, thus inducing refractive index (RI) sensitivity in the structure. In the experiment, the transmission bands induced by the reduced effective index in the microchannel region were used to sense changes in the surrounding RI and temperature. The experimental results show good agreement with the theoretical analysis. The proposed MCFBG offers enhanced RI sensitivity without degrading the robustness of the device, showing good application potential for bio-chemical sensors.
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
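As a rough illustration of an entropy built from co-occurrences, the sketch below counts pairs of marked points (order 2 only) that lie within a chosen distance of each other, treats the counts per mark combination as a multinomial distribution, and takes its Shannon entropy. The abstract does not give the paper's formula, so the choice of plain Shannon entropy, the fixed radius, and the function name `cooccurrence_entropy` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, marks, radius):
    """Shannon entropy (in nats) of order-2 mark co-occurrences.

    Two points co-occur when their Euclidean distance is <= radius.
    Each co-occurrence is categorised by its unordered pair of marks.
    This is an assumed, simplified reading of the abstract, not the
    authors' published measure.
    """
    counts = Counter()
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= radius:
            counts[tuple(sorted((marks[i], marks[j])))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0, counts
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy, counts


# Hypothetical marked point pattern: two close pairs far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
marks = ["a", "b", "a", "a"]
entropy, counts = cooccurrence_entropy(points, marks, radius=1.5)
print(entropy, dict(counts))
```

Here the two co-occurring pairs fall into two different mark combinations, so the entropy equals ln 2; a pattern in which every nearby pair shared the same combination would give zero.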
Abstract:
The clustering pattern of diffuse, primitive and classic β-amyloid (Aβ) deposits was studied in the upper laminae of the frontal cortex of 9 patients with sporadic Alzheimer's disease (AD). Aβ stained tissue was counterstained with collagen type IV antiserum to determine whether the clusters of Aβ deposits were related to blood vessels. In all patients, Aβ deposits and blood vessels were clustered, and in many patients the clusters showed a regular periodicity along the cortex parallel to the pia. The classic Aβ deposit clusters coincided with those of the larger blood vessels in all patients and with clusters of smaller blood vessels in 4 patients. Diffuse deposit clusters were related to blood vessels in 3 patients. Primitive deposit clusters were either unrelated to or negatively correlated with the blood vessels in 6 patients. Hence, Aβ deposit subtypes differ in their relationship to blood vessels. The data suggest a direct and specific role for the larger blood vessels in the formation of amyloid cores in AD. © 1995.
Abstract:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
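The fuzzy c-means step described in this abstract can be sketched as follows. This is a generic NumPy implementation of the standard algorithm, not the authors' code; it assumes the ordination scores (for example, DECORANA axis values) are already available as a numeric array, and the fuzzifier `m = 2`, the function name, and the sample data are illustrative choices.

```python
import numpy as np


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Basic fuzzy c-means; returns (centroids, memberships).

    memberships[i, k] is the degree to which sample i belongs to
    cluster k; each row sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised per sample.
    u = rng.random((points.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centroids are membership-weighted means of the samples.
        w = u ** m
        centroids = (w.T @ points) / w.sum(axis=0)[:, None]
        # Distance from every sample to every centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        # Standard membership update: proportional to d^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centroids, u


# Hypothetical ordination scores: two groups of samples on two axes.
data_rng = np.random.default_rng(1)
scores = np.vstack([
    data_rng.normal(0.0, 0.1, size=(20, 2)),
    data_rng.normal(5.0, 0.1, size=(20, 2)),
])
centroids, memberships = fuzzy_c_means(scores, n_clusters=2)
print(memberships[:3].round(3))
```

Unlike a hard (Boolean) classification, each sample keeps a graded membership in every community, which is how communities that grade into one another can be represented.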
Abstract:
User queries over image collections, based on semantic similarity, can be processed in several ways. In this paper, we propose to reuse the rules produced by rule-based classifiers in their recognition models as query pattern definitions for searching image collections.