968 resultados para Unsupervised classification
Resumo:
Protein Kinase-Like Non-kinases (PKLNKs), which are closely related to protein kinases, lack the crucial catalytic aspartate in the catalytic loop, and hence cannot function as protein kinase, have been analysed. Using various sensitive sequence analysis methods, we have recognized 82 PKLNKs from four higher eukaryotic organisms, namely, Homo sapiens, Mus musculus, Rattus norvegicus, and Drosophila melanogaster. On the basis of their domain combination and function, PKLNKs have been classified mainly into four categories: (1) Ligand binding PKLNKs, (2) PKLNKs with extracellular protein-protein interaction domain, (3) PKLNKs involved in dimerization, and (4) PKLNKs with cytoplasmic protein-protein interaction module. While members of the first two classes of PKLNKs have transmembrane domain tethered to the PKLNK domain, members of the other two classes of PKLNKs are cytoplasmic in nature. The current classification scheme hopes to provide a convenient framework to classify the PKLNKs from other eukaryotes which would be helpful in deciphering their roles in cellular processes.
Resumo:
Elephants use vocalizations for both long and short distance communication. Whereas the acoustic repertoire of the African elephant (Loxodonta africana) has been extensively studied in its savannah habitat, very little is known about the structure and social context of the vocalizations of the Asian elephant (Elephas maximus), which is mostly found in forests. In this study, the vocal repertoire of wild Asian elephants in southern India was examined. The calls could be classified into four mutually exclusive categories, namely, trumpets, chirps, roars, and rumbles, based on quantitative analyses of their spectral and temporal features. One of the call types, the rumble, exhibited high structural diversity, particularly in the direction and extent of frequency modulation of calls. Juveniles produced three of the four call types, including trumpets, roars, and rumbles, in the context of play and distress. Adults produced trumpets and roars in the context of disturbance, aggression, and play. Chirps were typically produced in situations of confusion and alarm. Rumbles were used for contact calling within and among herds, by matriarchs to assemble the herd, in close-range social interactions, and during disturbance and aggression. Spectral and temporal features of the four call types were similar between Asian and African elephants.
Pi-turns in proteins and peptides: Classification, conformation, occurrence, hydration and sequence.
Resumo:
The i + 5-->i hydrogen bonded turn conformation (pi-turn) with the fifth residue adopting alpha L conformation is frequently found at the C-terminus of helices in proteins and hence is speculated to be a "helix termination signal." An analysis of the occurrence of i + 5-->i hydrogen bonded turn conformation at any general position in proteins (not specifically at the helix C-terminus), using coordinates of 228 protein crystal structures determined by X-ray crystallography to better than 2.5 A resolution is reported in this paper. Of 486 detected pi-turn conformations, 367 have the (i + 4)th residue in alpha L conformation, generally occurring at the C-terminus of alpha-helices, consistent with previous observations. However, a significant number (111) of pi-turn conformations occur with (i + 4)th residue in alpha R conformation also, generally occurring in alpha-helices as distortions either at the terminii or at the middle, a novel finding. These two sets of pi-turn conformations are referred to by the names pi alpha L and pi alpha R-turns, respectively, depending upon whether the (i + 4)th residue adopts alpha L or alpha R conformations. Four pi-turns, named pi alpha L'-turns, were noticed to be mirror images of pi alpha L-turns, and four more pi-turns, which have the (i + 4)th residue in beta conformation and denoted as pi beta-turns, occur as a part of hairpin bend connecting twisted beta-strands. Consecutive pi-turns occur, but only with pi alpha R-turns. The preference for amino acid residues is different in pi alpha L and pi alpha R-turns. However, both show a preference for Pro after the C-termini. Hydrophilic residues are preferred at positions i + 1, i + 2, and i + 3 of pi alpha L-turns, whereas positions i and i + 5 prefer hydrophobic residues. Residue i + 4 in pi alpha L-turns is mainly Gly and less often Asn. Although pi alpha R-turns generally occur as distortions in helices, their amino acid preference is different from that of helices. Poor helix formers, such as His, Tyr, and Asn, also were found to be preferred for pi alpha R-turns, whereas good helix former Ala is not preferred. pi-Turns in peptides provide a picture of the pi-turn at atomic resolution. Only nine peptide-based pi-turns are reported so far, and all of them belong to pi alpha L-turn type with an achiral residue in position i + 4. The results are of importance for structure prediction, modeling, and de novo design of proteins.
Resumo:
Context: Pheochromocytomas and paragangliomas (PPGLs) are heritable neoplasms that can be classified into gene-expression subtypes corresponding to their underlying specific genetic drivers. Objective: This study aimed to develop a diagnostic and research tool (Pheo-type) capable of classifying PPGL tumors into gene-expression subtypes that could be used to guide and interpret genetic testing, determine surveillance programs, and aid in elucidation of PPGL biology. Design: A compendium of published microarray data representing 205 PPGL tumors was used for the selection of subtype-specific genes that were then translated to the Nanostring gene-expression platform. A support vector machine was trained on the microarray dataset and then tested on an independent Nanostring dataset representing 38 familial and sporadic cases of PPGL of known genotype (RET, NF1, TMEM127, MAX, HRAS, VHL, and SDHx). Different classifier models involving between three and six subtypes were compared for their discrimination potential. Results: A gene set of 46 genes and six endogenous controls was selected representing six known PPGL subtypes; RTK1–3 (RET, NF1, TMEM127, and HRAS), MAX-like, VHL, and SDHx. Of 38 test cases, 34 (90%) were correctly predicted to six subtypes based on the known genotype to gene-expression subtype association. Removal of the RTK2 subtype from training, characterized by an admixture of tumor and normal adrenal cortex, improved the classification accuracy (35/38). Consolidation of RTK and pseudohypoxic PPGL subtypes to four- and then three-class architectures improved the classification accuracy for clinical application. Conclusions: The Pheo-type gene-expression assay is a reliable method for predicting PPGL genotype using routine diagnostic tumor samples.
Resumo:
Climate change contributes directly or indirectly to changes in species distributions, and there is very high confidence that recent climate warming is already affecting ecosystems. The Arctic has already experienced the greatest regional warming in recent decades, and the trend is continuing. However, studies on the northern ecosystems are scarce compared to more southerly regions. Better understanding of the past and present environmental change is needed to be able to forecast the future. Multivariate methods were used to explore the distributional patterns of chironomids in 50 shallow (≤ 10m) lakes in relation to 24 variables determined in northern Fennoscandia at the ecotonal area from the boreal forest in the south to the orohemiarctic zone in the north. Highest taxon richness was noted at middle elevations around 400 m a.s.l. Significantly lower values were observed from cold lakes situated in the tundra zone. Lake water alkalinity had the strongest positive correlation with the taxon richness. Many taxa had preference for lakes either on tundra area or forested area. The variation in the chironomid abundance data was best correlated with sediment organic content (LOI), lake water total organic carbon content, pH and air temperature, with LOI being the strongest variable. Three major lake groups were separated on the basis of their chironomid assemblages: (i) small and shallow organic-rich lakes, (ii) large and base-rich lakes, and (iii) cold and clear oligotrophic tundra lakes. Environmental variables best discriminating the lake groups were LOI, taxon richness, and Mg. When repeated, this kind of an approach could be useful and efficient in monitoring the effects of global change on species ranges. Many species of fast spreading insects, including chironomids, show a remarkable ability to track environmental changes. Based on this ability, past environmental conditions have been reconstructed using their chitinous remains in the lake sediment profiles. In order to study the Holocene environmental history of subarctic aquatic systems, and quantitatively reconstruct the past temperatures at or near the treeline, long sediment cores covering the last 10000 years (the Holocene) were collected from three lakes. Lower temperature values than expected based on the presence of pine in the catchment during the mid-Holocene were reconstructed from a lake with great water volume and depth. The lake provided thermal refuge for profundal, cold adapted taxa during the warm period. In a shallow lake, the decrease in the reconstructed temperatures during the late Holocene may reflect the indirect response of the midges to climate change through, e.g., pH change. The results from three lakes indicated that the response of chironomids to climate have been more or less indirect. However, concurrent shifts in assemblages of chironomids and vegetation in two lakes during the Holocene time period indicated that the midges together with the terrestrial vegetation had responded to the same ultimate cause, which most likely was the Holocene climate change. This was also supported by the similarity in the long-term trends in faunal succession for the chironomid assemblages in several lakes in the area. In northern Finnish Lapland the distribution of chironomids were significantly correlated with physical and limnological factors that are most likely to change as a result of future climate change. The indirect and individualistic response of aquatic systems, as reconstructed using the chironomid assemblages, to the climate change in the past suggests that in the future, the lake ecosystems in the north do not respond in one predictable way to the global climate change. Lakes in the north may respond to global climate change in various ways that are dependent on the initial characters of the catchment area and the lake.
Resumo:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Resumo:
Acoustics is a rich source of environmental information that can reflect the ecological dynamics. To deal with the escalating acoustic data, a variety of automated classification techniques have been used for acoustic patterns or scene recognition, including urban soundscapes such as streets and restaurants; and natural soundscapes such as raining and thundering. It is common to classify acoustic patterns under the assumption that a single type of soundscapes present in an audio clip. This assumption is reasonable for some carefully selected audios. However, only few experiments have been focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a binary relevance based multi-label classification approach to recognise simultaneous acoustic patterns in one-minute audio clips. By utilising acoustic indices as global features and multilayer perceptron as a base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, multi-label classification approach provides more detailed information about the distributions of various acoustic patterns in long-duration recordings. These results will merit further biodiversity investigations, such as bird species surveys.
Resumo:
The present study deals with the application of cluster analysis, Fuzzy Cluster Analysis (FCA) and Kohonen Artificial Neural Networks (KANN) methods for classification of 159 meteorological stations in India into meteorologically homogeneous groups. Eight parameters, namely latitude, longitude, elevation, average temperature, humidity, wind speed, sunshine hours and solar radiation, are considered as the classification criteria for grouping. The optimal number of groups is determined as 14 based on the Davies-Bouldin index approach. It is observed that the FCA approach performed better than the other two methodologies for the present study.
Resumo:
In this paper we propose a novel family of kernels for multivariate time-series classification problems. Each time-series is approximated by a linear combination of piecewise polynomial functions in a Reproducing Kernel Hilbert Space by a novel kernel interpolation technique. Using the associated kernel function a large margin classification formulation is proposed which can discriminate between two classes. The formulation leads to kernels, between two multivariate time-series, which can be efficiently computed. The kernels have been successfully applied to writer independent handwritten character recognition.
Resumo:
We compared student performance on large-scale take-home assignments and small-scale invigilated tests that require competency with exactly the same programming concepts. The purpose of the tests, which were carried out soon after the take home assignments were submitted, was to validate the students' assignments as individual work. We found widespread discrepancies between the marks achieved by students between the two types of tasks. Many students were able to achieve a much higher grade on the take-home assignments than the invigilated tests. We conclude that these paired assessments are an effective way to quickly identify students who are still struggling with programming concepts that we might otherwise assume they understand, given their ability to complete similar, yet more complicated, tasks in their own time. We classify these students as not yet being at the neo-Piagetian stage of concrete operational reasoning.
Resumo:
The introduction of casemix funding for Australian acute health care services has challenged Social Work to demonstrate clear reporting mechanisms, demonstrate effective practice and to justify interventions provided. The term 'casemix' is used to describe the mix and type of patients treated by a hospital or other health care services. There is wide acknowledgement that the procedure-based system of Diagnosis Related Groupings (DRGs) is grounded in a medical/illness perspective and is unsatisfactory in describing and predicting the activity of Social Work and other allied health professions in health care service delivery. The National Allied Health Casemix Committee was established in 1991 as the peak body to represent allied health professions in matters related to casemix classification. This Committee has pioneered a nationally consistent, patient-centred information system for allied health. This paper describes the classification systems and codes developed for Social Work, which includes a minimum data set, a classification hierarchy, the set of activity (input) codes and 'indicator for intervention' codes. The advantages and limitations of the system are also discussed.
Resumo:
Evaluation of intermolecular interactions in terms of both experimental and theoretical charge density analyses has produced a unified picture with which to classify strong and weak hydrogen bonds, along with van der Waals interactions, into three regions.
Resumo:
Background:Overwhelming majority of the Serine/Threonine protein kinases identified by gleaning archaeal and eubacterial genomes could not be classified into any of the well known Hanks and Hunter subfamilies of protein kinases. This is owing to the development of Hanks and Hunter classification scheme based on eukaryotic protein kinases which are highly divergent from their prokaryotic homologues. A large dataset of prokaryotic Serine/Threonine protein kinases recognized from genomes of prokaryotes have been used to develop a classification framework for prokaryotic Ser/Thr protein kinases. Methodology/Principal Findings: We have used traditional sequence alignment and phylogenetic approaches and clustered the prokaryotic kinases which represent 72 subfamilies with at least 4 members in each. Such a clustering enables classification of prokaryotic Ser/Thr kinases and it can be used as a framework to classify newly identified prokaryotic Ser/Thr kinases. After series of searches in a comprehensive sequence database we recognized that 38 subfamilies of prokaryotic protein kinases are associated to a specific taxonomic level. For example 4, 6 and 3 subfamilies have been identified that are currently specific to phylum proteobacteria, cyanobacteria and actinobacteria respectively. Similarly subfamilies which are specific to an order, sub-order, class, family and genus have also been identified. In addition to these, we also identify organism-diverse subfamilies. Members of these clusters are from organisms of different taxonomic levels, such as archaea, bacteria, eukaryotes and viruses.Conclusion/Significance: Interestingly, occurrence of several taxonomic level specific subfamilies of prokaryotic kinases contrasts with classification of eukaryotic protein kinases in which most of the popular subfamilies of eukaryotic protein kinases occur diversely in several eukaryotes. Many prokaryotic Ser/Thr kinases exhibit a wide variety of modular organization which indicates a degree of complexity and protein-protein interactions in the signaling pathways in these microbes.
Resumo:
This paper aims at evaluating the methods of multiclass support vector machines (SVMs) for effective use in distance relay coordination. Also, it describes a strategy of supportive systems to aid the conventional protection philosophy in combating situations where protection systems have maloperated and/or information is missing and provide selective and secure coordinations. SVMs have considerable potential as zone classifiers of distance relay coordination. This typically requires a multiclass SVM classifier to effectively analyze/build the underlying concept between reach of different zones and the apparent impedance trajectory during fault. Several methods have been proposed for multiclass classification where typically several binary SVM classifiers are combined together. Some authors have extended binary SVM classification to one-step single optimization operation considering all classes at once. In this paper, one-step multiclass classification, one-against-all, and one-against-one multiclass methods are compared for their performance with respect to accuracy, number of iterations, number of support vectors, training, and testing time. The performance analysis of these three methods is presented on three data sets belonging to training and testing patterns of three supportive systems for a region and part of a network, which is an equivalent 526-bus system of the practical Indian Western grid.