949 results for classification scheme
Abstract:
The usefulness of computers in schools and vocational training has been undisputed for several years. There is currently disagreement, however, about which tasks computers can perform on their own. When the assumption of teaching functions by computer-based teaching systems is evaluated, shortcomings frequently have to be noted. Starting from current practical implementations of computer-based teaching systems, the aim of this work is to identify different classes of core teaching competencies (student modelling, subject-matter knowledge, and instructional activities in the narrower sense). Within each class, the global capabilities of the teaching systems are determined, together with the necessary, complementary activities of human tutors. The resulting classification scheme makes it possible both to categorize typical teaching systems and to identify specific competencies that should receive more attention in teacher and trainer education in the future. (DIPF/Orig.)
Abstract:
Global land cover maps play an important role in understanding the dynamics of the Earth's ecosystems. Several global land cover maps have been produced recently, namely the Global Land Cover Share (GLC-Share) and GlobeLand30. These datasets are very useful sources of land cover information, and potential users and producers are often interested in comparing them. However, these global land cover maps are produced with different techniques and use different classification schemes, which makes their standardized interoperability a challenge. The Environmental Information and Observation Network (EIONET) Action Group on Land Monitoring in Europe (EAGLE) concept was developed to translate the differences between classification schemes into a standardized format that allows class definitions to be compared. This is done by elaborating an EAGLE matrix for each classification scheme, in which a bar code is assigned to each element of the class definition that makes up a given land cover class. Ahlqvist (2005) developed an overlap metric to cope with the semantic uncertainty of geographical concepts, thereby providing a measure of how closely geographical concepts are related to each other. In this paper, global land cover datasets are compared by translating each land cover legend into the EAGLE bar coding for the Land Cover Components of the EAGLE matrix. The bar-coding values assigned to each class definition are transformed into a fuzzy function that is used to compute the overlap metric proposed by Ahlqvist (2005), and overlap matrices between land cover legends are elaborated. The overlap matrices allow a semantic comparison between the classification schemes of the global land cover maps. The proposed methodology is tested in a case study in which the overlap metric proposed by Ahlqvist (2005) is computed to compare two global land cover maps for Continental Portugal.
The study produced the spatial distribution of overlap between the two global land cover maps, GlobeLand30 and GLC-Share. The results show that, in Continental Portugal, the GlobeLand30 product overlaps with the GLC-Share product to a degree of 77%.
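The overlap computation can be illustrated with a small sketch. It assumes that each land cover class is represented as a vector of fuzzy membership values (0 to 1) over the same ordered list of EAGLE Land Cover Components, and it uses a common fuzzy-set inclusion ratio: the sigma-count of the min-intersection divided by the sigma-count of the first class. This ratio is asymmetric and may differ in detail from the exact formulation in Ahlqvist (2005); the component names, class names, and membership values below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical ordering of EAGLE Land Cover Components shared by every class vector.
COMPONENTS = ["trees", "shrubs", "herbaceous", "water"]


def fuzzy_overlap(a: List[float], b: List[float]) -> float:
    """Degree to which fuzzy class `a` is contained in fuzzy class `b`.

    Computed as sum(min(a_i, b_i)) / sum(a_i); the result lies in [0, 1]
    and is not symmetric in general.
    """
    if len(a) != len(b):
        raise ValueError("class vectors must cover the same components")
    size_a = sum(a)
    if size_a == 0:
        raise ValueError("class `a` has no non-zero membership")
    intersection = sum(min(x, y) for x, y in zip(a, b))
    return intersection / size_a


def overlap_matrix(
    legend_a: Dict[str, List[float]], legend_b: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Overlap of every class in legend_a with every class in legend_b."""
    return {
        (name_a, name_b): fuzzy_overlap(vec_a, vec_b)
        for name_a, vec_a in legend_a.items()
        for name_b, vec_b in legend_b.items()
    }


if __name__ == "__main__":
    # Hypothetical legends; membership values are illustrative only.
    legend_1 = {"forest": [1.0, 0.3, 0.1, 0.0], "water": [0.0, 0.0, 0.0, 1.0]}
    legend_2 = {"tree_cover": [0.9, 0.2, 0.0, 0.0], "shrubland": [0.2, 1.0, 0.4, 0.0]}
    for (a, b), value in overlap_matrix(legend_1, legend_2).items():
        print(f"{a:>8} -> {b:<10} {value:.2f}")
```

Because the ratio is asymmetric, an overlap matrix built this way generally differs from its transpose, so the direction of comparison (legend A into legend B or the reverse) has to be stated.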
Abstract:
Subject ontogeny is the life of a subject in an indexing language (e.g., a classification scheme such as the DDC). Examining how a subject is treated over time tells us about the anatomy of an indexing language. For example, the subject of gypsies has been handled differently in different editions of the DDC.
Abstract:
The European Nature Information System (EUNIS) has been implemented to establish an inventory of European marine habitats. Its hierarchical classification is based on environmental variables that primarily constrain biological communities (e.g., substrate type, sea energy level, depth, and light penetration). The EUNIS habitat classification scheme relies on thresholds (e.g., fraction of light and energy) that are based on expert judgment or on empirical analysis of these environmental data. The present paper proposes to establish and validate appropriate thresholds for energy classes (high, moderate, and low) and for subtidal biological zonation (infralittoral and circalittoral) suitable for the EUNIS habitat classification of the Western Iberian coast. Kinetic wave-induced energy exerted on the seabed was related to the presence of kelp (Saccorhiza polyschides, Laminaria hyperborea, and Laminaria ochroleuca), and the fraction of photosynthetically available light reaching the seabed was related to the presence of seaweed species in general. Both datasets were described statistically and ordered from largest to smallest, and percentile analyses were performed on each independently. The threshold between the infralittoral and circalittoral zones was based on the first quartile, while the ‘moderate energy’ class was defined as lying between the 12.5th and 87.5th percentiles. A bootstrap technique was applied to avoid dependence of the results on sampling locations and to assess the confidence interval. According to this analysis, more than 75% of seaweeds are present at locations where more than 3.65% of the surface light reaches the sea bottom. The range of energy levels estimated from S. polyschides data indicates that on the Iberian West coast the ‘moderate energy’ areas lie between 0.00303 and 0.04385 N/m² of wave-induced energy. The lack of agreement between studies in different regions of Europe suggests the need for more standardization in the future.
Nevertheless, the thresholds obtained in the present study will be very useful in the near future for implementing and establishing the Iberian EUNIS habitats inventory.
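The percentile thresholds and their bootstrap confidence interval can be sketched as follows. This is a minimal illustration, assuming linear interpolation between order statistics for percentiles and a simple percentile-bootstrap interval; the paper's exact percentile definition and bootstrap settings are not given, and the station values in the usage block are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sample")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_percentile(
    values: Sequence[float], q: float, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    data = list(values)
    estimates: List[float] = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # resample stations with replacement
        estimates.append(percentile(resample, q))
    low = percentile(estimates, 100 * alpha / 2)
    high = percentile(estimates, 100 * (1 - alpha / 2))
    return percentile(data, q), low, high


if __name__ == "__main__":
    # Hypothetical light fractions (% of surface light) at stations with seaweeds.
    light = [0.5, 1.2, 2.8, 3.1, 3.9, 4.4, 6.0, 7.5, 9.8, 12.0, 15.3, 21.7]
    point, low, high = bootstrap_percentile(light, 25)
    print(f"infralittoral/circalittoral threshold: {point:.2f}% (95% CI {low:.2f} to {high:.2f})")

    # Hypothetical wave-induced energy values at kelp stations.
    energy = [0.001, 0.002, 0.004, 0.006, 0.01, 0.015, 0.02, 0.03, 0.045, 0.06]
    print("moderate energy band:", percentile(energy, 12.5), "to", percentile(energy, 87.5))
```

Resampling stations rather than individual measurements is one way to reduce dependence on particular sampling locations; a block or spatial bootstrap would be a stricter alternative if nearby stations are correlated.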
Abstract:
This paper reviews the ways in which quality can be assessed in standing waters, a subject that has so far attracted little attention but is now a legal requirement in Europe. It describes a scheme for assessing and monitoring water and ecological quality in standing waters larger than about 1 ha in England and Wales, although it is broadly relevant to north-west Europe. Thirteen hydrological, chemical, and biological variables are used to characterise the standing water body at any one sampling: lake volume, maximum depth, conductivity, Secchi disc transparency, pH, total alkalinity, calcium ion concentration, total N concentration, winter total oxidised inorganic nitrogen (effectively nitrate) concentration, total P concentration, potential maximum chlorophyll a concentration, a score based on the nature of the submerged and emergent plant community, and the presence or absence of a fish community. Among other things, these variables are key indicators of the state of eutrophication, acidification, salinisation, and infilling of a water body.
Abstract:
In this paper we discuss the temporal aspects of indexing and classification in information systems. We base this discussion on three sources of research on scheme change in indexing: (1) analytical research on the types of scheme change, (2) empirical data on scheme change in systems, and (3) evidence of cataloguer decision-making in the context of scheme change. From this general discussion, we propose two constructs along which metrics for measuring scheme change might be crafted: collocative integrity and semantic gravity. The paper closes with a discussion of these constructs.
Abstract:
The statistical minimum-risk pattern recognition problem is considered for the case in which the classification costs are random variables with unknown statistics. Using medical diagnosis as a possible application, the problem of learning the optimal decision scheme is studied, as a first step, for a two-class, two-action case. This reduces to learning the optimum threshold (for taking the appropriate action) on the a posteriori probability of one class. A recursive procedure for updating an estimate of the threshold is proposed. The estimation procedure does not require knowledge of the actual class labels of the sample patterns in the design set. The adaptive scheme, which uses the current threshold estimate to take action on the next sample, is shown to converge in probability to the optimum. The results of a computer simulation study of three learning schemes demonstrate the salient features of the adaptive scheme predicted by theory.
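Why a threshold on the posterior suffices can be seen directly. With classes ω1, ω2, actions α1, α2, and random costs c_ij (the cost of action αi when the true class is ωj), action α1 has expected risk p·E[c11] + (1 − p)·E[c12] and α2 has p·E[c21] + (1 − p)·E[c22], where p = P(ω1 | x). Writing A = c12 − c22 and B = c21 − c11, and assuming E[A] > 0 and E[B] > 0, α1 is preferred exactly when p > t = E[A] / (E[A] + E[B]). The sketch below estimates t recursively with running means and acts on each new sample using the current estimate. It is a simple stochastic-approximation illustration that assumes both cost differences are observed for every sample; it is not the paper's specific update rule, and the uniform cost distributions are hypothetical.

```python
import random
from typing import Callable, Tuple


class RecursiveThreshold:
    """Running estimate of t = E[A] / (E[A] + E[B]) with A = c12 - c22, B = c21 - c11."""

    def __init__(self, initial: float = 0.5) -> None:
        self.n = 0
        self.mean_a = 0.0
        self.mean_b = 0.0
        self.initial = initial

    def update(self, a: float, b: float) -> None:
        # Recursive mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
        self.n += 1
        self.mean_a += (a - self.mean_a) / self.n
        self.mean_b += (b - self.mean_b) / self.n

    @property
    def threshold(self) -> float:
        denom = self.mean_a + self.mean_b
        if self.n == 0 or denom <= 0:
            return self.initial  # no usable cost information yet
        return self.mean_a / denom

    def act(self, posterior: float) -> int:
        """Return 1 (action alpha_1) if P(omega_1 | x) exceeds the current threshold, else 2."""
        return 1 if posterior > self.threshold else 2


def simulate(
    n_samples: int, draw_costs: Callable[[random.Random], Tuple[float, float]], seed: int = 0
) -> RecursiveThreshold:
    rng = random.Random(seed)
    est = RecursiveThreshold()
    for _ in range(n_samples):
        posterior = rng.random()  # hypothetical posterior for the next sample
        est.act(posterior)        # adaptive scheme: act with the current estimate first
        a, b = draw_costs(rng)    # then observe the random cost differences
        est.update(a, b)
    return est


if __name__ == "__main__":
    # Hypothetical costs: A ~ U(0, 2), B ~ U(0, 6), so the optimum is 1 / (1 + 3) = 0.25.
    est = simulate(20000, lambda r: (r.uniform(0, 2), r.uniform(0, 6)))
    print(f"estimated threshold: {est.threshold:.3f}")
```

Note that no class labels enter the update; only the posterior (for acting) and the observed costs are used, which mirrors the label-free property described in the abstract without claiming to reproduce its procedure.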
Abstract:
Classification of large datasets is a challenging task in data mining. In this work, we propose a novel method that compresses the data and classifies test data directly in their compressed form. The work is a hybrid learning approach that integrates data abstraction, frequent-item generation, compression, classification, and rough sets.
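One way to classify data directly in compressed form is to run-length encode binary feature vectors and compute dissimilarities on the runs without decompressing them. The sketch below shows only that part: run-length encoding, a Hamming distance computed run by run, and a 1-nearest-neighbour rule. The data abstraction, frequent-item generation, and rough-set steps of the abstract are not represented, and all function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Runs = List[Tuple[int, int]]  # (bit value, run length)


def rle_encode(bits: Sequence[int]) -> Runs:
    runs: List[List[int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(value, length) for value, length in runs]


def rle_hamming(a: Runs, b: Runs) -> int:
    """Hamming distance between two equal-length bit vectors, computed on their runs."""
    if sum(n for _, n in a) != sum(n for _, n in b):
        raise ValueError("encoded vectors must have the same length")
    i = j = 0
    left_a = a[0][1] if a else 0
    left_b = b[0][1] if b else 0
    dist = 0
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)  # length over which both current runs overlap
        if a[i][0] != b[j][0]:
            dist += step
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            if i < len(a):
                left_a = a[i][1]
        if left_b == 0:
            j += 1
            if j < len(b):
                left_b = b[j][1]
    return dist


def classify_1nn(query: Runs, training: List[Tuple[Runs, str]]) -> str:
    """Label of the nearest compressed training pattern."""
    return min(training, key=lambda item: rle_hamming(query, item[0]))[1]


if __name__ == "__main__":
    train = [(rle_encode([1, 1, 1, 0, 0, 0]), "A"), (rle_encode([0, 0, 0, 1, 1, 1]), "B")]
    print(classify_1nn(rle_encode([1, 1, 0, 0, 0, 0]), train))
```

The cost of `rle_hamming` grows with the number of runs rather than the number of bits, which is where the saving comes from when patterns contain long runs.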
Abstract:
Background: Protein phosphorylation is a generic way of regulating signal transduction pathways in all kingdoms of life. In many organisms, it is achieved by the large family of Ser/Thr/Tyr protein kinases, which are traditionally classified into groups and subfamilies on the basis of the amino acid sequences of their catalytic domains. Many protein kinases are multidomain proteins, but the diversity and organization of their accessory domains are usually not taken into account when classifying kinases into groups or subfamilies. Methodology: Here, we present an approach that considers the amino acid sequences of complete gene products in order to suggest refinements to sets of pre-classified sequences. The strategy is based on alignment-free similarity scores and iterative Area Under the Curve (AUC) computation. Similarity scores are computed by detecting common patterns between two sequences and scoring them with a substitution matrix, using a consistent normalization scheme. This allows us to handle full-length sequences and implicitly takes domain diversity and domain shuffling into account. We validate our approach quantitatively on a subset of 212 human protein kinases. We then apply it to the complete repertoire of human protein kinases and suggest a few qualitative refinements to the subfamily assignments stored in the KinG database, which are based on catalytic domains only. Based on our new measure, we delineate 37 potential hybrid kinases: sequences for which the classical classification, based entirely on catalytic domains, is inconsistent with the full-length similarity scores computed here, which implicitly consider multidomain architecture and regions outside the catalytic kinase domain. We also provide some examples of hybrid kinases from the protozoan parasite Entamoeba histolytica. Conclusions: The implicit consideration of multidomain architectures is a valuable complement to other classification schemes.
The proposed algorithm may also be used to classify other families of enzymes with multidomain architectures.
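To make the AUC step concrete, the sketch below scores one sequence against the other members of its assigned subfamily (positives) and against all remaining sequences (negatives), then computes the AUC as the probability that a random positive scores higher than a random negative, counting ties as one half. A low AUC would flag a sequence whose full-length similarities disagree with its current assignment. The similarity used here is a simple shared-k-mer score normalized by the geometric mean of the k-mer set sizes; it is a stand-in, not the substitution-matrix pattern scoring described in the abstract, and the sequences and labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    return {seq[i : i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(s1: str, s2: str, k: int = 3) -> float:
    """Shared k-mers divided by the geometric mean of the k-mer set sizes (0 to 1)."""
    a, b = kmers(s1, k), kmers(s2, k)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def auc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve."""
    if not positive or not negative:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


def query_auc(query_id: str, sequences: Dict[str, str], labels: Dict[str, str], k: int = 3) -> float:
    """AUC for one sequence: same-subfamily scores (positives) vs other-subfamily scores."""
    query = sequences[query_id]
    pos: List[float] = []
    neg: List[float] = []
    for other_id, seq in sequences.items():
        if other_id == query_id:
            continue
        score = kmer_similarity(query, seq, k)
        (pos if labels[other_id] == labels[query_id] else neg).append(score)
    return auc(pos, neg)


if __name__ == "__main__":
    seqs = {"k1": "MKVLAAGGST", "k2": "MKVLAAGGSA", "k3": "PQRWYHHNEE", "k4": "PQRWYHHNED"}
    labels = {"k1": "X", "k2": "X", "k3": "Y", "k4": "Y"}
    for sid in seqs:
        print(sid, query_auc(sid, seqs, labels))
```

Because the score is computed on whole sequences, accessory domains contribute to it, which is the property the abstract relies on; an iterative refinement would reassign low-AUC sequences and recompute.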
Abstract:
Our ability to infer protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but their performance is limited. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage of strong, weak, and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point-group symmetry within a Boolean framework to detect quaternary structures in the crystal lattice. It consistently achieves ≥90% coverage across diverse benchmarking datasets, including a notably superior 95% coverage in recognizing heteromeric complexes, compared with 53% on the same dataset for the current state-of-the-art method. A detailed study of a limited number of cases in which prediction failed offers interesting insights into the intriguing nature of protein contacts in lattices. The findings have implications for the accurate inference of quaternary states of proteins, especially weak-affinity complexes.
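The naive Bayes component can be sketched as a Gaussian naive Bayes classifier over per-interface features, whose output is then combined with a symmetry check by a Boolean AND. The abstract does not specify the features, the likelihood model, or how the point-group test is computed, so the features (for example, buried surface area in units of 1000 Å² and a hydrophobic fraction), the labels, and the `symmetry_consistent` flag below are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Features assumed independent and normally distributed within each class."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}
        for label, rows in groups.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # Small variance floor avoids division by zero for constant features.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                for col, m in zip(columns, means)
            ]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_joint(self, x: Sequence[float], label: str) -> float:
        """log P(label) + sum of per-feature Gaussian log-likelihoods."""
        log_prior, means, variances = self.params[label]
        total = log_prior
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total

    def predict(self, x: Sequence[float]) -> str:
        return max(self.params, key=lambda label: self.log_joint(x, label))


def is_biological_assembly(
    model: GaussianNaiveBayes, features: Sequence[float], symmetry_consistent: bool
) -> bool:
    """Hypothetical Boolean rule: classifier says 'biological' AND the symmetry check passes."""
    return model.predict(features) == "biological" and symmetry_consistent


if __name__ == "__main__":
    X = [[0.3, 0.2], [0.4, 0.25], [0.35, 0.3], [1.5, 0.6], [1.7, 0.55], [1.6, 0.65]]
    y = ["crystal"] * 3 + ["biological"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([1.65, 0.6]), is_biological_assembly(model, [1.65, 0.6], True))
```

Keeping the symmetry test as a separate Boolean input makes it easy to see which of the two criteria rejected a candidate assembly.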
Abstract:
A general analysis of squeezing transformations for two-mode systems is given, based on the four-dimensional real symplectic group Sp(4, R). Within the framework of the unitary (metaplectic) representation of this group, a distinction is made between compact, photon-number-conserving and noncompact, photon-number-nonconserving squeezing transformations. We exploit the U(2)-invariant squeezing criterion to divide the set of all squeezing transformations into a two-parameter family of distinct equivalence classes, with a representative element chosen for each class. Familiar two-mode squeezing transformations from the literature are recognized in our framework and are seen to form a set of measure zero. Examples of squeezed coherent and thermal states are worked out. The need to extend the heterodyne detection scheme to encompass all of U(2) is emphasized, and known experimental situations in which all U(2) elements can be reproduced are briefly described.
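For reference, the group and its compact subgroup can be written explicitly. The block below assumes real quadrature ordering (q1, q2, p1, p2) with commutators [q_j, p_k] = i δ_jk (ħ = 1), so the vacuum covariance matrix is I/2; sign and ordering conventions vary between papers, so these matrices are illustrative rather than the paper's own parametrization.

```latex
% Symplectic condition on real 4x4 matrices acting on (q_1, q_2, p_1, p_2)^T
S^{T}\,\Omega\,S = \Omega, \qquad
\Omega = \begin{pmatrix} 0 & I_2 \\ -I_2 & 0 \end{pmatrix}.

% Compact (photon-number-conserving) elements: Sp(4,R) \cap SO(4) \cong U(2)
S_U = \begin{pmatrix} X & Y \\ -Y & X \end{pmatrix}, \qquad
U = X + iY \in U(2)
\;\Longleftrightarrow\;
X^{T}X + Y^{T}Y = I_2, \quad X^{T}Y = Y^{T}X .

% A noncompact example: two-mode squeezing with parameter r
S(r) = \begin{pmatrix} A & 0 \\ 0 & A^{-T} \end{pmatrix}, \qquad
A = \begin{pmatrix} \cosh r & \sinh r \\ \sinh r & \cosh r \end{pmatrix}, \qquad
A^{-T} = \begin{pmatrix} \cosh r & -\sinh r \\ -\sinh r & \cosh r \end{pmatrix}.

% Covariance transformation: V \to S V S^{T}. Since S_U is orthogonal,
% every U(2) element leaves the eigenvalues of V unchanged.
```

Because S(r) is not orthogonal for r ≠ 0, it does change the spectrum: acting on the vacuum it gives V = S(r) S(r)^T / 2, whose eigenvalues are e^{±2r}/2, so the smallest one drops below the vacuum value 1/2. Any squeezing criterion phrased in terms of these eigenvalues is therefore automatically U(2)-invariant.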
Abstract:
In this paper we study the problem of designing SVM classifiers when the kernel matrix, K, is affected by uncertainty. Specifically, K is modeled as a positive affine combination of given positive semidefinite kernels, with the coefficients ranging in a norm-bounded uncertainty set. We treat the problem using the Robust Optimization methodology. This reduces the uncertain SVM problem to a deterministic conic quadratic problem, which can in principle be solved by a polynomial-time Interior Point (IP) algorithm. However, for large-scale classification problems, IP methods become intractable and one has to resort to first-order gradient-type methods. The strategy we use here is to reformulate the robust counterpart of the uncertain SVM problem as a saddle point problem and to employ a special gradient scheme that works directly on the convex-concave saddle function. The algorithm is a simplified version of a general scheme due to Juditsky and Nemirovski (2011). It achieves an O(1/T^2) reduction of the initial error after T iterations. A comprehensive empirical study on both synthetic data and real-world protein structure datasets shows that the proposed formulations achieve the desired robustness and that the saddle-point-based algorithm significantly outperforms the IP method.
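The idea of a gradient scheme that works directly on a convex-concave saddle function can be illustrated with the extragradient method, a Euclidean special case of the mirror-prox family, on a toy problem: minimize over x and maximize over y the function f(x, y) = a·x²/2 + b·x·y − c·y²/2 with x and y restricted to [−1, 1]. This is a generic sketch; the toy function, box constraints, and step size are assumptions, and it is neither the robust SVM saddle function nor the simplified scheme cited in the abstract.

```python
from typing import Tuple


def project(v: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def grad(x: float, y: float, a: float, b: float, c: float) -> Tuple[float, float]:
    """Partial derivatives of f(x, y) = a*x**2/2 + b*x*y - c*y**2/2."""
    return a * x + b * y, b * x - c * y


def extragradient(
    x: float,
    y: float,
    a: float = 1.0,
    b: float = 2.0,
    c: float = 1.0,
    step: float = 0.1,
    iters: int = 500,
) -> Tuple[float, float]:
    """Projected extragradient: f is convex in x (a > 0) and concave in y (c > 0)."""
    for _ in range(iters):
        gx, gy = grad(x, y, a, b, c)
        # Extrapolation step: descend in x, ascend in y.
        xh, yh = project(x - step * gx), project(y + step * gy)
        gx, gy = grad(xh, yh, a, b, c)
        # Update from the original point using gradients at the extrapolated point.
        x, y = project(x - step * gx), project(y + step * gy)
    return x, y


if __name__ == "__main__":
    # The saddle point of the toy function is (0, 0).
    print(extragradient(1.0, -1.0))
```

The step size must stay below the inverse Lipschitz constant of the gradient map (here sqrt(a² + b²)-scale, about 2.24 for the defaults); on this strongly convex-concave toy the iterates converge linearly, which says nothing about the rate quoted in the abstract for the paper's setting.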
Abstract:
We consider the problem of developing privacy-preserving machine learning algorithms in a distributed multiparty setting. Here, different parties own different parts of a dataset, and the goal is to learn a classifier from the entire dataset without any party revealing any information about the individual data points it owns. Pathak et al. [7] recently proposed a solution to this problem in which each party learns a local classifier from its own data, and a third party then aggregates these classifiers in a privacy-preserving manner using a cryptographic scheme. The generalization performance of their algorithm is sensitive to the number of parties and to the relative fractions of data owned by the different parties. In this paper, we describe a new differentially private algorithm for the multiparty setting that uses a stochastic gradient descent-based procedure to directly optimize the overall multiparty objective, rather than combining classifiers learned by optimizing local objectives. The algorithm achieves a slightly weaker form of differential privacy than that of [7], but it provides improved generalization guarantees that do not depend on the number of parties or the relative sizes of the individual datasets. Experimental results corroborate our theoretical findings.
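A minimal sketch of the general idea of optimizing the pooled multiparty objective with noisy gradient steps is shown below. It uses DP-SGD-style per-example gradient clipping plus Gaussian noise for logistic regression, assumes a trusted aggregator that adds the noise to the sum of the parties' clipped gradients, and omits privacy accounting (translating `noise_std` into an (ε, δ) guarantee). It is not the specific algorithm or privacy analysis of the paper, and the function names and data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip(g: List[float], max_norm: float) -> List[float]:
    """Scale g so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in g]


def local_clipped_grad_sum(w: List[float], batch: Sequence[Example], max_norm: float) -> List[float]:
    """Run by each party on its own data: sum of clipped per-example logistic-loss gradients."""
    total = [0.0] * len(w)
    for x, y in batch:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        g = clip([(p - y) * xi for xi in x], max_norm)
        total = [t + gi for t, gi in zip(total, g)]
    return total


def multiparty_dp_sgd(
    parties: List[List[Example]],
    steps: int = 200,
    lr: float = 0.5,
    max_norm: float = 1.0,
    noise_std: float = 1.0,
    batch_per_party: int = 2,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    dim = len(parties[0][0][0])
    w = [0.0] * dim
    for _ in range(steps):
        agg = [0.0] * dim
        count = 0
        for data in parties:
            batch = rng.sample(data, min(batch_per_party, len(data)))
            g = local_clipped_grad_sum(w, batch, max_norm)
            agg = [a + gi for a, gi in zip(agg, g)]
            count += len(batch)
        # Aggregator adds Gaussian noise scaled to the per-example clipping bound.
        noisy = [a + rng.gauss(0.0, noise_std * max_norm) for a in agg]
        w = [wi - lr * n / count for wi, n in zip(w, noisy)]
    return w


if __name__ == "__main__":
    # Two hypothetical parties; the first feature is a constant bias term.
    parties = [[([1.0, 2.0], 1), ([1.0, 1.5], 1)], [([1.0, -2.0], 0), ([1.0, -1.0], 0)]]
    print(multiparty_dp_sgd(parties, noise_std=1.0))
```

Because every step uses gradients from all parties on a single shared model, the result does not depend on how the examples happen to be split among parties, which is the property the abstract contrasts with aggregating locally trained classifiers.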