67 resultados para Document classification,Naive Bayes classifier,Verb-object pairs


Relevância:

30.00% 30.00%

Publicador:

Resumo:

A parts based model is a parametrization of an object class using a collection of landmarks following the object structure. The matching of parts based models is one of the problems where pairwise Conditional Random Fields have been successfully applied. The main reason of their effectiveness is tractable inference and learning due to the simplicity of involved graphs, usually trees. However, these models do not consider possible patterns of statistics among sets of landmarks, and thus they sufffer from using too myopic information. To overcome this limitation, we propoese a novel structure based on a hierarchical Conditional Random Fields, which we explain in the first part of this memory. We build a hierarchy of combinations of landmarks, where matching is performed taking into account the whole hierarchy. To preserve tractable inference we effectively sample the label set. We test our method on facial feature selection and human pose estimation on two challenging datasets: Buffy and MultiPIE. In the second part of this memory, we present a novel approach to multiple kernel combination that relies on stacked classification. This method can be used to evaluate the landmarks of the parts-based model approach. Our method is based on combining responses of a set of independent classifiers for each individual kernel. Unlike earlier approaches that linearly combine kernel responses, our approach uses them as inputs to another set of classifiers. We will show that we outperform state-of-the-art methods on most of the standard benchmark datasets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

En aquest treball s'intenta fer una síntesi de les especificacions aportades per l'estàndard definit com a SQL: 1999, tot analitzant les ampliacions que fan referència a la nova orientació a l'objecte i a la incorporació de l'herència com a principal element diferenciador.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

El projecte consisteix en el desenvolupament d'una eina especialitzada per a la classificació i recerca de productes. L'aplicació pot ser fàcilment ampliable per a adaptar-la a qualsevol tipus de producte o necessitat futura. Per a la programació d'aquest projecte es va aprofitar la potència i senzillesa que proporciona l'entorn .NET i els seus llenguatges orientats a objecte, en aquest cas VB.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

L'objectiu és estudiar les característiques orientades a l'objecte de l'estàndard SQL: 1999 i posar-les a prova amb un producte comercial que les suporti.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent ";topics"; using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos

Relevância:

30.00% 30.00%

Publicador:

Resumo:

For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Two main school choice mechanisms have attracted the attention in the literature: Boston and deferred acceptance (DA). The question arises on the ex-ante welfareimplications when the game is played by participants that vary in terms of their strategicsophistication. Abdulkadiroglu, Che and Yasuda (2011) have shown that the chances ofnaive participants getting into a good school are higher under the Boston mechanism thanunder DA, and some naive participants are actually better off. In this note we show thatthese results can be extended to show that, under the veil of ignorance, i.e. students not yetknowing their utility values, all naive students may prefer to adopt the Boston mechanism.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We compare a set of empirical Bayes and composite estimators of the population means of the districts (small areas) of a country, and show that the natural modelling strategy of searching for a well fitting empirical Bayes model and using it for estimation of the area-level means can be inefficient.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we propose a Pyramidal Classification Algorithm,which together with an appropriate aggregation index producesan indexed pseudo-hierarchy (in the strict sense) withoutinversions nor crossings. The computer implementation of thealgorithm makes it possible to carry out some simulation testsby Monte Carlo methods in order to study the efficiency andsensitivity of the pyramidal methods of the Maximum, Minimumand UPGMA. The results shown in this paper may help to choosebetween the three classification methods proposed, in order toobtain the classification that best fits the original structureof the population, provided we have an a priori informationconcerning this structure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Statistical computing when input/output is driven by a Graphical User Interface is considered. A proposal is made for automatic control ofcomputational flow to ensure that only strictly required computationsare actually carried on. The computational flow is modeled by a directed graph for implementation in any object-oriented programming language with symbolic manipulation capabilities. A complete implementation example is presented to compute and display frequency based piecewise linear density estimators such as histograms or frequency polygons.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a simple randomized procedure for the prediction of a binary sequence. The algorithm uses ideas from recent developments of the theory of the prediction of individual sequences. We show that if thesequence is a realization of a stationary and ergodic random process then the average number of mistakes converges, almost surely, to that of the optimum, given by the Bayes predictor.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The classical binary classification problem is investigatedwhen it is known in advance that the posterior probability function(or regression function) belongs to some class of functions. We introduceand analyze a method which effectively exploits this knowledge. The methodis based on minimizing the empirical risk over a carefully selected``skeleton'' of the class of regression functions. The skeleton is acovering of the class based on a data--dependent metric, especiallyfitted for classification. A new scale--sensitive dimension isintroduced which is more useful for the studied classification problemthan other, previously defined, dimension measures. This fact isdemonstrated by performance bounds for the skeleton estimate in termsof the new dimension.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69- 75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Near-infrared spectroscopy (NIRS) was used to analyse the crude protein content of dried and milled samples of wheat and to discriminate samples according to their stage of growth. A calibration set of 72 samples from three growth stages of wheat (tillering, heading and harvest) and a validation set of 28 samples was collected for this purpose. Principal components analysis (PCA) of the calibration set discriminated groups of samples according to the growth stage of the wheat. Based on these differences, a classification procedure (SIMCA) showed a very accurate classification of the validation set samples : all of them were successfully classified in each group using this procedure when both the residual and the leverage were used in the classification criteria. Looking only at the residuals all the samples were also correctly classified except one of tillering stage that was assigned to both tillering and heading stages. Finally, the determination of the crude protein content of these samples was considered in two ways: building up a global model for all the growth stages, and building up local models for each stage, separately. The best prediction results for crude protein were obtained using a global model for samples in the two first growth stages (tillering and heading), and using a local model for the harvest stage samples.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many classification systems rely on clustering techniques in which a collection of training examples is provided as an input, and a number of clusters c1,...cm modelling some concept C results as an output, such that every cluster ci is labelled as positive or negative. Given a new, unlabelled instance enew, the above classification is used to determine to which particular cluster ci this new instance belongs. In such a setting clusters can overlap, and a new unlabelled instance can be assigned to more than one cluster with conflicting labels. In the literature, such a case is usually solved non-deterministically by making a random choice. This paper presents a novel, hybrid approach to solve this situation by combining a neural network for classification along with a defeasible argumentation framework which models preference criteria for performing clustering.