921 resultados para Document classification,Naive Bayes classifier,Verb-object pairs


Relevância:

30.00% 30.00%

Publicador:

Resumo:

With recent advances in remote sensing processing technology, it has become more feasible to begin analysis of the enormous historic archive of remotely sensed data. This historical data provides valuable information on a wide variety of topics which can influence the lives of millions of people if processed correctly and in a timely manner. One such field of benefit is that of landslide mapping and inventory. This data provides a historical reference to those who live near high risk areas so future disasters may be avoided. In order to properly map landslides remotely, an optimum method must first be determined. Historically, mapping has been attempted using pixel based methods such as unsupervised and supervised classification. These methods are limited by their ability to only characterize an image spectrally based on single pixel values. This creates a result prone to false positives and often without meaningful objects created. Recently, several reliable methods of Object Oriented Analysis (OOA) have been developed which utilize a full range of spectral, spatial, textural, and contextual parameters to delineate regions of interest. A comparison of these two methods on a historical dataset of the landslide affected city of San Juan La Laguna, Guatemala has proven the benefits of OOA methods over those of unsupervised classification. Overall accuracies of 96.5% and 94.3% and F-score of 84.3% and 77.9% were achieved for OOA and unsupervised classification methods respectively. The greater difference in F-score is a result of the low precision values of unsupervised classification caused by poor false positive removal, the greatest shortcoming of this method.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The electrocardiogram (ECG) signal has been widely used to study the physiological substrates of emotion. However, searching for better filtering techniques in order to obtain a signal with better quality and with the maximum relevant information remains an important issue for researchers in this field. Signal processing is largely performed for ECG analysis and interpretation, but this process can be susceptible to error in the delineation phase. In addition, it can lead to the loss of important information that is usually considered as noise and, consequently, discarded from the analysis. The goal of this study was to evaluate if the ECG noise allows for the classification of emotions, while using its entropy as an input in a decision tree classifier. We collected the ECG signal from 25 healthy participants while they were presented with videos eliciting negative (fear and disgust) and neutral emotions. The results indicated that the neutral condition showed a perfect identification (100%), whereas the classification of negative emotions indicated good identification performances (60% of sensitivity and 80% of specificity). These results suggest that the entropy of noise contains relevant information that can be useful to improve the analysis of the physiological correlates of emotion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work is a description of Tajio, a Western Malayo-Polynesian language spoken in Central Sulawesi, Indonesia. It covers the essential aspects of Tajio grammar without being exhaustive. Tajio has a medium sized phoneme inventory consisting of twenty consonants and five vowels. The language does not have lexical (word) stress; rather, it has a phrasal accent. This phrasal accent regularly occurs on the penultimate syllable of an intonational phrase, rendering this syllable auditorily prominent through a pitch rise. Possible syllable structures in Tajio are (C)V(C). CVN structures are allowed as closed syllables, but CVN syllables in word-medial position are not frequent. As in other languages in the area, the only sequence of consonants allowed in native Tajio words are sequences of nasals followed by a homorganic obstruent. The homorganic nasal-obstruent sequences found in Tajio can occur word-initially and word-medially but never in word-final position. As in many Austronesian languages, word class classification in Tajio is not straightforward. The classification of words in Tajio must be carried out on two levels: the morphosyntactic level and the lexical level. The open word classes in Tajio consist of nouns and verbs. Verbs are further divided into intransitive verbs (dynamic intransitive verbs and statives) and dynamic transitive verbs. Based on their morphological potential, lexical roots in Tajio fall into three classes: single-class roots, dual-class roots and multi-class roots. There are two basic transitive constructions in Tajio: Actor Voice and Undergoer Voice, where the actor or undergoer argument respectively serves as subjects. It shares many characteristics with symmetrical voice languages, yet it is not fully symmetric, as arguments in AV and UV are not equally marked. Neither subjects nor objects are marked in AV constructions. In UV constructions, however, subjects are unmarked while objects are marked either by prefixation or clitization. Evidence from relativization, control and raising constructions supports the analysis that AV and UV are in fact transitive, with subject arguments and object arguments behaving alike in both voices. Only the subject can be relativized, controlled, raised or function as the implicit subject of subjectless adverbial clauses. In contrast, the objects of AV and UV constructions do not exhibit these features. Tajio is a predominantly head-marking language with basic A-V-O constituent order. V and O form a constituent, and the subject can either precede or follow this complex. Thus, basic word order is S-V-O or V-O-S. Subject, as well as non-subject arguments, may be omitted when contextually specified. Verbs are marked for voice and mood, the latter of which is is obligatory. The two values distinguished are realis and non-realis. Depending on the type of predicate involved in clause formation, three clause types can be distinguished: verbal clauses, existential clauses and non-verbal clauses. Tajio has a small number of multi-verbal structures that appear to qualify as serial verb constructions. SVCs in Tajio always include a motion verb or a directional.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This CPM project focuses on the document approval process that the Division of State Human Resources consulting team utilizes as it relates to classification and compensation requests, e.g. job reclassifications, PD update requests, and salary requests. The ultimate goal is to become more efficient by utilizing electronic signatures and electronic form filling to streamline the current process of document approvals.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As the universe of knowledge and subjects change over time, indexing languages like classification schemes, accommodate that change by restructuring. Restructuring indexing languages affects indexer and cataloguer work. Subjects may split or lump together. They may disappear only to reappear later. And new subjects may emerge that were assumed to be already present, but not clearly articulated (Miksa, 1998). In this context we have the complex relationship between the indexing language, the text being described, and the already described collection (Tennis, 2007). It is possible to imagine indexers placing a document into an outdated class, because it is the one they have already used for their collection. However, doing this erases the semantics in the present indexing language. Given this range of choice in the context of indexing language change, the question arises, what does this look like in practice? How often does this occur? Further, what does this phenomenon tell us about subjects in indexing languages? Does the practice we observe in the reaction to indexing language change provide us evidence of conceptual models of subjects and subject creation? If it is incomplete, but gets us close, what evidence do we still require?

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Résumé : La texture dispose d’un bon potentiel discriminant qui complète celui des paramètres radiométriques dans le processus de classification d’image. L’indice Compact Texture Unit (CTU) multibande, récemment mis au point par Safia et He (2014), permet d’extraire la texture sur plusieurs bandes à la fois, donc de tirer parti d’un surcroît d’informations ignorées jusqu’ici dans les analyses texturales traditionnelles : l’interdépendance entre les bandes. Toutefois, ce nouvel outil n’a pas encore été testé sur des images multisources, usage qui peut se révéler d’un grand intérêt quand on considère par exemple toute la richesse texturale que le radar peut apporter en supplément à l’optique, par combinaison de données. Cette étude permet donc de compléter la validation initiée par Safia (2014) en appliquant le CTU sur un couple d’images optique-radar. L’analyse texturale de ce jeu de données a permis de générer une image en « texture couleur ». Ces bandes texturales créées sont à nouveau combinées avec les bandes initiales de l’optique, avant d’être intégrées dans un processus de classification de l’occupation du sol sous eCognition. Le même procédé de classification (mais sans CTU) est appliqué respectivement sur : la donnée Optique, puis le Radar, et enfin la combinaison Optique-Radar. Par ailleurs le CTU généré sur l’Optique uniquement (monosource) est comparé à celui dérivant du couple Optique-Radar (multisources). L’analyse du pouvoir séparateur de ces différentes bandes à partir d’histogrammes, ainsi que l’outil matrice de confusion, permet de confronter la performance de ces différents cas de figure et paramètres utilisés. Ces éléments de comparaison présentent le CTU, et notamment le CTU multisources, comme le critère le plus discriminant ; sa présence rajoute de la variabilité dans l’image permettant ainsi une segmentation plus nette, une classification à la fois plus détaillée et plus performante. En effet, la précision passe de 0.5 avec l’image Optique à 0.74 pour l’image CTU, alors que la confusion diminue en passant de 0.30 (dans l’Optique) à 0.02 (dans le CTU).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine – from crawling and indexing documents, to query processing – across the network of users (or, peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of a P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging result among participating peers. The query is propagated through disseminated nodes to hit the peers that are most likely to contain relevant documents, then the retrieved result lists are merged at different points along the path from the relevant peers to the query initializer (or namely, customer). However, query routing in P2P-IR networks is considered as one of the major challenges and critical part in P2P-IR networks; as the relevant peers might be lost in low-quality peer selection while executing the query routing, and inevitably lead to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into similar semantic clusters where each such semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level as content representations gathered from cooperative peers to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of the centralised corpus to organise the statistics of peers’ terms. The results show a competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of using the conventional Information Retrieval models as peer selection approaches where each peer can be considered as a big document of documents. The experimental evaluation shows comparative and significant results and explains that document retrieval methods are very effective for peer selection that brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results with state-of-the-art resource selection methods and competitive results to corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in the social community networks and manage it for future decision-making. The system monitors users’ behaviours when they click or download documents from the final ranked list as implicit feedback and mines the given information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments to cover various scenarios including noisy feedback information (i.e, providing positive feedback on non-relevant documents) to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics with approximate improvement more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose one technique, reputation-based approaches are clearly the natural choices which also can be deployed on any P2P network.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents our approach of identifying the profile of an unknown user based on the activities of known users. The aim of author profiling task of PAN@CLEF 2016 is cross-genre identification of the gender and age of an unknown user. This means training the system using the behavior of different users from one social media platform and identifying the profile of other user on some different platform. Instead of using single classifier to build the system we used a combination of different classifiers, also known as stacking. This approach allowed us explore the strength of all the classifiers and minimize the bias or error enforced by a single classifier.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In these last years a great effort has been put in the development of new techniques for automatic object classification, also due to the consequences in many applications such as medical imaging or driverless cars. To this end, several mathematical models have been developed from logistic regression to neural networks. A crucial aspect of these so called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore the Tensor-Train, differently from other tensor decompositions, does not suffer from the curse of dimensionality making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model, based on the TT decomposition where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dissertation addresses the still not solved challenges concerned with the source-based digital 3D reconstruction, visualisation and documentation in the domain of archaeology, art and architecture history. The emerging BIM methodology and the exchange data format IFC are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology, art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes is not fulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study focuses, therefore, on the evaluation of the source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. In the end, this process will lead to a validation or a correction of the workflow and the initial assumptions, but also (dealing with different hypotheses) to a better definition of the levels of uncertainty.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hand gesture recognition based on surface electromyography (sEMG) signals is a promising approach for the development of intuitive human-machine interfaces (HMIs) in domains such as robotics and prosthetics. The sEMG signal arises from the muscles' electrical activity, and can thus be used to recognize hand gestures. The decoding from sEMG signals to actual control signals is non-trivial; typically, control systems map sEMG patterns into a set of gestures using machine learning, failing to incorporate any physiological insight. This master thesis aims at developing a bio-inspired hand gesture recognition system based on neuromuscular spike extraction rather than on simple pattern recognition. The system relies on a decomposition algorithm based on independent component analysis (ICA) that decomposes the sEMG signal into its constituent motor unit spike trains, which are then forwarded to a machine learning classifier. Since ICA does not guarantee a consistent motor unit ordering across different sessions, 3 approaches are proposed: 2 ordering criteria based on firing rate and negative entropy, and a re-calibration approach that allows the decomposition model to retain information about previous sessions. Using a multilayer perceptron (MLP), the latter approach results in an accuracy up to 99.4% in a 1-subject, 1-degree of freedom scenario. Afterwards, the decomposition and classification pipeline for inference is parallelized and profiled on the PULP platform, achieving a latency < 50 ms and an energy consumption < 1 mJ. Both the classification models tested (a support vector machine and a lightweight MLP) yielded an accuracy > 92% in a 1-subject, 5-classes (4 gestures and rest) scenario. These results prove that the proposed system is suitable for real-time execution on embedded platforms and also capable of matching the accuracy of state-of-the-art approaches, while also giving some physiological insight on the neuromuscular spikes underlying the sEMG.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Subaxial Injury Classification (SLIC) system and severity score has been developed to help surgeons in the decision-making process of treatment of subaxial cervical spine injuries. A detailed description of all potential scored injures of the SLIC is lacking. We performed a systematic review in the PubMed database from 2007 to 2014 to describe the relationship between the scored injuries in the SLIC and their eventual treatment according to the system score. Patients with an SLIC of 1-3 points (conservative treatment) are neurologically intact with the spinous process, laminar or small facet fractures. Patients with compression and burst fractures who are neurologically intact are also treated nonsurgically. Patients with an SLIC of 4 points may have an incomplete spinal cord injury such as a central cord syndrome, compression injuries with incomplete neurologic deficits and burst fractures with complete neurologic deficits. SLIC of 5-10 points includes distraction and rotational injuries, traumatic disc herniation in the setting of a neurological deficit and burst fractures with an incomplete neurologic deficit. The SLIC injury severity score can help surgeons guide fracture treatment. Knowledge of the potential scored injures and their relationships with the SLIC are of paramount importance for spine surgeons who treated subaxial cervical spine injuries.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

to assess the construct validity and reliability of the Pediatric Patient Classification Instrument. correlation study developed at a teaching hospital. The classification involved 227 patients, using the pediatric patient classification instrument. The construct validity was assessed through the factor analysis approach and reliability through internal consistency. the Exploratory Factor Analysis identified three constructs with 67.5% of variance explanation and, in the reliability assessment, the following Cronbach's alpha coefficients were found: 0.92 for the instrument as a whole; 0.88 for the Patient domain; 0.81 for the Family domain; 0.44 for the Therapeutic procedures domain. the instrument evidenced its construct validity and reliability, and these analyses indicate the feasibility of the instrument. The validation of the Pediatric Patient Classification Instrument still represents a challenge, due to its relevance for a closer look at pediatric nursing care and management. Further research should be considered to explore its dimensionality and content validity.