925 results for Bag-of-marbles
Abstract:
We introduce a 2-tier convolutional neural network model for learning distributed paragraph representations for a specific task (e.g. paragraph- or short-document-level sentiment analysis and text topic categorization). We decompose paragraph semantics into three cascaded constituents: word representation, sentence composition and document composition. Specifically, we learn distributed word representations with a continuous bag-of-words model from a large unstructured text corpus. Then, using these word representations as pre-trained vectors, the first tier of our model learns distributed task-specific sentence representations from a sentence-level corpus with task-specific labels. Taking these sentence representations as input, the second tier of our model learns distributed paragraph representations from a paragraph-level corpus. The model is evaluated on the DBpedia ontology classification dataset and the Amazon review dataset. Empirical results show the effectiveness of the proposed model for generating distributed paragraph representations.
Abstract:
In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations, or embeddings, learned with unsupervised methods help address the problem of traditional bag-of-words approaches, which fail to capture contextual semantics. In this paper we go beyond vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to nearby locations in the hidden feature space. We have additionally experimented with incorporating these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results demonstrate the success of the proposed framework, with a 4% improvement in accuracy for subjectivity classification and improved results for sentiment classification over models trained without our higher-level features.
Abstract:
One of the ultimate aims of Natural Language Processing is to automate the analysis of the meaning of text. A fundamental step in that direction consists in enabling effective ways to automatically link textual references to their referents, that is, real world objects. The work presented in this paper addresses the problem of attributing a sense to proper names in a given text, i.e., automatically associating words representing Named Entities with their referents. The method for Named Entity Disambiguation proposed here is based on the concept of semantic relatedness, which in this work is obtained via a graph-based model over Wikipedia. We show that, without building the traditional bag of words representation of the text, but instead only considering named entities within the text, the proposed method achieves results competitive with the state-of-the-art on two different datasets.
Abstract:
In this paper, we consider the task of recognizing epigraphs in images such as photos taken with mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-nearest-neighbor approach to perform the recognition. The contribution of this work is in evaluating state-of-the-art visual object recognition techniques in this specific context. The experimental results show that the Vector of Locally Aggregated Descriptors (VLAD), obtained by aggregating SIFT descriptors, is the best choice for this task.
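To make the aggregation step in the abstract above concrete, the following is a minimal sketch of how a Vector of Locally Aggregated Descriptors can be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function name `vlad`, the toy 2-dimensional descriptors and the two hand-picked centroids are all hypothetical, whereas real SIFT descriptors are 128-dimensional and the centroids would normally come from k-means over a training set.

```python
import math

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a single L2-normalised VLAD vector."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that centroid.
        for j in range(d):
            residuals[nearest][j] += x[j] - centroids[nearest][j]
    # Concatenate the per-centroid residual sums and L2-normalise the result.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy data (hypothetical): two centroids and three 2-D "descriptors".
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
v = vlad(descriptors, centroids)
```

The resulting vector has one block of residuals per centroid, so its length is the number of centroids times the descriptor dimension. Such vectors could then be compared with a nearest-neighbour search, as the abstract describes.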
Abstract:
A class of multi-process models is developed for collections of time indexed count data. Autocorrelation in counts is achieved with dynamic models for the natural parameter of the binomial distribution. In addition to modeling binomial time series, the framework includes dynamic models for multinomial and Poisson time series. Markov chain Monte Carlo (MCMC) and Pólya-Gamma data augmentation (Polson et al., 2013) are critical for fitting multi-process models of counts. To facilitate computation when the counts are high, a Gaussian approximation to the Pólya-Gamma random variable is developed.
Three applied analyses are presented to explore the utility and versatility of the framework. The first analysis develops a model for complex dynamic behavior of themes in collections of text documents. Documents are modeled as a “bag of words”, and the multinomial distribution is used to characterize uncertainty in the vocabulary terms appearing in each document. State-space models for the natural parameters of the multinomial distribution induce autocorrelation in themes and their proportional representation in the corpus over time.
The second analysis develops a dynamic mixed membership model for Poisson counts. The model is applied to a collection of time series which record neuron level firing patterns in rhesus monkeys. The monkey is exposed to two sounds simultaneously, and Gaussian processes are used to smoothly model the time-varying rate at which the neuron’s firing pattern fluctuates between features associated with each sound in isolation.
The third analysis presents a switching dynamic generalized linear model for the time-varying home run totals of professional baseball players. The model endows each player with an age specific latent natural ability class and a performance enhancing drug (PED) use indicator. As players age, they randomly transition through a sequence of ability classes in a manner consistent with traditional aging patterns. When the performance of the player significantly deviates from the expected aging pattern, he is identified as a player whose performance is consistent with PED use.
All three models provide a mechanism for sharing information across related series locally in time. The models are fit with variations on the Pólya-Gamma Gibbs sampler, MCMC convergence diagnostics are developed, and reproducible inference is emphasized throughout the dissertation.
Abstract:
This paper addresses the problem of colorectal tumour segmentation in complex real world imagery. For efficient segmentation, a multi-scale strategy is developed for extracting the potentially cancerous region of interest (ROI) based on colour histograms while searching for the best texture resolution. To achieve better segmentation accuracy, we apply a novel bag-of-visual-words method based on rotation invariant raw statistical features and random projection based l2-norm sparse representation to classify tumour areas in histopathology images. Experimental results on 20 real world digital slides demonstrate that the proposed algorithm results in better recognition accuracy than several state of the art segmentation techniques.
Abstract:
In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic modelling approaches relying on the “bag-of-words” assumption are not effective in extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then train a topic model which takes a hierarchy of Pitman-Yor processes as prior for modelling the generation of phrases of arbitrary length. Experimental results on patients’ discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence measure and finds more interpretable topics.
Abstract:
The purpose of this work-in-progress study was to test the concept of recognising plants using images acquired by image sensors in a controlled, noise-free environment. The presence of vegetation on railway trackbeds and embankments presents potential problems. Woody plants (e.g. Scots pine, Norway spruce and birch) often establish themselves on railway trackbeds. This may cause problems because legal herbicides are not effective in controlling them; this is particularly the case for conifers. Thus, if maintenance administrators knew the spatial position of plants along the railway system, it might be feasible to harvest them mechanically. Primary data were collected outdoors, comprising around 700 leaves and conifer seedlings from 11 species. These were then photographed in a laboratory environment. To classify the species in the acquired image set, a machine learning approach known as Bag-of-Features (BoF) was chosen. Irrespective of the chosen type of feature extraction and classifier, the ability to classify a previously unseen plant correctly was greater than 85%. The maintenance planning of vegetation control could be improved if plants were recognised and localised. It may be feasible to harvest them mechanically (in particular, woody plants). In addition, listed endangered species growing on the trackbeds can be avoided. Both cases are likely to reduce the amount of herbicides used, which is often in line with public opinion. Bearing in mind that natural objects like plants are often more heterogeneous within their own class than outside it, the results do indeed present a stable classification performance, which is a sound prerequisite for later taking the next step of including a natural background. Where relevant, species can also be listed under the Endangered Species Act.
Abstract:
This paper presents a study made in a field poorly explored in the Portuguese language – modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic taggers with improved performance over the bag-of-words (bow) approach. The performance was measured using precision, recall and F1. Because it is a relatively unexplored field, the study covers the creation of the corpus (composed of eleven verbs), the use of a parser to extract syntactic and semantic information from the sentences and a machine learning approach to identify modality values. Based on three different sets of attributes – from the trigger itself, from the trigger’s path (in the parse tree) and from the context – the system creates a tagger for each verb, achieving (for almost every verb) an improvement in F1 when compared to the traditional bow approach.
Abstract:
This paper describes various experiments done to investigate author profiling of tweets in 4 different languages – English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different personality dimensions – extroversion, stability, agreeableness, openness, and conscientiousness. Different sets of features were tested – bag-of-words, word n-grams, POS n-grams, and the average of word embeddings. An SVM was used as the classifier. TF-IDF worked best for most English tasks, while for most of the tasks in the other languages the combination of the best features worked better.
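As an illustration of the bag-of-words TF-IDF features mentioned in the abstract above, here is a minimal sketch using only the Python standard library. The weighting variant (relative term frequency times the natural log of inverse document frequency), the whitespace tokenizer, the function name `tfidf` and the three example texts are all assumptions made for this sketch; the abstract does not specify the variant or the preprocessing the authors used.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    # Naive whitespace tokenisation; assumes every document is non-empty.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Relative term frequency weighted by inverse document frequency.
        vectors.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, vectors

# Hypothetical tweet-like texts, for illustration only.
docs = ["good morning twitter", "good night", "morning run"]
vocab, vectors = tfidf(docs)
```

Each document becomes a fixed-length vector over the shared vocabulary, in which terms that occur in fewer documents receive higher weight. Vectors of this kind are what a classifier such as an SVM would take as input.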
Abstract:
Bag sampling techniques can be used to temporarily store an aerosol and therefore provide sufficient time to utilize sensitive but slow instrumental techniques for recording detailed particle size distributions. Laboratory-based assessments of the method were conducted to examine size-dependent deposition loss coefficients for aerosols held in Velostat™ bags conforming to a horizontal cylindrical geometry. Deposition losses of NaCl particles in the range of 10 nm to 160 nm were analysed in relation to the bag size, storage time, and sampling flow rate. Results of this study suggest that the bag sampling method is most useful for moderately short sampling periods of about 5 minutes.
Abstract:
Vacuum cleaners can release large concentrations of particles, both in their exhaust air and from resuspension of settled dust. However, the size, variability and microbial diversity of these emissions are unknown, despite evidence to suggest they may contribute to allergic responses and infection transmission indoors. This study aimed to evaluate bioaerosol emission from various vacuum cleaners. We sampled the air in an experimental flow tunnel where vacuum cleaners were run and their airborne emissions sampled with closed-face cassettes. Dust samples were also collected from the dust bag. Total bacteria, total archaea, Penicillium/Aspergillus and total Clostridium cluster 1 were quantified with specific qPCR protocols and emission rates were calculated. Each sample was also tested for Clostridium botulinum and for antibiotic resistance genes using endpoint PCR. Bacterial diversity was also analyzed using denaturing gradient gel electrophoresis (DGGE), image analysis and band sequencing. We demonstrated that emission of bacteria and moulds (Pen/Asp) can reach values as high as 1E05/min and that those emissions are not related to each other. The bag dust bacterial and mould content was also consistent across the vacuums we assessed, reaching up to 1E07 bacteria or moulds equivalent/g. Antibiotic resistance genes were detected in several samples. No archaea or C. botulinum were detected in any air samples. Diversity analyses showed that most bacteria are from human sources, in keeping with other recent results. These results highlight the potential capability of vacuum cleaners to disseminate appreciable quantities of moulds and human-associated bacteria indoors and their role as a source of exposure to bioaerosols.
Abstract:
Controlled actuation of soft objects with functional surfaces in aqueous environments presents opportunities for liquid phase electronics, novel assembled super-structures and unusual mechanical properties. We show the extraordinary electrochemically induced actuation of liquid metal droplets coated with nanoparticles, so-called “liquid metal marbles”. We demonstrate that nanoparticle coatings of these marbles offer an extra dimension for affecting the bipolar electrochemically induced actuation. The nanoparticles can readily migrate along the surface of liquid metals, upon the application of electric fields, altering the capacitive behaviour and surface tension in a highly asymmetric fashion. Surprising actuation behaviours are observed illustrating that nanoparticle coatings can have a strong effect on the movement of these marbles. This significant novel phenomenon, combined with unique properties of liquid metal marbles, represents an exciting platform for enabling diverse applications that cannot be achieved using rigid metal beads.
Abstract:
Three indoor, sheeted bag-stack fumigations of paddy rice using aluminium phosphide were undertaken in Guangdong Province, southern China. We measured the effect of two types of sheeting (polyvinylchloride [PVC] or polyethylene [PE]) and two types of floor sealing (clips or fixing into a slot with a rubber pipe) on phosphine concentration and retention. The aim was to test the feasibility of retaining fumigant at a sufficient concentration for long enough to control known resistant insect pests. Each stack was pressure tested and phosphine concentrations measured daily during the fumigation. Cages of test insects in culture medium, including resistant and susceptible strains, were placed inside each stack and could be observed through the clear sheeting. Highest concentrations for the longest period were obtained in a PVC-covered stack that included a ground sheet and wall sheets sealed to the floor with rubber pipes. A similar PVC-covered stack sealed to the floor with clips instead of pipe did not retain gas as efficiently and required re-dosing. A PE-covered stack, with no ground sheet but also with wall sheets sealed to the floor with pipe, produced an acceptable fumigation. Susceptible Rhyzopertha dominica were controlled in 2 days and the most resistant strain in 15 days. Resistant Cryptolestes ferrugineus survived until day 21. The paddy was still free of insect infestation 7 months later when the bag-stack was opened to mill the rice. Pressure half-lives correlated with gas concentration and retention. Sorption appeared to be a major limiting factor, reducing potential fumigant dosage by about 50%. The trials demonstrated the feasibility of sealing bag-stacks to a standard high enough to control all known resistant strains.
Abstract:
Digital Image