982 results for Extraction de structure


Relevance:

30.00% 30.00%

Publisher:

Abstract:

In many classification problems, it is necessary to consider the specific location in an n-dimensional space from which features have been calculated. For example, considering the location of features extracted from specific areas of a two-dimensional space, such as an image, could improve the understanding of a scene for a video surveillance system. In the same way, the same features extracted from different locations could mean different actions for a 3D HCI system. In this paper, we present a self-organizing feature map able to preserve the topology of the locations in an n-dimensional space from which the feature vectors have been extracted. The main contribution is to implicitly preserve the topology of the original space, because considering the locations of the extracted features and their topology could ease the solution of certain problems. Specifically, the paper proposes the n-dimensional constrained self-organizing map preserving the input topology (nD-SOM-PINT). Features in adjacent areas of the n-dimensional space used to extract the feature vectors are explicitly placed in adjacent areas of the nD-SOM-PINT, constraining the neural network structure and learning. As a case study, the neural network has been instantiated to represent and classify features, as trajectories extracted from a sequence of images, into a high level of semantic understanding. Experiments have been thoroughly carried out using the CAVIAR datasets (Corridor, Frontal and Inria), taking into account the global behaviour of an individual, in order to validate the ability to preserve the topology of the two-dimensional space and obtain high-performance trajectory classification, in contrast to not considering the location of features. Moreover, a brief example has been included to validate the nD-SOM-PINT proposal in a domain other than individual trajectories. Results confirm the high accuracy of the nD-SOM-PINT, which outperforms previous methods aimed at classifying the same datasets.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We sampled leaves from 678 individuals in 21 natural populations (30-36 individuals per population), covering the entire distribution of Euptelea pleiospermum in China. Total DNA was isolated from about 50 mg of powdered leaf tissue following the protocol of a DNA extraction kit (Tiangen Biotech Co., LTD., Beijing, China). We used seven fluorescence-labeled microsatellite loci (EP036, EP059, EP081, EP087, EP091, EP278 and EP294; Zhang et al., 2008) to genotype our 678 DNA samples.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Scorpion toxins are important experimental tools for the characterization of a vast array of ion channels and serve as scaffolds for drug design. General public database entries contain limited annotation, so rich structure-function information from mutation studies is typically not available. SCORPION2 contains more than 800 records of native and mutant toxin sequences enriched with binding affinity and toxicity information, 624 three-dimensional structures and some 500 references. SCORPION2 has a set of search and prediction tools that allow users to extract and perform specific queries: text searches of scorpion toxin records, sequence similarity search, extraction of sequences, visualization of scorpion toxin structures, analysis of toxic activity, and functional annotation of previously uncharacterized scorpion toxins. The SCORPION2 database is available at http://sdmc.i2r.a-star.edu.sg/scorpion/. (c) 2006 Elsevier Ltd. All rights reserved.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Government agencies responsible for riparian environments are assessing the combined utility of field survey and remote sensing for mapping and monitoring indicators of riparian zone health. The objective of this work was to determine if the structural attributes of savanna riparian zones in northern Australia can be detected from commercially available remotely sensed image data. Two QuickBird images and coincident field data covering sections of the Daly River and the South Alligator River - Barramundie Creek in the Northern Territory were used. Semi-variograms were calculated to determine the characteristic spatial scales of riparian zone features, both vegetative and landform. Interpretation of semi-variograms showed that structural dimensions of riparian environments could be detected and estimated from the QuickBird image data. The results also show that selecting the correct spatial resolution and spectral bands is essential to maximize the accuracy of mapping spatial characteristics of savanna riparian features. The distribution of foliage projective cover of riparian vegetation affected spectral reflectance variations in individual spectral bands differently. Pan-sharpened image data enabled small-scale information extraction (< 6 m) on riparian zone structural parameters. The semi-variogram analysis results provide the basis for an inversion approach using high spatial resolution satellite image data to map indicators of savanna riparian zone health.
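As an illustration of the semi-variogram analysis mentioned above, the following is a minimal sketch only. The abstract does not say which estimator was used, so this assumes the classical (Matheron) estimator applied to a regularly spaced one-dimensional transect of pixel values; the function name `empirical_semivariogram` and the synthetic transect are hypothetical and not taken from the study.

```python
def empirical_semivariogram(values, max_lag):
    """Classical (Matheron) estimator on a regularly spaced 1-D transect.

    gamma(h) = sum((z[i + h] - z[i])**2) / (2 * N(h)),
    where N(h) is the number of pixel pairs separated by lag h.
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        squared_diffs = [
            (values[i + lag] - values[i]) ** 2
            for i in range(len(values) - lag)
        ]
        if squared_diffs:  # skip lags longer than the transect
            gamma[lag] = sum(squared_diffs) / (2 * len(squared_diffs))
    return gamma


# Synthetic transect (an assumption for illustration): alternating patches
# of 4 "canopy" pixels and 4 "gap" pixels, repeated four times.
transect = ([1.0] * 4 + [0.0] * 4) * 4
semivariogram = empirical_semivariogram(transect, max_lag=8)

for lag, gamma_value in semivariogram.items():
    print(lag, round(gamma_value, 4))
```

For this synthetic transect the semivariance rises to its maximum at a lag equal to the patch width (4 pixels) and falls back to zero at the repeat distance (8 pixels), which is the kind of characteristic spatial scale a semi-variogram is used to reveal.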

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Objective: Biomedical event extraction concerns the extraction, from the literature, of events describing changes in the state of bio-molecules. Compared with the protein-protein interaction (PPI) extraction task, which often involves only the extraction of binary relations between two proteins, biomedical event extraction is much harder, since it must deal with complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In this paper, we propose an information extraction system based on the hidden vector state (HVS) model, called HVS-BioEvent, for biomedical event extraction, and investigate its capability in extracting complex events. Methods and material: HVS has previously been employed for extracting PPIs. In HVS-BioEvent, we propose an automated way to generate abstract annotations for HVS training, and further propose novel machine learning approaches for identifying event trigger words and for extracting biomedical events from the HVS parse results. Results: Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, which is only 2.38 percentage points lower than the best performing system, by UTurku, in the BioNLP'09 shared task. Nevertheless, HVS-BioEvent outperforms UTurku's system on complex event extraction, achieving 36.57% vs. 30.52% for regulation events and 40.61% vs. 38.99% for negative regulation events. Conclusions: The results suggest that the HVS model, with its hierarchical hidden state structure, is indeed more suitable for complex event extraction, since it can naturally model embedded structural context in sentences.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems arising from “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed, and three different kinds of applications with respect to classification tasks are considered. A summary of the results obtained on the accuracy of the classification schemes is presented, together with conclusions about the search for the most appropriate feature extraction method. The problem of how to discover the knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build the proposed system are considered.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Internal quantum efficiency (IQE) of a high-brightness blue LED has been evaluated from the external quantum efficiency measured as a function of current at room temperature. Processing the data with a novel evaluation procedure based on the ABC-model, we have determined separately the IQE of the LED structure and the light extraction efficiency (LEE) of the UX:3 chip.

Nowadays, understanding of LED efficiency behavior at high currents is critical to finding ways to further improve III-nitride LED performance [1]. External quantum efficiency $\eta_e$ (EQE) provides integral information on the recombination and photon emission processes in LEDs. Meanwhile, EQE is the product of IQE $\eta_i$ and LEE $\eta_{ext}$ at negligible carrier leakage from the active region. Separate determination of IQE and LEE would be much more helpful, providing correlation between these parameters and the specific epi-structure and chip design. In this paper, we extend the approach of [2,3] to the whole range of the current/optical power variation, providing an express tool for separate evaluation of IQE and LEE. We studied an InGaN-based LED fabricated by Osram OS. The LED structure, grown by MOCVD on a sapphire substrate, was processed as a UX:3 chip and mounted into the Golden Dragon package without molding. EQE was measured with a Labsphere CDS-600 spectrometer. Plotting EQE versus output power $P$ and finding the power $P_m$ corresponding to the EQE maximum $\eta_m$ enables comparing the measurements with the analytical relationships $\eta_i = Q/(Q + p^{1/2} + p^{-1/2})$, $p = P/P_m$, and $Q = B/(AC)^{1/2}$, where $A$, $B$ and $C$ are recombination constants [4]. As a result, the maximum IQE value, equal to $Q/(Q+2)$, can be found from the ratio $\eta_m/\eta_e$ plotted as a function of $p^{1/2} + p^{-1/2}$ (see Fig. 1a), and LEE is then calculated as $\eta_{ext} = \eta_m (Q+2)/Q$. Experimental EQE as a function of normalized optical power $p$ is shown in Fig. 1b along with the analytical approximation based on the ABC-model. The approximation fits the measurements perfectly over a range of optical power (or operating current) variation of eight orders of magnitude. In conclusion, a new express method for separate evaluation of IQE and LEE of III-nitride LEDs is suggested and applied to the characterization of a high-brightness blue LED. With this method, we obtained LEE from the free chip surface to the air as 69.8% and IQE as 85.7% at the maximum and 65.2% at the operation current of 350 mA.

[1] G. Verzellesi, D. Saguatti, M. Meneghini, F. Bertazzi, M. Goano, G. Meneghesso, and E. Zanoni, "Efficiency droop in InGaN/GaN blue light-emitting diodes: Physical mechanisms and remedies," J. Appl. Phys., vol. 114, no. 7, pp. 071101, Aug., 2013.
[2] C. van Opdorp and G. W. 't Hooft, "Method for determining effective nonradiative lifetime and leakage losses in double-heterostructure lasers," J. Appl. Phys., vol. 52, no. 6, pp. 3827-3839, Feb., 1981.
[3] M. Meneghini, N. Trivellin, G. Meneghesso, E. Zanoni, U. Zehnder, and B. Hahn, "A combined electro-optical method for the determination of the recombination parameters in InGaN-based light-emitting diodes," J. Appl. Phys., vol. 106, no. 11, pp. 114508, Dec., 2009.
[4] Qi Dai, Qifeng Shan, Jing Wang, S. Chhajed, Jaehee Cho, E. F. Schubert, M. H. Crawford, D. D. Koleske, Min-Ho Kim, and Yongjo Park, "Carrier recombination mechanisms and efficiency droop in GaInN/GaN light-emitting diodes," Appl. Phys. Lett., vol. 97, no. 13, pp. 133507, Sept., 2010.

© 2014 IEEE.
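The ABC-model relationships quoted in the abstract above can be illustrated with a small numerical sketch. It is not the authors' evaluation code: the function names (`iqe_abc`, `fit_q`) are hypothetical, and the data are synthetic, generated from assumed values Q = 12 and LEE = 0.70. The only relation used is that, since EQE = LEE × IQE with IQE = Q/(Q + √p + 1/√p), the ratio η_m/η_e equals (Q + x)/(Q + 2) with x = √p + 1/√p, so a straight-line fit has slope 1/(Q + 2).

```python
import math


def iqe_abc(p, q):
    """IQE from the ABC-model: Q / (Q + sqrt(p) + 1/sqrt(p)), with p = P/P_m."""
    return q / (q + math.sqrt(p) + 1.0 / math.sqrt(p))


def fit_q(p_values, eqe_values, eqe_max):
    """Estimate Q from a least-squares line through (x, eta_m/eta_e).

    x = sqrt(p) + 1/sqrt(p); the model predicts eta_m/eta_e = (Q + x)/(Q + 2),
    so the slope of the line is 1/(Q + 2).
    """
    xs = [math.sqrt(p) + 1.0 / math.sqrt(p) for p in p_values]
    ys = [eqe_max / eqe for eqe in eqe_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 / slope - 2.0


# Synthetic "measurement" (assumed values, for illustration only).
q_true, lee_true = 12.0, 0.70
p_values = [0.01, 0.1, 1.0, 10.0, 100.0]            # normalized power P/P_m
eqe_values = [lee_true * iqe_abc(p, q_true) for p in p_values]
eqe_max = max(eqe_values)                           # EQE maximum, reached at p = 1

q_fit = fit_q(p_values, eqe_values, eqe_max)
iqe_max = q_fit / (q_fit + 2.0)                     # maximum IQE = Q/(Q+2)
lee_fit = eqe_max * (q_fit + 2.0) / q_fit           # LEE = eta_m (Q+2)/Q

print(round(q_fit, 6), round(iqe_max, 6), round(lee_fit, 6))
```

With noise-free synthetic data the fit recovers the assumed Q and LEE; with measured data the quality of the straight-line fit is what indicates whether the ABC-model describes the device.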

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Computer Game Playing has been an active area of research since Samuel’s first Checkers player (Samuel 1959). Recently, interest beyond the classic games of Chess and Checkers has led to competitions such as the General Game Playing competition, in which players have no prior knowledge of the games they are to play, and the Computer Poker Competition, which forces players to reason about imperfect information under conditions of uncertainty. The purpose of this dissertation is to explore the area of General Game Playing both specifically and generally. On the specific side, we describe the design and implementation of our General Game Playing system OGRE. This system includes an innovative method for feature extraction that helped it to achieve second and fourth place in two international General Game Playing competitions. On the more general side, we also introduce the Regular Game Language, which goes beyond current work to provide support for both stochastic and imperfect information games as well as the more traditional games.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The full-scale base-isolated structure studied in this dissertation is the only base-isolated building in the South Island of New Zealand. It sustained hundreds of earthquake ground motions from September 2010 well into 2012. Several large earthquake responses were recorded in December 2011 by NEES@UCLA and by a GeoNet recording station near Christchurch Women's Hospital. The primary focus of this dissertation is to advance the state of the art of methods for evaluating the performance of seismic-isolated structures and the effects of soil-structure interaction, by developing new data processing methodologies to overcome current limitations and by implementing advanced numerical modeling in OpenSees for direct analysis of soil-structure interaction.

This dissertation presents a novel method for recovering force-displacement relations within the isolators of building structures with unknown nonlinearities from sparse seismic-response measurements of floor accelerations. The method requires only direct matrix calculations (factorizations and multiplications); no iterative trial-and-error methods are required. The method requires a mass matrix, or at least an estimate of the floor masses. A stiffness matrix may be used, but is not necessary. Essentially, the method operates on a matrix of incomplete measurements of floor accelerations. In the special case of complete floor measurements of systems with linear dynamics, real modes, and equal floor masses, the principal components of this matrix are the modal responses. In the more general case of partial measurements and nonlinear dynamics, the method extracts a number of linearly-dependent components from Hankel matrices of measured horizontal response accelerations, assembles these components row-wise and extracts principal components from the singular value decomposition of this large matrix of linearly-dependent components. These principal components are then interpolated between floors in a way that minimizes the curvature energy of the interpolation. This interpolation step can make use of a reduced-order stiffness matrix, a backward difference matrix or a central difference matrix. The measured and interpolated floor acceleration components at all floors are then assembled and multiplied by a mass matrix. The recovered in-service force-displacement relations are then incorporated into the OpenSees soil structure interaction model.
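One step of the procedure described above, building a Hankel matrix from a measured acceleration record and extracting a principal component from it, can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's algorithm: it handles a single synthetic record, uses plain power iteration on the Gram matrix in place of a full singular value decomposition, and omits the interpolation between floors and the multiplication by the mass matrix. The function names `hankel` and `dominant_component` are hypothetical.

```python
import math


def hankel(signal, rows):
    """Hankel matrix: entry (i, j) holds signal[i + j], so anti-diagonals are constant."""
    cols = len(signal) - rows + 1
    return [[signal[i + j] for j in range(cols)] for i in range(rows)]


def dominant_component(matrix, iterations=200):
    """Leading left singular vector and singular value of `matrix`.

    Uses power iteration on the Gram matrix M M^T, whose dominant eigenvector
    is the first left singular vector and whose dominant eigenvalue is the
    square of the first singular value.
    """
    n = len(matrix)
    gram = [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(gram[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(
        vector[i] * sum(gram[i][j] * vector[j] for j in range(n)) for i in range(n)
    )
    return vector, math.sqrt(eigenvalue)


# Synthetic record (an assumption for illustration): a single decaying
# exponential, whose Hankel matrix has rank one.
record = [0.9 ** k for k in range(20)]
H = hankel(record, rows=5)
component, singular_value = dominant_component(H)

print([round(x, 4) for x in component], round(singular_value, 4))
```

For this synthetic record, successive entries of the dominant component decay by the same factor (0.9) as the record itself, which shows how the Hankel structure exposes the underlying dynamics of a signal.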

Numerical simulations of soil-structure interaction involving non-uniform soil behavior are conducted following the development of the complete soil-structure interaction model of Christchurch Women's Hospital in OpenSees. In these 2D OpenSees models, the superstructure is modeled as two-dimensional frames in the short-span and long-span directions, respectively. The lead rubber bearings are modeled as elastomeric bearing (Bouc Wen) elements. The soil underlying the concrete raft foundation is modeled with linear elastic plane strain quadrilateral elements. The non-uniformity of the soil profile is incorporated by extracting and interpolating the shear wave velocity profile from the Canterbury Geotechnical Database. The validity of the complete two-dimensional soil-structure interaction OpenSees model for the hospital is checked by comparing the peak floor responses and the force-displacement relations within the isolation system obtained from OpenSees simulations with the recorded measurements. General explanations and implications, supported by displacement drifts, floor acceleration and displacement responses, and force-displacement relations, are described to address the effects of soil-structure interaction.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Human activities represent a significant burden on the global water cycle, with large and increasing demands placed on limited water resources by manufacturing, energy production and domestic water use. In addition to changing the quantity of available water resources, human activities lead to changes in water quality by introducing a large and often poorly-characterized array of chemical pollutants, which may negatively impact biodiversity in aquatic ecosystems, leading to impairment of valuable ecosystem functions and services. Domestic and industrial wastewaters represent a significant source of pollution to the aquatic environment due to inadequate or incomplete removal of chemicals introduced into waters by human activities. Currently, incomplete chemical characterization of treated wastewaters limits comprehensive risk assessment of this ubiquitous impact to water. In particular, a significant fraction of the organic chemical composition of treated industrial and domestic wastewaters remains uncharacterized at the molecular level. Efforts aimed at reducing the impacts of water pollution on aquatic ecosystems critically require knowledge of the composition of wastewaters to develop interventions capable of protecting our precious natural water resources.

The goal of this dissertation was to develop a robust, extensible and high-throughput framework for the comprehensive characterization of organic micropollutants in wastewaters by high-resolution accurate-mass mass spectrometry. High-resolution mass spectrometry provides the most powerful analytical technique available for assessing the occurrence and fate of organic pollutants in the water cycle. However, significant limitations in data processing, analysis and interpretation have limited this technique in achieving comprehensive characterization of organic pollutants occurring in natural and built environments. My work aimed to address these challenges by development of automated workflows for the structural characterization of organic pollutants in wastewater and wastewater impacted environments by high-resolution mass spectrometry, and to apply these methods in combination with novel data handling routines to conduct detailed fate studies of wastewater-derived organic micropollutants in the aquatic environment.

In Chapter 2, chemoinformatic tools were implemented along with novel non-targeted mass spectrometric analytical methods to characterize, map, and explore an environmentally-relevant “chemical space” in municipal wastewater. This was accomplished by characterizing the molecular composition of known wastewater-derived organic pollutants and of substances that are prioritized as potential wastewater contaminants, and by using these databases to evaluate the pollutant-likeness of structures postulated for unknown organic compounds that I detected in wastewater extracts using high-resolution mass spectrometry approaches. Results showed that applying multiple computational mass spectrometric tools to the structural elucidation of unknown organic pollutants arising in wastewaters improved the efficiency and veracity of screening approaches based on high-resolution mass spectrometry. Furthermore, structural similarity searching was essential for prioritizing substances sharing structural features with known organic pollutants or with industrial and consumer chemicals that could enter the environment through use or disposal.

I then applied this comprehensive methodological and computational non-targeted analysis workflow to micropollutant fate analysis in domestic wastewaters (Chapter 3), surface waters impacted by water reuse activities (Chapter 4) and effluents of wastewater treatment facilities receiving wastewater from oil and gas extraction activities (Chapter 5). In Chapter 3, I showed that application of chemometric tools aided in the prioritization of non-targeted compounds arising at various stages of conventional wastewater treatment by partitioning high dimensional data into rational chemical categories based on knowledge of organic chemical fate processes, resulting in the classification of organic micropollutants based on their occurrence and/or removal during treatment. Similarly, in Chapter 4, high-resolution sampling and broad-spectrum targeted and non-targeted chemical analysis were applied to assess the occurrence and fate of organic micropollutants in a water reuse application, wherein reclaimed wastewater was applied for irrigation of turf grass. Results showed that the organic micropollutant composition of surface waters receiving runoff from wastewater-irrigated areas appeared to be minimally impacted by wastewater-derived organic micropollutants. Finally, Chapter 5 presents results on the comprehensive organic chemical composition of oil and gas wastewaters treated for surface water discharge. Concurrent analysis of effluent samples by complementary, broad-spectrum analytical techniques revealed low levels of hydrophobic organic contaminants but elevated concentrations of polymeric surfactants, which may affect the fate and analysis of contaminants of concern in oil and gas wastewaters.

Taken together, my work represents significant progress in the characterization of polar organic chemical pollutants associated with wastewater-impacted environments by high-resolution mass spectrometry. Application of these comprehensive methods to examine micropollutant fate processes in wastewater treatment systems, water reuse environments, and water applications in oil/gas exploration yielded new insights into the factors that influence transport, transformation, and persistence of organic micropollutants in these systems across an unprecedented breadth of chemical space.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Flaxseed is an oilseed widely grown in Canada. However, the residues generated by the oil extraction process contain a large amount of protein and can be valorized in human food, mainly because certain peptide fractions have bioactive properties. In this work, the influence of high hydrostatic pressure (HHP) on a flaxseed protein isolate was studied with respect to changes in protein structure, enzymatic hydrolysis, and the antioxidant activity of the hydrolysates. Flaxseed protein solutions (1% w/v) were subjected to HHP treatment at 600 MPa for 5 and 20 minutes at 20°C and compared with non-pressurized samples. Two subsequent hydrolysis treatments were carried out, with or without prior pressurization: a first hydrolysis with trypsin, followed by a second with pronase. First, the pressurized and non-pressurized flaxseed protein isolate was characterized by spectrofluorimetry and particle size analysis in order to study the effect of HHP pressurization on the plant protein matrix. The protein hydrolysates were then characterized by HPLC-MS, and their antioxidant capacity was determined by ORAC. The results showed that the level of pressurization and the duration of the treatment have an impact on protein structure by inducing protein dissociation and the formation of aggregates. These aggregates are thought to be caused by decompression or to form during storage of the isolates. Following enzymatic hydrolysis of the pressurized and non-pressurized protein solutions by trypsin alone and by trypsin-pronase, chromatographic analyses revealed that the concentration of certain peptides was modified when trypsin alone was used after HHP treatment. Finally, HHP improved the antioxidant capacity of the hydrolysates obtained by trypsin-pronase hydrolysis compared with the non-pressurized control.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Seafood fraud, the misrepresentation of seafood products, has been discovered all around the world in different forms, such as false labeling, species substitution, short-weighting or over-glazing, intended to hide the true identity, origin or weight of the products. Because of the value of seafood products such as canned tuna, swordfish or grouper, these species are subject to commercial fraud, mainly the replacement of valuable species with species of little or no value. A similar situation occurs with shelled shrimp or shellfish that are cut into pieces for commercialization. Food fraud by species substitution is an emerging risk, given the increasingly global food supply chain and the potential food safety issues. Economic food fraud is committed when food is deliberately placed on the market for financial gain, deceiving consumers (Woolfe & Primrose, 2004). As a result of the increased demand and the globalization of the seafood supply, more fish species are encountered in the market. In this scenario, it becomes essential to identify species unequivocally. Traditional taxonomy, based primarily on species identification keys, has shown a number of limitations in the use of distinctive features in many animal taxa, and these limitations are amplified when fish, crustaceans or shellfish are commercially processed. Many fish species show a similar texture, so the certification of fish products is particularly important when fish have undergone procedures that affect the overall anatomical structure, such as heading, slicing or filleting (Marko et al., 2004). The absence of morphological traits, the main characteristics usually used to identify animal species, represents a challenge, and molecular identification methods are required. Among them, DNA-based methods are the most frequently employed for food authentication (Lockley & Bardsley, 2000). In addition to food authentication and traceability, studies of taxonomy, population and conservation genetics, as well as analyses of dietary habits and prey selection, also rely on genetic analyses, including DNA barcoding technology (Arroyave & Stiassny, 2014; Galimberti et al., 2013; Mafra, Ferreira, & Oliveira, 2008; Nicolé et al., 2012; Rasmussen & Morrissey, 2008), which consists of PCR amplification and sequencing of a specific region of the mitochondrial COI gene. The system proposed by Hebert et al. (2003) places within the mitochondrial COI gene (cytochrome oxidase subunit I) a bioidentification system useful for the taxonomic identification of species (Lo Brutto et al., 2007). The COI region used for genetic identification, the DNA barcode, is short enough for its sequence (the nucleotide base pairs) to be decoded in a single step with current technology. Although this region represents only a tiny fraction of the mitochondrial DNA in each cell, it has sufficient variability to distinguish the majority of species from one another (Biondo et al., 2016). This technique has already been employed to assess the actual identity and/or provenance of marketed products, as well as to unmask mislabelling and fraudulent substitutions, which are difficult to detect, especially in processed seafood (Barbuto et al., 2010; Galimberti et al., 2013; Filonzi, Chiesa, Vaghi, & Nonnis Marzano, 2010). Nowadays, research concerns the use of genetic markers not only to identify the species and/or varieties of fish, but also to identify molecular characters able to trace the origin and to provide an effective control tool for producers and consumers along the supply chain, in agreement with local regulations.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Betalains are plant-derived natural pigments that are gaining popularity as natural colorants in the food industry. Although betalains from red beetroot are already used as a food colorant (E-162), these compounds are not as well studied as other natural pigments such as anthocyanins, carotenoids or chlorophylls [1]. Since food additives are a focus of public interest, it is becoming increasingly important to meet consumers' expectations for natural and healthy products. Hence, the search for new plant-derived colorants for the food industry is still necessary [2]. Betalains were originally called 'nitrogenous anthocyanins', which incorrectly implied structural similarities between the two pigment classes. There are two structurally different types of betalains: the yellow/orange betaxanthins, which are the condensation products of betalamic acid and assorted amino compounds, and the red betacyanins, which are formed by glycosylation and acylation of cyclo-DOPA [3]. Given the chemical structure of the pigment, adding an acid to the extraction solvent increases the affinity of the pigment for the solvent. The aim of this study was to use Gomphrena globosa L. flowers as an alternative plant source of these pigments and to determine the best acid to use in the extraction procedure. For that purpose, three different acids (acetic, hydrochloric and phosphoric, all of them allowed in the food industry), adjusted to the same pH, were tested in a maceration extraction procedure. After the extraction, a purification through a C18 column was performed in order to obtain an extract more concentrated in betacyanins. The results were analysed by HPLC-PDA-MS/ESI. The betacyanin profile allowed the identification of gomphrenin II/III and isogomphrenin II/III. The best results were achieved by performing the extraction with hydrochloric acid (6.6 mg/g extract), while phosphoric acid yielded only trace amounts of these compounds. When acetic acid was used, the amount of pigment extracted was 6.8 times lower (0.97 mg/g extract) than with HCl. In conclusion, hydrochloric acid can be considered the most suitable acid for the extraction of these pigments.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Conventional topic models are ineffective for topic extraction from microblog messages, since the lack of structure and context among the posts yields poor message-level word co-occurrence patterns. In this work, we organize microblog posts as conversation trees based on reposting and replying relations, which enrich context information and alleviate data sparseness. Our model generates words according to topic dependencies derived from the conversation structures. Specifically, we differentiate between leader messages, which initiate key aspects of previously focused topics or shift the focus to different topics, and follower messages, which do not introduce any new information but simply echo topics from the messages that they repost or reply to. Our model captures the different extents to which leader and follower messages may contain the key topical words, thereby further enhancing the quality of the induced topics. The results of thorough experiments demonstrate the effectiveness of our proposed model.