936 results for Vector Space Model
Abstract:
Twitter lists organise Twitter users into multiple, often overlapping, sets. We believe that these lists capture some form of emergent semantics, which may be useful to characterise. In this paper we describe an approach for such characterisation, which consists of deriving semantic relations between lists and users by analysing the co-occurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on WordNet and to existing Linked Data sets. Results show that co-occurrence of keywords based on members of the lists produces more synonyms and results more closely correlated with those of WordNet similarity measures.
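As an illustration of the vector space step described in this abstract, here is a minimal sketch, assuming each keyword is represented by a sparse vector over list members and that keywords are compared by cosine similarity. The toy lists, the whitespace tokenisation, and the function names `keyword_vectors` and `cosine` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def keyword_vectors(lists):
    """Build one sparse vector per keyword, indexed by list member.

    `lists` is an iterable of (list_name, members) pairs. The weight of a
    member in a keyword's vector is the number of lists whose name contains
    that keyword and which include that member.
    """
    vectors = defaultdict(Counter)
    for name, members in lists:
        for keyword in set(name.lower().split()):
            vectors[keyword].update(set(members))
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data: (list name, members of that list).
toy_lists = [
    ("tech news", ["alice", "bob"]),
    ("technology", ["alice", "bob", "carol"]),
    ("cooking", ["dave"]),
]
vectors = keyword_vectors(toy_lists)
sim_related = cosine(vectors["tech"], vectors["technology"])
sim_unrelated = cosine(vectors["tech"], vectors["cooking"])
```

In this toy setting, keywords whose lists share members score high and keywords with disjoint memberships score zero, which is the kind of co-occurrence signal the abstract compares against WordNet.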
Abstract:
Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. To deal with this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, and thus web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with ontology all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of the F-Score.
Abstract:
Like a genetic algorithm, an evolution strategy is a process of continuous reproduction, trial and selection. Each new generation is an improvement on the one that went before. This paper presents two different proposals based on the vector space model (VSM), a traditional information retrieval (TIR) model. The first uses an evolution strategy (ES). The second uses the document centroid (DC) in a query expansion technique. The results are then compared; the ES technique was found to be more efficient than the other methods.
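The document-centroid idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming a Rocchio-style update in which the query vector is shifted toward the centroid of its top-ranked documents. The parameters `top_k` and `beta`, the toy term space, and the function names are assumptions, not details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def expand_query(query, docs, top_k=2, beta=0.5):
    """Add beta times the centroid of the top_k most similar documents."""
    ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)
    center = centroid(ranked[:top_k])
    return [q + beta * c for q, c in zip(query, center)]


# Hypothetical term space: ["vector", "space", "retrieval", "cooking"].
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0, 0.0]
expanded = expand_query(query, docs)
```

The expanded query gains weight on "space" and "retrieval", terms that appear in the top-ranked documents but not in the original query, while the unrelated "cooking" dimension stays at zero.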
Abstract:
Electronic publishing exploits numerous possibilities to present or exchange information and to communicate via current media such as the Internet. We describe the integration of an intelligent business news presentation and distribution network that utilizes modern Web technologies such as Web Services, loosely coupled services, and peer-to-peer networks. Employing semantic technologies enables the coupling of multinational and multilingual business news data on a scalable international level and thus introduces a service quality that has not so far been achieved by alternative technologies in the news distribution area. Architecturally, we identified the loose coupling of existing services as the most feasible way to address multinational and multilingual news presentation and distribution networks. Furthermore, we semantically enrich multinational news content by relating items to one another using AI techniques such as the Vector Space Model. Summarizing our experiences, we describe the technical integration of semantic and communication technologies in order to create a modern international news network.
Abstract:
This dissertation research points out major challenging problems with current Knowledge Organization (KO) systems, such as subject gateways or web directories: (1) the current systems use traditional knowledge organization systems based on controlled vocabulary which is not very well suited to web resources, and (2) information is organized by professionals not by users, which means it does not reflect intuitively and instantaneously expressed users’ current needs. In order to explore users’ needs, I examined social tags which are user-generated uncontrolled vocabulary. As investment in professionally-developed subject gateways and web directories diminishes (support for both BUBL and Intute, examined in this study, is being discontinued), understanding characteristics of social tagging becomes even more critical. Several researchers have discussed social tagging behavior and its usefulness for classification or retrieval; however, further research is needed to qualitatively and quantitatively investigate social tagging in order to verify its quality and benefit. This research particularly examined the indexing consistency of social tagging in comparison to professional indexing to examine the quality and efficacy of tagging. The data analysis was divided into three phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of tag attributes. Most indexing consistency studies have been conducted with a small number of professional indexers, and they tended to exclude users. Furthermore, the studies mainly have focused on physical library collections. This dissertation research bridged these gaps by (1) extending the scope of resources to various web documents indexed by users and (2) employing the Information Retrieval (IR) Vector Space Model (VSM) - based indexing consistency method since it is suitable for dealing with a large number of indexers. 
As a second phase, an analysis of tagging effectiveness with tagging exhaustivity and tag specificity was conducted to ameliorate the drawbacks of consistency analysis based on only the quantitative measures of vocabulary matching. Finally, to investigate tagging pattern and behaviors, a content analysis on tag attributes was conducted based on the FRBR model. The findings revealed that there was greater consistency over all subjects among taggers compared to that for two groups of professionals. The analysis of tagging exhaustivity and tag specificity in relation to tagging effectiveness was conducted to ameliorate difficulties associated with limitations in the analysis of indexing consistency based on only the quantitative measures of vocabulary matching. Examination of exhaustivity and specificity of social tags provided insights into particular characteristics of tagging behavior and its variation across subjects. To further investigate the quality of tags, a Latent Semantic Analysis (LSA) was conducted to determine to what extent tags are conceptually related to professionals’ keywords and it was found that tags of higher specificity tended to have a higher semantic relatedness to professionals’ keywords. This leads to the conclusion that the term’s power as a differentiator is related to its semantic relatedness to documents. The findings on tag attributes identified the important bibliographic attributes of tags beyond describing subjects or topics of a document. The findings also showed that tags have essential attributes matching those defined in FRBR. Furthermore, in terms of specific subject areas, the findings originally identified that taggers exhibited different tagging behaviors representing distinctive features and tendencies on web documents characterizing digital heterogeneous media resources. 
These results have led to the conclusion that there should be an increased awareness of diverse user needs by subject in order to improve metadata in practical applications. This dissertation research is the first necessary step to utilize social tagging in digital information organization by verifying the quality and efficacy of social tagging. This dissertation research combined both quantitative (statistics) and qualitative (content analysis using FRBR) approaches to vocabulary analysis of tags which provided a more complete examination of the quality of tags. Through the detailed analysis of tag properties undertaken in this dissertation, we have a clearer understanding of the extent to which social tagging can be used to replace (and in some cases to improve upon) professional indexing.
Abstract:
Background: In the analysis of effects by cell treatment such as drug dosing, identifying changes on gene network structures between normal and treated cells is a key task. A possible way for identifying the changes is to compare structures of networks estimated from data on normal and treated cells separately. However, this approach usually fails to estimate accurate gene networks due to the limited length of time series data and measurement noise. Thus, approaches that identify changes on regulations by using time series data on both conditions in an efficient manner are demanded. Methods: We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks on two different conditions in order to identify changes on regulations between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of regulations on each condition. The use of the hidden binary variables enables an efficient data usage; data on both conditions are used for commonly existing regulations, while for condition specific regulations corresponding data are only applied. Also, the similarity of networks on two conditions is automatically considered from the design of the potential function for the hidden binary variables. For the estimation of the hidden binary variables, we derive a new variational annealing method that searches the configuration of the binary variables maximizing the marginal likelihood. Results: For the performance evaluation, we use time series data from two topologically similar synthetic networks, and confirm that our proposed approach estimates commonly existing regulations as well as changes on regulations with higher coverage and precision than other existing approaches in almost all the experimental settings. 
For a real data application, our proposed approach is applied to time series data from normal Human lung cells and Human lung cells treated by stimulating EGF-receptors and dosing an anticancer drug termed Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the effect would be counteracted due to the selective inhibition of EGF-receptors by Gefitinib. However, gene expression profiles are actually different between the conditions, and the genes related to the identified changes are considered as possible off-targets of Gefitinib. Conclusions: From the synthetically generated time series data, our proposed approach can identify changes on regulations more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated Human lung cells, candidates of off-target genes of Gefitinib are found. According to the published clinical information, one of the genes can be related to a factor of interstitial pneumonia, which is known as a side effect of Gefitinib.
Abstract:
The first two articles build procedures to simulate the vector of univariate states and to estimate parameters in nonlinear and non-Gaussian state space models. We propose state space specifications that offer more flexibility in modeling dynamic relationships with latent variables. Our procedures are extensions of the HESSIAN method of McCausland [2012]. Thus, they use an approximation of the posterior density of the vector of states that makes it possible to simulate directly from the posterior distribution of the state vector, to simulate the state vector in one block and jointly with the vector of parameters, and to avoid data augmentation. These properties make it possible to build posterior simulators with very high relative numerical efficiency. Being generic, they open a new path in nonlinear and non-Gaussian state space analysis with limited contribution from the modeler. The third article is an essay in commodity market analysis. Private firms coexist with farmers' cooperatives in commodity markets in sub-Saharan African countries. The private firms have the biggest market share, while some theoretical models predict their disappearance once confronted with farmers' cooperatives. Elsewhere, some empirical studies and observations link cooperative incidence in a region with interpersonal trust, and thus with farmers' trust toward cooperatives. We propose a model that supports these empirical facts: a model in which the cooperative's reputation is a leading factor determining the market equilibrium of price competition between a cooperative and a private firm.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
This paper presents a three-phase integrated inverter suitable for stand-alone and grid-connected applications. Furthermore, the utilization of the special features of the tri-state coupled with the new space vector modulation allows the converter to present an attractive degree of freedom for the design of the controllers. Additionally, the control is derived through the dq0 transformation, the whole system is described, and simulation results are provided to confirm the proposal. © 2012 IEEE.
Abstract:
A model predictive controller (MPC) is proposed that is robustly stable for some classes of model uncertainty and for unknown disturbances. The case of open-loop stable systems is considered, where only the inputs and controlled outputs are measured. It is assumed that the controller will work in a scenario where target tracking is also required. Here, the nominal infinite horizon MPC with output feedback is extended to this robust setting. The method considers an extended cost function that can be made globally convergent for any finite input horizon considered for the uncertain system. The method is based on the explicit inclusion of cost contracting constraints in the control problem. The controller handles the output feedback case through a non-minimal state-space model that is built using past output measurements and past input increments. The application of the robust output feedback MPC is illustrated through the simulation of a low-order multivariable system.
Abstract:
In the MPC literature, stability is usually assured under the assumption that the state is measured. Since the closed-loop system may be nonlinear because of the constraints, it is not possible to apply the separation principle to prove global stability for the output feedback case. It is well known that a nonlinear closed-loop system with the state estimated via an exponentially converging observer combined with a state feedback controller can be unstable even when the controller is stable. One alternative to overcome the state estimation problem is to adopt a non-minimal state space model, in which the states are represented by measured past inputs and outputs [P.C. Young, M.A. Behzadi, C.L. Wang, A. Chotai, Direct digital and adaptive control by input-output, state variable feedback pole assignment, International Journal of Control 46 (1987) 1867-1881; C. Wang, P.C. Young, Direct digital control by input-output, state variable feedback: theoretical background, International Journal of Control 47 (1988) 97-109]. In this case, no observer is needed since the state variables can be directly measured. However, an important disadvantage of this approach is that the realigned model is not of minimal order, which makes the infinite horizon approach to obtaining nominal stability difficult to apply. Here, we propose a method to properly formulate an infinite horizon MPC based on the output-realigned model, which avoids the use of an observer and guarantees closed-loop stability. The simulation results show that, besides providing closed-loop stability for systems with integrating and stable modes, the proposed controller may perform better than MPC controllers that use an observer to estimate the current states. (C) 2008 Elsevier Ltd. All rights reserved.
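To make the non-minimal state space idea concrete, here is a minimal sketch, assuming a hypothetical second-order single-input, single-output difference equation y(k+1) = a1 y(k) + a2 y(k-1) + b1 u(k) + b2 u(k-1). The state is built only from measured quantities, x(k) = [y(k), y(k-1), u(k-1)], so no observer is required. The coefficients and function names are illustrative assumptions, not the formulation used in the paper.

```python
def nmss_matrices(a1, a2, b1, b2):
    """Return (A, B, C) for the state x(k) = [y(k), y(k-1), u(k-1)]."""
    A = [
        [a1, a2, b2],     # y(k+1) from y(k), y(k-1), u(k-1)
        [1.0, 0.0, 0.0],  # shift: y(k) becomes the "previous output"
        [0.0, 0.0, 0.0],  # u(k) enters this row through B only
    ]
    B = [b1, 0.0, 1.0]
    C = [1.0, 0.0, 0.0]
    return A, B, C


def simulate_nmss(A, B, C, inputs):
    """Simulate x(k+1) = A x(k) + B u(k), y = C x from a zero initial state."""
    x = [0.0, 0.0, 0.0]
    outputs = []
    for u in inputs:
        x = [sum(a * xi for a, xi in zip(row, x)) + b * u
             for row, b in zip(A, B)]
        outputs.append(sum(c * xi for c, xi in zip(C, x)))
    return outputs


def simulate_difference_equation(a1, a2, b1, b2, inputs):
    """Simulate the difference equation directly, for comparison."""
    y, y_old, u_old = 0.0, 0.0, 0.0
    outputs = []
    for u in inputs:
        y_new = a1 * y + a2 * y_old + b1 * u + b2 * u_old
        outputs.append(y_new)
        y_old, y, u_old = y, y_new, u
    return outputs


# Hypothetical stable plant and a short input sequence.
coeffs = (1.2, -0.35, 0.5, 0.1)
inputs = [1.0, 1.0, 0.0, -1.0, 0.5, 0.0]
ys_nmss = simulate_nmss(*nmss_matrices(*coeffs), inputs)
ys_direct = simulate_difference_equation(*coeffs, inputs)
```

The two simulations agree up to rounding, which shows that the realigned model reproduces the input-output behaviour. The state has three components for a second-order plant, which is the non-minimality the abstract identifies as the obstacle to the usual infinite horizon approach.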
Abstract:
This paper addresses robust model-order reduction of a high dimensional nonlinear partial differential equation (PDE) model of a complex biological process. Based on a nonlinear, distributed parameter model of the same process which was validated against experimental data of an existing, pilot-scale BNR activated sludge plant, we developed a state-space model with 154 state variables in this work. A general algorithm for robustly reducing the nonlinear PDE model is presented and based on an investigation of five state-of-the-art model-order reduction techniques, we are able to reduce the original model to a model with only 30 states without incurring pronounced modelling errors. The Singular perturbation approximation balanced truncating technique is found to give the lowest modelling errors in low frequency ranges and hence is deemed most suitable for controller design and other real-time applications. (C) 2002 Elsevier Science Ltd. All rights reserved.
Abstract:
We forecast quarterly US inflation based on the generalized Phillips curve using econometric methods which incorporate dynamic model averaging. These methods not only allow for coefficients to change over time, but also allow for the entire forecasting model to change over time. We find that dynamic model averaging leads to substantial forecasting improvements over simple benchmark regressions and more sophisticated approaches such as those using time varying coefficient models. We also provide evidence on which sets of predictors are relevant for forecasting in each period.
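One way to see how the forecasting model itself can change over time is the model-probability recursion sketched below. It assumes the forgetting-factor form commonly used for dynamic model averaging, in which last period's probabilities are raised to a power `alpha` slightly below one before being updated by each model's predictive likelihood. The abstract does not state this recursion, and the likelihood values, `alpha`, and the function name are hypothetical.

```python
def dma_weights(pred_likelihoods, alpha=0.95):
    """Update model probabilities period by period.

    `pred_likelihoods[t][k]` is the predictive likelihood that model k
    assigns to the observation in period t (assumed to be given).
    """
    n_models = len(pred_likelihoods[0])
    posterior = [1.0 / n_models] * n_models
    history = []
    for likes in pred_likelihoods:
        # Prediction step: the exponent alpha flattens the probabilities,
        # which discounts older evidence.
        powered = [p ** alpha for p in posterior]
        total = sum(powered)
        prior = [p / total for p in powered]
        # Update step: reweight by how well each model predicted this period.
        unnormalised = [w * l for w, l in zip(prior, likes)]
        total = sum(unnormalised)
        posterior = [w / total for w in unnormalised]
        history.append(posterior)
    return history


# Hypothetical likelihoods: model 0 predicts well early, model 1 later.
likelihoods = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 5
weights = dma_weights(likelihoods)
```

In this toy run the weight moves from model 0 to model 1 as the evidence changes, so the averaged forecast would rely on different predictor sets in different periods.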