933 results for VECTOR SPACE MODEL
Abstract:
A hierarchical structure is used to represent the content of semi-structured documents such as XML and XHTML. The traditional Vector Space Model (VSM) is not sufficient to represent both the structure and the content of such web documents. In this paper, we therefore introduce a novel method of representing XML documents in a Tensor Space Model (TSM) and then use it for clustering. Empirical analysis shows that the proposed method scales to a real-life dataset and that the factorized matrices it produces help to improve cluster quality, because the document representation is enriched with both structure and content information.
Abstract:
The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; as well, the factorized matrices produced from the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.
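As a rough illustration of why a third mode helps, the sketch below stores a tiny corpus as a sparse third-order tensor indexed by (document, structural path, term) and then collapses the path mode to recover plain VSM counts. The toy documents, the element paths, and the helpers `build_tensor` and `flatten_to_vsm` are hypothetical; this is not the paper's TSM construction or its factorization step.

```python
from collections import Counter

def build_tensor(docs):
    """Return a sparse 3-way tensor {(doc_id, path, term): count}.

    docs maps a document id to a list of (path, text) pairs, where path is
    the element path under which the text occurs (assumed input format).
    """
    tensor = Counter()
    for doc_id, fields in docs.items():
        for path, text in fields:
            for term in text.lower().split():
                tensor[(doc_id, path, term)] += 1
    return tensor

def flatten_to_vsm(tensor):
    """Sum over the path mode, leaving the document-term counts of a plain VSM."""
    vsm = Counter()
    for (doc_id, _path, term), count in tensor.items():
        vsm[(doc_id, term)] += count
    return vsm

# Two toy documents with the same words placed under different elements.
docs = {
    "d1": [("article/title", "tensor clustering"), ("article/body", "xml clustering")],
    "d2": [("article/title", "xml clustering"), ("article/body", "tensor clustering")],
}
tensor = build_tensor(docs)
vsm = flatten_to_vsm(tensor)
```

In this toy corpus the two documents have identical term counts once the path mode is summed out, yet the tensor entries still differ, which is the kind of structural information a flat VSM discards.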
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD effectively and efficiently discovers other entities closely related to the target. A measure of relation strength between entities is defined to quantify this relatedness. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for document clustering, in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
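The abstract does not give the formula for relation strength, so the sketch below swaps in a simple stand-in: the fraction of documents containing entity `a` that also contain entity `b`. The functions `relation_strength` and `expand`, the damping factor `alpha`, and the toy corpus are all assumptions made for illustration, not the LRD definitions.

```python
from collections import Counter, defaultdict
from itertools import permutations

def relation_strength(docs):
    """strength[a][b] = (# docs containing both a and b) / (# docs containing a).

    This co-occurrence ratio is an assumed stand-in for the paper's measure.
    """
    doc_freq = Counter()
    cooc = defaultdict(Counter)
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        for a, b in permutations(unique, 2):
            cooc[a][b] += 1
    return {a: {b: n / doc_freq[a] for b, n in row.items()} for a, row in cooc.items()}

def expand(vector, strength, alpha=0.5):
    """Add alpha * weight * strength for every term related to a term in the vector."""
    expanded = dict(vector)
    for term, weight in vector.items():
        for other, s in strength.get(term, {}).items():
            expanded[other] = expanded.get(other, 0.0) + alpha * weight * s
    return expanded

# Toy corpus: each document is a list of already-extracted terms and entities.
docs = [
    ["alice", "project_x", "budget"],
    ["alice", "project_x"],
    ["bob", "budget"],
]
strength = relation_strength(docs)
query = expand({"alice": 1.0}, strength)
```

Here a query for `alice` picks up weight on `project_x` and `budget` in proportion to how often they co-occur with her, which is the general shape of enhancing a vector space model with relation strength.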
Abstract:
Estimating and predicting degradation processes of engineering assets is crucial for reducing the cost and ensuring the productivity of enterprises. Assisted by modern condition monitoring (CM) technologies, most asset degradation processes can be revealed by various degradation indicators extracted from CM data. Maintenance strategies developed using these degradation indicators (i.e. condition-based maintenance) are more cost-effective, because unnecessary maintenance activities are avoided when an asset is still in a decent health state. A practical difficulty in condition-based maintenance (CBM) is that degradation indicators extracted from CM data can only partially reveal asset health states in most situations. Underestimating this uncertainty in the relationships between degradation indicators and health states can cause excessive false alarms or failures without pre-alarms. The state space model provides an efficient approach to describing a degradation process using indicators that only partially reveal health states. However, existing state space models that describe asset degradation processes largely depend on assumptions such as discrete time, discrete state, linearity, and Gaussianity. The discrete time assumption requires that failures and inspections only happen at fixed intervals. The discrete state assumption entails discretising continuous degradation indicators, which requires expert knowledge and often introduces additional errors. The linear and Gaussian assumptions are not consistent with the nonlinear and irreversible degradation processes in most engineering assets. This research proposes a Gamma-based state space model that does not rely on discrete time, discrete state, linear, or Gaussian assumptions to model partially observable degradation processes. Monte Carlo-based algorithms are developed to estimate model parameters and asset remaining useful lives.
In addition, this research develops a continuous state partially observable semi-Markov decision process (POSMDP) to model a degradation process that follows the Gamma-based state space model and is subject to various maintenance strategies. Optimal maintenance strategies are obtained by solving the POSMDP. Simulation studies are performed in MATLAB; case studies using data from an accelerated life test of a gearbox and from the liquefied natural gas industry are also conducted. The results show that the proposed Monte Carlo-based EM algorithm can estimate model parameters accurately. The results also show that the proposed Gamma-based state space model fits better than linear and Gaussian state space models when used to process monotonically increasing degradation data from the accelerated life test of a gearbox. Furthermore, both simulation studies and case studies show that the prediction algorithm based on the Gamma-based state space model can accurately identify the mean value and confidence interval of asset remaining useful lives. In addition, the simulation study shows that the proposed maintenance strategy optimisation method based on the POSMDP is more flexible than one that assumes a predetermined strategy structure and uses renewal theory. Moreover, the simulation study shows that, by optimising the next maintenance activity and the waiting time until that activity simultaneously, the proposed maintenance optimisation method can obtain more cost-effective strategies than a recently published maintenance strategy optimisation method.
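To make the idea of a monotone, partially observed degradation process concrete, the sketch below simulates a hidden state that grows by gamma-distributed increments and an indicator that observes it through multiplicative noise. The parameter names, the uniform multiplicative observation model, and the failure-threshold helper are assumptions chosen for illustration; the thesis's actual model equations and its Monte Carlo-based EM estimation are not reproduced here.

```python
import random

def simulate_gamma_degradation(steps, shape_rate, scale, rel_noise, dt=1.0, seed=0):
    """Simulate a hidden gamma-process state and a noisy indicator of it.

    Each step adds a Gamma(shape_rate * dt, scale) increment, so the hidden
    state never decreases. The indicator is the state multiplied by uniform
    noise in [1 - rel_noise, 1 + rel_noise] (an assumed observation model).
    """
    rng = random.Random(seed)
    state, hidden, observed = 0.0, [], []
    for _ in range(steps):
        state += rng.gammavariate(shape_rate * dt, scale)  # non-negative increment
        hidden.append(state)
        observed.append(state * rng.uniform(1.0 - rel_noise, 1.0 + rel_noise))
    return hidden, observed

def first_passage(path, threshold):
    """Index of the first step at which the path reaches the threshold, or None."""
    for i, value in enumerate(path):
        if value >= threshold:
            return i
    return None

hidden, observed = simulate_gamma_degradation(
    steps=50, shape_rate=0.8, scale=0.5, rel_noise=0.2
)
failure_step = first_passage(hidden, threshold=10.0)
```

Because the observed indicator can dip even though the hidden state cannot, a threshold rule applied directly to the indicator may fire early or late, which is the partial-observability problem the abstract describes.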
Abstract:
Video surveillance using Closed Circuit Television (CCTV) cameras is one of the fastest growing areas in the field of security technologies. However, existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. This work attempts to address these problems by proposing an automatic suspicious behaviour detection system which utilises contextual information. Contextual information is utilised via three main components: a context space model, a data stream clustering algorithm, and an inference algorithm. The utilisation of contextual information is still limited in the domain of suspicious behaviour detection, even though it is nearly impossible to correctly understand human behaviour without considering the context in which it is observed. This work presents experiments using video feeds taken from the CAVIAR dataset and from a camera mounted on one of the buildings (Z-Block) at the Queensland University of Technology, Australia. These experiments show that, by exploiting contextual information, the proposed system is able to make more accurate detections, especially of behaviours which are suspicious in some contexts but normal in others. Moreover, this information gives critical feedback to the system designers to refine the system.
Abstract:
Good automatic detection of anomalous human behaviour is one of the goals of research on smart surveillance systems. Automatic detection addresses several human factor issues underlying existing surveillance systems. To create such a detection system, contextual information needs to be considered, because context is required in order to correctly understand human behaviour. Unfortunately, the use of contextual information is still limited in automatic anomalous human behaviour detection approaches. This paper proposes a context space model which has two benefits: (a) it provides guidelines for system designers to select information which can be used to describe context; (b) it enables a system to distinguish between different contexts. A comparative analysis is conducted between a context-based system which employs the proposed context space model and a system which is implemented based on one of the existing approaches. The comparison is applied to a scenario constructed using video clips from the CAVIAR dataset. The results show that the context-based system outperforms the other system, because the context space model allows the system to consider knowledge learned from the relevant context only.
Abstract:
Due to the health impacts caused by exposure to air pollutants in urban areas, monitoring and forecasting of air quality parameters have become an important topic in atmospheric and environmental research. Knowledge of the dynamics and complexity of air pollutant behavior has made artificial intelligence models a useful tool for more accurate prediction of pollutant concentrations. This paper focuses on an innovative method of daily air pollution prediction using a combination of a Support Vector Machine (SVM) as predictor and Partial Least Squares (PLS) as a data selection tool, based on measured values of CO concentrations. The CO concentrations at the Rey monitoring station in the south of Tehran, from Jan. 2007 to Feb. 2011, have been used to test the effectiveness of this method. The hourly CO concentrations have been predicted using the SVM and the hybrid PLS–SVM models. Similarly, daily CO concentrations have been predicted based on the aforementioned four years of measured data. Results demonstrated that both models have good prediction ability; however, the hybrid PLS–SVM has better accuracy. In the analysis presented in this paper, statistical estimators including relative mean errors, root mean squared errors and the mean absolute relative error have been employed to compare the performance of the models. It has been concluded that the errors decrease after size reduction and that the coefficients of determination increase from 56–81% for the SVM model to 65–85% for the hybrid PLS–SVM model. It was also found that the hybrid PLS–SVM model required less computational time than the SVM model, as expected, supporting the more accurate and faster prediction ability of the hybrid PLS–SVM model.
Abstract:
Causal relationships existing between observed levels of groundwater in a semi-arid sub-basin of the Kabini River basin (Karnataka state, India) are investigated in this study. A Vector Auto Regressive model is used for this purpose. Its structure is built on an upstream/downstream interaction network based on observed hydro-physical properties. Exogenous climatic forcing is used as an input based on cumulated rainfall departure. Optimal models are obtained through a trial approach and are used as a proxy of the dynamics to derive causal networks. The approach appears to be an interesting tool for analysing the causal relationships existing inside the basin. The causal network reveals three main regions. The Northeastern part of the Gundal basin is closely coupled to the outlet dynamics. The Northwestern part is mainly controlled by the climatic forcing and only marginally linked to the outlet dynamics. Finally, the upper part of the basin acts as a forcing on the lower part rather than being coupled with it, allowing for a separate analysis of this local behaviour. The analysis also reveals differential time scales at work inside the basin when comparing upstream oriented with downstream oriented causalities. In the upper part of the basin, time delays are close to 2 months in the upward direction and lower than 1 month in the downward direction. These time scales are likely to be good indicators of the hydraulic response time of the basin, a parameter that is usually difficult to estimate in practice. This suggests that, at the sub-basin scale, intra-annual time scales would be more relevant for analysing or modelling tropical basin dynamics in the hard rock (granitic and gneissic) aquifers ubiquitous in south India. (c) 2012 Elsevier B.V. All rights reserved.
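A minimal sketch of how a vector autoregressive model with exogenous forcing encodes a directed upstream/downstream link is given below. It uses a first-order model $x_t = C x_{t-1} + g u_t$ for two hypothetical wells; the coefficient values, the single rainfall pulse, and the function `simulate_var1` are illustrative assumptions and not the study's fitted model or network.

```python
def simulate_var1(coef, forcing_gain, forcing, x0):
    """Iterate x_t = coef @ x_{t-1} + forcing_gain * u_t and return all states."""
    x, path = list(x0), []
    for u in forcing:
        x = [
            sum(coef[i][j] * x[j] for j in range(len(x))) + forcing_gain[i] * u
            for i in range(len(x))
        ]
        path.append(x)
    return path

# Hypothetical two-well network: well 0 is upstream, well 1 is downstream.
coef = [
    [0.5, 0.0],   # upstream level depends only on its own past
    [0.25, 0.5],  # downstream level also depends on the upstream past
]
forcing_gain = [1.0, 0.0]        # only the upstream well responds to rainfall directly
forcing = [1.0, 0.0, 0.0, 0.0]   # a single rainfall pulse at the first step
path = simulate_var1(coef, forcing_gain, forcing, x0=[0.0, 0.0])
```

The zero in `coef[0][1]` states that the downstream well has no influence on the upstream one, while the non-zero `coef[1][0]` makes the downstream level respond to the pulse one step later. Reading such zero and non-zero coefficients, and the lags at which responses appear, is the sense in which a fitted model can serve as a proxy for a causal network.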
Abstract:
Let $L$ be the algebra of all linear transformations on an $n$-dimensional vector space $V$ over a field $F$ and let $A, B \in L$. Let $A_{i+1} = A_i B - B A_i$, $i = 0, 1, 2, \ldots$, with $A = A_0$. Let $f_k(A, B; \sigma) = A_{2K+1} - \sigma_1 A_{2K-1} + \sigma_2 A_{2K-3} - \cdots + (-1)^K \sigma_K A_1$, where $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_K)$, the $\sigma_i$ belong to $F$ and $K = k(k-1)/2$. Taussky and Wielandt [Proc. Amer. Math. Soc., 13(1962), 732-735] showed that $f_n(A, B; \sigma) = 0$ if $\sigma_i$ is the $i$th elementary symmetric function of $(\beta_r - \beta_s)^2$, $1 \le r < s \le n$, $i = 1, 2, \ldots, N$, with $N = n(n-1)/2$, where the $\beta_r$ are the characteristic roots of $B$. In this thesis we discuss relations involving $f_k(X, Y; \sigma)$ where $X, Y \in L$ and $1 \le k < n$. We show: 1. If $F$ is infinite and if for each $X \in L$ there exists $\sigma$ so that $f_k(A, X; \sigma) = 0$ where $1 \le k < n$, then $A$ is a scalar transformation. 2. If $F$ is algebraically closed, a necessary and sufficient condition that there exists a basis of $V$ with respect to which the matrices of $A$ and $B$ are both in block upper triangular form, where the blocks on the diagonals are either one- or two-dimensional, is that certain products $X_1 X_2 \cdots X_r$ belong to the radical of the algebra generated by $A$ and $B$ over $F$, where $X_i$ has the form $f_2(A, P(A, B); \sigma)$, for all polynomials $P(x, y)$. We partially generalize this to the case where the blocks have dimensions $\le k$. 3. If $A$ and $B$ generate $L$, if the characteristic of $F$ does not divide $n$ and if there exists $\sigma$ so that $f_k(A, B; \sigma) = 0$, for some $k$ with $1 \le k < n$, then the characteristic roots of $B$ belong to the splitting field of $g_k(w; \sigma) = w^{2K+1} - \sigma_1 w^{2K-1} + \sigma_2 w^{2K-3} - \cdots + (-1)^K \sigma_K w$ over $F$. We use this result to prove a theorem involving a generalized form of property L [cf. Motzkin and Taussky, Trans. Amer. Math. Soc., 73(1952), 108-114]. 4. Also we give mild generalizations of results of McCoy [Amer. Math. Soc. Bull., 42(1936), 592-600] and Drazin [Proc. London Math. Soc., 1(1951), 222-231].
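The iterated commutators and the Taussky and Wielandt identity can be checked numerically in the smallest case, n = 2, where N = 1 and the identity reads: A_3 minus sigma_1 times A_1 equals zero, with sigma_1 the squared difference of the two characteristic roots of B. The sketch below uses exact integer arithmetic; the helper names and the particular matrices are chosen only for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def commutator_chain(A, B, length):
    """Return [A_0, A_1, ..., A_length] with A_0 = A and A_{i+1} = A_i B - B A_i."""
    chain = [A]
    for _ in range(length):
        AB, BA = matmul(chain[-1], B), matmul(B, chain[-1])
        chain.append([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(AB, BA)])
    return chain

def f(A, B, sigma):
    """A_{2K+1} - s_1 A_{2K-1} + s_2 A_{2K-3} - ... + (-1)^K s_K A_1, with K = len(sigma)."""
    K = len(sigma)
    chain = commutator_chain(A, B, 2 * K + 1)
    coeffs = [1] + list(sigma)  # coeffs[0] = 1 multiplies the leading term A_{2K+1}
    n = len(A)
    return [
        [
            sum((-1) ** j * coeffs[j] * chain[2 * (K - j) + 1][r][c] for j in range(K + 1))
            for c in range(n)
        ]
        for r in range(n)
    ]

A = [[1, 2], [3, 4]]
B = [[2, 1], [0, 5]]               # upper triangular, characteristic roots 2 and 5
result = f(A, B, [(2 - 5) ** 2])   # sigma_1 = (beta_1 - beta_2)^2 = 9
```

For these matrices `result` is the zero matrix, as the identity predicts for n = 2.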
Abstract:
A model is presented that deals with problems of motor control, motor learning, and sensorimotor integration. The equations of motion for a limb are parameterized and used in conjunction with a quantized, multi-dimensional memory organized by state variables. Descriptions of desired trajectories are translated into motor commands which will replicate the specified motions. The initial specification of a movement is free of information regarding the mechanics of the effector system. Learning occurs without the use of error correction when practice data are collected and analyzed.
Abstract:
In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
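The abstract does not say how the two kinds of statistics are combined or which composition function is used, so the sketch below makes two labeled assumptions: topic-based and type-based features are simply concatenated, and an adjective-noun phrase vector is formed by element-wise addition, a common baseline. The toy vectors and the names `combine`, `compose_additive`, and `cosine` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def combine(topic_vec, type_vec):
    """Concatenate topic- and type-based features into one word vector (assumed scheme)."""
    return list(topic_vec) + list(type_vec)

def compose_additive(adj, noun):
    """Additive composition baseline: element-wise sum of the two word vectors."""
    return [a + n for a, n in zip(adj, noun)]

# Toy feature vectors; real VSMs would hold corpus statistics.
topic = {"red": [1.0, 0.0], "car": [0.0, 1.0]}
types = {"red": [1.0, 0.0], "car": [1.0, 1.0]}
words = {w: combine(topic[w], types[w]) for w in topic}
phrase = compose_additive(words["red"], words["car"])
similarity_to_noun = cosine(phrase, words["car"])
```

A composed vector like `phrase` can then be compared against an observed phrase vector with `cosine`, which is one simple way to score a composition method.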