968 results for text vector space model
Abstract:
This paper proposes a method for copy detection in short Malayalam text passages. Given two passages, one as the source text and the other as the copied text, the method determines whether the second passage is a plagiarized version of the source. A plagiarism detection algorithm based on the n-gram model for word retrieval is developed, and trigrams are found to be the best model for comparing Malayalam text. Based on the probability and resemblance measures calculated from the n-gram comparison, the text is categorized against a threshold. Texts are compared using variable-length n-grams (n = 2, 3, 4). The experiments show that the trigram model gives acceptable average performance at an affordable cost in terms of complexity.
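The n-gram comparison described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the resemblance measure is assumed here to be the Jaccard overlap of word n-gram sets, and the function names and the 0.5 threshold are hypothetical.

```python
def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) found in a passage."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(source, suspect, n=3):
    """Jaccard overlap of the two n-gram sets (an assumed measure)."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_copied(source, suspect, n=3, threshold=0.5):
    """Flag the suspect passage when resemblance reaches the threshold.

    The threshold value is a placeholder, not one taken from the paper.
    """
    return resemblance(source, suspect, n) >= threshold
```

Calling `resemblance` with n = 2, 3 and 4 on the same pair of passages mirrors the variable-length comparison mentioned in the abstract.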
Abstract:
We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
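One-vs-all classification, as mentioned above, combines one binary classifier per class by picking the class whose classifier is most confident. The sketch below shows only that combination step. The linear scorers and their hand-picked weights are hypothetical stand-ins for trained SVM or Naive Bayes classifiers.

```python
def linear_scorer(weights, bias):
    """Build a binary scorer; a larger output means 'more likely in the class'."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


def one_vs_all_predict(scorers, x):
    """Return the label whose binary scorer gives x the highest score."""
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical weights over a three-feature document vector.
scorers = {
    "sports": linear_scorer([1.0, -0.5, 0.0], 0.0),
    "politics": linear_scorer([-0.5, 1.0, 0.0], 0.0),
    "science": linear_scorer([0.0, -0.5, 1.0], 0.0),
}
```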
Abstract:
Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
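For reference, the three loss functions named above can be written out as small functions of the residual. This is a sketch using the standard textbook definitions. The parameter names `epsilon` and `delta` are chosen here for illustration.

```python
def quadratic_loss(residual):
    """Usual squared-error loss."""
    return residual ** 2


def epsilon_insensitive_loss(residual, epsilon):
    """Vapnik's loss: zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)


def huber_loss(residual, delta):
    """Huber's robust loss: quadratic near zero, linear in the tails."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

Both non-quadratic losses grow only linearly for large residuals, which is the sense in which Vapnik's loss resembles Huber's.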
Abstract:
Caches are known to consume up to half of all system power in embedded processors. Co-optimizing performance and power of the cache subsystems is therefore an important step in the design of embedded systems, especially those employing application-specific instruction processors. In this project, we propose an analytical cache model that succinctly captures the miss performance of an application over the entire cache parameter space. Unlike exhaustive trace-driven simulation, our model requires that the program be simulated only once so that a few key characteristics can be obtained. Using these application-dependent characteristics, the model can span the entire cache parameter space consisting of cache sizes, associativity and cache block sizes. In our unified model, we are able to cater for direct-mapped, set-associative and fully associative instruction, data and unified caches. Validation against full trace-driven simulations shows that our model has a high degree of fidelity. Finally, we show how the model can be coupled with a power model for caches such that one can very quickly decide on Pareto-optimal performance-power design points for rapid design space exploration.
Abstract:
This article belongs to a section of the journal devoted to social psychology.
Abstract:
Abstract based on that of the publication.
Abstract:
Tropical cyclones have been investigated in a T159 version of the MPI ECHAM5 climate model using a novel technique to diagnose the evolution of the 3-dimensional vorticity structure of tropical cyclones, including their full life cycle from weak initial vortex to their possible extra-tropical transition. Results have been compared with reanalyses (ERA40 and JRA25) and observed tropical storms during the period 1978-1999 for the Northern Hemisphere. There is no indication of any trend in the number or intensity of tropical storms during this period in ECHAM5 or in the reanalyses, but there are distinct inter-annual variations. The storms simulated by ECHAM5 are realistic both in space and time, but the model, and even more so the reanalyses, underestimate the intensities of the most intense storms (in terms of their maximum wind speeds). There is an indication of a response to ENSO with a smaller number of Atlantic storms during El Niño, in agreement with previous studies. The global divergence circulation responds to El Niño by setting up a large-scale convergence flow, with the center over the central Pacific and enhanced subsidence over the tropical Atlantic. At the same time there is an increase in the vertical wind shear in the region of the tropical Atlantic where tropical storms normally develop. There is a good correspondence between the model and ERA40 except that the divergence circulation is somewhat stronger in the model. The model underestimates storms in the Atlantic but tends to overestimate them in the Western Pacific and in the North Indian Ocean. It is suggested that the overestimation of storms in the Pacific by the model is related to an overly strong response to the tropical Pacific SST anomalies. The overestimation in the North Indian Ocean is likely to be due to an overprediction of the intensity of monsoon depressions, which are then classified as intense tropical storms. Nevertheless, overall results are encouraging and will further contribute to increased confidence in simulating intense tropical storms with high-resolution climate models.
Abstract:
Results from the first Sun-to-Earth coupled numerical model developed at the Center for Integrated Space Weather Modeling are presented. The model simulates physical processes occurring in space spanning from the corona of the Sun to the Earth's ionosphere, and it represents the first step toward creating a physics-based numerical tool for predicting space weather conditions in the near-Earth environment. Two 6- to 7-d intervals, representing different heliospheric conditions in terms of the three-dimensional configuration of the heliospheric current sheet, are chosen for simulations. These conditions lead to drastically different responses of the simulated magnetosphere-ionosphere system, emphasizing, on the one hand, the challenges one encounters in building such forecasting tools and, on the other hand, the successes that can already be achieved even at this initial stage of Sun-to-Earth modeling.
Abstract:
One of the primary goals of the Center for Integrated Space Weather Modeling (CISM) effort is to assess and improve prediction of the solar wind conditions in near-Earth space, arising from both quasi-steady and transient structures. We compare 8 years of L1 in situ observations to predictions of the solar wind speed made by the Wang-Sheeley-Arge (WSA) empirical model. The mean-square error (MSE) between the observations and the model predictions is used to reach a number of useful conclusions: there is no systematic lag in the WSA predictions, the MSE is found to be highest at solar minimum and lowest during the rise to solar maximum, and the optimal lead time for 1 AU solar wind speed predictions is found to be 3 days. However, MSE is shown to frequently be an inadequate “figure of merit” for assessing solar wind speed predictions. A complementary, event-based analysis technique is developed in which high-speed enhancements (HSEs) are systematically selected and associated from observed and model time series. The WSA model is validated using comparisons of the number of hit, missed, and false HSEs, along with the timing and speed magnitude errors between the forecasted and observed events. Morphological differences between the different HSE populations are investigated to aid interpretation of the results and improvements to the model. Finally, by defining discrete events in the time series, model predictions from above and below the ecliptic plane can be used to estimate an uncertainty in the predicted HSE arrival times.
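One simple way to look for a systematic lag with the MSE is to shift the predicted series against the observed one and see which shift minimizes the error. The sketch below illustrates that idea only; it is not the authors' procedure, and the function names and the sample-based lag unit are assumptions.

```python
def mean_square_error(observed, predicted):
    """Mean of squared differences over paired samples."""
    pairs = list(zip(observed, predicted))
    return sum((o - p) ** 2 for o, p in pairs) / len(pairs)


def mse_by_lag(observed, predicted, max_lag):
    """MSE for each integer lag in [-max_lag, max_lag].

    A positive lag compares predicted[t] with observed[t + lag], i.e. it
    tests whether the prediction arrives `lag` samples too early.
    Both series are assumed to have the same length and sampling.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            obs, pred = observed[lag:], predicted[:len(predicted) - lag]
        else:
            obs, pred = observed[:lag], predicted[-lag:]
        result[lag] = mean_square_error(obs, pred)
    return result
```

If the minimum of `mse_by_lag` sits at lag 0, the predictions show no systematic timing offset under this measure.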
Abstract:
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index which is correlated to its topological locality. A popular dimensionality reduction technique is the space filling Hilbert’s curve, as it possesses good locality preserving properties. However, there exists little comparison between Hilbert’s curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert’s curve, Sammon’s mapping and Principal Component Analysis have been used to generate a 1d space with locality preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2d network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvements and better techniques to preserve locality information are required.
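To make the Hilbert-curve reduction concrete, the sketch below maps a two-dimensional landmark vector, quantized to an integer grid, to a scalar index using the widely published iterative Hilbert-curve conversion. It assumes two landmarks and a grid whose side is a power of two. The function names are illustrative and not taken from the paper.

```python
def _rotate(side, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = side - 1 - x
            y = side - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(side, x, y):
    """Map grid cell (x, y) to its position along a Hilbert curve.

    `side` is the grid width and must be a power of two;
    x and y must lie in range(side).
    """
    d = 0
    s = side // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(side, x, y, rx, ry)
        s //= 2
    return d
```

Cells that are consecutive along the curve are always adjacent in the grid, which is the locality-preserving property the abstract relies on. The converse does not hold in general, which is one reason the comparison with other techniques is of interest.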
Abstract:
We consider a non-local version of the NJL model, based on a separable quark-quark interaction. The interaction is extended to include terms that bind vector and axial-vector mesons. The non-locality means that no further regulator is required. Moreover the model is able to confine the quarks by generating a quark propagator without poles at real energies. Working in the ladder approximation, we calculate amplitudes in Euclidean space and discuss features of their continuation to Minkowski energies. Conserved currents are constructed and we demonstrate their consistency with various Ward identities. Various meson masses are calculated, along with their strong and electromagnetic decay amplitudes. We also calculate the electromagnetic form factor of the pion, as well as form factors associated with the processes γγ* → π0 and ω → π0γ*. The results are found to lead to a satisfactory phenomenology and lend some dynamical support to the idea of vector-meson dominance.
Abstract:
A nonlocal version of the NJL model is investigated. It is based on a separable quark-quark interaction, as suggested by the instanton liquid picture of the QCD vacuum. The interaction is extended to include terms that bind vector and axial-vector mesons. The nonlocality means that no further regulator is required. Moreover the model is able to confine the quarks by generating a quark propagator without poles at real energies. Features of the continuation of amplitudes from Euclidean space to Minkowski energies are discussed. These features lead to restrictions on the model parameters as well as on the range of applicability of the model. Conserved currents are constructed, and their consistency with various Ward identities is demonstrated. In particular, the Gell-Mann-Oakes-Renner relation is derived both in the ladder approximation and at meson loop level. The importance of maintaining chiral symmetry in the calculations is stressed throughout. Calculations with the model are performed to all orders in momentum. Meson masses are determined, along with their strong and electromagnetic decay amplitudes. Also calculated are the electromagnetic form factor of the pion and form factors associated with the processes γγ* → π0 and ω → π0γ*. The results are found to lead to a satisfactory phenomenology and demonstrate a possible dynamical origin for vector-meson dominance. In addition, the results produced at meson loop level validate the use of 1/Nc as an expansion parameter and indicate that a light and broad scalar state is inherent in models of the NJL type.
Abstract:
Neurofuzzy modelling systems combine fuzzy logic with quantitative artificial neural networks via a concept of fuzzification by using a fuzzy membership function usually based on B-splines and algebraic operators for inference, etc. The paper introduces a neurofuzzy model construction algorithm using Bezier-Bernstein polynomial functions as basis functions. The new network maintains most of the properties of the B-spline expansion based neurofuzzy system, such as the non-negativity of the basis functions, and unity of support but with the additional advantages of structural parsimony and Delaunay input space partitioning, avoiding the inherent computational problems of lattice networks. This new modelling network is based on the idea that an input vector can be mapped into barycentric co-ordinates with respect to a set of predetermined knots as vertices of a polygon (a set of tiled Delaunay triangles) over the input space. The network is expressed as the Bezier-Bernstein polynomial function of barycentric co-ordinates of the input vector. An inverse de Casteljau procedure using backpropagation is developed to obtain the input vector's barycentric co-ordinates that form the basis functions. Extension of the Bezier-Bernstein neurofuzzy algorithm to n-dimensional inputs is discussed followed by numerical examples to demonstrate the effectiveness of this new data based modelling approach.
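The two ingredients named above, barycentric coordinates and Bernstein polynomial basis functions, can be illustrated for a single triangle in a two-dimensional input space. The sketch below computes the barycentric coordinates with a direct closed-form formula, which is a swapped-in technique and not the inverse de Casteljau procedure with backpropagation described in the abstract. The function names are hypothetical.

```python
from math import factorial


def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2-D point p in triangle a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1.0 - u - v


def bernstein_basis(u, v, w, degree):
    """All bivariate Bernstein polynomials of the given degree at (u, v, w).

    Keys are exponent triples (i, j, k) with i + j + k == degree.
    """
    basis = {}
    for i in range(degree + 1):
        for j in range(degree + 1 - i):
            k = degree - i - j
            coeff = factorial(degree) // (factorial(i) * factorial(j) * factorial(k))
            basis[(i, j, k)] = coeff * u ** i * v ** j * w ** k
    return basis
```

For a point inside the triangle the basis values are non-negative and sum to one, which matches the non-negativity and unity properties the abstract attributes to the network's basis functions.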
Abstract:
The recent decline in the open magnetic flux of the Sun heralds the end of the Grand Solar Maximum (GSM) that has persisted throughout the space age, during which the largest‐fluence Solar Energetic Particle (SEP) events have been rare and Galactic Cosmic Ray (GCR) fluxes have been relatively low. In the absence of a predictive model of the solar dynamo, we here make analogue forecasts by studying past variations of solar activity in order to evaluate how long‐term change in space climate may influence the hazardous energetic particle environment of the Earth in the future. We predict the probable future variations in GCR flux, near‐Earth interplanetary magnetic field (IMF), sunspot number, and the probability of large SEP events, all deduced from cosmogenic isotope abundance changes following 24 GSMs in a 9300‐year record.
Abstract:
In this paper sequential importance sampling is used to assess the impact of observations on an ensemble prediction for the decadal path transitions of the Kuroshio Extension (KE). This particle filtering approach gives access to the probability density of the state vector, which allows us to determine the predictive power (an entropy-based measure) of the ensemble prediction. The proposed set-up makes use of an ensemble that, at each time, samples the climatological probability distribution. Then, in a post-processing step, the impact of different sets of observations is measured by the increase in predictive power of the ensemble over the climatological signal over one year. The method is applied in an identical-twin experiment for the Kuroshio Extension using a reduced-gravity shallow water model. We investigate the impact of assimilating velocity observations from different locations during the elongated and the contracted meandering state of the KE. Optimal observation locations correspond to regions with strong potential vorticity gradients. For the elongated state the optimal location is in the first meander of the KE. During the contracted state of the KE it is located south of Japan, where the Kuroshio separates from the coast.
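The reweighting step at the heart of importance sampling can be sketched for a scalar observation. Two assumptions are made here that the abstract does not state: the observation error is taken to be Gaussian, and the relative entropy of the weights with respect to equal weights is used as a stand-in for the paper's entropy-based predictive power. All names are hypothetical.

```python
import math


def importance_weights(ensemble_obs, observation, obs_std):
    """Normalized member weights from a Gaussian observation likelihood.

    ensemble_obs holds each member's predicted value of the observed
    quantity; obs_std is the assumed observation error.
    """
    log_w = [-0.5 * ((y - observation) / obs_std) ** 2 for y in ensemble_obs]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def relative_entropy_vs_uniform(weights):
    """KL divergence of the weighted ensemble from equally weighted members."""
    n = len(weights)
    return sum(w * math.log(w * n) for w in weights if w > 0.0)
```

Equal weights give a relative entropy of zero, so a larger value indicates that the observations have sharpened the ensemble relative to its unweighted, climatological state.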