933 resultados para Vector space model


Relevância:

90.00% 90.00%

Publicador:

Resumo:

The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation or a disease, to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant problems with current diffusion models: (1) that contagions can only spread along observed ties, and (2) that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion - the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial, and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze when people abandon software to a high degree of detail.

To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space, and simulates diffusion over the social space. Theoretically, the social space model integrates both actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives an explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion with schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it is spread without changes, the schema diffusion model allows people to modify information they receive to fit an underlying mental model of the information before they pass the information to others. Combining the latent space models with a schema notion for actors improves our models for social diffusion both theoretically and practically.

The empirical case study focuses on how the changing value of an innovation, introduced by the innovations' network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach -- computerized collaboration systems -- to collecting and analyzing data on organizational processes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Digital collections are growing exponentially in size as the information age takes a firm grip on all aspects of society. As a result Information Retrieval (IR) has become an increasingly important area of research. It promises to provide new and more effective ways for users to find information relevant to their search intentions. Document clustering is one of the many tools in the IR toolbox and is far from being perfected. It groups documents that share common features. This grouping allows a user to quickly identify relevant information. If these groups are misleading then valuable information can accidentally be ignored. There- fore, the study and analysis of the quality of document clustering is important. With more and more digital information available, the performance of these algorithms is also of interest. An algorithm with a time complexity of O(n2) can quickly become impractical when clustering a corpus containing millions of documents. Therefore, the investigation of algorithms and data structures to perform clustering in an efficient manner is vital to its success as an IR tool. Document classification is another tool frequently used in the IR field. It predicts categories of new documents based on an existing database of (doc- ument, category) pairs. Support Vector Machines (SVM) have been found to be effective when classifying text documents. As the algorithms for classifica- tion are both efficient and of high quality, the largest gains can be made from improvements to representation. Document representations are vital for both clustering and classification. Representations exploit the content and structure of documents. Dimensionality reduction can improve the effectiveness of existing representations in terms of quality and run-time performance. Research into these areas is another way to improve the efficiency and quality of clustering and classification results. Evaluating document clustering is a difficult task. Intrinsic measures of quality such as distortion only indicate how well an algorithm minimised a sim- ilarity function in a particular vector space. Intrinsic comparisons are inherently limited by the given representation and are not comparable between different representations. Extrinsic measures of quality compare a clustering solution to a “ground truth” solution. This allows comparison between different approaches. As the “ground truth” is created by humans it can suffer from the fact that not every human interprets a topic in the same manner. Whether a document belongs to a particular topic or not can be subjective.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Consider a person searching electronic health records, a search for the term ‘cracked skull’ should return documents that contain the term ‘cranium fracture’. A information retrieval systems is required that matches concepts, not just keywords. Further more, determining relevance of a query to a document requires inference – its not simply matching concepts. For example a document containing ‘dialysis machine’ should align with a query for ‘kidney disease’. Collectively we describe this problem as the ‘semantic gap’ – the difference between the raw medical data and the way a human interprets it. This paper presents an approach to semantic search of health records by combining two previous approaches: an ontological approach using the SNOMED CT medical ontology; and a distributional approach using semantic space vector space models. Our approach will be applied to a specific problem in health informatics: the matching of electronic patient records to clinical trials.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In vector space based approaches to natural language processing, similarity is commonly measured by taking the angle between two vectors representing words or documents in a semantic space. This is natural from a mathematical point of view, as the angle between unit vectors is, up to constant scaling, the only unitarily invariant metric on the unit sphere. However, similarity judgement tasks reveal that human subjects fail to produce data which satisfies the symmetry and triangle inequality requirements for a metric space. A possible conclusion, reached in particular by Tversky et al., is that some of the most basic assumptions of geometric models are unwarranted in the case of psychological similarity, a result which would impose strong limits on the validity and applicability vector space based (and hence also quantum inspired) approaches to the modelling of cognitive processes. This paper proposes a resolution to this fundamental criticism of of the applicability of vector space models of cognition. We argue that pairs of words imply a context which in turn induces a point of view, allowing a subject to estimate semantic similarity. Context is here introduced as a point of view vector (POVV) and the expected similarity is derived as a measure over the POVV's. Different pairs of words will invoke different contexts and different POVV's. Hence the triangle inequality ceases to be a valid constraint on the angles. We test the proposal on a few triples of words and outline further research.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Asset health inspections can produce two types of indicators: (1) direct indicators (e.g. the thickness of a brake pad, and the crack depth on a gear) which directly relate to a failure mechanism; and (2) indirect indicators (e.g. the indicators extracted from vibration signals and oil analysis data) which can only partially reveal a failure mechanism. While direct indicators enable more precise references to asset health condition, they are often more difficult to obtain than indirect indicators. The state space model provides an efficient approach to estimating direct indicators by using indirect indicators. However, existing state space models to estimate direct indicators largely depend on assumptions such as, discrete time, discrete state, linearity, and Gaussianity. The discrete time assumption requires fixed inspection intervals. The discrete state assumption entails discretising continuous degradation indicators, which often introduces additional errors. The linear and Gaussian assumptions are not consistent with nonlinear and irreversible degradation processes in most engineering assets. This paper proposes a state space model without these assumptions. Monte Carlo-based algorithms are developed to estimate the model parameters and the remaining useful life. These algorithms are evaluated for performance using numerical simulations through MATLAB. The result shows that both the parameters and the remaining useful life are estimated accurately. Finally, the new state space model is used to process vibration and crack depth data from an accelerated test of a gearbox. During this application, the new state space model shows a better fitness result than the state space model with linear and Gaussian assumption.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

How do humans respond to their social context? This question is becoming increasingly urgent in a society where democracy requires that the citizens of a country help to decide upon its policy directions, and yet those citizens frequently have very little knowledge of the complex issues that these policies seek to address. Frequently, we find that humans make their decisions more with reference to their social setting, than to the arguments of scientists, academics, and policy makers. It is broadly anticipated that the agent based modelling (ABM) of human behaviour will make it possible to treat such social effects, but we take the position here that a more sophisticated treatment of context will be required in many such models. While notions such as historical context (where the past history of an agent might affect its later actions) and situational context (where the agent will choose a different action in a different situation) abound in ABM scenarios, we will discuss a case of a potentially changing context, where social effects can have a strong influence upon the perceptions of a group of subjects. In particular, we shall discuss a recently reported case where a biased worm in an election debate led to significant distortions in the reports given by participants as to who won the debate (Davis et al 2011). Thus, participants in a different social context drew different conclusions about the perceived winner of the same debate, with associated significant differences among the two groups as to who they would vote for in the coming election. We extend this example to the problem of modelling the likely electoral responses of agents in the context of the climate change debate, and discuss the notion of interference between related questions that might be asked of an agent in a social simulation that was intended to simulate their likely responses. A modelling technology which could account for such strong social contextual effects would benefit regulatory bodies which need to navigate between multiple interests and concerns, and we shall present one viable avenue for constructing such a technology. A geometric approach will be presented, where the internal state of an agent is represented in a vector space, and their social context is naturally modelled as a set of basis states that are chosen with reference to the problem space.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper investigates the effects of limited speech data in the context of speaker verification using a probabilistic linear discriminant analysis (PLDA) approach. Being able to reduce the length of required speech data is important to the development of automatic speaker verification system in real world applications. When sufficient speech is available, previous research has shown that heavy-tailed PLDA (HTPLDA) modeling of speakers in the i-vector space provides state-of-the-art performance, however, the robustness of HTPLDA to the limited speech resources in development, enrolment and verification is an important issue that has not yet been investigated. In this paper, we analyze the speaker verification performance with regards to the duration of utterances used for both speaker evaluation (enrolment and verification) and score normalization and PLDA modeling during development. Two different approaches to total-variability representation are analyzed within the PLDA approach to show improved performance in short-utterance mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development. The results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset suggest that the HTPLDA system can continue to achieve better performance than Gaussian PLDA (GPLDA) as evaluation utterance lengths are decreased. We also highlight the importance of matching durations for score normalization and PLDA modeling to the expected evaluation conditions. Finally, we found that a pooled total-variability approach to PLDA modeling can achieve better performance than the traditional concatenated total-variability approach for short utterances in mismatched evaluation conditions and conditions for which insufficient speech resources are available for adequate system development.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA), and weighted median fisher discriminant analysis (WMFD), before probabilistic linear discriminant analysis (PLDA) modeling for the purpose of improving speaker verification performance in the presence of high inter-session variability. Recently it was shown that WLDA techniques can provide improvement over traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker discriminative information that is available in the distance between pair of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by using the weighted discriminant approaches prior to PLDA modeling. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach when compared to uncompensated PLDA modeling for i-vector based speaker verification systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The health effects of cold and hot temperatures are strongest in the frail and elderly. A large number of deaths in this "susceptible pool" after heat waves and cold snaps can cause mortality displacement, where an immediate increase in mortality is somewhat offset by a subsequent decrease in the following weeks. There may also be longer-term implications, as reductions in the pool caused by hot summers can reduce cold-related mortality in the following winter. A state-space model was used to simulate the numbers in the susceptible pool over time. We simulated the effects of harsh winters and heat waves, and varied the size of the susceptible pool. The larger the susceptible pool the smaller the mortality displacement. When 1% of the population were susceptible a harsh winter lead to an average of just 3 months of life lost per cold-related death, whereas a pool size of 10% meant that 24 months of life were lost per death. The impact of a cold spell on months of life lost was greater when the increased risk of death also applied to healthy people. The number of deaths caused by an August heat wave were reduced when there was a prior heat wave in June which reduced the susceptible pool. We were able to mimic some observed seasonal patterns in mortality using a simple state-space model. A better understanding of the size and dynamics of the susceptible pool will improve our understanding of the health effects of temperature.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The symbolic and improvisational nature of Livecoding requires a shared networking framework to be flexible and extensible, while at the same time providing support for synchronisation, persistence and redundancy. Above all the framework should be robust and available across a range of platforms. This paper proposes tuple space as a suitable framework for network communication in ensemble livecoding contexts. The role of tuple space as a concurrency framework and the associated timing aspects of the tuple space model are explored through Spaces, an implementation of tuple space for the Impromptu environment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Many model-based investigation techniques, such as sensitivity analysis, optimization, and statistical inference, require a large number of model evaluations to be performed at different input and/or parameter values. This limits the application of these techniques to models that can be implemented in computationally efficient computer codes. Emulators, by providing efficient interpolation between outputs of deterministic simulation models, can considerably extend the field of applicability of such computationally demanding techniques. So far, the dominant techniques for developing emulators have been priors in the form of Gaussian stochastic processes (GASP) that were conditioned with a design data set of inputs and corresponding model outputs. In the context of dynamic models, this approach has two essential disadvantages: (i) these emulators do not consider our knowledge of the structure of the model, and (ii) they run into numerical difficulties if there are a large number of closely spaced input points as is often the case in the time dimension of dynamic models. To address both of these problems, a new concept of developing emulators for dynamic models is proposed. This concept is based on a prior that combines a simplified linear state space model of the temporal evolution of the dynamic model with Gaussian stochastic processes for the innovation terms as functions of model parameters and/or inputs. These innovation terms are intended to correct the error of the linear model at each output step. Conditioning this prior to the design data set is done by Kalman smoothing. This leads to an efficient emulator that, due to the consideration of our knowledge about dominant mechanisms built into the simulation model, can be expected to outperform purely statistical emulators at least in cases in which the design data set is small. The feasibility and potential difficulties of the proposed approach are demonstrated by the application to a simple hydrological model.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A novel replaceable, modularized energy storage system with wireless interface is proposed for a battery operated electric vehicle (EV). The operation of the proposed system is explained and analyzed with an equivalent circuit and an averaged state-space model. A non-linear feedback linearization based controller is developed and implemented to regulate the DC link voltage by modulating the phase shift ratio. The working and control of the proposed system is verified through simulation and some preliminary results are presented.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Bidirectional Inductive Power Transfer (IPT) systems are preferred for Vehicle-to-Grid (V2G) applications. Typically, bidirectional IPT systems consist of high order resonant networks, and therefore, the control of bidirectional IPT systems has always been a difficulty. To date several different controllers have been reported, but these have been designed using steady-state models, which invariably, are incapable of providing an accurate insight into the dynamic behaviour of the system A dynamic state-space model of a bidirectional IPT system has been reported. However, currently this model has not been used to optimise the design of controllers. Therefore, this paper proposes an optimised controller based on the dynamic model. To verify the operation of the proposed controller simulated results of the optimised controller and simulated results of another controller are compared. Results indicate that the proposed controller is capable of accurately and stably controlling the power flow in a bidirectional IPT system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new online method is presented for estimation of the angular randomwalk and rate randomwalk coefficients of inertial measurement unit gyros and accelerometers. In the online method, a state-space model is proposed, and recursive parameter estimators are proposed for quantities previously measured from offline data techniques such as the Allan variance method. The Allan variance method has large offline computational effort and data storage requirements. The technique proposed here requires no data storage and computational effort of approximately 100 calculations per data sample.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new online method is presented for estimation of the angular random walk and rate random walk coefficients of IMU (inertial measurement unit) gyros and accelerometers. The online method proposes a state space model and proposes parameter estimators for quantities previously measured from off-line data techniques such as the Allan variance graph. Allan variance graphs have large off-line computational effort and data storage requirements. The technique proposed here requires no data storage and computational effort of O(100) calculations per data sample.