147 results for High-Dimensional Space Geometrical Informatics (HDSGI)


Relevance:

100.00%

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom-studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it has become habitual and involves minimal intention or decision making. The key variables investigated are the timestamp at which each activity was initiated, the cell tower location, the activity type and the usage quantity (e.g., a voice call with its duration in seconds); the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs) fitted with the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, so that it is robust to the choice of initial parameters and can automatically and efficiently determine the number of components. The proposed algorithm allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally of human mobility patterns; the term spiky describes data patterns in which large areas of low probability are mixed with small areas of high probability. Customers are then characterised and segmented based on their fitted GMMs, which describe how each of them uses the products/services spatially in their daily lives and thus essentially reflect their likely lifestyle and occupational traits.
Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on VB-GMM.
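Temporal usage behaviour is circular: treating hours as plain numbers puts 23:00 and 01:00 twenty-two hours apart, and their arithmetic mean lands at midday. The following is a minimal sketch, assuming hours of day as input, of one common way to respect that wrap-around by mapping hours onto the unit circle. The helper names are hypothetical, and this is not the thesis's VB-GMM for circular data, whose details the abstract does not give.

```python
import math

def hour_to_unit_circle(hour):
    """Map an hour of day in [0, 24) to a point (cos, sin) on the unit circle."""
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    return (math.cos(angle), math.sin(angle))

def circular_hour_distance(h1, h2):
    """Shortest distance in hours between two times of day, wrapping at midnight."""
    diff = abs(h1 - h2) % 24.0
    return min(diff, 24.0 - diff)

def circular_mean_hour(hours):
    """Mean direction of a set of hours, returned in [0, 24)."""
    points = [hour_to_unit_circle(h) for h in hours]
    angle = math.atan2(sum(y for _, y in points), sum(x for x, _ in points))
    return (angle * 24.0 / (2.0 * math.pi)) % 24.0

print(circular_hour_distance(23.0, 1.0))         # 2.0, not 22.0
print(round(circular_mean_hour([23.0, 3.0]), 6))  # 1.0, whereas the arithmetic mean is 13.0
```

Points on the circle can then be modelled directly, or the angles can be fed to a mixture model defined on the circle; which of these the thesis does is not stated in the abstract.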

Relevance:

100.00%

Abstract:

Probabilistic topic models have recently been used for activity analysis in video processing, owing to their strong capacity to model both local activities and interactions in crowded scenes. In those applications, a video sequence is divided into a collection of uniform, non-overlapping video clips, and the high-dimensional continuous inputs are quantized into a bag of discrete visual words. The hard division into video clips and the hard assignment of visual words lead to problems when an activity is split over multiple clips, or when the most appropriate visual word for quantization is unclear. In this paper, we propose a novel algorithm that uses a soft histogram technique to compensate for the information lost in the quantization process, and a soft cut technique in the temporal domain to overcome the problems caused by splitting an activity across two video clips. In the detection process, we also apply a soft decision strategy to detect unusual events. We show that the proposed soft decision approach outperforms its hard decision counterpart in both local and global activity modelling.
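The difference between hard and soft quantization can be sketched as follows. This is a minimal illustration assuming a Gaussian kernel on squared Euclidean distance with bandwidth `sigma`; the kernel, the bandwidth and the function names are assumptions, not the paper's exact soft histogram.

```python
import math

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_histogram(features, codebook):
    """Standard bag of words: each feature votes only for its nearest visual word."""
    hist = [0] * len(codebook)
    for f in features:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        hist[nearest] += 1
    return hist

def soft_histogram(features, codebook, sigma=1.0):
    """Soft bag of words: each feature spreads one unit of mass over all visual words,
    weighted by a Gaussian kernel on its distance to each codeword."""
    hist = [0.0] * len(codebook)
    for f in features:
        weights = [math.exp(-squared_distance(f, c) / (2.0 * sigma ** 2)) for c in codebook]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    return hist

codebook = [(0.0, 0.0), (2.0, 0.0)]
features = [(0.9, 0.0), (0.95, 0.0)]  # both lie close to the boundary between the two words
print(hard_histogram(features, codebook))  # [2, 0]
print(soft_histogram(features, codebook))  # roughly [1.07, 0.93]
```

The hard histogram hides the fact that both features are nearly equidistant from the two words; the soft version keeps that ambiguity, which is the information loss the soft histogram is meant to reduce.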

Relevance:

100.00%

Abstract:

Search log data is multidimensional data consisting of the searches made by multiple users over many search parameters. This data can be used to identify a user’s interest in an item or object being searched for. Identifying a Web user’s strongest interests from their search log data is a complex process. Based on a user’s previous searches, most recommendation methods employ two-dimensional models to find relevant items, which are then recommended to the user. When two-dimensional data models are used to mine knowledge from such multidimensional data, they may fail to map users to their searches well. The major problem with such models is that they cannot find the latent relationships that exist between different search dimensions. In this research work, we use tensors to model the various searches made by a user. This high-dimensional data model is then used to extract the relationships between the dimensions and to find the prominent search components. To achieve this, we use popular tensor decomposition methods such as PARAFAC, Tucker and HOSVD. All experiments and evaluations were conducted on real datasets, and they clearly show the effectiveness of tensor models in finding prominent search components compared with other widely used two-dimensional data models. The top-rated search components are then given to users as recommendations.
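HOSVD takes, for each mode of the tensor, the leading left singular vectors of that mode's unfolding as the mode's factor matrix. The sketch below builds a small count tensor from search records and computes the first user-mode factor by power iteration. It assumes a hypothetical `(user, query, location)` record schema; the abstract does not give the paper's dimensions, and a real pipeline would use a numerical linear algebra library rather than pure Python.

```python
import math

def build_tensor(records, users, queries, locations):
    """Third-order count tensor T[u][q][l] from (user, query, location) records."""
    ui = {u: i for i, u in enumerate(users)}
    qi = {q: i for i, q in enumerate(queries)}
    li = {l: i for i, l in enumerate(locations)}
    T = [[[0.0] * len(locations) for _ in queries] for _ in users]
    for u, q, l in records:
        T[ui[u]][qi[q]][li[l]] += 1.0
    return T

def unfold_mode1(T):
    """Mode-1 unfolding: one row per user, columns enumerate (query, location) pairs."""
    return [[T[i][j][k] for j in range(len(T[0])) for k in range(len(T[0][0]))]
            for i in range(len(T))]

def leading_left_singular_vector(M, iters=200):
    """Power iteration on M M^T; converges to the dominant left singular vector of M."""
    n = len(M)
    gram = [[sum(a * b for a, b in zip(M[i], M[j])) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

records = [("alice", "hotel", "sydney"), ("alice", "hotel", "sydney"),
           ("alice", "flight", "sydney"), ("bob", "flight", "perth"),
           ("carol", "hotel", "sydney")]
T = build_tensor(records, ["alice", "bob", "carol"], ["hotel", "flight"], ["sydney", "perth"])
M = unfold_mode1(T)
print(M)  # [[2.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
print(leading_left_singular_vector(M))
```

Alice and Carol, who share Sydney hotel searches, load on the same leading component while Bob's vanishes; repeating this for the query and location modes gives the other HOSVD factors.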

Relevance:

100.00%

Abstract:

During the late 20th century it was proposed that a design aesthetic reflecting current ecological concerns was required within the overall domain of the built environment, and specifically within landscape design. To address this, some authors suggested various theoretical frameworks upon which such an aesthetic could be based. Within these frameworks there was an underlying theme that the patterns and processes of Nature may have the potential to form this aesthetic: an aesthetic based on fractal rather than Euclidean geometry. To understand how fractal geometry, described as the geometry of Nature, could become the referent for a design aesthetic, this research examines the mathematical concepts of fractal geometry and the philosophical concepts underlying the terms ‘Nature’ and ‘aesthetics’. The findings of this initial research meant that a new definition of Nature was required to overcome the barrier presented by the western philosophical Nature/culture duality. This new definition of Nature is based on the type and use of energy. Similarly, it became clear that current usage of the term aesthetics has more in common with the term ‘style’ than with its correct philosophical meaning. The aesthetic philosophy of both art and the environment recognises different aesthetic criteria related to either the subject or the object, such as aesthetic experience, aesthetic attitude, aesthetic value, aesthetic object and aesthetic properties. Given these criteria, and the fact that aesthetics is still the subject of active philosophical discussion, this work focuses on the criterion of aesthetic properties and the aesthetic experience or response they engender. The examination of fractal geometry revealed that it is a geometry based on scale rather than on the location of a point within a three-dimensional space. This enables fractal geometry to describe the complex forms and patterns created through the processes of Wild Nature.
Although fractal geometry has been used to analyse the patterns of built environments from a plan perspective, the initial review of the literature made clear that nothing was known about the fractal properties of the environments people experience every day as they move through them. To address this gap, 21 different landscapes, ranging from highly developed city centres to relatively untouched landscapes of Wild Nature, have been analysed. Although this work shows that the fractal dimension can be used to differentiate between overall landscape forms, it also shows that, by itself, it cannot differentiate between all the images analysed. To overcome this, two further parameters based on the underlying structural geometry embedded within the landscape are discussed: the Power Spectrum Median Amplitude and the Level of Isotropy within the Fourier Power Spectrum. The detailed analysis of these parameters has yielded a greater understanding of the structural properties of landscapes. With this understanding, this research has moved the field of landscape design a step closer to being able to articulate a new aesthetic for ecological design.
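The fractal dimension of an image is commonly estimated by box counting: cover a binary (e.g. edge-detected) image with boxes of side s, count the boxes N(s) that contain any filled pixel, and take the slope of log N(s) against log(1/s). The abstract does not say which estimator the thesis used, so the following is a minimal sketch of box counting only, assuming a square binary grid whose side is divisible by the box sizes; it does not implement the power-spectrum parameters.

```python
import math

def box_count(grid, size):
    """Number of size x size boxes that contain at least one filled cell."""
    n = len(grid)
    count = 0
    for r in range(0, n, size):
        for c in range(0, n, size):
            if any(grid[i][j]
                   for i in range(r, min(r + size, n))
                   for j in range(c, min(c + size, n))):
                count += 1
    return count

def box_counting_dimension(grid, sizes=(1, 2, 4, 8, 16)):
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(grid, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

filled = [[1] * 64 for _ in range(64)]                                  # a solid plane
diagonal = [[1 if i == j else 0 for j in range(64)] for i in range(64)]  # a straight line
print(box_counting_dimension(filled))    # 2.0
print(box_counting_dimension(diagonal))  # 1.0
```

Natural scenes typically fall between these Euclidean extremes, which is why a single number can separate broad landscape forms yet, as the abstract notes, cannot distinguish every image.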

Relevance:

100.00%

Abstract:

In various industrial and scientific fields, conceptual models are derived from real-world problem spaces to understand and communicate the entities they contain and the relationships between them. These abstracted models mirror the common understanding and information needs of engineers, who apply conceptual models in performing their daily tasks. However, most standardized models in Process Management, Product Lifecycle Management and Enterprise Resource Planning lack a scientific foundation for their notation. In collaboration scenarios involving stakeholders from several disciplines, tailored conceptual models complicate communication, because a common understanding is neither shared nor implemented in the specific models. To support direct communication between experts from several disciplines, a visual language is developed that allows discipline-specific conceptual models to be visualized in a common way. To enable visual discrimination and to overcome issues of visual complexity, the conceptual models are arranged in a three-dimensional space. The visual language introduced here follows and extends established principles of Visual Language science.

Relevance:

100.00%

Abstract:

The paper presents a detailed analysis of the collective dynamics and delayed state feedback control of a three-dimensional delayed small-world network. The trivial equilibrium of the model is first investigated, showing that the uncontrolled model exhibits complicated unbounded behavior. Three control strategies, namely position feedback control, velocity feedback control, and a hybrid control combining velocity and acceleration feedback, are then introduced to stabilize this unstable system. Of these three schemes, only the hybrid control can easily stabilize the 3-D network system. With a properly chosen delay and gain in the delayed feedback path, the hybrid-controlled model may have a stable equilibrium, periodic solutions arising from a Hopf bifurcation, or a complex strange attractor arising from a period-doubling bifurcation. Moreover, the direction of the Hopf bifurcation and the stability of the bifurcating periodic solutions are analyzed. The results are further extended to a general d-dimensional network, showing that stabilizing a d-dimensional delayed small-world network requires complete differential feedback of order at least d − 1. This work provides constructive suggestions for the control of high-dimensional delayed systems.
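The network equations themselves are not given in the abstract, so the following is only a hypothetical scalar illustration of how delayed state feedback is simulated: forward-Euler integration of x'(t) = a·x(t) − k·x(t − τ), where the delayed value is read from a buffer of past states. It demonstrates position-type feedback on a one-dimensional system and does not reproduce the paper's 3-D result that position feedback alone is insufficient.

```python
def simulate_delayed_feedback(a, k, tau, x0=1.0, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of x'(t) = a*x(t) - k*x(t - tau).

    Assumes a constant history x(t) = x0 for t <= 0. The delayed state is read
    from a growing buffer of past values; history[-1] is the current state.
    """
    lag = int(round(tau / dt))
    history = [x0] * (lag + 1)
    for _ in range(int(round(t_end / dt))):
        x_now = history[-1]
        x_delayed = history[-1 - lag]
        history.append(x_now + dt * (a * x_now - k * x_delayed))
    return history[-1]

# Unstable open loop (a > 0, no feedback) versus delayed feedback with a short delay.
print(simulate_delayed_feedback(a=0.5, k=0.0, tau=0.0))  # grows to about 2.2e4
print(simulate_delayed_feedback(a=0.5, k=1.0, tau=0.1))  # decays towards 0
```

For this scalar equation with a = 0.5 and k = 1, stability is lost once the delay exceeds about 1.21 (the critical value arccos(0.5)/√0.75), which illustrates why the abstract stresses a properly chosen delay as well as gain.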

Relevance:

100.00%

Abstract:

The use of Bayesian methodologies for solving optimal experimental design problems has increased. Many of these methods have been found to be computationally intensive for design problems that require a large number of design points. A simulation-based approach is presented that can be used to solve optimal design problems in which one is interested in finding a large number of (near-)optimal design points for a small number of design variables. The approach uses lower-dimensional parameterisations consisting of a few design variables, which generate multiple design points. With this approach, one only has to search over a few design variables rather than over a large number of optimal design points, which provides substantial computational savings. The methodologies are demonstrated on four applications, all involving nonlinear models, including the selection of sampling times for pharmacokinetic and heat transfer studies. Several Bayesian design criteria are also compared and contrasted, as are several different lower-dimensional parameterisation schemes for generating the many design points.
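As an illustration of a lower-dimensional parameterisation, two design variables (first and last sampling time) can generate any number of sampling times. Geometric spacing is an assumption chosen here because early, dense sampling is common in pharmacokinetics; it is not necessarily one of the schemes compared in the paper, and the function name is hypothetical.

```python
def sampling_times(t_first, t_last, n_points):
    """Generate n_points geometrically spaced sampling times from two design variables.

    A design search then explores the 2-D space (t_first, t_last) instead of
    an n_points-dimensional space of individual times.
    """
    if n_points < 2 or not (0 < t_first < t_last):
        raise ValueError("need n_points >= 2 and 0 < t_first < t_last")
    ratio = (t_last / t_first) ** (1.0 / (n_points - 1))
    return [t_first * ratio ** i for i in range(n_points)]

times = sampling_times(0.5, 24.0, 10)
print([round(t, 2) for t in times])
```

In a full design algorithm, each candidate (t_first, t_last) would be scored by a Bayesian utility estimated by simulation; that utility is model-specific and is not sketched here.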

Relevance:

100.00%

Abstract:

Background: Predicting protein subnuclear localization is a challenging problem. Previous work based on non-sequence information, including Gene Ontology annotations and kernel fusion, has its own limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information represented as a high-dimensional feature vector obtained from 11 feature extraction methods. Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It considers only feature extraction methods based on amino acid classifications and physicochemical properties. To speed up the system, an automatic search method for the kernel parameter is used. The prediction performance of the method is evaluated on four datasets: the Lei dataset, a multi-localization dataset, the SNL9 dataset and a new independent dataset. In leave-one-out cross-validation, the overall prediction accuracy is 75.2% for 6 localizations on the Lei dataset and 72.1% for 9 localizations on the SNL9 dataset; it is 71.7% for the multi-localization dataset and 69.8% for the new independent dataset. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of the classification model are further confirmed by permutation analysis. Conclusions: Our method is effective and valuable for predicting protein subnuclear localizations. A web server implementing the proposed method has been built.
It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
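Feature extraction "based on amino acid classifications and physicochemical properties" can be illustrated with two simple descriptors: plain amino acid composition and composition over a reduced alphabet. The three-class hydrophobicity grouping below is one commonly used grouping and is an assumption here; the abstract does not list the 11 methods actually used, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-class hydrophobicity grouping covering all 20 standard residues.
HYDROPHOBICITY_CLASSES = {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"}

def standard_residues(sequence):
    """Upper-case the sequence and drop non-standard residues such as X, B or Z."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("sequence contains no standard amino acids")
    return seq

def amino_acid_composition(sequence):
    """20-dimensional vector of residue frequencies, in AMINO_ACIDS order."""
    seq = standard_residues(sequence)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def class_composition(sequence, classes=HYDROPHOBICITY_CLASSES):
    """Fraction of residues falling in each physicochemical class."""
    seq = standard_residues(sequence)
    return {name: sum(aa in members for aa in seq) / len(seq)
            for name, members in classes.items()}

print(class_composition("KKLG"))  # {'polar': 0.5, 'neutral': 0.25, 'hydrophobic': 0.25}
```

Vectors like these, concatenated across several descriptors, would form the kind of high-dimensional input that the abstract feeds to its two-stage multiclass SVM.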

Relevance:

100.00%

Abstract:

Ambiguity resolution plays a crucial role in real-time kinematic (RTK) GNSS positioning, which gives centimetre-level positioning results if all the ambiguities in each epoch are correctly fixed to integers. However, incorrectly fixed ambiguities can cause large positioning offsets of up to several metres without notice. Hence, ambiguity validation is essential to control the quality of ambiguity resolution. Currently, the most popular ambiguity validation method is the ratio test. The ratio test criterion is often determined empirically, which can be dangerous, because a fixed criterion cannot fit all scenarios and does not directly control the risk of ambiguity resolution. In practice, depending on the underlying model strength, a given ratio test criterion can be too conservative for some models and too risky for others. A more rational approach is to determine the criterion according to the underlying model and the user's requirements. Missed detection of incorrect integers leads to hazardous results and should be strictly controlled; in ambiguity resolution, the missed-detection rate is often known as the failure rate. In this paper, a fixed failure rate ratio test method is presented and applied to the analysis of GPS and Compass positioning scenarios. The fixed failure rate approach is derived from integer aperture estimation theory and is theoretically rigorous. In this approach, the criteria table for the ratio test is computed from extensive data simulations, and real-time users can determine the ratio test criterion by looking it up in the table. This method has been applied to medium-distance GPS ambiguity resolution, but multi-constellation and high-dimensional scenarios have not been discussed so far. In this paper, a general ambiguity validation model is derived from hypothesis testing theory, the fixed failure rate approach is introduced, and, in particular, the relationship between the ratio test threshold and the failure rate is examined.
Finally, the factors that influence the ratio test threshold under the fixed failure rate approach are discussed on the basis of extensive data simulation. The results show that, with a proper stochastic model, the fixed failure rate approach is a more reasonable ambiguity validation method.
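The ratio test compares the squared norms, weighted by the inverse covariance of the float ambiguities, of the best and second-best integer candidates. A minimal sketch follows, assuming a brute-force search over a ±1 neighbourhood of the rounded float solution (production receivers use a decorrelating search such as LAMBDA) and a threshold passed in as a plain number; the fixed failure rate criteria table itself is not reproduced here, and the function name is hypothetical.

```python
import itertools

def ambiguity_ratio_test(float_amb, q_inv, threshold):
    """Integer candidate search plus ratio test.

    float_amb: float ambiguity estimates
    q_inv:     inverse variance-covariance matrix of those estimates
    threshold: criterion c; the best candidate is accepted if q2 / q1 >= c
    Returns (best integer vector, ratio, accepted).
    """
    def sq_norm(z):
        r = [a - zi for a, zi in zip(float_amb, z)]
        return sum(r[i] * q_inv[i][j] * r[j] for i in range(len(r)) for j in range(len(r)))

    base = [round(a) for a in float_amb]
    candidates = [tuple(b + d for b, d in zip(base, delta))
                  for delta in itertools.product((-1, 0, 1), repeat=len(base))]
    ranked = sorted(candidates, key=sq_norm)
    q1, q2 = sq_norm(ranked[0]), sq_norm(ranked[1])
    ratio = q2 / q1 if q1 > 0 else float("inf")
    return ranked[0], ratio, ratio >= threshold

q_inv = [[100.0, 0.0], [0.0, 100.0]]  # float ambiguities with 0.1-cycle standard deviation
print(ambiguity_ratio_test([3.05, -1.97], q_inv, 3.0))  # clear winner (3, -2): accepted
print(ambiguity_ratio_test([3.5, -2.0], q_inv, 3.0))    # tie between 3 and 4: ratio 1.0, rejected
```

Some formulations use the reciprocal q1/q2 compared against a threshold below 1. In the fixed failure rate approach, the threshold would be looked up from a table according to the model strength and the tolerated failure rate rather than fixed empirically.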

Relevance:

100.00%

Abstract:

This thesis developed and applied Bayesian models for the analysis of survival data. Gene expression was considered as an explanatory variable within the Bayesian survival model, which can be considered a new contribution to the analysis of such data. The censoring inherent in survival data has also been addressed in terms of its impact on fitting a finite mixture of Weibull distributions with and without covariates. To investigate this, simulation studies were carried out under several censoring percentages. Censoring percentages as high as 80% are acceptable here, as the work involved high-dimensional data. Lastly, a Bayesian model averaging approach was developed to incorporate model uncertainty into the prediction of survival.
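Censoring enters a survival model through the likelihood: an observed failure at time t contributes the density f(t), whereas a right-censored case contributes only the survival probability S(t). The following is a minimal sketch of that log-likelihood for a finite Weibull mixture, without covariates and without the priors or sampler a Bayesian fit would add; the function names are hypothetical, and times must be positive.

```python
import math

def weibull_logpdf(t, shape, scale):
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape

def weibull_logsurv(t, shape, scale):
    return -((t / scale) ** shape)

def mixture_loglik(times, events, weights, shapes, scales):
    """Log-likelihood of right-censored data under a finite Weibull mixture.

    events[i] is 1 if times[i] is an observed failure, 0 if right-censored.
    Failures contribute log sum_k w_k f_k(t); censored cases log sum_k w_k S_k(t).
    """
    total = 0.0
    for t, observed in zip(times, events):
        term = weibull_logpdf if observed else weibull_logsurv
        total += math.log(sum(w * math.exp(term(t, k, lam))
                              for w, k, lam in zip(weights, shapes, scales)))
    return total

# Exponential special case (shape 1, scale 1): one failure and one censored case at t = 1.
print(mixture_loglik([1.0, 1.0], [1, 0], [1.0], [1.0], [1.0]))  # about -2.0
```

With heavy censoring, most terms are survival terms that only bound the failure time from below, which is why the thesis studies how the censoring percentage affects the mixture fit.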

Relevance:

100.00%

Abstract:

This dissertation seeks to define and classify potential forms of Nonlinear structure and to explore the possibilities they afford for the creation of new musical works. It provides the first comprehensive framework for the discussion of Nonlinear structure in musical works and a detailed overview of the rise of nonlinearity in music during the 20th century. Nonlinear events are shown to emerge through significant parametric discontinuity at the boundaries between regions of relatively strong internal cohesion. The dissertation situates Nonlinear structures in relation to linear structures and unstructured sonic phenomena, and provides a means of evaluating Nonlinearity in a musical structure by considering the degree to which the structure as a whole is integrated, contingent, compressible and determinate. It is proposed that Nonlinearity can be classified as a three-dimensional space described by three continuums: the temporal continuum, encompassing sequential and multilinear forms of organization; the narrative continuum, encompassing processual, game-structure and developmental narrative forms; and the referential continuum, encompassing stylistic allusion, adaptation and quotation. The use of spectrograms of recorded musical works is proposed as a means of evaluating Nonlinearity in a musical work through the visual representation of parametric divergence in pitch, duration, timbre and dynamics over time. Spectral and structural analysis of repertoire works is undertaken as part of an exploration of musical nonlinearity and the compositional and performative features that characterize it. The contribution of cultural, ideological, scientific and technological shifts to the emergence of Nonlinearity in music is discussed, and a range of compositional factors that contributed to the emergence of musical Nonlinearity is examined.
The evolution of notational innovations from the mobile score to the screen score is traced, and a novel framework for the discussion of these forms of musical transmission is proposed. A computer-coordinated performative model is discussed, in which a computer synchronises the screening of notational information, provides temporal coordination of the performers through click-tracks or similar methods, and synchronises the audio processing and synthesized elements of the work. It is proposed that such a model constitutes a highly effective means of realizing complex Nonlinear structures. A creative folio comprising 29 original works that explore nonlinearity is presented, discussed and categorised using the proposed classifications. Where appropriate, spectrograms of these works are employed to illustrate the instantiation of parametrically divergent substructures and examples of structural openness through multiple versioning.

Relevance:

100.00%

Abstract:

This paper describes the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) tasks of the 2013 MediaEval Benchmark. We extended the constrained clustering algorithm to the first task, semi-supervised clustering, and for the second task, event classification, we compared several classifiers using Latent Dirichlet Allocation as a feature selector. The proposed approach focuses on scalability and efficient memory allocation when applied to high-dimensional data with large clusters. Results of the first task show the effectiveness of the proposed method. Results from the second task indicate that attention must be paid to imbalanced category distributions.
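The abstract does not name the constrained clustering algorithm that was extended, so the sketch below uses the well-known COP-k-means assignment rule as an assumed stand-in: each point goes to the nearest centroid that does not violate a must-link or cannot-link constraint with an already-assigned point. It uses 1-D points for brevity; the function names and the index-pair constraint format are hypothetical.

```python
def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """True if putting `point` in `cluster` breaks a constraint with an assigned point."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        if other is not None and assignment.get(other) == cluster:
            return True
    return False

def constrained_assign(points, centroids, must_link, cannot_link):
    """One COP-k-means-style assignment pass over 1-D points.

    Returns {point index: cluster index}, or None if some point has no feasible cluster.
    """
    assignment = {}
    for i, x in enumerate(points):
        order = sorted(range(len(centroids)), key=lambda c: abs(x - centroids[c]))
        for c in order:
            if not violates_constraints(i, c, assignment, must_link, cannot_link):
                assignment[i] = c
                break
        else:
            return None  # every cluster would violate some constraint
    return assignment

points, centroids = [0.0, 0.4, 5.0, 5.2], [0.0, 5.0]
print(constrained_assign(points, centroids, [], []))        # {0: 0, 1: 0, 2: 1, 3: 1}
print(constrained_assign(points, centroids, [], [(0, 1)]))  # {0: 0, 1: 1, 2: 1, 3: 1}
```

A full algorithm alternates this step with centroid updates; the scalability and memory concerns the abstract emphasises arise because the constraint checks grow with the number of labelled pairs.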

Relevance:

100.00%

Abstract:

Utility functions in Bayesian experimental design are usually based on the posterior distribution. When the posterior is found by simulation, it must be sampled for each future data set drawn from the prior predictive distribution, so many thousands of posterior distributions are often required. A popular technique in the Bayesian experimental design literature for rapidly obtaining samples from the posterior is importance sampling, using the prior as the importance distribution. However, importance sampling tends to break down if there is a reasonably large number of experimental observations and/or the model parameter is high-dimensional. In this paper, we explore the use of Laplace approximations in the design setting to overcome this drawback. Furthermore, we consider using the Laplace approximation to form the importance distribution, obtaining a more efficient importance distribution than the prior. The methodology is motivated by a pharmacokinetic study that investigates the effect of extracorporeal membrane oxygenation on the pharmacokinetics of antibiotics in sheep. The design problem is to find 10 near-optimal plasma sampling times that produce precise estimates of the pharmacokinetic model parameters/measures of interest. We consider several different utility functions of interest in these studies, which involve the posterior distribution of parameter functions.
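A Laplace approximation replaces the posterior with a Gaussian centred at the posterior mode, with variance equal to minus the inverse of the second derivative of the log posterior there. The pharmacokinetic model is not given in the abstract, so the following minimal 1-D sketch uses a conjugate example that can be checked exactly: a Beta(2, 2) prior and 7 successes in 10 trials give a Beta(9, 5) posterior with mode 2/3. The function name is hypothetical.

```python
import math

def laplace_approximation(log_post, theta0, h=1e-4, iters=50, tol=1e-10):
    """Return (mode, variance) of a Gaussian approximation to a 1-D posterior.

    Newton's method on central finite-difference derivatives locates the mode;
    the variance is -1 / (second derivative of the log posterior at the mode).
    """
    def second_derivative(t):
        return (log_post(t + h) - 2.0 * log_post(t) + log_post(t - h)) / h ** 2

    theta = theta0
    for _ in range(iters):
        d1 = (log_post(theta + h) - log_post(theta - h)) / (2.0 * h)
        step = d1 / second_derivative(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta, -1.0 / second_derivative(theta)

# Unnormalised log posterior: Beta(2, 2) prior times binomial likelihood (7 of 10).
def log_post(theta):
    return 8.0 * math.log(theta) + 4.0 * math.log(1.0 - theta)

mode, var = laplace_approximation(log_post, theta0=0.5)
print(mode, var)  # approximately 0.6667 and 1/54 = 0.0185
```

In the design setting this would be repeated for every simulated future data set; the resulting Gaussian can serve directly as the posterior approximation or, as the abstract suggests, as an importance distribution that is closer to the posterior than the prior is.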

Relevance:

100.00%

Abstract:

OBJECTIVE: The aim of the study is to examine the spatiotemporal pattern of Japanese Encephalitis (JE) in mainland China during 2002-2010. The specific objectives were to quantify the temporal variation in the incidence of JE cases, to determine whether clustering of JE cases exists, to detect high-risk spatiotemporal clusters of JE cases and to provide evidence-based preventive suggestions to relevant stakeholders. METHODS: Monthly JE cases at the county level in mainland China during 2002-2010 were obtained from the China Information System for Diseases Control and Prevention (CISDCP). For the analysis, JE case counts for the nine years were aggregated into four temporal periods (2002; 2003-2005; 2006; and 2007-2010). Local Indicators of Spatial Association and spatial scan statistics were used to detect and evaluate local high-risk space-time clusters. RESULTS: JE incidence decreased from 2002 to 2005, peaked in 2006, and then fluctuated for the rest of the study period. Spatial cluster analysis detected high-value clusters, mainly located in Southwestern China. Similarly, we identified a primary spatiotemporal cluster of JE in Southwestern China between July and August, with the geographical range of JE transmission increasing in recent years. CONCLUSION: JE in mainland China is geographically clustered, and its spatial extent changed dynamically over the nine-year study period. This indicates that risk factors for JE infection are likely to be spatially heterogeneous. The results may assist national and local health authorities in developing or refining preventive strategies and increase the effectiveness of public health interventions against JE transmission.
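Local Indicators of Spatial Association include the local Moran's I statistic, I_i = (z_i / m2) · Σ_j w_ij z_j, where z = x − mean(x) and m2 = Σ z_i² / n. A large positive I_i with positive z_i marks a high-value area surrounded by high-value neighbours. The following is a minimal sketch assuming row-standardised binary adjacency weights (the study's weighting scheme is not stated), without the permutation-based significance test that would normally accompany it; the function name is hypothetical.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I for each area with row-standardised binary weights.

    values:     observed rates per area (e.g. incidence per county)
    neighbours: dict mapping an area index to a list of adjacent area indices
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    result = []
    for i in range(n):
        nbrs = neighbours.get(i, [])
        spatial_lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * spatial_lag / m2)
    return result

# Four counties in a row: two high-incidence counties next to two low-incidence ones.
rates = [10.0, 10.0, 0.0, 0.0]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(local_morans_i(rates, adjacency))  # [1.0, 0.0, 0.0, 1.0]
```

County 0 is a high-high cluster and county 3 a low-low cluster; the boundary counties 1 and 2 have mixed neighbourhoods and score zero.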

Relevance:

100.00%

Abstract:

Most existing algorithms for approximate Bayesian computation (ABC) assume that it is feasible to simulate pseudo-data from the model at each iteration. However, the computational cost of these simulations can be prohibitive for high-dimensional data. An important example is the Potts model, which is commonly used in image analysis. Images encountered in real-world applications can have millions of pixels, so scalability is a major concern. We apply ABC with a synthetic likelihood to the hidden Potts model with additive Gaussian noise. Using a pre-processing step, we fit a binding function to model the relationship between the model parameters and the synthetic likelihood parameters. Our numerical experiments demonstrate that the precomputed binding function dramatically improves the scalability of ABC, reducing the average runtime required for model fitting from 71 hours to only 7 minutes. We also illustrate the method by estimating the smoothing parameter for remotely sensed satellite imagery. Without precomputation, Bayesian inference is impractical for datasets of that scale.
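The idea of a precomputed binding function can be sketched with a toy simulator. A synthetic likelihood approximates the distribution of a summary statistic by a Gaussian whose mean and variance depend on the parameter; precomputing those moments on a grid once means that later likelihood evaluations need no simulation. In the following minimal sketch, the binomial simulator is a hypothetical stand-in for the Potts model, and linear interpolation is an assumed stand-in for however the paper fits its binding function.

```python
import math
import random

def simulate_summary(theta, rng, n=100):
    """Toy simulator (stand-in for the Potts model): successes in n Bernoulli trials."""
    return sum(rng.random() < theta for _ in range(n))

def precompute_binding(grid, rng, reps=200):
    """Pre-processing step: estimate the summary's mean and variance at each grid point."""
    table = []
    for theta in grid:
        sims = [simulate_summary(theta, rng) for _ in range(reps)]
        mu = sum(sims) / reps
        var = sum((s - mu) ** 2 for s in sims) / (reps - 1)
        table.append((theta, mu, var))
    return table

def binding(theta, table):
    """Linearly interpolate (mean, variance) between precomputed grid points."""
    for (t0, m0, v0), (t1, m1, v1) in zip(table, table[1:]):
        if t0 <= theta <= t1:
            w = (theta - t0) / (t1 - t0)
            return m0 + w * (m1 - m0), v0 + w * (v1 - v0)
    raise ValueError("theta outside the precomputed grid")

def synthetic_loglik(observed, theta, table):
    """Gaussian synthetic log-likelihood of the observed summary, with no new simulation."""
    mu, var = binding(theta, table)
    return -0.5 * math.log(2.0 * math.pi * var) - (observed - mu) ** 2 / (2.0 * var)

table = precompute_binding([i / 10 for i in range(1, 10)], random.Random(1))
print(synthetic_loglik(50, 0.5, table) > synthetic_loglik(50, 0.2, table))  # True
```

Inside an ABC or MCMC loop, `synthetic_loglik` would replace the intractable likelihood; the speed-up comes from moving all simulation into the one-off `precompute_binding` step.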