856 resultados para Data Driven Clustering
Resumo:
We investigate the evolution of magnetohydrodynamic (or hydromagnetic as coined by Chandrasekhar) perturbations in the presence of stochastic noise in rotating shear flows. The particular emphasis is the flows whose angular velocity decreases but specific angular momentum increases with increasing radial coordinate. Such flows, however, are Rayleigh stable but must be turbulent in order to explain astrophysical observed data and, hence, reveal a mismatch between the linear theory and observations and experiments. The mismatch seems to have been resolved, at least in certain regimes, in the presence of a weak magnetic field, revealing magnetorotational instability. The present work explores the effects of stochastic noise on such magnetohydrodynamic flows, in order to resolve the above mismatch generically for the hot flows. We essentially concentrate on a small section of such a flow which is nothing but a plane shear flow supplemented by the Coriolis effect, mimicking a small section of an astrophysical accretion disk around a compact object. It is found that such stochastically driven flows exhibit large temporal and spatial autocorrelations and cross-correlations of perturbation and, hence, large energy dissipations of perturbation, which generate instability. Interestingly, autocorrelations and cross-correlations appear independent of background angular velocity profiles, which are Rayleigh stable, indicating their universality. This work initiates our attempt to understand the evolution of three-dimensional hydromagnetic perturbations in rotating shear flows in the presence of stochastic noise. © 2013 American Physical Society.
Resumo:
Biological experiments often produce enormous amount of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
Resumo:
Emerging vehicular comfort applications pose a host of completely new set of requirements such as maintaining end-to-end connectivity, packet routing, and reliable communication for internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS) such as low packet delay while coping with the fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help in spectrum efficiency and reduce data congestion in the network. The vehicular nodes which experience minimal path loss are selected as the cluster heads. The performance of the MPLR clustering algorithm is calculated by rate of change of cluster heads, average number of clusters and average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis for the rate of change of cluster head is derived which validates the MPLR algorithm and is compared with the simulated results. The mathematical and simulated results are in good agreement indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with V2R system with MPLR system performing better. © 2013 IEEE.
Resumo:
The Securities and Exchange Commission (SEC) in the United States and in particular its immediately past chairman, Christopher Cox, has been actively promoting an upgrade of the EDGAR system of disseminating filings. The new generation of information provision has been dubbed by Chairman Cox, "Interactive Data" (SEC, 2006). In October this year the Office of Interactive Disclosure was created(http://www.sec.gov/news/press/2007/2007-213.htm). The focus of this paper is to examine the way in which the non-professional investor has been constructed by various actors. We examine the manner in which Interactive Data has been sold as the panacea for financial market 'irregularities' by the SEC and others. The academic literature shows almost no evidence of researching non-professional investors in any real sense (Young, 2006). Both this literature and the behaviour of representatives of institutions such as the SEC and FSA appears to find it convenient to construct this class of investor in a particular form and to speak for them. We theorise the activities of the SEC and its chairman in particular over a period of about three years, both following and prior to the 'credit crunch'. Our approach is to examine a selection of the policy documents released by the SEC and other interested parties and the statements made by some of the policy makers and regulators central to the programme to advance the socio-technical project that is constituted by Interactive Data. We adopt insights from ANT and more particularly the sociology of translation (Callon, 1986; Latour, 1987, 2005; Law, 1996, 2002; Law & Singleton, 2005) to show how individuals and regulators have acted as spokespersons for this malleable class of investor. We theorise the processes of accountability to investors and others and in so doing reveal the regulatory bodies taking the regulated for granted. The possible implications of technological developments in digital reporting have been identified also by the CEO's of the six biggest audit firms in a discussion document on the role of accounting information and audit in the future of global capital markets (DiPiazza et al., 2006). The potential for digital reporting enabled through XBRL to "revolutionize the entire company reporting model" (p.16) is discussed and they conclude that the new model "should be driven by the wants of investors and other users of company information,..." (p.17; emphasis in the original). Here rather than examine the somewhat illusive and vexing question of whether adding interactive functionality to 'traditional' reports can achieve the benefits claimed for nonprofessional investors we wish to consider the rhetorical and discursive moves in which the SEC and others have engaged to present such developments as providing clearer reporting and accountability standards and serving the interests of this constructed and largely unknown group - the non-professional investor.
Resumo:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
Resumo:
Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches.
Resumo:
Failure to detect patients at risk of attempting suicide can result in tragic consequences. Identifying risks earlier and more accurately helps prevent serious incidents occurring and is the objective of the GRiST clinical decision support system (CDSS). One of the problems it faces is high variability in the type and quantity of data submitted for patients, who are assessed in multiple contexts along the care pathway. Although GRiST identifies up to 138 patient cues to collect, only about half of them are relevant for any one patient and their roles may not be for risk evaluation but more for risk management. This paper explores the data collection behaviour of clinicians using GRiST to see whether it can elucidate which variables are important for risk evaluations and when. The GRiST CDSS is based on a cognitive model of human expertise manifested by a sophisticated hierarchical knowledge structure or tree. This structure is used by the GRiST interface to provide top-down controlled access to the patient data. Our research explores relationships between the answers given to these higher-level 'branch' questions to see whether they can help direct assessors to the most important data, depending on the patient profile and assessment context. The outcome is a model for dynamic data collection driven by the knowledge hierarchy. It has potential for improving other clinical decision support systems operating in domains with high dimensional data that are only partially collected and in a variety of combinations.
Resumo:
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. One of the most accuracy approach based on dynamic modeling of cluster similarity is called Chameleon. In this paper we present a modified hierarchical clustering algorithm that used the main idea of Chameleon and the effectiveness of suggested approach will be demonstrated by the experimental results.
Resumo:
The concern over the quality of delivering video streaming services in mobile wireless networks is addressed in this work. A framework that enhances the Quality of Experience (QoE) of end users through a quality driven resource allocation scheme is proposed. To play a key role, an objective no-reference quality metric, Pause Intensity (PI), is adopted to derive a resource allocation algorithm for video streaming. The framework is examined in the context of 3GPP Long Term Evolution (LTE) systems. The requirements and structure of the proposed PI-based framework are discussed, and results are compared with existing scheduling methods on fairness, efficiency and correlation (between the required and allocated data rates). Furthermore, it is shown that the proposed framework can produce a trade-off between the three parameters through the QoE-aware resource allocation process.
Resumo:
* The research is supported partly by INTAS: 04-77-7173 project, http://www.intas.be
Resumo:
Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.
Resumo:
This research is focused on the optimisation of resource utilisation in wireless mobile networks with the consideration of the users’ experienced quality of video streaming services. The study specifically considers the new generation of mobile communication networks, i.e. 4G-LTE, as the main research context. The background study provides an overview of the main properties of the relevant technologies investigated. These include video streaming protocols and networks, video service quality assessment methods, the infrastructure and related functionalities of LTE, and resource allocation algorithms in mobile communication systems. A mathematical model based on an objective and no-reference quality assessment metric for video streaming, namely Pause Intensity, is developed in this work for the evaluation of the continuity of streaming services. The analytical model is verified by extensive simulation and subjective testing on the joint impairment effects of the pause duration and pause frequency. Various types of the video contents and different levels of the impairments have been used in the process of validation tests. It has been shown that Pause Intensity is closely correlated with the subjective quality measurement in terms of the Mean Opinion Score and this correlation property is content independent. Based on the Pause Intensity metric, an optimised resource allocation approach is proposed for the given user requirements, communication system specifications and network performances. This approach concerns both system efficiency and fairness when establishing appropriate resource allocation algorithms, together with the consideration of the correlation between the required and allocated data rates per user. Pause Intensity plays a key role here, representing the required level of Quality of Experience (QoE) to ensure the best balance between system efficiency and fairness. The 3GPP Long Term Evolution (LTE) system is used as the main application environment where the proposed research framework is examined and the results are compared with existing scheduling methods on the achievable fairness, efficiency and correlation. Adaptive video streaming technologies are also investigated and combined with our initiatives on determining the distribution of QoE performance across the network. The resulting scheduling process is controlled through the prioritization of users by considering their perceived quality for the services received. Meanwhile, a trade-off between fairness and efficiency is maintained through an online adjustment of the scheduler’s parameters. Furthermore, Pause Intensity is applied to act as a regulator to realise the rate adaptation function during the end user’s playback of the adaptive streaming service. The adaptive rates under various channel conditions and the shape of the QoE distribution amongst the users for different scheduling policies have been demonstrated in the context of LTE. Finally, the work for interworking between mobile communication system at the macro-cell level and the different deployments of WiFi technologies throughout the macro-cell is presented. A QoEdriven approach is proposed to analyse the offloading mechanism of the user’s data (e.g. video traffic) while the new rate distribution algorithm reshapes the network capacity across the macrocell. The scheduling policy derived is used to regulate the performance of the resource allocation across the fair-efficient spectrum. The associated offloading mechanism can properly control the number of the users within the coverages of the macro-cell base station and each of the WiFi access points involved. The performance of the non-seamless and user-controlled mobile traffic offloading (through the mobile WiFi devices) has been evaluated and compared with that of the standard operator-controlled WiFi hotspots.
Resumo:
The sharing of near real-time traceability knowledge in supply chains plays a central role in coordinating business operations and is a key driver for their success. However before traceability datasets received from external partners can be integrated with datasets generated internally within an organisation, they need to be validated against information recorded for the physical goods received as well as against bespoke rules defined to ensure uniformity, consistency and completeness within the supply chain. In this paper, we present a knowledge driven framework for the runtime validation of critical constraints on incoming traceability datasets encapuslated as EPCIS event-based linked pedigrees. Our constraints are defined using SPARQL queries and SPIN rules. We present a novel validation architecture based on the integration of Apache Storm framework for real time, distributed computation with popular Semantic Web/Linked data libraries and exemplify our methodology on an abstraction of the pharmaceutical supply chain.
Resumo:
This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. A realistic noise model is explored and incorporated into non-linear and topographic visualisation algorithms building on the approach of [9]. Concepts are illustrated using a real world dataset of 32 hydrophones monitoring a shallow-water environment in which targets are present and dynamic.
Resumo:
Parkinson's disease is a complex heterogeneous disorder with urgent need for disease-modifying therapies. Progress in successful therapeutic approaches for PD will require an unprecedented level of collaboration. At a workshop hosted by Parkinson's UK and co-organized by Critical Path Institute's (C-Path) Coalition Against Major Diseases (CAMD) Consortiums, investigators from industry, academia, government and regulatory agencies agreed on the need for sharing of data to enable future success. Government agencies included EMA, FDA, NINDS/NIH and IMI (Innovative Medicines Initiative). Emerging discoveries in new biomarkers and genetic endophenotypes are contributing to our understanding of the underlying pathophysiology of PD. In parallel there is growing recognition that early intervention will be key for successful treatments aimed at disease modification. At present, there is a lack of a comprehensive understanding of disease progression and the many factors that contribute to disease progression heterogeneity. Novel therapeutic targets and trial designs that incorporate existing and new biomarkers to evaluate drug effects independently and in combination are required. The integration of robust clinical data sets is viewed as a powerful approach to hasten medical discovery and therapies, as is being realized across diverse disease conditions employing big data analytics for healthcare. The application of lessons learned from parallel efforts is critical to identify barriers and enable a viable path forward. A roadmap is presented for a regulatory, academic, industry and advocacy driven integrated initiative that aims to facilitate and streamline new drug trials and registrations in Parkinson's disease.