939 results for Incremental Clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut the branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
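The effect of a merge threshold can be illustrated with a minimal sketch. It is not the tool described in the abstract: it assumes single-linkage agglomerative clustering with Euclidean distance, and the names `cluster_at_threshold` and `max_distance` are hypothetical. Changing `max_distance` plays the role of moving the slider.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_at_threshold(profiles, max_distance):
    """Merge the closest clusters (single linkage) until the closest
    pair is farther apart than max_distance. Returns lists of indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters under single linkage.
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            d = min(euclidean(profiles[p], profiles[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # the threshold forbids any further merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Five short time-series profiles (three time points each).
profiles = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5, 5.1, 5), (10, 0, 0)]
tight = cluster_at_threshold(profiles, max_distance=1.0)
loose = cluster_at_threshold(profiles, max_distance=100.0)
```

With a strict threshold the outlier profile stays in its own cluster; with a loose one everything collapses into a single cluster, which is the kind of sensitivity the interactive sliders are meant to expose.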

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The U.S. railroad companies spend billions of dollars every year on railroad track maintenance in order to ensure safety and operational efficiency of their railroad networks. Besides maintenance costs, other costs such as train accident costs, train and shipment delay costs and rolling stock maintenance costs are also closely related to track maintenance activities. Optimizing the track maintenance process on the extensive railroad networks is a very complex problem with major cost implications. Currently, the decision making process for track maintenance planning is largely manual and primarily relies on the knowledge and judgment of experts. There is considerable potential to improve the process by using operations research techniques to develop solutions to the optimization problems on track maintenance. In this dissertation study, we propose a range of mathematical models and solution algorithms for three network-level scheduling problems on track maintenance: track inspection scheduling problem (TISP), production team scheduling problem (PTSP) and job-to-project clustering problem (JTPCP). TISP involves a set of inspection teams which travel over the railroad network to identify track defects. It is a large-scale routing and scheduling problem where thousands of tasks are to be scheduled subject to many difficult side constraints such as periodicity constraints and discrete working time constraints. A vehicle routing problem formulation was proposed for TISP, and a customized heuristic algorithm was developed to solve the model. The algorithm iteratively applies a constructive heuristic and a local search algorithm in an incremental scheduling horizon framework. The proposed model and algorithm have been adopted by a Class I railroad in its decision making process. Real-world case studies show the proposed approach outperforms the manual approach in short-term scheduling and can be used to conduct long-term what-if analyses to yield managerial insights. 
PTSP schedules capital track maintenance projects, which are the largest track maintenance activities and account for the majority of railroad capital spending. A time-space network model was proposed to formulate PTSP. More than ten types of side constraints were considered in the model, including very complex constraints such as mutual exclusion constraints and consecution constraints. A multiple neighborhood search algorithm, including a decomposition and restriction search and a block-interchange search, was developed to solve the model. Various performance enhancement techniques, such as data reduction, augmented cost function and subproblem prioritization, were developed to improve the algorithm. The proposed approach has been adopted by a Class I railroad for two years. Our numerical results show the model solutions are able to satisfy all hard constraints and most soft constraints. Compared with the existing manual procedure, the proposed approach is able to bring significant cost savings and operational efficiency improvement. JTPCP is an intermediate problem between TISP and PTSP. It focuses on clustering thousands of capital track maintenance jobs (based on the defects identified in track inspection) into projects so that the projects can be scheduled in PTSP. A vehicle routing problem based model and a multiple-step heuristic algorithm were developed to solve this problem. Various side constraints such as mutual exclusion constraints and rounding constraints were considered. The proposed approach has been applied in practice and has shown good performance in both solution quality and efficiency.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication/interaction and by unusual repetitive and restricted behaviors and interests. ASD often co-occurs in the same families with other neuropsychiatric diseases (NPD), such as intellectual disability, schizophrenia, epilepsy, depression and attention deficit hyperactivity disorder. Genetic factors have an important role in ASD etiology. Multiple copy number variants (CNVs) and single nucleotide variants (SNVs) in candidate genes have been associated with an increased risk of developing ASD. Nevertheless, recent heritability estimates and the high genotypic and phenotypic heterogeneity characteristic of ASD indicate a role of environmental and epigenetic factors, such as long noncoding RNA (lncRNA) and microRNA (miRNA), as modulators of gene expression and subsequent clinical presentation. Both miRNA and lncRNA are functional RNA molecules that are transcribed from DNA but not translated into proteins; instead, they act as powerful regulators of gene expression. While miRNAs are small noncoding RNAs 22-25 nucleotides in length that act at the post-transcriptional level of gene expression, lncRNAs are larger molecules (>200 nucleotides in length) that are capped, spliced, and polyadenylated, similar to messenger RNA. Although few lncRNAs have been well characterized to date, there is strong evidence that they are implicated in several levels of gene expression (transcription/post-transcription/post-translation, organization of protein complexes, cell-cell signaling as well as recombination) as shown in figure 1.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Established models for understanding organizational change processes seem inadequate for explaining changes undergone by organizations facing highly turbulent environments. We propose an alternative model that depicts change as constant regeneration rather than revolutionary episodes. We then propose a set of structures and processes that facilitate this constant regeneration.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

© 2014 Cises. This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A strain-based forming limit criterion is widely used in the sheet-metal forming industry to predict necking. However, this criterion is usually valid only when the strain path is linear throughout the deformation process [1]. The strain path in incremental sheet forming is often found to be severely nonlinear throughout the deformation history. Therefore, the practice of using a strain-based forming limit criterion often leads to erroneous assessments of formability and failure prediction. On the other hand, the stress-based forming limit is insensitive to changes in the strain path, and hence it is first used to model the necking limit in incremental sheet forming. The stress-based forming limit is also combined with the fracture limit based on the maximum shear stress criterion to show necking and fracture together. A general mapping method from the strain-based FLC to the stress-based FLC is derived using a non-quadratic yield function. The simulation model is evaluated for single point incremental forming using AA 6022-T43, and its accuracy is checked against experiments. By using the path-independent necking and fracture limits, the model is able to explain the deformation mechanism in incremental sheet forming successfully. The proposed model provides a good scientific basis for the development of ISF under nonlinear strain paths, as well as for its usability over conventional sheet forming processes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide the traffic subareas of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that, when compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, higher accuracy, and better scalability and can effectively divide traffic subareas with big taxi trajectory data.
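As a rough illustration of how K-Means fits the MapReduce pattern, the sketch below runs one iteration in plain Python. It is not Par3PKM: it assumes ordinary squared Euclidean distance rather than the modified metric, it runs in a single process rather than on Hadoop, and the function names (`map_phase`, `reduce_phase`, `kmeans_step`) are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_phase(chunk, centroids):
    """Mapper: emit (cluster id, (point, 1)) for every point in one data chunk."""
    for point in chunk:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs, centroids):
    """Reducer: sum points and counts per cluster id, then average them."""
    totals = {}
    for k, (point, count) in pairs:
        acc, n = totals.get(k, ([0.0] * len(point), 0))
        totals[k] = ([a + p for a, p in zip(acc, point)], n + count)
    # A centroid that received no points keeps its previous position.
    return [
        tuple(v / totals[k][1] for v in totals[k][0]) if k in totals else centroids[k]
        for k in range(len(centroids))
    ]


def kmeans_step(chunks, centroids):
    """One K-Means iteration: map over every chunk, then reduce once."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk, centroids)]
    return reduce_phase(pairs, centroids)


# Two chunks stand in for two input splits of trajectory points.
chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
updated = kmeans_step(chunks, [(1.0, 1.0), (9.0, 9.0)])
```

Because each mapper only needs the current centroids and its own chunk, the map phase can be distributed freely; only the per-cluster sums and counts have to be brought together in the reduce phase.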

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Utility companies provide electricity to a large number of consumers. These companies need an accurate forecast of the next-day electricity demand. Any forecast errors will result in either reliability issues or increased costs for the company. Because of the widespread roll-out of smart meters, a large amount of high-resolution consumption data is now accessible which was not available in the past. This new data can be used to improve the load forecast and, as a result, increase the reliability and decrease the expenses of electricity providers. In this paper, a number of methods for improving load forecasts using smart meter data are discussed. In these methods, consumers are first divided into a number of clusters. Then a neural network is trained for each cluster, and the forecasts of these networks are added together to form the prediction for the aggregated load. In this paper, it is demonstrated that clustering increases the forecast accuracy significantly. The criteria used for grouping consumers play an important role in this process. In this work, three different feature selection methods for clustering consumers are explained and the effect of feature extraction methods on forecast error is investigated.
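The cluster-then-aggregate structure described in this abstract can be sketched as follows. This is only an outline under stated assumptions: cluster labels are taken as given, and `naive_forecast` (the mean of the most recent readings) is a hypothetical stand-in for the per-cluster neural network, which the abstract does not specify.

```python
def cluster_loads(loads, labels):
    """Sum the consumption series of all consumers sharing a cluster label."""
    totals = {}
    for series, label in zip(loads, labels):
        acc = totals.setdefault(label, [0.0] * len(series))
        for t, value in enumerate(series):
            acc[t] += value
    return totals


def naive_forecast(series, window=2):
    """Stand-in forecaster (NOT a neural network): mean of the last readings."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def forecast_aggregate(loads, labels, forecaster=naive_forecast):
    """Forecast each cluster's load separately, then add the forecasts."""
    return sum(forecaster(s) for s in cluster_loads(loads, labels).values())


# Three consumers; the first two are assumed to share a cluster.
loads = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [10.0, 12.0, 14.0]]
labels = [0, 0, 1]
total_forecast = forecast_aggregate(loads, labels)
```

Any model with the same call shape (a series in, one predicted value out) can be passed as `forecaster`, so a trained network per cluster would slot into the same aggregation step.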

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The idea of meta-cognitive learning has enriched the landscape of evolving systems, because it emulates three fundamental aspects of human learning: what-to-learn; how-to-learn; and when-to-learn. However, existing meta-cognitive algorithms still exclude Scaffolding theory, which can realize a plug-and-play classifier. Consequently, these algorithms require laborious pre- and/or post-training processes to be carried out in addition to the main training process. This paper introduces a novel meta-cognitive algorithm termed GENERIC-Classifier (gClass), where the how-to-learn part constitutes a synergy of Scaffolding Theory - a tutoring theory that fosters the ability to sort out complex learning tasks, and Schema Theory - a learning theory of knowledge acquisition by humans. The what-to-learn aspect adopts an online active learning concept by virtue of an extended conflict and ignorance method, making gClass an incremental semi-supervised classifier, whereas the when-to-learn component makes use of the standard sample reserved strategy. A generalized version of the Takagi-Sugeno Kang (TSK) fuzzy system is devised to serve as the cognitive constituent. That is, the rule premise is underpinned by multivariate Gaussian functions, while the rule consequent employs a subset of the non-linear Chebyshev polynomial. Thorough empirical studies, confirmed by their corresponding statistical tests, have numerically validated the efficacy of gClass, which delivers better classification rates than state-of-the-art classifiers while having less complexity.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Clustering is applied in wireless sensor networks to increase energy efficiency. Clustering methods in wireless sensor networks differ from those in traditional data mining systems. This paper proposes a novel clustering algorithm, named MSTME, based on the Minimal Spanning Tree (MST) and the Maximum Energy resource on sensors. Specific constraints on clustering in wireless sensor networks and several evaluation metrics are also given. MSTME performs better than the well-known clustering methods Low Energy Adaptive Clustering Hierarchy (LEACH) and Base Station Controlled Dynamic Clustering Protocol (BCDCP) in wireless sensor networks when evaluated by these metrics. Simulation results show that MSTME increases energy efficiency and network lifetime compared with LEACH and BCDCP in two-hop and multi-hop networks, respectively. © World Scientific Publishing Company.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A novel Cluster Head (CH) selection algorithm based on both the Minimal Spanning Tree and the Maximum Energy resource on sensors, named MSTME, is provided for prolonging the lifetime of wireless sensor networks. MSTME can satisfy three principles of optimal CHs: to have the most energy resource among sensors in local clusters, to group approximately the same number of closer sensors into clusters, and to be distributed evenly in the network in terms of location. Simulation shows that the network lifetime under MSTME exceeds that of its counterparts in two-hop and multi-hop wireless sensor networks.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Cluster analysis has been identified as a core task in data mining. What constitutes a cluster, or a good clustering, may depend on the background of researchers and applications. This paper proposes two optimization criteria, abstract degree and fidelity, in the field of image abstract. To satisfy the fidelity criterion, a novel clustering algorithm named Global Optimized Color-based DBSCAN Clustering (GOC-DBSCAN) is provided. A non-optimized version of GOC-DBSCAN based on local color information, called HSV-DBSCAN, is also given. Both are based on the HSV color space. Clusters of GOC-DBSCAN are analyzed to find the factors that affect performance in terms of both abstract degree and fidelity. Examples show that, in general, the greater the abstract degree, the lower the fidelity. They also show that GOC-DBSCAN outperforms HSV-DBSCAN when the two are evaluated by the two optimization criteria.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper, a novel approach is proposed to automatically generate both a watercolor painting and a pencil sketch drawing, or binary image of contour, from a realism-style photo by using DBSCAN color clustering based on the HSV color space. While the color clusters produced by the proposed methods help to create the watercolor painting, the noise pixels are useful for generating the pencil sketch drawing. Moreover, noise pixels are reassigned to color clusters by a novel algorithm to refine the contour in the watercolor painting. The main goal of this paper is to inspire the imagination of non-professional artists so that they can easily produce traditional-style paintings by adjusting only a few parameters. Another contribution of this paper is an easy method to produce the binary image of contour, which is a by-product of mining image data by DBSCAN clustering. The binary image is thus useful in resource-limited systems to reduce data while keeping enough image information. © 2007 IEEE.
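The split between color clusters and noise pixels can be seen in a small sketch of standard DBSCAN applied to colors in HSV space. It is not the authors' method: the distance function (Euclidean over HSV with a circular hue difference), the parameter values, and the names `hsv_distance` and `dbscan` are assumptions made for illustration, and pixel positions are ignored.

```python
import colorsys

NOISE = -1


def hsv_distance(a, b):
    """Euclidean distance in HSV, treating hue as circular (assumed metric)."""
    dh = abs(a[0] - b[0])
    dh = min(dh, 1.0 - dh)
    return (dh ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5


def dbscan(points, eps, min_pts, dist=hsv_distance):
    """Standard DBSCAN; returns one label per point, NOISE (-1) for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:
                queue.extend(nbj)  # core point: keep expanding the cluster
    return labels


# Three reddish pixels, three bluish pixels, and one isolated green pixel.
rgb = [(255, 0, 0), (250, 5, 5), (255, 10, 10),
       (0, 0, 255), (5, 5, 250), (10, 10, 255),
       (0, 255, 0)]
hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in rgb]
labels = dbscan(hsv, eps=0.1, min_pts=3)
```

Pixels labeled with a cluster id would feed the flat color regions, while the pixels labeled `NOISE` are the candidates for a contour-like binary image.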

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Recognition of multiple moving objects is a very important task for extracting knowledge of interest to the user to send to the base station in wireless video-based sensor networks. However, video-based sensor nodes, which have constrained resources and continuously produce huge volumes of video streams, make it challenging to segment multiple moving objects from the video stream online. Traditional efficient clustering algorithms such as DBSCAN cannot run time-efficiently, and may even fail to run in the limited memory space on sensor nodes, because the number of pixel points is too large. This paper provides a novel algorithm named Inter-Frame Change Directing Online clustering (IFCDO clustering) for segmenting multiple moving objects from a video stream on sensor nodes. IFCDO clustering only needs to group pixels that differ between frames; thus it reduces both space and time complexity while achieving clusters as robust as those of DBSCAN. Experimental results show that IFCDO clustering outperforms DBSCAN in terms of both time and space efficiency. © 2008 IEEE.
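The idea of clustering only the pixels that changed between frames can be sketched as below. This is not IFCDO clustering itself, whose details the abstract does not give: it assumes grayscale frames stored as nested lists, a fixed intensity threshold, and 8-connected grouping, and the names `changed_pixels` and `group_changed` are hypothetical.

```python
def changed_pixels(prev, curr, threshold):
    """Coordinates whose intensity differs between frames by more than threshold."""
    return {
        (r, c)
        for r, row in enumerate(curr)
        for c, value in enumerate(row)
        if abs(value - prev[r][c]) > threshold
    }


def group_changed(pixels):
    """Group changed pixels into 8-connected components (one per moving region)."""
    remaining = set(pixels)
    groups = []
    while remaining:
        seed = remaining.pop()
        group, stack = [seed], [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        remaining.remove(n)
                        group.append(n)
                        stack.append(n)
        groups.append(sorted(group))
    return sorted(groups)


# Two 4x5 grayscale frames: two separate regions change between them.
prev = [[0] * 5 for _ in range(4)]
curr = [row[:] for row in prev]
curr[0][0] = curr[0][1] = 200
curr[3][4] = 200
objects = group_changed(changed_pixels(prev, curr, threshold=30))
```

Only the changed coordinates are ever stored or visited, so memory and time scale with the amount of motion rather than with the full frame size.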