811 results for Hierarchical clustering
Abstract:
Acknowledgements MW and RVD have been supported by the German Federal Ministry for Education and Research via the BMBF Young Investigators Group CoSy-CC2 (grant no. 01LN1306A). JFD thanks the Stordalen Foundation and BMBF (project GLUES) for financial support. JK acknowledges the IRTG 1740 funded by DFG and FAPESP. Coupled climate network analysis has been performed using the Python package pyunicorn (Donges et al., 2015a), which is available at https://github.com/pik-copan/pyunicorn.
Abstract:
Acknowledgments This study was financed by FEDER funds through the Programa Operacional Factores de Competitividade (COMPETE), and by national funds through the Portuguese Foundation for Science and Technology (FCT), within the scope of the projects PERSIST (PTDC/BIA-BEC/105110/2008), NETPERSIST (PTDC/AAG-MAA/3227/2012), and MateFrag (PTDC/BIA-BIC/6582/2014). RP was supported by FCT grants SFRH/BPD/73478/2010 and SFRH/BPD/109235/2015. PB was supported by the EDP Biodiversity Chair. We thank Rita Brito and Marta Duarte for help during fieldwork. We thank Chris Sutherland, Douglas Morris, William Morgan, and Richard Hassall for critical reviews of early versions of the paper. We also thank two anonymous reviewers for helpful comments that improved the paper.
Abstract:
Using survey data from 358 online customers, the study finds that the e-service quality construct conforms to the structure of a third-order factor model that links online service quality perceptions to distinct and actionable dimensions, including (1) website design, (2) fulfilment, (3) customer service, and (4) security/privacy. Each dimension is found to consist of several attributes that define the basis of e-service quality perceptions. A comprehensive specification of the construct, which includes attributes not covered in existing scales, is developed. The study contrasts a formative model consisting of 4 dimensions and 16 attributes with a reflective conceptualization. The results of this comparison indicate that studies using an incorrectly specified model overestimate the importance of certain e-service quality attributes. Global fit criteria are also found to support the detection of measurement misspecification. Meta-analytic data from 31,264 online customers are used to show that the developed measure predicts customer behavior better than widely used scales such as WebQual and E-S-Qual. The results show that the new measure enables managers to assess e-service quality more accurately and to predict customer behavior more reliably.
Abstract:
Category hierarchy is an abstraction mechanism for efficiently managing large-scale resources. In an open environment, a category hierarchy will inevitably become inappropriate for managing resources that change constantly and in unpredictable patterns. An inappropriate category hierarchy will mislead the management of resources. The growing dynamism and scale of online resources increase the need to maintain category hierarchies automatically. Previous studies of category hierarchies mainly focus either on generating a category hierarchy or on classifying resources under a predefined one; the automatic maintenance of category hierarchies has been neglected. Abstracting over categories and measuring the similarity between categories are the two basic operations needed to generate a category hierarchy. Humans are good at abstraction but have limited ability to calculate the similarities between large-scale resources. Computing models are good at calculating such similarities but have limited ability to abstract. To combine the advantages of human judgment and computing power, this paper proposes a two-phase approach to automatically maintaining a category hierarchy at two scales by detecting changes in the internal patterns of categories. The global phase clusters resources to generate a reference category hierarchy and computes the similarity between categories to detect inappropriate categories in the initial category hierarchy. The accuracy of the clustering approach used to generate the reference hierarchy determines how sound the global maintenance is. The local phase detects topical changes and then adjusts inappropriate categories with three local operations. The global phase can quickly target inappropriate categories top-down and carry out cross-branch adjustments, which can also accelerate the local-phase adjustments.
The local phase detects and adjusts inappropriate categories within a local range that were not adjusted in the global phase. By combining these two complementary phases, the approach can significantly improve the topical cohesion and accuracy of a category hierarchy. A new measure is proposed for evaluating category hierarchies that considers not only the balance of the hierarchical structure but also the accuracy of classification. Experiments show that the proposed approach is feasible and effective for adjusting inappropriate category hierarchies. The approach can be used to maintain category hierarchies for managing various resources in dynamic application environments. It also provides a way to specialize existing online category hierarchies so that resources are organized into more specific categories.
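The abstract does not say which clustering algorithm the global phase uses to build its reference hierarchy. Purely as a hedged illustration of how clustering can yield a hierarchy, the sketch below runs naive agglomerative clustering (average or single linkage on Euclidean distance) over resources represented as numeric vectors. The function name `agglomerative` and the vector representation are assumptions, not the paper's method.

```python
import math
from itertools import combinations


def agglomerative(points, linkage="average"):
    """Naive O(n^3) agglomerative clustering (illustrative sketch only).

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of point indices. Read bottom-up, the
    history is a binary hierarchy (dendrogram) over the input points.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of current clusters under the chosen linkage.
        for a, b in combinations(clusters, 2):
            dists = [math.dist(points[i], points[j]) for i in a for j in b]
            d = min(dists) if linkage == "single" else sum(dists) / len(dists)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)  # the merged cluster becomes a parent node
        merges.append((a, b, d))
    return merges
```

For example, `agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)])` first merges points 0 and 1, then 2 and 3, and finally joins the two pairs at the root. Production code would normally use an optimized library routine instead of this cubic loop.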
Abstract:
With the popularization of GPS-enabled devices such as mobile phones, location data are becoming available at an unprecedented scale. The locations may be collected from many different sources such as vehicles moving around a city, user check-ins in social networks, and geo-tagged micro-blogging photos or messages. Besides the longitude and latitude, each location record may also have a timestamp and additional information such as the name of the location. Time-ordered sequences of these locations form trajectories, which together contain useful high-level information about people's movement patterns.
The first part of this thesis focuses on a few geometric problems motivated by the matching and clustering of trajectories. We first give a new algorithm for computing a matching between a pair of curves under existing models such as dynamic time warping (DTW). The algorithm is more efficient than standard dynamic programming algorithms both theoretically and practically. We then propose a new matching model for trajectories that avoids the drawbacks of existing models. For trajectory clustering, we present an algorithm that computes clusters of subtrajectories, which correspond to common movement patterns. We also consider trajectories of check-ins, and propose a statistical generative model, which identifies check-in clusters as well as the transition patterns between the clusters.
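For reference, the "standard dynamic programming" baseline for dynamic time warping mentioned above can be sketched as follows. This is the textbook O(nm) recurrence, not the thesis's faster algorithm. The function name `dtw` and the pluggable `dist` parameter are illustrative choices: the default compares scalars, and passing `dist=math.dist` compares 2-D trajectory points.

```python
import math


def dtw(p, q, dist=lambda a, b: abs(a - b)):
    """Classic O(n*m) dynamic-programming DTW distance between sequences p and q."""
    n, m = len(p), len(q)
    # D[i][j] = cost of the cheapest warping path aligning p[:i] with q[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(p[i - 1], q[j - 1])
            # Extend the best of: repeat q[j-1], repeat p[i-1], or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because DTW may match one point to several, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.0 even though the sequences have different lengths.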
The second part of the thesis considers the problem of covering shortest paths in a road network, motivated by an EV charging station placement problem. More specifically, a subset of vertices in the road network is selected to place charging stations so that every shortest path contains enough charging stations and can be traveled by an EV without draining the battery. We first introduce a general technique for the geometric set cover problem. This technique leads to near-linear-time approximation algorithms, which are the state-of-the-art algorithms for this problem in either running time or approximation ratio. We then use this technique to develop a near-linear-time algorithm for this shortest-path cover problem.
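The abstract does not describe the near-linear-time technique itself. For orientation only, the sketch below substitutes the classical greedy approximation for hitting set (the dual view of set cover), which carries a logarithmic approximation guarantee. It assumes that "enough charging stations" simplifies to "each given vertex sequence contains at least one station", for example when each sequence is a stretch of a shortest path as long as the EV's range. `greedy_hitting_set` is a hypothetical name, and this is not the thesis's algorithm.

```python
def greedy_hitting_set(paths):
    """Greedy approximation for hitting set: pick vertices until every path
    contains at least one picked vertex (a stand-in, not the thesis's method)."""
    if any(len(p) == 0 for p in paths):
        raise ValueError("an empty path can never be hit")
    unhit = [set(p) for p in paths]
    chosen = []
    while unhit:
        # Count how many still-unhit paths each vertex lies on.
        counts = {}
        for path in unhit:
            for v in path:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)  # vertex hitting the most paths
        chosen.append(best)
        # Drop every path that now contains a station.
        unhit = [path for path in unhit if best not in path]
    return chosen
```

With `paths = [[1, 2, 3], [3, 4], [5, 6], [6, 7]]`, vertices 3 and 6 each lie on two paths, so two stations suffice to hit all four.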
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share information, and enjoy it. Meanwhile, these services (which can be regarded as social media) generate huge amounts of data every day, every hour, even every minute and every second. Researchers are currently interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, the large scale of OSN data makes it difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated content. Specifically, it addresses three problems related to social media users and content: (1) how can users and content be organized? (2) how can textual content be summarized so that users do not have to read every post to grasp the general idea? (3) how can influential users in social media be identified to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are summarized as follows. (1) It provides a comprehensive and versatile data mining framework for analyzing users and user-generated content from social media. (2) It designs a hierarchical co-clustering algorithm to organize users and content. (3) It proposes multi-document summarization methods to extract core information from social network content. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
Abstract:
China is undergoing rapid economic development, and the long-term implications of China’s rise for the European economy, society, and culture are constantly debated but still largely unknown. Moreover, only recently has a new volume edited by Kunzmann clearly identified a specific field of research: the spatial impact on the EU of China’s convergence in the global market. This paper addresses the spatial issues related to the growing Chinese communities, especially in Italy, which are part of a broader and considerable transformation of traditional Chinese enclaves in EU cities: from recognizable “Chinatowns” to new hybrid urban formations where housing, retail, wholesale, and even commodity production often coincide. Key concepts such as rise, fragmentation, infringement, and fear are useful for analysing some of the more controversial socio-economic dynamics of Chinese clusters, especially in a traditionally manufacturing-based country like Italy, where a unique paradox of “double competition”, from outside and from inside, can be recognized. This situation poses a serious threat to local economic systems in terms of sustainability and social cohesion, making it necessary to rethink the role and nature of public action in addressing new forms of marginality at the urban and regional levels.
Abstract:
The toxic dinoflagellate Alexandrium ostenfeldii is the only bioluminescent bloom-forming phytoplankton in coastal waters of the Baltic Sea. We analysed partial luciferase gene (lcf) sequences and bioluminescence production in Baltic A. ostenfeldii bloom populations to assess the distribution and consistency of the trait in the Baltic Sea, and to evaluate applications for early detection of toxic blooms. Lcf was consistently present in 61 Baltic Sea A. ostenfeldii strains isolated from six separate bloom sites. All Baltic Sea strains except one produced bioluminescence. In contrast, the presence of lcf and the ability to produce bioluminescence did vary among strains from other parts of Europe. In phylogenetic analyses, lcf sequences of Baltic Sea strains clustered separately from North Sea strains, but variation between Baltic Sea strains was not sufficient to distinguish between bloom populations. Clustering of the lcf marker was similar to that of internal transcribed spacer (ITS) sequences, with differences being minor and limited to the lowest hierarchical clusters, indicating a similar rate of evolution of the two genes. In relation to monitoring, the consistent presence of lcf and the close coupling of lcf with bioluminescence suggest that bioluminescence can be used to reliably monitor toxic bloom-forming A. ostenfeldii in the Baltic Sea.
Abstract:
Marine heatwaves (MHWs) have been observed around the world and are expected to increase in intensity and frequency under anthropogenic climate change. A variety of impacts have been associated with these anomalous events, including shifts in species ranges, local extinctions, and economic impacts on seafood industries through declines in important fishery species and impacts on aquaculture. Extreme temperatures are increasingly seen as important influences on biological systems, yet a consistent definition of MHWs does not exist. A clear definition will facilitate retrospective comparisons between MHWs, enabling synthesis and a mechanistic understanding of the role of MHWs in marine ecosystems. Building on research into atmospheric heatwaves, we propose both a general and a specific definition for MHWs, based on a hierarchy of metrics that allow different data sets to be used in identifying MHWs. We generally define an MHW as a prolonged, discrete, anomalously warm water event that can be described by its duration, intensity, rate of evolution, and spatial extent. Specifically, we consider an anomalously warm event to be an MHW if it lasts for five or more days, with temperatures warmer than the 90th percentile based on a 30-year historical baseline period. This structure provides flexibility with regard to the description of MHWs and transparency in communicating MHWs to a general audience. The use of these metrics is illustrated for three 21st-century MHWs: the northern Mediterranean event in 2003, the Western Australia ‘Ningaloo Niño’ in 2011, and the northwest Atlantic event in 2012. We recommend a specific quantitative definition for MHWs to facilitate global comparisons and to advance our understanding of these phenomena.
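The specific definition above (five or more consecutive days warmer than the 90th percentile of a 30-year baseline) can be sketched in Python. This is a simplified illustration under stated assumptions: thresholds are computed per calendar day from the raw baseline values, with no smoothing window and no merging of events separated by short gaps. The helper names `seasonal_threshold` and `detect_mhws` are hypothetical, and the paper's full procedure may treat the climatology differently.

```python
import statistics


def seasonal_threshold(baseline_years, pct=90):
    """Per-calendar-day percentile threshold from a multi-year baseline.

    baseline_years: one list of daily temperatures per baseline year, all of
    equal length (leap days are ignored in this simplified sketch).
    """
    n_days = len(baseline_years[0])
    thresholds = []
    for day in range(n_days):
        values = [year[day] for year in baseline_years]
        # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        thresholds.append(cuts[pct - 1])
    return thresholds


def detect_mhws(temps, thresholds, min_duration=5):
    """Return (start, end) day indices, end inclusive, of every run of at least
    min_duration consecutive days on which temps exceed the threshold."""
    if len(temps) != len(thresholds):
        raise ValueError("temps and thresholds must have the same length")
    events, start = [], None
    for i, (t, thr) in enumerate(zip(temps, thresholds)):
        if t > thr:
            if start is None:
                start = i  # a warm spell begins
        else:
            if start is not None and i - start >= min_duration:
                events.append((start, i - 1))
            start = None
    # Close a warm spell that is still running at the end of the record.
    if start is not None and len(temps) - start >= min_duration:
        events.append((start, len(temps) - 1))
    return events
```

For example, a five-day exceedance on days 1 to 5 is reported as `(1, 5)`, while a later two-day exceedance is too short to count.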
Abstract:
Clustering algorithms, pattern mining techniques, and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension, and interaction in given educational scenarios. The specific characteristics of the available data, such as missing values, extreme values, or outliers, make it challenging to extract meaningful user models from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index, and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative for classifying learners. Furthermore, the monitoring activities performed created a strong basis for generating automatic feedback to learners on their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who may fail the course, enabling tutors to take timely action.
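Three of the quality metrics named above can be computed from a finished clustering using their standard textbook definitions, sketched below. The abstract does not say how the tool computes them, so this is an assumption. The Xie-Beni index is shown in its hard-partition form (SSE divided by N times the minimum squared centroid separation), the silhouette score and the k-means step itself are omitted, and all function names are illustrative.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid (lower is tighter)."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def dunn_index(clusters):
    """Smallest between-cluster point distance / largest cluster diameter (higher is better).
    Requires at least two clusters."""
    min_between = min(
        math.dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b
    )
    max_diameter = max(
        (math.dist(p, q) for pts in clusters for p, q in combinations(pts, 2)),
        default=0.0,
    )
    return min_between / max_diameter if max_diameter > 0 else math.inf


def xie_beni(clusters):
    """Hard-partition Xie-Beni index: SSE / (N * min squared centroid distance) (lower is better)."""
    centers = [centroid(pts) for pts in clusters]
    n = sum(len(pts) for pts in clusters)
    min_sep = min(math.dist(a, b) ** 2 for a, b in combinations(centers, 2))
    return sse(clusters) / (n * min_sep)
```

For two compact, well-separated clusters such as `[[(0, 0), (0, 2)], [(10, 0), (10, 2)]]`, the SSE is 4.0, the Dunn index is 10 / 2 = 5.0, and the Xie-Beni index is 4 / (4 * 100) = 0.01.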
Abstract:
Demand response (DR) algorithms manipulate the energy consumption schedules of controllable loads so as to satisfy grid objectives. Implementing DR algorithms with a centralized agent can be problematic for scalability reasons, and it raises issues of data privacy and robustness to communication failures. It is therefore desirable to implement DR with a scalable decentralized algorithm. In this paper, a hierarchical DR scheme based on Dantzig-Wolfe decomposition (DWD) is proposed for peak minimization. In addition, a time-weighted maximization option is included in the cost function, which improves the quality of service for devices seeking to receive their desired energy sooner rather than later. This paper also demonstrates how the DWD algorithm can be implemented more efficiently by calculating upper and lower cost bounds after each DWD iteration.
Abstract:
Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable, reusable knowledge. Clustering QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure of QA datasets to improve clustering quality. Our method, MixKMeans, composes similarities in the question and answer spaces so that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer data (QAs) may be localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
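The abstract states only that MixKMeans lets the space with the higher match dominate; it gives no formula. As a hypothetical illustration of that idea, the sketch below takes the maximum of question-space and answer-space cosine similarities inside an otherwise ordinary k-means loop, keeping a separate centroid in each space. The name `mix_kmeans_sketch`, the max-composition, the naive first-k initialization, and the assumption of precomputed vectors (for example, TF-IDF) are all illustrative choices, not the published method.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def mixed_similarity(q, a, cq, ca):
    """Hypothetical composition: whichever space matches more strongly dominates."""
    return max(cosine(q, cq), cosine(a, ca))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def mix_kmeans_sketch(qa_pairs, k, iters=10):
    """k-means-style loop over (question_vec, answer_vec) pairs.

    Each centroid is a (question_centroid, answer_centroid) pair.
    """
    centroids = [qa_pairs[i] for i in range(k)]  # naive init: first k items
    labels = [0] * len(qa_pairs)
    for _ in range(iters):
        # Assignment step: most similar centroid under the mixed similarity.
        labels = [
            max(range(k), key=lambda c: mixed_similarity(q, a, *centroids[c]))
            for q, a in qa_pairs
        ]
        # Update step: recompute both centroids of every non-empty cluster.
        for c in range(k):
            members = [qa_pairs[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = (mean([m[0] for m in members]), mean([m[1] for m in members]))
    return labels
```

In the example below, the last pair has an ambiguous question vector, but its answer matches the first cluster closely, so the answer space decides its assignment.

```python
pairs = [([1, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 0], [0.9, 0.1]),
         ([0, 1], [0.1, 0.9]), ([0.7, 0.7], [1, 0])]
print(mix_kmeans_sketch(pairs, k=2))
```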