814 results for Hierarchical clustering model
Abstract:
Objective: Biomedical event extraction concerns events that describe changes in the state of bio-molecules, as reported in the literature. Compared with the protein-protein interaction (PPI) extraction task, which often involves only the extraction of binary relations between two proteins, biomedical event extraction is much harder because it must deal with complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In this paper, we propose an information extraction system based on the hidden vector state (HVS) model, called HVS-BioEvent, for biomedical event extraction, and investigate its capability in extracting complex events. Methods and material: HVS has previously been employed for extracting PPIs. In HVS-BioEvent, we propose an automated way to generate abstract annotations for HVS training, and further propose novel machine learning approaches for identifying event trigger words and for extracting biomedical events from the HVS parse results. Results: Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, which is only 2.38 percentage points lower than the best-performing system, by UTurku, in that task. Nevertheless, HVS-BioEvent outperforms UTurku's system on complex event extraction, achieving 36.57% vs. 30.52% for regulation events and 40.61% vs. 38.99% for negative regulation events. Conclusions: The results suggest that the HVS model, with its hierarchical hidden state structure, is indeed more suitable for complex event extraction, since it can naturally model embedded structural context in sentences.
Abstract:
A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for extracting protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst retaining sufficient ability to capture hierarchical structure. When applied to extracting protein-protein interactions, it performed better than other established statistical methods and achieved an F-score of 61.5% with balanced recall and precision values. Moreover, the statistical nature of the purely data-driven HVS model makes it intrinsically robust, and it can be easily adapted to other domains.
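For readers less familiar with the metric, the F-score quoted above is conventionally the harmonic mean of precision and recall. The sketch below is a generic illustration, not code from the paper; the function name `f_score` and the `beta` parameter are assumptions introduced here. The abstract does not report precision and recall separately, so the example only shows that, if the two were exactly equal, the F-score would equal that common value.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # convention: no true positives at all
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical inputs: perfectly balanced precision and recall.
print(f_score(0.615, 0.615))
```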
Abstract:
In this paper we investigate whether consideration of store-level heterogeneity in marketing mix effects improves the accuracy of the marketing mix elasticities, fit, and forecasting accuracy of the widely-applied SCAN*PRO model of store sales. Models with continuous and discrete representations of heterogeneity, estimated using hierarchical Bayes (HB) and finite mixture (FM) techniques, respectively, are empirically compared to the original model, which does not account for store-level heterogeneity in marketing mix effects, and is estimated using ordinary least squares (OLS). The empirical comparisons are conducted in two contexts: Dutch store-level scanner data for the shampoo product category, and an extensive simulation experiment. The simulation investigates how between- and within-segment variance in marketing mix effects, error variance, the number of weeks of data, and the number of stores impact the accuracy of marketing mix elasticities, model fit, and forecasting accuracy. Contrary to expectations, accommodating store-level heterogeneity does not improve the accuracy of marketing mix elasticities relative to the homogeneous SCAN*PRO model, suggesting that little may be lost by employing the original homogeneous SCAN*PRO model estimated using ordinary least squares. Improvements in fit and forecasting accuracy are also fairly modest. We pursue an explanation for this result since research in other contexts has shown clear advantages from assuming some type of heterogeneity in market response models. In an Afterthought section, we comment on the controversial nature of our result, distinguishing factors inherent to household-level data and associated models vs. general store-level data and associated models vs. the unique SCAN*PRO model specification.
Abstract:
Drawing on the perceived organizational membership theoretical framework and the social identity view of dissonance theory, I examined in this study the dynamics of the relationship between psychological contract breach and organizational identification. I included group-level transformational and transactional leadership as well as procedural justice in the hypothesized model as key antecedents for organizational membership processes. I further explored the mediating role of psychological contract breach in the relationship between leadership, procedural justice climate, and organizational identification and proposed separateness–connectedness self-schema as an important moderator of the above mediated relationship. Hierarchical linear modeling results from a sample of 864 employees from 162 work units in 10 Greek organizations indicated that employees' perception of psychological contract breach negatively affected their organizational identification. I also found psychological contract breach to mediate the impact of transformational and transactional leadership on organizational identification. Results further provided support for moderated mediation and showed that the indirect effects of transformational and transactional leadership on identification through psychological contract breach were stronger for employees with a low connectedness self-schema.
Abstract:
In this paper a Hierarchical Analytical Network Process (HANP) model is demonstrated for evaluating alternative technologies for generating electricity from municipal solid waste (MSW) in India. The technological alternatives and evaluation criteria for the HANP study are characterised by reviewing the literature and consulting experts in the field of waste management. Technologies reviewed in the context of India include landfill, anaerobic digestion, incineration, pelletisation and gasification. To investigate the sensitivity of the result, we examine variations in expert opinions and carry out an Analytical Hierarchy Process (AHP) analysis for comparison. We find that anaerobic digestion is the preferred technology for generating electricity from MSW in India. Gasification is indicated as the preferred technology in an AHP model, due to the exclusion of criteria dependencies, and in an HANP analysis when a high priority is placed on net output and retention time. We conclude that HANP successfully provides a structured framework for recommending which technologies to pursue in India, and the adoption of such tools is critical at a time when key investments in infrastructure are being made. Therefore the presented methodology is thought to have wider potential for investors, policy makers, researchers and plant developers in India and elsewhere. © 2013 Elsevier Ltd. All rights reserved.
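To make the AHP comparison step concrete, the following is a minimal sketch of the standard AHP weighting calculation: priorities are taken as the normalised principal eigenvector of a pairwise comparison matrix (estimated here by power iteration), and the consistency index is (lambda_max - n) / (n - 1). The function name `ahp_priorities` and the comparison values are hypothetical and are not taken from the paper. The sketch covers plain AHP only; the criteria dependencies that HANP models are not represented.

```python
def ahp_priorities(matrix, iterations=100):
    """Estimate AHP priority weights and the consistency index.

    matrix[i][j] states how strongly alternative i is preferred to j
    (reciprocal matrix: matrix[j][i] == 1 / matrix[i][j]).
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum 1.
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the principal eigenvalue from the ratio (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    return weights, consistency_index


# Hypothetical, perfectly consistent judgements for three alternatives A, B, C:
# A is preferred 2x over B and 4x over C; B is preferred 2x over C.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, ci = ahp_priorities(comparisons)
print([round(w, 3) for w in weights], round(ci, 6))
```

A consistency index near zero indicates that the pairwise judgements agree with one another; larger values suggest the expert comparisons should be revisited.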
Abstract:
This paper clarifies the role of alternative optimal solutions in the clustering of multidimensional observations using data envelopment analysis (DEA). The paper shows that alternative optimal solutions corresponding to several units produce different groups, with different sizes and different decision making units (DMUs) in each class. This implies that a specific DMU may be grouped into different clusters when the corresponding DEA model has multiple optimal solutions. © 2011 Elsevier B.V. All rights reserved.
Abstract:
The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.
Abstract:
Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein electrostatic potentials for the ‘Major Histocompatibility Complex (MHC) class I’ of humans. In both cases, visual observation and the quantitative quality measures have shown better visualisation at lower levels.
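The segmentation step described above (cluster the points in the visualisation space, then fit a separate model to each subset) can be sketched as follows. This is only an illustration under stated assumptions: plain Lloyd's K-means stands in for whichever segmentation method is chosen, the function names `kmeans_2d` and `split_for_child_models` are hypothetical, and fitting the child visualisation models themselves is left out.

```python
import math


def kmeans_2d(points, k, iterations=20):
    """Plain Lloyd's K-means on 2-D points, seeded with the first k points."""
    centroids = [points[i] for i in range(k)]

    def assign():
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iterations):
        labels = assign()
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assign()


def split_for_child_models(latent_points, data, k):
    """Group the original data rows by the cluster of their 2-D projection."""
    labels = kmeans_2d(latent_points, k)
    subsets = {c: [] for c in range(k)}
    for label, row in zip(labels, data):
        subsets[label].append(row)
    return subsets


# Hypothetical parent-level projections and the data rows they came from.
latent = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
rows = ["a", "b", "c", "d"]
print(split_for_child_models(latent, rows, k=2))
```

Each subset returned here would then be passed to its own child visualisation model, which is what gives the hierarchy its more detailed lower levels.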
Abstract:
Ant colony optimisation algorithms model the way ants use pheromones for marking paths to important locations in their environment. Pheromone traces are picked up, followed, and reinforced by other ants but also evaporate over time. Optimal paths attract more pheromone and less useful paths fade away. The main innovation of the proposed Multiple Pheromone Ant Clustering Algorithm (MPACA) is to mark objects using many pheromones, one for each value of each attribute describing the objects in multidimensional space. Every object has one or more ants assigned to each attribute value and the ants then try to find other objects with matching values, depositing pheromone traces that link them. Encounters between ants are used to determine when ants should combine their features to look for conjunctions and whether they should belong to the same colony. This paper explains the algorithm and explores its potential effectiveness for cluster analysis. © 2014 Springer International Publishing Switzerland.
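The idea of keeping one pheromone per attribute value, with deposits that reinforce a path and evaporation that fades it, can be sketched as below. The class `PheromoneMap`, its method names, and the evaporation rate are assumptions made for illustration; the abstract does not give the paper's actual data structures or update rules.

```python
from collections import defaultdict


class PheromoneMap:
    """Stores a separate pheromone level per (edge, attribute, value) triple."""

    def __init__(self, evaporation_rate=0.1):
        self.evaporation_rate = evaporation_rate  # hypothetical parameter
        self.levels = defaultdict(float)

    def deposit(self, edge, attribute, value, amount=1.0):
        # Reinforce the trail for this specific attribute value on this edge.
        self.levels[(edge, attribute, value)] += amount

    def evaporate(self):
        # Every trail decays by the same fraction; negligible trails are dropped.
        for key in list(self.levels):
            self.levels[key] *= 1.0 - self.evaporation_rate
            if self.levels[key] < 1e-6:
                del self.levels[key]

    def strength(self, edge, attribute, value):
        return self.levels.get((edge, attribute, value), 0.0)


# Hypothetical usage: two objects linked by an edge, two attribute values.
trails = PheromoneMap(evaporation_rate=0.5)
trails.deposit(("obj1", "obj2"), "colour", "red")
trails.deposit(("obj1", "obj2"), "colour", "red")
trails.deposit(("obj1", "obj2"), "size", "large")
trails.evaporate()
print(trails.strength(("obj1", "obj2"), "colour", "red"))
print(trails.strength(("obj1", "obj2"), "size", "large"))
```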
Abstract:
We proposed and tested a multilevel model, underpinned by empowerment theory, that examines the processes linking high-performance work systems (HPWS) and performance outcomes at the individual and organizational levels of analyses. Data were obtained from 37 branches of 2 banking institutions in Ghana. Results of hierarchical regression analysis revealed that branch-level HPWS relates to empowerment climate. Additionally, results of hierarchical linear modeling that examined the hypothesized cross-level relationships revealed 3 salient findings. First, experienced HPWS and empowerment climate partially mediate the influence of branch-level HPWS on psychological empowerment. Second, psychological empowerment partially mediates the influence of empowerment climate and experienced HPWS on service performance. Third, service orientation moderates the psychological empowerment-service performance relationship such that the relationship is stronger for those high rather than low in service orientation. Last, ordinary least squares regression results revealed that branch-level HPWS influences branch-level market performance through cross-level and individual-level influences on service performance that emerges at the branch level as aggregated service performance. © 2011 American Psychological Association.
Abstract:
Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets. © 2005 IEEE.
Abstract:
Projects exposed to an uncertain environment must be adapted to deal with the effective integration of various planning elements and the optimization of project parameters. Time, cost, and quality are the prime objectives of a project that need to be optimized to fulfill the owner's goal. In an uncertain environment, there exist many other conflicting objectives that may also need to be optimized. These objectives are characterized by varying degrees of conflict. Moreover, an uncertain environment also causes several changes in the project plan throughout its life, demanding that the project plan be totally flexible. Goal programming (GP), a multiple criteria decision making technique, offers a good solution for this project planning problem. Here the planning problem is considered from the owner's perspective, which leads to classifying the project up to the activity level. GP is applied separately at each level, and the formulated models are integrated through information flow. The flexibility and adaptability of the models lie in the ease of updating the model parameters at the required level through changing priorities and/or constraints and transmitting the information to other levels. The hierarchical model automatically provides integration among various elements of planning. The proposed methodology is applied in this paper to plan a petroleum pipeline construction project, and its effectiveness is demonstrated.
Abstract:
Novel macroporous solid bases have been developed as alternative clean technologies to existing commercial homogeneous catalysts for the production of biodiesel from triglycerides; the latter suffer process disadvantages including complex separation and associated saponification and engine corrosion, and are unsuitable for continuous operation. To this end, tuneable macroporous MgAl hydrotalcites have been prepared by an alkali-free route and characterised by TGA, XRD, SEM and XPS. The macropore architecture improves diffusion of bulky triglyceride molecules to the active base sites, increasing activity. Lamellar and macroporous hydrotalcites will be compared for the transesterification of both model and plant oil feedstocks, and structure-reactivity relations identified.
Abstract:
Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches.
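Steps (iv) and (v) above, counting feature encounters and combining features once a threshold is exceeded, can be illustrated with a toy sketch. The class `Ant`, the `meet` method, and the threshold value are hypothetical; the abstract does not specify how encounters are recorded or what the threshold is, so this shows only the general mechanism.

```python
from collections import Counter


class Ant:
    """Toy ant that tracks how often it meets ants carrying other features."""

    def __init__(self, features, threshold=3):
        self.features = frozenset(features)  # feature values this ant detects
        self.threshold = threshold           # hypothetical combination threshold
        self.encounters = Counter()          # encounter count per foreign feature set

    def meet(self, other):
        """Record an encounter; return True if it triggers a feature combination."""
        if other.features <= self.features:
            return False  # nothing new to learn from this ant
        self.encounters[other.features] += 1
        if self.encounters[other.features] > self.threshold:
            # Encounters exceeded the threshold: detect the conjunction from now on.
            self.features = self.features | other.features
            del self.encounters[other.features]
            return True
        return False


# Hypothetical usage: combination happens on the third meeting when threshold = 2.
red_ant = Ant({"colour=red"}, threshold=2)
large_ant = Ant({"size=large"})
print([red_ant.meet(large_ant) for _ in range(3)])
print(sorted(red_ant.features))
```

According to step (vi) of the abstract, a similar counting mechanism drives colony merging; that part is not modelled here.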
Abstract:
Membrane systems are computationally equivalent to Turing machines. However, their distributed and massively parallel nature yields polynomial-time solutions to problems whose traditional solutions are non-polynomial. To date, research on implementing membrane systems has not yet achieved the massively parallel character of this computational model. The best published approaches have achieved a distributed architecture termed “partially parallel evolution with partially parallel communication”, in which several membranes are allocated to each processor, proxies are used to communicate with membranes allocated to different processors, and a policy for controlling access to the communications is mandatory. These approaches obtain processor-level parallelism in the application of evolution rules and in the internal communication among membranes allocated to the same processor. However, external communications, which are needed for communication among membranes placed on different processors, share a common communication line and are therefore sequential. In this work, we present a new hierarchical architecture that achieves parallelism in external communication among processors and substantially increases parallelization in the application of evolution rules and in internal communications. Consequently, the time needed for each evolution step is reduced. As a result, this new distributed hierarchical architecture comes close to the massively parallel character required by the model.