897 results for "Ranking"
Abstract:
The problem of clustering a large document collection is challenged not only by the number of documents and the number of dimensions, but also by the number and sizes of the clusters. Traditional clustering methods fail to scale when they need to generate a large number of clusters. Furthermore, when the cluster sizes in the solution are heterogeneous, i.e. some of the clusters are large, the similarity measures tend to degrade. A ranking-based clustering method is proposed to deal with these issues in the context of the Social Event Detection task. Ranking scores are used to select a small number of the most relevant clusters against which a document is compared and placed. Additionally, instead of conventional cluster centroids, cluster patches, which are hub-like sets of documents, are proposed to represent clusters. Text, temporal, spatial and visual content information collected from the social event images is utilized in calculating similarity. Results show that these strategies strike a balance between the performance and the accuracy of the clustering solution.
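The idea of ranking clusters first and comparing a document only against the top few can be sketched as follows. This is a minimal illustration, not the authors' method: documents are reduced to sets of terms, the cheap ranking score (shared vocabulary), the Jaccard similarity, and the names `assign`, `top_k` and `threshold` are all assumptions made for the example, and the temporal, spatial and visual features mentioned in the abstract are omitted.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(doc, patches, top_k=2, threshold=0.3):
    """Place `doc` (a set of terms) into a cluster and return the cluster index.

    `patches` is a list of clusters; each cluster is a list of term sets
    standing in for its representative documents (hypothetical stand-in
    for the "cluster patches" of the abstract).
    """
    # Ranking step: cheap score = number of terms shared with the pooled
    # vocabulary of each cluster's patch.
    ranked = sorted(
        range(len(patches)),
        key=lambda c: len(doc & set().union(*patches[c])),
        reverse=True,
    )
    # Comparison step: full similarity only against the top_k ranked clusters.
    best_cluster, best_sim = None, 0.0
    for c in ranked[:top_k]:
        sim = max(jaccard(doc, member) for member in patches[c])
        if sim > best_sim:
            best_cluster, best_sim = c, sim
    # Open a new cluster when nothing is similar enough.
    if best_cluster is None or best_sim < threshold:
        patches.append([doc])
        return len(patches) - 1
    patches[best_cluster].append(doc)
    return best_cluster
```

Because only `top_k` clusters receive the full comparison, the cost per document grows with `top_k` rather than with the total number of clusters, which is the trade-off the abstract describes.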
Abstract:
For traditional information filtering (IF) models, it is often assumed that the documents in one collection are only related to one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling was proposed to generate statistical models to represent multiple topics in a collection of documents, but in a topic model, topics are represented by distributions over words, which are limited in their ability to distinctively represent the semantics of topics. Patterns are generally thought to be more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate the document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.
Abstract:
Oleaginous microorganisms have the potential to produce oils as an alternative feedstock for biodiesel production. Microalgae (Chlorella protothecoides and Chlorella zofingiensis), yeasts (Cryptococcus albidus and Rhodotorula mucilaginosa), and fungi (Aspergillus oryzae and Mucor plumbeus) were investigated for their ability to produce oil from glucose, xylose and glycerol. Multi-criteria analysis (MCA) using the analytic hierarchy process (AHP) and the preference ranking organization method for the enrichment of evaluations (PROMETHEE) with graphical analysis for interactive aid (GAIA) was used to rank and select the preferred microorganisms for oil production for biodiesel application. This was based on a number of criteria, viz. oil concentration, content, production rate and yield, substrate consumption rate, fatty acid composition, biomass harvesting and nutrient costs. PROMETHEE selected A. oryzae, M. plumbeus and R. mucilaginosa as the most prospective species for oil production. However, further analysis by GAIA Webs identified A. oryzae and M. plumbeus as the best performing microorganisms.
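A PROMETHEE-style ranking of this kind can be sketched with net outranking flows. The sketch below assumes the simplest ("usual") preference function, where an alternative is fully preferred on a criterion as soon as it is strictly better; the function names and the idea of passing weights directly (rather than deriving them with AHP) are assumptions for illustration, and the abstract does not state which preference functions or weights the study used.

```python
def promethee_net_flows(scores, weights, maximize):
    """Return the net outranking flow of each alternative.

    scores:   {alternative name: [value per criterion]}
    weights:  [weight per criterion] (need not sum to 1)
    maximize: [True if larger is better for that criterion, else False]
    """
    names = list(scores)
    total_weight = sum(weights)
    flows = {}
    for a in names:
        net = 0.0
        for b in names:
            if a == b:
                continue
            for j, w in enumerate(weights):
                diff = scores[a][j] - scores[b][j]
                if not maximize[j]:
                    diff = -diff  # for cost criteria, smaller is better
                if diff > 0:
                    net += w      # a is preferred to b on criterion j
                elif diff < 0:
                    net -= w      # b is preferred to a on criterion j
        flows[a] = net / (total_weight * (len(names) - 1))
    return flows


def rank_alternatives(scores, weights, maximize):
    """Alternatives sorted from highest to lowest net flow."""
    flows = promethee_net_flows(scores, weights, maximize)
    return sorted(flows, key=flows.get, reverse=True)
```

Net flows lie in [-1, 1] and sum to zero across alternatives, so a positive value means an alternative outranks the others more often than it is outranked.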
Abstract:
In developing countries, a high rate of growth in the demand for electric energy is felt, and so the addition of new generating units becomes necessary. In deregulated power systems, private generating stations are encouraged to add new generation. The appropriate location for a new generator can be found by running repeated power flows and carrying out system studies such as analysis of the voltage profile, voltage stability, and losses. In this paper a new methodology is proposed which mainly takes the existing network topology into account. A concept of T-index is introduced in this paper, which considers the electrical distances between generator and load nodes. This index is used for ranking significant new generation expansion locations and also indicates the amount of permissible generation that can be installed at these new locations. This concept facilitates the medium- and long-term planning of power generation expansions within the available transmission corridors. Studies carried out on a sample 7-bus system, an EHV equivalent 24-bus system and the IEEE 39-bus system are presented for illustration.
Abstract:
Reviewers' ratings have become one of the most influential parameters when deciding whether to purchase or rent products or services from online vendors. The star rating system is the de facto standard for rating a product. It is regarded as one of the most visually appealing rating systems that interact directly with consumers, helping them find products they would like to purchase as well as register their views on the product. It offers a visual advantage in picking the popular or most-rated product. Any system that is not as appealing as the star system risks rejection by the online business community. This paper argues that the visual advantage is not enough to declare the star rating system a success; the success of a ranking system should be measured by how effectively the system helps customers make decisions that they, retrospectively, consider correct. This paper suggests a novel approach of Relative Ranking within the boundaries of the star rating system to overcome a few inherent disadvantages of that system. © Springer Science+Business Media B.V. 2010.
Abstract:
Instability of laminated curved composite beams made of repeated sublaminate construction is studied using the finite element method. In repeated sublaminate construction, a full laminate is obtained by repeating a basic sublaminate which has a smaller number of plies. This paper deals with the determination of the optimum lay-up for buckling by ranking of such composite curved beams (which may be solid or sandwich). For this purpose, use is made of a two-noded, 16 degrees of freedom curved composite beam finite element. The displacements u, v, w of the element reference axis are expressed in terms of one-dimensional first-order Hermite interpolation polynomials, and line member assumptions are invoked in the formulation of the elastic stiffness matrix and geometric stiffness matrix. The nonlinear expressions for the strains, occurring in beams subjected to axial, flexural and torsional loads, are incorporated in a general instability analysis. The computer program developed has been used, after extensive checking for correctness, to obtain the optimum orientation scheme of the plies in the sublaminate so as to achieve maximum buckling load for typical curved solid/sandwich composite beams.
Abstract:
In this paper we describe a method for the optimum design of fiber-reinforced composite laminates for strength by ranking. The software developed based on this method is capable of designing laminates for strength which are subjected to in-plane and/or bending loads and optionally hygrothermal loads. Only symmetric laminates are considered, which are assumed to be made of repeated sublaminate construction. Various layup schemes are evaluated based on the laminated plate theory and the quadratic failure criterion for the given mechanical and hygrothermal loads. The optimum layup sequence in the sublaminate and the number of such sublaminates required are obtained. Further, a ply-drop round-off scheme is adopted to arrive at an optimum laminate thickness. As an example, a family of 0/90/45/-45 bi-directional lamination schemes is examined for different types of loads and the gains in optimising the ply orientations in a sublaminate are demonstrated.
Abstract:
A successful protein-protein docking study culminates in identification of decoys at top ranks with near-native quaternary structures. However, this task remains enigmatic because no generalized scoring functions exist that effectively rank decoys according to their similarity to near-native quaternary structures. Difficulties arise because of the highly irregular nature of the protein surface and the significant variation of the nonbonding and solvation energies based on the chemical composition of the protein-protein interface. In this work, we describe a novel method combining an interface-size filter, a regression model for geometric compatibility (based on two correlated surface and packing parameters), and normalized interaction energy (calculated from correlated nonbonded and solvation energies), to effectively rank decoys from a set of 10,000 decoys. Tests on 30 unbound binary protein-protein complexes show that in 16 cases we can identify at least one decoy in the top three ranks having <= 10 angstrom backbone root mean square deviation from the true binding geometry. Comparisons with other state-of-the-art methods confirm the improved ranking power of our method without the use of any experiment-guided restraints, evolutionary information, statistical propensities, or modified interaction energy equations. Tests on 118 less-difficult bound binary protein-protein complexes with <= 35% sequence redundancy at the interface showed that in 77% of cases, at least 1 in 10,000 decoys was identified with <= 5 angstrom backbone root mean square deviation from the true geometry at first rank. The work will promote the use of new concepts where correlations among parameters provide more robust scoring models. It will facilitate studies involving molecular interactions, including modeling of large macromolecular assemblies and protein structure prediction. (C) 2010 Wiley Periodicals, Inc. J Comput Chem 32: 787-796, 2011.
Abstract:
Buckling of discretely stiffened composite cylindrical panels made of repeated sublaminate construction is studied using a finite element method. In repeated sublaminate construction, a full laminate is obtained by repeating a basic sublaminate, which has a smaller number of plies. This paper deals with the determination of the optimum lay-up for buckling by ranking of such stiffened (longitudinal and hoop) composite cylindrical panels. For this purpose we use the particularized form of a four-noded, 48 degrees of freedom doubly curved quadrilateral thin shell finite element together with a fully compatible two-noded, 16 degrees of freedom composite stiffener element. The computer program developed has been used, after extensive checking for correctness, to obtain an optimum orientation scheme of the plies in the sublaminate so as to achieve maximum buckling load for a specified thickness of typical stiffened composite cylindrical panels.
Abstract:
Learning to rank from relevance judgment is an active research area. Itemwise score regression, pairwise preference satisfaction, and listwise structured learning are the major techniques in use. Listwise structured learning has been applied recently to optimize important non-decomposable ranking criteria like AUC (area under ROC curve) and MAP (mean average precision). We propose new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain) in the max-margin structured learning framework. We also demonstrate that, for different ranking criteria, one may need to use different feature maps. Search applications should not be optimized in favor of a single criterion, because they need to cater to a variety of queries. E.g., MRR is best for navigational queries, while NDCG is best for informational queries. A key contribution of this paper is to fold multiple ranking loss functions into a multi-criteria max-margin optimization. The result is a single, robust ranking model that is close to the best accuracy of learners trained on individual criteria. In fact, experiments over the popular LETOR and TREC data sets show that, contrary to conventional wisdom, a test criterion is often not best served by training with the same individual criterion.
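For readers unfamiliar with the two criteria, the following is a minimal sketch of how MRR and NDCG are commonly computed for already-ranked result lists. It only evaluates rankings and does not implement the max-margin optimization described in the abstract; the exponential gain `2**rel - 1` is one common convention and is an assumption here, as are the function names.

```python
import math


def reciprocal_rank(relevances):
    """1 / rank of the first relevant item (relevance > 0), or 0.0 if none."""
    for position, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(ranked_lists):
    """Average reciprocal rank over several queries."""
    return sum(reciprocal_rank(r) for r in ranked_lists) / len(ranked_lists)


def dcg(relevances, k=None):
    """Discounted cumulative gain of the first k items (all items if k is None)."""
    top = relevances if k is None else relevances[:k]
    return sum(
        (2 ** rel - 1) / math.log2(position + 1)
        for position, rel in enumerate(top, start=1)
    )


def ndcg(relevances, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

MRR looks only at the position of the first relevant result, which suits navigational queries, whereas NDCG rewards graded relevance throughout the list, which suits informational queries.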
Abstract:
Users can rarely reveal their information need in full detail to a search engine within 1-2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently-published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.
Abstract:
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves l1,∞ constraints; we solve this dual problem using the recent l1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an l∞-norm extreme of the lp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
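The quantity this abstract focuses on, relevant objects at the absolute top of the list, can be measured with a short sketch. The code below only evaluates a given scoring; it does not implement the support vector optimization or the projection method. The function names are hypothetical, and treating `fraction_below_top_negative` as the matching error measure is an assumption made for illustration rather than the paper's stated formulation.

```python
def positives_at_top(scores, labels):
    """Count positives (label 1) scored strictly above every negative (label 0)."""
    negative_scores = [s for s, y in zip(scores, labels) if y == 0]
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not negative_scores:
        return len(positive_scores)
    top_negative = max(negative_scores)
    return sum(1 for s in positive_scores if s > top_negative)


def fraction_below_top_negative(scores, labels):
    """Fraction of positives scored at or below the highest-scoring negative."""
    n_positive = sum(1 for y in labels if y == 1)
    if n_positive == 0:
        return 0.0
    return 1.0 - positives_at_top(scores, labels) / n_positive
```

Only the single highest-scoring negative matters to either function, which is what distinguishes a top-of-list criterion from averaged measures such as AUC.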
Abstract:
The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
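The frequency-based half of such a ranking can be sketched as follows. This is a simplified stand-in, not the proposed two-phase algorithm: it scores each record by the average frequency of its attribute values and ranks the lowest-scoring records as the most likely outliers, and it leaves out the clustering-based ranking scheme entirely. The function names are assumptions for the example.

```python
from collections import Counter


def frequency_scores(records):
    """Average frequency of each record's attribute values across the data set.

    records: list of equal-length tuples of categorical values.
    """
    n_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(r[j] for r in records) for j in range(n_attrs)]
    return [
        sum(counts[j][r[j]] for j in range(n_attrs)) / n_attrs
        for r in records
    ]


def rank_outliers(records, top_k):
    """Indices of the top_k records with the rarest attribute values."""
    scores = frequency_scores(records)
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:top_k]
```

The scores are computed once for all records, so asking for more outliers only changes how many indices are returned, not how much scoring work is done.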