998 results for Evolutionary clustering


Relevance: 20.00%

Abstract:

This paper proposes an innovative instance-similarity-based evaluation metric that reduces the search map for clustering to be performed. An aggregate global score is calculated for each instance using the Fibonacci series. The use of Fibonacci numbers separates the instances effectively and, hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical, categorical and a mix of both types of attributes. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means, expectation maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above described existing algorithms.
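The abstract compares clusterings by entropy and purity without defining them. The sketch below computes both under their common textbook definitions; the function name and the base-2, size-weighted entropy are assumptions here, and the paper may normalise differently.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against known class labels."""
    clusters = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        clusters[cid].append(label)

    n = len(class_labels)
    majority_total = 0
    weighted_entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        # Purity counts the members that belong to the cluster's majority class.
        majority_total += max(counts.values())
        # Shannon entropy (base 2) of the class distribution inside this cluster.
        h = -sum((c / len(members)) * math.log2(c / len(members))
                 for c in counts.values())
        # Weight each cluster's entropy by its share of all instances.
        weighted_entropy += (len(members) / n) * h
    return majority_total / n, weighted_entropy
```

Higher purity and lower entropy both indicate clusters that line up more closely with the known classes.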

Relevance: 20.00%

Abstract:

We propose to use Tensor Space Modeling (TSM) to represent and analyze users' web log data, which consist of multiple interests and span multiple dimensions. Further, we propose to use the decomposition factors of the tensors for clustering the users based on similarity of search behaviour. Preliminary results show that the proposed method outperforms the traditional Vector Space Model (VSM) based clustering.

Relevance: 20.00%

Abstract:

Purpose: Web search engines are frequently used by people to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper aims to focus on automatically classifying the different user intents behind web queries. Design/methodology/approach: For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k-means clustering approach based on a variety of query traits. Findings: The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each for navigational and transactional. Results also show that web queries fall into eight clusters, six primarily informational, and one each of primarily transactional and navigational. Research limitations/implications: This study provides an important contribution to web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of the user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs. Practical implications: The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results in web searching by implementing a real time classification method as presented in this research. Originality/value: This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the pay-off for web search engines can be quite beneficial. © Emerald Group Publishing Limited.
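As a rough illustration of clustering queries by their traits with k-means, the sketch below runs plain Lloyd's k-means on a handful of made-up queries. The three traits (term count, URL-like token, transactional word), the sample data and the choice of k=3 are all hypothetical; the abstract does not list the traits the study actually used.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical query traits: (number of terms, has URL-like token, has transactional word)
queries = [
    (5, 0, 0),  # e.g. a longer question-style query
    (1, 1, 0),  # e.g. "example.com"
    (2, 0, 1),  # e.g. "download player"
    (6, 0, 0),
    (1, 1, 0),
    (3, 0, 1),
]
labels, centers = kmeans(queries, k=3)
```

In practice the traits would need scaling so that the term count does not dominate the binary flags, and the resulting clusters would still have to be inspected and labelled as informational, navigational or transactional.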

Relevance: 20.00%

Abstract:

Purpose - Since the beginning of human existence, humankind has sought, organized and used information as it evolved patterns and practices of human information behaviors. However, the field of human information behavior (HIB) has not heretofore pursued an evolutionary understanding of information behavior. The goal of this exploratory study is to provide insight about the information behavior of various individuals from the past to begin the development of an evolutionary perspective for our understanding of HIB. Design/methodology/approach - This paper presents findings from a qualitative analysis of the autobiographies and personal writings of several historical figures, including Napoleon Bonaparte, Charles Darwin, Giacomo Casanova and others. Findings - Analysis of their writings shows that these persons of the past articulated aspects of their HIB, including information seeking, information organization and information use, providing tangible insights into their information-related thoughts and actions. Practical implications - This paper has implications for expanding the nature of our evolutionary understanding of information behavior and provides a broader context for the HIB research field. Originality/value - This is the first paper in the information science field of HIB to study the information behavior of historical figures and begin to develop an evolutionary framework for HIB research. © Emerald Group Publishing Limited.

Relevance: 20.00%

Abstract:

In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The large number of approaches is due to the different applications requiring the XML data to be clustered. These applications need data in the form of similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful, then we survey the approaches proposed so far, relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. This presentation leads to a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the paper describes future trends and research issues that still need to be addressed.
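One of the simplest structure-based similarity measures in this family compares the sets of tag paths in two documents. The sketch below is an illustrative example of that idea using Jaccard similarity, not a method taken from the survey; the helper names and sample documents are invented.

```python
import xml.etree.ElementTree as ET


def root_to_node_paths(xml_text):
    """Collect the set of tag paths (e.g. 'book/author') found in a document."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


doc1 = "<book><title>A</title><author>X</author></book>"
doc2 = "<book><title>B</title><year>2001</year></book>"
similarity = jaccard(root_to_node_paths(doc1), root_to_node_paths(doc2))
```

Here the two documents share the paths `book` and `book/title` out of four distinct paths, so the similarity is 0.5. A pairwise matrix of such scores could then feed any standard clustering algorithm.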

Relevance: 20.00%

Abstract:

With the growing number of XML documents on the Web it becomes essential to effectively organise these XML documents in order to retrieve useful information from them. A possible solution is to apply clustering on the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these types of semi-structured documents due to their heterogeneity and structural irregularity. Most of the existing research on clustering techniques focuses only on one feature of the XML documents, this being either their structure or their content, due to scalability and complexity problems. The knowledge gained in the form of clusters based on the structure or the content is not suitable for real-life datasets. It therefore becomes essential to include both the structure and content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, the inclusion of both these kinds of information in the clustering process results in a huge overhead for the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods to utilise frequent pattern mining techniques to reduce the dimension; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information.
The explicit model uses a higher order model, namely a 3-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models to utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in the information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform the related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures for constraining the content shows an improvement in accuracy over content-only and structure-only clustering results. The scalability evaluation experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. In particular, this thesis work contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. In addition, it also contributes by addressing the research gaps in frequent pattern mining to generate efficient and concise frequent subtrees with various node relationships that could be used in clustering.

Relevance: 20.00%

Abstract:

In this paper, a hardware-based path planning architecture for unmanned aerial vehicle (UAV) adaptation is proposed. The architecture aims to provide UAVs with higher autonomy using an application specific evolutionary algorithm (EA) implemented entirely on a field programmable gate array (FPGA) chip. The physical attributes of an FPGA chip, being compact in size and low in power consumption, make it an ideal platform for UAV applications. The design, which is implemented entirely in hardware, consists of EA modules, population storage resources, and three-dimensional terrain information necessary to the path planning process, subject to constraints accounted for separately via UAV, environment and mission profiles. The architecture has been successfully synthesised for a target Xilinx Virtex-4 FPGA platform with 32% logic slice utilisation. Results obtained from case studies for a small UAV helicopter with environment derived from LIDAR (Light Detection and Ranging) data verify the effectiveness of the proposed FPGA-based path planner, and demonstrate convergence at rates above the typical 10 Hz update frequency of an autopilot system.
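The general shape of an evolutionary path planner can be sketched in software, although the paper's design is implemented in FPGA hardware and its operators are not described in the abstract. In this minimal sketch, the one-dimensional terrain profile, the cost function, the population size and the selection, crossover and mutation operators are all illustrative assumptions.

```python
import random

TERRAIN = [2, 3, 5, 4, 6, 3, 2, 1]  # hypothetical terrain heights along the route
CLEARANCE = 1                        # hypothetical minimum height above terrain


def cost(path):
    """Lower is better: climb effort plus a heavy penalty for flying too low."""
    climb = sum(abs(b - a) for a, b in zip(path, path[1:]))
    violations = sum(1 for h, t in zip(path, TERRAIN) if h < t + CLEARANCE)
    return climb + 100 * violations


def evolve(generations=200, pop_size=30, seed=0):
    """Evolve a list of waypoint altitudes, one per terrain sample."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 10) for _ in TERRAIN] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]     # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, len(TERRAIN))  # one-point crossover
            child = mum[:cut] + dad[cut:]
            i = rng.randrange(len(child))         # mutate one waypoint altitude by 1
            child[i] = max(0, min(10, child[i] + rng.choice([-1, 1])))
            children.append(child)
        population = parents + children
    return min(population, key=cost)


best = evolve()
```

Because the better half of each generation survives unchanged, the best cost never gets worse from one generation to the next. A real planner would work on three-dimensional terrain and add the UAV, environment and mission constraints the abstract mentions.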

Relevance: 20.00%

Abstract:

The conclusion that the primary divergences of the modern groups of mammals occurred in the mid-Cretaceous requires fresh thinking about this facet of evolutionary history — especially in ecological terms.

Relevance: 20.00%

Abstract:

Determining the temporal scale of biological evolution has traditionally been the preserve of paleontology, with the timing of species originations and major diversifications all being read from the fossil record. However, the ages of the earliest (correctly identified) records will underestimate actual origins due to the incomplete nature of the fossil record and the necessity for lineages to have evolved sufficiently divergent morphologies in order to be distinguished. The possibility of inferring divergence times more accurately has been promoted by the idea that the accumulation of genetic change between modern lineages can be used as a molecular clock (Zuckerkandl and Pauling, 1965). In practice, though, molecular dates have often been so old as to be incongruent even with liberal readings of the fossil record. Prominent examples include inferred diversifications of metazoan phyla hundreds of millions of years before their Cambrian fossil record appearances (e.g., Nei et al., 2001) and a basal split between modern birds (Neoaves) that is almost double the age of their earliest recognizable fossils (e.g., Cooper and Penny, 1997).

Relevance: 20.00%

Abstract:

The estimation of phylogenetic divergence times from sequence data is an important component of many molecular evolutionary studies. There is now a general appreciation that the procedure of divergence dating is considerably more complex than that initially described in the 1960s by Zuckerkandl and Pauling (1962, 1965). In particular, there has been much critical attention toward the assumption of a global molecular clock, resulting in the development of increasingly sophisticated techniques for inferring divergence times from sequence data. In response to the documentation of widespread departures from clocklike behavior, a variety of local- and relaxed-clock methods have been proposed and implemented. Local-clock methods permit different molecular clocks in different parts of the phylogenetic tree, thereby retaining the advantages of the classical molecular clock while casting off the restrictive assumption of a single, global rate of substitution (Rambaut and Bromham 1998; Yoder and Yang 2000).
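The difference between a global clock and lineage-specific rates can be seen in a two-lineage toy calculation. This is a deliberately simplified sketch: the function name and numbers are invented, and real local- and relaxed-clock methods estimate rates across a whole tree by statistical inference rather than taking them as given.

```python
def divergence_time(distance, rate_a, rate_b=None):
    """Time since two lineages split, given their pairwise genetic distance.

    distance: substitutions per site separating the two lineages.
    rate_a, rate_b: substitutions per site per million years on each lineage.
    With rate_b omitted, both lineages share one rate (a strict, global clock).
    """
    if rate_b is None:
        rate_b = rate_a
    # Distance accumulates along both branches: distance = (rate_a + rate_b) * t
    return distance / (rate_a + rate_b)


strict = divergence_time(0.10, 0.01)        # global clock: 0.10 / (2 * 0.01)
local = divergence_time(0.10, 0.01, 0.03)   # one lineage evolving three times faster
```

With these made-up numbers the strict clock gives 5 million years, while allowing one lineage to evolve faster gives 2.5 million years. This shows how strongly the rate assumption drives the inferred date.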

Relevance: 20.00%

Abstract:

The ratite moa (Aves: Dinornithiformes) were a speciose group of massive graviportal avian herbivores that dominated the New Zealand (NZ) ecosystem until their extinction ~600 years ago. The phylogeny and evolutionary history of this morphologically diverse order has remained controversial since their initial description in 1839. We synthesize mitochondrial phylogenetic information from 263 subfossil moa specimens from across NZ with morphological, ecological, and new geological data to create the first comprehensive phylogeny, taxonomy, and evolutionary timeframe for all of the species of an extinct order. We also present an important new geological/paleogeographical model of late Cenozoic NZ, which suggests that terrestrial biota on the North and South Island landmasses were isolated for most of the past 20–30 Ma. The data reveal that the patterns of genetic diversity within and between different moa clades reflect a complex history following a major marine transgression in the Oligocene, affected by marine barriers, tectonic activity, and glacial cycles. Surprisingly, the remarkable morphological radiation of moa appears to have occurred much more recently than previous early Miocene (ca. 15 Ma) estimates, and was coincident with the accelerated uplift of the Southern Alps just ca. 5–8.5 Ma. Together with recent fossil evidence, these data suggest that the recent evolutionary history of nearly all of the iconic NZ terrestrial biota occurred principally on just the South Island.

Relevance: 20.00%

Abstract:

Cockatoos are the distinctive family Cacatuidae, a major lineage of the order of parrots (Psittaciformes), distributed throughout the Australasian region of the world. However, the evolutionary history of cockatoos is not well understood. We investigated the phylogeny of cockatoos based on three mitochondrial and three nuclear DNA genes obtained from 16 of 21 species of Cacatuidae. In addition, five novel mitochondrial genomes were used to estimate time of divergence and our estimates indicate Cacatuidae diverged from Psittacidae approximately 40.7 million years ago (95% CI 51.6–30.3 Ma) during the Eocene. Our data show Cacatuidae began to diversify approximately 27.9 Ma (95% CI 38.1–18.3 Ma) during the Oligocene. The early to middle Miocene (20–10 Ma) was a significant period in the evolution of modern Australian environments and vegetation, in which a transformation from mainly mesic to xeric habitats (e.g., fire-adapted sclerophyll vegetation and grasslands) occurred. We hypothesize that this environmental transformation was a driving force behind the diversification of cockatoos. A detailed multi-locus molecular phylogeny enabled us to resolve the phylogenetic placements of the Palm Cockatoo (Probosciger aterrimus), Galah (Eolophus roseicapillus), Gang-gang Cockatoo (Callocephalon fimbriatum) and Cockatiel (Nymphicus hollandicus), which have historically been difficult to place within Cacatuidae. When the molecular evidence is analysed in concert with morphology, it is clear that many of the cockatoo species’ diagnostic phenotypic traits such as plumage colour, body size, wing shape and bill morphology have evolved in parallel or convergently across lineages.

Relevance: 20.00%

Abstract:

Background: The genus Rattus is highly speciose and has a complex taxonomy that is not fully resolved. As shown previously there are two major groups within the genus, an Asian and an Australo-Papuan group. This study focuses on the Australo-Papuan group and particularly on the Australian rats. There are uncertainties regarding the number of species within the group and the relationships among them. We analysed 16 mitochondrial genomes, including seven novel genomes from six species, to help elucidate the evolutionary history of the Australian rats. We also demonstrate, from a larger dataset, the usefulness of short regions of the mitochondrial genome in identifying these rats at the species level. Results: Analyses of 16 mitochondrial genomes representing species sampled from Australo-Papuan and Asian clades of Rattus indicate divergence of these two groups ~2.7 million years ago (Mya). Subsequent diversification of at least 4 lineages within the Australo-Papuan clade was rapid and occurred over the period from ~0.9-1.7 Mya, a finding that explains the difficulty in resolving some relationships within this clade. Phylogenetic analyses of our 126-taxon, but shorter sequence (1952 nucleotides long), Rattus database generally give well supported species clades. Conclusions: Our whole mitochondrial genome analyses are concordant with a taxonomic division that places the native Australian rats into the Rattus fuscipes species group. We suggest the following order of divergence of the Australian species. R. fuscipes is the oldest lineage among the Australian rats and is not part of a New Guinean radiation. R. lutreolus is also within this Australian clade and shallower than R. tunneyi while the R. sordidus group is the shallowest lineage in the clade. The divergences within the R. sordidus and R. leucopus lineages occurring about half a million years ago support the hypotheses of more recent interchanges of rats between Australia and New Guinea.
While problematic for inference of deeper divergences, we report that the analysis of shorter mitochondrial sequences is very useful for species identification in rats.

Relevance: 20.00%

Abstract:

This thesis develops a detailed conceptual design method and a system software architecture defined with a parametric and generative evolutionary design system to support an integrated interdisciplinary building design approach. The research recognises the need to shift design efforts toward the earliest phases of the design process to support crucial design decisions that have a substantial cost implication on the overall project budget. The overall motivation of the research is to improve the quality of designs produced at the author's employer, the General Directorate of Major Works (GDMW) of the Saudi Arabian Armed Forces. GDMW produces many buildings that have standard requirements, across a wide range of environmental and social circumstances. A rapid means of customising designs for local circumstances would have significant benefits. The research considers the use of evolutionary genetic algorithms in the design process and the ability to generate and assess a wider range of potential design solutions than a human could manage. This wider ranging assessment, during the early stages of the design process, means that the generated solutions will be more appropriate for the defined design problem. The research work proposes a design method and system that promotes a collaborative relationship between human creativity and the computer capability. The tectonic design approach is adopted as a process oriented design that values the process of design as much as the product. The aim is to connect the evolutionary systems to performance assessment applications, which are used as prioritised fitness functions. This will produce design solutions that respond to their environmental and functional requirements. This integrated, interdisciplinary approach to design will produce solutions through a design process that considers and balances the requirements of all aspects of the design.
Since this thesis covers a wide area of research material, a 'methodological pluralism' approach was used, incorporating both prescriptive and descriptive research methods. Multiple models of research were combined and the overall research was undertaken following three main stages: conceptualisation, development and evaluation. The first two stages lay the foundations for the specification of the proposed system, where key aspects of the system that have not previously been proven in the literature were implemented to test the feasibility of the system. As a result of combining the existing knowledge in the area with the newly verified key aspects of the proposed system, this research can form the base for a future software development project. The evaluation stage, which includes building the prototype system to test and evaluate the system performance based on the criteria defined in the earlier stage, is not within the scope of this thesis. The research results in a conceptual design method and a proposed system software architecture. The proposed system is called the 'Hierarchical Evolutionary Algorithmic Design (HEAD) System'. The HEAD system has been shown to be feasible through the initial illustrative paper-based simulation. The HEAD system consists of two main components: the 'Design Schema' and the 'Synthesis Algorithms'. The HEAD system reflects the major research contribution in the way it is conceptualised, while secondary contributions are achieved within the system components. The design schema provides constraints on the generation of designs, thus enabling the designer to create a wide range of potential designs that can then be analysed for desirable characteristics. The design schema supports the digital representation of the human creativity of designers into a dynamic design framework that can be encoded and then executed through the use of evolutionary genetic algorithms.
The design schema incorporates 2D and 3D geometry and graph theory for space layout planning and building formation using the Lowest Common Design Denominator (LCDD) of a parameterised 2D module and a 3D structural module. This provides a bridge between the standard adjacency requirements and the evolutionary system. The use of graphs as an input to the evolutionary algorithm supports the introduction of constraints in a way that is not supported by standard evolutionary techniques. The process of design synthesis is guided as a higher level description of the building that supports geometrical constraints. The Synthesis Algorithms component analyses designs at four levels: 'Room', 'Layout', 'Building' and 'Optimisation'. At each level multiple fitness functions are embedded into the genetic algorithm to target the specific requirements of the relevant decomposed part of the design problem. Decomposing the design problem to allow for the design requirements of each level to be dealt with separately and then reassembling them in a bottom-up approach reduces the generation of non-viable solutions through constraining the options available at the next higher level. The iterative approach, in exploring the range of design solutions through modification of the design schema as the understanding of the design problem improves, assists in identifying conflicts in the design requirements. Additionally, the hierarchical set-up allows the embedding of multiple fitness functions into the genetic algorithm, each relevant to a specific level. This supports an integrated multi-level, multi-disciplinary approach. The HEAD system promotes a collaborative relationship between human creativity and the computer capability. The design schema component, as the input to the procedural algorithms, enables the encoding of certain aspects of the designer's subjective creativity.
By focusing on finding solutions for the relevant sub-problems at the appropriate levels of detail, the hierarchical nature of the system assists in the design decision-making process.
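To make the idea of graph-based adjacency requirements feeding a fitness function more concrete, the following sketch scores a grid layout against a small set of required room adjacencies and combines that score with a second one. The room names, grid encoding, compactness measure and weights are all hypothetical and are not taken from the HEAD system.

```python
# Hypothetical adjacency requirements: pairs of rooms that should share a wall.
REQUIRED = {("kitchen", "dining"), ("dining", "living"), ("bedroom", "bath")}


def adjacency_fitness(layout):
    """Fraction of required adjacencies met by a layout.

    layout maps room name -> (column, row) cell on a grid; two rooms count as
    adjacent when their cells share an edge (Manhattan distance of 1).
    """
    met = 0
    for a, b in REQUIRED:
        (ax, ay), (bx, by) = layout[a], layout[b]
        if abs(ax - bx) + abs(ay - by) == 1:
            met += 1
    return met / len(REQUIRED)


def combined_fitness(layout, weights=(0.7, 0.3)):
    """Weighted sum of two illustrative scores: adjacency and compactness."""
    xs = [x for x, _ in layout.values()]
    ys = [y for _, y in layout.values()]
    footprint = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    compactness = len(layout) / footprint  # 1.0 when the bounding box is full
    return weights[0] * adjacency_fitness(layout) + weights[1] * compactness


layout = {"kitchen": (0, 0), "dining": (1, 0), "living": (2, 0),
          "bedroom": (0, 1), "bath": (1, 1)}
score = combined_fitness(layout)
```

A genetic algorithm would evaluate a score like this for every candidate layout. In a hierarchical set-up such as the one described, each level would carry its own set of scoring functions rather than one global sum.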