Biblioteca Digital

Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, in order to solve the above problem. Search engine content analysis is a new development of traditional information retrieval field called collection selection, which deals with general information repositories. Current research in collection selection relies on full access to the collection or estimations of the size of the collections. Also collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for the search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general purpose Internet search engines in use today. Instead of representing collections as a set of terms, which commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a decrease of synonymy. The ontology based method was compared with ReDDE (Relevant Document Distribution Estimation method for resource selection) using the standard R-value metric, with encouraging results. ReDDE is the current state of the art collection selection method which relies on collection size estimation. The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo. In addition several specialist search engines such as Pubmed and the U.S. Department of Agriculture were analysed. In conclusion, this research shows that the ontology based method mitigates the need for collection size estimation.

Veja mais

Improving behaviour classification consistency: a technique from biological taxonomy

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Quantitative behaviour analysis requires the classification of behaviour to produce the basic data. In practice, much of this work will be performed by multiple observers, and maximising inter-observer consistency is of particular importance. Another discipline where consistency in classification is vital is biological taxonomy. A classification tool of great utility, the binary key, is designed to simplify the classification decision process and ensure consistent identification of proper categories. We show how this same decision-making tool - the binary key - can be used to promote consistency in the classification of behaviour. The construction of a binary key also ensures that the categories in which behaviour is classified are complete and non-overlapping. We discuss the general principles of design of binary keys, and illustrate their construction and use with a practical example from education research.

Veja mais

Automatic Detection of Marine Mammal from Aerial Imagery

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The following technical report describes the approach and algorithm used to detect marine mammals from aerial imagery taken from manned/unmanned platform. The aim is to automate the process of counting the population of dugongs and other mammals. We have developed and algorithm that automatically presents to a user a number of possible candidates of these mammals. We tested the algorithm in two distinct datasets taken from different altitudes. Analysis and discussion is presented in regards with the complexity of the input datasets, the detection performance.

Veja mais

281 resultados para Automatic classification

Filtro por publicador