533 results for indexing
Abstract:
As the universe of knowledge and subjects changes over time, indexing languages, like classification schemes, accommodate that change by restructuring. Restructuring indexing languages affects indexer and cataloguer work. Subjects may split or lump together. They may disappear only to reappear later. And new subjects may emerge that were assumed to be already present, but not clearly articulated (Miksa, 1998). In this context we have the complex relationship between the indexing language, the text being described, and the already described collection (Tennis, 2007). It is possible to imagine indexers placing a document into an outdated class, because it is the one they have already used for their collection. However, doing this erases the semantics in the present indexing language. Given this range of choice in the context of indexing language change, the question arises: what does this look like in practice? How often does this occur? Further, what does this phenomenon tell us about subjects in indexing languages? Does the practice we observe in the reaction to indexing language change provide us evidence of conceptual models of subjects and subject creation? If it is incomplete, but gets us close, what evidence do we still require?
Abstract:
This paper outlines a model of conceptual change in indexing languages. Findings from this modeling effort point to three ways meaning and relationships are established and then change in an indexing language. These ways (structural, terminological, and textual) point to ways indexing language metadata can aid in managing conceptual change in indexing languages.
Abstract:
With the advent of Internet-based technologies for information organization, many groups have constructed their own indexing languages. Biologists, Library and Information Science practitioners, and now social taggers have worked together to create large and often complex indexing languages. In this environment of diversity, two questions surface: (1) what are the measurable characteristics of these indexing languages, and (2) do measurements of these indexing languages speciate along these characteristics? This poster presents data from this exploratory work.
Abstract:
This paper proposes a dual conception of work in knowledge organization. The first part is a conception of work as liminal, set apart from everyday work. The second is integrated, without separation. This talk is the beginning of a larger project where we will characterize work in knowledge organization, both as it is set out in our literature (Šauperl, 2004; Hjørland, 2003; Wilson, 1968), and in a philosophical argument for its fundamental importance in the activities of society (Shera, 1972; Zandonade, 2004). But in order to do this, we will co-opt the conception of liminality from the anthropology of religion (Turner, 1967), and Zen Buddhist conceptions of moral action, intention, and integration (Harvey, 2000 and cf., Harada, S., 2008). The goal for this talk is to identify the acts repeated (form) and the purpose of those acts (intention), in knowledge organization, with specific regard to thresholds (liminal points) of intention present in those acts. We can then ask the questions: Where is intention in knowledge organization liminal and where is it integrated? What are the limits of knowledge organization work when considered at a foundational level of the intention labor practices? Answering such questions, in this context, allows us to reconsider the assumptions we have about knowledge organization work and its increasingly important role in society. As a consequence, we can consider the limits of classification research if we see the foundations of knowledge organization work when we see forms and intentions. I must also say that incorporating Zen Buddhist philosophy into knowledge organization research seems to fit well with ethics and ethical responses to the practice of knowledge organization. This is because 20th Century Western interpretations of Zen are often rooted in ethical considerations. This translates easily to work.
Abstract:
The problem of determining the script and language of a document image has a number of important applications in the field of document analysis, such as indexing and sorting of large collections of such images, or as a precursor to optical character recognition (OCR). In this paper, we investigate the use of texture as a tool for determining the script of a document image, based on the observation that text has a distinct visual texture. An experimental evaluation of a number of commonly used texture features is conducted on a newly created script database, providing a qualitative measure of which features are most appropriate for this task. Strategies for improving classification results in situations with limited training data and multiple font types are also proposed.
Abstract:
Spoken term detection (STD) popularly involves performing word or sub-word level speech recognition and indexing the result. This work challenges the assumption that improved speech recognition accuracy implies better indexing for STD. Using an index derived from phone lattices, this paper examines the effect of language model selection on the relationship between phone recognition accuracy and STD accuracy. Results suggest that language models usually improve phone recognition accuracy but their inclusion does not always translate to improved STD accuracy. The findings suggest that using phone recognition accuracy to measure the quality of an STD index can be problematic, and highlight the need for an alternative that is more closely aligned with the goals of the specific detection task.
Abstract:
Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We first discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation; and an expression language for digital cultural objects. In addition, we discuss our experience in testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.
Abstract:
Tagging has become one of the key activities in next generation websites, which allow users to select short labels to annotate, manage, and share multimedia information such as photos, videos and bookmarks. Tagging does not require any prior training from users before they participate in the annotation activities, as they can freely choose any terms which best represent the semantics of the content without worrying about any formal structure or ontology. However, the practice of free-form tagging can lead to several problems, such as synonymy, polysemy and ambiguity, which potentially increase the complexity of managing the tags and retrieving information. To solve these problems, this research aims to construct a lightweight indexing scheme to structure tags by identifying and disambiguating the meaning of terms and constructing a knowledge base or dictionary. News has been chosen as the primary domain of application to demonstrate the benefits of using structured tags for managing the rapidly changing and dynamic nature of news information. One of the main outcomes of this work is an automatically constructed vocabulary that defines the meaning of each named entity tag that can be extracted from a news article (including person, location and organisation), based on expert suggestions from major search engines and knowledge from public databases such as Wikipedia. To demonstrate the potential applications of the vocabulary, we have used it to provide more functionalities in an online news website, including topic-based news reading, intuitive tagging, clipping and sharing of interesting news, as well as news filtering or searching based on named entity tags. The evaluation results on the impact of disambiguating tags have shown that the vocabulary can help to significantly improve news searching performance. 
The preliminary results from our user study have demonstrated that users can benefit from the additional functionalities on the news websites as they are able to retrieve more relevant news, clip and share news with friends and families effectively.
Abstract:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in various languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document which contains the query term as a sequence of Chinese characters may not actually be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with the problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. 
In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select highly frequent indexing terms from the retrieved documents to expand the query, our approach utilizes text mining techniques to find patterns from the retrieved documents that highly correlate with the query term and then uses the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate whether high accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented and a further experiment has been done to compare the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.
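For orientation, the standard Rocchio baseline named in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' system: the function name `rocchio_expand`, the weights `alpha`, `beta` and `gamma`, and the toy documents are all assumptions, and the documents are assumed to be already segmented into words.

```python
from collections import Counter


def rocchio_expand(query, relevant_docs, nonrelevant_docs=(),
                   alpha=1.0, beta=0.75, gamma=0.15, top_k=5):
    """Return the query terms plus up to top_k Rocchio expansion terms.

    query            -- list of query terms
    relevant_docs    -- list of documents (each a list of terms) treated as relevant
    nonrelevant_docs -- list of documents treated as non-relevant (may be empty)
    alpha/beta/gamma -- illustrative weights, not values from the abstract
    """
    weights = Counter()

    # Original query contribution: alpha * q
    for term, tf in Counter(query).items():
        weights[term] += alpha * tf

    # Add the centroid of the relevant documents, subtract the centroid
    # of the non-relevant documents.
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, tf in Counter(doc).items():
                weights[term] += coeff * tf / len(docs)

    # Keep only new terms with positive weight, highest weight first
    # (ties broken alphabetically so the result is deterministic).
    candidates = [(t, w) for t, w in weights.items()
                  if t not in query and w > 0]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    return list(query) + [t for t, _ in candidates[:top_k]]


# Toy usage with pre-segmented Chinese words (hypothetical data).
query = ["信息", "检索"]
relevant = [["信息", "检索", "系统", "查询"], ["检索", "系统", "分词"]]
nonrelevant = [["分词", "天气"]]
expanded = rocchio_expand(query, relevant, nonrelevant, top_k=2)
```

Rocchio selects expansion terms by their weighted frequency in the feedback documents, which is the frequency-driven behaviour the abstract contrasts with its pattern-based method.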
Abstract:
LUPTAI is a decision-aiding tool to enable local and state governments to optimise land use and transport integration. In contrast to mobility between land uses (typically via road), accessibility represents opportunity and choice to reach common land use destinations by public transport and/or walking. LUPTAI uses a GIS-based methodology to quantify and map accessibility to common land use destinations by walking and/or public transport. The tool can be applied to small or large study areas. It can be applied to the current situation in a study area or to future scenarios (such as scenarios involving changes to public transport services, public transport corridors or stations, population density or land use). The tool has been piloted on the Gold Coast and the results are encouraging. This paper outlines the GIS-based methodology and the findings related to this pilot study. The paper demonstrates benefits and possible application of LUPTAI to other urbanised local government areas in Queensland. It also discusses how this accessibility indexing approach could be developed into a decision-support tool to assist local and state government agencies in a range of transport and land-use planning activities.
Abstract:
This approach to sustainable design explores the possibility of creating an architectural design process which can iteratively produce optimised and sustainable design solutions. Driven by an evolution process based on genetic algorithms, the system allows the designer to “design the building design generator” rather than to “design the building”. The design concept is abstracted into a digital design schema, which allows transfer of the human creative vision into the rational language of a computer. The schema is then elaborated into the use of genetic algorithms to evolve innovative, performative and sustainable design solutions. The prioritisation of the project’s constraints and the subsequent design solutions synthesised during design generation are expected to resolve most of the major conflicts in the evaluation and optimisation phases. Mosques are used as the example building typology to ground the research activity. The spatial organisations of various mosque typologies are graphically represented by adjacency constraints between spaces. Each configuration is represented by a planar graph which is then translated into a non-orthogonal dual graph and fed into the genetic algorithm system with fixed constraints and expected performance criteria set to govern evolution. The resultant Hierarchical Evolutionary Algorithmic Design System is developed by linking the evaluation process with environmental assessment tools to rank the candidate designs. The proposed system generates the concept, the seed, and the schema, and has environmental performance as one of the main criteria in driving optimisation.
Abstract:
This paper proposes a security architecture for the basic cross indexing systems emerging as foundational structures in current health information systems. In these systems unique identifiers are issued to healthcare providers and consumers. In most cases, such numbering schemes are national in scope and must therefore necessarily be used via an indexing system to identify records contained in pre-existing local, regional or national health information systems. Most large scale electronic health record systems envisage that such correlation between national healthcare identifiers and pre-existing identifiers will be performed by some centrally administered cross referencing, or index system. This paper is concerned with the security architecture for such indexing servers and the manner in which they interface with pre-existing health systems (including both workstations and servers). The paper proposes two required structures to achieve the goal of a national scale, and secure exchange of electronic health information, including: (a) the employment of high trust computer systems to perform an indexing function, and (b) the development and deployment of an appropriate high trust interface module, a Healthcare Interface Processor (HIP), to be integrated into the connected workstations or servers of healthcare service providers. This proposed architecture is specifically oriented toward requirements identified in the Connectivity Architecture for Australia’s e-health scheme as outlined by NEHTA and the national e-health strategy released by the Australian Health Ministers.
Abstract:
Creating sustainable urban environments is one of the challenging issues that need a clear vision and implementation strategies involving changes in governmental values and decision-making processes for local governments. In particular, internalisation of environmental externalities of daily urban activities (e.g. manufacturing, transportation and so on) has immense importance, and local policies are formulated for it to provide better living conditions for the people inhabiting urban areas. Even if environmental problems are defined succinctly by various stakeholders, the complicated nature of sustainability issues demands a structured evaluation strategy and well-defined sustainability parameters for efficient and effective policy making. Following this reasoning, this study involves assessment of the sustainability performance of urban settings, mainly focusing on environmental problems caused by rapid urban expansion and transformation. By taking into account land-use and transportation interaction, it tries to reveal how future urban developments would alter the daily urban travel behaviour of people and affect the urban and natural environments. The paper introduces a grid-based indexing method developed for this research and trialled as a GIS-based decision support tool to analyse and model selected spatial and aspatial indicators of sustainability in the Gold Coast. This process reveals parameters of the site-specific relationship among selected indicators that are used to evaluate index-based performance characteristics of the area. The evaluation is made through an embedded decision support module by assigning relative weights to indicators. The resolution of the selected grid-based unit of analysis provides insights about the service level of projected urban development proposals at a disaggregate level, such as accessibility to transportation and urban services, and pollution. 
The paper concludes by discussing the findings including the capacity of the decision support system to assist decision-makers in determining problematic areas and developing intervention policies for sustainable outcomes of future developments.
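The step of assigning relative weights to indicators in a grid-based index could look roughly like the sketch below. The abstract does not state how indicators are normalised or combined, so the min-max scaling, the weighted sum, the indicator names, and the assumption that every indicator is oriented so that higher is better are all illustrative choices, not the authors' method.

```python
def min_max_normalise(values):
    """Scale a list of numbers to the range 0..1 (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(cells, weights):
    """Combine per-cell indicator values into one weighted score per cell.

    cells   -- {cell_id: {indicator_name: raw_value}}
    weights -- {indicator_name: relative_weight}; rescaled to sum to 1
    Assumes higher raw values are better for every indicator.
    """
    cell_ids = sorted(cells)
    total_weight = sum(weights.values())
    scores = {cell_id: 0.0 for cell_id in cell_ids}

    for indicator, weight in weights.items():
        raw = [cells[cell_id][indicator] for cell_id in cell_ids]
        for cell_id, value in zip(cell_ids, min_max_normalise(raw)):
            scores[cell_id] += (weight / total_weight) * value
    return scores


# Hypothetical grid cells with two hypothetical indicators.
grid = {
    "A": {"transit_access": 0, "service_access": 10},
    "B": {"transit_access": 10, "service_access": 20},
    "C": {"transit_access": 5, "service_access": 30},
}
scores = composite_index(grid, {"transit_access": 3, "service_access": 1})
```

With these toy inputs, cell B ranks highest because the more heavily weighted indicator dominates; changing the weights changes the ranking, which is the kind of sensitivity a decision support module would expose.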
Abstract:
Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible and non-linear, and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match and false non-match likelihoods using the publicly available Uncompressed Colour Image database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method, with at least 1% EER improvement under noise, scaling and sharpening.
Abstract:
Dasheen mosaic potyvirus (DsMV) is an important virus affecting taro. The virus has been found wherever taro is grown and infects both the edible and ornamental aroids, causing yield losses of up to 60%. The presence of DsMV, and other viruses, prevents the international movement of taro germplasm between countries. This has a significant negative impact on taro production in many countries due to the inability to access improved taro lines produced in breeding programs. To overcome this problem, sensitive and reliable virus diagnostic tests need to be developed to enable the indexing of taro germplasm. The aim of this study was to generate an antiserum against a recombinant DsMV coat protein (CP) and to develop a serological-based diagnostic test that would detect Pacific Island isolates of the virus. The CP-coding regions of 16 DsMV isolates from Papua New Guinea, Samoa, Solomon Islands, French Polynesia, New Caledonia and Vietnam were amplified, cloned and sequenced. The size of the CP-coding region ranged from 939 to 1038 nucleotides and encoded putative proteins ranging from 313 to 346 amino acids, with the molecular mass ranging from 34 to 38 kDa. Analysis of the amino acid sequences revealed the presence of several amino acid motifs typically found in potyviruses, including DAG, WCIE/DN, RQ and AFDF. When the amino acid sequences were compared with each other and the DsMV sequences on the database, the maximum variability was 21.9%. When the core region of the CP was analysed, the maximum variability dropped to 6%, indicating most variability was present in the N terminus. Within seven PNG isolates of DsMV, the maximum variability was 16.9% and 3.9% over the entire CP-coding region and core region, respectively. The sequence of PNG isolate P1 was most similar to all other sequences. Phylogenetic analysis indicated that almost all isolates grouped according to their provenance. 
Further, the seven PNG isolates were grouped according to the region within PNG from which they were obtained. Due to the extensive variability over the entire CP-coding region, the core region of the CP of PNG isolate P1 was cloned into a protein expression vector and expressed as a recombinant protein. The protein was purified by chromatography and SDS-PAGE and used as an antigen to generate antiserum in a rabbit. In western blots, the antiserum reacted with bands of approximately 45-47 kDa in extracts from purified DsMV and from known DsMV-infected plants from PNG; no bands were observed using healthy plant extracts. The antiserum was subsequently incorporated into an indirect ELISA. This procedure was found to be very sensitive and detected DsMV in sap diluted at least 1:1,000. Using both western blot and ELISA formats, the antiserum was able to detect a wide range of DsMV isolates including those from Australia, New Zealand, Fiji, French Polynesia, New Caledonia, Papua New Guinea, Samoa, Solomon Islands and Vanuatu. These plants were verified to be infected with DsMV by RT-PCR. In specificity tests, the antiserum was also found to react with sap from plants infected with SCMV, PRSV-P, PRSV-W, but not with PVY or CMV-infected plants.