830 results for Information needs – representation
Abstract:
Web document cluster analysis plays an important role in information retrieval by organizing large numbers of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two levels of knowledge granularity (document and term) and ignores the bridging paragraph granularity. This two-level granularity can lead to unsatisfactory clustering results with "false correlation". To deal with this problem, a Hierarchical Representation Model with Multi-granularity (HRMM), consisting of a five-layer representation of data and a two-phase clustering process, is proposed based on granular computing and article structure theory. To address the zero-valued similarity problem resulting from the sparse term-paragraph matrix, an ontology-based strategy and a tolerance-rough-set-based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be captured more efficiently and effectively in HRMM, so web document clusters of higher quality can be generated. Extensive experiments show that HRMM, HRMM with the tolerance-rough-set strategy, and HRMM with the ontology strategy all significantly outperform VSM and a representative non-VSM-based algorithm, WFP, in terms of F-Score.
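To make the granularity contrast concrete, here is a minimal sketch, not the paper's five-layer HRMM: it only compares whole-document VSM similarity with a paragraph-level comparison. The toy documents and the TF-IDF/cosine choices are illustrative assumptions.

```python
# Minimal sketch (NOT the paper's HRMM): contrasts document-level VSM
# similarity with a paragraph-granularity comparison.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "web document clustering organizes documents into clusters. "
    "granular computing captures structural knowledge in documents.",
    "granular computing is combined with rough sets. "
    "web document clustering benefits from paragraph granularity.",
]

# Two-level VSM: one TF-IDF vector per whole document.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)
print("document-level cosine:", cosine_similarity(doc_matrix)[0, 1])

# Paragraph granularity: compare the best-matching paragraph pair
# (sentences stand in for paragraphs here), which can expose the
# "false correlation" that whole-document vectors hide.
paragraphs = [d.split(". ") for d in docs]
flat = [p for ps in paragraphs for p in ps]
para_matrix = vectorizer.transform(flat)
n0 = len(paragraphs[0])
sims = cosine_similarity(para_matrix[:n0], para_matrix[n0:])
print("best paragraph-level cosine:", sims.max())
```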
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query-based IR on documents and for clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that LRD performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
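As a rough illustration of co-occurrence-driven relatedness and vector expansion, the sketch below uses a PMI-style score as a stand-in for LRD's own relation-strength measure (which the paper defines); the toy corpus and entity names are invented.

```python
# Sketch of co-occurrence relatedness and vector expansion in the
# spirit of LRD. PMI is only a stand-in for the paper's measure.
import math
from collections import Counter
from itertools import combinations

corpus = [  # each "document" lists the entities/terms extracted from it
    ["alice", "acme", "retrieval"],
    ["alice", "acme", "clustering"],
    ["bob", "retrieval"],
]

term_freq = Counter(t for doc in corpus for t in set(doc))
pair_freq = Counter(frozenset(p) for doc in corpus
                    for p in combinations(set(doc), 2))
n_docs = len(corpus)

def relation_strength(a, b):
    """PMI-style strength of the relation between entities a and b."""
    joint = pair_freq[frozenset((a, b))] / n_docs
    if joint == 0:
        return 0.0
    return math.log(joint / ((term_freq[a] / n_docs) * (term_freq[b] / n_docs)))

# Expand a sparse query vector with entities related to its terms.
query = {"alice": 1.0}
expanded = dict(query)
for cand in term_freq:
    if cand in query:
        continue
    strength = max(w * relation_strength(q, cand) for q, w in query.items())
    if strength > 0:
        expanded[cand] = strength
print(expanded)  # alice's closely related entities gain weight
```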
Abstract:
Adaptive information filtering is a challenging research problem: it requires adapting a representation of a user's multiple interests to various changes in those interests. We investigate the application of an immune-inspired approach to this problem. Nootropia is a user-profiling model that has many properties in common with computational models of the immune system based on Francisco Varela's work. In this paper we concentrate on Nootropia's evaluation. We define an evaluation methodology that uses virtual users to simulate various interest changes. The results show that Nootropia exhibits the desired adaptive behaviour.
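The virtual-user protocol can be pictured with a toy filter: a simulated user whose interests switch topics halfway through a document stream, evaluated against a simple decaying-weight profile. Everything below is an invented stand-in that only illustrates the evaluation setup, not Nootropia's immune-inspired term network.

```python
# Invented stand-in for the evaluation protocol: a virtual user whose
# interests change at phase 50, filtered by a toy decaying profile.
from collections import defaultdict

stream = [("sports", i) for i in range(50)] + [("finance", i) for i in range(50)]

def virtual_user(topic, phase):
    """Simulated relevance judgements: interests change at phase 50."""
    return topic == ("sports" if phase < 50 else "finance")

profile = defaultdict(float)
decay, correct = 0.95, 0
for phase, (topic, _) in enumerate(stream):
    relevant = virtual_user(topic, phase)
    predicted = profile[topic] > 0.5        # profile's relevance decision
    correct += predicted == relevant
    for t in list(profile):
        profile[t] *= decay                 # gradually forget old interests
    if relevant:
        profile[topic] += 1.0               # reinforce the current interest
print(f"agreement with virtual user: {correct}/{len(stream)}")
```

The agreement count reveals the adaptation lag right after the simulated interest change, which is exactly the behaviour such an evaluation is designed to expose.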
Abstract:
Term dependence is a natural consequence of language use. Its successful representation has been a long-standing goal of Information Retrieval research. We present a methodology for constructing a concept hierarchy that takes into account the three basic dimensions of term dependence. We also introduce a document evaluation function that allows the concept hierarchy to be used as a user profile for Information Filtering. Initial experimental results indicate that this is a promising approach for incorporating term dependence into the way documents are filtered.
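One common way to build a concept hierarchy from term dependence is co-occurrence subsumption: term x is placed above term y when x appears in most documents containing y, but not vice versa. The sketch below illustrates only that single dimension, on an invented corpus with an assumed 0.8 threshold; the paper's methodology covers three dimensions of dependence.

```python
# One-dimension sketch: a concept hierarchy from co-occurrence
# subsumption. Corpus and threshold are illustrative assumptions.
from collections import Counter
from itertools import permutations

docs = [
    {"music", "jazz"},
    {"music", "jazz", "saxophone"},
    {"music", "rock"},
    {"music"},
]

df = Counter(t for d in docs for t in d)                  # document frequency
co = Counter(pair for d in docs for pair in permutations(d, 2))

edges = []
for (x, y), n in co.items():
    p_x_given_y = n / df[y]
    p_y_given_x = n / df[x]
    if p_x_given_y >= 0.8 and p_y_given_x < p_x_given_y:
        edges.append((x, y))                              # x subsumes y
print(sorted(edges))  # e.g. music placed above jazz, rock, saxophone
```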
Abstract:
In this thesis we present an overview of sparse approximations of grey-level images. The sparse representations are realized by classic greedy selection strategies based on Matching Pursuit (MP). One such technique, Orthogonal Matching Pursuit (OMP), is shown to be suitable for producing sparse approximations of images when they are processed in small blocks. When the blocks are enlarged, the proposed Self-Projected Matching Pursuit (SPMP) algorithm renders results equivalent to OMP. A simple coding algorithm is then proposed to store these sparse approximations; under certain conditions, this is shown to be competitive with the JPEG2000 image compression standard. An application termed image folding, which partially secures the approximated images, is then proposed. This is extended to produce a self-contained folded image containing all the information required to perform image recovery. Finally, a modified OMP selection technique is applied to produce sparse approximations of Red-Green-Blue (RGB) images. These RGB approximations are then folded with the self-contained approach.
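OMP itself is well defined: greedily select the dictionary atom most correlated with the current residual, then re-fit all selected atoms by least squares. Below is a minimal NumPy sketch over a random unit-norm dictionary, which stands in for the image-block dictionaries used in the thesis.

```python
# Minimal Orthogonal Matching Pursuit over a generic dictionary;
# a random dictionary stands in for an image-block dictionary.
import numpy as np

def omp(D, x, n_atoms):
    """Greedy OMP: D has unit-norm atoms as columns, x is the signal."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(n_atoms):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Orthogonal projection onto the span of the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    return support, coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))      # e.g. a flattened 8x8 block basis
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 100]     # a 2-sparse test signal
support, coeffs, residual = omp(D, x, 2)
print(support, np.round(coeffs, 3), np.linalg.norm(residual))
```

The orthogonal projection step is what distinguishes OMP from plain MP, and is the step SPMP approximates internally so that larger blocks remain tractable.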
Abstract:
This paper presents the digital imaging results of a collaborative research project working toward the generation of an on-line interactive digital image database of signs from ancient cuneiform tablets. An important aim of this project is the application of forensic analysis to the cuneiform symbols to identify scribal hands. Cuneiform tablets are amongst the earliest records of written communication and could be considered one of the original information technologies: an accessible, portable and robust medium for communication across distance and time. The earliest examples are up to 5,000 years old, and the writing technique remained in use for some 3,000 years. Unfortunately, only a small fraction of these tablets can be made available for display in museums, and much important academic work has yet to be performed on the very large numbers of tablets to which there is necessarily restricted access. Our paper describes the challenges encountered in the 2D image capture of a sample set of tablets held in the British Museum, explaining the motivation for attempting 3D imaging and the results of initial experiments scanning the smaller, more densely inscribed cuneiform tablets. We also discuss the tractability of 3D digital capture, representation and manipulation, and investigate the requirements for scalable data compression and transmission methods. Additional information can be found on the project website: www.cuneiform.net
Abstract:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Abstract:
Although the importance of dataset fitness-for-use evaluation and intercomparison is widely recognised within the GIS community, no practical tools have yet been developed to support such interrogation. GeoViQua aims to develop a GEO label which will visually summarise and allow interrogation of the key informational aspects of geospatial datasets upon which users rely when selecting datasets for use. The proposed GEO label will be integrated in the Global Earth Observation System of Systems (GEOSS) and will be used as a value and trust indicator for datasets accessible through the GEO Portal. As envisioned, the GEO label will act as a decision-support mechanism for dataset selection and thereby improve user recognition of dataset quality. To date we have conducted three user studies to (1) identify the informational aspects of geospatial datasets upon which users rely when assessing dataset quality and trustworthiness, (2) elicit initial user views on a GEO label and its potential role, and (3) evaluate prototype label visualisations. Our first study revealed that, when evaluating the quality of data, users consider eight facets: dataset producer information; producer comments on dataset quality; dataset compliance with international standards; community advice; dataset ratings; links to dataset citations; expert value judgements; and quantitative quality information. Our second study confirmed the relevance of these facets in terms of the community-perceived function that a GEO label should fulfil: users and producers of geospatial data supported the concept of a GEO label that provides a drill-down interrogation facility covering all eight informational aspects. Consequently, we developed three prototype label visualisations and evaluated their comparative effectiveness and user preference via a third user study to arrive at a final graphical GEO label representation. When integrated into GEOSS, an individual GEO label will be provided for each dataset in the GEOSS clearinghouse (or other data portals and clearinghouses) based on its available quality information. Producer and feedback metadata documents are used to dynamically assess information availability and generate the GEO labels. The producer metadata document can be either a standard ISO-compliant metadata record supplied with the dataset or an extended version of a GeoViQua-derived metadata record, and is used to assess the availability of a producer profile, producer comments, compliance with standards, citations and quantitative quality information. GeoViQua is also currently developing a feedback server to collect and encode (as metadata records) user and producer feedback on datasets; these metadata records will be used to assess the availability of user comments, ratings, expert reviews and user-supplied citations for a dataset. The GEO label will provide drill-down functionality allowing a user to navigate to a GEO label page offering detailed quality information for the associated dataset. At this stage, we are developing the GEO label service that will provide GEO labels on demand based on supplied metadata records. In this presentation, we will provide a comprehensive overview of the GEO label development process, with specific emphasis on the GEO label implementation and integration into GEOSS.
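The core mechanism, assessing which of the eight facets have backing information in the producer and feedback metadata, can be sketched as follows. All field names and the record layout here are hypothetical; the real service inspects ISO/GeoViQua metadata and feedback records.

```python
# Hypothetical sketch of the facet-availability check behind a GEO
# label; field names and record layout are invented.
FACETS = {
    "producer_profile":     lambda p, f: "producer" in p,
    "producer_comments":    lambda p, f: "quality_comments" in p,
    "standards_compliance": lambda p, f: bool(p.get("conformance")),
    "community_advice":     lambda p, f: bool(f.get("comments")),
    "ratings":              lambda p, f: bool(f.get("ratings")),
    "citations":            lambda p, f: bool(p.get("citations")) or bool(f.get("citations")),
    "expert_reviews":       lambda p, f: bool(f.get("expert_reviews")),
    "quantitative_quality": lambda p, f: bool(p.get("quality_measures")),
}

def geo_label(producer_record, feedback_record):
    """Map each of the eight facets to its availability for drill-down."""
    return {name: check(producer_record, feedback_record)
            for name, check in FACETS.items()}

producer = {"producer": "EO Agency", "conformance": ["ISO 19115"]}
feedback = {"ratings": [4, 5], "comments": ["useful for land-cover work"]}
print(geo_label(producer, feedback))
```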
Abstract:
Social media is becoming an increasingly important part of people's lives and is increasingly used in the food and agriculture sector. This paper considers the extent to which each section of the food supply chain is represented on Twitter through the hashtag #food. We identified the 20 most popular words for each part of the supply chain by categorising 5,000 randomly selected tweets into sections of the food chain and then analysing each category. We sorted users by tweeting frequency and categorised their position in the food supply chain. Finally, to consider in-degree of influence, we took the top 100 tweeters from this list and examined their followings. We found that consumers are the most represented part of the food chain and logistics the least: consumers accounted for 51.50% of users and 87.42% of the top words tweeted, while logistics was barely represented among either tweets or users (0.84% and 0.35% respectively). The top users were found to follow a high percentage of their own followers, most with over 70% overlap. This research brings greater understanding of how people perceive the food sector and how Twitter can be used within it.
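The follower-overlap figure reported above (most top users over 70%) is the share of a user's followers whom that user follows back; a toy sketch with invented sets is below, whereas the study derived the real sets from Twitter data on #food.

```python
# Toy sketch of the follower-overlap measure: the percentage of a
# user's followers whom the user follows back. Data is invented.
def follow_back_rate(followers, following):
    followers, following = set(followers), set(following)
    if not followers:
        return 0.0
    return 100 * len(followers & following) / len(followers)

followers = {"a", "b", "c", "d"}
following = {"b", "c", "d", "e"}
print(f"{follow_back_rate(followers, following):.1f}% of followers followed back")
```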
Abstract:
The traditional role of ports in the wider supply chain context is currently undergoing radical review. In broad terms, the traditional model is being replaced by one focused on higher-value, more knowledge-intensive activities. This trend requires a change in the way new knowledge and skills are developed by staff in companies of all kinds within port communities. Traditional models need to be re-evaluated to reflect the increasing importance of knowledge and skills acquisition, particularly in relation to the supply chain management (SCM) concept and the evolving role of information and communications technology (ICT) in improving supply chain capability. This paper describes the case of NITL's Foundation Certificate Programme (FCP), with specific reference to its use in addressing current shortcomings in supply chain knowledge and skills in port communities. The FCP rationale is based on the need to move away from traditional approaches to supply chain organisation, in which the various links in the chain were measured and managed in isolation from one another and thus tended to operate at cross purposes, towards more cooperative and integrated approaches.
Abstract:
Context has traditionally been regarded in vision research as a determinant of the interpretation of sensory information on the basis of previously acquired knowledge. Here we propose a novel, complementary perspective by showing that context also specifically affects visual category learning. In two experiments involving sets of compound Gabor patterns, we explored how context, as given by the stimulus set to be learned, affects the internal representation of pattern categories. In Experiment 1, we changed the (local) context of the individual signal classes by changing the configuration of the learning set. In Experiment 2, we varied the (global) context of a fixed class configuration by changing the degree of signal accentuation. Generalization performance was assessed in terms of the ability to recognize contrast-inverted versions of the learning patterns. Both contextual variations yielded distinct effects on learning and generalization, indicating a change in internal category representation. Computer simulations suggest that the latter is related to changes in the set of attributes underlying the production rules of the categories. The implications of these findings for phenomena of contrast (in)variance in visual perception are discussed.
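For readers unfamiliar with the stimuli, a compound Gabor pattern is a Gaussian-windowed sum of sinusoidal components, and contrast inversion simply negates the luminance profile. The one-dimensional sketch below uses arbitrary frequencies, phase and window width, not the study's parameters.

```python
# Illustrative 1-D compound Gabor signal and its contrast-inverted
# version (the generalization probe); all parameters are arbitrary.
import numpy as np

x = np.linspace(-1, 1, 512)
envelope = np.exp(-x**2 / (2 * 0.25**2))             # Gaussian window
# Compound carrier: a fundamental plus a phase-shifted third harmonic.
carrier = np.cos(2 * np.pi * 4 * x) + 0.5 * np.cos(2 * np.pi * 12 * x + np.pi / 3)
pattern = envelope * carrier
inverted = -pattern                                  # contrast inversion
print(float(pattern.max()), float(inverted.min()))   # mirror images
```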
Abstract:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA) and classified on the resulting axes using fuzzy c-means, yielding a point dataset representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another rather than occupying discrete patches. The fuzzy-set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location, and show that the impact of surveyor error was more severe in the Boolean than in the fuzzy classification.
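Fuzzy c-means itself is standard and compact enough to sketch. The minimal NumPy version below, with invented one-dimensional data standing in for ordination axis scores, shows how a sample lying between two communities receives graded membership rather than a Boolean assignment.

```python
# Minimal fuzzy c-means in NumPy: graded memberships replace Boolean
# patch assignments. Data is invented, standing in for axis scores.
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1
    for _ in range(iters):
        W = U ** m                           # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = (d + 1e-12) ** (-2 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Six samples on one ordination axis; the 0.55 point lies between
# the two "communities" and receives mixed membership.
X = np.array([[0.1], [0.2], [0.3], [0.8], [0.9], [0.55]])
U, centers = fuzzy_c_means(X)
print(np.round(U, 2))
```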
Abstract:
Changes in the international economic scenario in recent years have made it necessary for both industrial and service firms to reformulate their strategies, with a strong focus on the resources required for successful implementation. In this scenario, information and communication technologies (ICT) have a potentially vital role to play, both as a key resource for re-engineering business processes within a framework of direct connection between suppliers and customers, and as a source of cost optimisation. There have also been innovations in the logistics and freight transport industry in relation to ICT diffusion. The implementation of such systems by third-party logistics providers (3PLs) allows the real-time exchange of information between supply chain partners, thereby improving planning capability and customer service. Nevertheless, the logistics and freight transport industry lags somewhat behind other sectors in ICT diffusion. This situation is attributable to a series of factors, both industry-specific and more general: (a) traditional resistance to change on the part of transport and logistics service providers; (b) the small size of firms, which places considerable constraints upon investment in ICT; (c) the relative shortage of user-friendly applications; (d) the diffusion of internal standards on the part of the main providers in the industry, whose aim is to protect company information by preventing its dissemination among customers and suppliers; and (e) insufficient professional skills for using such technologies among staff in such firms. The latter point is of critical importance insofar as the adoption of ICT makes it increasingly necessary both to develop new technical skills to use different hardware and software tools, and to be able to plan communication processes so as to allow the optimal use of ICT. The aim of this paper is to assess the impact of ICT on the transport and logistics industry and to highlight how the use of such new technologies is affecting providers' training needs. The first part provides a conceptual framework of the impact of ICT on the transport and logistics industry. The second part outlines the state of ICT dissemination in the Italian and Irish third-party logistics industries. The third part discusses the impact of ICT on the training needs of transport and logistics service providers, based on case studies in both countries, and considers the implications for the development of appropriate training policies.
Abstract:
Different types of ontologies, and the knowledge or metaknowledge connected to them, are considered and analyzed with a view to their realization in contemporary information security systems (ISS), especially intrusion detection systems (IDS) and intrusion prevention systems (IPS). The human-centered methods INCONSISTENCY, FUNNEL, CALEIDOSCOPE and CROSSWORD are algorithmic or data-driven methods based on ontologies. All of them interact on the competitive principle of 'survival of the fittest', controlled by a Synthetic MetaMethod (SMM). It is shown that data analysis frequently requires an act of creation, especially when applied to knowledge-poor environments, and that human-centered methods, often based on dynamic ontologies, are well suited to such cases.
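The competitive 'survival of the fittest' arrangement can be illustrated with a hypothetical meta-method that scores competing detection rules on labelled events and keeps the fittest. The rules, events and fitness score below are all invented; the paper's methods are far more elaborate.

```python
# Hypothetical illustration of 'survival of the fittest' among
# competing detection methods under a controlling meta-method.
def method_a(event):
    return event["severity"] > 5                 # stand-in detection rule

def method_b(event):
    return "sql" in event["payload"]             # stand-in detection rule

events = [
    {"severity": 7, "payload": "sqlmap scan", "intrusion": True},
    {"severity": 2, "payload": "sql injection", "intrusion": True},
    {"severity": 1, "payload": "normal traffic", "intrusion": False},
]

def fitness(method):
    """Fraction of labelled events the method classifies correctly."""
    return sum(method(e) == e["intrusion"] for e in events) / len(events)

candidates = {"A": method_a, "B": method_b}
fittest = max(candidates, key=lambda name: fitness(candidates[name]))
print("surviving method:", fittest)   # the meta-method keeps the fittest
```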