793 resultados para Content-Based Retrieval
Resumo:
Logic based Pattern Recognition extends the well known similarity models, where the distance measure is the base instrument for recognition. Initial part (1) of current publication in iTECH-06 reduces the logic based recognition models to the reduced disjunctive normal forms of partially defined Boolean functions. This step appears as a way to alternative pattern recognition instruments through combining metric and logic hypotheses and features, leading to studies of logic forms, hypotheses, hierarchies of hypotheses and effective algorithmic solutions. Current part (2) provides probabilistic conclusions on effective recognition by logic means in a model environment of binary attributes.
Resumo:
* The research is supported partly by INTAS: 04-77-7173 project, http://www.intas.be
Resumo:
ACM Computing Classification System (1998): K.3.1, K.3.2.
Resumo:
The advent of smart TVs has reshaped the TV-consumer interaction by combining TVs with mobile-like applications and access to the Internet. However, consumers are still unable to seamlessly interact with the contents being streamed. An example of such limitation is TV shopping, in which a consumer makes a purchase of a product or item displayed in the current TV show. Currently, consumers can only stop the current show and attempt to find a similar item in the Web or an actual store. It would be more convenient if the consumer could interact with the TV to purchase interesting items. ^ Towards the realization of TV shopping, this dissertation proposes a scalable multimedia content processing framework. Two main challenges in TV shopping are addressed: the efficient detection of products in the content stream, and the retrieval of similar products given a consumer-selected product. The proposed framework consists of three components. The first component performs computational and temporal aware multimedia abstraction to select a reduced number of frames that summarize the important information in the video stream. By both reducing the number of frames and taking into account the computational cost of the subsequent detection phase, this component component allows the efficient detection of products in the stream. The second component realizes the detection phase. It executes scalable product detection using multi-cue optimization. Additional information cues are formulated into an optimization problem that allows the detection of complex products, i.e., those that do not have a rigid form and can appear in various poses. After the second component identifies products in the video stream, the consumer can select an interesting one for which similar ones must be located in a product database. To this end, the third component of the framework consists of an efficient, multi-dimensional, tree-based indexing method for multimedia databases. The proposed index mechanism serves as the backbone of the search. Moreover, it is able to efficiently bridge the semantic gap and perception subjectivity issues during the retrieval process to provide more relevant results.^
Resumo:
Users seeking information may not find relevant information pertaining to their information need in a specific language. But information may be available in a language different from their own, but users may not know that language. Thus users may experience difficulty in accessing the information present in different languages. Since the retrieval process depends on the translation of the user query, there are many issues in getting the right translation of the user query. For a pair of languages chosen by a user, resources, like incomplete dictionary, inaccurate machine translation system may exist. These resources may be insufficient to map the query terms in one language to its equivalent terms in another language. Also for a given query, there might exist multiple correct translations. The underlying corpus evidence may suggest a clue to select a probable set of translations that could eventually perform a better information retrieval. In this paper, we present a cross language information retrieval approach to effectively retrieve information present in a language other than the language of the user query using the corpus driven query suggestion approach. The idea is to utilize the corpus based evidence of one language to improve the retrieval and re-ranking of news documents in the other language. We use FIRE corpora - Tamil and English news collections in our experiments and illustrate the effectiveness of the proposed cross language information retrieval approach.
Resumo:
Many studies have shown the considerable potential for the application of remote-sensing-based methods for deriving estimates of lake water quality. However, the reliable application of these methods across time and space is complicated by the diversity of lake types, sensor configuration, and the multitude of different algorithms proposed. This study tested one operational and 46 empirical algorithms sourced from the peer-reviewed literature that have individually shown potential for estimating lake water quality properties in the form of chlorophyll-a (algal biomass) and Secchi disc depth (SDD) (water transparency) in independent studies. Nearly half (19) of the algorithms were unsuitable for use with the remote-sensing data available for this study. The remaining 28 were assessed using the Terra/Aqua satellite archive to identify the best performing algorithms in terms of accuracy and transferability within the period 2001–2004 in four test lakes, namely Vänern, Vättern, Geneva, and Balaton. These lakes represent the broad continuum of large European lake types, varying in terms of eco-region (latitude/longitude and altitude), morphology, mixing regime, and trophic status. All algorithms were tested for each lake separately and combined to assess the degree of their applicability in ecologically different sites. None of the algorithms assessed in this study exhibited promise when all four lakes were combined into a single data set and most algorithms performed poorly even for specific lake types. A chlorophyll-a retrieval algorithm originally developed for eutrophic lakes showed the most promising results (R2 = 0.59) in oligotrophic lakes. Two SDD retrieval algorithms, one originally developed for turbid lakes and the other for lakes with various characteristics, exhibited promising results in relatively less turbid lakes (R2 = 0.62 and 0.76, respectively). The results presented here highlight the complexity associated with remotely sensed lake water quality estimates and the high degree of uncertainty due to various limitations, including the lake water optical properties and the choice of methods.
Resumo:
New information on possible resource value of sea floor manganese nodule deposits in the eastern north Pacific has been obtained by a study of records and collections of the 1972 Sea Scope Expedition. Nodule abundance (percent of sea floor covered) varies greatly, according to photographs from eight stations and data from other sources. All estimates considered reliable are plotted on a map of the region. Similar maps show the average content of Ni, Cu, Mn and Co at 89 stations from which three or more nodules were analyzed. Variations in nodule metal content at each station are shown graphically in an appendix, where data on nodule sizes are also given. Results of new analyses of 420 nodules from 93 stations for mn, fe, ni, cu, CO, and zn are listed in another appendix. Relatively high Ni + Cu content is restricted chiefly to four groups of stations in the equatorial region, where group averages are 1.86, 1.99, 2.47, and 2.55 weight-percent. Prepared for United States Department of the Interior, Bureau of Mines. Grant no. GO284008-02-MAS. - NTIS PB82-142571.
Resumo:
We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.
Resumo:
The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which is not dependant upon problem specific information and which, as such, represents an approach which underpins the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these approaches.
Resumo:
The structured representation of cases by attribute graphs in a Case-Based Reasoning (CBR) system for course timetabling has been the subject of previous research by the authors. In that system, the case base is organised as a decision tree and the retrieval process chooses those cases which are sub attribute graph isomorphic to the new case. The drawback of that approach is that it is not suitable for solving large problems. This paper presents a multiple-retrieval approach that partitions a large problem into small solvable sub-problems by recursively inputting the unsolved part of the graph into the decision tree for retrieval. The adaptation combines the retrieved partial solutions of all the partitioned sub-problems and employs a graph heuristic method to construct the whole solution for the new case. We present a methodology which is not dependant upon problem specific information and which, as such, represents an approach which underpins the goal of building more general timetabling systems. We also explore the question of whether this multiple-retrieval CBR could be an effective initialisation method for local search methods such as Hill Climbing, Tabu Search and Simulated Annealing. Significant results are obtained from a wide range of experiments. An evaluation of the CBR system is presented and the impact of the approach on timetabling research is discussed. We see that the approach does indeed represent an effective initialisation method for these approaches.
Resumo:
The In Situ Analysis System (ISAS) was developed to produce gridded fields of temperature and salinity that preserve as much as possible the time and space sampling capabilities of the Argo network of profiling floats. Since the first global re-analysis performed in 2009, the system has evolved and a careful delayed mode processing of the 2002-2012 dataset has been carried out using version 6 of ISAS and updating the statistics to produce the ISAS13 analysis. This last version is now implemented as the operational analysis tool at the Coriolis data centre. The robustness of the results with respect to the system evolution is explored through global quantities of climatological interest: the Ocean Heat Content and the Steric Height. Estimates of errors consistent with the methodology are computed. This study shows that building reliable statistics on the fields is fundamental to improve the monthly estimates and to determine the absolute error bars. The new mean fields and variances deduced from the ISAS13 re-analysis and dataset show significant changes relative to the previous ISAS estimates, in particular in the southern ocean, justifying the iterative procedure. During the decade covered by Argo, the intermediate waters appear warmer and saltier in the North Atlantic and fresher in the Southern Ocean than in WOA05 long term mean. At inter-annual scale, the impact of ENSO on the Ocean Heat Content and Steric Height is observed during the 2006-2007 and 2009-2010 events captured by the network.
Resumo:
Objective: Huntington’s Disease (HD) is an inherited disorder, characterised by a progressive degeneration of the brain. Due to the nature of the symptoms, the genetic element of the disease and the fact that there is no cure, HD patients and those in their support network often experience considerable stress and anxiety. With an expansion in Internet access, individuals affected by HD have new opportunities for information retrieval and social support. The aim of this study is to examine the provision of social support in messages posted to a HD online support group bulletin board. Methods: In total, 1313 messages were content analysed using a modified version of the Social Support Behaviour Code developed by Cutrona & Suhr (1992). Results: The analysis indicates that group members most frequently offered informational (56.2%) and emotional support (51.9%) followed by network support (48.4%) with esteem support (21.7%) and tangible assistance (9.8%) least frequently offered. Conclusion: This study suggests that exchanging informational and emotional support represents a key function of this online group. Practice implications: Online support groups provide a unique opportunity for health professionals to learn about the experiences and views of individuals affected by HD and explore where and why gaps may exist between evidence-based medicine and consumer behaviour and expectations.
Resumo:
Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval, and the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Alleviating the need to rely entirely on a single entity, Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution, as it distributes the functional components of a web search engine – from crawling and indexing documents, to query processing – across the network of users (or, peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of a P2P web search. Federated search systems are a form of distributed information retrieval model that route the user’s information need, formulated as a query, to distributed resources and merge the retrieved result lists into a final list. P2P-IR networks are one form of federated search in routing queries and merging result among participating peers. The query is propagated through disseminated nodes to hit the peers that are most likely to contain relevant documents, then the retrieved result lists are merged at different points along the path from the relevant peers to the query initializer (or namely, customer). However, query routing in P2P-IR networks is considered as one of the major challenges and critical part in P2P-IR networks; as the relevant peers might be lost in low-quality peer selection while executing the query routing, and inevitably lead to less effective retrieval results. This motivates this thesis to study and propose query routing techniques to improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise the peers into similar semantic clusters where each such semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level as content representations gathered from cooperative peers to propose a query routing approach called Inverted PeerCluster Index (IPI) that simulates the conventional inverted index of the centralised corpus to organise the statistics of peers’ terms. The results show a competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of using the conventional Information Retrieval models as peer selection approaches where each peer can be considered as a big document of documents. The experimental evaluation shows comparative and significant results and explains that document retrieval methods are very effective for peer selection that brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results with state-of-the-art resource selection methods and competitive results to corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea of providing feedback on a specific item in the social community networks and manage it for future decision-making. The system monitors users’ behaviours when they click or download documents from the final ranked list as implicit feedback and mines the given information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments to cover various scenarios including noisy feedback information (i.e, providing positive feedback on non-relevant documents) to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results in almost all measurement metrics with approximate improvement more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose one technique, reputation-based approaches are clearly the natural choices which also can be deployed on any P2P network.
Resumo:
Most of the existing open-source search engines, utilize keyword or tf-idf based techniques to find relevant documents and web pages relative to an input query. Although these methods, with the help of a page rank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this Thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of Gruppo Maggioli company. Semantic search or search with meaning can refer to an understanding of the query, instead of simply finding words matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of ’artificial’ queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.