969 results for Query suggestion
Abstract:
In recent years, technological improvements have enabled internet users to analyze and retrieve data on Internet searches. This data has been used in several fields of study: some authors have used search engine query data to forecast economic variables, to detect areas of influenza activity, or to demonstrate that it is possible to capture patterns in stock market indexes. This paper presents an investment strategy that uses Google Trends’ weekly query data for the constituents of major global stock market indexes. The results suggest that it is indeed possible to achieve higher Info Sharpe ratios than those of a buy-and-hold strategy over the period considered, especially for the major European stock market indexes.
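The comparison described in this abstract can be illustrated with a minimal sketch. It assumes that the Info Sharpe ratio is the annualised mean return divided by the annualised volatility, with no risk-free rate. The trading rule in `trend_signal` is hypothetical (the abstract does not state the paper's rule), and all function names are illustrative.

```python
import math
from statistics import mean, stdev

WEEKS_PER_YEAR = 52


def info_sharpe(weekly_returns):
    """Annualised mean return over annualised volatility (assumed definition).

    Raises ZeroDivisionError if all returns are identical.
    """
    annual_return = mean(weekly_returns) * WEEKS_PER_YEAR
    annual_vol = stdev(weekly_returns) * math.sqrt(WEEKS_PER_YEAR)
    return annual_return / annual_vol


def trend_signal(search_volume, lookback=3):
    """Hypothetical rule, not taken from the paper.

    The position for week t uses only data up to week t-1: go long (+1) if
    last week's search volume was below the average of the `lookback` weeks
    before it, otherwise go short (-1). Weeks without enough history get 0.
    """
    signals = []
    for t in range(len(search_volume)):
        if t < lookback + 1:
            signals.append(0)
            continue
        previous = search_volume[t - 1]
        baseline = mean(search_volume[t - 1 - lookback : t - 1])
        signals.append(1 if previous < baseline else -1)
    return signals


def strategy_returns(index_returns, signals):
    """Weekly strategy return is the position times the index return."""
    return [s * r for s, r in zip(signals, index_returns)]
```

Calling `info_sharpe` on the strategy's weekly returns and on the raw index returns (the buy-and-hold case) gives the two figures that the abstract compares.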
Abstract:
Conventional web search engines are centralised: a single entity crawls and indexes the documents selected for future retrieval and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from technical drawbacks in scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution that alleviates the need to rely entirely on a single entity, as it distributes the functional components of a web search engine (from crawling and indexing documents to query processing) across the network of users (or peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges, which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search. Federated search systems are a form of distributed information retrieval that routes the user’s information need, formulated as a query, to distributed resources and merges the retrieved result lists into a final list. P2P-IR networks are one form of federated search, routing queries and merging results among participating peers. The query is propagated through the network to reach the peers that are most likely to contain relevant documents, and the retrieved result lists are then merged at different points along the path from the relevant peers back to the query initiator (the customer).
However, query routing is considered one of the major challenges and a critical component of P2P-IR networks: relevant peers might be missed through low-quality peer selection during query routing, which inevitably leads to less effective retrieval results. This motivates the thesis to study and propose query routing techniques that improve retrieval quality in such networks. Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise peers into semantically similar clusters, each of which is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level, as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI), which simulates the conventional inverted index of a centralised corpus to organise the statistics of peers’ terms. The results show competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer is treated as one big document made up of its documents. The experimental evaluation shows comparable and significant results and indicates that document retrieval methods are very effective for peer selection, which reinforces the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results against state-of-the-art resource selection methods and competitive results against corresponding classification-based approaches. Finally, I propose reputation-based query routing approaches that exploit the idea, familiar from social community networks, of collecting feedback on a specific item and managing it for future decision-making.
The system monitors users’ behaviour when they click on or download documents from the final ranked list, treats it as implicit feedback, and mines this information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback given on non-relevant documents), to examine the robustness of reputation-based approaches. The empirical evaluation shows significant results on almost all evaluation metrics, with an approximate improvement of more than 56% over baseline approaches. Thus, based on the results, if one were to choose a single technique, reputation-based approaches are the natural choice, and they can also be deployed on any P2P network.
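The thesis abstract above mentions treating each peer as one big document so that an ordinary document retrieval model can rank peers for query routing. A minimal sketch of that idea follows. The TF-IDF weighting, the whitespace tokenisation and all names are assumptions made for illustration, not the models evaluated in the thesis.

```python
import math
from collections import Counter


def build_peer_profiles(peers):
    """Concatenate each peer's documents into one bag of words ("big document")."""
    return {
        peer_id: Counter(term for doc in docs for term in doc.lower().split())
        for peer_id, docs in peers.items()
    }


def rank_peers(query, profiles, top_k=2):
    """Rank peers by a simple TF-IDF sum, with IDF computed over peers.

    Peers that share no term with the query are left out of the result.
    """
    n_peers = len(profiles)
    scores = {}
    for peer_id, profile in profiles.items():
        length = sum(profile.values()) or 1
        score = 0.0
        for term in query.lower().split():
            # Number of peers whose big document contains the term.
            peer_freq = sum(1 for p in profiles.values() if term in p)
            if peer_freq == 0 or term not in profile:
                continue
            idf = math.log(1 + n_peers / peer_freq)
            score += (profile[term] / length) * idf
        scores[peer_id] = score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [peer_id for peer_id, score in ranked[:top_k] if score > 0]
```

In a semi-structured network of the kind described, a super-peer could hold such profiles for the peers in its cluster and forward the query only to the peers returned by `rank_peers`.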
Abstract:
This paper discusses a framework in which catalog service communities are built, linked for interaction, and constantly monitored and adapted over time. A catalog service community (represented as a peer node in a peer-to-peer network) in our system can be viewed as a domain-specific data integration mediator representing the domain knowledge and the registry information. Query routing among communities is performed to identify a set of data sources that are relevant to answering a given query. The system monitors the interactions between the communities to discover patterns that may lead to restructuring of the network (e.g., irrelevant peers removed, new relationships created).
Abstract:
Until recently, integration of enterprise systems has been supported largely by monolithic architectures. From a technical perspective, this approach has been challenged by the suggestion of component-based enterprise systems. Lately, the nature of software as a proprietary item has been questioned through the increased use of open source software in business computing in general. This suggests the potential for altered technological and commercial constellations for the design of enterprise systems, which are presented in four scenarios. © Springer-Verlag 2004.
Abstract:
Enterprise systems are caught in the antinomy of appearing as a generic product while serving as the means of multiple integrations for the user through configuration and customisation. Technological and organisational integrations are defined by architectures and standardised interfaces. Until recently, technological integration of enterprise systems has been supported largely by monolithic architectures that were designed and maintained by the respective developers. From a technical perspective, this approach has been challenged by the suggestion of component-based enterprise systems that would allow for a more user-focused system through strict modularisation. Lately, the product nature of software as a proprietary item has been questioned through the rapid increase of open source programs that are being used in business computing in general, and also within the overall portfolio that makes up enterprise systems. This suggests the potential for altered technological and commercial constellations for the design of enterprise systems, which are presented in different scenarios. The technological and commercial decomposition of enterprise software and systems may also address some concerns that emerge from users’ experience of those systems and that may have arisen from their proprietary or product nature.
Abstract:
Our brief is to investigate the role of community and lifestyle in the making of a globally successful knowledge city region. Our approach is essentially pragmatic. We start by broadly examining knowledge-based urban development from a number of different perspectives. The first view is historical. In this context knowledge work and knowledge workers are seen as vital parts of a new emergent mode of production reliant on the continual production of abstract knowledge. We briefly develop this perspective to encompass the work of Richard Florida who has, notably, claimed: “Wherever talent goes, innovation, creativity, and economic growth are sure to follow.” Our next perspective examines concepts of knowledge and modes of its production to discover that knowledge is not an unchanging object but a human activity that changes in form and content through history. The suggestion emerges that not only is the production of contemporary ‘knowledge’ organised in a specific (and new) manner but also the output of this networked production is a particular type of knowledge (i.e. techné). The third perspective locates knowledge production and its workers in the contemporary urban context. As such, it co-ordinates the knowledge city in the increasingly global structure of cities and develops a typology of different groups of knowledge workers in their preferred urban environment(s). We see emerging here a distinctive geography of knowledge production. It is an urban phenomenon. There is, in short, something about the nature of cities that knowledge workers find particularly attractive. In the next, essentially anthropological, perspective we start to explore the needs and desires of the individual knowledge worker.
Beyond the needs basic to any modern human household an attempt is made to deduce, from a base understanding of knowledge work as mental labour, the compensatory cultural needs of the knowledge worker when not at work, and the expression of these needs in the urban fabric. Our final perspective consists of two case studies. In a review of the experiences of Austin, Texas and Singapore’s one-north precinct we collect empirical data on, respectively, a knowledge city that has sustained itself for over 50 years and an urban precinct newly launched into the global market for knowledge work and knowledge workers. Interwoven through all perspectives, in the form of apposite citation, is that of ‘expert opinion’ gathered in a rudimentary poll of academic and industry sources. This opinion appears in text boxes while details of the survey can be found in Appendix A. In the conclusion of the report we interpret the wide range of evidence gathered above in a policy frame. It is our hope this report will leave the reader with a clearer picture of the decisive organisational, infrastructural, aesthetic and social dimensions of a knowledge precinct.
Abstract:
Habitat fragmentation can have an impact on a wide variety of biological processes including abundance, life history strategies, mating system, inbreeding and genetic diversity levels of individual species. Although fragmented populations have received much attention, ecological and genetic responses of species to fragmentation have still not been fully resolved. The current study investigated the ecological factors that may influence the demographic and genetic structure of the giant white-tailed rat (Uromys caudimaculatus) within fragmented tropical rainforests. It is the first study to examine relationships between food resources, vegetation attributes and Uromys demography in a quantitative manner. Giant white-tailed rat densities were strongly correlated with specific suites of food resources rather than forest structure or other factors linked to fragmentation (i.e. fragment size). Several demographic parameters including the density of resident adults and juvenile recruitment showed similar patterns. Although data were limited, high quality food resources appear to initiate breeding in female Uromys. Where data were sufficient, influx of juveniles was significantly related to the density of high quality food resources that had fallen in the previous three months. Thus, availability of high quality food resources appears to be more important than either vegetation structure or fragment size in influencing giant white-tailed rat demography. These results support the suggestion that a species’ response to fragmentation can be related to its specific habitat requirements and can vary in response to local ecological conditions. In contrast to demographic data, genetic data revealed a significant negative effect of habitat fragmentation on genetic diversity and effective population size in U. caudimaculatus.
All three fragments showed lower levels of allelic richness, number of private alleles and expected heterozygosity compared with the unfragmented continuous rainforest site. Populations at all sites were significantly differentiated, suggesting restricted among-population gene flow. The combined effects of reduced genetic diversity, lower effective population size and restricted gene flow suggest that long-term viability of small fragmented populations may be at risk, unless effective management is employed in the future. A diverse range of genetic reproductive behaviours and sex-biased dispersal patterns were evident within U. caudimaculatus populations. Genetic paternity analyses revealed that the major mating system in U. caudimaculatus appeared to be polygyny at sites P1, P3 and C1. Evidence of genetic monogamy, however, was also found in the three fragmented sites, and was the dominant mating system in the remaining low density, small fragment (P2). High variability in reproductive skew and reproductive success was also found but was less pronounced when only resident Uromys were considered. Male body condition predicted which males sired offspring; however, neither body condition nor heterozygosity levels were accurate predictors of the number of offspring assigned to individual males or females. Genetic spatial autocorrelation analyses provided evidence for increased philopatry among females at site P1, but increased philopatry among males at site P3. This suggests that male-biased dispersal occurs at site P1 and female-biased dispersal at site P3, implying that in addition to mating systems, Uromys may also be able to adjust their dispersal behaviour to suit local ecological conditions. This study highlights the importance of examining the mechanisms that underlie population-level responses to habitat fragmentation using a combined ecological and genetic approach. The ecological data suggested that habitat quality (i.e.
high quality food resources) rather than habitat quantity (i.e. fragment size) was more important in influencing giant white-tailed rat demographics, at least for the populations studied here. Conversely, genetic data showed strong evidence that Uromys populations were affected adversely by habitat fragmentation and that management of isolated populations may be required for long-term viability of populations within isolated rainforest fragments.
Abstract:
With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Because a large number of Web services are available, finding an appropriate Web service that meets the user’s requirements is a challenge. This warrants an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods that improve the accuracy of Web service discovery in matching the best service. The process of Web service discovery typically suggests many individual services that only partially fulfil the user’s interest. Considering the semantic relationships between the words used to describe the services, together with their input and output parameters, can lead to more accurate Web service discovery. Appropriate linking of individual matched services should then fully satisfy the user’s requirements. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology is proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content of the Web Service Description Language (WSDL) document, a support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse domains of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of query terms that otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the user’s requirement. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking whether multiple Web services can be linked is done in the second phase.
Once the feasibility of linking Web services has been checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum traversal cost. The third phase, system integration, combines the results from the preceding two phases using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of a standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both the information-retrieval and the machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 of the Web services found in phase I for linking. Empirical results also confirm that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase I) and the link analysis (phase II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
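The link analysis step in this abstract models Web services as graph nodes and applies an all-pairs shortest-path algorithm. A minimal sketch using the Floyd-Warshall algorithm is given below. The abstract does not name the specific algorithm or define the link cost, so both are assumptions here, and the function and service names are illustrative.

```python
import math


def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall over a directed service graph.

    `edges` maps (source, target) to a non-negative link cost. The cost is a
    hypothetical measure of how expensive it is to feed the output of one
    service into the input of the next.
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        if cost < dist[u][v]:
            dist[u][v] = cost
            nxt[u][v] = v
    for k in nodes:  # allow k as an intermediate service
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def composition(nxt, source, target):
    """Rebuild the cheapest chain of services; an empty list means no link."""
    if source == target:
        return [source]
    if nxt[source][target] is None:
        return []
    path = [source]
    while source != target:
        source = nxt[source][target]
        path.append(source)
    return path
```

Because the table covers every pair of services, it can be computed once for the candidate services and then reused for any start and end service that a query requires.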
Abstract:
Peer-to-peer systems are widely used on the internet. However, most peer-to-peer information systems still lack some important features, for example cross-language IR (Information Retrieval) and collection selection/fusion. Cross-language IR is a state-of-the-art research area in the IR community that has not yet been used in any real-world IR system. It allows a user to issue a query in one language and receive documents in other languages. In a typical peer-to-peer environment, users come from multiple countries and their collections are in multiple languages, so cross-language IR can help users find documents more easily. For example, many Chinese researchers search for research papers in both Chinese and English; with cross-language IR, they can issue one query in Chinese and get documents in both languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in cross-language information retrieval. In recent years, web mining has been shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web-mining-based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score-based merging strategies are well-known approaches to these problems. However, such approaches consider only the content of the remote database and not the retrieval performance of the remote search engine. This thesis presents research on building a peer-to-peer IR system with cross-language IR and an advanced collection profiling technique for fusion.
In particular, this thesis first presents a new Chinese term measurement and a new Chinese MLU extraction process that works well on small corpora. An approach for selecting MLUs more accurately is also presented. The thesis then proposes a collection profiling strategy that can discover not only the content of a collection but also the retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer-to-peer environments. Here, an uncooperative environment is one in which each peer in the system is autonomous: peers are willing to share documents but do not share collection statistics. This is a typical peer-to-peer IR environment. Finally, all of these approaches are combined to build a secure peer-to-peer multilingual IR system that cooperates through X.509 and an email system.
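The abstract refers to normalized-score-based merging and to the idea that a collection profile could also reflect how well a remote engine retrieves. A minimal sketch of score-normalised merging with an optional weight for each peer is shown below. The min-max normalisation and the `weights` parameter are assumptions for illustration and are not the fusion approaches developed in the thesis.

```python
def min_max_normalise(results):
    """Map one engine's raw scores onto [0, 1] so that lists become comparable."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (score - low) / (high - low)) for doc, score in results]


def merge(result_lists, weights=None):
    """Merge per-peer result lists into a single ranking.

    `weights` is a hypothetical per-peer multiplier, for example an estimate
    of retrieval quality taken from a collection profile. Peers without an
    entry get weight 1.0. A document returned by several peers keeps its
    best weighted score.
    """
    weights = weights or {}
    merged = {}
    for peer_id, results in result_lists.items():
        if not results:
            continue
        weight = weights.get(peer_id, 1.0)
        for doc, score in min_max_normalise(results):
            merged[doc] = max(merged.get(doc, 0.0), weight * score)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

Without weights, every peer's top document receives the same normalised score whatever the engine's quality, which is the limitation the abstract attributes to merging on content alone.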
Abstract:
This project is an extension of a previous CRC project (220-059-B) which developed a program for life prediction of gutters in Queensland schools. A number of sources of information on service life of metallic building components were formed into databases linked to a Case-Based Reasoning Engine which extracted relevant cases from each source. In the initial software, no attempt was made to choose between the results offered or construct a case for retention in the casebase. In this phase of the project, alternative data mining techniques will be explored and evaluated. A process for selecting a unique service life prediction for each query will also be investigated. This report summarises the initial evaluation of several data mining techniques.
Abstract:
The project has further developed two programs for the industry partners, related to service life prediction and salt deposition. The program for the Queensland Department of Main Roads, which predicts salt deposition on different bridge structures at any point in Queensland, has been further refined by considering more variables. It was found that the height of the bridge significantly affects salt deposition levels only very close to the coast. The effect of natural cleaning of salt by rainfall, however, was incorporated into the program. The user interface allows selection of a location in Queensland, followed by a bridge component. The program then predicts the annual salt deposition rate and rates the likely severity of the environment. The service life prediction program for the Queensland Department of Public Works has been expanded to include 10 common building components in a variety of environments. Data mining procedures have been used to develop the program and increase the usefulness of the application. A Query Based Learning System (QBLS) has been developed, which is based on a data-centric model with extensions to support user interaction. The program is based on a number of sources of information about the service life of building components. These include the Delphi survey, the CSIRO Holistic model and a school survey. During the project, the Holistic model was modified for each building component and databases were generated for the locations of all Queensland schools. Experiments were carried out to verify the modelling and provide parameters for it. These included instrumentation of a downpipe, measurements of pH and chloride levels in leaf litter, EIS measurements and chromate leaching from Colorbond materials, and dose tests to measure corrosion rates of new materials. A further database was also generated for inclusion in the program through a large school survey.
Over 30 schools in a range of environments from tropical coastal to temperate inland were visited and the condition of the building components was rated on a scale of 0-5. The data were analysed and used to calculate an average service life for each component/material combination in each environment where sufficient examples were available.
Abstract:
This paper deals with the problem of using data mining models in real-world situations where the user cannot provide all the inputs with which the predictive model was built. A learning system framework, the Query Based Learning System (QBLS), is developed to improve the performance of predictive models in practice, where not all inputs are available when querying the system. An automatic feature selection algorithm, Query Based Feature Selection (QBFS), is developed to select features that balance a relatively small feature subset against relatively high classification accuracy. The performance of the QBLS system and the QBFS algorithm is successfully demonstrated with a real-world application.
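The trade-off in this abstract between a small feature subset and high classification accuracy can be sketched as a greedy forward selection that is limited to the inputs a user can supply. This is not the QBFS algorithm, whose details the abstract does not give. The 1-nearest-neighbour classifier, the leave-one-out estimate, the `penalty` value and all names are assumptions made for illustration.

```python
def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Only the columns listed in `features` are used to measure distance.
    """
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def select_features(rows, labels, available, penalty=0.02):
    """Greedy forward selection over the feature indices in `available`.

    A feature is added only if it raises the objective
    accuracy - penalty * (number of features), so that smaller subsets are
    preferred. `penalty` is a hypothetical tuning knob.
    """
    chosen, best = [], 0.0
    while True:
        candidates = [f for f in available if f not in chosen]
        scored = [
            (loo_accuracy(rows, labels, chosen + [f]) - penalty * (len(chosen) + 1), f)
            for f in candidates
        ]
        if not scored:
            break
        score, feature = max(scored)
        if score <= best:
            break
        chosen.append(feature)
        best = score
    return chosen, best
```

Passing only the columns the user can actually provide as `available` mirrors the situation the abstract describes, in which some inputs used to build the model are missing at query time.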