992 results for Web databases
Abstract:
Current-day web search engines (e.g., Google) do not crawl and index a significant portion of the Web, and hence web users who rely on search engines alone are unable to discover and access a large amount of information in the non-indexable part of the Web. Specifically, dynamic pages generated from parameters supplied by a user via web search forms (or search interfaces) are not indexed by search engines and cannot be found in search results. Such search interfaces provide web users with online access to myriads of databases on the Web. To obtain information from a web database of interest, a user issues a query by specifying query terms in a search form and receives the query results: a set of dynamic pages that embed the required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agent, including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale. In this thesis, our primary object of study is the huge portion of the Web (hereafter referred to as the deep Web) hidden behind web search interfaces. We concentrate on three classes of problems around the deep Web: characterizing the deep Web, finding and classifying deep web resources, and querying web databases.
Characterizing the deep Web: Though the term deep Web was coined in 2000, a long time ago for any web-related concept or technology, we still do not know many important characteristics of the deep Web. Another concern is that the surveys of the deep Web conducted so far are predominantly based on studies of deep web sites in English. One can therefore expect that findings from these surveys may be biased, especially given the steady increase in non-English web content. Thus, surveying national segments of the deep Web is of interest not only to national communities but to the whole web community as well. In this thesis, we propose two new methods for estimating the main parameters of the deep Web. We use the suggested methods to estimate the scale of one specific national segment of the Web and report our findings. We also build and make publicly available a dataset describing more than 200 web databases from the national segment of the Web.
Finding deep web resources: The deep Web has been growing at a very fast pace. It has been estimated that there are hundreds of thousands of deep web sites. Due to the huge volume of information in the deep Web, there has been significant interest in approaches that allow users and computer applications to leverage this information. Most approaches assume that search interfaces to the web databases of interest have already been discovered and are known to query systems. However, such assumptions do not hold, mostly because of the large scale of the deep Web: for any given domain of interest there are too many web databases with relevant content. Thus, the ability to locate search interfaces to web databases becomes a key requirement for any application accessing the deep Web. In this thesis, we describe the architecture of the I-Crawler, a system for finding and classifying search interfaces. Specifically, the I-Crawler is intentionally designed to be used in deep Web characterization studies and for constructing directories of deep web resources. Unlike almost all other existing approaches to the deep Web, the I-Crawler is able to recognize and analyze JavaScript-rich and non-HTML searchable forms.
Querying web databases: Retrieving information by filling out web search forms is a typical task for a web user, all the more so as the interfaces of conventional search engines are also web forms. At present, a user needs to manually provide input values to search interfaces and then extract the required data from the result pages. Filling out forms manually is cumbersome and infeasible for complex queries, yet such queries are essential for many web searches, especially in e-commerce. Thus, automating the querying and retrieval of data behind search interfaces is desirable and essential for tasks such as building domain-independent deep web crawlers and automated web agents, building vertical search engines for domain-specific information, and extracting and integrating information from various deep web resources. We present a data model for representing search interfaces and discuss techniques for extracting field labels, client-side scripts and structured data from HTML pages. We also describe a representation of result pages and discuss how to extract and store the results of form queries. In addition, we present a user-friendly and expressive form query language that allows one to retrieve information behind search interfaces and extract useful data from the result pages based on specified conditions. We implement a prototype system for querying web databases and describe its architecture and component design.
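To make the form-querying scenario above concrete, here is a minimal, hypothetical sketch of how an automated agent might submit a query through a web search interface and extract structured records from the result page. It is not the thesis' actual prototype or query language; the URL, form field names and CSS selectors are assumptions.

```python
# Illustrative sketch only: automating a form query against a hypothetical
# web database. The URL, field names, and selectors are assumptions.
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://example.org/books/search"   # hypothetical search interface


def query_web_database(title_term: str, max_price: float) -> list[dict]:
    """Submit a form query and scrape structured records from the result page."""
    response = requests.get(SEARCH_URL, params={"title": title_term, "price_max": max_price})
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    for row in soup.select("table.results tr")[1:]:          # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) >= 2:
            results.append({"title": cells[0], "price": cells[1]})
    return results


if __name__ == "__main__":
    for record in query_web_database("databases", 50.0):
        print(record)
```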
Abstract:
Socadi (Sociedad Catalana de Documentació i Informació) has been organizing a series of study and debate sessions that have been very well attended. Ernest Abadal summarized, first for IweTel and then for our journal, the session on "Intranets documentales" (documentary intranets).
Abstract:
The Library of the Universitat de Girona is responsible for coordinating the University's digital repositories. Initially, the repository arose from the need to disseminate and preserve all of the University's research, but little by little the project grew and different repositories were created for different types of content: DUGiDocs, DUGiMedia and DUGiFonsEspecials. The goal of this final degree project (PFC) is to develop a web portal with an indexing system that allows searching by different criteria, such as author, subject and title. The environment must have a design that is pleasant for the end user and must be able to retrieve the repository data requested. It must also be able to detect possible duplicates in the search criteria and notify the administrator.
Abstract:
Prenatal immune challenge (PIC) in pregnant rodents produces offspring with abnormalities in behavior, histology, and gene expression that are reminiscent of schizophrenia and autism. Based on this, the goal of this article was to review the main contributions of PIC models, especially the one using the viral-mimetic particle polyriboinosinic-polyribocytidylic acid (poly-I:C), to the understanding of the etiology, biological basis and treatment of schizophrenia. This systematic review consisted of a search of available web databases (PubMed, SciELO, LILACS, PsycINFO, and ISI Web of Knowledge) for original studies published in the last 10 years (May 2001 to October 2011) concerning animal models of PIC, focusing on those using poly-I:C. The results showed that the PIC model with poly-I:C is able to mimic the prodrome and both the positive and negative/cognitive dimensions of schizophrenia, depending on the specific gestational time window of the immune challenge. The model resembles the neurobiology and etiology of schizophrenia and has good predictive value. In conclusion, this model is a robust tool for identifying novel molecular targets during prenatal life, adolescence and adulthood that might contribute to the development of preventive and/or treatment strategies (targeting specific symptoms, i.e., positive or negative/cognitive) for this devastating mental disorder, while also offering greater biosafety than viral infection models. One limitation of this model is its inability to reproduce the full spectrum of immune responses normally induced by viral exposure.
Abstract:
Spatial data is now used extensively in the Web environment, providing online customized maps and supporting map-based applications. The full potential of Web-based spatial applications, however, has yet to be achieved because of performance issues related to the large size and high complexity of spatial data. In this paper, we introduce a multiresolution approach to spatial data management and query processing such that the database server can choose spatial data at the right resolution level for different Web applications. One highly desirable property of the proposed approach is that the server-side processing cost and network traffic are reduced when the level of resolution required by an application is low. Another advantage is that our approach pushes complex multiresolution structures and algorithms into the spatial database engine, so the developer of spatial Web applications need not be concerned with such complexity. This paper explains the basic idea, technical feasibility and applications of multiresolution spatial databases.
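As an illustration of the multiresolution idea described above, the sketch below shows a server-side routine that returns a geometry simplified to the resolution level an application asks for, so that coarse overview maps transfer far fewer vertices than full-detail views. The tolerance values and level mapping are assumptions, not the paper's actual engine.

```python
# Minimal sketch: the server picks a simplified version of a geometry that
# matches the resolution a Web application needs, reducing network traffic.
from shapely.geometry import LineString

# Tolerances (in map units) for a few hypothetical resolution levels.
TOLERANCE_BY_LEVEL = {0: 10.0, 1: 1.0, 2: 0.1}   # 0 = coarsest overview, 2 = most detail


def geometry_at_resolution(geom: LineString, level: int) -> LineString:
    """Return the geometry simplified to the requested resolution level."""
    tolerance = TOLERANCE_BY_LEVEL.get(level, 0.0)
    return geom.simplify(tolerance, preserve_topology=True) if tolerance > 0 else geom


coastline = LineString([(0, 0), (1, 0.2), (2, -0.1), (3, 0.3), (4, 0)])
overview = geometry_at_resolution(coastline, level=0)
print(len(coastline.coords), "->", len(overview.coords), "vertices sent to the client")
```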
Abstract:
Summary: Web-based reference databases and subject directories in the agricultural and food sciences - the perspective of a Finnish information seeker
Abstract:
As enterprises grow and the need to share information across departments and business areas becomes more critical, companies are turning to integration as a method for interconnecting heterogeneous, distributed and autonomous systems. Whether the sales application needs to interface with the inventory application or the procurement application needs to connect to an auction site, it seems that any application can be made better by integrating it with other applications. Integration between applications can face several difficulties, since the applications may not have been designed and implemented with integration in mind. With regard to integration, two-tier software systems, composed of a database tier and a "front-end" tier (interface), have shown some limitations. As a solution to overcome these limitations, three-tier systems were proposed in the literature. By adding a middle tier (referred to as middleware) between the database tier and the "front-end" tier (or simply the application), three main benefits emerge. The first is that dividing software systems into three tiers enables increased integration capabilities with other systems. The second is that modifications to individual tiers may be carried out without necessarily affecting the other tiers and integrated systems, and the third, a consequence of the others, is that fewer maintenance tasks are required in the software system and in all integrated systems. Concerning software development in three tiers, this dissertation focuses on two emerging technologies, the Semantic Web and Service Oriented Architecture, combined with middleware. Blending these two technologies with middleware resulted in the development of the Swoat framework (Service and Semantic Web Oriented ArchiTecture) and leads to the following four synergistic advantages: (1) it allows the creation of loosely coupled systems, decoupling the database from "front-end" tiers and therefore reducing maintenance; (2) the database schema is transparent to "front-end" tiers, which are aware only of the information model (or domain model) that describes what data is accessible; (3) integration with other heterogeneous systems is enabled through services provided by the middleware; (4) the service request by the "front-end" tier focuses on 'what' data is needed and not on 'where' and 'how' it is stored, thereby reducing application development time.
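The middle-tier idea can be made concrete with a small sketch: the front end requests a domain concept by name and the middleware resolves it to schema-level queries, so the caller never sees table names or joins. This is an illustrative sketch under assumed names, not the actual Swoat implementation.

```python
# Illustrative middle-tier sketch: the front end asks for a domain concept
# ("customer_orders") and the middleware hides the schema and join details.
import sqlite3

# Hypothetical mapping from domain-model concepts to schema-level SQL.
DOMAIN_QUERIES = {
    "customer_orders": """
        SELECT c.name, o.id, o.total
        FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
        WHERE c.name = ?
    """,
}


def service_request(concept: str, *params):
    """Middle-tier service: resolves a domain concept to data, shielding the
    caller from tables and joins (the 'what', not the 'where'/'how')."""
    sql = DOMAIN_QUERIES[concept]
    with sqlite3.connect("shop.db") as conn:     # hypothetical database file
        return conn.execute(sql, params).fetchall()


# A front-end tier would call the service by concept name only, e.g.:
# rows = service_request("customer_orders", "Alice")
```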
Abstract:
This chapter presents fuzzy cognitive maps (FCM) as a vehicle for Web knowledge aggregation, representation, and reasoning. The corresponding Web KnowARR framework incorporates findings from fuzzy logic. The first emphasis is on the Web KnowARR framework itself; a stakeholder management use case then illustrates the framework's usefulness as a second focal point. This form of management helps projects gain acceptance and assertiveness by actively involving stakeholder claims on company decisions in the management process. Stakeholder maps visually (re-)present these claims. On the one hand, they draw on non-public content; on the other, on content that is publicly available (mostly on the Web). The Semantic Web offers opportunities not only to present public content descriptively but also to show relationships. The proposed framework can serve as the basis for the public content of stakeholder maps.
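To give a concrete flavour of the fuzzy-cognitive-map machinery the chapter builds on, the following generic sketch iterates an FCM state vector through a weight matrix with a sigmoid squashing function. The concepts, weights and update rule shown are illustrative assumptions rather than the chapter's actual stakeholder model.

```python
# Generic FCM inference sketch (illustrative concepts and weights).
import numpy as np

concepts = ["public claim", "media coverage", "company decision"]   # hypothetical
# W[i, j] = causal influence of concept i on concept j, in [-1, 1].
W = np.array([
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0],
])


def fcm_infer(activation: np.ndarray, weights: np.ndarray, steps: int = 10) -> np.ndarray:
    """Iteratively propagate activations: A(t+1) = sigmoid(A(t) + A(t) @ W)."""
    a = activation.astype(float)
    for _ in range(steps):
        a = 1.0 / (1.0 + np.exp(-(a + a @ weights)))   # sigmoid keeps values in (0, 1)
    return a


initial = np.array([1.0, 0.0, 0.0])     # a stakeholder claim is raised
print(dict(zip(concepts, fcm_infer(initial, W).round(2))))
```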
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives to the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and produce a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization, computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases, in response to the need for appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
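As an illustration of the Interactome module's output format, the sketch below builds a tiny, made-up interaction network and serializes it as XGMML, the format Cytoscape imports. It is a generic example, not code from IIS itself.

```python
# Sketch: serialize a small (made-up) protein-interaction network as XGMML.
import xml.etree.ElementTree as ET

interactions = [("P53", "MDM2"), ("P53", "BAX")]      # hypothetical interaction pairs


def to_xgmml(edges, graph_label="example network") -> str:
    """Build a minimal XGMML document from a list of interaction edges."""
    graph = ET.Element("graph", {"label": graph_label,
                                 "xmlns": "http://www.cs.rpi.edu/XGMML"})
    node_ids = {}
    for a, b in edges:
        for name in (a, b):
            if name not in node_ids:                   # add each node once
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(graph, "node", {"id": node_ids[name], "label": name})
        ET.SubElement(graph, "edge", {"source": node_ids[a], "target": node_ids[b]})
    return ET.tostring(graph, encoding="unicode")


print(to_xgmml(interactions))
```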
Impact of Commercial Search Engines and International Databases on Engineering Teaching and Research
Abstract:
For the last three decades, engineering higher education and professional environments have been completely transformed by the "electronic/digital information revolution", which has included the introduction of the personal computer, the development of email and the world wide web, and broadband Internet connections at home. Herein the writer compares the performance of several digital tools with that of traditional library resources. While new specialised search engines and open-access digital repositories may fill a gap between conventional search engines and traditional references, they should not be confused with real libraries and international scientific databases that encompass textbooks and peer-reviewed scholarly works. Absence from some Internet search listings, databases and repositories is not an indication of standing. Researchers, engineers and academics should remember these key differences when assessing the quality of bibliographic "research" based solely upon Internet searches.
Abstract:
Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW).
Abstract:
This dissertation analyses international and Brazilian scientific and technological output in the field of Civil Engineering by means of bibliometric indicators. Civil Engineering was chosen because of its relevance to the country's economic development; nevertheless, in both absolute and relative terms, it is among the technologically most backward sectors of the economy. Bibliometrics is a discipline of multidisciplinary reach that studies the use and the quantitative aspects of recorded scientific output. Indicators of scientific output are analysed in several fields of knowledge, both for planning and implementing public policies in various sectors and for giving the scientific community a better understanding of the system in which it operates. The methodology used for this exploratory, descriptive study was documentary and bibliometric analysis, based on data from scientific publications, from 1970 to 2012, and technological publications, from 2001 to 2012, in the field of Civil Engineering, indexed in the Science Citation Index Expanded (SCI), Social Science Citation Index (SSCI), Conference Proceedings Citation Index (CPCI) and Derwent Innovations Index (DII) databases, which make up the multidisciplinary Web of Science (WoS) database. The information was qualified and quantified with the aid of the bibliometric software VantagePoint®. The results confirmed the low number of scientific and technological publications in Civil Engineering by authors affiliated with Brazilian teaching and research institutions compared with those from industrialized countries. There is a set of strong constraints, beyond the decision-making power and influence of academia, that hinder and limit the dissemination of Brazilian research and patents, related to systemic and cultural factors. The possibility of analysing indicators of scientific and technological output in Civil Engineering helps to create policies which, if used by funding agencies, can support better-grounded investments by governments and the private sector, as is done in other industrial sectors.