951 results for knowledge discovery


Relevance:

30.00%

Publisher:

Abstract:

Information-centric networking (ICN) enables communication in isolated islands where fixed infrastructure is not available, but also supports seamless communication once the infrastructure is up and running again. In disaster scenarios, when the fixed infrastructure is broken, content discovery algorithms are required to learn what content is locally available. For example, if the preferred content is not available, users may also be satisfied with second-best options. In this paper, we describe a new content discovery algorithm and compare it to existing Depth-first and Breadth-first traversal algorithms. Evaluations in mobile scenarios with up to 100 nodes show that it achieves better performance, i.e., faster discovery time and smaller traffic overhead, than the existing algorithms.
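
The abstract does not detail the new algorithm, so as context, here is a minimal sketch of the Breadth-first traversal baseline the paper compares against: a discovery request spreads hop by hop through an ad-hoc topology until a node holding the requested content is found. The node names, cached content, and hop limit are illustrative assumptions, not the paper's setup.

```python
from collections import deque

def bfs_content_discovery(topology, start, wanted, max_hops=5):
    """Breadth-first content discovery over an ad-hoc topology.

    topology: dict mapping node -> list of neighbour nodes
    start:    node that issues the discovery request
    wanted:   content name to look for
    max_hops: TTL limiting how far the request propagates
    Returns (node, hops) of the first provider found, or None.
    """
    content_store = {"n3": {"video/clip1"}, "n7": {"video/clip2"}}  # illustrative caches
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, hops = queue.popleft()
        if wanted in content_store.get(node, set()):
            return node, hops
        if hops == max_hops:
            continue
        for neighbour in topology.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

topology = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n7"], "n4": [], "n7": []}
print(bfs_content_discovery(topology, "n1", "video/clip2"))  # ('n7', 2)
```

A Depth-first baseline would replace the FIFO queue with a stack; the paper's evaluation concerns how quickly such traversals locate content and how much traffic they generate.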

Relevance:

30.00%

Publisher:

Abstract:

Divalent metal transporter-1 (SLC11A2/DMT1) uses the H+ electrochemical gradient as the driving force to transport divalent metal ions such as Fe2+, Mn2+ and other metals into mammalian cells. DMT1 is ubiquitously expressed, most notably in the proximal duodenum, immature erythroid cells, brain and kidney. This transporter mediates H+-coupled transport of ferrous iron across the apical membrane of enterocytes. In addition, in cells such as erythroid precursors, it mediates the H+-coupled exit of ferrous iron from endocytic vesicles into the cytosol following transferrin receptor (TfR)-mediated endocytosis. Dysfunction of human DMT1 is associated with several pathologies such as iron deficiency anemia, hemochromatosis, Parkinson's disease and Alzheimer's disease, as well as colorectal cancer and esophageal adenocarcinoma, making DMT1 an attractive target for drug discovery. In the present study, we performed a ligand-based virtual screening of the Princeton database (700,000 commercially available compounds) to search for pharmacophore shape analogs of recently reported DMT1 inhibitors. We discovered a new compound, named pyrimidinone 8, which mediates reversible, linear, non-competitive inhibition of human DMT1 (hDMT1) transport activity with a Ki of ∼20 μM. This compound does not affect hDMT1 cell surface expression and shows no dependence on extracellular pH. To our knowledge, this is the first experimental evidence that hDMT1 can be allosterically modulated by pharmacological agents. Pyrimidinone 8 represents a novel, versatile tool compound and may serve as a lead structure for the development of therapeutic compounds for pre-clinical assessment.
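
For reference only (not taken from the paper), the textbook rate law for reversible non-competitive inhibition shows how a Ki of ∼20 μM would scale down the maximal transport rate without changing the apparent substrate affinity:

```latex
v \;=\; \frac{V_{\max}\,[S]}{\left(K_m + [S]\right)\left(1 + \dfrac{[I]}{K_i}\right)}
```

Under this idealized scheme, an inhibitor concentration equal to Ki (here roughly 20 μM) halves the apparent Vmax.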

Relevance:

30.00%

Publisher:

Abstract:

This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers to reuse services and contents to build new web applications. However, developers face a problem of information flood when searching for appropriate services or resources to combine. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, discovery is performed at three levels: content, service, and agent. The content level involves the information available in web resources. The web follows the Representational State Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Cascading Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of Semantic Web standards such as the Resource Description Framework (RDF) makes this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred to as Screen Scraping in the literature, corresponds to content discovery in the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources' representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained from them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level, feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. Discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Service discovery can also be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow specifying behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that introspect the discovered data and services from the web, as well as the knowledge present in service and content discovery rules, to anticipate the contents and services to be found on specific resources from the web. By the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. The Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web.
Similarly, the Resulta and VulneraNET projects cover the discovery of ideas and of security knowledge on the web, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories on the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks on the web. The main contributions of the thesis are: the unified framework for discovery, which allows configuring agents to perform automated tasks; a scraping ontology for the construction of mappings for scraping web resources; a novel first-order logic rule induction algorithm for the automated construction and maintenance of these mappings from the visual information in web resources; and a common unified model for the discovery of services, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the scraping ontology, extensions of the agent model, and constructing a base of discovery rules.
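
The thesis's scraping ontology and rule syntax are not reproduced in the abstract; the following minimal sketch only illustrates the general idea of a content discovery rule, mapping fragments of an HTML representation onto semantic entities via CSS selectors. The selectors, class names, vocabulary URIs and libraries (BeautifulSoup, rdflib) are illustrative assumptions, not the thesis's implementation.

```python
from bs4 import BeautifulSoup
from rdflib import RDF, Graph, Literal, Namespace, URIRef

# Hypothetical discovery rule: CSS selectors mapped onto properties of a semantic entity.
NEWS = Namespace("http://example.org/news#")
rule = {
    "entity_type": NEWS.Article,
    "selectors": {NEWS.headline: "h1.headline", NEWS.author: "span.byline"},
}

def apply_discovery_rule(html, base_uri, rule):
    """Map pieces of an HTML representation onto RDF triples."""
    soup = BeautifulSoup(html, "html.parser")
    graph = Graph()
    subject = URIRef(base_uri)
    graph.add((subject, RDF.type, rule["entity_type"]))
    for prop, selector in rule["selectors"].items():
        node = soup.select_one(selector)
        if node is not None:
            graph.add((subject, prop, Literal(node.get_text(strip=True))))
    return graph

html = '<html><h1 class="headline">Budget approved</h1><span class="byline">A. Writer</span></html>'
for triple in apply_discovery_rule(html, "http://example.org/news/42", rule):
    print(triple)
```

In the framework described above, service discovery rules would then be applied to contents discovered in this way (service probing).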

Relevance:

30.00%

Publisher:

Abstract:

While others have attempted to determine, by way of mathematical formulae, optimal resource duplication strategies for random walk protocols, this paper is concerned with studying the emergent effects of dynamic resource propagation and replication. In particular, we show, via modelling and experimentation, that under any given decay (purge) rate, the number of nodes that have knowledge of a particular resource converges to a fixed point or a limit cycle. We also show that even for high rates of decay - that is, when few nodes have knowledge of a particular resource - the number of hops required to find that resource is small.
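
The paper's model equations are not given in the abstract; the toy simulation below (all parameters assumed) only illustrates the qualitative claim: with replication at each step and a fixed purge probability, the number of nodes holding a resource settles around a stable level rather than growing without bound.

```python
import random

def simulate(num_nodes=100, purge_rate=0.2, steps=200, seed=1):
    """Toy model: knowledge of one resource spreads by random replication
    and decays with a fixed purge probability per step."""
    random.seed(seed)
    knows = {0}                       # initially only node 0 holds the resource
    history = []
    for _ in range(steps):
        # replication: every knowing node copies the resource to one random node
        for node in list(knows):
            knows.add(random.randrange(num_nodes))
        # decay: each knowing node independently purges with probability purge_rate,
        # but the origin node is kept so the resource never disappears entirely
        knows = {n for n in knows if random.random() > purge_rate} or {0}
        history.append(len(knows))
    return history

history = simulate()
print("nodes holding the resource, last 10 steps:", history[-10:])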

Relevance:

30.00%

Publisher:

Abstract:

This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user explore and understand large multi-dimensional datasets effectively. The advantage of such a framework over other techniques currently available to domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows the expert to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms to develop a guided mixture of local experts algorithm, which provides robust prediction, and a model that estimates feature saliency simultaneously with the training of a projection algorithm. Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process, which is suitable for an intuition- and domain-knowledge-driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of a visualisation model.
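
As a rough illustration of the mixture of local experts idea (not the thesis's GTM-based hierarchy), the sketch below substitutes a plain Gaussian mixture for the soft segmentation and fits one responsibility-weighted linear expert per segment; the data and parameters are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic heterogeneous data: two regions with different local behaviour.
X = np.concatenate([rng.normal(-2, 0.5, (200, 1)), rng.normal(2, 0.5, (200, 1))])
y = np.where(X[:, 0] < 0, 3 * X[:, 0] + 1, -2 * X[:, 0] + 5) + rng.normal(0, 0.1, 400)

# Soft segmentation of the input space (stand-in for the visualisation hierarchy).
gate = GaussianMixture(n_components=2, random_state=0).fit(X)
responsibilities = gate.predict_proba(X)            # shape (400, 2)

# One local expert per segment, weighted by the soft responsibilities.
experts = [LinearRegression().fit(X, y, sample_weight=responsibilities[:, k]) for k in range(2)]

def predict(X_new):
    r = gate.predict_proba(X_new)
    preds = np.column_stack([e.predict(X_new) for e in experts])
    return (r * preds).sum(axis=1)                   # responsibility-weighted blend

print(predict(np.array([[-2.0], [2.0]])))            # approx. [-5.0, 1.0]
```

In the thesis, the soft segmentation instead comes from a domain-expert-guided visualisation hierarchy, which is what makes the mixture "guided".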

Relevance:

30.00%

Publisher:

Abstract:

To date, more than 16 million citations of published articles in the biomedical domain are available in the MEDLINE database. These articles describe the new discoveries that have accompanied the tremendous development of biomedicine during the last decade. It is crucial for biomedical researchers to retrieve and mine specific knowledge from this huge quantity of published articles with high efficiency. Researchers have been engaged in the development of text mining tools to find knowledge, such as protein-protein interactions, that is most relevant and useful for specific analysis tasks. This chapter provides a road map to the various information extraction methods in the biomedical domain, such as protein name recognition and the discovery of protein-protein interactions. Disciplines involved in analyzing and processing unstructured text are summarized. Current work in biomedical information extraction is categorized. Challenges in the field are also presented and possible solutions are discussed.
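
One of the simplest techniques on such a road map is dictionary-based protein name recognition combined with sentence-level co-occurrence as a weak signal of interaction; the sketch below uses invented sentences and a three-entry lexicon purely for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Toy protein dictionary; a real system would use curated lexicons (e.g. UniProt names).
proteins = {"BRCA1", "TP53", "MDM2"}

sentences = [
    "MDM2 binds TP53 and inhibits its transactivation activity.",
    "BRCA1 expression was measured in tumour samples.",
    "TP53 stability is regulated by MDM2 in vivo.",
]

def recognise(sentence):
    """Dictionary-based protein name recognition on token boundaries."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
    return sorted(proteins & tokens)

pair_counts = Counter()
for s in sentences:
    for a, b in combinations(recognise(s), 2):
        pair_counts[(a, b)] += 1

print(pair_counts.most_common())   # [(('MDM2', 'TP53'), 2)]
```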

Relevance:

30.00%

Publisher:

Abstract:

The World Wide Web provides plentiful content for Web-based learning, but its hyperlink-based architecture connects Web resources for free browsing rather than for effective learning. To support effective learning, an e-learning system should be able to discover and make use of the semantic communities and the emerging semantic relations in a dynamic complex network of learning resources. Previous graph-based community discovery approaches are limited in their ability to discover semantic communities. This paper first suggests the Semantic Link Network (SLN), a loosely coupled semantic data model that can semantically link resources and derive implicit semantic links according to a set of relational reasoning rules. By studying the intrinsic relationship between semantic communities and the semantic space of the SLN, approaches to discovering reasoning-constraint, rule-constraint, and classification-constraint semantic communities are proposed. Further, the approaches, principles, and strategies for discovering emerging semantics in dynamic SLNs are studied. The basic laws of semantic link network motion are revealed for the first time. An e-learning environment incorporating the proposed approaches, principles, and strategies to support effective discovery and learning is suggested.
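
The paper's actual reasoning rules are not listed in the abstract; the sketch below only illustrates the general mechanism of deriving implicit semantic links by composing explicit ones under a small, made-up rule set (the resources and relation names are invented).

```python
# Explicit semantic links between learning resources: (source, relation, target)
links = {
    ("intro-to-xml", "prerequisiteOf", "xpath-basics"),
    ("xpath-basics", "prerequisiteOf", "xquery"),
    ("xquery", "partOf", "semistructured-data-course"),
}

# Illustrative relational reasoning rules: composing r1 and r2 yields a derived relation.
rules = {
    ("prerequisiteOf", "prerequisiteOf"): "prerequisiteOf",   # transitivity
    ("prerequisiteOf", "partOf"): "relatedTo",
}

def derive_implicit_links(links, rules):
    """Repeatedly compose pairs of links until no new link can be derived."""
    derived = set(links)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(derived):
            for (b2, r2, c) in list(derived):
                if b == b2 and (r1, r2) in rules:
                    new = (a, rules[(r1, r2)], c)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived - links

for link in sorted(derive_implicit_links(links, rules)):
    print(link)
```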

Relevance:

30.00%

Publisher:

Abstract:

In current organizations, valuable enterprise knowledge is often buried under a rapidly expanding amount of unstructured information in the form of web pages, blogs, and other forms of human text communication. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments: an expert evaluation, a quantitative benchmark, and an application of CORDER in a social networking tool called BuddyFinder.
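
CORDER's scoring function is not given in the abstract; the sketch below illustrates the general idea of associating a person with other named entities through page-level co-occurrence counts. The entity names and pages are invented, and the NER step is assumed to have already run.

```python
from collections import Counter

# Named entities recognised on each page (output of an upstream NER step, hand-written here).
pages = [
    {"people": {"A. Smith"}, "projects": {"Project X"}, "orgs": {"KMi"}},
    {"people": {"A. Smith", "B. Jones"}, "projects": {"Project X", "Project Y"}, "orgs": set()},
    {"people": {"B. Jones"}, "projects": {"Project Y"}, "orgs": {"KMi"}},
]

def related_entities(person, pages, entity_type):
    """Rank entities of one type by how often they co-occur with a person on a page."""
    counts = Counter()
    for page in pages:
        if person in page["people"]:
            counts.update(page[entity_type])
    return counts.most_common()

print(related_entities("A. Smith", pages, "projects"))
# [('Project X', 2), ('Project Y', 1)]
```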

Relevance:

30.00%

Publisher:

Abstract:

Discovering who works with whom, on which projects, and with which customers is a key task in knowledge management. Although most organizations keep models of organizational structures, these models do not necessarily accurately reflect the reality on the ground. In this paper we present a text mining method called CORDER which first recognizes named entities (NEs) of various types from Web pages, and then discovers relations from a target NE to other NEs which co-occur with it. We evaluated the method on our departmental Website. We used the CORDER method to first find related NEs of four types (organizations, people, projects, and research areas) from Web pages on the Website and then rank them according to their co-occurrence with each of the people in our department. Twenty representative people were selected, and each of them was presented with ranked lists of each type of NE. Each person specified whether these NEs were related to him/her and changed or confirmed their rankings. Our results indicate that the method can find the NEs with which these people are closely related and provide accurate rankings.
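
The abstract does not name an agreement measure for the confirmed or corrected rankings; a rank correlation such as Spearman's rho is one conventional way to quantify how close a system ranking is to a person's corrected ranking (the ranks below are made up).

```python
from scipy.stats import spearmanr

# System ranking of five related projects for one person, and that person's corrected ranking.
system_rank = [1, 2, 3, 4, 5]
person_rank = [1, 3, 2, 4, 5]     # the person swapped items 2 and 3

rho, p_value = spearmanr(system_rank, person_rank)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```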

Relevance:

30.00%

Publisher:

Abstract:

Ageing is accompanied by many visible characteristics. Other biological and physiological markers are also well described, e.g. the loss of circulating sex hormones and increased inflammatory cytokines. Biomarkers for healthy ageing studies are presently predicated on existing knowledge of ageing traits. The increasing availability of data-intensive methods enables deep analysis of biological samples for novel biomarkers. We adopted two discrete approaches for biomarker discovery in MARK-AGE Work Package 7: (1) microarray analyses and/or proteomics in cell systems, e.g. endothelial progenitor cells or T cell ageing including a stress model; and (2) investigation of cellular material and plasma directly from tightly defined proband subsets of different ages using proteomic, transcriptomic and miR arrays. The first approach provided longitudinal insight into endothelial progenitor and T cell ageing. This review describes the strategy and use of hypothesis-free, data-intensive approaches to explore cellular proteins, miR, mRNA and plasma proteins as healthy ageing biomarkers, using ageing models and directly within samples from adults of different ages. It considers the challenges associated with integrating multiple models and pilot studies as rational biomarkers for a large cohort study. From this approach, a number of high-throughput methods were developed to evaluate novel, putative biomarkers of ageing in the MARK-AGE cohort.

Relevance:

30.00%

Publisher:

Abstract:

Entrepreneurial opportunity recognition is an increasingly prevalent phenomenon. Of particular interest is the ability of promising technology-based ventures to recognize and exploit opportunities. Recent research drawing on Austrian economic theory emphasizes the importance of knowledge, particularly market knowledge, behind opportunity recognition. While insightful, this research has tended to overlook the interrelationships that exist between different types of knowledge (technology and market knowledge) as well as between a firm's knowledge base and its entrepreneurial orientation. Additional shortfalls of prior research include the ambiguous definitions provided for entrepreneurial opportunities, the oversight of opportunity exploitation due to an extensive focus on opportunity recognition only, and the lack of quantitative, empirical evidence on entrepreneurial opportunity recognition. In this dissertation, these research gaps are addressed by integrating the Schumpeterian opportunity development view with the Kirznerian opportunity discovery theory as well as insights from the literature on entrepreneurial orientation. A sample of 85 new biotechnology ventures from the United States, Finland, and Sweden was analyzed. While leaders in all 85 companies were interviewed for the research in 2003-2004, 42 firms provided data in 2007. Data was analyzed using regression analysis. The results show the value and importance of early market knowledge and technology knowledge, as well as an entrepreneurial company posture, for subsequent opportunity recognition. The highest numbers of new opportunities are recognized in firms where high levels of market knowledge are combined with high levels of technology knowledge (measured by the number of patents). A firm's entrepreneurial orientation also enhances its opportunity recognition. Furthermore, the results show that new ventures with more market knowledge are able to gather more equity investments, license out more technologies, and achieve higher sales than new ventures with lower levels of market knowledge. Overall, the findings of this dissertation help further our understanding of the sources of entrepreneurial opportunities, and should encourage further research in this area.

Relevance:

30.00%

Publisher:

Abstract:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.
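
The Double-Lazy Parser itself is not reproduced in the abstract; as a point of reference only, the standard library's iterparse below illustrates the basic lazy idea of materialising elements on demand instead of building the whole DOM up front (the XML snippet is made up).

```python
import io
import xml.etree.ElementTree as ET

xml = io.BytesIO(
    b"<articles>"
    b"<article id='1'><title>Lazy parsing</title></article>"
    b"<article id='2'><title>Domain ontologies</title></article>"
    b"</articles>"
)

# Stream elements as they are completed instead of building the whole tree first.
for event, elem in ET.iterparse(xml, events=("end",)):
    if elem.tag == "article":
        print(elem.get("id"), elem.findtext("title"))
        elem.clear()   # discard the processed subtree to keep memory bounded
```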

Relevance:

30.00%

Publisher:

Abstract:

Entrepreneurial opportunity recognition is an increasingly prevalent phenomenon. Of particular interest is the ability of promising technology-based ventures to recognize and exploit opportunities. Recent research drawing on Austrian economic theory emphasizes the importance of knowledge, particularly market knowledge, behind opportunity recognition. While insightful, this research has tended to overlook the interrelationships that exist between different types of knowledge (technology and market knowledge) as well as between a firm's knowledge base and its entrepreneurial orientation. Additional shortfalls of prior research include the ambiguous definitions provided for entrepreneurial opportunities, the oversight of opportunity exploitation due to an extensive focus on opportunity recognition only, and the lack of quantitative, empirical evidence on entrepreneurial opportunity recognition. In this dissertation, these research gaps are addressed by integrating the Schumpeterian opportunity development view with the Kirznerian opportunity discovery theory as well as insights from the literature on entrepreneurial orientation. A sample of 85 new biotechnology ventures from the United States, Finland, and Sweden was analyzed. While leaders in all 85 companies were interviewed for the research in 2003-2004, 42 firms provided data in 2007. Data was analyzed using regression analysis. The results show the value and importance of early market knowledge and technology knowledge, as well as an entrepreneurial company posture, for subsequent opportunity recognition. The highest numbers of new opportunities are recognized in firms where high levels of market knowledge are combined with high levels of technology knowledge (measured by the number of patents). A firm's entrepreneurial orientation also enhances its opportunity recognition. Furthermore, the results show that new ventures with more market knowledge are able to gather more equity investments, license out more technologies, and achieve higher sales than new ventures with lower levels of market knowledge. Overall, the findings of this dissertation help further our understanding of the sources of entrepreneurial opportunities, and should encourage further research in this area.

Relevance:

30.00%

Publisher:

Abstract:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.

Relevance:

30.00%

Publisher:

Abstract:

Tsar Peter the Great ruled Russia between 1689 and 1725. His domains stretched from the Baltic Sea in the west to the Pacific Ocean in the east, and from the Arctic Ocean in the north to the borders with China and India in the south. Tsar Peter I sought to extend the geographical knowledge of his government and of the rest of the world. He was also interested in the expansion of trade in Russia and in the control of trade routes. Feodor Luzhin and Ivan Yeverinov explored the eastern border of the Russian Empire between 1719 and 1721 and reported to the Tsar. They had crossed the Kamchatka peninsula from west to east and had travelled from the west coast of Kamchatka to the Kuril Islands. The information they collected led to the first map of Kamchatka and the Kuril Islands. Tsar Peter ordered Bering to navigate the Russian Pacific coast, build ships, and sail the northern seas along the coast towards the regions of America. The second expedition met with difficulties equal to those of the previous explorers. Two ships were eventually launched at Okhotsk in 1740. The explorers spent the winter of 1740-1741 stockpiling supplies and then sailed to Petropavlovsk. The two ships sailed eastward and stayed together until June 20, when they were separated by fog. After searching for Chirikov and his ship for several days, Bering ordered the St. Peter to continue to the northeast. There the Russian sailors first sighted Alaska. According to the log, "At 12:30 (pm, July 17) came in sight of snow-capped mountains and among them a high volcano." The sighting came on the day of St. Elias, and the mountain was named after the saint.