6 resultados para Unstructured content search
em Universidad Politécnica de Madrid
Resumo:
En la realización de este proyecto se ha tratado principalmente la temática del web scraping sobre documentos HTML en Android. Como resultado del mismo, se ha propuesto una metodologÃa para poder realizar web scraping en aplicaciones implementadas para este sistema operativo y se desarrollará una aplicación basada en esta metodologÃa que resulte útil a los alumnos de la escuela. Web scraping se puede definir como una técnica basada en una serie de algoritmos de búsqueda de contenido con el fin de obtener una determinada información de páginas web, descartando aquella que no sea relevante. Como parte central, se ha dedicado bastante tiempo al estudio de los navegadores y servidores Web, y del lenguaje HTML presente en casi todas las páginas web en la actualidad asà como de los mecanismos utilizados para la comunicación entre cliente y servidor ya que son los pilares en los que se basa esta técnica. Se ha realizado un estudio de las técnicas y herramientas necesarias, aportándose todos los conceptos teóricos necesarios, asà como la proposición de una posible metodologÃa para su implementación. Finalmente se ha codificado la aplicación UPMdroid, desarrollada con el fin de ejemplificar la implementación de la metodologÃa propuesta anteriormente y a la vez desarrollar una aplicación cuya finalidad es brindar al estudiante de la ETSIST un soporte móvil en Android que le facilite el acceso y la visualización de aquellos datos más importantes del curso académico como son: el horario de clases y las calificaciones de las asignaturas en las que se matricule. Esta aplicación, además de implementar la metodologÃa propuesta, es una herramienta muy interesante para el alumno, ya que le permite utilizar de una forma sencilla e intuitiva gran número de funcionalidades de la escuela solucionando asà los problemas de visualización de contenido web en los dispositivos. ABSTRACT. The main topic of this project is about the web scraping over HTML documents on Android OS. As a result thereof, it is proposed a methodology to perform web scraping in deployed applications for this operating system and based on this methodology that is useful to the ETSIST school students. Web scraping can be defined as a technique based on a number of content search algorithms in order to obtain certain information from web pages, discarding those that are not relevant. As a main part, has spent considerable time studying browsers and Web servers, and the HTML language that is present today in almost all websites as well as the mechanisms used for communication between client and server because they are the pillars which this technique is based. We performed a study of the techniques and tools needed, providing all the necessary theoretical concepts, as well as the proposal of a possible methodology for implementation. Finally it has codified UPMdroid application, developed in order to illustrate the implementation of the previously proposed methodology and also to give the student a mobile ETSIST Android support to facilitate access and display those most important data of the current academic year such as: class schedules and scores for the subjects in which you are enrolled. This application, in addition to implement the proposed methodology is also a very interesting tool for the student, as it allows a simple and intuitive way of use these school functionalities thus fixing the viewing web content on devices.
Resumo:
After more than a decade of development work and hopes, the usage of mobile Internet has finally taken off. Now, we are witnessing the first signs of evidence of what might become the explosion of mobile content and applications that will be shaping the (mobile) Internet of the future. Similar to the wired Internet, search will become very relevant for the usage of mobile Internet. Current research on mobile search has applied a limited set of methodologies and has also generated a narrow outcome of meaningful results. This article covers new ground, exploring the use and visions of mobile search with a users' interview-based qualitative study. Its main conclusion builds upon the hypothesis that mobile search is sensitive to a mobile logic different than today's one. First, (advanced) users ask for accessing with their mobile devices the entire Internet, rather than subsections of it. Second, success is based on new added-value applications that exploit unique mobile functionalities. The authors interpret that such mobile logic involves fundamentally the use of personalised and context-based services.
Resumo:
It has taken more than a decade of intense technical and market developments for mobile Internet to take off as a mass phenomenon. And it has arrived with great intensity: an avalanche of mobile content and applications is now overrunning us. Similar to its wired counterpart, wireless Web users will continuously demand access to data and content in an efficient and user-friendly manner.
Resumo:
The emergence of cloud datacenters enhances the capability of online data storage. Since massive data is stored in datacenters, it is necessary to effectively locate and access interest data in such a distributed system. However, traditional search techniques only allow users to search images over exact-match keywords through a centralized index. These techniques cannot satisfy the requirements of content based image retrieval (CBIR). In this paper, we propose a scalable image retrieval framework which can efficiently support content similarity search and semantic search in the distributed environment. Its key idea is to integrate image feature vectors into distributed hash tables (DHTs) by exploiting the property of locality sensitive hashing (LSH). Thus, images with similar content are most likely gathered into the same node without the knowledge of any global information. For searching semantically close images, the relevance feedback is adopted in our system to overcome the gap between low-level features and high-level features. We show that our approach yields high recall rate with good load balance and only requires a few number of hops.
Resumo:
Automatic 2D-to-3D conversion is an important application for filling the gap between the increasing number of 3D displays and the still scant 3D content. However, existing approaches have an excessive computational cost that complicates its practical application. In this paper, a fast automatic 2D-to-3D conversion technique is proposed, which uses a machine learning framework to infer the 3D structure of a query color image from a training database with color and depth images. Assuming that photometrically similar images have analogous 3D structures, a depth map is estimated by searching the most similar color images in the database, and fusing the corresponding depth maps. Large databases are desirable to achieve better results, but the computational cost also increases. A clustering-based hierarchical search using compact SURF descriptors to characterize images is proposed to drastically reduce search times. A significant computational time improvement has been obtained regarding other state-of-the-art approaches, maintaining the quality results.
Resumo:
The first feasibility study of using dual-probe heated fiber optics with distributed temperature sensing to measure soil volumetric heat capacity and soil water content is presented. Although results using different combinations of cables demonstrate feasibility, further work is needed to gain accuracy, including a model to account for the finite dimension and the thermal influence of the probes. Implementation of the dual-probe heat-pulse (DPHP) approach for measurement of volumetric heat capacity (C) and water content (θ) with distributed temperature sensing heated fiber optic (FO) systems presents an unprecedented opportunity for environmental monitoring (e.g., simultaneous measurement at thousands of points). We applied uniform heat pulses along a FO cable and monitored the thermal response at adjacent cables. We tested the DPHP method in the laboratory using multiple FO cables at a range of spacings. The amplitude and phase shift in the heat signal with distance was found to be a function of the soil volumetric heat capacity. Estimations of C at a range of moisture contents (θ = 0.09– 0.34 m3 m−3) suggest the feasibility of measurement via responsiveness to the changes in θ, although we observed error with decreasing soil water contents (up to 26% at θ = 0.09 m3 m−3). Optimization will require further models to account for the finite radius and thermal influence of the FO cables. Although the results indicate that the method shows great promise, further study is needed to quantify the effects of soil type, cable spacing, and jacket configurations on accuracy.