There is an enormous amount of information on the Internet about countless topics, and it grows every day. In theory, computer programs could take advantage of this wealth of available information to establish new connections between concepts, but the information often appears in unstructured formats such as natural-language text. For this reason, it is very important to be able to automatically obtain information from sources of different kinds, process it, filter it and enrich it, in order to maximize the knowledge we can obtain from the Internet. This project consists of two parts. The first explores information filtering. The system input consists of a set of triples provided by the University of Coimbra, which obtained them through an information extraction process applied to natural-language text. However, owing to the complexity of the extraction task, some of the triples are of dubious quality and need to go through a filtering process. Given these triples about a specific topic, the input is studied to determine which information is relevant to the topic and which should be discarded. To do so, the input is compared against an online knowledge source. The second part of the project explores information enrichment. Several online text sources written in natural language (in English) are used, and information that may be relevant to the specified topic is extracted from them. Some of these knowledge sources are written in ordinary English, and others in Simple English, a controlled subset of the language with a reduced vocabulary and simpler syntactic structures. The project studies how this affects the quality of the extracted triples, and whether information obtained from Simple English sources is of higher quality than information extracted from sources in ordinary English.
With the expansion of Digital Television and the convergence between conventional broadcast media and television over IP, the number of available channels has gradually increased, leaving the viewer with a difficult choice of which program to watch. Overloaded with a large number of programs and associated information, many viewers routinely give up on watching a program and tend to zap between channels or to always watch the same programs or channels. Recommender systems present themselves as a solution to this information overload problem. This thesis studies some of the existing television recommender systems and develops an application that recommends a set of programs of potential interest to the viewer. It covers the main concepts in the area of recommendation algorithms and presents some of the television program recommender systems developed to date. To produce the recommendations, two algorithms were developed, based respectively on collaborative filtering and content-based filtering techniques. By computing the similarity between items or users, these algorithms predict the rating that a user would give to a particular item (television program, movie, etc.). This makes it possible to assess the level of potential interest the user will have in that item. The data sets describing the characteristics of the programs (title, genre, actors, etc.) are stored according to the TV-Anytime standard. This multimedia content description standard has the advantage of being specifically geared towards audiovisual content and of being freely available.
The resulting set of recommendations is presented to the user through a Web application that integrates all the components of the system. To validate the work, a test dataset named htrec2011-movielens-2k was used, whose content is a set of movies rated by various users in a real environment. In addition to the ratings given by users, this set of movies includes data describing genre, directors and country of origin. For the final validation, several tests were carried out, the most relevant of which evaluated the distance between predictions and actual values, with the aim of assessing how accurately the developed algorithms predict the ratings that users would give to the items analyzed.
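The rating prediction and the prediction-versus-actual distance described in this abstract can be illustrated with a minimal sketch. It assumes a user-based collaborative filter with cosine similarity and uses mean absolute error as the distance; the abstract does not say which similarity measure or error metric the thesis uses, and the ratings and names below are hypothetical.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {item: rating on a 1-5 scale}
RATINGS = {
    "ana":   {"news": 5.0, "drama": 3.0, "sports": 1.0},
    "bruno": {"news": 4.0, "drama": 3.0, "sports": 1.0, "movie": 5.0},
    "carla": {"news": 1.0, "drama": 2.0, "sports": 5.0, "movie": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


def mean_absolute_error(pairs):
    """Average distance between (predicted, actual) rating pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


predicted = predict(RATINGS, "ana", "movie")
print(round(predicted, 2))
```

Because "ana" rates items much more like "bruno" than like "carla", the prediction is pulled towards bruno's rating of the movie. An item-based variant would apply the same weighting to columns instead of rows.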
The expansion of Digital Television and the convergence between conventional broadcasting and television over IP have contributed to a gradual increase in the number of available channels and in on-demand video content. Moreover, the spread of mobile devices such as laptops, smartphones and tablets in everyday activities has shifted the traditional television viewing paradigm from the couch to everywhere, anytime, from any device. Although this new scenario enables a great improvement in viewing experiences, it also brings new challenges given the information overload that the viewer faces. Recommendation systems stand out as a possible solution to help viewers select the content that best fits their preferences. This paper describes a web-based system that helps the user navigate broadcast and online television content by implementing recommendations based on collaborative and content-based filtering. The algorithms developed estimate the similarity between items and users and predict the rating that a user would assign to a particular item (television program, movie, etc.). To enable interoperability between different systems, program characteristics (title, genre, actors, etc.) are stored according to the TV-Anytime standard. The set of recommendations produced is presented through a Web application that allows the user to interact with the system based on the recommendations obtained.
Due to the large amount of television content brought about by Digital TV, viewers face a new challenge: how to find interesting content intuitively and efficiently. Personalized Electronic Programming Guides (pEPG) arise as an answer to this complex challenge. We propose TrendTV, a layered architecture that allows social networks to form among viewers of Interactive Digital TV based on online microblogging. Combined with a pEPG, this social network allows the viewer to filter content on a particular subject using the indications made by other viewers in his or her network. The viewer can also create indications for a particular piece of content while it is displayed, or analyze the importance of a particular program online based on these indications. This allows any user to filter content and to generate or exchange information with other users in a flexible and transparent way, using several different devices (TVs, smartphones, tablets or PCs). Moreover, the architecture defines a mechanism to switch channels automatically based on the best program showing at the moment, suggesting new components to be added to the middleware of the Brazilian Digital TV System (Ginga). The result is a dynamically constructed database containing the classification of several TV programs, as well as an application that automatically switches to the best channel of the moment.
Machine (and deep) learning technologies are increasingly present in many fields. Many aspects of our society are undeniably empowered by such technologies: web searches, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., in addition to academic research. Moreover, mobile devices and internet sites, e.g., social networks, support the collection and sharing of information in real time. The pervasive deployment of these technological instruments, both hardware and software, has led to the production of huge amounts of data. Such data has become increasingly unmanageable, posing challenges to conventional computing platforms and paving the way for the development and widespread use of machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such it can be approached from different perspectives. This is the focus of this research. The entire work concentrates on machine learning, starting from different sources of data, e.g., signals and images, applied to different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data-scientist point of view through tools and platforms; setting up a problem from scratch; implementing an effective application for classification tasks; and improving the user interface experience through Data Visualization and eXtended Reality. In essence, machine (and deep) learning can make the difference beyond quantitative tasks, beyond scientific environments, and beyond the data scientist's perspective.
The number of research papers available today is growing at a staggering rate, generating a huge amount of information that people cannot keep up with. According to a trend indicated by the United States’ National Science Foundation, more than 10 million new papers will be published in the next 20 years. Because most of these papers will be available on the Web, this research focuses on exploring issues in recommending research papers to users, in order to lead users directly to papers of interest to them. Recommender systems are used to recommend items to users from a huge stream of available items, according to the users’ interests. This research focuses on the two most prevalent techniques to date, namely Content-Based Filtering and Collaborative Filtering. The first explores the text of the paper itself, recommending items similar in content to the ones the user has rated in the past. The second explores the citation web that exists among papers. As these two techniques have complementary advantages, we explored hybrid approaches to recommending research papers. We created standalone and hybrid versions of algorithms and evaluated them through both offline experiments on a database of 102,295 papers and an online experiment with 110 users. Our results show that the two techniques can be successfully combined to recommend papers. Coverage also increases, reaching the level of 100% in the hybrid algorithms. In addition, we found that different algorithms are more suitable for recommending different kinds of papers. Finally, we verified that users’ research experience influences the way they perceive recommendations. In parallel, we found no significant differences in recommending papers to users from different countries. However, our results showed that users interacting with a research paper recommender system are much happier when the interface is presented in their native language, regardless of the language in which the papers are written. Therefore, an interface should be tailored to the user’s native language.
Traditional content-based filtering methods usually use text extraction and classification techniques to build user profiles as well as representations of contents, i.e. item profiles. These methods have some disadvantages, e.g. a mismatch between user profile terms and item profile terms, which leads to low performance. Some of these disadvantages can be overcome by incorporating a common ontology, which makes it possible to represent both the users' and the items' profiles with concepts taken from the same vocabulary. We propose a new content-based method for filtering items and ranking their relevance to users, which uses a hierarchical ontology. The method measures the similarity of the user's profile to the items' profiles, considering the existence of shared concepts in the two profiles as well as the existence of "related" concepts, according to their position in the ontology. The proposed filtering algorithm computes the similarity between the users' profiles and the items' profiles, and rank-orders the relevant items according to their relevance to each user. The method is being implemented in ePaper, a personalized electronic newspaper project, using a hierarchical ontology designed specifically for the classification of news items. It can, however, be used in other domains and extended to other ontologies.
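The idea of scoring shared and "related" concepts by their position in a hierarchy can be sketched as follows. This is an illustrative assumption, not the scoring scheme of the proposed method: the ontology, the `decay` factor, the profile weights and all function names are hypothetical, and only ancestor/descendant pairs are treated as related.

```python
# Hypothetical news ontology: child concept -> parent concept (None for the root)
PARENT = {
    "news": None,
    "sports": "news",
    "football": "sports",
    "tennis": "sports",
    "politics": "news",
    "elections": "politics",
}


def ancestors(concept):
    """Return the chain of ancestors of `concept`, nearest first."""
    chain = []
    node = PARENT.get(concept)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


def concept_match(user_concept, item_concept, decay=0.5):
    """1.0 for identical concepts, decay**distance for ancestor/descendant pairs, else 0.0."""
    if user_concept == item_concept:
        return 1.0
    for a, b in ((user_concept, item_concept), (item_concept, user_concept)):
        chain = ancestors(a)
        if b in chain:
            return decay ** (chain.index(b) + 1)
    return 0.0


def profile_similarity(user_profile, item_profile):
    """Average, over the item's concepts, of the best weighted match in the user profile."""
    if not item_profile:
        return 0.0
    total = 0.0
    for item_concept in item_profile:
        total += max(
            (weight * concept_match(user_concept, item_concept)
             for user_concept, weight in user_profile.items()),
            default=0.0,
        )
    return total / len(item_profile)


def rank_items(user_profile, items):
    """Return item ids ordered from most to least relevant to the user."""
    scores = {item_id: profile_similarity(user_profile, concepts)
              for item_id, concepts in items.items()}
    return sorted(scores, key=scores.get, reverse=True)


user = {"football": 1.0, "politics": 0.4}
items = {
    "match-report": ["football"],
    "sports-roundup": ["sports"],
    "election-night": ["elections"],
}
print(rank_items(user, items))
```

An item tagged only with the parent concept "sports" still matches a user interested in "football", which is the term-mismatch problem that a shared ontology is meant to reduce. Sibling concepts such as "football" and "tennis" score zero in this sketch; a fuller scheme might also relate them through their common parent.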
Frame rate upconversion (FRUC) is an important post-processing technique for enhancing the visual quality of low frame rate video. A major recent advance in this area is FRUC based on trilateral filtering, whose novelty derives mainly from the combination of an edge-based motion estimation block matching criterion with the trilateral filter. However, there is still room for improvement, notably in reducing the size of the uncovered regions in the initial estimated frame, that is, the estimated frame before trilateral filtering. In this context, this work proposes an improved motion estimation block matching criterion in which a combined luminance and edge error metric is weighted according to the motion vector components, notably to regularise the motion field. Experimental results confirm that significant improvements are achieved for the final interpolated frames, with average PSNR gains of up to 2.73 dB over recent alternative solutions, for video content with varied motion characteristics.
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)
A common problem in video surveys in very shallow waters is the presence of strong light fluctuations due to sunlight refraction. Refracted sunlight casts fast-moving patterns, which can significantly degrade the quality of the acquired data. Motivated by the growing need to improve the quality of shallow water imagery, we propose a method to remove sunlight patterns in video sequences. The method exploits the fact that video sequences provide several observations of the same area of the sea floor over time. It is based on computing the image difference between a given reference frame and the temporal median of a registered set of neighboring images. A key observation is that this difference has two components with separable spectral content. One is related to the illumination field (lower spatial frequencies) and the other to the registration error (higher frequencies). The illumination field, recovered by lowpass filtering, is used to correct the reference image. In addition to removing the sunflicker patterns, an important advantage of the approach is its ability to preserve sharpness in the corrected image, even in the presence of registration inaccuracies. The effectiveness of the method is illustrated on image sets acquired under strong camera motion and containing non-rigid benthic structures. The results attest to the good performance and generality of the approach.
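The processing chain described above (difference against the temporal median, lowpass filtering, correction of the reference) can be sketched in a few lines. The sketch assumes the frames are already registered, models the illumination as an additive term that is subtracted from the reference, and uses a box filter as the lowpass; these choices and all names are illustrative assumptions, not details taken from the paper.

```python
from statistics import median


def temporal_median(frames):
    """Per-pixel median over a stack of registered frames (each a list of rows)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def box_lowpass(image, radius=1):
    """Box (mean) filter; the window is clipped at the image borders."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def remove_sunflicker(reference, neighbors, radius=1):
    """Subtract the lowpass part of (reference - temporal median) from the reference."""
    med = temporal_median(neighbors)
    diff = [[p - m for p, m in zip(ref_row, med_row)]
            for ref_row, med_row in zip(reference, med)]
    illumination = box_lowpass(diff, radius)  # low spatial frequencies only
    return [[p - i for p, i in zip(ref_row, ill_row)]
            for ref_row, ill_row in zip(reference, illumination)]


# Toy example: a flat sea floor of intensity 10 and a reference frame
# brightened uniformly by 4 (a stand-in for a smooth sunlight pattern).
scene = [[10.0] * 3 for _ in range(3)]
neighbors = [scene, scene, scene]
reference = [[p + 4.0 for p in row] for row in scene]
corrected = remove_sunflicker(reference, neighbors)
print(corrected)
```

Because only the smoothed difference is subtracted, high-frequency differences caused by small registration errors are left out of the correction, which is why the reference keeps its sharpness.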
Speckle noise, formed as a result of the coherent nature of ultrasound imaging, affects lesion detectability. We propose a new weighted linear filtering approach that uses Local Binary Patterns (LBP) to reduce speckle noise in ultrasound images. The new filter achieves good results in reducing the noise without affecting the image content. The performance of the proposed filter is compared with that of some commonly used denoising filters. The proposed filter outperforms the existing filters in quantitative analysis and in edge preservation. The experimental analysis is carried out on various ultrasound images.
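For readers unfamiliar with Local Binary Patterns, the basic 3x3 LBP code can be computed as in the sketch below. It shows only the standard descriptor; how the proposed filter turns LBP codes into filter weights is not stated in the abstract and is not reproduced here. The neighbour ordering and the function names are assumptions made for illustration.

```python
# Offsets of the 8 neighbours, clockwise from the top-left pixel (an assumed ordering)
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of the interior pixel at row r, column c."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_map(image):
    """LBP codes for all interior pixels (border pixels are skipped)."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # 241 with the neighbour ordering above
```

The code captures the local structure around a pixel (edge, flat region, spot) independently of absolute brightness, which makes it a plausible basis for choosing how strongly to smooth each neighbourhood.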
Four-dimensional variational data assimilation (4D-Var) combines the information from a time sequence of observations with the model dynamics and a background state to produce an analysis. In this paper, a new mathematical insight into the behaviour of 4D-Var is gained from an extension of concepts that are used to assess the qualitative information content of observations in satellite retrievals. It is shown that the 4D-Var analysis increments can be written as a linear combination of the singular vectors of a matrix which is a function of both the observational and the forecast model systems. This formulation is used to consider the filtering and interpolating aspects of 4D-Var using idealized case-studies based on a simple model of baroclinic instability. The results of the 4D-Var case-studies exhibit the reconstruction of the state in unobserved regions as a consequence of the interpolation of observations through time. The results also exhibit the filtering of components with small spatial scales that correspond to noise, and the filtering of structures in unobserved regions. The singular vector perspective gives a very clear view of this filtering and interpolating by the 4D-Var algorithm and shows that the appropriate specification of the a priori statistics is vital to extract the largest possible amount of useful information from the observations. Copyright © 2005 Royal Meteorological Society
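The singular vector form of the analysis increment can be written out as a short derivation. The notation is assumed here, since the abstract names no symbols: $\mathbf{x}_b$ and $\mathbf{x}_a$ are the background and analysis states, $\mathbf{B}$ the background error covariance, $\hat{\mathbf{R}}$ the observation error covariance over the whole time window, $\hat{\mathbf{H}}$ the linearised operator mapping the initial state to all observations through the forecast model, and $\hat{\mathbf{d}}$ the innovation vector. The sketch further assumes a linear model and observation operator, so that the analysis takes the standard linear-estimation form in the first line.

```latex
\begin{aligned}
\mathbf{x}_a - \mathbf{x}_b
  &= \mathbf{B}\hat{\mathbf{H}}^{T}
     \bigl(\hat{\mathbf{H}}\mathbf{B}\hat{\mathbf{H}}^{T} + \hat{\mathbf{R}}\bigr)^{-1}
     \hat{\mathbf{d}}, \\
\hat{\mathbf{R}}^{-1/2}\hat{\mathbf{H}}\mathbf{B}^{1/2}
  &= \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{T}
   = \sum_{j} \lambda_j\,\mathbf{u}_j\mathbf{v}_j^{T}, \\
\mathbf{x}_a - \mathbf{x}_b
  &= \mathbf{B}^{1/2}\sum_{j} \frac{\lambda_j}{1+\lambda_j^{2}}
     \bigl(\mathbf{u}_j^{T}\hat{\mathbf{R}}^{-1/2}\hat{\mathbf{d}}\bigr)\,\mathbf{v}_j .
\end{aligned}
```

The third line follows by substituting the singular value decomposition into the first. Under these assumptions, components with $\lambda_j \ll 1$ are damped by the factor $\lambda_j/(1+\lambda_j^{2}) \approx \lambda_j$, which is one way to read the filtering behaviour described in the abstract, while the dependence on $\mathbf{B}^{1/2}$ shows why the a priori statistics shape the increment.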
There are still major challenges in the area of automatic indexing and retrieval of digital data. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. Research has been ongoing for a few years in the field of ontological engineering with the aim of using ontologies to add knowledge to information. In this paper we describe the architecture of a system designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval.