948 results for automatic content extraction
Abstract:
Information and content integration are believed to be a possible solution to the problem of information overload on the Internet. The article gives an overview of a simple solution for the integration of information and content on the Web. Previous approaches to content extraction and integration are discussed, followed by the introduction of a novel, XML-processing-based technique to deal with these problems. The article includes lessons learned from solving issues of changing webpage layout, non-compliance with HTML standards, and multiplicity of the returned results. The method, which adopts relative XPath queries over the DOM tree, proves more robust than previous approaches to Web information integration. Furthermore, the prototype implementation demonstrates a simplicity that enables non-professional users to easily adopt this approach in their day-to-day information management routines.
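As a rough illustration of the relative-XPath idea described above, the following sketch (in Python with lxml, using a made-up page and queries, neither of which comes from the article) anchors on a container node and then issues XPath queries relative to each record, so that layout changes elsewhere in the page do not break the extraction:

# A minimal sketch of relative-XPath content extraction over a parsed DOM,
# assuming Python with lxml; the page structure and queries are hypothetical.
from lxml import html

PAGE = """
<html><body>
  <div class="results">
    <div class="item"><h2>First title</h2><p>First snippet</p></div>
    <div class="item"><h2>Second title</h2><p>Second snippet</p></div>
  </div>
</body></html>
"""

tree = html.fromstring(PAGE)

# Anchor on a stable container, then query relative to each record ('.//'),
# so small changes elsewhere in the layout do not break the extraction.
records = []
for item in tree.xpath('//div[@class="item"]'):
    title = item.xpath('.//h2/text()')      # relative to the current item
    snippet = item.xpath('.//p/text()')
    records.append({"title": title[0] if title else None,
                    "snippet": snippet[0] if snippet else None})

print(records)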
Abstract:
Automated feature extraction and correspondence determination is an extremely important problem in the face recognition community as it often forms the foundation of the normalisation and database construction phases of many recognition and verification systems. This paper presents a completely automatic feature extraction system based upon a modified volume descriptor. These features form a stable descriptor for faces and are utilised in a reversible jump Markov chain Monte Carlo correspondence algorithm to automatically determine correspondences which exist between faces. The developed system is invariant to changes in pose and occlusion and results indicate that it is also robust to minor face deformations which may be present with variations in expression.
Abstract:
A building information model (BIM) provides a rich representation of a building's design. However, there are many challenges in getting construction-specific information from a BIM, limiting the usability of BIM for construction and other downstream processes. This paper describes a novel approach that utilizes ontology-based feature modeling, automatic feature extraction based on ifcXML, and query processing to extract information relevant to construction practitioners from a given BIM. The feature ontology generically represents construction-specific information that is useful for a broad range of construction management functions. The software prototype uses the ontology to transform the designer-focused BIM into a construction-specific feature-based model (FBM). The formal query methods operate on the FBM to further help construction users to quickly extract the necessary information from a BIM. Our tests demonstrate that this approach provides a richer representation of construction-specific information compared to existing BIM tools.
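A minimal sketch of the ifcXML-querying step, assuming Python's standard ElementTree and a toy document; the element names and the "tall walls" feature query are illustrative assumptions, not taken from the paper's ontology or schema:

# A minimal sketch of extracting construction-relevant elements from an
# ifcXML-style document; the tiny document and element names below are
# illustrative only.
import xml.etree.ElementTree as ET

IFCXML = """
<ifc>
  <IfcWall id="w1"><Name>Wall-North</Name><Height>3.0</Height></IfcWall>
  <IfcWall id="w2"><Name>Wall-South</Name><Height>2.7</Height></IfcWall>
  <IfcDoor id="d1"><Name>Door-Main</Name></IfcDoor>
</ifc>
"""

root = ET.fromstring(IFCXML)

# Feature query: collect all walls taller than 2.8 m.
tall_walls = [
    {"id": w.get("id"), "name": w.findtext("Name"),
     "height": float(w.findtext("Height"))}
    for w in root.iter("IfcWall")
    if float(w.findtext("Height")) > 2.8
]
print(tall_walls)   # -> [{'id': 'w1', 'name': 'Wall-North', 'height': 3.0}]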
Abstract:
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and associated tags used by this module is computed automatically by considering, for each Spanish word, the sign with the highest probability of being its translation. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's Licenses. In order to evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when this preprocessing module is included. In the phrase-based system, the proposed module increased BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of the SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
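A minimal sketch of the word-to-tag replacement idea, using hypothetical word/sign probabilities and a made-up non-relevance threshold; the real module derives these quantities from the parallel corpus:

# A minimal sketch of the preprocessing step: each Spanish word is replaced by
# the sign tag with the highest probability of being its translation, and
# weakly associated words are treated as non-relevant. Probabilities and the
# threshold below are illustrative only.

# p(sign_tag | spanish_word), illustrative numbers
translation_probs = {
    "renovar":  {"RENEW": 0.82, "CHANGE": 0.10},
    "carnet":   {"DOCUMENT": 0.64, "CARD": 0.30},
    "de":       {"RENEW": 0.05},        # weakly associated -> non-relevant
    "conducir": {"DRIVE": 0.77},
}

NON_RELEVANT = "<NR>"
THRESHOLD = 0.2

def tag_sentence(words, probs, threshold=THRESHOLD):
    tagged = []
    for w in words:
        candidates = probs.get(w, {})
        if not candidates:
            tagged.append(NON_RELEVANT)
            continue
        tag, p = max(candidates.items(), key=lambda kv: kv[1])
        tagged.append(tag if p >= threshold else NON_RELEVANT)
    return tagged

print(tag_sentence(["renovar", "carnet", "de", "conducir"], translation_probs))
# -> ['RENEW', 'DOCUMENT', '<NR>', 'DRIVE']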
Abstract:
Humans have a remarkable ability to extract information from visual data acquired by sight. Through a learning process that starts at birth and continues throughout life, image interpretation becomes almost instinctive. At a glance, one can easily describe a scene with reasonable precision, naming its main components. Usually, this is done by extracting low-level features such as edges, shapes and textures, and associating them with high-level meanings; in this way, a semantic description of the scene is produced. An example of this is the human capacity to recognize and describe other people's physical and behavioral characteristics, or biometrics. Soft biometrics also represent inherent characteristics of the human body and behaviour, but do not allow unique identification of a person. The computer vision field aims to develop methods capable of performing visual interpretation with performance similar to that of humans. This thesis proposes computer vision methods that allow high-level information extraction from images in the form of soft biometrics. The problem is approached in two ways, through unsupervised and supervised learning methods. The first seeks to group images via automatic feature extraction learning, using convolution techniques, evolutionary computing and clustering; the images employed in this approach contain faces and people. The second approach employs convolutional neural networks, which can operate on raw images, learning both the feature extraction and the classification processes. Here, images are classified according to gender and clothing, the latter divided into upper and lower parts of the human body. The first approach, when tested on different image datasets, obtained an accuracy of approximately 80% for faces versus non-faces and 70% for people versus non-people. The second, tested on images and videos, obtained an accuracy of about 70% for gender, 80% for upper clothing and 90% for lower clothing. The results of these case studies show that the proposed methods are promising, enabling automatic high-level annotation of images. This opens possibilities for applications in diverse areas such as content-based image and video retrieval and automatic video surveillance, reducing human effort in manual annotation and monitoring tasks.
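A minimal sketch of the supervised branch described above: a small convolutional network (in PyTorch, with an illustrative architecture that is not the one used in the thesis) that learns both feature extraction and classification from raw image crops for a binary soft-biometric label such as gender:

# A minimal sketch, not the thesis architecture: a small CNN mapping raw RGB
# crops to a two-class soft-biometric label. Assumes PyTorch.
import torch
import torch.nn as nn

class SoftBiometricCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(        # learned feature extraction
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(      # learned classification
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SoftBiometricCNN()
dummy = torch.randn(4, 3, 64, 64)   # batch of 4 RGB 64x64 crops
print(model(dummy).shape)            # -> torch.Size([4, 2])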
Abstract:
The methodology of extracting information from texts has been widely described in the current literature. However, the methodology has been developed mainly for the purposes of fields other than terminology science. In addition, the research has been English-language oriented. Therefore, there are no satisfactory language-independent methods for extracting terminological information from texts. The aim of the present study is to form the basis for a further improvement of methods for the extraction of terminological information. A further aim is to determine differences in term extraction between subject groups with or without knowledge of the special field in question. The study is based on the theory of terminology and has a mainly qualitative approach. The research material consists of electronically readable specialized texts in the subject domain of maritime safety. Textbooks, conference papers, research reports and articles from professional journals in Finnish and in Russian are included. The thesis first deals with certain term extraction methods: manual term identification and semi-automatic term extraction, the latter of which was carried out using three commercial computer programs. The results of term extraction were compared, and the recall and precision of the methods were evaluated. The latter part of the study is dedicated to the identification of concept relations. Certain linguistic expressions, which some researchers call knowledge probes, were applied to identify concept relations. The results of the present thesis suggest that special field knowledge is an advantage in manual term identification. However, in the candidate term lists the variation between subject groups was not as remarkable as it was between individual subjects. The term extraction software tested here produces candidate term lists which can be useful, but only after some manual work. Therefore, the work emphasizes the need to further develop term extraction software. Furthermore, the analyses indicate that there is a certain number of terms which were extracted by all the subjects and by the software; we call these terms core terms. As a result of the experiment on linguistic expressions which signal concept relations, a set of Finnish and Russian knowledge probes for the field of maritime safety was proposed. The main finding was that it would be useful to combine the use of knowledge probes with semi-automatic term extraction, since knowledge probes usually occur in the vicinity of terms.
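A minimal sketch of the knowledge-probe idea, using illustrative English probes and a made-up sentence (the thesis itself proposes Finnish and Russian probes for maritime safety): scan the text for expressions that signal a concept relation and record the surrounding candidate terms:

# A minimal sketch: knowledge probes are expressions that signal a concept
# relation; the probes and sample text below are illustrative English
# stand-ins, not the Finnish/Russian probes proposed in the thesis.
import re

PROBES = {
    r"\bis a type of\b": "generic (IS-A)",
    r"\bconsists of\b":  "partitive (PART-OF)",
}

text = ("A lifeboat is a type of survival craft. "
        "A survival craft consists of a hull and a canopy.")

for pattern, relation in PROBES.items():
    for m in re.finditer(pattern, text):
        left = text[:m.start()].split(".")[-1].strip()    # candidate term before the probe
        right = text[m.end():].split(".")[0].strip()      # candidate term after the probe
        print(f"{relation}: '{left}' -- '{right}'")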
Abstract:
The most difficult operation in flood inundation mapping using optical flood images is to separate fully inundated areas from the 'wet' areas where trees and houses are partly covered by water. This can be regarded as a typical instance of the mixed-pixel problem in such images. A number of automatic information extraction and image classification algorithms have been developed over the years for flood mapping using optical remote sensing images. Most classification algorithms assign each pixel to the class label with the greatest likelihood. However, these hard classification methods often fail to generate reliable flood inundation maps because of the presence of mixed pixels in the images. To solve the mixed-pixel problem, advanced image processing techniques are adopted; linear spectral unmixing is one of the most popular soft classification techniques used for mixed-pixel analysis. The good performance of linear spectral unmixing depends on two important issues: the method of selecting endmembers and the method of modelling the endmembers for unmixing. This paper presents an improvement in the adaptive selection of the endmember subset for each pixel in spectral unmixing for reliable flood mapping. Using a fixed set of endmembers to unmix all pixels in an entire image may overestimate the endmember spectra residing in a mixed pixel and hence reduce the performance of spectral unmixing. By contrast, applying an adaptively estimated subset of endmembers for each pixel can decrease the residual error in the unmixing results and provide reliable output. The paper also shows that the proposed method improves the accuracy of conventional linear unmixing methods and is easy to apply. Three different linear spectral unmixing methods were applied to test the improvement in unmixing results. Experiments were conducted on three sets of Landsat-5 TM images from three different flood events in Australia to examine the method under different flooding conditions, and satisfactory flood-mapping outcomes were achieved.
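A minimal sketch of per-pixel linear unmixing with an adaptively selected endmember subset, in NumPy; the subset rule (keep the spectrally closest endmembers) and the toy spectra are stand-ins, not the paper's actual selection method:

# A minimal sketch of adaptive per-pixel linear spectral unmixing; the
# endmember-selection rule and spectra below are illustrative assumptions.
import numpy as np

def unmix(pixel, endmembers, n_keep=2):
    """Return fractional abundances using only the n_keep closest endmembers."""
    # Adaptive subset: keep the endmembers most similar to the observed spectrum.
    dists = np.linalg.norm(endmembers - pixel, axis=1)
    subset = np.argsort(dists)[:n_keep]
    E = endmembers[subset].T                       # bands x selected endmembers
    # Unconstrained least squares; sum-to-one / non-negativity constraints omitted.
    abundances, *_ = np.linalg.lstsq(E, pixel, rcond=None)
    full = np.zeros(len(endmembers))
    full[subset] = abundances
    return full

# Illustrative 4-band endmember spectra: water, vegetation, bare soil.
endmembers = np.array([[0.02, 0.03, 0.01, 0.01],
                       [0.03, 0.05, 0.30, 0.20],
                       [0.10, 0.15, 0.25, 0.30]])
mixed_pixel = 0.6 * endmembers[0] + 0.4 * endmembers[1]   # partly inundated pixel
print(unmix(mixed_pixel, endmembers).round(2))            # -> [0.6 0.4 0. ]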
Abstract:
Multiple flame-flame interactions in premixed combustion are investigated using direct numerical simulations of twin turbulent V-flames for a range of turbulence intensities and length scales. Interactions are identified using a novel automatic feature extraction (AFE) technique, based on data registration using the dual-tree complex wavelet transform. Information on the time, position, and type of interactions, and their influence on the flame area is extracted using AFE. Characteristic length and time scales for the interactions are identified. The effect of interactions on the flame brush is quantified through a global stretch rate, defined as the sum of flamelet stretch and interaction stretch contributions. The effects of each interaction type are discussed. It is found that the magnitude of the fluctuations in flamelet and interaction stretch are comparable, and a qualitative sensitivity to turbulence length scale is found for one interaction type. Implications for modeling are discussed. © 2013 Copyright Taylor and Francis Group, LLC.
Abstract:
The influence of Lewis number on turbulent premixed flame interactions is investigated using automatic feature extraction (AFE) applied to high-resolution flame simulation data. Premixed turbulent twin V-flames under identical turbulence conditions are simulated at global Lewis numbers of 0.4, 0.8, 1.0, and 1.2. Information on the position, frequency, and magnitude of the interactions is compared, and the sensitivity of the results to the sample interval is discussed. It is found that both the frequency and the magnitude of normal-type interactions increase with decreasing Lewis number. Counter-normal-type interactions become more likely as the Lewis number increases. The variation in both the frequency and the magnitude of the interactions is found to be caused by large-scale changes in flame wrinkling resulting from differences in the thermo-diffusive stability of the flames. During flame interactions, thermo-diffusive effects are found to be insignificant due to the separation of time scales. © 2013 Copyright Taylor and Francis Group, LLC.
Abstract:
The work undertaken in this thesis concerns the analysis of terminological equivalence in parallel corpora and comparable corpora. More specifically, we focus on corpora of specialized texts belonging to the domain of climate change. One original aspect of this study lies in the analysis of the equivalents of single-word terms. The theoretical foundations on which we rely are textual terminology (Bourigault and Slodzian 1999) and the lexico-semantic approach (L'Homme 2005). The study pursues two objectives. The first is to carry out a comparative analysis of equivalence in the two types of corpora in order to verify whether the terminological equivalence observable in parallel corpora differs from that found in comparable corpora. The second is to compare in detail the equivalents associated with a given English term, in order to describe and catalogue them and to derive a typology. The detailed analysis of the French equivalents of 343 English terms is carried out using computational tools (a term extractor, a text aligner, etc.) and a rigorous methodology divided into three parts. The first part, common to both research objectives, concerns the construction of the corpora, the validation of the English terms, and the identification of the French equivalents in the two corpora. The second part describes the criteria on which we rely to compare the equivalents from the two types of corpora. The third part establishes the typology of equivalents associated with a given English term. The results for the first objective show that, of the 343 English terms analysed, relatively few terms have questionable equivalents in both corpora (12), whereas the number of terms showing similar equivalence across the corpora is very high (272 identical equivalents and 55 unobjectionable equivalents). The comparative analysis described in this chapter confirms our hypothesis that the terminology used in parallel corpora does not differ from that of comparable corpora. The results for the second objective show that many English terms are rendered by several equivalents (70% of the terms analysed). It is also found that the largest group of equivalents is formed not by synonyms but by near-synonyms. In addition, equivalents belonging to another part of speech account for a significant share of the equivalents. The typology developed in this thesis thus presents mechanisms of terminological equivalence that have rarely been described so systematically in previous work.
Abstract:
This paper presents the results of the crowd image analysis challenge, as part of the PETS 2009 workshop. The evaluation is carried out using a selection of the metrics available in the Video Analysis and Content Extraction (VACE) program and the CLassification of Events, Activities, and Relationships (CLEAR) consortium. The evaluation highlights the strengths of the authors’ systems in areas such as precision, accuracy and robustness.
Abstract:
This paper presents the results of the crowd image analysis challenge of the Winter PETS 2009 workshop. The evaluation is carried out using a selection of the metrics developed in the Video Analysis and Content Extraction (VACE) program and the CLassification of Events, Activities, and Relationships (CLEAR) consortium [13]. The evaluation highlights the detection and tracking performance of the authors’ systems in areas such as precision, accuracy and robustness. The performance is also compared to the PETS 2009 submitted results.
Abstract:
This paper presents the results of the crowd image analysis challenge of the PETS2010 workshop. The evaluation was carried out using a selection of the metrics developed in the Video Analysis and Content Extraction (VACE) program and the CLassification of Events, Activities, and Relationships (CLEAR) consortium. The PETS 2010 evaluation was performed using new ground truth created from each independent two-dimensional view. In addition, the submissions to PETS 2009 and Winter-PETS 2009 were evaluated and included in the results. The evaluation highlights the detection and tracking performance of the authors’ systems in areas such as precision, accuracy and robustness.
Abstract:
This paper presents the PETS2009 outdoor crowd image analysis surveillance dataset and the performance evaluation of people counting, detection and tracking results using the dataset submitted to five IEEE Performance Evaluation of Tracking and Surveillance (PETS) workshops. The evaluation was carried out using well established metrics developed in the Video Analysis and Content Extraction (VACE) programme and the CLassification of Events, Activities, and Relationships (CLEAR) consortium. The comparative evaluation highlights the detection and tracking performance of the authors’ systems in areas such as precision, accuracy and robustness and provides a brief analysis of the metrics themselves to provide further insights into the performance of the authors’ systems.
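The VACE/CLEAR evaluations cited across these PETS abstracts aggregate detection and tracking errors into scores such as MOTA; a minimal sketch of that computation from per-frame error counts, with illustrative numbers rather than PETS results, follows:

# A minimal sketch of the MOTA score used in CLEAR-style tracking evaluation:
# MOTA = 1 - (misses + false positives + identity switches) / total ground truth.
# The per-frame counts below are illustrative, not PETS results.
def mota(frames):
    """frames: list of dicts with keys 'gt', 'misses', 'fp', 'id_switches'."""
    errors = sum(f["misses"] + f["fp"] + f["id_switches"] for f in frames)
    gt_total = sum(f["gt"] for f in frames)
    return 1.0 - errors / gt_total

frames = [
    {"gt": 10, "misses": 1, "fp": 0, "id_switches": 0},
    {"gt": 12, "misses": 2, "fp": 1, "id_switches": 1},
    {"gt": 11, "misses": 0, "fp": 1, "id_switches": 0},
]
print(f"MOTA = {mota(frames):.3f}")   # -> MOTA = 0.818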
Abstract:
An overview is given of the possibility of controlling the status of circuit breakers (CB) in a substation with the use of a knowledge base that relates some of the operating magnitudes, mixing status variables with time variables and fuzzy sets. It is shown that, even when not all of the magnitudes to be controlled can be included in the analysis, it is possible to control the desired status while supervising some important magnitudes such as the voltage, power factor, and harmonic distortion, as well as the present status.
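A minimal sketch of the fuzzy-set ingredient of such a knowledge base, with membership bounds and a decision rule that are illustrative assumptions rather than the article's values: supervised magnitudes such as voltage and power factor are graded by membership functions and combined with a fuzzy AND before a breaker action is allowed:

# A minimal sketch: triangular fuzzy memberships grade how acceptable the
# supervised magnitudes are before a circuit-breaker action is permitted.
# Bounds and the 0.5 decision threshold are illustrative assumptions.
def triangular(x, low, peak, high):
    """Triangular fuzzy membership in [0, 1]."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

def allow_cb_close(voltage_pu, power_factor):
    v_ok = triangular(voltage_pu, 0.90, 1.00, 1.10)      # near-nominal voltage
    pf_ok = triangular(power_factor, 0.80, 1.00, 1.20)   # acceptable power factor
    degree = min(v_ok, pf_ok)                            # fuzzy AND of conditions
    return degree, degree > 0.5

print(allow_cb_close(voltage_pu=0.97, power_factor=0.92))  # degree ~0.6 -> close allowed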