15 results for Automatic term extraction

in Deakin Research Online - Australia


Relevance: 80.00%

Abstract:

Sport video data is growing rapidly as a result of maturing digital technologies that support digital video capture, faster data processing, and large storage. However, (1) semi-automatic content extraction and annotation, (2) scalable indexing models, and (3) effective retrieval and browsing still pose the most challenging problems for maximizing the usage of large video databases. This article presents the findings from a comprehensive work that proposes a scalable and extensible sports video retrieval system with two major contributions in the area of sports video indexing and retrieval. The first contribution is a new sports video indexing model that utilizes a semi-schema-based indexing scheme on top of an Object-Relationship approach. This indexing model is scalable and extensible, as it enables gradual index construction supported by the ongoing development of future content extraction algorithms. The second contribution is a set of novel queries, based on XQuery, that generate dynamic and user-oriented summaries and event structures. The proposed sports video retrieval system has been fully implemented and populated with soccer, tennis, swimming, and diving videos. The system has been evaluated with 20 users to demonstrate and confirm its feasibility and benefits. The experimental sports genres were specifically selected to represent the four main categories of the sports domain: period-, set-point-, time (race)-, and performance-based sports. Thus, the proposed system should be generic and robust for all types of sports.
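The indexing and querying ideas above can be illustrated with a small sketch. The following Python fragment is illustrative only (the actual system uses XQuery over a semi-schema-based index): it assumes hypothetical event records with a minimal core plus optional attributes, and a summarise() helper that mimics a user-oriented summary query.

    # Hypothetical sketch of a semi-schema-based sports video index:
    # each event record carries a minimal core (video, type, start, end)
    # plus optional attributes that later extraction algorithms may add.
    from collections import defaultdict

    index = [
        {"video": "soccer_01", "type": "goal",      "start": 312.0,  "end": 340.5, "player": "A10"},
        {"video": "soccer_01", "type": "corner",    "start": 290.0,  "end": 305.0},
        {"video": "tennis_02", "type": "set_point", "start": 1502.0, "end": 1530.0},
    ]

    def summarise(index, video=None, event_types=None):
        """Group matching events by type, mimicking a user-oriented summary query."""
        summary = defaultdict(list)
        for ev in index:
            if video and ev["video"] != video:
                continue
            if event_types and ev["type"] not in event_types:
                continue
            summary[ev["type"]].append((ev["start"], ev["end"]))
        return dict(summary)

    print(summarise(index, video="soccer_01", event_types={"goal", "corner"}))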

Relevance: 40.00%

Abstract:

In this paper, we propose a model for discovering frequent sequential patterns (phrases) that can be used as profile descriptors of documents. Numerous phrases can undoubtedly be obtained using data mining algorithms; however, it is difficult to use these phrases effectively to answer what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model is then extended and tested by applying it to an information filtering system. The results of the experiment show that pattern-based methods outperform keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system.
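A minimal sketch of the pruning idea, assuming contiguous word sequences and a "closed pattern" criterion (a shorter pattern is dropped when a longer pattern contains it with the same support); the actual pattern taxonomy model may differ in both its pattern definition and its pruning rule.

    # Mine frequent contiguous word sequences, then prune sub-patterns that
    # are covered by a longer pattern occurring with the same support.
    from collections import Counter

    def frequent_sequences(docs, min_support=2, max_len=3):
        counts = Counter()
        for doc in docs:
            words = doc.lower().split()
            for n in range(1, max_len + 1):
                for i in range(len(words) - n + 1):
                    counts[tuple(words[i:i + n])] += 1
        return {p: c for p, c in counts.items() if c >= min_support}

    def prune(patterns):
        """Keep only 'closed' patterns."""
        def contains(longer, shorter):
            return any(longer[i:i + len(shorter)] == shorter
                       for i in range(len(longer) - len(shorter) + 1))
        kept = {}
        for p, c in patterns.items():
            covered = any(len(q) > len(p) and contains(q, p) and patterns[q] == c
                          for q in patterns)
            if not covered:
                kept[p] = c
        return kept

    docs = ["data mining for text", "text data mining", "data mining methods"]
    print(prune(frequent_sequences(docs)))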

Relevance: 40.00%

Abstract:

A system that could automatically extract abnormal lung regions may assist expert radiologists in verifying lung tissue abnormalities. This paper presents an automated lung nodule detection system consisting of five components: acquisition, pre-processing, background removal, detection, and false-positive reduction. The system employs a combination of ensemble classification and clustering methods. The performance of the developed system is compared against some existing counterparts. Based on the experimental results, the proposed system achieved a sensitivity of 100% and a false-positive rate of 0.67 per slice for 30 tested CT images.
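The five-stage structure can be sketched as a simple pipeline. The sketch below is hypothetical (placeholder thresholding stands in for the ensemble classification and clustering used in the paper) and assumes numpy; the file name and thresholds are invented for illustration.

    # Hypothetical skeleton of the five-stage pipeline described above.
    import numpy as np

    def acquire(path):
        # A real system would load a CT slice from `path`; here we fake one.
        return np.random.default_rng(0).random((128, 128))

    def preprocess(img):
        return (img - img.mean()) / (img.std() + 1e-8)      # simple normalisation

    def remove_background(img, threshold=0.5):
        return np.where(img > threshold, img, 0.0)          # crude lung-field mask

    def detect_candidates(img):
        ys, xs = np.nonzero(img > 1.5)                      # bright-spot candidates
        return list(zip(ys.tolist(), xs.tolist()))

    def reduce_false_positives(candidates, img, min_intensity=2.0):
        return [(y, x) for y, x in candidates if img[y, x] >= min_intensity]

    img = preprocess(acquire("slice_001.dcm"))
    img = remove_background(img)
    nodules = reduce_false_positives(detect_candidates(img), img)
    print(f"{len(nodules)} candidate nodules retained")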

Relevance: 40.00%

Abstract:

This paper addresses the challenge of bridging the semantic gap that exists between the simplicity of features that can be currently computed in automated content indexing systems and the richness of semantics in user queries posed for media search and retrieval. It proposes a unique computational approach to extraction of expressive elements of motion pictures for deriving high-level semantics of stories portrayed, thus enabling rich video annotation and interpretation. This approach, motivated and directed by the existing cinematic conventions known as film grammar, as a first step toward demonstrating its effectiveness, uses the attributes of motion and shot length to define and compute a novel measure of tempo of a movie. Tempo flow plots are defined and derived for a number of full-length movies and edge analysis is performed leading to the extraction of dramatic story sections and events signaled by their unique tempo. The results confirm tempo as a useful high-level semantic construct in its own right and a promising component of others such as rhythm, tone or mood of a film. In addition to the development of this computable tempo measure, a study is conducted as to the usefulness of biasing it toward either of its constituents, namely, motion or shot length. Finally, a refinement is made to the shot length normalizing mechanism, driven by the peculiar characteristics of shot length distribution exhibited by movies. Results of these additional studies, and possible applications and limitations are discussed.
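One plausible way to compute such a tempo measure, assuming per-shot motion magnitudes and shot lengths are already available, is sketched below in Python with numpy; the exact normalisation and smoothing in the paper may differ. Dramatic sections and events would then correspond to edges (sharp changes) in the resulting tempo flow.

    # Hedged sketch of a tempo measure combining per-shot motion and shot length.
    import numpy as np

    def tempo_flow(motion, shot_length, alpha=0.5, smooth=5):
        motion = np.asarray(motion, dtype=float)
        length = np.asarray(shot_length, dtype=float)
        m_norm = (motion - motion.mean()) / (motion.std() + 1e-8)
        # Shorter shots imply faster pacing, hence the negated z-score.
        l_norm = -(length - length.mean()) / (length.std() + 1e-8)
        tempo = alpha * m_norm + (1.0 - alpha) * l_norm
        kernel = np.ones(smooth) / smooth
        return np.convolve(tempo, kernel, mode="same")      # smoothed tempo flow

    motion = [0.2, 0.3, 0.9, 1.2, 0.4, 0.1, 0.8]            # mean motion per shot
    shots  = [6.0, 5.5, 2.0, 1.5, 4.0, 7.0, 2.5]            # shot length in seconds
    print(np.round(tempo_flow(motion, shots, smooth=3), 2))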

Relevance: 40.00%

Abstract:

This paper proposes a unique computational approach to extraction of expressive elements of motion pictures for deriving high level semantics of stories portrayed, thus enabling better video annotation and interpretation systems. This approach, motivated and directed by the existing cinematic conventions known as film grammar, as a first step towards demonstrating its effectiveness, uses the attributes of motion and shot length to define and compute a novel measure of tempo of a movie. Tempo flow plots are defined and derived for four full-length movies and edge analysis is performed leading to the extraction of dramatic story sections and events signaled by their unique tempo. The results confirm tempo as a useful attribute in its own right and a promising component of semantic constructs such as tone or mood of a film.

Relevance: 30.00%

Abstract:

To enable content-based retrieval, highlights extraction from broadcast sport video has been an active research topic in the last decade. There is a well-known theory that high-level semantics, such as a goal in soccer, can be detected based on the occurrences of specific audio and visual features that can be extracted automatically. However, there is as yet no definitive solution for the scope (i.e. start and end) of the detection for self-consumable highlights. Thus, in this paper we primarily demonstrate the benefits of using play-break segments for this purpose. Moreover, we also propose a browsing scheme that is based on integrated play-break and highlights (extended from [1]). To validate our approach, we present the results from some experiments and a user study.
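A toy illustration of the play-break scoping idea (not the authors' detection pipeline): each detected event time is expanded to the play-break segment that contains it, yielding a self-consumable clip. The event times and segments below are invented.

    # Expand detected highlight events to their enclosing play-break segments.
    def scope_highlights(events, play_breaks):
        """events: detection times (s); play_breaks: (start, end) segments."""
        clips = []
        for t in events:
            for start, end in play_breaks:
                if start <= t <= end:
                    clips.append((start, end))
                    break
            else:
                clips.append((max(0.0, t - 10.0), t + 10.0))  # fallback window
        return clips

    play_breaks = [(100.0, 145.0), (300.0, 370.0), (610.0, 655.0)]
    goal_detections = [132.0, 351.5, 900.0]
    print(scope_highlights(goal_detections, play_breaks))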

Relevance: 30.00%

Abstract:

In the last decade, research in spoken language processing has achieved significant advances; however, work on emotion recognition has not progressed as far, achieving only 50% to 60% accuracy. This is because a majority of researchers in this field have focused on the synthesis of emotional speech rather than on automating human emotion recognition. Many research groups have focused on how to improve the performance of the classifier they used for emotion recognition, and little work has been done on data pre-processing, such as the extraction and selection of a set of specific acoustic features instead of using all the possible ones at hand. Working with well-selected acoustic features does not delay the overall task; rather, it saves time and resources by removing irrelevant information and reducing high-dimensional computation. In this paper, we developed an automatic feature selector based on a RF2TREE algorithm and the traditional C4.5 algorithm. RF2TREE is applied here to address the problem of not having enough data examples. The ensemble learning technique is applied to enlarge the original data set by building a bagged random forest to generate many virtual examples; the new data set is then used to train a single decision tree, which selects the most effective features to represent the speech signals for emotion recognition. Finally, the output of the selector is a set of specific acoustic features, produced by RF2TREE and a single decision tree.
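The RF2TREE idea can be approximated with off-the-shelf tools. The sketch below uses scikit-learn and synthetic data (it is not the RF2TREE implementation): a bagged random forest labels randomly generated "virtual" examples, and a single decision tree trained on the enlarged set exposes a compact set of selected feature indices.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(42)
    X = rng.random((60, 10))                                # 60 utterances, 10 acoustic features
    y = (X[:, 2] + 0.5 * X[:, 7] > 0.9).astype(int)         # toy emotion labels

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # Generate virtual examples and label them with the bagged forest.
    X_virtual = rng.random((1000, 10))
    y_virtual = forest.predict(X_virtual)

    X_big = np.vstack([X, X_virtual])
    y_big = np.concatenate([y, y_virtual])

    # A compact tree trained on the enlarged set yields the selected features.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_big, y_big)
    selected = np.flatnonzero(tree.feature_importances_ > 0)
    print("selected feature indices:", selected)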

Relevance: 30.00%

Abstract:

Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets (especially annotating them for supervised methods) is very time-consuming. We propose a framework for Web data extraction which logs users' access history and exploits it to assist automatic training set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.
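As an illustration only (the tag paths, URLs, and access counts below are invented, and the real framework works on clustered DOM structures), candidate extraction rules can be ranked by combining structural frequency with logged access history:

    from collections import Counter

    # Tag paths per accessed document, assumed precomputed from the DOM.
    docs = {
        "page1.html": ["html/body/table/tr/td", "html/body/div/span", "html/body/h1"],
        "page2.html": ["html/body/table/tr/td", "html/body/div/span"],
        "page3.html": ["html/body/ul/li", "html/body/h1"],
    }
    access_history = ["page1.html", "page1.html", "page2.html"]   # logged visits

    def rank_substructures(docs, history):
        weight = Counter(history)
        scores = Counter()
        for url, paths in docs.items():
            for p in paths:
                scores[p] += 1 + weight[url]        # structural frequency + access weight
        return scores.most_common()

    for path, score in rank_substructures(docs, access_history):
        print(f"{score:3d}  {path}")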

Relevance: 30.00%

Abstract:

This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of media descriptions that can be computed in today's content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offer useful insights into the limitations of our method.

Relevance: 30.00%

Abstract:

This paper presents a set of computational features, originating from our study of editing effects, motion, and color used in videos, for the task of automatic video categorization. These features, besides representing human understanding of typical attributes of different video genres, are also inspired by the techniques and rules used by many directors to endow a genre program with specific characteristics that lead to a certain emotional impact on viewers. We propose new features whilst also employing traditionally used ones for classification. This research goes beyond existing work with a systematic analysis of trends exhibited by each of our features in genres such as cartoons, commercials, music, news, and sports, and it enables an understanding of the similarities, dissimilarities, and likely confusion between genres. Classification results from our experiments on several hours of video establish the usefulness of this feature set. We also explore the issue of video clip duration required to achieve reliable genre identification and demonstrate its impact on classification accuracy.
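A toy sketch of the classification step (not the paper's feature set, feature values, or classifier), assuming scikit-learn: each clip is described by a few editing, motion, and colour statistics and classified with a simple nearest-neighbour model.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # [mean shot length (s), mean motion, colour variance, cuts per minute]
    X = np.array([
        [2.1, 0.80, 0.35, 28],   # sports
        [1.2, 0.90, 0.55, 45],   # music video
        [6.5, 0.20, 0.15, 9],    # news
        [1.8, 0.85, 0.60, 40],   # commercial
        [3.0, 0.60, 0.70, 22],   # cartoon
    ])
    y = ["sports", "music", "news", "commercial", "cartoon"]

    clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    clip = np.array([[5.8, 0.25, 0.18, 11]])                # features of an unseen clip
    print(clf.predict(clip))                                 # -> ['news']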

Relevance: 30.00%

Abstract:

In order to enable high-level semantics-based video annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs in film. We examine different rules and conventions followed as part of Film Grammar to guide and shape our algorithmic solution for determining a scene boundary. Two different techniques are proposed as new solutions in this paper. Our experimental results on 10 full-length movies show that our technique based on shot sequence coherence performs well and reasonably better than the color edges-based approach.
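A hedged sketch of shot-coherence-based scene segmentation, assuming each shot is already represented by a colour histogram; the coherence measure here is histogram intersection against a small preceding window, which only approximates the paper's technique.

    import numpy as np

    def histogram_similarity(h1, h2):
        return np.minimum(h1, h2).sum()                      # histogram intersection

    def scene_boundaries(shot_histograms, window=3, threshold=0.4):
        shots = [h / h.sum() for h in shot_histograms]
        boundaries = []
        for i in range(window, len(shots)):
            coherence = max(histogram_similarity(shots[i], shots[j])
                            for j in range(i - window, i))
            if coherence < threshold:
                boundaries.append(i)                         # shot i starts a new scene
        return boundaries

    rng = np.random.default_rng(1)
    base_a = np.concatenate([np.ones(8), np.zeros(8)])       # "daytime" palette
    base_b = np.concatenate([np.zeros(8), np.ones(8)])       # "night" palette
    scene_a = [base_a + rng.random(16) * 0.2 for _ in range(5)]
    scene_b = [base_b + rng.random(16) * 0.2 for _ in range(5)]
    print(scene_boundaries(scene_a + scene_b))               # expect a boundary at shot 5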

Relevance: 30.00%

Abstract:

An important area of recent research in forensic entomology has been the use of insect DNA to provide identification of insects for fast and accurate estimation of time since death. This requires DNA to be extracted efficiently and in a state suitable for use in molecular procedures, and then stored on a long-term basis. In this study, Whatman FTA™ cards were tested for use with the Calliphoridae (Diptera). In particular, testing examined their ability to effectively extract DNA from specimens, and store and provide DNA template in a suitable condition for amplification using the polymerase chain reaction (PCR). The cards provided DNA that was able to be amplified from a variety of life stages, and thus appears to be of sufficient quality and quantity for use in subsequent procedures. FTA cards therefore appear suitable for use with calliphorids, and provide a new method of extraction that is simple and efficient and allows for storage and transportation without refrigeration, consequently simplifying the handling of DNA in forensic entomological cases.

Relevance: 30.00%

Abstract:

A useful patient admission prediction model that helps the emergency department of a hospital admit patients efficiently is of great importance. It not only improves the care quality provided by the emergency department but also reduces waiting time of patients. This paper proposes an automatic prediction method for patient admission based on a fuzzy min–max neural network (FMM) with rules extraction. The FMM neural network forms a set of hyperboxes by learning through data samples, and the learned knowledge is used for prediction. In addition to providing predictions, decision rules are extracted from the FMM hyperboxes to provide an explanation for each prediction. In order to simplify the structure of FMM and the decision rules, an optimization method that simultaneously maximizes prediction accuracy and minimizes the number of FMM hyperboxes is proposed. Specifically, a genetic algorithm is formulated to find the optimal configuration of the decision rules. The experimental results using a large data set consisting of 450,740 real patient records reveal that the proposed method achieves comparable or even better prediction accuracy than state-of-the-art classifiers with the additional ability to extract a set of explanatory rules to justify its predictions.
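A greatly simplified sketch of prediction with fuzzy min-max hyperboxes and rule read-out (the hyperbox corners, feature meanings, and labels are invented; FMM learning and the genetic-algorithm optimisation are not reproduced):

    import numpy as np

    # Each hyperbox: (min corner V, max corner W, class label).
    hyperboxes = [
        (np.array([0.0, 0.0]), np.array([0.4, 0.5]), "not admitted"),
        (np.array([0.6, 0.5]), np.array([1.0, 1.0]), "admitted"),
    ]

    def membership(x, v, w, gamma=4.0):
        """Simpson-style membership: 1 inside the box, decaying outside it."""
        below = np.maximum(0.0, v - x)
        above = np.maximum(0.0, x - w)
        return float(np.mean(1.0 - np.minimum(1.0, gamma * (below + above))))

    def predict(x):
        scores = [(membership(x, v, w), label) for v, w, label in hyperboxes]
        return max(scores)[1]

    def rules():
        for v, w, label in hyperboxes:
            conds = " AND ".join(f"{lo:.1f} <= x{i} <= {hi:.1f}"
                                 for i, (lo, hi) in enumerate(zip(v, w)))
            yield f"IF {conds} THEN {label}"

    print(predict(np.array([0.8, 0.7])))                     # -> admitted
    for r in rules():
        print(r)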

Relevance: 30.00%

Abstract:

For years, opinion polls have relied on data collected through telephone or person-to-person surveys. The process is costly, inconvenient, and slow. Recently, online search data has emerged as a potential proxy for survey data. However, considerable human involvement is still needed for the selection of search indices, a task that requires knowledge of both the target issue and how search terms are used by the online community. The robustness of such manually selected search indices can be questionable. In this paper, we propose an automatic polling system through a novel application of machine learning. In this system, the need to examine, compare, and select search indices has been eliminated through automatic generation of candidate search indices and intelligent combination of those indices. The results include a publicly accessible web application that provides real-time, robust, and accurate measurements of public opinion on several subjects of general interest.
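An illustrative sketch with synthetic data (not the authors' system or data): many candidate search-volume series are combined by a regularised linear model, assuming scikit-learn, so that no individual search index has to be hand-picked.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    weeks = 104
    opinion = 50 + 10 * np.sin(np.linspace(0, 6, weeks))     # weekly poll numbers

    # 30 candidate search indices: a few track opinion, most are noise.
    signal = opinion[:, None] * rng.random(5) / 50
    noise = rng.random((weeks, 25))
    candidates = np.hstack([signal + 0.1 * rng.standard_normal((weeks, 5)), noise])

    train = slice(0, 80)
    model = Ridge(alpha=1.0).fit(candidates[train], opinion[train])
    nowcast = model.predict(candidates[80:])
    print("mean absolute error:", np.round(np.abs(nowcast - opinion[80:]).mean(), 2))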