27 results for Text-Based Image Retrieval
in Digital Commons at Florida International University
Abstract:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim of bridging the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques has been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve problems in other multimedia-related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) For general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection.
Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model.
Abstract:
Since multimedia data, such as images and videos, are far more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express themselves with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching and retrieving information. But the very cause of the popularity of multimedia data, the large amount and variety of information a single data object can carry, makes their management a challenging task. Multimedia data are commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alphanumeric data. Thus, managing them with frameworks and rationales designed for primitive alphanumeric data would be inefficient. An index structure is the backbone of any database management system. It has been seen that index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly from within one single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to demonstrate the usability and better performance of the proposed framework over current state-of-the-art approaches.
Abstract:
With the recent explosion in the complexity and amount of digital multimedia data, there has been a huge impact on the operations of various organizations in distinct areas, such as government services, education, medical care, business, and entertainment. To satisfy the growing demand for multimedia data management systems, an integrated framework called DIMUSE is proposed and deployed for distributed multimedia applications to offer a full scope of multimedia-related tools and provide appealing experiences for the users. This research mainly focuses on video database modeling and retrieval by addressing a set of core challenges. First, a comprehensive multimedia database modeling mechanism called Hierarchical Markov Model Mediator (HMMM) is proposed to model high-dimensional media data including video objects, low-level visual/audio features, as well as historical access patterns and frequencies. The associated retrieval and ranking algorithms are designed to support not only general queries, but also complicated temporal event pattern queries. Second, system training and learning methodologies are incorporated such that user interests are mined efficiently to improve the retrieval performance. Third, video clustering techniques are proposed to continuously increase the searching speed and accuracy by architecting a more efficient multimedia database structure. A distributed video management and retrieval system is designed and implemented to demonstrate the overall performance. The proposed approach is further customized for a mobile-based video retrieval system to solve the perception subjectivity issue by considering individual users' profiles. Moreover, to deal with security and privacy issues and concerns in distributed multimedia applications, DIMUSE also incorporates a practical framework called SMARXO, which supports multilevel multimedia security control.
SMARXO efficiently combines role-based access control (RBAC), XML, and an object-relational database management system (ORDBMS) to achieve proficient security control. A distributed multimedia management system named DMMManager (Distributed MultiMedia Manager) is developed with the proposed framework DEMUR to support multimedia capturing, analysis, retrieval, authoring, and presentation in one single framework.
Abstract:
Poor informational reading and writing skills in early grades and the need to provide students more experience with informational text have been identified by research as areas of concern. Wilkinson and Son (2011) support future research in dialogic approaches to investigate the impact dialogic teaching has on comprehension. This study (N = 39) examined the gains in reading comprehension, science achievement, and metacognitive functioning of individual second-grade students interacting with instructors using dialogue journals alongside their textbook. The 38-week study consisted of two instructional phases and three assessment points. After a period of oral metacognitive strategies, one class formed the treatment group (n = 17), consisting of two teachers following the co-teaching method, and two classes formed the comparison group (n = 22). The dialogue journal intervention for the treatment group embraced the transactional theory of instruction through the use of dialogic interaction between teachers and students. Students took notes on the assigned lesson after an oral discussion. Teachers responded to students' entries with scaffolding using reading strategies (prior knowledge, skim, slow down, mental integration, and diagrams) modeled after Schraw's (1998) strategy evaluation matrix, to enhance students' comprehension. The comparison group utilized text-based, teacher-led whole-group discussion. Data were collected using different measures: (a) Florida Assessments for Instruction in Reading (FAIR) Broad Diagnostic Inventory; (b) Scott Foresman end-of-chapter tests; (c) Metacomprehension Strategy Index (Schmitt, 1990); and (d) a researcher-made metacognitive scaffolding rubric. Statistical analyses were performed using paired-sample t-tests, regression analysis of covariance, and two-way analysis of covariance.
Findings from the study revealed that experimental participants performed significantly better on the linear combination of reading comprehension, science achievement, and metacognitive function than their comparison group counterparts while controlling for pretest scores. Overall, results from the study established that teacher scaffolding using metacognitive strategies can potentially develop students' reading comprehension, science achievement, and metacognitive awareness. This suggests that early childhood students gain from the integration of reading and writing when using authentic materials (science textbooks) in science classrooms. A replication of this study with more students across more schools and different grade levels would improve the generalizability of these results.
Abstract:
According to the American Podiatric Medical Association, about 15 percent of patients with diabetes will develop a diabetic foot ulcer. Furthermore, foot ulcerations lead to 85 percent of diabetes-related amputations. Foot ulcers are caused by a combination of factors, such as lack of feeling in the foot, poor circulation, foot deformities, and the duration of the diabetes. To date, wounds are inspected visually to monitor healing, without any objective imaging approach to look beneath the wound’s surface. Herein, a non-contact, portable handheld optical device was developed at the Optical Imaging Laboratory as an objective approach to monitor wound healing in foot ulcers. This near-infrared optical technology is non-radiative, safe, and fast in imaging large wounds on patients. The FIU IRB-approved study will involve subjects who have been diagnosed with diabetes by a physician and who have developed foot ulcers. Currently, in-vivo imaging studies are carried out every week on diabetic patients with foot ulcers at two clinical sites in Miami. Near-infrared images of the wound are captured on subjects every week and the data are processed using custom-developed Matlab-based image processing tools. The optical contrast of the wound to its peripheries and the wound size are analyzed and compared from the NIR and white light images during the weekly systematic imaging of wound healing.
Abstract:
This paper examines the history of schema theory and how culture is incorporated into schema theory. Furthermore, the author argues that cultural schema affects students’ usage of reader-based processing and text-based processing in reading.
Abstract:
The primary goal of this dissertation is to develop point-based rigid and non-rigid image registration methods that have better accuracy than existing methods. We first present point-based PoIRe, which provides the framework for point-based global rigid registrations. It allows a choice of different search strategies including (a) branch-and-bound, (b) probabilistic hill-climbing, and (c) a novel hybrid method that takes advantage of the best characteristics of the other two methods. We use a robust similarity measure that is insensitive to noise, which is often introduced during feature extraction. We show the robustness of PoIRe by using it to register images obtained with an electronic portal imaging device (EPID), which have large amounts of scatter and low contrast. To evaluate PoIRe we used (a) simulated images and (b) images with fiducial markers; PoIRe was extensively tested with 2D EPID images and images generated by 3D Computed Tomography (CT) and Magnetic Resonance (MR) imaging. PoIRe was also evaluated using benchmark data sets from the blind retrospective evaluation project (RIRE). We show that PoIRe is better than existing methods such as Iterative Closest Point (ICP) and methods based on mutual information. We also present a novel point-based local non-rigid shape registration algorithm. We extend the robust similarity measure used in PoIRe to non-rigid registrations, adapting it to a free form deformation (FFD) model and making it robust to local minima, which are a drawback common to existing non-rigid point-based methods. For non-rigid registrations we show that it performs better than existing methods and that it is less sensitive to starting conditions. We test our non-rigid registration method using available benchmark data sets for shape registration. Finally, we also explore the extraction of features invariant to changes in perspective and illumination, and explore how they can help improve the accuracy of multi-modal registration.
For multimodal registration of EPID-DRR images we present a method based on a local descriptor defined by a vector of complex responses to a circular Gabor filter.
Abstract:
The objectives of this research are to analyze and develop a modified Principal Component Analysis (PCA) and to develop a two-dimensional PCA with applications in image processing. PCA is a classical multivariate technique whose mathematical treatment is purely based on the eigensystem of positive-definite symmetric matrices. Its main function is to statistically transform a set of correlated variables to a new set of uncorrelated variables over $\mathbb{R}^{n}$ by retaining most of the variations present in the original variables. The variances of the Principal Components (PCs) obtained from the modified PCA form a correlation matrix of the original variables. The decomposition of this correlation matrix into a diagonal matrix produces an orthonormal basis that can be used to linearly transform the given PCs. It is this linear transformation that reproduces the original variables. The two-dimensional PCA can be devised as two successive applications of one-dimensional PCA. It can be shown that, for an $m\times n$ matrix, the PCs obtained from the two-dimensional PCA are the singular values of that matrix. In this research, several applications for image analysis based on PCA are developed, i.e., edge detection, feature extraction, and multi-resolution PCA decomposition and reconstruction.
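As an illustration of the classical technique this abstract builds on, the following is a minimal sketch of standard one-dimensional PCA through the eigensystem of the sample covariance matrix. It is not the modified or two-dimensional PCA developed in the dissertation; the function name `pca`, the synthetic data, and the mixing matrix are assumptions made for the example. It shows that the component scores are uncorrelated, that the orthonormal eigenvector basis maps the scores back to the original variables, and that the eigenvalues agree with the squared singular values of the centered data.

```python
import numpy as np

def pca(X):
    """Standard PCA: eigen-decomposition of the sample covariance of X (rows = samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # symmetric, positive semi-definite
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]            # reorder to descending variance
    evals, evecs = evals[order], evecs[:, order]
    scores = Xc @ evecs                        # principal component scores
    return mean, evals, evecs, scores

# Hypothetical correlated data: 200 samples of 3 variables.
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

mean, evals, evecs, scores = pca(X)

# The scores are uncorrelated: their covariance is diagonal with the eigenvalues.
score_cov = np.cov(scores, rowvar=False)

# The orthonormal basis linearly transforms the scores back to the original variables.
X_rebuilt = scores @ evecs.T + mean

# Eigenvalues equal squared singular values of the centered data divided by (n - 1).
singular_values = np.linalg.svd(X - mean, compute_uv=False)
```

Using `np.linalg.eigh` rather than `np.linalg.eig` is a deliberate choice here, since the covariance matrix is symmetric and `eigh` guarantees real eigenvalues and orthonormal eigenvectors.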
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing, and Web access routines. In addition to wrapper development we thoroughly investigate issues associated with Web site selection, analysis, and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling complex cases.
Abstract:
With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage, and retrieval of multimedia data, such as graphics, images, animation, video, audio, and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. In this dissertation, with the focus on addressing three of the essential challenges in developing the MMDBMS, namely semantic gap, perception subjectivity, and data organization, a systematic and integrated framework is proposed with video and image databases serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of an MMDBMS: multimedia data representation, indexing, and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors in addition to extracting the low-level media features. The data organization challenge is mainly addressed through media indexing, where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from the feedback at both the image level and the object level.
Abstract:
Fluorescence-enhanced optical imaging is an emerging non-invasive and non-ionizing modality for breast cancer diagnosis. Various optical imaging systems are currently available, although most of them are limited by bulky instrumentation or by their inability to flexibly image different tissue volumes and shapes. Hand-held optical imaging systems are a recent development offering improved portability, but are currently limited to surface mapping. Herein, a novel optical imager, consisting primarily of a hand-held probe and a gain-modulated intensified charge coupled device (ICCD) detector, is developed for both surface and tomographic breast imaging. The unique features of this hand-held probe based optical imager are its ability to (i) image large tissue areas (5×10 sq. cm) in a single scan, (ii) reduce overall imaging time using a unique measurement geometry, and (iii) perform tomographic imaging for three-dimensional (3-D) tumor localization. Frequency-domain experimental phantom studies have been performed on slab geometries (650 ml) under different target depths (1-2.5 cm), target volumes (0.45, 0.23 and 0.10 cc), fluorescence absorption contrast ratios (1:0, 1000:1 to 5:1), and numbers of targets (up to 3), using Indocyanine Green (ICG) as the fluorescence contrast agent. An approximate extended Kalman filter based inverse algorithm has been adapted for 3-D tomographic reconstructions. Single fluorescence targets were reconstructed when located: (i) up to 2.5 cm deep (at 1:0 contrast ratio) and 1.5 cm deep (up to 10:1 contrast ratio) for a 0.45 cc target; and (ii) 1.5 cm deep for a target as small as 0.10 cc at 1:0 contrast ratio. In the case of multiple targets, two targets as close as 0.7 cm were tomographically resolved when located 1.5 cm deep.
It was observed that performing multi-projection (here dual) tomographic imaging using a priori target information from surface images improved the target depth recovery over single-projection imaging. From a total of 98 experimental phantom studies, the sensitivity and specificity of the imager were estimated as 81-86% and 43-50%, respectively. With 3-D tomographic imaging successfully demonstrated for the first time using a hand-held optical imager, the clinical translation of this technology is promising upon further experimental validation from in-vitro and in-vivo studies.
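For readers unfamiliar with the two figures quoted at the end of this abstract, the sketch below shows the standard definitions of sensitivity and specificity. The abstract does not report the underlying counts, so the numbers in the usage lines are purely hypothetical and are not taken from the 98 phantom studies; the function names are likewise assumptions for this example.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual targets that were detected: TP / (TP + FN)."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / total

def specificity(true_neg, false_pos):
    """Fraction of target-free cases correctly reported as such: TN / (TN + FP)."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / total

# Hypothetical counts, for illustration only (not data from the dissertation).
example_sensitivity = sensitivity(true_pos=8, false_neg=2)
example_specificity = specificity(true_neg=3, false_pos=1)
```

A high sensitivity paired with a lower specificity, as reported in the abstract, indicates that the imager rarely misses a target but reports a comparatively large share of false detections.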
Abstract:
The aim of this research was to demonstrate a high-current and stable field emission (FE) source based on carbon nanotubes (CNTs) and an electron multiplier microchannel plate (MCP), and to design efficient field emitters. In recent years various CNT-based FE devices have been demonstrated, including field emission displays, x-ray sources, and many more. However, to use CNTs as a source in high-powered microwave (HPM) devices, a higher and stable current in the range of a few milliamperes to amperes is required. To achieve such high current we developed a novel technique of introducing an MCP between the CNT cathode and the anode. An MCP is an array of electron multipliers; it operates by avalanche multiplication of secondary electrons, which are generated when electrons strike the channel walls of the MCP. FE current from CNTs is enhanced due to avalanche multiplication of secondary electrons, and in addition the MCP protects the CNTs from irreversible damage during vacuum arcing. Conventional MCPs are not suitable for this purpose due to the lower secondary emission properties of their materials. To achieve higher and stable currents we have designed and fabricated a unique ceramic MCP consisting of high secondary electron yield (SEY) materials. The MCP was fabricated utilizing optimum design parameters, which include channel dimensions and material properties obtained from charged particle optics (CPO) simulation. The Child-Langmuir law, which gives the optimum current density from an electron source, was taken into account during the system design and experiments. Each MCP channel consisted of MgO-coated CNTs, with MgO chosen from among various material systems for its very high SEY. With the MCP inserted between the CNT cathode and the anode, a stable and higher emission current was achieved. It was ∼25 times higher than without the MCP. A brighter emission image was also observed due to the enhanced emission current.
The obtained results are a significant technological advance, and this research holds promise for electron sources in new-generation lightweight, efficient, and compact microwave devices for telecommunications in satellites or space applications. As part of this work, novel emitters consisting of multistage geometry with improved FE properties were also developed.
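The Child-Langmuir law mentioned in this abstract can be made concrete with a short calculation. The sketch below uses the textbook form for a planar, non-relativistic vacuum diode, J = (4ε₀/9)·√(2e/m)·V^(3/2)/d², which gives the space-charge-limited current density. The planar geometry, the function name, and the example voltage and gap are assumptions for illustration; the abstract does not state which form or operating values were used in the actual design.

```python
import math

# Physical constants in SI units.
EPSILON_0 = 8.8541878128e-12        # vacuum permittivity, F/m
ELECTRON_CHARGE = 1.602176634e-19   # elementary charge, C
ELECTRON_MASS = 9.1093837015e-31    # electron rest mass, kg

def child_langmuir_current_density(voltage, gap):
    """Space-charge-limited current density (A/m^2) of a planar vacuum diode.

    voltage: anode-cathode potential difference in volts
    gap:     anode-cathode spacing in metres
    """
    if voltage < 0 or gap <= 0:
        raise ValueError("voltage must be non-negative and gap must be positive")
    prefactor = (4.0 * EPSILON_0 / 9.0) * math.sqrt(2.0 * ELECTRON_CHARGE / ELECTRON_MASS)
    return prefactor * voltage ** 1.5 / gap ** 2

# Hypothetical operating point: 1 kV across a 1 mm gap.
example_density = child_langmuir_current_density(voltage=1000.0, gap=1.0e-3)
```

The law scales as V^(3/2) and as 1/d², so quadrupling the voltage raises the limit eightfold, while doubling the gap cuts it to a quarter.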
Abstract:
Today, most conventional surveillance networks are based on analog systems, which have many constraints such as manpower and high-bandwidth requirements. These constraints have become a barrier to the development of today's surveillance networks. This dissertation describes a digital surveillance network architecture based on the H.264 coding/decoding (CODEC) System-on-a-Chip (SoC) platform. The proposed digital surveillance network architecture includes three major layers: the software layer, the hardware layer, and the network layer. The following outlines the contributions to the proposed digital surveillance network architecture. (1) We implement an object recognition system and an object categorization system on the software layer by applying several Digital Image Processing (DIP) algorithms. (2) For a better compression ratio and higher-quality video transfer, we implement two new modules on the hardware layer of the H.264 CODEC core, i.e., the background elimination module and the Directional Discrete Cosine Transform (DDCT) module. (3) Furthermore, we introduce a Digital Signal Processor (DSP) sub-system on the main bus of H.264 SoC platforms as the major hardware support system for our software architecture. Thus we combine the software and hardware platforms into an intelligent surveillance node. Lab results show that the proposed surveillance node can dramatically save network resources such as bandwidth and storage capacity.
Abstract:
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most existing neural networks treat the mean square error function as the standard error function. The system proposed in this dissertation utilizes the mean quartic error function, where the third and fourth derivatives are non-zero. Consequently, many improvements on the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, there are three essential factors to be considered, in descending order of importance: (1) the error rate on the testing set, (2) the processing time needed to recognize a segmented character, and (3) the total training time and subsequently the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for different case character recognition. Since character segmentation is required for word and sentence recognition, this dissertation also provides an effective rule-based segmentation method, which is different from the conventional adaptive segmentation methods. Dictionary-based correction is utilized to correct mistakes resulting from the recognition and segmentation phases. The integration of the segmentation methods with the handwritten character recognition algorithm yielded an accuracy of 92% for lower case characters and 97% for upper case characters. In the testing phase, the database consists of 20,000 handwritten characters, with 10,000 for each case. Recognizing 10,000 handwritten characters in the testing phase required 8.5 seconds of processing time.
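To make the contrast between the two error functions concrete, the sketch below defines a mean square error and a mean quartic error over network outputs and targets, together with the gradient of the quartic form. The normalization by the number of outputs and the function names are assumptions for this example; the abstract does not give the exact form used in the dissertation, and the training methods it describes (bounded, semi-third order, next-minimal) are not reproduced here.

```python
import numpy as np

def mean_square_error(outputs, targets):
    """Mean of squared residuals; derivatives of order three and higher vanish."""
    return np.mean((outputs - targets) ** 2)

def mean_quartic_error(outputs, targets):
    """Mean of fourth-power residuals; third and fourth derivatives are non-zero."""
    return np.mean((outputs - targets) ** 4)

def mean_quartic_error_grad(outputs, targets):
    """Gradient of the mean quartic error with respect to each output."""
    return 4.0 * (outputs - targets) ** 3 / outputs.size

# Hypothetical network outputs and targets, for illustration only.
outputs = np.array([0.5, -1.0, 2.0])
targets = np.array([0.0, 0.0, 1.0])
mqe = mean_quartic_error(outputs, targets)
grad = mean_quartic_error_grad(outputs, targets)
```

Because the residual is raised to the fourth power, outputs that are far from their targets dominate the gradient much more strongly than under the mean square error, while small residuals contribute almost nothing.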