850 results for Text-Based Image Retrieval
Abstract:
Over the past five years, XML has been embraced by both the research and industrial communities due to its promising prospects as a new data representation and exchange format on the Internet. The widespread popularity of XML creates an increasing need to store XML data in persistent storage systems and to enable sophisticated XML queries over the data. The currently available approaches to the XML storage and retrieval issue are limited either by immaturity (e.g. native approaches) or by inflexibility, heavy fragmentation and excessive join operations (e.g. non-native approaches such as the relational database approach). In this dissertation, I studied the issue of storing and retrieving XML data using the Semantic Binary Object-Oriented Database System (Sem-ODB) to leverage the advanced Sem-ODB technology with the emerging XML data model. First, a meta-schema based approach was implemented to address the data model mismatch issue that is inherent in the non-native approaches. The meta-schema based approach captures the meta-data of both Document Type Definitions (DTDs) and Sem-ODB Semantic Schemas, thus enabling a dynamic and flexible mapping scheme. Second, a formal framework was presented to ensure precise and concise mappings. In this framework, both schemas and the conversions between them are formally defined and described. Third, after major features of an XML query language, XQuery, were analyzed, a high-level XQuery to Semantic SQL (Sem-SQL) query translation scheme was described. This translation scheme takes advantage of the navigation-oriented query paradigm of Sem-SQL, thus avoiding the excessive join problem of relational approaches. Finally, the modeling capability of the Semantic Binary Object-Oriented Data Model (Sem-ODM) was explored from the perspective of conceptually modeling an XML Schema using a Semantic Schema.
It was revealed that the advanced features of the Sem-ODB, such as multi-valued attributes, surrogates, and the navigation-oriented query paradigm, among others, are indeed beneficial in coping with the XML storage and retrieval issue using a non-XML approach. Furthermore, extensions to the Sem-ODB to make it work more effectively with XML data were also proposed.
Abstract:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. 
Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted by our experiments and reviews filtered by Yelp.
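The spatial set-similarity join described in this abstract pairs records that are both textually similar and geographically close. The following is a minimal brute-force sketch of that idea in Python, not the SpsJoin algorithm itself: the abstract does not specify the similarity measure, the distance function, or the MapReduce partitioning, so Jaccard similarity, haversine distance, and the names `spatial_set_similarity_join`, `min_jaccard` and `max_km` are all assumptions made for illustration.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed textual measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def spatial_set_similarity_join(left, right, min_jaccard, max_km):
    """Return (left_id, right_id) pairs meeting BOTH constraints.

    Each record is a hypothetical (id, token_set, (lat, lon)) tuple.
    This is a quadratic nested loop; a scalable version would need
    partitioning and filtering, which are not shown here.
    """
    pairs = []
    for lid, ltokens, lpos in left:
        for rid, rtokens, rpos in right:
            close = haversine_km(lpos, rpos) <= max_km
            if close and jaccard(ltokens, rtokens) >= min_jaccard:
                pairs.append((lid, rid))
    return pairs
```

For example, joining a venue list against a business registry with `min_jaccard=0.6` and `max_km=0.5` would keep only pairs whose names share most tokens and whose coordinates lie within half a kilometre, which is the kind of constraint a deduplication or record-linkage task would use.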
Abstract:
With the progress of computer technology, computers are expected to be more intelligent in their interaction with humans, presenting information according to the user's psychological and physiological characteristics. However, computer users with visual problems may encounter difficulties in perceiving icons, menus, and other graphical information displayed on the screen, limiting the efficiency of their interaction with computers. In this dissertation, a personalized and dynamic image precompensation method was developed to improve the visual performance of computer users with ocular aberrations. The precompensation was applied to the graphical targets before presenting them on the screen, aiming to counteract the visual blurring caused by the ocular aberration of the user's eye. A complete and systematic modeling approach to describe the retinal image formation of the computer user was presented, taking advantage of modeling tools such as Zernike polynomials, wavefront aberration, the Point Spread Function and the Modulation Transfer Function. The ocular aberration of the computer user was originally measured by a wavefront aberrometer, as a reference for the precompensation model. The dynamic precompensation was generated based on the resized aberration, with the pupil diameter monitored in real time. The potential visual benefit of the dynamic precompensation method was explored through software simulation, with the aberration data from a real human subject. An "artificial eye" experiment was conducted by simulating the human eye with a high-definition camera, providing an objective evaluation of the image quality after precompensation. In addition, an empirical evaluation with 20 human participants was designed and implemented, involving image recognition tests performed under a more realistic viewing environment of computer use.
The statistical analysis results of the empirical experiment confirmed the effectiveness of the dynamic precompensation method by showing a significant improvement in recognition accuracy. The merit and necessity of the dynamic precompensation were also substantiated by comparing it with the static precompensation. The visual benefit of the dynamic precompensation was further confirmed by the subjective assessments collected from the evaluation participants.
Abstract:
The outcome of this research is an Intelligent Retrieval System for Conditions of Contract Documents. The objective of the research is to improve the method of retrieving data from a computer version of a construction Conditions of Contract document. SmartDoc, a prototype computer system, has been developed for this purpose. The system provides recommendations to aid the user in the process of retrieving clauses from the construction Conditions of Contract document. The prototype system integrates two computer technologies: hypermedia and expert systems. Hypermedia is utilized to provide a dynamic way of retrieving data from the document. Expert systems technology is utilized to build a set of rules that activate the recommendations to aid the user during the retrieval of clauses. The rules are based on experts' knowledge. The prototype system helps the user retrieve related clauses that are not explicitly cross-referenced but, according to expert experience, are relevant to the topic that the user is interested in.
Abstract:
This study examined the role of corporate websites and company Facebook profiles in shaping perceptions of organizational image in the recruitment context. A primary purpose of this research was to determine whether perceptions of organizational image vary across different web-based recruitment methods, specifically examining corporate websites and social networking (SNW) sites, such as company Facebook profiles. A secondary goal was to determine how these perceptions of image are shaped by the objective components of websites and Facebook profiles. Finally, this study sought to determine the most influential components of websites and Facebook profiles, in terms of impacting image, to better understand how organizations can maximize their web-based recruitment efforts. A total of 102 companies selected from Fortune Magazine’s 2011 top 500 were chosen for the study. Perceptions of organizational personality as well as objective assessments of personality were gathered for each organization in a two-phase approach. Results indicate that exposure to corporate websites and company Facebook profiles does influence perceptions of image in different ways. Furthermore, individual components of the websites were identified as key drivers for influencing specific image dimensions, particularly for company Facebook pages. Findings are beneficial for advising practitioners on how best to manage their web-based recruitment sources in order to maximize efficiency. The present study serves to further our understanding of the process through which perceptions of organizational image are influenced by new recruitment sources.
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in handling complex cases.
Abstract:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily lives as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes necessary to develop methods and tools for analyzing this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guide the new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
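The dominating set based summarization mentioned above can be illustrated with a small sketch. The abstract does not say how the sentence graph is built or which approximation is used, so the word-overlap (Jaccard) edges, the `threshold` parameter, and the greedy selection below are assumptions chosen for illustration rather than the dissertation's actual method. The idea is that a summary is a set of sentences such that every other sentence is similar to at least one selected sentence.

```python
def tokenize(sentence):
    """Lowercased whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(sentence.lower().split())


def build_graph(sentences, threshold):
    """Connect sentences whose Jaccard word overlap reaches the threshold."""
    tokens = [tokenize(s) for s in sentences]
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            union = tokens[i] | tokens[j]
            if union and len(tokens[i] & tokens[j]) / len(union) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_dominating_set(adj):
    """Greedy approximation: repeatedly pick the vertex that covers the
    most still-uncovered vertices (itself plus its neighbours)."""
    uncovered = set(adj)
    chosen = []
    while uncovered:
        # Ties are broken by the smallest index, since max() keeps the
        # first maximal element of the sorted vertex list.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return chosen
```

The indices returned by `greedy_dominating_set(build_graph(sentences, 0.5))` identify the sentences to keep; near-duplicate posts collapse onto one representative while isolated posts are retained.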
Abstract:
An experimental setup to measure the three-dimensional phase-intensity distribution of an infrared laser beam in the focal region is presented. It is based on the knife-edge method to perform a tomographic reconstruction and on a numerical method based on the transport of intensity equation to obtain the propagating wavefront. This experimental approach allows us to characterize a focused laser beam when the use of imaging or interferometric arrangements is not possible. Thus, we have recovered the intensity and phase of an aberrated beam dominated by astigmatism. The phase evolution is fully consistent with that of the beam intensity along the optical axis. Moreover, this method is based on an expansion of both the irradiance and the phase information in a series of Zernike polynomials. We have described guidelines for choosing a proper set of these polynomials depending on the experimental conditions and shown that, by abiding by these criteria, numerical errors can be reduced.
Abstract:
The importance of non-destructive techniques (NDT) in structural health monitoring programmes has been keenly felt in recent times. The quality of the measured data, often affected by various environmental conditions, can be a guiding factor in terms of the usefulness and prediction efficiency of the various detection and monitoring methods used in this regard. Often, preprocessing the acquired data in relation to the affecting environmental parameters can improve the information quality and lead to a significantly more efficient and correct prediction process. The improvement can be directly related to the final decision making policy about a structure or a network of structures and is compatible with general probabilistic frameworks of such assessment and decision making programmes. This paper considers a preprocessing technique employed for an image analysis based structural health monitoring methodology to identify sub-marine pitting corrosion in the presence of variable luminosity, contrast and noise affecting the quality of images. Preprocessing the gray-level threshold of the various images is observed to bring about a significant improvement in damage detection compared with an automatically computed gray-level threshold. The case-dependent adjustments of the threshold make it possible to obtain the best possible information from an existing image. The corresponding improvements are observed in a qualitative manner in the present study.
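To make the comparison between an automatically computed gray-level threshold and a case-dependent one concrete, here is a minimal sketch. The abstract does not name the automatic method, so Otsu's method is used here purely as a common stand-in; the function names and the convention that pixels above the threshold are flagged are likewise assumptions, not details from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Automatic threshold via Otsu's method (assumed stand-in).

    `pixels` is a flat list of integer gray levels in [0, levels).
    Returns the level t that maximizes between-class variance when
    pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_low = 0      # number of pixels at or below t
    sum_low = 0    # sum of gray levels at or below t
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def segment(pixels, threshold):
    """Binary mask: 1 where the gray level exceeds the threshold."""
    return [1 if p > threshold else 0 for p in pixels]
```

A case-dependent adjustment in the spirit of the abstract would then amount to calling `segment` with a manually chosen threshold for a given image instead of the value returned by `otsu_threshold`, and comparing the two masks.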
Abstract:
Users seeking information may not find content relevant to their information need in a specific language. The information may be available in a language different from their own, which they may not know. Users may thus experience difficulty in accessing information present in other languages. Since the retrieval process depends on the translation of the user query, there are many issues in obtaining the right translation. For a pair of languages chosen by a user, resources such as an incomplete dictionary or an inaccurate machine translation system may exist. These resources may be insufficient to map the query terms in one language to their equivalent terms in another language. Also, for a given query, there might exist multiple correct translations. The underlying corpus evidence may provide a clue for selecting a probable set of translations that could eventually yield better information retrieval. In this paper, we present a cross language information retrieval approach to effectively retrieve information present in a language other than the language of the user query, using a corpus driven query suggestion approach. The idea is to utilize the corpus based evidence of one language to improve the retrieval and re-ranking of news documents in the other language. We use the FIRE corpora (Tamil and English news collections) in our experiments and illustrate the effectiveness of the proposed cross language information retrieval approach.
Abstract:
The amount and quality of available biomass is a key factor for a sustainable livestock industry and for agricultural management decision making. Globally, 31.5% of land cover is grassland, while 80% of Ireland’s agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state of the art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate-resolution Imaging Spectroradiometer (MODIS)) and radar (TerraSAR-X) time series imagery to estimate the grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland using both statistical and state of the art machine learning algorithms. High quality weather data available from the on-site weather station was also used to calculate the Growing Degree Days (GDD) for Grange to determine the impact of ancillary data on biomass estimation. In situ and satellite data covering 12 years for the Moorepark and 6 years for the Grange study sites were used to predict grassland biomass using multiple linear regression and Adaptive Neuro-Fuzzy Inference System (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high quality in situ data, can be used to retrieve grassland biomass with high performance (R2 = 0.86; p < 0.05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to plant development, or phenological stage, an improvement in biomass estimation would be expected. It was observed that, using the ANFIS model, the biomass estimation accuracy increased from R2 = 0.76 (p < 0.05) to R2 = 0.81 (p < 0.05) and the root mean square error was reduced by 2.72%.
The work on the application of optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots, it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i.e., height, biomass), and that it is possible to detect changes due to grass growth and to grazing and mowing events when the temporal baseline is short (11 days). However, it is not possible to automatically and uniquely identify the cause of these changes based only on the SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down by the wind. Overall, the work presented in this dissertation has demonstrated the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high quality ground data were used for training. At present, a major limitation for national scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by minor modifications to the existing PastureBaseIreland database, namely adding the location and extent of each grassland paddock. As far as remote sensing data requirements are concerned, MODIS is useful for large scale evaluation, but due to its coarse resolution it is not possible to detect the variations within and between fields at the farm scale. However, this issue will be resolved in terms of spatial resolution by the Sentinel-2 mission, and when both satellites (Sentinel-2A and Sentinel-2B) are operational the revisit time will reduce to 5 days, which, together with Landsat-8, should enable sufficient cloud-free data for operational biomass estimation at a national scale.
The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if there are enough coherent interferometric pairs available; however, this is difficult to achieve due to the temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area, even an 11-day temporal baseline is too large. In order to achieve better coherence, a very high resolution is required at the cost of spatial coverage, which limits its scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in Tandem mode will minimize the temporal decorrelation over vegetated areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In future, this framework can be used to build an operational decision support system for retrieval of grassland biophysical parameters based on data from long term planned optical missions (e.g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, the Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.
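The accumulated Growing Degree Days (GDD) used as ancillary input in this abstract can be illustrated with a short sketch. It uses the common averaging formula, daily GDD = max(0, (T_min + T_max) / 2 - T_base); the abstract does not state which GDD variant or base temperature the dissertation used, so both the formula choice and the `t_base` parameter are assumptions.

```python
def daily_gdd(t_min, t_max, t_base):
    """Growing degree days for one day using the simple averaging method.

    Days whose mean temperature is at or below the base contribute zero.
    """
    return max(0.0, (t_min + t_max) / 2.0 - t_base)


def accumulated_gdd(daily_temps, t_base):
    """Running total of GDD over a sequence of (t_min, t_max) pairs in deg C."""
    total = 0.0
    running = []
    for t_min, t_max in daily_temps:
        total += daily_gdd(t_min, t_max, t_base)
        running.append(total)
    return running
```

With a hypothetical base of 5 °C, the three days `[(2, 8), (6, 14), (0, 4)]` have mean temperatures of 5, 10 and 2 °C, so only the second day adds heat units. The running total from `accumulated_gdd` is the kind of series that could be paired with vegetation indices as a model input.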
Abstract:
Many studies have shown the considerable potential for the application of remote-sensing-based methods for deriving estimates of lake water quality. However, the reliable application of these methods across time and space is complicated by the diversity of lake types, sensor configuration, and the multitude of different algorithms proposed. This study tested one operational and 46 empirical algorithms sourced from the peer-reviewed literature that have individually shown potential for estimating lake water quality properties in the form of chlorophyll-a (algal biomass) and Secchi disc depth (SDD) (water transparency) in independent studies. Nearly half (19) of the algorithms were unsuitable for use with the remote-sensing data available for this study. The remaining 28 were assessed using the Terra/Aqua satellite archive to identify the best performing algorithms in terms of accuracy and transferability within the period 2001–2004 in four test lakes, namely Vänern, Vättern, Geneva, and Balaton. These lakes represent the broad continuum of large European lake types, varying in terms of eco-region (latitude/longitude and altitude), morphology, mixing regime, and trophic status. All algorithms were tested for each lake separately and combined to assess the degree of their applicability in ecologically different sites. None of the algorithms assessed in this study exhibited promise when all four lakes were combined into a single data set and most algorithms performed poorly even for specific lake types. A chlorophyll-a retrieval algorithm originally developed for eutrophic lakes showed the most promising results (R2 = 0.59) in oligotrophic lakes. Two SDD retrieval algorithms, one originally developed for turbid lakes and the other for lakes with various characteristics, exhibited promising results in relatively less turbid lakes (R2 = 0.62 and 0.76, respectively). 
The results presented here highlight the complexity associated with remotely sensed lake water quality estimates and the high degree of uncertainty due to various limitations, including the lake water optical properties and the choice of methods.
Abstract:
Our study focuses on the twilight western and seeks, more precisely, to extract the "twilight" from the genre. The epithet "twilight" (crépusculaire), inherited from the critical vocabulary of the 1960s and 1970s, generally defines a relatively small number of works whose narratives feature aging cowboys in a style that favors aesthetic and psychological realism, frequently associated with historical revisionism, or even with the "pro-Indian western", but which stands out for its propensity to film protagonists who are weary and overtaken by the march of History. Through a detour via the literary forms that take the American West as their diegetic context (the dime novel and frontier novels), we move back and forth between the epic and novelistic forms, between History and its myth, and between the literary and the filmic, in order to better grasp the dyadic relationship that the western maintains with the writing of History, monumental on the one hand and critical on the other. Less interested in the aesthetics of images than in the narratological aspects of film taken as text, our approach draws on literary analyses to call into question the watertight classifications that have marked the evolution of the cinematic western. Starting from André Bazin's intuitions about the sur-western, we study the narrative modulations of the western as well as the emergence of a critical consciousness from its mythological heroes (notably the cowboy). Our approach is both epistemological and transhistorical in that it seeks to draw from the twilight western a genre beyond genres, founded on an incitement to twilight narrativization on the part of the spectator.
The latter, focused through a Deleuzian approach to the crystal-image, refers no longer only to an existentialist conception of the character in History, but also to a sharp foregrounding of cinema's out-of-frame, a moment of clairvoyance at once pragmatic and historicizing that we define as an end-image (image-fin), a chronogenetic image arising from the contemporaneity of its figures and their authors.
Abstract:
Scientists planning to use underwater stereoscopic image technologies are often faced with numerous problems during the methodological implementation: commercial equipment is too expensive; the setup or calibration is too complex; or the image processing (i.e. measuring objects in the stereo-images) is too complicated to be performed without a time-consuming phase of training and evaluation. The present paper addresses some of these problems and describes a workflow for stereoscopic measurements for marine biologists. It also provides instructions on how to assemble an underwater stereo-photographic system with two digital consumer cameras and gives step-by-step guidelines for setting up the hardware. The second part details a software procedure to correct stereo-image pairs for lens distortions, which is especially important when using cameras with non-calibrated optical units. The final part presents a guide to the process of measuring the lengths (or distances) of objects in stereoscopic image pairs. To reveal the applicability and the restrictions of the described systems and to test the effects of different types of camera (a compact camera and an SLR type), experiments were performed to determine the precision and accuracy of two generic stereo-imaging units: a diver-operated system based on two Olympus Mju 1030SW compact cameras and a cable-connected observatory system based on two Canon 1100D SLR cameras. In the simplest setup without any correction for lens distortion, the low-budget Olympus Mju 1030SW system achieved mean accuracy errors (percentage deviation of a measurement from the object's real size) between 10.2% and -7.6% (overall mean value: -0.6%), depending on the size, orientation and distance of the measured object from the camera. With the single lens reflex (SLR) system, very similar values between 10.1% and -3.4% (overall mean value: -1.2%) were observed.
Correction of the lens distortion significantly improved the mean accuracy errors of both systems. Moreover, system precision (spread of the accuracy) improved significantly in both systems. Neither the use of a wide-angle converter nor multiple reassembly of the system had a significant negative effect on the results. The study shows that underwater stereophotography, independent of the system, has a high potential for robust and non-destructive in situ sampling and can be used without prior specialist training.
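The two quality measures named in this abstract can be written down in a few lines. The accuracy error follows the abstract's own definition (percentage deviation of a measurement from the object's real size). Precision is described only as the "spread of the accuracy", so using the sample standard deviation of the accuracy errors is an assumption, as are the function names and the sign convention that positive values mean overestimation.

```python
import statistics


def accuracy_error_percent(measured, real):
    """Percentage deviation of one measurement from the real size.

    Positive values mean the object was measured as too large
    (sign convention assumed for this sketch).
    """
    return (measured - real) / real * 100.0


def accuracy_and_precision(measurements, real):
    """Return (mean accuracy error, spread) for repeated measurements.

    The spread is the sample standard deviation of the per-measurement
    accuracy errors, which is one possible reading of "precision".
    Requires at least two measurements.
    """
    errors = [accuracy_error_percent(m, real) for m in measurements]
    return statistics.mean(errors), statistics.stdev(errors)
```

For instance, three repeated measurements of a 100 mm reference object can average out to a mean accuracy error near zero while still showing a spread of a couple of percent, which is why the abstract reports accuracy and precision separately.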
Abstract:
Aerial observations of light pollution can fill an important gap between ground based surveys and nighttime satellite data. Terrestrially bound surveys are labor intensive and are generally limited to a small spatial extent, and while existing satellite data cover the whole world, they are limited to coarse resolution. This paper describes the production of a high resolution (1 m) mosaic image of the city of Berlin, Germany at night. The dataset is spatially analyzed to identify the major sources of light pollution in the city based on urban land use data. An area-independent 'brightness factor' is introduced that allows direct comparison of the light emission from differently sized land use classes, and the percentage area with values above average brightness is calculated for each class. Using this methodology, lighting associated with streets has been found to be the dominant source of zenith directed light pollution (31.6%), although other land use classes have much higher average brightness. These results are compared with other urban light pollution quantification studies. The minimum resolution required for an analysis of this type is found to be near 10 m. Future applications of high resolution datasets such as this one could include studies of the efficacy of light pollution mitigation measures, improved light pollution simulations, economic and energy use, the relationship between artificial light and ecological parameters (e.g. circadian rhythm, fitness, mate selection, species distributions, migration barriers and seasonal behavior), or the management of nightscapes. To encourage further scientific inquiry, the mosaic data is freely available at Pangaea.