880 resultados para INFORMATION RETRIEVAL


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Big Data is a rising IT trend similar to cloud computing, social networking or ubiquitous computing. Big Data can offer beneficial scenarios in the e-health arena. However, one of the scenarios can be that Big Data needs to be kept secured for a long period of time in order to gain its benefits such as finding cures for infectious diseases and protecting patient privacy. From this connection, it is beneficial to analyse Big Data to make meaningful information while the data is stored securely. Therefore, the analysis of various database encryption techniques is essential. In this study, we simulated 3 types of technical environments, namely, Plain-text, Microsoft Built-in Encryption, and custom Advanced Encryption Standard, using Bucket Index in Data-as-a-Service. The results showed that custom AES-DaaS has a faster range query response time than MS built-in encryption. Furthermore, while carrying out the scalability test, we acknowledged that there are performance thresholds depending on physical IT resources. Therefore, for the purpose of efficient Big Data management in eHealth it is noteworthy to examine their scalability limits as well even if it is under a cloud computing environment. In addition, when designing an e-health database, both patient privacy and system performance needs to be dealt as top priorities.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents the results from a study of information behaviors, with specific focus on information organisation-related behaviours conducted as part of a larger daily diary study with 34 participants. The findings indicate that organization of information in everyday life is a problematic area due to various factors. The self-evident one is the inter-subjectivity between the person who may have organized the information and the person looking for that same information (Berlin et. al., 1993). Increasingly though, we are not just looking for information within collections that have been designed by someone else, but within our own personal collections of information, which frequently include books, electronic files, photos, records, documents, desktops, web bookmarks, and portable devices. The passage of time between when we categorized or classified the information, and the time when we look for the same information, poses several problems of intra-subjectivity, or the difference between our own past and present perceptions of the same information. Information searching, and hence the retrieval of information from one's own collection of information in everyday life involved a spatial and temporal coordination with one's own past selves in a sort of cognitive and affective time travel, just as organizing information is a form of anticipatory coordination with one's future information needs. This has implications for finding information and also on personal information management.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Process-Aware Information Systems (PAISs) support executions of operational processes that involve people, resources, and software applications on the basis of process models. Process models describe vast, often infinite, amounts of process instances, i.e., workflows supported by the systems. With the increasing adoption of PAISs, large process model repositories emerged in companies and public organizations. These repositories constitute significant information resources. Accurate and efficient retrieval of process models and/or process instances from such repositories is interesting for multiple reasons, e.g., searching for similar models/instances, filtering, reuse, standardization, process compliance checking, verification of formal properties, etc. This paper proposes a technique for indexing process models that relies on their alternative representations, called untanglings. We show the use of untanglings for retrieval of process models based on process instances that they specify via a solution to the total executability problem. Experiments with industrial process models testify that the proposed retrieval approach is up to three orders of magnitude faster than the state of the art.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We revisit the venerable question of access credentials management, which concerns the techniques that we, humans with limited memory, must employ to safeguard our various access keys and tokens in a connected world. Although many existing solutions can be employed to protect a long secret using a short password, those solutions typically require certain assumptions on the distribution of the secret and/or the password, and are helpful against only a subset of the possible attackers. After briefly reviewing a variety of approaches, we propose a user-centric comprehensive model to capture the possible threats posed by online and offline attackers, from the outside and the inside, against the security of both the plaintext and the password. We then propose a few very simple protocols, adapted from the Ford-Kaliski server-assisted password generator and the Boldyreva unique blind signature in particular, that provide the best protection against all kinds of threats, for all distributions of secrets. We also quantify the concrete security of our approach in terms of online and offline password guesses made by outsiders and insiders, in the random-oracle model. The main contribution of this paper lies not in the technical novelty of the proposed solution, but in the identification of the problem and its model. Our results have an immediate and practical application for the real world: they show how to implement single-sign-on stateless roaming authentication for the internet, in a ad-hoc user-driven fashion that requires no change to protocols or infrastructure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we consider the problem of document ranking in a non-traditional retrieval task, called subtopic retrieval. This task involves promoting relevant documents that cover many subtopics of a query at early ranks, providing thus diversity within the ranking. In the past years, several approaches have been proposed to diversify retrieval results. These approaches can be classified into two main paradigms, depending upon how the ranks of documents are revised for promoting diversity. In the first approach subtopic diversification is achieved implicitly, by choosing documents that are different from each other, while in the second approach this is done explicitly, by estimating the subtopics covered by documents. Within this context, we compare methods belonging to the two paradigms. Furthermore, we investigate possible strategies for integrating the two paradigms with the aim of formulating a new ranking method for subtopic retrieval. We conduct a number of experiments to empirically validate and contrast the state-of-the-art approaches as well as instantiations of our integration approach. The results show that the integration approach outperforms state-of-the-art strategies with respect to a number of measures.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we describe the approaches adopted to generate the runs submitted to ImageCLEFPhoto 2009 with an aim to promote document diversity in the rankings. Four of our runs are text based approaches that employ textual statistics extracted from the captions of images, i.e. MMR [1] as a state of the art method for result diversification, two approaches that combine relevance information and clustering techniques, and an instantiation of Quantum Probability Ranking Principle. The fifth run exploits visual features of the provided images to re-rank the initial results by means of Factor Analysis. The results reveal that our methods based on only text captions consistently improve the performance of the respective baselines, while the approach that combines visual features with textual statistics shows lower levels of improvements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This project is a step forward in the study of text mining where enhanced text representation with semantic information plays a significant role. It develops effective methods of entity-oriented retrieval, semantic relation identification and text clustering utilizing semantically annotated data. These methods are based on enriched text representation generated by introducing semantic information extracted from Wikipedia into the input text data. The proposed methods are evaluated against several start-of-art benchmarking methods on real-life data-sets. In particular, this thesis improves the performance of entity-oriented retrieval, identifies different lexical forms for an entity relation and handles clustering documents with multiple feature spaces.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Automatic Vehicle Identification Systems are being increasingly used as a new source of travel information. As in the last decades these systems relied on expensive new technologies, few of them were scattered along a networks making thus Travel-Time and Average Speed estimation their main objectives. However, as their price dropped, the opportunity of building dense AVI networks arose, as in Brisbane where more than 250 Bluetooth detectors are now installed. As a consequence this technology represents an effective means to acquire accurate time dependant Origin Destination information. In order to obtain reliable estimations, however, a number of issues need to be addressed. Some of these problems stem from the structure of a network made out of isolated detectors itself while others are inherent of Bluetooth technology (overlapping detection area, missing detections,\...). The aim of this paper is threefold: First, after having presented the level of details that can be reached with a network of isolated detectors we present how we modelled Brisbane's network, keeping only the information valuable for the retrieval of trip information. Second, we give an overview of the issues inherent to the Bluetooth technology and we propose a method for retrieving the itineraries of the individual Bluetooth vehicles. Last, through a comparison with Brisbane Transport Strategic Model results, we highlight the opportunities and the limits of Bluetooth detectors networks. The aim of this paper is twofold. We first give a comprehensive overview of the aforementioned issues. Further, we propose a methodology that can be followed, in order to cleanse, correct and aggregate Bluetooth data. We postulate that the methods introduced by this paper are the first crucial steps that need to be followed in order to compute accurate Origin-Destination matrices in urban road networks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Ozone Monitoring Instrument (OMI) aboard EOS-Aura and the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard EOS-Aqua fly in formation as part of the A-train. Though OMI retrieves aerosol optical depth (AOD) and aerosol absorption, it must assume aerosol layer height. The MODIS cannot retrieve aerosol absorption, but MODIS aerosol retrieval is not sensitive to aerosol layer height and with its smaller pixel size is less affected by subpixel clouds. Here we demonstrate an approach that uses MODIS-retrieved AOD to constrain the OMI retrieval, freeing OMI from making an a priori estimate of aerosol height and allowing a more direct retrieval of aerosol absorption. To predict near-UV optical depths using MODIS data we rely on the spectral curvature of the MODIS-retrieved visible and near-IR spectral AODs. Application of an OMI-MODIS joint retrieval over the north tropical Atlantic shows good agreement between OMI and MODIS-predicted AODs in the UV, which implies that the aerosol height assumed in the OMI-standard algorithm is probably correct. In contrast, over the Arabian Sea, MODIS-predicted AOD deviated from the OMI-standard retrieval, but combined OMI-MODIS retrievals substantially improved information on aerosol layer height (on the basis of validation against airborne lidar measurements). This implies an improvement in the aerosol absorption retrieval, but lack of UV absorption measurements prevents a true validation. Our study demonstrates the potential of multisatellite analysis of A-train data to improve the accuracy of retrieved aerosol products and suggests that a combined OMI-MODIS-CALIPSO retrieval has large potential to further improve assessments of aerosol absorption.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A wide range of models used in agriculture, ecology, carbon cycling, climate and other related studies require information on the amount of leaf material present in a given environment to correctly represent radiation, heat, momentum, water, and various gas exchanges with the overlying atmosphere or the underlying soil. Leaf area index (LAI) thus often features as a critical land surface variable in parameterisations of global and regional climate models, e.g., radiation uptake, precipitation interception, energy conversion, gas exchange and momentum, as all areas are substantially determined by the vegetation surface. Optical wavelengths of remote sensing are the common electromagnetic regions used for LAI estimations and generally for vegetation studies. The main purpose of this dissertation was to enhance the determination of LAI using close-range remote sensing (hemispherical photography), airborne remote sensing (high resolution colour and colour infrared imagery), and satellite remote sensing (high resolution SPOT 5 HRG imagery) optical observations. The commonly used light extinction models are applied at all levels of optical observations. For the sake of comparative analysis, LAI was further determined using statistical relationships between spectral vegetation index (SVI) and ground based LAI. The study areas of this dissertation focus on two regions, one located in Taita Hills, South-East Kenya characterised by tropical cloud forest and exotic plantations, and the other in Gatineau Park, Southern Quebec, Canada dominated by temperate hardwood forest. The sampling procedure of sky map of gap fraction and size from hemispherical photographs was proven to be one of the most crucial steps in the accurate determination of LAI. LAI and clumping index estimates were significantly affected by the variation of the size of sky segments for given zenith angle ranges. On sloping ground, gap fraction and size distributions present strong upslope/downslope asymmetry of foliage elements, and thus the correction and the sensitivity analysis for both LAI and clumping index computations were demonstrated. Several SVIs can be used for LAI mapping using empirical regression analysis provided that the sensitivities of SVIs at varying ranges of LAI are large enough. Large scale LAI inversion algorithms were demonstrated and were proven to be a considerably efficient alternative approach for LAI mapping. LAI can be estimated nonparametrically from the information contained solely in the remotely sensed dataset given that the upper-end (saturated SVI) value is accurately determined. However, further study is still required to devise a methodology as well as instrumentation to retrieve on-ground green leaf area index . Subsequently, the large scale LAI inversion algorithms presented in this work can be precisely validated. Finally, based on literature review and this dissertation, potential future research prospects and directions were recommended.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. Flexible and efficient data analysis on a such typically huge collection is plausible using suffix trees. However, suffix tree occupies O(N log N) bits, which very soon inhibits in-memory analyses. Recent advances in full-text self-indexing reduce the space of suffix tree to O(N log σ) bits, where σ is the alphabet size. In practice, the space reduction is more than 10-fold, for example on suffix tree of Human Genome. However, this reduction factor remains constant when more sequences are added to the collection. We develop a new family of self-indexes suited for the repetitive sequence collection setting. Their expected space requirement depends only on the length n of the base sequence and the number s of variations in its repeated copies. That is, the space reduction factor is no longer constant, but depends on N / n. We believe the structures developed in this work will provide a fundamental basis for storage and retrieval of individual genomes as they become available due to rapid progress in the sequencing technologies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A model of the information and material activities that comprise the overall construction process is presented, using the SADT activity modelling methodology. The basic model is further refined into a number of generic information handling activities such as creation of new information, information search and retrieval, information distribution and person-to-person communication. The viewpoint could be described as information logistics. This model is then combined with a more traditional building process model, consisting of phases such as design and construction. The resulting two-dimensional matrix can be used for positioning different types of generic IT-tools or construction specific applications. The model can thus provide a starting point for a discussion of the application of information and communication technology in construction and for measurements of the impacts of IT on the overall process and its related costs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Several investigators in the past have used the radiance depression (with respect to clear-sky infrared radiance), resulting from the presence of mineral dust aerosols in the atmosphere, as an index of dust aerosol load in the atmosphere during local noon. Here, we have used a modified approach to retrieve dust index during night since assessment of diurnal average infrared dust forcing essentially requires information on dust aerosols during night. For this purpose, we used infrared radiance (10.5-12.5 mu m), acquired from the METEOSAT-5 satellite (similar to 5 km resolution). We found that the `dust index' algorithm, valid for daytime, will no longer hold during the night because dust is then hotter than the theoretical dust-free reference. Hence we followed a `minimum reference' approach instead of a conventional `maximum reference' approach. A detailed analysis suggests that the maximum dust load occurs during the daytime. Over the desert regions of India and Africa, maximum change in dust load is as much as a factor of four between day and night and factor of two variations are commonly observed. By realizing the consequent impact on long wave dust forcing, sensitivity studies were carried out, which indicate that utilizing day time data for estimating the diurnally averaged long-wave dust radiative forcing results in significant errors (as much as 50 to 70%). Annually and regionally averaged long wave dust radiative forcing (which account for the diurnal variation of dust) at the top of the atmosphere over Afro-Asian region is 2.6 +/- 1.8 W m(-2), which is 30 to 50% lower than those reported earlier. Our studies indicate that neglecting diurnal variation of dust while assessing its radiative impact leads to an overestimation of dust radiative forcing, which in turn result in underestimation of the radiative impact of anthropogenic aerosols.