977 results for database systems


Relevance:

30.00% 30.00%

Publisher:

Abstract:

Over the past five years, XML has been embraced by both the research and industrial communities due to its promising prospects as a new data representation and exchange format on the Internet. The widespread popularity of XML creates an increasing need to store XML data in persistent storage systems and to enable sophisticated XML queries over the data. The currently available approaches to the XML storage and retrieval issue are limited either by not being mature enough (e.g. native approaches) or by causing inflexibility, heavy fragmentation and excessive join operations (e.g. non-native approaches such as the relational database approach).

In this dissertation, I studied the issue of storing and retrieving XML data using the Semantic Binary Object-Oriented Database System (Sem-ODB) to leverage the advanced Sem-ODB technology with the emerging XML data model. First, a meta-schema based approach was implemented to address the data model mismatch issue that is inherent in the non-native approaches. The meta-schema based approach captures the meta-data of both Document Type Definitions (DTDs) and Sem-ODB Semantic Schemas, thus enabling a dynamic and flexible mapping scheme. Second, a formal framework was presented to ensure precise and concise mappings. In this framework, both schemas and the conversions between them are formally defined and described. Third, after major features of an XML query language, XQuery, were analyzed, a high-level XQuery to Semantic SQL (Sem-SQL) query translation scheme was described. This translation scheme takes advantage of the navigation-oriented query paradigm of Sem-SQL, thus avoiding the excessive join problem of relational approaches. Finally, the modeling capability of the Semantic Binary Object-Oriented Data Model (Sem-ODM) was explored from the perspective of conceptually modeling an XML Schema using a Semantic Schema.

It was revealed that the advanced features of the Sem-ODB, such as multi-valued attributes, surrogates, and the navigation-oriented query paradigm, among others, are indeed beneficial in coping with the XML storage and retrieval issue using a non-XML approach. Furthermore, extensions to the Sem-ODB to make it work more effectively with XML data were also proposed.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This poster presentation from the May 2015 Florida Library Association Conference, along with the Everglades Explorer discovery portal at http://ee.fiu.edu, demonstrates how traditional bibliographic and curatorial principles can be applied to: 1) selection, cross-walking and aggregation of metadata linking end-users to widespread digital resources from multiple silos; 2) harvesting of select PDFs, HTML and media for web archiving and access; 3) selection of CMS domains, sub-domains and folders for targeted searching using an API. Choosing content for this discovery portal is comparable to the past scholarly practice of creating and publishing subject bibliographies, except that metadata and data are housed in relational databases. This new and yet traditional capacity coincides with: growth of bibliographic utilities (MarcEdit); evolution of open-source discovery systems (eXtensible Catalog); development of target-capable web crawling and archiving systems (Archive-it); and specialized search APIs (Google). At the same time, historical and technical changes, specifically the increasing fluidity and re-purposing of syndicated metadata, make this possible. It equally stems from the expansion of freely accessible digitized legacy and born-digital resources. Innovation principles helped frame the process by which the thematic Everglades discovery portal was created at Florida International University. The path to providing more effective searching and co-location of digital scientific, educational and historical material related to the Everglades is contextualized through five concepts found within Dyer and Christensen’s “The Innovator’s DNA: Mastering the five skills of disruptive innovators” (2011). The project also aligns with Ranganathan’s Laws of Library Science, especially the 4th Law: to “save the time of the user.”

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Database design is a difficult problem for non-expert designers. It is desirable to assist such designers during the problem solving process by means of a knowledge based (KB) system. Although a number of prototype KB systems have been proposed, there are many shortcomings. Firstly, few have incorporated sufficient expertise in modeling relationships, particularly higher order relationships. Secondly, there does not seem to be any published empirical study that experimentally tested the effectiveness of any of these KB tools. Thirdly, the problem solving behavior of non-experts, whom the systems were intended to assist, has not been one of the bases for system design. In this project, a consulting system for conceptual database design, called CODA, that addresses the above shortcomings was developed and empirically validated. More specifically, the CODA system incorporates (a) findings on why non-experts commit errors and (b) heuristics for modeling relationships. Two approaches to knowledge base implementation were used and compared in this project, namely system restrictiveness and decisional guidance (Silver 1990). The Restrictive system uses a proscriptive approach and limits the designer's choices at various design phases by forcing him/her to follow a specific design path. The Guidance system approach, which is less restrictive, involves providing context specific, informative and suggestive guidance throughout the design process. Both approaches are intended to prevent erroneous design decisions. The main objectives of the study are to evaluate (1) whether the knowledge-based system is more effective than the system without a knowledge base and (2) which approach to knowledge implementation, Restrictive or Guidance, is more effective. To evaluate the effectiveness of the knowledge base itself, the systems were compared with a system that does not incorporate the expertise (Control).
An experimental procedure using student subjects was used to test the effectiveness of the systems. The subjects solved one task without using the system (pre-treatment task) and another task using one of the three systems, viz. Control, Guidance or Restrictive (experimental task). Analysis of experimental task scores of those subjects who performed satisfactorily in the pre-treatment task revealed that the knowledge based approach to database design support led to more accurate solutions than the control system. Of the two KB approaches, the Guidance approach was found to lead to better performance when compared to the Control system. It was found that the subjects perceived the Restrictive system to be easier to use than the Guidance system.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

X-ray computed tomography (CT) imaging constitutes one of the most widely used diagnostic tools in radiology today, with nearly 85 million CT examinations performed in the U.S. in 2011. CT imparts a relatively high radiation dose to the patient compared to other x-ray imaging modalities and, as a result of this fact coupled with its popularity, CT is currently the single largest source of medical radiation exposure to the U.S. population. For this reason, there is a critical need to optimize CT examinations such that the dose is minimized while the quality of the CT images is not degraded. This optimization can be difficult to achieve due to the relationship between dose and image quality. All things being held equal, reducing the dose degrades image quality and can impact the diagnostic value of the CT examination.

A recent push from the medical and scientific community towards using lower doses has spawned new dose reduction technologies such as automatic exposure control (i.e., tube current modulation) and iterative reconstruction algorithms. In theory, these technologies could allow for scanning at reduced doses while maintaining the image quality of the exam at an acceptable level. Therefore, there is a scientific need to establish the dose reduction potential of these new technologies in an objective and rigorous manner. Establishing these dose reduction potentials requires precise and clinically relevant metrics of CT image quality, as well as practical and efficient methodologies to measure such metrics on real CT systems. The currently established methodologies for assessing CT image quality are not appropriate to assess modern CT scanners that have implemented those aforementioned dose reduction technologies.

Thus the purpose of this doctoral project was to develop, assess, and implement new phantoms, image quality metrics, analysis techniques, and modeling tools that are appropriate for image quality assessment of modern clinical CT systems. The project developed image quality assessment methods in the context of three distinct paradigms, (a) uniform phantoms, (b) textured phantoms, and (c) clinical images.

The work in this dissertation used the “task-based” definition of image quality. That is, image quality was broadly defined as the effectiveness by which an image can be used for its intended task. Under this definition, any assessment of image quality requires three components: (1) a well-defined imaging task (e.g., detection of subtle lesions), (2) an “observer” to perform the task (e.g., a radiologist or a detection algorithm), and (3) a way to measure the observer’s performance in completing the task at hand (e.g., detection sensitivity/specificity).

First, this task-based image quality paradigm was implemented using a novel multi-sized phantom platform (with uniform background) developed specifically to assess modern CT systems (Mercury Phantom, v3.0, Duke University). A comprehensive evaluation was performed on a state-of-the-art CT system (SOMATOM Definition Force, Siemens Healthcare) in terms of noise, resolution, and detectability as a function of patient size, dose, tube energy (i.e., kVp), automatic exposure control, and reconstruction algorithm (i.e., Filtered Back-Projection, FBP, vs. Advanced Modeled Iterative Reconstruction, ADMIRE). A mathematical observer model (i.e., computer detection algorithm) was implemented and used as the basis of image quality comparisons. It was found that image quality increased with increasing dose and decreasing phantom size. The CT system exhibited nonlinear noise and resolution properties, especially at very low doses, large phantom sizes, and for low-contrast objects. Objective image quality metrics generally increased with increasing dose and ADMIRE strength, and with decreasing phantom size. The ADMIRE algorithm could offer comparable image quality at reduced doses or improved image quality at the same dose (increase in detectability index by up to 163% depending on iterative strength). The use of automatic exposure control resulted in more consistent image quality with changing phantom size.

Based on those results, the dose reduction potential of ADMIRE was further assessed specifically for the task of detecting small (<=6 mm) low-contrast (<=20 HU) lesions. A new low-contrast detectability phantom (with uniform background) was designed and fabricated using a multi-material 3D printer. The phantom was imaged at multiple dose levels and images were reconstructed with FBP and ADMIRE. Human perception experiments were performed to measure the detection accuracy from FBP and ADMIRE images. It was found that ADMIRE had equivalent performance to FBP at 56% less dose.

Using the same image data as the previous study, a number of different mathematical observer models were implemented to assess which models would result in image quality metrics that best correlated with human detection performance. The models included naïve simple metrics of image quality such as contrast-to-noise ratio (CNR) and more sophisticated observer models such as the non-prewhitening matched filter observer model family and the channelized Hotelling observer model family. It was found that non-prewhitening matched filter observers and the channelized Hotelling observers both correlated strongly with human performance. Conversely, CNR was found to not correlate strongly with human performance, especially when comparing different reconstruction algorithms.
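The two kinds of metric compared above can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: `cnr` uses the common definition of contrast-to-noise ratio (absolute mean difference divided by the background standard deviation), and `npw_dprime` uses the textbook image-domain form of the non-prewhitening matched filter detectability index. The function names and the flattened-image covariance input are assumptions made for the example.

```python
import numpy as np

def cnr(signal_roi, background_roi):
    """Contrast-to-noise ratio: |mean difference| / background standard deviation."""
    contrast = abs(np.mean(signal_roi) - np.mean(background_roi))
    return float(contrast / np.std(background_roi, ddof=1))

def npw_dprime(expected_signal, noise_cov):
    """Non-prewhitening matched filter detectability index (image-domain form).

    d' = (s^T s) / sqrt(s^T K s), where s is the flattened expected signal
    (lesion-present minus lesion-absent mean image) and K is the noise
    covariance matrix of the flattened image. Both inputs are assumed to be
    supplied by the caller.
    """
    s = np.asarray(expected_signal, dtype=float).ravel()
    k = np.asarray(noise_cov, dtype=float)
    return float((s @ s) / np.sqrt(s @ k @ s))
```

Unlike CNR, the observer model takes the full noise covariance into account, so two reconstructions with the same pixel standard deviation but different noise texture can yield different detectability values.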

The uniform background phantoms used in the previous studies provided a good first-order approximation of image quality. However, due to their simplicity and due to the complexity of iterative reconstruction algorithms, it is possible that such phantoms are not fully adequate to assess the clinical impact of iterative algorithms, because patient images do not have smooth uniform backgrounds. To test this hypothesis, two textured phantoms (classified as gross texture and fine texture) and a uniform phantom of similar size were built and imaged on a SOMATOM Flash scanner (Siemens Healthcare). Images were reconstructed using FBP and Sinogram Affirmed Iterative Reconstruction (SAFIRE). Using an image subtraction technique, quantum noise was measured in all images of each phantom. It was found that in FBP, the noise was independent of the background (textured vs uniform). However, for SAFIRE, noise increased by up to 44% in the textured phantoms compared to the uniform phantom. As a result, the noise reduction from SAFIRE was found to be up to 66% in the uniform phantom but as low as 29% in the textured phantoms. Based on this result, it is clear that further investigation was needed to understand the impact that background texture has on image quality when iterative reconstruction algorithms are used.
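The abstract does not spell out the subtraction technique, so the following is only a sketch of one common variant, assuming two repeated scans of the same phantom with independent noise. Subtracting them cancels the deterministic background (uniform or textured), and the standard deviation of the difference, divided by the square root of two, estimates the noise in a single image. The function names are hypothetical.

```python
import numpy as np

def subtraction_noise(scan_a, scan_b):
    """Estimate single-image noise (standard deviation) from two repeated scans.

    Assumes both scans show the same object and carry independent noise of
    equal magnitude, so Var(a - b) = 2 * Var(noise).
    """
    diff = np.asarray(scan_a, dtype=float) - np.asarray(scan_b, dtype=float)
    return float(np.std(diff, ddof=1) / np.sqrt(2.0))

def noise_reduction_percent(noise_reference, noise_new):
    """Relative noise reduction of a new reconstruction versus a reference, in percent."""
    return 100.0 * (noise_reference - noise_new) / noise_reference
```

Because the background cancels in the difference image, the same estimator can be applied to textured and uniform phantoms alike, which is what makes a comparison between them meaningful.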

To further investigate this phenomenon with more realistic textures, two anthropomorphic textured phantoms were designed to mimic lung vasculature and fatty soft tissue texture. The phantoms (along with a corresponding uniform phantom) were fabricated with a multi-material 3D printer and imaged on the SOMATOM Flash scanner. Scans were repeated a total of 50 times in order to get ensemble statistics of the noise. A novel method of estimating the noise power spectrum (NPS) from irregularly shaped ROIs was developed. It was found that SAFIRE images had highly locally non-stationary noise patterns, with pixels near edges having higher noise than pixels in more uniform regions. Compared to FBP, SAFIRE images had 60% less noise on average in uniform regions. For edge pixels, noise was between 20% higher and 40% lower. The noise texture (i.e., NPS) was also highly dependent on the background texture for SAFIRE. Therefore, it was concluded that quantum noise properties in the uniform phantoms are not representative of those in patients for iterative reconstruction algorithms, and texture should be considered when assessing image quality of iterative algorithms.

To move beyond just assessing noise properties in textured phantoms towards assessing detectability, a series of new phantoms were designed specifically to measure low-contrast detectability in the presence of background texture. The textures used were optimized, using a genetic algorithm, to match the texture in the liver regions of actual patient CT images. The so-called “Clustered Lumpy Background” texture synthesis framework was used to generate the modeled texture. Three textured phantoms and a corresponding uniform phantom were fabricated with a multi-material 3D printer and imaged on the SOMATOM Flash scanner. Images were reconstructed with FBP and SAFIRE and analyzed using a multi-slice channelized Hotelling observer to measure detectability and the dose reduction potential of SAFIRE based on the uniform and textured phantoms. It was found that at the same dose, the improvement in detectability from SAFIRE (compared to FBP) was higher when measured in a uniform phantom compared to textured phantoms.

The final trajectory of this project aimed at developing methods to mathematically model lesions, as a means to help assess image quality directly from patient images. The mathematical modeling framework is first presented. The models describe a lesion’s morphology in terms of size, shape, contrast, and edge profile as an analytical equation. The models can be voxelized and inserted into patient images to create so-called “hybrid” images. These hybrid images can then be used to assess detectability or estimability with the advantage that the ground truth of the lesion morphology and location is known exactly. Based on this framework, a series of liver lesions, lung nodules, and kidney stones were modeled based on images of real lesions. The lesion models were virtually inserted into patient images to create a database of hybrid images to go along with the original database of real lesion images. ROI images from each database were assessed by radiologists in a blinded fashion to determine the realism of the hybrid images. It was found that the radiologists could not readily distinguish between real and virtual lesion images (area under the ROC curve was 0.55). This study provided evidence that the proposed mathematical lesion modeling framework could produce reasonably realistic lesion images.
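The area under the ROC curve reported above can be read as the probability that a randomly chosen real lesion image receives a higher "looks real" rating than a randomly chosen hybrid image, so a value near 0.5 means the readers were close to guessing. A minimal sketch of that calculation, using the Mann-Whitney formulation and hypothetical rating lists (the abstract does not give the raw scores):

```python
def auc_from_ratings(real_ratings, hybrid_ratings):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Counts the fraction of (real, hybrid) pairs in which the real image was
    rated higher, with ties counted as one half.
    """
    wins = 0.0
    for real in real_ratings:
        for hybrid in hybrid_ratings:
            if real > hybrid:
                wins += 1.0
            elif real == hybrid:
                wins += 0.5
    return wins / (len(real_ratings) * len(hybrid_ratings))
```

With identical rating distributions for both groups the function returns 0.5, and with perfectly separated ratings it returns 1.0.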

Based on that result, two studies were conducted which demonstrated the utility of the lesion models. The first study used the modeling framework as a measurement tool to determine how dose and reconstruction algorithm affected the quantitative analysis of liver lesions, lung nodules, and renal stones in terms of their size, shape, attenuation, edge profile, and texture features. The same database of real lesion images used in the previous study was used for this study. That database contained images of the same patient at 2 dose levels (50% and 100%) along with 3 reconstruction algorithms from a GE 750HD CT system (GE Healthcare). The algorithms in question were FBP, Adaptive Statistical Iterative Reconstruction (ASiR), and Model-Based Iterative Reconstruction (MBIR). A total of 23 quantitative features were extracted from the lesions under each condition. It was found that both dose and reconstruction algorithm had a statistically significant effect on the feature measurements. In particular, radiation dose affected five, three, and four of the 23 features (related to lesion size, conspicuity, and pixel-value distribution) for liver lesions, lung nodules, and renal stones, respectively. MBIR significantly affected 9, 11, and 15 of the 23 features (including size, attenuation, and texture features) for liver lesions, lung nodules, and renal stones, respectively. Lesion texture was not significantly affected by radiation dose.

The second study demonstrating the utility of the lesion modeling framework focused on assessing detectability of very low-contrast liver lesions in abdominal imaging. Specifically, detectability was assessed as a function of dose and reconstruction algorithm. As part of a parallel clinical trial, images from 21 patients were collected at 6 dose levels per patient on a SOMATOM Flash scanner. Subtle liver lesion models (contrast = -15 HU) were inserted into the raw projection data from the patient scans. The projections were then reconstructed with FBP and SAFIRE (strength 5). Lesion-less images were also reconstructed. Noise, contrast, CNR, and detectability index of an observer model (non-prewhitening matched filter) were assessed. It was found that SAFIRE reduced noise by 52%, reduced contrast by 12%, increased CNR by 87%, and increased detectability index by 65% compared to FBP. Further, a 2AFC human perception experiment was performed to assess the dose reduction potential of SAFIRE, which was found to be 22% compared to the standard of care dose.

In conclusion, this dissertation provides to the scientific community a series of new methodologies, phantoms, analysis techniques, and modeling tools that can be used to rigorously assess image quality from modern CT systems. Specifically, methods to properly evaluate iterative reconstruction have been developed and are expected to aid in the safe clinical implementation of dose reduction technologies.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

As the world population continues to grow past seven billion people and global challenges persist, including resource availability, biodiversity loss, climate change and human well-being, a new science is required that can address the integrated nature of these challenges and the multiple scales on which they are manifest. Sustainability science has emerged to fill this role. In the fifteen years since it was first called for in the pages of Science, it has rapidly matured; however, its place in the history of science and the way it is practiced today must be continually evaluated. In Part I, two chapters address this theoretical and practical grounding. Part II transitions to the applied practice of sustainability science in addressing the urban heat island (UHI) challenge, wherein the climate of urban areas is warmer than that of their surrounding rural environs. The UHI has become increasingly important within the study of earth sciences given the increased focus on climate change and as the balance of humans now live in urban areas.

In Chapter 2 a novel contribution to the historical context of sustainability is argued. Sustainability as a concept characterizing the relationship between humans and nature emerged in the mid to late 20th century as a response to findings used to also characterize the Anthropocene. Emerging from the human-nature relationships that came before it, evidence is provided that suggests Sustainability was enabled by technology and a reorientation of world-view and is unique in its global boundary, systematic approach and ambition for both well being and the continued availability of resources and Earth system function. Sustainability is further an ambition that has wide appeal, making it one of the first normative concepts of the Anthropocene.

Despite its widespread emergence and adoption, sustainability science continues to suffer from definitional ambiguity within the academe. In Chapter 3, a review of efforts to provide direction and structure to the science reveals a continuum of approaches anchored at either end by differing visions of how the science interfaces with practice (solutions). At one end, basic science of societally defined problems informs decisions about possible solutions and their application. At the other end, applied research directly affects the options available to decision makers. While clear from the literature, survey data further suggests that the dichotomy does not appear to be as apparent in the minds of practitioners.

In Chapter 4, the UHI is first addressed at the synoptic, mesoscale. Urban climate is the most immediate manifestation of the warming global climate for the majority of people on earth. Nearly half of those people live in small to medium sized cities, an understudied scale in urban climate research. Widespread characterization would be useful to decision makers in planning and design. Using a multi-method approach, the mesoscale UHI in the study region is characterized and the secular trend over the last sixty years evaluated. Under isolated ideal conditions the findings indicate a UHI of 5.3 ± 0.97 °C to be present in the study area, the magnitude of which is growing over time.

Although urban heat islands (UHI) are well studied, there remain no panaceas for local scale mitigation and adaptation methods; therefore, continued attention to characterization of the phenomenon in urban centers of different scales around the globe is required. In Chapter 5, a local scale analysis of the canopy layer and surface UHI in a medium sized city in North Carolina, USA is conducted using multiple methods including stationary urban sensors, mobile transects and remote sensing. Focusing on the ideal conditions for UHI development during an anticyclonic summer heat event, the study observes a range of UHI intensity depending on the method of observation: 8.7 °C from the stationary urban sensors; 6.9 °C from mobile transects; and 2.2 °C from remote sensing. Additional attention is paid to the diurnal dynamics of the UHI and its correlation with vegetation indices, dewpoint and albedo. Evapotranspiration is shown to drive dynamics in the study region.

Finally, recognizing that a bridge must be established between the physical science community studying the Urban Heat Island (UHI) effect, and the planning community and decision makers implementing urban form and development policies, Chapter 6 evaluates multiple urban form characterization methods. Methods evaluated include local climate zones (LCZ), national land cover database (NLCD) classes and urban cluster analysis (UCA) to determine their utility in describing the distribution of the UHI based on three standard observation types: 1) fixed urban temperature sensors, 2) mobile transects and 3) remote sensing. Bivariate, regression and ANOVA tests are used to conduct the analyses. Findings indicate that the NLCD classes are best correlated to the UHI intensity and distribution in the study area. Further, while the UCA method is not useful directly, the variables included in the method are predictive based on regression analysis, so the potential for better model design exists. Land cover variables including albedo, impervious surface fraction and pervious surface fraction are found to dominate the distribution of the UHI in the study area regardless of observation method.

Chapter 7 provides a summary of findings and offers a brief analysis of their implications for both the scientific discourse generally and the study area specifically. In general, the work undertaken does not achieve the full ambition of sustainability science; additional work is required to translate findings to practice and more fully evaluate adoption. The implications for planning and development in the local region are addressed in the context of a major light-rail infrastructure project, including several systems level considerations like human health and development. Finally, several avenues for future work are outlined. Within the theoretical development of sustainability science, these pathways include more robust evaluations of the theoretical and actual practice. Within the UHI context, these include development of an integrated urban form characterization model, application of study methodology in other geographic areas and at different scales, and use of novel experimental methods including distributed sensor networks and citizen science.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The section of CN railway between Vancouver and Kamloops runs along the base of many hazardous slopes, including the White Canyon, which is located just outside the town of Lytton, BC. The slope has a history of frequent rockfall activity, which presents a hazard to the railway below. Rockfall inventories can be used to understand the frequency-magnitude relationship of events on hazardous slopes; however, it can be difficult to consistently and accurately identify rockfall source zones and volumes on large slopes with frequent activity, leaving many inventories incomplete. We have studied this slope as a part of the Canadian Railway Ground Hazard Research Program and have collected remote sensing data, including terrestrial laser scanning (TLS), photographs, and photogrammetry data since 2012, and used change detection to identify rockfalls on the slope. The objective of this thesis is to use a subset of this data to understand how rockfalls identified from TLS data could be used to understand the frequency-magnitude relationship of rockfalls on the slope. This includes incorporating both new and existing methods to develop a semi-automated workflow to extract rockfall events from the TLS data. We show that these methods can be used to identify events as small as 0.01 m³ and that the duration between scans can have an effect on the frequency-magnitude relationship of the rockfalls. We also show that by incorporating photogrammetry data into our analysis, we can create a 3D geological model of the slope and use this to classify rockfalls by lithology, to further understand the rockfall failure patterns. When relating the rockfall activity to triggering factors, we found that the amount of precipitation occurring over the winter has an effect on the overall rockfall frequency for the remainder of the year. 
These results can provide the railways with a more complete inventory of events compared to records created through track inspection, or rockfall monitoring systems that are installed on the slope. In addition, we can use the database to understand the spatial and temporal distribution of events. The results can also be used as an input to rockfall modelling programs.
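A frequency-magnitude relationship of the kind discussed above is often summarized as a power law, N(≥V) = a·V^(-b), where N is the number of events per year with volume at least V. The abstract does not state which model the thesis fits, so the sketch below simply assumes that form and estimates `a` and `b` by least squares in log-log space. The function name and its inputs (a list of rockfall volumes and the length of the observation period) are hypothetical.

```python
import numpy as np

def fit_frequency_magnitude(volumes_m3, years_observed):
    """Fit N(>=V) = a * V**(-b) to an inventory of rockfall volumes.

    Returns (a, b), where N is the cumulative annual frequency of events
    with volume >= V. Assumes a power-law model and distinct, positive volumes.
    """
    volumes = np.sort(np.asarray(volumes_m3, dtype=float))[::-1]  # largest first
    rank = np.arange(1, volumes.size + 1)      # events with volume >= volumes[i]
    annual_frequency = rank / years_observed
    slope, intercept = np.polyfit(np.log10(volumes), np.log10(annual_frequency), 1)
    return float(10.0 ** intercept), float(-slope)
```

An incomplete inventory that misses small events flattens the low-volume end of the curve, which is one reason the minimum detectable volume and the interval between scans matter for the fitted parameters.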

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Modern software applications are becoming more dependent on database management systems (DBMSs). DBMSs are usually used as black boxes by software developers. For example, Object-Relational Mapping (ORM) is one of the most popular database abstraction approaches that developers use nowadays. Using ORM, objects in Object-Oriented languages are mapped to records in the database, and object manipulations are automatically translated to SQL queries. As a result of such conceptual abstraction, developers do not need deep knowledge of databases; however, all too often this abstraction leads to inefficient and incorrect database access code. Thus, this thesis proposes a series of approaches to improve the performance of database-centric software applications that are implemented using ORM. Our approaches focus on troubleshooting and detecting inefficient database accesses (i.e., performance problems) in the source code, and we rank the detected problems based on their severity. We first conduct an empirical study on the maintenance of ORM code in both open source and industrial applications. We find that ORM performance-related configurations are rarely tuned in practice, and there is a need for tools that can help improve/tune the performance of ORM-based applications. Thus, we propose approaches along two dimensions to help developers improve the performance of ORM-based applications: 1) helping developers write more performant ORM code; and 2) helping developers tune ORM configurations. To provide tooling support to developers, we first propose static analysis approaches to detect performance anti-patterns in the source code. We automatically rank the detected anti-pattern instances according to their performance impacts. Our study finds that by resolving the detected anti-patterns, the application performance can be improved by 34% on average. 
We then discuss our experience and lessons learned when integrating our anti-pattern detection tool into industrial practice. We hope our experience can help improve the industrial adoption of future research tools. However, as static analysis approaches are prone to false positives and lack runtime information, we also propose dynamic analysis approaches to further help developers improve the performance of their database access code. We propose automated approaches to detect redundant data access anti-patterns in the database access code, and our study finds that resolving such redundant data access anti-patterns can improve application performance by an average of 17%. Finally, we propose an automated approach to tune performance-related ORM configurations using both static and dynamic analysis. Our study shows that our approach can help improve application throughput by 27% to 138%. Through our case studies on real-world applications, we show that all of our proposed approaches can provide valuable support to developers and help improve application performance significantly.
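The abstract does not list the specific anti-patterns it detects. As an illustration only, the sketch below shows one widely known inefficient access pattern that ORM-generated code can produce: issuing one query per parent object inside a loop (often called the "N+1 queries" problem) instead of a single joined query. It uses plain `sqlite3` rather than an ORM so that every query is visible; the schema and function names are invented for the example.

```python
import sqlite3

def build_db():
    """Create an in-memory database with 50 authors and 200 books."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO author VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(1, 51)])
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)",
                     [(i, (i % 50) + 1, f"book{i}") for i in range(1, 201)])
    return conn

def titles_query_per_author(conn):
    """Anti-pattern: one query for the authors, then one more per author."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM author").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM book WHERE author_id = ? ORDER BY id",
            (author_id,)).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries

def titles_single_join(conn):
    """Same data fetched with a single joined query."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM author a "
        "JOIN book b ON b.author_id = a.id ORDER BY b.id").fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

Both functions return the same mapping, but the first issues 51 queries and the second issues one. A static detector would flag the query inside the loop, while a dynamic one would notice the repeated, near-identical statements at run time.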

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The objective of this research was to develop a methodology for transforming and dynamically segmenting data. Dynamic segmentation enables transportation system attributes and associated data to be stored in separate tables and merged when a specific query requires a particular set of data to be considered. A major benefit of dynamic segmentation is that individual tables can be more easily updated when attributes, performance characteristics, or usage patterns change over time. Applications of a progressive geographic database referencing system in transportation planning are vast. Summaries of system condition and performance can be made, and analyses of specific portions of a road system are facilitated.
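
The merge step of dynamic segmentation can be sketched as follows. This is a minimal illustration under stated assumptions, not the methodology developed in the research: each attribute table is assumed to hold `(start, end, value)` records measured along a single route, and the function name `dynamic_segment` and the sample data are hypothetical.

```python
def dynamic_segment(table_a, table_b):
    """Overlay two attribute tables for one route.

    Each table is a list of (start, end, value) records. The result lists
    (start, end, value_a, value_b) for every stretch covered by both tables.
    """
    # Every record boundary from either table becomes a potential break point.
    breaks = sorted({m for s, e, _ in table_a + table_b for m in (s, e)})

    def lookup(table, start, end):
        # Return the value of the record that fully covers [start, end], if any.
        for s, e, value in table:
            if s <= start and end <= e:
                return value
        return None

    segments = []
    for start, end in zip(breaks, breaks[1:]):
        a = lookup(table_a, start, end)
        b = lookup(table_b, start, end)
        if a is not None and b is not None:
            segments.append((start, end, a, b))
    return segments


# Hypothetical sample data: pavement condition and daily traffic by milepost.
pavement = [(0.0, 5.0, "good"), (5.0, 12.0, "poor")]
traffic = [(0.0, 8.0, 12000), (8.0, 12.0, 7000)]
merged = dynamic_segment(pavement, traffic)
```

Because the tables are only combined at query time, either one can be updated on its own without rewriting the other.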

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Thesis (Master's)--University of Washington, 2016-08

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper addresses the problem of evolving an object-oriented database in the context of orthogonally persistent programming systems. We have observed two characteristics of this type of system that offer particular conditions for implementing the evolution in a semi-transparent fashion. That transparency can be further enhanced with the obliviousness provided by Aspect-Oriented Programming techniques. We conceived a meta-model and developed a prototype to test the feasibility of our approach. The system allows programs written against one schema to semi-transparently access data stored under other versions of the schema.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully.
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models restrict the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them into distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
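
The difference between vertex-centric and neighborhood-centric programming can be illustrated with a small single-machine sketch. It is not NSCALE's API or execution engine: the functions `k_hop_neighborhood`, `edge_count`, and `run_on_neighborhoods`, and the sample graph, are hypothetical names and data chosen for this example, and the graph is assumed to be undirected and stored as an adjacency dictionary.

```python
from collections import deque


def k_hop_neighborhood(adj, center, k):
    """Return the subgraph induced by all nodes within k hops of center."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adj.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    # Keep only edges whose endpoints both lie inside the neighborhood.
    return {n: {m for m in adj.get(n, ()) if m in nodes} for n in nodes}


def edge_count(subgraph):
    """Example user program: it sees a whole subgraph, not a single vertex."""
    return sum(len(neighbors) for neighbors in subgraph.values()) // 2


def run_on_neighborhoods(adj, centers, k, user_fn):
    """Apply user_fn to the k-hop neighborhood of each requested center."""
    return {c: user_fn(k_hop_neighborhood(adj, c, k)) for c in centers}


# Hypothetical undirected graph as an adjacency dictionary.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
ego_edges = run_on_neighborhoods(graph, ["a", "d", "e"], 1, edge_count)
```

The user function receives an entire extracted subgraph, so a task such as counting edges in an ego network needs no message passing between vertices; a distributed system would additionally have to decide where each extracted subgraph is placed.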

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Fault tolerance allows a system to remain operational to some degree when some of its components fail. One of the most common fault tolerance mechanisms consists of logging the system state periodically and recovering the system to a consistent state in the event of a failure. This paper describes a general logging-based fault tolerance mechanism that can be layered over deterministic systems. Our proposal describes how a logging mechanism can recover the underlying system to a consistent state even if an action or set of actions was interrupted midway by a server crash. We also propose different methods of storing the logging information, and describe how to deploy a fault-tolerant master-slave cluster for information replication. We adapt our model to a previously proposed framework that provides common relational features, such as transactions with atomic, consistent, isolated and durable (ACID) properties, to NoSQL database management systems.
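
The idea of recovering to a consistent state after an interrupted set of actions can be sketched with a simple replay log. This is a minimal illustration, not the mechanism proposed in the paper: the record layout (`"begin"`, `"put"`, `"commit"` tuples), the `recover` function, and the sample log are assumptions made for this example, and transactions are assumed to run one at a time so that log order matches commit order.

```python
def recover(log):
    """Rebuild a key-value state by replaying only committed transactions.

    Each log record is a tuple: ("begin", tx), ("put", tx, key, value),
    or ("commit", tx). Writes of transactions without a commit record are
    ignored, so an action interrupted by a crash leaves no partial effect.
    """
    committed = {record[1] for record in log if record[0] == "commit"}
    state = {}
    for record in log:
        if record[0] == "put" and record[1] in committed:
            _, _, key, value = record
            state[key] = value
    return state


# Hypothetical log: t1 committed, t2 was interrupted by a crash mid-way.
log = [
    ("begin", "t1"),
    ("put", "t1", "balance:alice", 70),
    ("put", "t1", "balance:bob", 130),
    ("commit", "t1"),
    ("begin", "t2"),
    ("put", "t2", "balance:alice", 0),
]
state_after_crash = recover(log)
```

Replay only works this simply because the underlying system is assumed to be deterministic: applying the same logged actions in the same order always yields the same state.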

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In database applications, access control security layers are mostly developed from tools provided by vendors of database management systems and deployed on the same servers that contain the data to be protected. This solution has several drawbacks. Among them we emphasize: 1) if policies are complex, their enforcement can degrade the performance of database servers; 2) when modifications to the established policies imply modifications to the business logic (usually deployed at the client side), there is no alternative but to modify the business logic in advance; and 3) malicious users can systematically issue CRUD expressions against the DBMS in the hope of identifying a security gap. To overcome these drawbacks, in this paper we propose an access control stack with the following characteristics: most of the mechanisms are deployed at the client side; whenever security policies evolve, the security mechanisms are automatically updated at runtime; and client-side applications do not handle CRUD expressions directly. We also present an implementation of the proposed stack to demonstrate its feasibility. This paper thus presents a new approach to enforcing access control in database applications, which we expect to contribute positively to the state of the art in the field.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In database applications, access control security layers are mostly developed from tools provided by vendors of database management systems and deployed on the same servers that contain the data to be protected. This solution has several drawbacks. Among them we emphasize: (1) if policies are complex, their enforcement can degrade the performance of database servers; (2) when modifications to the established policies imply modifications to the business logic (usually deployed at the client side), there is no alternative but to modify the business logic in advance; and (3) malicious users can systematically issue CRUD expressions against the DBMS in the hope of identifying a security gap. To overcome these drawbacks, in this paper we propose an access control stack with the following characteristics: most of the mechanisms are deployed at the client side; whenever security policies evolve, the security mechanisms are automatically updated at runtime; and client-side applications do not handle CRUD expressions directly. We also present an implementation of the proposed stack to demonstrate its feasibility. This paper thus presents a new approach to enforcing access control in database applications, which we expect to contribute positively to the state of the art in the field.