969 results for on-disk data layout
Abstract:
The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data in order to encourage its global adoption. Current techniques for storing semistructured documents either map them to relational databases or use a combination of flat files and indexes. Both approaches result in a mismatch between the tree structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed the large-scale adoption of XML in actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that hinder the widespread adoption of XML. Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge in leveraging semistructured data is to perform effective information discovery on such data. Previous work has addressed this problem in a generic (i.e., domain-independent) way, but the process can be improved if knowledge about the specific domain is taken into consideration. This dissertation had two general goals. The first goal was to devise novel techniques to efficiently store and process semistructured documents, with two specific aims: we proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives, and we developed a Double-Lazy Parser for semistructured documents that introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model's parsing mechanism. The second goal was to construct a user-friendly and efficient engine for performing information discovery over domain-specific semistructured documents, also with two aims: we presented a framework that exploits domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies, and we proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents.
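The Double-Lazy Parser itself is not reproduced here; as a rough Python illustration of the lazy principle it builds on (materialising only the subtrees a query actually touches instead of the whole DOM), the following sketch streams a document one subtree at a time. The file name and tag are hypothetical.

    import xml.etree.ElementTree as ET

    def iter_subtrees(path, tag):
        """Stream a large XML file, materialising one <tag> subtree at a time
        so memory stays proportional to a single subtree, not the document."""
        context = ET.iterparse(path, events=("start", "end"))
        _, root = next(context)                 # grab the root from the first start event
        for event, elem in context:
            if event == "end" and elem.tag == tag:
                yield elem                      # a fully parsed, DOM-like subtree
                root.clear()                    # drop processed children to bound memory

A caller iterates, e.g. for rec in iter_subtrees("corpus.xml", "record"), and each yielded element can be queried like an ordinary DOM subtree while the rest of the document is never fully materialised.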
Abstract:
Unequal improvements in processor and I/O speeds have made many applications, such as databases and operating systems, increasingly I/O bound. Many schemes, such as disk caching and disk mirroring, have been proposed to address the problem. In this thesis we focus only on disk mirroring. In disk mirroring, a logical disk image is maintained on two physical disks, allowing a single disk failure to be transparent to application programs. Although disk mirroring improves data availability and reliability, it has two major drawbacks. First, writes are expensive because both disks must be updated. Second, load balancing during failure-mode operation is poor because all requests are serviced by the surviving disk. Distorted mirrors was proposed to address the write problem, and interleaved declustering to address the load-balancing problem. In this thesis we perform a comparative study of these two schemes under various operating modes. In addition, we study traditional mirroring to provide a common basis for comparison.
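As a point of reference for the trade-offs described above, here is a toy Python model of traditional mirroring only (not of distorted mirrors or interleaved declustering): every write must update both replicas, reads are balanced across them in normal mode, and a failure shifts the entire read load onto the survivor. Block numbers and the in-memory dictionaries are purely illustrative.

    import random

    class MirroredDisk:
        """Toy model of a mirrored pair of disks."""

        def __init__(self):
            self.disks = [dict(), dict()]   # block number -> data
            self.alive = [True, True]

        def write(self, block, data):
            # The write penalty: every live replica must be updated.
            for d, ok in zip(self.disks, self.alive):
                if ok:
                    d[block] = data

        def read(self, block):
            # Normal mode: either replica; failure mode: only the survivor.
            candidates = [i for i, ok in enumerate(self.alive) if ok]
            return self.disks[random.choice(candidates)][block]

        def fail(self, i):
            self.alive[i] = False

    m = MirroredDisk()
    m.write(0, b"block zero")
    m.fail(1)                  # after the failure, all reads hit disk 0
    print(m.read(0))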
Abstract:
This project concerns retrieving data by range without allowing the server to read it, when the database is stored on the server. Our goal is to build a database that allows the client to maintain the confidentiality of the stored data even though all of it resides in a location other than the client's hard disk. Any information written to that disk could otherwise be read by a third party, who could sell it or log into accounts to steal money or identities, so the data must be protected from eavesdroppers. To achieve this, the data stored on the drive is encrypted so that only the holder of the key can read the information, while everyone else sees only ciphertext. Consequently, all data management must be performed by the client; otherwise a malicious party could easily retrieve the data and misuse it. All the methods analysed here rely on encrypting data in transit. At the end of this project we analyse two methods, both theoretically and practically, for building such databases, and we test them with three datasets and with 10, 100 and 1000 queries. The aim of this work is to identify a trend that can be useful for future work based on this project.
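The two methods evaluated in the project are not reproduced here; the sketch below shows one simple, commonly used way to answer range queries without revealing plaintext to the server: values are encrypted client-side and grouped under coarse bucket labels, and the client filters exactly after decryption. The bucket width, the in-memory "server" dictionary, and the use of the Python cryptography package (Fernet) are assumptions for illustration only.

    from cryptography.fernet import Fernet

    BUCKET = 100                      # coarseness of the range index (assumption)
    key = Fernet.generate_key()       # key never leaves the client
    f = Fernet(key)

    server = {}                       # bucket label -> list of ciphertexts

    def put(value: int):
        label = value // BUCKET       # the server learns only this coarse label
        server.setdefault(label, []).append(f.encrypt(str(value).encode()))

    def range_query(lo: int, hi: int):
        hits = []
        for label in range(lo // BUCKET, hi // BUCKET + 1):
            for token in server.get(label, []):
                v = int(f.decrypt(token).decode())
                if lo <= v <= hi:     # exact filtering happens client-side
                    hits.append(v)
        return hits

    put(42); put(250); put(260)
    print(range_query(200, 300))      # -> [250, 260]

The server stores only bucket labels and ciphertexts; the trade-off is that coarser buckets leak less but force the client to decrypt and discard more candidates per query.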
Abstract:
This research aimed to identify the link between the layout of workspaces in offices and design strategies for environmental comfort. The strategies surveyed focused on thermal, visual, and lighting comfort. In this research, visual comfort is related to issues of visual integration within and between the interior and exterior of the building. This is a case study conducted at the administrative headquarters of the Centro Regional Nordeste do Instituto de Pesquisas Espaciais (INPE-CRN), located in Natal/RN. The methodological strategy used was Post-Occupancy Evaluation, which combined survey data on the building (layout of workspaces, bioclimatic strategies adopted in the design, use of these strategies) with techniques aimed at acquiring qualitative information from users. The workspace layout is central to worker satisfaction and productivity. Issues such as concentration, communication, privacy, personal identity, density and space efficiency, and barriers (to access, views, and even ventilation and lighting), among others, are associated with the layout. Environmental comfort is one of the essential elements for maintaining quality of life in the workplace. Moreover, it is an important factor in users' perception of the space in which they are inserted. Both layout and environmental comfort issues should be collected and analyzed in the establishment phase of the programming step, so that adequate answers to these questions can be given in subsequent project phases. It was found that changes in the program over time, especially concerning people (number and characteristics), resulted in changes in layout, generating high-density and inflexible environments. This makes it difficult to adjust the furniture to the occupants' requirements, including comfort needs. However, the presence of strategies for environmental quality provides comfort to the spaces, ensuring that, even in situations not considered optimal, users perceive the environment in a positive way. The relationship between environmental comfort and layout was found to take the following forms: changes in the perception of comfort depending on the layout arrangement; adjustments in layout due to comfort needs; and increased user satisfaction and environmental quality due to the presence of comfort strategies even in situations of inadequate layout.
Abstract:
Discovery-Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving the analyst indications of which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery-Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents with a small amount of structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions, using the semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonably sized document datasets in real time, without any need for pre-computation.
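The semi-PLSA stage is not shown here; the following simplified Python stand-in illustrates only the "highlight unexpected cells" idea of discovery-driven analysis: expected cell values are derived from the cube's marginals (an independence baseline) and cells with large standardised residuals are flagged. The data, baseline model, and threshold are illustrative, not the system's actual method.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        "region": ["north", "north", "south", "south", "south"],
        "topic":  ["storage", "parsing", "storage", "parsing", "parsing"],
        "count":  [40, 35, 38, 90, 85],
    })

    # Build a small two-dimensional cube.
    cube = df.pivot_table(values="count", index="region", columns="topic",
                          aggfunc="sum", fill_value=0)

    # Expected cell values under independence of the two dimensions.
    total = cube.values.sum()
    expected = np.outer(cube.sum(axis=1), cube.sum(axis=0)) / total
    residual = (cube.values - expected) / np.sqrt(expected)   # chi-style residual

    # Cells whose residual exceeds the threshold are the "unexpected" ones.
    flags = pd.DataFrame(np.abs(residual) > 2.0,
                         index=cube.index, columns=cube.columns)
    print(flags)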
Abstract:
Data leakage is a serious issue and can result in the loss of sensitive data, compromising user accounts and details and potentially affecting millions of internet users. This paper contributes to research in online security and reducing personal footprint by evaluating the levels of privacy provided by the Firefox browser. The aim of identifying conditions that would minimize data leakage and maximize data privacy is addressed by assessing and comparing data leakage in the four possible browsing modes: normal and private modes, using either a browser installed on the host PC or a portable browser run from a connected USB device. To provide a firm foundation for analysis, a series of carefully designed, pre-planned browsing sessions was repeated in each of the various modes of Firefox. This included low-RAM environments, to determine any effects low RAM may have on browser data leakage. The results show that considerable data leakage may occur within Firefox. In normal mode, all of the browsing information is stored within the Mozilla profile folder in Firefox-specific SQLite databases and sessionstore.js. While passwords were not stored as plain text, other confidential information such as credit card numbers could be recovered from the form history under certain conditions. There is no difference when using a portable browser in normal mode, except that the Mozilla profile folder is located on the USB device rather than the host's hard disk. By comparison, private browsing reduces data leakage. Our findings confirm that no information is written to the Firefox-related locations on the hard disk or USB device during private browsing, implying that no deletion would be necessary and no remnants of data would be forensically recoverable from unallocated space. However, two aspects of data leakage occurred equally in all four browsing modes. First, all of the browsing history was stored in live RAM and was therefore accessible while the browser remained open. Second, in low-RAM situations, the operating system pages RAM out to pagefile.sys on the host's hard disk. Irrespective of the browsing mode used, this may include Firefox history elements, which can then remain forensically recoverable for a considerable time.
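As a hedged illustration of how the Firefox-specific SQLite artefacts mentioned above can be inspected after a browsing session, the sketch below reads recent entries from places.sqlite using Python's standard library. The profile path is a placeholder, and the moz_places schema can vary between Firefox versions.

    import sqlite3
    from datetime import datetime, timezone

    PROFILE_DB = "/path/to/profile/places.sqlite"   # placeholder, not a real path

    con = sqlite3.connect(PROFILE_DB)
    rows = con.execute(
        "SELECT url, title, visit_count, last_visit_date "
        "FROM moz_places WHERE visit_count > 0 "
        "ORDER BY last_visit_date DESC LIMIT 20"
    )
    for url, title, visits, last_us in rows:
        # last_visit_date is stored as microseconds since the Unix epoch
        when = (datetime.fromtimestamp(last_us / 1_000_000, tz=timezone.utc)
                if last_us else None)
        print(when, visits, url, title)
    con.close()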
Abstract:
This thesis presents a study of Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. The study ranges from a deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set up machinery able to predict future data access patterns - i.e. the so-called "popularity" of CMS datasets on the Grid - with a focus on specific data types. All CMS workflows run on the Worldwide LHC Computing Grid (WLCG) computing centers (Tiers), and the distributed analysis system in particular sustains hundreds of users and the applications they submit every day. These applications (or "jobs") access different data types hosted on disk storage systems at a large set of WLCG Tiers. A detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, provides valuable insight into storage occupancy over time and the different access patterns, and ultimately allows suggested actions to be derived from this information (e.g. targeted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques makes it possible to learn from past data and to gain predictive power for future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, and also discusses the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns at different levels of depth. Chapter 4 offers a brief introduction to basic machine learning concepts, introduces their application in CMS, and discusses the results obtained by using this approach in the context of this thesis.
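The actual CMS popularity pipeline is not reproduced here; the sketch below only illustrates the kind of supervised classification the thesis applies, using synthetic access statistics as features and a "popular next week" label. Feature names, data, and the model choice (a scikit-learn random forest) are assumptions for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.poisson(20, n),          # accesses in the previous week (synthetic)
        rng.integers(1, 50, n),      # distinct users (synthetic)
        rng.uniform(0.1, 50.0, n),   # dataset size in TB (synthetic)
    ])
    # Synthetic label: datasets with heavy recent use tend to stay popular.
    y = (X[:, 0] + 2 * X[:, 1] + rng.normal(0, 5, n) > 60).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))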
Abstract:
Near-infrared polarimetry is a powerful tool for studying the central sources at the center of the Milky Way. The aim of this thesis is to analyze the polarized emission present in the central few light years of the Galactic Center region, in particular the non-thermal polarized emission of Sagittarius~A* (Sgr~A*), the electromagnetic manifestation of the supermassive black hole, and the polarized emission of an infrared-excess source referred to in the literature as DSO/G2, which is in orbit about Sgr~A*. In this thesis I focus on Galactic Center observations at $\lambda=2.2~\mu m$ ($K_\mathrm{s}$-band) in polarimetry mode during several epochs from 2004 to 2012. The near-infrared polarimetric observations were carried out using the adaptive optics instrument NAOS/CONICA and a Wollaston prism at the Very Large Telescope of ESO (European Southern Observatory). Linear polarization at 2.2~$\mu m$, its flux statistics and time variation, can be used to constrain the physical conditions of the accretion process onto the central supermassive black hole. I present a statistical analysis of polarized $K_\mathrm{s}$-band emission from Sgr~A* and investigate the most comprehensive sample of near-infrared polarimetric light curves of this source to date. I find several polarized flux excursions over the years and obtain an exponent of about 4 for the power law fitted to the polarized flux density distribution of fluxes above 5~mJy. This distribution is therefore closely linked to the single-state power-law distribution of the total $K_\mathrm{s}$-band flux densities reported earlier by us. I find polarization degrees of the order of 20\%$\pm$10\% and a preferred polarization angle of $13^\circ\pm15^\circ$. Based on simulations of polarimetric measurements, given the observed flux density and its uncertainty in orthogonal polarimetry channels, I find that the uncertainties of the polarization parameters below a total flux density of $\sim 2\,{\mathrm{mJy}}$ are probably dominated by observational uncertainties. At higher flux densities there are intrinsic variations of the polarization degree and angle within rather well-constrained ranges. Since the emission is most likely due to optically thin synchrotron radiation, the obtained preferred polarization angle very likely reflects the intrinsic orientation of the Sgr~A* system, i.e. an accretion disk or jet/wind scenario coupled to the supermassive black hole. Our polarization statistics show that Sgr~A* must be a stable system, both in terms of its geometry and its accretion process. I also investigate the infrared-excess source called G2 or Dusty S-cluster Object (DSO) moving on a highly eccentric orbit around the Galaxy's central black hole, Sgr~A*. For the first time, I use near-infrared polarimetric imaging data to determine the nature and properties of the DSO and obtain an improved $K_\mathrm{s}$-band identification of this source in median polarimetry images of different observing years. The source starts to stand out from the stellar confusion in the 2008 data, and it does not show flux density variability based on our data set. Furthermore, I measure the polarization degree and angle of this source and conclude, based on simulations of the polarization parameters, that it is an intrinsically polarized source with a varying polarization angle as it approaches the position of Sgr~A*. I use the interpretation of the DSO polarimetry measurements to assess its possible properties.
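The calibration of the NACO/Wollaston data involves more steps than shown here, but the standard relations that turn orthogonal-channel fluxes into a linear polarization degree and angle are as in the sketch below; the flux values are made up.

    import numpy as np

    # Fluxes in the orthogonal Wollaston channels (mJy), purely illustrative.
    f0, f90   = 6.1, 4.2      # 0 deg / 90 deg channels
    f45, f135 = 5.9, 4.5      # 45 deg / 135 deg channels

    I = 0.5 * (f0 + f90 + f45 + f135)           # total flux (each pair sums to I)
    Q = f0 - f90                                 # linear Stokes parameters
    U = f45 - f135

    p = np.hypot(Q, U) / I                       # degree of linear polarization
    theta = 0.5 * np.degrees(np.arctan2(U, Q))   # polarization angle (up to calibration offset)
    print(f"p = {100 * p:.1f}%, theta = {theta:.1f} deg")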
Abstract:
To identify the prevalence and severity of malocclusions and to analyze factors associated with the need for orthodontic treatment among Brazilian adolescents. This exploratory, cross-sectional study was carried out based on secondary data from the national epidemiological survey on oral health in Brazil (2002-2003). Socio-demographic conditions, self-perception, and the existence and degree of malocclusion, assessed using the Dental Aesthetic Index, were evaluated in 16,833 Brazilian adolescents selected by a probabilistic cluster sample. The dependent variable - need for orthodontic treatment - was estimated from the severity of malocclusion. The magnitude and direction of the associations were estimated in bivariate and multivariate analyses using robust Poisson regression. The majority of the adolescents needed orthodontic treatment (53.2%). In the multivariate analysis, the prevalence of the need for orthodontic treatment was higher among females, non-whites, those who perceived a need for treatment, and those who perceived their appearance as normal, bad, or very bad. The need for orthodontic treatment was lower among those living in the Northeast and Central-West macro-regions compared to those living in Southeast Brazil, and it was also lower among those who perceived their chewing to be normal or their oral health to be bad or very bad. There was a high prevalence of orthodontic treatment need among adolescents in Brazil, and this need was associated with demographic and subjective issues. The high prevalence of orthodontic needs among adolescents is a challenge to the goals of Brazil's universal public health system.
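As a brief illustration of the analysis approach named above (Poisson regression with robust variance for a binary outcome, read off as prevalence ratios), the sketch below uses statsmodels on a synthetic data frame; the variable names are placeholders, not the survey's actual covariates.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "needs_treatment": rng.integers(0, 2, 400),   # 1 = needs orthodontic treatment
        "female":          rng.integers(0, 2, 400),
        "nonwhite":        rng.integers(0, 2, 400),
    })

    # Poisson GLM with a robust (sandwich) covariance estimator.
    fit = smf.glm("needs_treatment ~ female + nonwhite", data=df,
                  family=sm.families.Poisson()).fit(cov_type="HC1")
    print(np.exp(fit.params))      # exponentiated coefficients = prevalence ratios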
Abstract:
Context. B[e] supergiants are luminous, massive post-main-sequence stars exhibiting non-spherical winds, forbidden lines, and hot dust in a disc-like structure. The physical properties of their rich and complex circumstellar environment (CSE) are not well understood, partly because these CSEs cannot be easily resolved at the large distances found for B[e] supergiants (typically ≳ 1 kpc). Aims. From mid-IR spectro-interferometric observations obtained with VLTI/MIDI we seek to resolve and study the CSE of the Galactic B[e] supergiant CPD-57° 2874. Methods. For a physical interpretation of the observables (visibilities and spectrum) we use our ray-tracing radiative transfer code (FRACS), which is optimised for thermal spectro-interferometric observations. Results. Thanks to the short computing time required by FRACS (<10 s per monochromatic model), best-fit parameters and uncertainties were obtained for several physical quantities of CPD-57° 2874, such as the inner dust radius, the relative flux contributions of the central source and of the dusty CSE, the dust temperature profile, and the disc inclination. Conclusions. The analysis of VLTI/MIDI data with FRACS allowed one of the first direct determinations of the physical parameters of the dusty CSE of a B[e] supergiant based on interferometric data and a full model-fitting approach. In a larger context, the study of B[e] supergiants is important for a deeper understanding of the complex structure and evolution of hot, massive stars.
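FRACS itself is not reproduced here; the sketch below only illustrates the general model-fitting approach with a much simpler stand-in, fitting a circular Gaussian visibility model to synthetic mid-IR visibilities by least squares. Baselines, wavelength, and the "observed" values are illustrative.

    import numpy as np
    from scipy.optimize import least_squares

    lam = 10e-6                                  # observing wavelength in m (N band)
    baselines = np.array([20., 40., 60., 80.])   # projected baselines in m (synthetic)
    v_obs = np.array([0.85, 0.55, 0.30, 0.15])   # synthetic visibilities
    v_err = 0.05 * np.ones_like(v_obs)

    MAS = np.pi / 180 / 3600 / 1000              # one milliarcsecond in radians

    def model(theta_mas, b):
        # Visibility of a circular Gaussian source of FWHM theta_mas.
        x = np.pi * theta_mas * MAS * b / lam
        return np.exp(-x**2 / (4 * np.log(2)))

    res = least_squares(lambda p: (model(p[0], baselines) - v_obs) / v_err, x0=[20.0])
    print("best-fit FWHM:", res.x[0], "mas")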
Abstract:
In geophysics and seismology, raw data need to be processed to generate useful information that can be turned into knowledge by researchers. The number of sensors that are acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent in querying and preparing datasets for analyses than in acquiring raw data. Also, a lot of good quality data acquired at great effort can be lost forever if they are not correctly stored. Local and international cooperation will probably be reduced, and a lot of data will never become scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of Sao Paulo (IAG-USP) has concentrated fully on its data management system. This report describes the efforts of the IAG-USP to set up a seismology data management system to facilitate local and international cooperation.
Abstract:
Objective: The aim of this study was the evaluation of two different photosensitizers activated by red light emitted by light-emitting diodes (LEDs) in the decontamination of carious bovine dentin. Materials and Methods: Fifteen bovine incisors were used to obtain dentin samples which were immersed in brain-heart infusion culture medium supplemented with 1% glucose, 2% sucrose, and 1% young primary culture of Lactobacillus acidophilus 10⁸ CFU/mL and Streptococcus mutans 10⁸ CFU/mL for caries induction. Three different concentrations of the Photogem solution, a hematoporphyrin derivative (1, 2, and 3 mg/mL) and two different concentrations of toluidine blue O (TBO), a basic dye (0.025 and 0.1 mg/mL) were used. To activate the photosensitizers two different light exposure times were used: 60 sec and 120 sec, corresponding respectively to the doses of 24 J/cm² and 48 J/cm². Results: After counting the numbers of CFU per milligram of carious dentin, we observed that the use of LED energy in association with Photogem or TBO was effective for bacterial reduction in carious dentin, and that the greatest effect on S. mutans and L. acidophilus was obtained with TBO at 0.1 mg/mL and a dose of 48 J/cm². It was also observed that the overall toxicity of TBO was higher than that of Photogem, and that the phototoxicity of TBO was higher than that of Photogem. Conclusion: Based on our data we propose a mathematical model for the photodynamic effect when different photosensitizer concentrations and light doses are used.
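For reference, the reported doses follow from the usual relation dose = irradiance × exposure time; the 0.4 W/cm² irradiance below is inferred from the stated values, not given in the abstract.

    # Irradiance implied by the reported exposure times and doses.
    irradiance = 24.0 / 60.0            # (J/cm^2) / s  ->  0.4 W/cm^2 (inferred)
    for t in (60, 120):
        print(f"{t} s -> {irradiance * t:.0f} J/cm^2")   # prints 24 and 48 J/cm^2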