997 results for Data Warehousing
Abstract:
Current commercial and academic OLAP tools do not process XML data that contains XLink. To overcome this limitation, this paper proposes an analytical system that includes LMDQL, an analytical query language. The paper also presents the XLDM metamodel, which models cubes of XML documents with XLink and deals with the syntactic, semantic, and structural heterogeneities commonly found in XML documents. Because current W3C query languages for navigating XML documents do not support XLink, the article discusses XLPath, which provides the navigation features needed for LMDQL query processing. A prototype system that enables the analytical processing of XML documents using XLink is also detailed. This prototype includes a driver, named sql2xquery, which maps SQL queries into XQuery. To validate the proposed system, a case study and its performance evaluation are presented to analyze the impact of analytical processing over XML/XLink documents.
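The abstract describes sql2xquery only as a driver that maps SQL queries into XQuery. As a rough illustration of what such a mapping involves, the sketch below translates a tiny single-table SQL subset into an XQuery FLWOR expression. The function name `sql_to_xquery`, the supported SQL subset, and the document layout (`doc("<table>.xml")` holding one `<row>` element per record) are hypothetical assumptions; this is not the actual sql2xquery driver, which would also have to handle XLink navigation.

```python
import re

# Hypothetical SQL subset: SELECT cols FROM table [WHERE col = 'value']
_SQL = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<wcol>\w+)\s*=\s*'(?P<wval>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_xquery(sql: str) -> str:
    """Map a single-table SQL query to an XQuery FLWOR expression.

    Assumes (hypothetically) that each table is stored as doc("<table>.xml")
    whose records are <row> elements with one child element per column.
    String literals are copied as-is, without escaping.
    """
    m = _SQL.match(sql)
    if m is None:
        raise ValueError("unsupported SQL: " + sql)
    cols = [c.strip() for c in m.group("cols").split(",")]
    lines = [f'for $r in doc("{m.group("table")}.xml")//row']
    if m.group("wcol"):
        # WHERE col = 'value'  ->  where $r/col = "value"
        lines.append(f'where $r/{m.group("wcol")} = "{m.group("wval")}"')
    projection = ", ".join(f"$r/{c}" for c in cols)
    lines.append(f"return <result>{{ {projection} }}</result>")
    return "\n".join(lines)


print(sql_to_xquery("SELECT name, price FROM product WHERE category = 'books'"))
```

Here `SELECT name, price FROM product WHERE category = 'books'` becomes a three-line for/where/return expression. Joins, aggregation, and XLink traversal are deliberately left out of the sketch.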
Abstract:
In the last few years, a new generation of Business Intelligence (BI) tools called BI 2.0 has emerged to meet the new and ambitious requirements of business users. BI 2.0 not only introduces brand-new topics but in some cases re-examines past challenges from new perspectives, depending on market changes and needs. In this context, the term pervasive BI has gained increasing interest as an innovative and forward-looking perspective. This thesis investigates three different aspects of pervasive BI: personalization, timeliness, and integration. Personalization refers to the capacity of BI tools to customize the query result according to the user who consumes it, making BI information easier to use for different types of users (e.g., front-line employees, suppliers, customers, or business partners). In this direction, the thesis proposes a model for On-Line Analytical Processing (OLAP) query personalization that reduces the query result to the information most relevant to the specific user. Timeliness refers to the timely provision of business information for decision-making. In this direction, this thesis defines a new Data Warehouse (DW) methodology, Four-Wheel-Drive (4WD), that combines traditional development approaches with agile methods; the aim is to accelerate project development and reduce software costs, so as to decrease the number of DW project failures and favour BI tool penetration even in small and medium companies. Integration refers to the ability of BI tools to let users access information wherever it can be found, using the device they prefer. To this end, this thesis proposes Business Intelligence Network (BIN), a peer-to-peer data warehousing architecture in which a user can formulate an OLAP query on its own system and retrieve relevant information from both its local system and the DWs of the network, preserving its autonomy and independence.
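The abstract does not say how OLAP results are personalized. A minimal sketch, assuming a hypothetical preference model in which each user assigns weights to dimension members, scores every cell of a query result and keeps the k most relevant cells. The names `personalize`, `cells`, and `preferences` are illustrative only and do not come from the thesis.

```python
def personalize(cells, preferences, k):
    """Return the k cells of an OLAP result most relevant to one user.

    cells: list of (coordinates, measure) pairs; coordinates maps a dimension
        to a member, e.g. {"region": "North", "product": "Tea"}.
    preferences: hypothetical per-user weights keyed by (dimension, member).
    A cell's relevance is the sum of the weights of its members; members the
    user has no preference for weigh 0. Python's sort is stable, so ties keep
    the original result order.
    """
    def relevance(cell):
        coordinates, _measure = cell
        return sum(preferences.get((dim, member), 0.0)
                   for dim, member in coordinates.items())

    return sorted(cells, key=relevance, reverse=True)[:k]


cells = [
    ({"region": "North", "product": "Tea"}, 120),
    ({"region": "South", "product": "Tea"}, 80),
    ({"region": "North", "product": "Coffee"}, 95),
]
prefs = {("region", "North"): 1.0, ("product", "Coffee"): 0.5}
print([measure for _, measure in personalize(cells, prefs, 2)])  # [95, 120]
```

A user who cares about the North region and about coffee sees the North/Coffee cell first (relevance 1.5), then North/Tea (1.0); the South cell is dropped.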
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single methodology generally accepted in academia or industry has emerged that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to the ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating the extents of the meta-data constructs of component schemas.
This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which are the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Abstract:
With increasing competition and more demanding members, clubs need a tool to help them better attract and retain members and predict their behavior. Data mining is such a tool. This article presents an overview of how data warehousing, data marting, and data mining can provide the foundation on which clubs can build strategies to outsmart competitors, build loyalty, identify new members, and lower costs.
Abstract:
Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in applications such as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem with a new transaction scheduling technique that allows a large transaction to commit safely with a significantly greater probability, which can exceed that of regular optimistic concurrency algorithms by several orders of magnitude. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented. This dissertation also proposes a new query optimization technique, lazy queries: an adaptive query execution scheme that optimizes itself as the query runs. Lazy queries can find the intersection of sub-queries very efficiently, without fully executing large sub-queries and without requiring any statistical knowledge about the data. An efficient optimistic concurrency control algorithm for a massively parallel B-tree with variable-length keys is also introduced. B-trees with variable-length keys can be used effectively in a variety of database types; in particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. It ensures serializability and external consistency by using logical clocks and backward validation of transactional queries.
A formal proof of correctness of the proposed algorithm is also presented.
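The dissertation's own scheduling technique is not detailed in the abstract, so the sketch below shows only the classic baseline it improves on: optimistic concurrency control with backward validation driven by a logical clock. The class and method names are hypothetical. The sketch also makes the abort-rate problem concrete: the larger a transaction's read set, the more likely it overlaps with some write set committed during its lifetime, and the more likely validation fails.

```python
class OptimisticValidator:
    """Classic backward-validation optimistic concurrency control.

    This is the textbook baseline, not the dissertation's algorithm.
    The logical clock counts committed transactions.
    """

    def __init__(self):
        self.clock = 0        # logical clock: number of committed transactions
        self.committed = []   # list of (commit_ts, frozenset(write_set))

    def begin(self):
        """Return the start timestamp for a new transaction."""
        return self.clock

    def try_commit(self, start_ts, read_set, write_set):
        """Validate against transactions that committed after start_ts.

        Returns False (the caller must abort and restart) if any of them
        wrote an item this transaction read; otherwise commits and returns True.
        """
        read_set = set(read_set)
        for commit_ts, writes in self.committed:
            if commit_ts > start_ts and writes & read_set:
                return False
        self.clock += 1
        self.committed.append((self.clock, frozenset(write_set)))
        return True


v = OptimisticValidator()
t1, t2 = v.begin(), v.begin()
print(v.try_commit(t1, {"a"}, {"b"}))        # True: nothing committed since t1 began
print(v.try_commit(t2, {"b", "c"}, {"d"}))   # False: t1 wrote "b" after t2 began
```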
Abstract:
The primary purpose of this thesis was to design and develop a prototype e-commerce system in which dynamic parameters are included in the decision-making process and execution of an online transaction. The system takes into account previous usage history, priority, and associated engineering capabilities. It was developed using a three-tier client-server architecture. The interface was the Internet browser. The middle-tier web server was implemented using Active Server Pages, which form a link between the client system and other servers. A relational database management system formed the data tier. It includes a data warehousing capability that extracts the needed information from the stored data on customers and their orders. The system organizes and analyzes the data generated during a transaction to build a model of the client's behavior during and after the transaction. This model is used for decisions such as pricing and order rescheduling in the client's subsequent transactions. Among other things, the system helps bring predictability to the transaction execution process, which could be highly desirable in the current competitive scenario.
Abstract:
Libraries, since their inception 4000 years ago, have been in a process of constant change. Although change was slow for centuries, in recent decades academic libraries have continuously striven to adapt their services to the ever-changing needs of students and academic staff. In addition, the e-content revolution, technological advances, and ever-shrinking budgets have obliged libraries to allocate their limited resources efficiently between collection and services. Unfortunately, this resource allocation is a complex process because of the diversity of data sources and formats that must be analyzed before decision-making, as well as the lack of efficient integration methods. The main purpose of this study is to develop an integrated model that supports libraries in making optimal budgeting and resource allocation decisions among their services and collection by means of a holistic analysis. To this end, a combination of several methodologies and structured approaches is used. First, a holistic structure and the toolset required to assess academic libraries holistically are proposed to collect and organize the data from an economic point of view. A four-pronged theoretical framework is used in which the library system and collection are analyzed from the perspective of users and internal stakeholders. The first quadrant corresponds to the internal perspective of the library system, that is, it analyzes library performance and the costs incurred and resources consumed by library services. The second quadrant evaluates the external perspective of the library system; users’ perception of service quality is assessed in this quadrant. The third quadrant analyzes the external perspective of the library collection, that is, it evaluates the impact of the current library collection on its users.
Finally, the fourth quadrant evaluates the internal perspective of the library collection; the patterns in which the collection is used are analyzed. Because the data gathered through this framework come from multiple sources and therefore have different formats, they need to be integrated and stored in an adequate scheme for decision support. Second, a data warehousing approach is designed and implemented to integrate, process, and store the holistically collected data. Ultimately, the strategic data stored in the data warehouse are analyzed and used for different purposes, including the following: 1) Data visualization and reporting, proposed to allow library managers to publish library indicators simply and quickly using online reporting tools. 2) Sophisticated data analysis through data mining tools; three data mining techniques are examined in this research study: regression, clustering, and classification. These techniques have been applied to the case study as follows: predicting future investment in library development; finding clusters of users who share common interests and similar profiles but belong to different faculties; and predicting library factors that affect student academic performance by analyzing possible correlations between library usage and academic performance. 3) Input for optimization models: early experiences of developing an optimal resource allocation model to distribute resources among the different processes of a library system are documented in this study. Specifically, the problem of allocating funds for the digital collection among the divisions of an academic library is addressed. An optimization model for the problem is defined with the objective of maximizing usage of the digital collection over all library divisions, subject to a single collection budget.
By proposing this holistic approach, the research study contributes to knowledge by providing an integrated solution to assist library managers to make economic decisions based on an “as realistic as possible” perspective of the library situation.
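The abstract states the objective (maximize digital-collection usage across divisions) and the constraint (a single collection budget) but not the model's functional form. Assuming, hypothetically, that usage grows linearly with spending at a division-specific rate up to a spending cap, the problem becomes a linear program that a greedy rule solves exactly: fund divisions in decreasing order of usage per currency unit. The division names, rates, and caps below are invented for illustration and are not data from the study.

```python
def allocate_budget(divisions, budget):
    """Greedy solution of a hypothetical linear fund-allocation model.

    divisions: list of (name, usage_per_unit, cap) tuples, where usage_per_unit
        is the assumed non-negative usage gained per currency unit spent and
        cap is the most that division can absorb.
    Solves: maximize sum(u_d * x_d)
            subject to sum(x_d) <= budget and 0 <= x_d <= cap_d.
    With one budget constraint and box bounds, funding divisions in decreasing
    order of u_d is optimal (the fractional-knapsack argument).
    """
    allocation = {name: 0.0 for name, _, _ in divisions}
    remaining = budget
    for name, _usage, cap in sorted(divisions, key=lambda d: d[1], reverse=True):
        if remaining <= 0:
            break
        spend = min(cap, remaining)
        allocation[name] = spend
        remaining -= spend
    return allocation


divisions = [("Engineering", 3.0, 400), ("Law", 1.5, 300), ("Medicine", 2.0, 500)]
print(allocate_budget(divisions, 700))
# {'Engineering': 400, 'Law': 0.0, 'Medicine': 300}
```

If usage were modelled with diminishing returns instead, the greedy rule would no longer be exact and a general (e.g. convex) solver would be needed.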
Abstract:
With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive, and finding ways to enhance performance is a grid issue in its own right. Moreover, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because of the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository that allows researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.
Abstract:
Purpose: This paper extends the use of Radio Frequency Identification (RFID) data to the accounting of warehouse costs and services. The Time Driven Activity Based Costing (TDABC) methodology is enhanced with real-time RFID data on the duration of warehouse activities. This gives warehouse managers accurate and instant cost calculations. RFID-enhanced TDABC (RFID-TDABC) is proposed as a novel application of RFID technology. Research Approach: RFID-TDABC is applied to the warehouse processes of a case study company. The implementation covers receiving, put-away, order picking, and despatching. Findings and Originality: RFID technology is commonly used for identifying and tracking items. The use of RFID-generated information with TDABC can be successfully extended to the area of costing. The RFID-TDABC costing model will benefit warehouse managers with accurate and instant cost calculations. Research Impact: RFID technology still has unexplored benefits in warehousing and the wider supply chain. A multi-disciplinary research approach led to combining RFID technology and the TDABC accounting method to propose RFID-TDABC. Combining methods and theories from different fields with RFID may lead researchers to develop new techniques such as the RFID-TDABC presented in this paper. Practical Impact: The RFID-TDABC concept will be of value to practitioners by showing how warehouse costs can be measured accurately with this approach. A better understanding of incurred costs may lead to further optimisation of warehousing operations and lower activity costs, and thus allow competitive pricing for customers. RFID-TDABC can also be applied in the wider supply chain.
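In TDABC, the cost of an activity is its duration multiplied by a capacity cost rate (the cost of capacity supplied divided by practical capacity). The sketch below applies that rule to durations derived from RFID read timestamps. The event layout, the single rate shared by all activities, and the figures in the example are hypothetical simplifications; a real model would normally use one rate per resource group.

```python
from datetime import datetime


def capacity_cost_rate(cost_of_capacity_supplied, practical_capacity_minutes):
    """TDABC capacity cost rate: cost per minute of practical capacity."""
    return cost_of_capacity_supplied / practical_capacity_minutes


def activity_costs(rfid_events, rate_per_minute):
    """Cost each warehouse activity from RFID-derived durations.

    rfid_events: list of (tag_id, activity, start, end) tuples, where start and
        end are datetimes recorded when a tag passes hypothetical read points.
    Returns {activity: total cost} = sum of duration_minutes * rate.
    """
    costs = {}
    for _tag_id, activity, start, end in rfid_events:
        minutes = (end - start).total_seconds() / 60
        costs[activity] = costs.get(activity, 0.0) + minutes * rate_per_minute
    return costs


# Hypothetical figures: 48,000 of capacity cost over 96,000 practical minutes.
rate = capacity_cost_rate(48_000, 96_000)  # 0.5 per minute
events = [
    ("P1", "receiving", datetime(2024, 1, 8, 8, 0), datetime(2024, 1, 8, 8, 12)),
    ("P1", "put-away", datetime(2024, 1, 8, 8, 15), datetime(2024, 1, 8, 8, 21)),
    ("P2", "receiving", datetime(2024, 1, 8, 8, 5), datetime(2024, 1, 8, 8, 15)),
]
print(activity_costs(events, rate))  # {'receiving': 11.0, 'put-away': 3.0}
```

Because the durations come straight from tag reads rather than from interviews or time studies, the cost figures can be recomputed as soon as new reads arrive, which is the "instant" costing the abstract refers to.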
Abstract:
Since its birth, the lean thinking philosophy has shown on numerous occasions that it can bring real and substantial benefits to business environments, sometimes revolutionizing not only the way companies produce but also the way they think, generating a profound cultural change. The lean philosophy arose from the need to readapt the production system in opposition to mass production; for this reason, many of the historical reference texts on lean thinking cite practices and company cases that concern production exclusively, yet production is only one of the many areas found within a company. It later became clear that lean thinking is in fact a system made up of principles, techniques, methods, and tools capable of ensuring process improvement. It is in effect a framework that can be adapted to the different business functions and that, to maximize its effectiveness, must be assimilated by the entire organization and not only by the production environment. Indeed, over the years the adjective “lean” has been paired with the most varied areas: lean sales, lean accounting, lean services, lean management, lean organization, lean enterprise, lean office, and many others. However, one rarely comes across the term “lean warehousing”, and even more rarely across cases of lean tools applied in logistics or reference texts that address the subject. In an increasingly competitive market, it is essential for any company to have efficient and flexible logistics that make it possible to offer a high level of customer service. The warehouse is the arrival and departure point of every logistics flow and, given the importance its function still holds for companies, implementing the lean philosophy in this division as well is essential to the success of the entire system.
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives to the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations, and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources into a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) the Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) the Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) the Annotation module, which assigns annotations from several databases to the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) the Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather newly identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, and GO biological process and KEGG pathway enrichment. This module generates an XGMML file that can be imported into Cytoscape or visualized directly on the web. We developed IIS by integrating diverse databases, in response to the need for appropriate tools for the systematic analysis of physical, genetic and chemical-genetic interactions.
IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
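IIS exports its networks as XGMML files for Cytoscape. As an illustration only (not IIS code), the sketch below writes a minimal XGMML document containing just `graph`, `node`, and `edge` elements in the XGMML namespace, using Python's standard `xml.etree.ElementTree`. The function name, the node identifiers, and the edge-label format are hypothetical; real exports usually also carry `att` elements for node and edge attributes, which are omitted here.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"


def _q(tag):
    """Qualify a tag name with the XGMML namespace, e.g. '{ns}node'."""
    return f"{{{XGMML_NS}}}{tag}"


def to_xgmml(label, interactions):
    """Build a minimal XGMML string from (source, target, interaction) triples."""
    ET.register_namespace("", XGMML_NS)  # serialize XGMML as the default namespace
    graph = ET.Element(_q("graph"), {"label": label, "directed": "0"})
    node_ids = {}
    # One node per distinct interactor, numbered in order of first appearance.
    for source, target, _ in interactions:
        for name in (source, target):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids) + 1)
                ET.SubElement(graph, _q("node"), {"id": node_ids[name], "label": name})
    # One edge per interaction, referring to nodes by id.
    for source, target, kind in interactions:
        ET.SubElement(graph, _q("edge"), {
            "source": node_ids[source],
            "target": node_ids[target],
            "label": f"{source} ({kind}) {target}",
        })
    return ET.tostring(graph, encoding="unicode")


print(to_xgmml("demo", [("A", "B", "two-hybrid"), ("B", "C", "two-hybrid")]))
```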
Abstract:
The article investigates performance patterns in grip strength, gait speed and self-rated health, and the relationships among them, considering the variables of gender, age and family income. The study was conducted in a probabilistic sample of community-dwelling elderly people aged 65 and over, participants in a population study on frailty. A total of 689 elderly people without cognitive deficits suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-rated health was assessed using a 5-point scale. Men and the younger elderly scored significantly higher on grip strength and gait speed than women and the oldest did; the richest scored higher than the poorest on grip strength and gait speed; women and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income emerged as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it reduces functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Abstract:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable tool for evaluating patients with this syndrome. The objective was to correlate cephalometric data with the apnea-hypopnea index. We performed a retrospective, cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital from June 2007 to May 2012. Ninety-six patients were included (45 men and 51 women), with a mean age of 50.3 years. Eleven patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for understanding obstructive sleep apnea syndrome.
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies, it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assays. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust censored linear model based on the multivariate Student's t-distribution. To account for the autocorrelation among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
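The damped exponential correlation (DEC) structure is not written out in the abstract. Assuming the parameterization commonly used for irregularly spaced longitudinal data, the correlation between measurements taken at times t_j and t_k is phi1 ** (|t_j - t_k| ** phi2), with 0 < phi1 < 1 and phi2 >= 0. The sketch builds that matrix for a list of observation times. The function name is hypothetical, and the sketch covers only the correlation structure, not the censored Student's-t model or its EM algorithm.

```python
def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation (DEC) matrix for irregular time points.

    Off-diagonal entry (j, k) is phi1 ** (abs(t_j - t_k) ** phi2); the
    diagonal is 1. phi2 = 1 gives a continuous-time AR(1) structure and
    phi2 = 0 gives compound symmetry (every off-diagonal entry equals phi1).
    """
    if not (0 < phi1 < 1) or phi2 < 0:
        raise ValueError("need 0 < phi1 < 1 and phi2 >= 0")
    n = len(times)
    return [[1.0 if j == k else phi1 ** (abs(times[j] - times[k]) ** phi2)
             for k in range(n)]
            for j in range(n)]


# Visits at irregular times 0, 1 and 3 (e.g. months), continuous-time AR(1) case.
for row in dec_correlation([0, 1, 3], phi1=0.5, phi2=1):
    print(row)
# [1.0, 0.5, 0.125]
# [0.5, 1.0, 0.25]
# [0.125, 0.25, 1.0]
```

The damping parameter phi2 controls how fast correlation decays with the time gap, which is what lets the model handle measurements taken at uneven intervals.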