36 results for Data Migration Processes Modeling

at Universidade do Minho


Relevance: 100.00%

Abstract:

Developing and implementing data-oriented workflows for data migration processes are complex tasks involving several problems related to the integration of data coming from different schemas. Usually, they involve very specific requirements: every process is almost unique. Having a way to abstract their representation will help us to better understand and validate them with business users, which is a crucial step for requirements validation. In this demo we present an approach that provides a way to incrementally enrich conceptual models in order to support the automatic production of their corresponding physical implementation. We will show how the B2K (Business to Kettle) system works, transforming BPMN 2.0 conceptual models into executable Kettle data-integration processes, and cover the most relevant aspects of model design and enrichment, model-to-system transformation, and system execution.

Relevance: 100.00%

Abstract:

Today it is easy to find many tools for defining data migration schemas among different types of information systems. Data migration processes are usually implemented on a very diverse range of applications, ranging from conventional operational systems to data warehousing platforms. The implementation of a data migration process often involves serious planning, including the development of conceptual migration schemas at early stages. Such schemas help architects and engineers to plan and discuss the most adequate way to migrate data between two different systems. In this paper we present and discuss a way of enriching data migration conceptual schemas in BPMN using a domain-specific language, demonstrating how to convert such enriched schemas to a first corresponding physical representation (a skeleton) in a conventional ETL implementation tool like Kettle.

Relevance: 100.00%

Abstract:

Current data mining engines are difficult to use, requiring optimizations by data mining experts in order to provide optimal results. To solve this problem a new concept was devised, maintaining the functionality of current data mining tools and adding pervasive characteristics such as invisibility and ubiquity, which focus on their users and improve ease of use and usefulness by providing autonomous and intelligent data mining processes. This article introduces an architecture to implement a data mining engine, composed of four major components: database; middleware (control); middleware (processing); and interface. These components are interlinked but provide independent scaling, allowing for a system that adapts to the user’s needs. A prototype has been developed in order to test the architecture. The results are very promising, demonstrating its functionality and the need for further improvements.

Relevance: 100.00%

Abstract:

The environmental and socio-economic importance of coastal areas is widely recognized, but at present these areas face severe weaknesses and high-risk situations. The increased demand and growing human occupation of coastal zones have greatly contributed to exacerbating such weaknesses. Today, throughout the world, in all countries with coastal regions, episodes of wave overtopping and coastal flooding are frequent. These episodes are usually responsible for property losses and often put human lives at risk. The floods are caused by coastal storms, primarily due to the action of very strong winds. The propagation of these storms towards the coast induces high water levels. It is expected that climate change phenomena will contribute to the intensification of coastal storms. In this context, an estimation of coastal flooding hazards is of paramount importance for the planning and management of coastal zones. Consequently, carrying out a series of storm scenarios and analyzing their impacts through numerical modeling is of prime interest to coastal decision-makers. Firstly, in this work, historical storm tracks and intensities are characterized for the northeastern region of the United States coast, in terms of probability of occurrence. Secondly, several storm events with high potential of occurrence are generated using a specific tool of the DelftDashboard interface for the Delft3D software. Hydrodynamic models are then used to generate ensemble simulations to assess the storms' effects on coastal water levels. For the United States’ northeastern coast, a highly refined regional domain is considered surrounding the area of The Battery, New York, situated in New York Harbor. Based on statistical data from the numerical modeling results, a review of the impact of coastal storms on different locations within the study area is performed.

Relevance: 40.00%

Abstract:

During the last few years many research efforts have been made to improve the design of ETL (Extract-Transform-Load) systems. ETL systems are considered very time-consuming, error-prone and complex, involving several participants from different knowledge domains. ETL processes are among the most important components of a data warehousing system and are strongly influenced by the complexity of business requirements and by their change and evolution. These aspects influence not only the structure of a data warehouse but also the structures of the data sources involved. To minimize the negative impact of such variables, we propose the use of ETL patterns to build specific ETL packages. In this paper, we formalize this approach using BPMN (Business Process Model and Notation) for modelling more conceptual ETL workflows, mapping them to real execution primitives through the use of a domain-specific language that allows for the generation of specific instances that can be executed in a commercial ETL tool.

Relevance: 40.00%

Abstract:

Modeling Extract-Transform-Load (ETL) processes of a Data Warehousing System has always been a challenge. The heterogeneity of the sources, the quality of the data obtained and the conciliation process are some of the issues that must be addressed in the design phase of this critical component. Commercial ETL tools often provide proprietary diagrammatic components and modeling languages that are not standard, thus not providing the ideal separation between a modeling platform and an execution platform. This separation in conjunction with the use of standard notations and languages is critical in a system that tends to evolve through time and which cannot be undermined by a normally expensive tool that becomes an unsatisfactory component. In this paper we demonstrate the application of Relational Algebra as a modeling language of an ETL system as an effort to standardize operations and provide a basis for uncommon ETL execution platforms.

Relevance: 40.00%

Abstract:

The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto

Relevance: 40.00%

Abstract:

This paper describes the concept, technical realisation and validation of a largely data-driven method to model events with Z→ττ decays. In Z→μμ events selected from proton-proton collision data recorded at √s = 8 TeV with the ATLAS experiment at the LHC in 2012, the Z decay muons are replaced by τ leptons from simulated Z→ττ decays at the level of reconstructed tracks and calorimeter cells. The τ lepton kinematics are derived from the kinematics of the original muons. Thus, only the well-understood decays of the Z boson and τ leptons as well as the detector response to the τ decay products are obtained from simulation. All other aspects of the event, such as the Z boson and jet kinematics as well as effects from multiple interactions, are given by the actual data. This so-called τ-embedding method is particularly relevant for Higgs boson searches and analyses in ττ final states, where Z→ττ decays constitute a large irreducible background that cannot be obtained directly from data control samples.

Relevance: 40.00%

Abstract:

Mathematical and computational models play an essential role in understanding cellular metabolism. They are used as platforms to integrate current knowledge on a biological system and to systematically test and predict the effect of manipulations to such systems. The recent advances in genome sequencing techniques have facilitated the reconstruction of genome-scale metabolic networks for a wide variety of organisms, from microbes to human cells. These models have been successfully used in multiple biotechnological applications. Despite these advancements, modeling cellular metabolism still presents many challenges. The aim of this Research Topic is not only to expose and consolidate the state of the art in metabolic modeling approaches, but also to push this frontier beyond the current edge through the introduction of innovative solutions. The articles presented in this e-book address some of the main challenges in the field, including the integration of different modeling formalisms, the integration of heterogeneous data sources into metabolic models, explicit representation of other biological processes during phenotype simulation, and standardization efforts in the representation of metabolic models and simulation results.

Relevance: 30.00%

Abstract:

Hospitals nowadays collect vast amounts of data related to patient records. All these data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high-quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.

Relevance: 30.00%

Abstract:

This paper assesses land-use changes related to naturbanization processes on three biosphere reserves in Southern Europe. A comparative analysis has been carried out on the National Parks in Peneda-Gerês in North Portugal, Cévennes in South France and Sierra Nevada in South Spain, using Corine Land Cover data from 1990 until 2006. Results indicate that the process of land-use intensification is taking place in the frame of naturbanization dynamics that could jeopardize the role of Protected Areas. Focusing on the trends faced by National Parks and their surrounding territories, the analysis demonstrates, both in quantitative and spatial terms, the intensification processes of land-use changes and how important it is to know them for coping with increasing threats. The article concludes that in the current context of increasing stresses, a broader focus on nature protection, encompassing the wider countryside, is needed if the initiatives for biodiversity protection are to be effective.

Relevance: 30.00%

Abstract:

Earthworks tasks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). Earthworks comprise sequential tasks, such as excavation, transportation, spreading and compaction, and are strongly based on heavy mechanical equipment and repetitive processes. In this context, it is essential to optimize the usage of all available resources under two key criteria: the cost and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments using real-world data from a construction site have shown that the proposed system is competitive when compared with current manual earthwork design.

Relevance: 30.00%

Abstract:

PhD Thesis in Bioengineering

Relevance: 30.00%

Abstract:

Doctoral Thesis in Civil Engineering

Relevance: 30.00%

Abstract:

[Excerpt] Multiphase flows are relevant in several industrial processes, thus the availability of accurate numerical modeling tools, able to support the design of products and processes, is of much significance. OpenFOAM version 2.3.x comprises a multiphase flow solver, DPMFoam, able to couple Eulerian and Lagrangian phases using the discrete particles method (DPM). In this work the DPMFoam solver is assessed by comparing its predictions with analytical results and with experimental and simulated data available in the literature. The reference data are the results from Goldschmidt’s [1] and Hoomans’s [2] theses and the analytical Ergun equation. The goal was to determine the accuracy and performance of DPMFoam in general scientific or commercial applications. The results obtained show good agreement with the reference simulation data and are within reasonable deviations from the experimental values. (...)