33 results for Data-representation


Relevance: 20.00%

Abstract:

Today it is easy to find many tools for defining data migration schemas among different types of information systems. Data migration processes are usually implemented on a very diverse range of applications, ranging from conventional operational systems to data warehousing platforms. The implementation of a data migration process often involves serious planning, considering the development of conceptual migration schemas at early stages. Such schemas help architects and engineers to plan and discuss the most adequate way to migrate data between two different systems. In this paper we present and discuss a way of enriching data migration conceptual schemas in BPMN using a domain-specific language, demonstrating how to convert such enriched schemas to a first corresponding physical representation (a skeleton) in a conventional ETL implementation tool like Kettle.
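As a rough illustration of the conversion idea (the annotation names and the step mapping below are hypothetical, not the paper's actual DSL or Kettle's real step registry), a DSL-enriched conceptual task list can be mechanically turned into an unconfigured physical skeleton:

```python
# Hypothetical sketch: convert a DSL-enriched conceptual migration schema
# into a first physical "skeleton". All names here are illustrative
# assumptions, not the paper's actual grammar or tool vocabulary.

def to_skeleton(schema):
    """Map each annotated conceptual task to a placeholder physical step."""
    step_for = {            # assumed mapping from DSL annotation to step type
        "extract": "TableInput",
        "transform": "ScriptStep",
        "load": "TableOutput",
    }
    return [
        {"name": task["name"], "type": step_for[task["kind"]], "configured": False}
        for task in schema["tasks"]
    ]

conceptual = {
    "tasks": [
        {"name": "read customers", "kind": "extract"},
        {"name": "normalize names", "kind": "transform"},
        {"name": "write dim_customer", "kind": "load"},
    ]
}
skeleton = to_skeleton(conceptual)
```

The skeleton deliberately leaves every step unconfigured: the engineer still fills in connections and field mappings inside the ETL tool.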

Relevance: 20.00%

Abstract:

ETL conceptual modeling is a very important activity in any data warehousing system project. Having a high-level system representation that allows a clear identification of the main parts of a data warehousing system is clearly a great advantage, especially in the early stages of design and development. However, the effort to model an ETL system conceptually is rarely properly rewarded. Translating ETL conceptual models directly into something that saves work and time on the concrete implementation of the system would, in fact, be a great help. In this paper we present and discuss a hybrid approach to this problem, combining the simplicity of interpretation and expressive power of BPMN for ETL systems conceptualization with the use of ETL patterns to automatically produce an ETL skeleton, a first prototype system, which can be executed in a commercial ETL tool like Kettle.
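The pattern idea can be sketched as a lookup that expands each conceptual pattern into a fixed sequence of skeleton steps. The pattern and step names below are invented for illustration; they are not the paper's actual pattern catalog:

```python
# Hypothetical sketch of pattern-based skeleton generation: each ETL
# pattern expands into a fixed, unconfigured sequence of steps.

PATTERNS = {
    "surrogate_key_pipelining": ["input", "key_lookup", "output"],
    "slowly_changing_dimension": ["input", "compare", "update", "insert"],
}

def expand(pattern_name):
    """Expand one conceptual pattern into its skeleton step sequence."""
    return [{"step": s, "pattern": pattern_name} for s in PATTERNS[pattern_name]]

skeleton = expand("slowly_changing_dimension")
```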

Relevance: 20.00%

Abstract:

The computational resources required to process large volumes of data during a data warehouse populating process mean that the search for new implementations must also take into account the energy efficiency of the various processing components that make up any populating system. The lack of techniques or methodologies to categorize and evaluate energy consumption in data warehouse populating systems is clearly notorious. Access to that kind of information would make it possible to build data warehouse populating systems with lower, and therefore more efficient, levels of energy consumption. Starting from the adaptation of techniques applied to database management systems for obtaining the energy consumption of query execution, we designed and implemented a new technique that allows us to obtain the energy consumption of any data warehouse populating process, by evaluating the consumption of each of the components used in its implementation with a conventional tool. In this paper we present how we perform such an evaluation, demonstrating the feasibility of our proposal with a populating process that is quite typical in data warehouses (the chained replacement of operational keys), implemented with the Kettle tool.
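A minimal sketch of the per-component evaluation described above: integrate sampled power over time for each component of the populating process and sum. The component names, power readings, and sampling interval are invented for illustration, not measurements from the paper:

```python
# Hypothetical sketch: per-component energy of an ETL populating process.
# Energy (J) is approximated by holding each power sample (W) constant
# for one sampling interval (s).

def energy_joules(power_samples, interval_s):
    """Sum power samples times the sampling interval."""
    return sum(p * interval_s for p in power_samples)

# power readings (W) per ETL component, sampled every 0.5 s (invented)
readings = {
    "table_input": [10.0, 12.0, 11.0],
    "key_lookup": [20.0, 22.0],
    "table_output": [8.0, 8.0, 9.0, 8.0],
}
per_component = {c: energy_joules(p, 0.5) for c, p in readings.items()}
total = sum(per_component.values())
```

Breaking the total down per component is what lets an engineer spot which step of the populating process dominates consumption.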

Relevance: 20.00%

Abstract:

Worldwide, around 9% of children are born before 37 weeks of gestation, putting the premature child, who is not prepared to carry out a number of basic functions that begin soon after birth, at risk. To ensure that such risk pregnancies are properly monitored by obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment, using data from 3376 patients (women) admitted to the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was developed, assisting physicians in the decision-making process regarding the patients' observation. It was possible to obtain promising results, achieving sensitivity and specificity values of 96% and 98%, respectively.
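For reference, the two metrics reported above are computed from confusion-matrix counts. The counts below are invented, chosen only so the results match the reported figures; they are not the study's actual numbers:

```python
# Sketch of the evaluation metrics reported above.

def sensitivity(tp, fn):
    """True positive rate: preterm births correctly caught."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: term births correctly ruled out."""
    return tn / (tn + fp)

# invented counts: 48 preterm caught, 2 missed, 98 term correct, 2 false alarms
sens = sensitivity(48, 2)
spec = specificity(98, 2)
```

High sensitivity matters most here: a missed preterm case (a false negative) is far costlier clinically than an unnecessary observation.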

Relevance: 20.00%

Abstract:

Lecture Notes in Computer Science, 9273

Relevance: 20.00%

Abstract:

In maternity care, a quick decision has to be made about the most suitable delivery type for the current patient. Physicians follow guidelines to support that decision; however, those practice recommendations are limited and underused. In recent years, caesarean delivery has been performed in over 28% of pregnancies, and other operative techniques for specific problems have also been excessively employed. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth envisions high-quality services, increased safety, and the effectiveness of specific practices to help guide maternity care decisions and facilitate optimal outcomes for mother and child. In this work it was possible to obtain good results, achieving sensitivity and specificity values of 90.11% and 80.05%, respectively, providing the CHP with a model capable of correctly identifying caesarean sections and vaginal deliveries.
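Inducing several DM models and keeping the best usually means comparing them on held-out data with the same two metrics reported above. A hedged sketch, with invented model names and confusion counts (the balancing rule, the arithmetic mean of sensitivity and specificity, is one common choice, not necessarily the study's):

```python
# Hypothetical model-selection sketch: rank candidate classifiers by the
# mean of sensitivity and specificity on held-out data.

def score(tp, fn, tn, fp):
    """Return (balanced score, sensitivity, specificity)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens + spec) / 2, sens, spec

candidates = {                      # (tp, fn, tn, fp), all invented
    "decision_tree": (90, 10, 80, 20),
    "naive_bayes": (85, 15, 70, 30),
}
best = max(candidates, key=lambda name: score(*candidates[name])[0])
```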

Relevance: 20.00%

Abstract:

Doctoral Thesis in Literary Sciences - Specialization in Literary Theory

Relevance: 20.00%

Abstract:

Rockburst is characterized by a violent explosion of a block causing a sudden rupture in the rock, and is quite common in deep tunnels. It is critical to understand the phenomenon of rockburst, focusing on the patterns of occurrence, so these events can be avoided and/or managed, saving costs and possibly lives. The failure mechanism of rockburst needs to be better understood. Laboratory experiments are under way at the State Key Laboratory for Geomechanics and Deep Underground Engineering (SKLGDUE) in Beijing, and the system is described. A large number of rockburst tests were performed and their information was collected, stored in a database, and analyzed. Data Mining (DM) techniques were applied to the database to develop predictive models for the rockburst maximum stress (σRB) and rockburst risk index (IRB), parameters that would otherwise require the results of such tests to be determined. With the developed models it is possible to predict these parameters with high accuracy using data from the rock mass and the specific project.
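In the spirit of the predictive models above, a minimal regression sketch: fit rockburst maximum stress as a linear function of one rock-mass feature via ordinary least squares. The feature choice and all data points are invented, not values from the SKLGDUE database:

```python
# Illustrative least-squares fit of rockburst maximum stress (sigma_RB)
# against one rock-mass feature. All numbers are invented.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# uniaxial compressive strength (MPa) vs observed max stress (MPa), invented
ucs = [100.0, 150.0, 200.0, 250.0]
sigma_rb = [55.0, 80.0, 105.0, 130.0]
slope, intercept = fit_line(ucs, sigma_rb)
predicted = slope * 180.0 + intercept   # prediction for an unseen rock mass
```

The actual models in the study are multivariate DM models; a one-feature line is only the simplest instance of the same predict-from-rock-mass-data idea.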

Relevance: 20.00%

Abstract:

A search is performed for Higgs bosons produced in association with top quarks using the diphoton decay mode of the Higgs boson. Selection requirements are optimized separately for leptonic and fully hadronic final states from the top quark decays. The dataset used corresponds to an integrated luminosity of 4.5 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 7 TeV and 20.3 fb⁻¹ at 8 TeV recorded by the ATLAS detector at the CERN Large Hadron Collider. No significant excess over the background prediction is observed and upper limits are set on the tt̄H production cross section. The observed exclusion upper limit at 95% confidence level is 6.7 times the predicted Standard Model cross section value. In addition, limits are set on the strength of the Yukawa coupling between the top quark and the Higgs boson, taking into account the dependence of the tt̄H and tH cross sections as well as the H→γγ branching fraction on the Yukawa coupling. Lower and upper limits at 95% confidence level are set at −1.3 and +8.0 times the Yukawa coupling strength in the Standard Model.

Relevance: 20.00%

Abstract:

Searches are performed for resonant and non-resonant Higgs boson pair production in the hh→γγbb̄ final state using 20 fb⁻¹ of proton-proton collisions at a center-of-mass energy of 8 TeV recorded with the ATLAS detector at the CERN Large Hadron Collider. A 95% confidence level upper limit on the cross section times branching ratio of non-resonant production is set at 2.2 pb, while the expected limit is 1.0 pb. The corresponding limit observed for a narrow resonance ranges between 0.8 and 3.5 pb as a function of its mass.

Relevance: 20.00%

Abstract:

The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto

Relevance: 20.00%

Abstract:

The study of the interaction between hair filaments and formulations or peptides is of utmost importance in fields like cosmetic research. The structure of keratin intermediate filaments is not fully described, limiting molecular dynamics (MD) studies in this field despite their high potential to improve the area. We developed a computational model of a truncated protofibril and simulated its behavior in alcohol-based formulations and with one peptide. The simulations showed a strong interaction between the benzyl alcohol molecules of the formulations and the model, leading to a disorganization of the keratin chains that regresses when the alcohol molecules are removed. This behavior can explain the increased peptide uptake in hair shafts evidenced in fluorescence microscopy images. The model developed is valid for computationally reproducing the interaction between hair and alcoholic formulations and provides a robust basis for new MD studies of hair properties. It is shown that MD simulations can improve hair cosmetic research by improving the uptake of a compound of interest.

Relevance: 20.00%

Abstract:

Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins, to which salivary enzymes, beeswax, and pollen are added. The biological activities described for propolis have also been identified in the resin of the donor plants, but a major challenge for the standardization of the chemical composition and biological effects of propolis remains a better understanding of how seasonality influences the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of the harvest season on the propolis chemical profile is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control in the most demanding markets, e.g., human health applications. To that end, UV-Visible (UV-Vis) scanning spectrophotometry of hydroalcoholic extracts (HE) of seventy-three propolis samples, collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil, was adopted. Machine learning and chemometrics techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles associated with chemometric analysis allowed a typical pattern to be identified in propolis samples collected in the summer. Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that besides the biological activities of those secondary metabolites, they also play a relevant role in the discrimination and classification of that complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees, proved complementary to PCA and HCA, yielding relevant information for sample discrimination.
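The HCA step mentioned above can be illustrated with a tiny single-linkage agglomerative clustering (the study uses R scripts; this is an equivalent Python sketch, and the four toy "spectra" are invented stand-ins for UV-Vis absorbance vectors, not real measurements):

```python
# Hedged sketch of hierarchical clustering (HCA) on spectra-like vectors.

def dist(a, b):
    """Euclidean distance between two absorbance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: min(dist(points[a], points[b])
                               for a in clusters[ij[0]] for b in clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)   # single linkage: closest-pair distance
    return clusters

# invented two-band "spectra": two summer-like and two winter-like samples
spectra = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
groups = single_linkage(spectra, 2)
```

On these toy vectors the two summer-like samples (indices 0 and 1) and the two winter-like samples (indices 2 and 3) end up in separate clusters, mirroring the seasonal pattern the study reports.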

Relevance: 20.00%

Abstract:

DNA microarrays are one of the most widely used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used for gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, comprising a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
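One of the simplest batch-effect attenuation strategies consistent with the text is per-batch mean-centering, so that systematic platform offsets cancel. A hedged sketch (the expression values and batch labels are invented; the paper evaluates several methods, of which this is only the most basic):

```python
# Hypothetical sketch: attenuate a batch (platform) effect for one gene
# by subtracting each batch's own mean expression.

def center_by_batch(values, batches):
    """Subtract the per-batch mean from each measurement of one gene."""
    grouped = {}
    for v, b in zip(values, batches):
        grouped.setdefault(b, []).append(v)
    means = {b: sum(vs) / len(vs) for b, vs in grouped.items()}
    return [v - means[b] for v, b in zip(values, batches)]

# one gene measured on two platforms with a systematic offset (invented)
expr = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["agilent"] * 3 + ["affymetrix"] * 3
adjusted = center_by_batch(expr, batch)
```

After centering, the within-batch variation survives while the ~4-unit offset between the two platforms disappears, which is exactly what integration needs before samples from both sources can feed one classifier.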