72 resultados para Data Warehouse Hadoop Spark GMQL HDFS YARN MapReduce genomica bioinformatica dipendenze funzionali

em Universidade do Minho


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado em Systems Engineering

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

100.00% 100.00%

Publicador:

Resumo:

During the last few years many research efforts have been done to improve the design of ETL (Extract-Transform-Load) systems. ETL systems are considered very time-consuming, error-prone and complex involving several participants from different knowledge domains. ETL processes are one of the most important components of a data warehousing system that are strongly influenced by the complexity of business requirements, their changing and evolution. These aspects influence not only the structure of a data warehouse but also the structures of the data sources involved with. To minimize the negative impact of such variables, we propose the use of ETL patterns to build specific ETL packages. In this paper, we formalize this approach using BPMN (Business Process Modelling Language) for modelling more conceptual ETL workflows, mapping them to real execution primitives through the use of a domain-specific language that allows for the generation of specific instances that can be executed in an ETL commercial tool.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Os recursos computacionais exigidos durante o processamento de grandes volumes de dados durante um processo de povoamento de um data warehouse faz com que a necessidade da procura de novas implementações tenha também em atenção a eficiência energética dos diversos componentes processuais que integram um qualquer sistema de povoamento. A lacuna de técnicas ou metodologias para categorizar e avaliar o consumo de energia em sistemas de povoamento de data warehouses é claramente notória. O acesso a esse tipo de informação possibilitaria a construção de sistemas de povoamento de data warehouses com níveis de consumo de energia mais baixos e, portanto, mais eficientes. Partindo da adaptação de técnicas aplicadas a sistemas de gestão de base de dados para a obtenção dos consumos energéticos da execução de interrogações, desenhámos e implementámos uma nova técnica que nos permite obter os consumos de energia para um qualquer processo de povoamento de um data warehouse, através da avaliação do consumo de cada um dos componentes utilizados na sua implementação utilizando uma ferramenta convencional. Neste artigo apresentamos a forma como fazemos tal avaliação, utilizando na demonstração da viabilidade da nossa proposta um processo de povoamento bastante típico em data warehouses – substituição encadeada de chaves operacionais -, que foi implementado através da ferramenta Kettle.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, organizations are increasingly looking to invest in business intelligence solutions, mainly private companies in order to get advantage over its competitors, however they do not know what is necessary. Business intelligence allows an analysis of consolidated information in order to obtain more specific outlets and certain indications in order to support the decision making process. You can take the right decision based on the data collected from different information systems present in the organization and outside of them. The textile sector is a sector where concept of Business Intelligence it is not many explored yet. Actually there are few textile companies that have a BI platform. Thus, the article objective is present an architecture and show all the steps by which companies need to spend to implement a successful free homemade Business Intelligence system. As result the proposed approach it was validated using real data aiming assess the steps defined.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The MAP-i Doctoral Programme in Informatics, of the Universities of Minho, Aveiro and Porto

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia de Gestão e Sistemas de Informação

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia Biomédica (área de especialização em Informática Médica)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão Industrial

Relevância:

20.00% 20.00%

Publicador:

Resumo:

As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore this data and extract from it useful patterns, tendencies, models or other useful knowledge, which could be used to support the decision-making process, to define new strategies or to understand what is happening in a specific field. Only with a deep understanding of a phenomenon it is possible to fight it. In this paper, a data-driven analytics approach is used for the analysis of the increasing incidence of fatalities by pneumonia in the Portuguese population, characterizing the disease and its incidence in terms of fatalities, knowledge that can be used to define appropriate strategies that can aim to reduce this phenomenon, which has increased more than 65% in a decade.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a methodology based on the Bayesian data fusion techniques applied to non-destructive and destructive tests for the structural assessment of historical constructions. The aim of the methodology is to reduce the uncertainties of the parameter estimation. The Young's modulus of granite stones was chosen as an example for the present paper. The methodology considers several levels of uncertainty since the parameters of interest are considered random variables with random moments. A new concept of Trust Factor was introduced to affect the uncertainty related to each test results, translated by their standard deviation, depending on the higher or lower reliability of each test to predict a certain parameter.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Earthworks tasks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). It is comprised of sequential tasks, such as excavation, transportation, spreading and compaction, and it is strongly based on heavy mechanical equipment and repetitive processes. Under this context, it is essential to optimize the usage of all available resources under two key criteria: the costs and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments held using real-world data, from a construction site, have shown that the proposed system is competitive when compared with current manual earthwork design.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Desde 2009 que a Porto Editora elege em “infopédia.pt” a palavra que melhor representa os anos que terminam. Este trabalho apresenta uma forma alternativa a essa eleição, substituindo a votação dos cidadãos pela recolha de dados da rede social Twitter ao longo do ano, e procedendo à análise dos mesmos em substituição da votação. Assim sendo, foram recolhidos dados associados às dez palavras finalistas incluídas no conjunto da palavra do ano 2014, os quais foram armazenados em ambiente Hadoop para seguidamente e recorrendo a dois lexicons ser possível a classificação dos tweets. Os lexicons utilizados incluem, por um lado, a lista de palavras positivas e negativas e, por outro, as polaridades associadas às palavras em conjugação com o top vinte e cinco de emoticons utilizados no Twitter. Os resultados obtidos permitem identificar a palavra mais referida e o sentimento, positivo ou negativo associado à mesma.