903 resultados para Data-driven knowledge acquisition
Resumo:
Projecto para obtenção do grau de Mestre em Engenharia Informática e de computadores
Resumo:
Esta dissertação apresenta uma proposta de sistema capaz de preencher a lacuna entre documentos legislativos em formato PDF e documentos legislativos em formato aberto. O objetivo principal é mapear o conhecimento presente nesses documentos de maneira a representar essa coleção como informação interligada. O sistema é composto por vários componentes responsáveis pela execução de três fases propostas: extração de dados, organização de conhecimento, acesso à informação. A primeira fase propõe uma abordagem à extração de estrutura, texto e entidades de documentos PDF de maneira a obter a informação desejada, de acordo com a parametrização do utilizador. Esta abordagem usa dois métodos de extração diferentes, de acordo com as duas fases de processamento de documentos – análise de documento e compreensão de documento. O critério utilizado para agrupar objetos de texto é a fonte usada nos objetos de texto de acordo com a sua definição no código de fonte (Content Stream) do PDF. A abordagem está dividida em três partes: análise de documento, compreensão de documento e conjunção. A primeira parte da abordagem trata da extração de segmentos de texto, adotando uma abordagem geométrica. O resultado é uma lista de linhas do texto do documento; a segunda parte trata de agrupar os objetos de texto de acordo com o critério estipulado, produzindo um documento XML com o resultado dessa extração; a terceira e última fase junta os resultados das duas fases anteriores e aplica regras estruturais e lógicas no sentido de obter o documento XML final. A segunda fase propõe uma ontologia no domínio legal capaz de organizar a informação extraída pelo processo de extração da primeira fase. Também é responsável pelo processo de indexação do texto dos documentos. A ontologia proposta apresenta três características: pequena, interoperável e partilhável. A primeira característica está relacionada com o facto da ontologia não estar focada na descrição pormenorizada dos conceitos presentes, propondo uma descrição mais abstrata das entidades presentes; a segunda característica é incorporada devido à necessidade de interoperabilidade com outras ontologias do domínio legal, mas também com as ontologias padrão que são utilizadas geralmente; a terceira característica é definida no sentido de permitir que o conhecimento traduzido, segundo a ontologia proposta, seja independente de vários fatores, tais como o país, a língua ou a jurisdição. A terceira fase corresponde a uma resposta à questão do acesso e reutilização do conhecimento por utilizadores externos ao sistema através do desenvolvimento dum Web Service. Este componente permite o acesso à informação através da disponibilização de um grupo de recursos disponíveis a atores externos que desejem aceder à informação. O Web Service desenvolvido utiliza a arquitetura REST. Uma aplicação móvel Android também foi desenvolvida de maneira a providenciar visualizações dos pedidos de informação. O resultado final é então o desenvolvimento de um sistema capaz de transformar coleções de documentos em formato PDF para coleções em formato aberto de maneira a permitir o acesso e reutilização por outros utilizadores. Este sistema responde diretamente às questões da comunidade de dados abertos e de Governos, que possuem muitas coleções deste tipo, para as quais não existe a capacidade de raciocinar sobre a informação contida, e transformá-la em dados que os cidadãos e os profissionais possam visualizar e utilizar.
Resumo:
Paper presented at the 9th European Conference on Knowledge Management, Southampton Solent University, Southampton, UK, 4-5 Sep. 2008. URL: http://academic-conferences.org/eckm/eckm2008/eckm08-home.htm
Resumo:
This paper discusses the results of applied research on the eco-driving domain based on a huge data set produced from a fleet of Lisbon's public transportation buses for a three-year period. This data set is based on events automatically extracted from the control area network bus and enriched with GPS coordinates, weather conditions, and road information. We apply online analytical processing (OLAP) and knowledge discovery (KD) techniques to deal with the high volume of this data set and to determine the major factors that influence the average fuel consumption, and then classify the drivers involved according to their driving efficiency. Consequently, we identify the most appropriate driving practices and styles. Our findings show that introducing simple practices, such as optimal clutch, engine rotation, and engine running in idle, can reduce fuel consumption on average from 3 to 5l/100 km, meaning a saving of 30 l per bus on one day. These findings have been strongly considered in the drivers' training sessions.
Resumo:
Dissertação apresentada para obtenção do Grau de Doutor em Engenharia do Ambiente pela Universidade Nova de Lisboa,Faculdade de Ciências e Tecnologia
Resumo:
The transition process between information and knowledge is faster and so the inputs that influence social and political practises. The dissemination of information is now determinant in terms of territorial competitiveness and both public and private sector take large benefits when the data-information- knowledge value chain repeats itself trough space and time. Mankind depends nowadays on the creation and diffusion of good and reliable information. Speed is also important and the greater the speed, the faster the opportunities for global markets. Information must be an input for knowledge and obviously for decision. So, the power of information is unquestionable. This paper focuses on concepts like information, knowledge and other, more geographical and tries to explain how territories change from real to virtual. Knowledge society appears on an evolutional context in which information dissemination is wider and technological potential overwrites traditional notions of Geography. To understand the mutations over the territories, the causes and the consequences emerges the Geography of the Knowledge Society, a new discipline inside Geography with a special concern about modern society and socio-economical developing models.
Resumo:
O estudo que apresentamos tem como objetivos de investigação contribuir para o desenvolvimento global das crianças com Necessidades Educativas Especiais, nomeadamente, portadoras de Dislexia, designadamente em aulas onde são lecionados conteúdos de Ciências da Natureza e de Matemática, e conceber um recurso didático apoiado nas TIC que seja adequado à utilização em sala de aula por alunos com dislexia. Neste sentido, o estudo foi orientado pelas seguintes questões de investigação: A utilização do Scratch é adequada e exequível no trabalho quotidiano de alunos portadores de Dislexia? e Em que medida o Scratch pode contribuir para um melhor envolvimento dos alunos portadores de Dislexia na realização das tarefas propostas? O presente trabalho segue uma metodologia de carácter qualitativo, centrando-se no estudo de caso. Tentou-se dar uma resposta à reduzida utilização do software Scratch no ensino das Ciências da Natureza e Matemática como instrumentos viáveis para uma aprendizagem de sucesso em alunos portadores de Dislexia. Este estudo foi aplicado a três alunos do 4º ano, com este diagnóstico, de um agrupamento de escolas do concelho de Viseu. Para dar resposta às questões de investigação supracitadas foram implementadas e projetadas duas atividades no software Scratch para serem desenvolvidas em duas situações formativas contextualizadas. O subdomínio que se pretendeu trabalhar foi “A importância da água para os seres vivos” articulando com o conteúdo dos volumes. Os dados foram recolhidos através da análise dos Programas Educativos Individuais de cada um dos alunos participantes, das entrevistas às docentes de Educação Especial, de gravações áudio, fotografias, trabalhos dos alunos e os registos da professora investigadora. Os resultados obtidos permitiram caracterizar o modo como os alunos, portadores de dislexia, se envolveram com o Scratch na aquisição de conhecimento. Permitiram ainda demonstrar aos professores que é possível construir atividades nesta ferramenta envolvendo conteúdos curriculares interdisciplinares.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Eletrotécnica e de Computadores
Resumo:
Polysaccharides are gaining increasing attention as potential environmental friendly and sustainable building blocks in many fields of the (bio)chemical industry. The microbial production of polysaccharides is envisioned as a promising path, since higher biomass growth rates are possible and therefore higher productivities may be achieved compared to vegetable or animal polysaccharides sources. This Ph.D. thesis focuses on the modeling and optimization of a particular microbial polysaccharide, namely the production of extracellular polysaccharides (EPS) by the bacterial strain Enterobacter A47. Enterobacter A47 was found to be a metabolically versatile organism in terms of its adaptability to complex media, notably capable of achieving high growth rates in media containing glycerol byproduct from the biodiesel industry. However, the industrial implementation of this production process is still hampered due to a largely unoptimized process. Kinetic rates from the bioreactor operation are heavily dependent on operational parameters such as temperature, pH, stirring and aeration rate. The increase of culture broth viscosity is a common feature of this culture and has a major impact on the overall performance. This fact complicates the mathematical modeling of the process, limiting the possibility to understand, control and optimize productivity. In order to tackle this difficulty, data-driven mathematical methodologies such as Artificial Neural Networks can be employed to incorporate additional process data to complement the known mathematical description of the fermentation kinetics. In this Ph.D. thesis, we have adopted such an hybrid modeling framework that enabled the incorporation of temperature, pH and viscosity effects on the fermentation kinetics in order to improve the dynamical modeling and optimization of the process. A model-based optimization method was implemented that enabled to design bioreactor optimal control strategies in the sense of EPS productivity maximization. It is also critical to understand EPS synthesis at the level of the bacterial metabolism, since the production of EPS is a tightly regulated process. Methods of pathway analysis provide a means to unravel the fundamental pathways and their controls in bioprocesses. In the present Ph.D. thesis, a novel methodology called Principal Elementary Mode Analysis (PEMA) was developed and implemented that enabled to identify which cellular fluxes are activated under different conditions of temperature and pH. It is shown that differences in these two parameters affect the chemical composition of EPS, hence they are critical for the regulation of the product synthesis. In future studies, the knowledge provided by PEMA could foster the development of metabolically meaningful control strategies that target the EPS sugar content and oder product quality parameters.
Resumo:
Customer lifetime value (LTV) enables using client characteristics, such as recency, frequency and monetary (RFM) value, to describe the value of a client through time in terms of profitability. We present the concept of LTV applied to telemarketing for improving the return-on-investment, using a recent (from 2008 to 2013) and real case study of bank campaigns to sell long- term deposits. The goal was to benefit from past contacts history to extract additional knowledge. A total of twelve LTV input variables were tested, un- der a forward selection method and using a realistic rolling windows scheme, highlighting the validity of five new LTV features. The results achieved by our LTV data-driven approach using neural networks allowed an improvement up to 4 pp in the Lift cumulative curve for targeting the deposit subscribers when compared with a baseline model (with no history data). Explanatory knowledge was also extracted from the proposed model, revealing two highly relevant LTV features, the last result of the previous campaign to sell the same product and the frequency of past client successes. The obtained results are particularly valuable for contact center companies, which can improve pre- dictive performance without even having to ask for more information to the companies they serve.
Resumo:
Earthworks tasks aim at levelling the ground surface at a target construction area and precede any kind of structural construction (e.g., road and railway construction). It is comprised of sequential tasks, such as excavation, transportation, spreading and compaction, and it is strongly based on heavy mechanical equipment and repetitive processes. Under this context, it is essential to optimize the usage of all available resources under two key criteria: the costs and duration of earthwork projects. In this paper, we present an integrated system that uses two artificial intelligence based techniques: data mining and evolutionary multi-objective optimization. The former is used to build data-driven models capable of providing realistic estimates of resource productivity, while the latter is used to optimize resource allocation considering the two main earthwork objectives (duration and cost). Experiments held using real-world data, from a construction site, have shown that the proposed system is competitive when compared with current manual earthwork design.
Resumo:
Tese de Doutoramento Ramo Engenharia Industrial e de Sistemas
Resumo:
The mass of the top quark is measured in a data set corresponding to 4.6 fb−1 of proton--proton collisions with centre-of-mass energy s√=7 TeV collected by the ATLAS detector at the LHC. Events consistent with hadronic decays of top--antitop quark pairs with at least six jets in the final state are selected. The substantial background from multijet production is modelled with data-driven methods that utilise the number of identified b-quark jets and the transverse momentum of the sixth leading jet, which have minimal correlation. The top-quark mass is obtained from template fits to the ratio of three-jet to dijet mass. The three-jet mass is calculated from the three jets of a top-quark decay. Using these three jets the dijet mass is obtained from the two jets of the W boson decay. The top-quark mass obtained from this fit is thus less sensitive to the uncertainty in the energy measurement of the jets. A binned likelihood fit yields a top-quark mass of mt = 175.1 ± 1.4 (stat.) ± 1.2 (syst.) GeV.
Resumo:
Currently, the quality of the Indonesian national road network is inadequate due to several constraints, including overcapacity and overloaded trucks. The high deterioration rate of the road infrastructure in developing countries along with major budgetary restrictions and high growth in traffic have led to an emerging need for improving the performance of the highway maintenance system. However, the high number of intervening factors and their complex effects require advanced tools to successfully solve this problem. The high learning capabilities of Data Mining (DM) are a powerful solution to this problem. In the past, these tools have been successfully applied to solve complex and multi-dimensional problems in various scientific fields. Therefore, it is expected that DM can be used to analyze the large amount of data regarding the pavement and traffic, identify the relationship between variables, and provide information regarding the prediction of the data. In this paper, we present a new approach to predict the International Roughness Index (IRI) of pavement based on DM techniques. DM was used to analyze the initial IRI data, including age, Equivalent Single Axle Load (ESAL), crack, potholes, rutting, and long cracks. This model was developed and verified using data from an Integrated Indonesia Road Management System (IIRMS) that was measured with the National Association of Australian State Road Authorities (NAASRA) roughness meter. The results of the proposed approach are compared with the IIRMS analytical model adapted to the IRI, and the advantages of the new approach are highlighted. We show that the novel data-driven model is able to learn (with high accuracy) the complex relationships between the IRI and the contributing factors of overloaded trucks
Resumo:
This paper describes the concept, technical realisation and validation of a largely data-driven method to model events with Z→ττ decays. In Z→μμ events selected from proton-proton collision data recorded at s√=8 TeV with the ATLAS experiment at the LHC in 2012, the Z decay muons are replaced by τ leptons from simulated Z→ττ decays at the level of reconstructed tracks and calorimeter cells. The τ lepton kinematics are derived from the kinematics of the original muons. Thus, only the well-understood decays of the Z boson and τ leptons as well as the detector response to the τ decay products are obtained from simulation. All other aspects of the event, such as the Z boson and jet kinematics as well as effects from multiple interactions, are given by the actual data. This so-called τ-embedding method is particularly relevant for Higgs boson searches and analyses in ττ final states, where Z→ττ decays constitute a large irreducible background that cannot be obtained directly from data control samples.