905 resultados para Web Mining, Data Mining, User Topic Model, Web User Profiles


Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Lecture Notes in Computer Science, 9273

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In Maternity Care, a quick decision has to be made about the most suitable delivery type for the current patient. Guidelines are followed by physicians to support that decision; however, those practice recommendations are limited and underused. In the last years, caesarean delivery has been pursued in over 28% of pregnancies, and other operative techniques regarding specific problems have also been excessively employed. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth envisions high-quality services, increased safety and effectiveness of specific practices to help guide maternity care decisions and facilitate optimal outcomes in mother and child. In this work was possible to acquire good results, achieving sensitivity and specificity values of 90.11% and 80.05%, respectively, providing the CHP with a model capable of correctly identify caesarean sections and vaginal deliveries.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The research aimed to establish tyre-road noise models by using a Data Mining approach that allowed to build a predictive model and assess the importance of the tested input variables. The data modelling took into account three learning algorithms and three metrics to define the best predictive model. The variables tested included basic properties of pavement surfaces, macrotexture, megatexture, and uneven- ness and, for the first time, damping. Also, the importance of those variables was measured by using a sensitivity analysis procedure. Two types of models were set: one with basic variables and another with complex variables, such as megatexture and damping, all as a function of vehicles speed. More detailed models were additionally set by the speed level. As a result, several models with very good tyre-road noise predictive capacity were achieved. The most relevant variables were Speed, Temperature, Aggregate size, Mean Profile Depth, and Damping, which had the highest importance, even though influenced by speed. Megatexture and IRI had the lowest importance. The applicability of the models developed in this work is relevant for trucks tyre-noise prediction, represented by the AVON V4 test tyre, at the early stage of road pavements use. Therefore, the obtained models are highly useful for the design of pavements and for noise prediction by road authorities and contractors.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Currently, the quality of the Indonesian national road network is inadequate due to several constraints, including overcapacity and overloaded trucks. The high deterioration rate of the road infrastructure in developing countries along with major budgetary restrictions and high growth in traffic have led to an emerging need for improving the performance of the highway maintenance system. However, the high number of intervening factors and their complex effects require advanced tools to successfully solve this problem. The high learning capabilities of Data Mining (DM) are a powerful solution to this problem. In the past, these tools have been successfully applied to solve complex and multi-dimensional problems in various scientific fields. Therefore, it is expected that DM can be used to analyze the large amount of data regarding the pavement and traffic, identify the relationship between variables, and provide information regarding the prediction of the data. In this paper, we present a new approach to predict the International Roughness Index (IRI) of pavement based on DM techniques. DM was used to analyze the initial IRI data, including age, Equivalent Single Axle Load (ESAL), crack, potholes, rutting, and long cracks. This model was developed and verified using data from an Integrated Indonesia Road Management System (IIRMS) that was measured with the National Association of Australian State Road Authorities (NAASRA) roughness meter. The results of the proposed approach are compared with the IIRMS analytical model adapted to the IRI, and the advantages of the new approach are highlighted. We show that the novel data-driven model is able to learn (with high accuracy) the complex relationships between the IRI and the contributing factors of overloaded trucks

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Football is considered nowadays one of the most popular sports. In the betting world, it has acquired an outstanding position, which moves millions of euros during the period of a single football match. The lack of profitability of football betting users has been stressed as a problem. This lack gave origin to this research proposal, which it is going to analyse the possibility of existing a way to support the users to increase their profits on their bets. Data mining models were induced with the purpose of supporting the gamblers to increase their profits in the medium/long term. Being conscience that the models can fail, the results achieved by four of the seven targets in the models are encouraging and suggest that the system can help to increase the profits. All defined targets have two possible classes to predict, for example, if there are more or less than 7.5 corners in a single game. The data mining models of the targets, more or less than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which it is a pervasive decision support system. This system was developed with the purpose to be an interface for any user, both for an expert user as to a user who has no knowledge in football games.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Driven by concerns about rising energy costs, security of supply and climate change a new wave of Sustainable Energy Technologies (SET’s) have been embraced by the Irish consumer. Such systems as solar collectors, heat pumps and biomass boilers have become common due to government backed financial incentives and revisions of the building regulations. However, there is a deficit of knowledge and understanding of how these technologies operate and perform under Ireland’s maritime climate. This AQ-WBL project was designed to address both these needs by developing a Data Acquisition (DAQ) system to monitor the performance of such technologies and a web-based learning environment to disseminate performance characteristics and supplementary information about these systems. A DAQ system consisting of 108 sensors was developed as part of Galway-Mayo Institute of Technology’s (GMIT’s) Centre for the Integration of Sustainable EnergyTechnologies (CiSET) in an effort to benchmark the performance of solar thermal collectors and Ground Source Heat Pumps (GSHP’s) under Irish maritime climate, research new methods of integrating these systems within the built environment and raise awareness of SET’s. It has operated reliably for over 2 years and has acquired over 25 million data points. Raising awareness of these SET’s is carried out through the dissemination of the performance data through an online learning environment. A learning environment was created to provide different user groups with a basic understanding of a SET’s with the support of performance data, through a novel 5 step learning process and two examples were developed for the solar thermal collectors and the weather station which can be viewed at http://www.kdp 1 .aquaculture.ie/index.aspx. This online learning environment has been demonstrated to and well received by different groups of GMIT’s undergraduate students and plans have been made to develop it further to support education, awareness, research and regional development.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Aquesta memòria tracta sobre el procediment de creació d’una aplicació web de notícies. Està dividida en 3 zones, una on usuaris amb permisos d’administració poden penjar notícies per ser visualitzades per tothom, una altra que s’hi accedeix si s’és usuari registrat i permet visualitzar noticies d’altres servidors mitjançant el format de dades RSS, i un tercer apartat de gestió administrativa, incorporar noves notícies, modificar-ne de presents o introduir noves pàgines web que continguin notícies. Els usuaris registrats podran seleccionar el diaris dels quals rebran informació, així com especificar quines temàtiques prefereixen en la cerca de notícies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Consider a model with parameter phi, and an auxiliary model with parameter theta. Let phi be a randomly sampled from a given density over the known parameter space. Monte Carlo methods can be used to draw simulated data and compute the corresponding estimate of theta, say theta_tilde. A large set of tuples (phi, theta_tilde) can be generated in this manner. Nonparametric methods may be use to fit the function E(phi|theta_tilde=a), using these tuples. It is proposed to estimate phi using the fitted E(phi|theta_tilde=theta_hat), where theta_hat is the auxiliary estimate, using the real sample data. This is a consistent and asymptotically normally distributed estimator, under certain assumptions. Monte Carlo results for dynamic panel data and vector autoregressions show that this estimator can have very attractive small sample properties. Confidence intervals can be constructed using the quantiles of the phi for which theta_tilde is close to theta_hat. Such confidence intervals are found to have very accurate coverage.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

En aquest treball es realitzarà bàsicament l'estudi preliminar d'investigació i disseny de l'aplicació.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

BACKGROUND Spain shows the highest bladder cancer incidence rates in men among European countries. The most important risk factors are tobacco smoking and occupational exposure to a range of different chemical substances, such as aromatic amines. METHODS This paper describes the municipal distribution of bladder cancer mortality and attempts to "adjust" this spatial pattern for the prevalence of smokers, using the autoregressive spatial model proposed by Besag, York and Molliè, with relative risk of lung cancer mortality as a surrogate. RESULTS It has been possible to compile and ascertain the posterior distribution of relative risk for bladder cancer adjusted for lung cancer mortality, on the basis of a single Bayesian spatial model covering all of Spain's 8077 towns. Maps were plotted depicting smoothed relative risk (RR) estimates, and the distribution of the posterior probability of RR>1 by sex. Towns that registered the highest relative risks for both sexes were mostly located in the Provinces of Cadiz, Seville, Huelva, Barcelona and Almería. The highest-risk area in Barcelona Province corresponded to very specific municipal areas in the Bages district, e.g., Suría, Sallent, Balsareny, Manresa and Cardona. CONCLUSION Mining/industrial pollution and the risk entailed in certain occupational exposures could in part be dictating the pattern of municipal bladder cancer mortality in Spain. Population exposure to arsenic is a matter that calls for attention. It would be of great interest if the relationship between the chemical quality of drinking water and the frequency of bladder cancer could be studied.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition,production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.