927 results for Mining extraction model
Abstract:
This article examines whether UK portfolio returns are time varying so that expected returns follow an AR(1) process, as proposed by Conrad and Kaul for the USA. It explores this hypothesis for four portfolios formed on the basis of market capitalization. The portfolio returns are modelled using a Kalman filter signal extraction model in which the unobservable expected return is the state variable and is allowed to evolve as a stationary first-order autoregressive process. It finds that this model is a good representation of returns and can account for most of the autocorrelation present in observed portfolio returns. This study concludes that UK portfolio returns are time varying and that the nature of the time variation appears to introduce a substantial amount of autocorrelation to portfolio returns. Like Conrad and Kaul, it finds a link between the extent to which portfolio returns are time varying and the size of firms within a portfolio, but not the monotonic one found for the USA. © 2004 Taylor and Francis Ltd.
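The signal extraction idea in this abstract can be illustrated with a minimal scalar Kalman filter. This is only a sketch under assumed notation, not the article's specification: the function name `kalman_ar1`, its parameters, and the choice to treat the model parameters as known (rather than estimated by maximum likelihood) are all assumptions made for illustration.

```python
def kalman_ar1(returns, phi, mean, var_state, var_obs):
    """Filter an unobserved AR(1) expected return from observed returns.

    Assumed model (illustrative only):
      r_t  = mu_t + e_t,                          e_t ~ N(0, var_obs)
      mu_t = mean + phi * (mu_{t-1} - mean) + u_t, u_t ~ N(0, var_state)
    Requires |phi| < 1 so that the state is stationary.
    """
    # Initialise at the stationary mean and variance of the AR(1) state.
    est = mean
    var = var_state / (1.0 - phi ** 2)
    filtered = []
    for r in returns:
        # Prediction step: propagate the state one period ahead.
        pred = mean + phi * (est - mean)
        pred_var = phi ** 2 * var + var_state
        # Update step: weight the observation by the Kalman gain.
        gain = pred_var / (pred_var + var_obs)
        est = pred + gain * (r - pred)
        var = (1.0 - gain) * pred_var
        filtered.append(est)
    return filtered
```

Because the gain always lies between 0 and 1, each filtered value falls between the one-step prediction and the observed return; noisier observations (larger `var_obs`) pull the estimate closer to the prediction.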
Abstract:
In clinical documents, medical terms are often expressed in multi-word phrases. Traditional topic modelling approaches relying on the “bag-of-words” assumption are not effective in extracting topic themes from clinical documents. This paper proposes to first extract medical phrases using an off-the-shelf tool for medical concept mention extraction, and then train a topic model which takes a hierarchy of Pitman-Yor processes as prior for modelling the generation of phrases of arbitrary length. Experimental results on patients’ discharge summaries show that the proposed approach outperforms the state-of-the-art topical phrase extraction model on both perplexity and topic coherence measure and finds more interpretable topics.
Abstract:
This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature in order to address the emerging challenges of deep natural language understanding. It is built upon existing work on relation extraction using the Hidden Vector State (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way whilst at the same time being able to capture the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model into which non-local features derived from the PPI ontology through inference can be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.
Abstract:
Measurement of hepatic oxygen extraction was performed on six healthy Greyhound dogs over a two hour period. The Greyhounds were anaesthetised and a right subcostal surgical incision performed. Ultrasonic flow transducers were used to measure flow rate in the hepatic artery and the portal vein. The blood oxygen tensions in arterial blood and in the portal and hepatic veins were also measured. Hepatic oxygen extraction remained stable throughout the study, despite a steady decline in arterial blood pressure. The methodology described in this study provides a direct measure of oxygen uptake by the liver in the dog and could readily be used to investigate hepatic uptake of drugs. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
With the liberalization of the electricity market, distribution and retail companies are looking for better market strategies based on adequate information on the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier. A fair insight into customers' behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper Data Mining (DM) techniques are applied to electricity consumption data from a utility client's database. To form the different customer classes and find a set of representative consumption patterns, we have used the Two-Step algorithm, which is a hierarchical clustering algorithm. Each consumer class will be represented by its load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model will be constructed with the C5.0 classification algorithm.
Abstract:
PURPOSE: Fatty liver disease (FLD) is an increasingly prevalent disease that can be reversed if detected early. Ultrasound is the safest and most ubiquitous method for identifying FLD. Since expert sonographers are required to accurately interpret the liver ultrasound images, a lack of such expertise will result in interobserver variability. For more objective interpretation, high accuracy, and quick second opinions, computer aided diagnostic (CAD) techniques may be exploited. The purpose of this work is to develop one such CAD technique for accurate classification of normal livers and abnormal livers affected by FLD. METHODS: In this paper, the authors present a CAD technique (called Symtosis) that uses a novel combination of significant features based on the texture, wavelet transform, and higher order spectra of the liver ultrasound images in various supervised learning-based classifiers in order to determine parameters that classify normal and FLD-affected abnormal livers. RESULTS: On evaluating the proposed technique on a database of 58 abnormal and 42 normal liver ultrasound images, the authors were able to achieve a high classification accuracy of 93.3% using the decision tree classifier. CONCLUSIONS: This high accuracy, together with the completely automated classification procedure, makes the authors' proposed technique highly suitable for clinical deployment and usage.
Abstract:
Dissertation to obtain a Master Degree in Biotechnology
Abstract:
Hospitals are nowadays collecting vast amounts of data related to patient records. All these data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, related to inpatient hospitalization were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
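For readers unfamiliar with the metric quoted above, the following sketch shows how a coefficient of determination is commonly computed and why an average-prediction baseline scores zero on it. The function name `r_squared` and the sample values are hypothetical and are not taken from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical lengths of stay in days (illustrative values only).
stays = [2.0, 4.0, 6.0, 8.0]
baseline = [5.0] * len(stays)  # "average prediction": always predict the mean

print(r_squared(stays, baseline))  # mean-only baseline explains no variance
print(r_squared(stays, stays))     # perfect predictions
```

Under this definition, a value of 0.81 would mean the model's squared error is 19% of the squared error of the mean-only baseline on the evaluated data.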
Abstract:
Integrated master's dissertation in Engineering and Management of Information Systems
Abstract:
BACKGROUND Spain shows the highest bladder cancer incidence rates in men among European countries. The most important risk factors are tobacco smoking and occupational exposure to a range of different chemical substances, such as aromatic amines. METHODS This paper describes the municipal distribution of bladder cancer mortality and attempts to "adjust" this spatial pattern for the prevalence of smokers, using the autoregressive spatial model proposed by Besag, York and Mollié, with relative risk of lung cancer mortality as a surrogate. RESULTS It has been possible to compile and ascertain the posterior distribution of relative risk for bladder cancer adjusted for lung cancer mortality, on the basis of a single Bayesian spatial model covering all of Spain's 8077 towns. Maps were plotted depicting smoothed relative risk (RR) estimates, and the distribution of the posterior probability of RR>1 by sex. Towns that registered the highest relative risks for both sexes were mostly located in the Provinces of Cadiz, Seville, Huelva, Barcelona and Almería. The highest-risk area in Barcelona Province corresponded to very specific municipal areas in the Bages district, e.g., Suría, Sallent, Balsareny, Manresa and Cardona. CONCLUSION Mining/industrial pollution and the risk entailed in certain occupational exposures could in part be dictating the pattern of municipal bladder cancer mortality in Spain. Population exposure to arsenic is a matter that calls for attention. It would be of great interest if the relationship between the chemical quality of drinking water and the frequency of bladder cancer could be studied.
Abstract:
A quantitative model of water movement within the immediate vicinity of an individual root is developed and results of an experiment to validate the model are presented. The model is based on the assumption that the amount of water transpired by a plant in a certain period is replaced by an equal volume entering its root system during the same time. It uses the Darcy-Buckingham equation to calculate the soil water matric potential at any distance from a plant root as a function of parameters related to crop, soil and atmospheric conditions. The model output is compared against measurements of soil water depletion by rice roots monitored using γ-beam attenuation in a greenhouse of the Escola Superior de Agricultura "Luiz de Queiroz"/Universidade de São Paulo (ESALQ/USP) in Piracicaba, State of São Paulo, Brazil, in 1993. The experimental results are in agreement with the output from the model. Model simulations show that a single plant root is able to withdraw water from more than 0.1 m away within a few days. We can therefore assume that root distribution is a less important factor for soil water extraction efficiency.
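As a rough illustration of how a Darcy-Buckingham relation can yield matric potential as a function of distance from a root, consider the simplified case below. The abstract does not give the authors' derivation, so everything here is an assumption: steady radial flow toward a cylindrical root, gravity neglected, a constant hydraulic conductivity K, and the symbols q (radial flux density), h (matric potential), r (distance from the root axis), r_0 (root radius) and Q (uptake rate per unit root length, set equal to the transpiration share of that root).

```latex
% Darcy-Buckingham flux for radial flow (gravity neglected):
\[
  q(r) = -K(h)\,\frac{\mathrm{d}h}{\mathrm{d}r}
\]
% Steady uptake Q per unit root length through a cylinder of radius r
% (flow is directed toward the root, hence the negative sign):
\[
  q(r) = -\frac{Q}{2\pi r}
\]
% Combining the two and assuming constant K, integrate from r_0 to r:
\[
  \frac{\mathrm{d}h}{\mathrm{d}r} = \frac{Q}{2\pi r K}
  \quad\Longrightarrow\quad
  h(r) = h(r_0) + \frac{Q}{2\pi K}\,\ln\frac{r}{r_0}
\]
```

In this simplified picture the matric potential is lowest at the root surface and rises logarithmically with distance. In practice K depends strongly on h, so a realistic model would integrate the second relation numerically.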
Abstract:
Choosing among industrial development options and allocating research funds accordingly are becoming more and more difficult because of increasing R&D costs and pressure for shorter development periods. Forecasting research progress is based on analysis of publication activity in the field of interest as well as on the dynamics of its change. Moreover, allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters are becoming more and more difficult to identify, and their evolution hard to follow. The existing approaches to structuring a research field and identifying its development are very limited. They do not identify the thematic clusters with adequate precision, while the identified trends are often ambiguous. Therefore, there is a clear need to develop methods and tools that are able to identify developing fields of research. The main objective of this Thesis is to develop tools and methods that help identify promising research topics in the field of separation processes. Two structuring methods as well as three approaches for identifying development trends have been proposed. The proposed methods have been applied to the analysis of research on distillation and filtration. The results show that the developed methods are universal and could be used to study various fields of research. The identified thematic clusters and the forecasted trends of their development have been confirmed in almost all tested cases, which supports the universality of the proposed methods. The results allow for identification of fast-growing scientific fields as well as topics characterized by stagnant or diminishing research activity.
Abstract:
The objective of this research is to observe the state of customer value management in Outotec Oyj, determine the key development areas and develop a phase model with which to guide the development of a customer value based sales tool. The study was conducted with a constructive research approach, with the focus on identifying a problem and developing a solution for it. As a basis for the study, the current literature on customer value assessment, solution selling and customer value selling was studied. The data were collected by conducting 16 interviews in two rounds within the company and were analyzed using open coding. First, seven important development areas were identified, of which the most critical were “Customer value mindset inside the company” and “Coordination of customer value management activities”. Using these seven areas, three functionality requirements (“Preparation”, “Outotec’s value creation and communication” and “Documentation”) and three development requirements for a customer value sales tool were identified. The study concluded with the formulation of a phase model for building a customer value based sales tool. The model included five steps that were defined as 1) Enable customer value utilization, 2) Connect with the customer, 3) Create customer value, 4) Define tool to facilitate value selling and 5) Develop sales tool. Further practical activities were also recommended as a guide for executing the phase model.