956 results for Models for count data
Abstract:
Master's dissertation in Statistics
Abstract:
Football is nowadays considered one of the most popular sports. In the betting world it has acquired an outstanding position, moving millions of euros during the period of a single football match. The lack of profitability among football betting users has been identified as a problem, and it gave origin to this research proposal, which analyses whether there is a way to help users increase the profits on their bets. Data mining models were induced with the purpose of supporting gamblers in increasing their profits in the medium/long term. Being conscious that the models can fail, the results achieved by four of the seven targets are encouraging and suggest that the system can help to increase profits. All defined targets have two possible classes to predict, for example, whether there are more or fewer than 7.5 corners in a single game. The data mining models of the targets more or fewer than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which is a pervasive decision support system. This system was developed to serve as an interface for any user, from an expert to a user with no knowledge of football games.
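As a minimal sketch of how the binary over/under targets described in this abstract can be derived from raw match counts (the 7.5-corner line comes from the abstract; the sample counts below are invented for illustration):

```python
def over_under_label(count, threshold=7.5):
    """Binary class used by the betting targets: above or below the line."""
    return "over" if count > threshold else "under"

# Hypothetical corner counts for five matches (illustrative only).
corners = [3, 8, 11, 7, 12]
labels = [over_under_label(c) for c in corners]
print(labels)  # → ['under', 'over', 'over', 'under', 'over']
```

Half-point lines such as 7.5 guarantee that every match falls strictly on one side, so the two classes are exhaustive.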
Abstract:
Healthcare organizations often benefit from information technologies and embedded decision support systems, which improve the quality of services and help prevent complications and adverse events. In Centro Materno Infantil do Norte (CMIN), the maternal and perinatal care unit of Centro Hospitalar of Oporto (CHP), an intelligent pre-triage system is implemented, aiming to prioritize patients in need of gynaecology and obstetrics care into two classes: urgent and consultation. The system is designed to avoid emergency problems such as incorrect triage outcomes and extensive triage waiting times. The current study intends to improve the triage system and thereby optimize the patient workflow through the emergency room by predicting the triage waiting time, comprised between the patient's triage and their medical admission. For this purpose, data mining (DM) techniques were induced on selected information provided by the information technologies implemented in CMIN. The DM models achieved accuracy values of approximately 94% with a five-range target distribution, which not only yields confident prediction models but also identifies the variables that stand as direct inducers of the triage waiting time.
Abstract:
The need to reduce human error has been growing in every field of study, and medicine is one of them. Through the implementation of technologies it is possible to support the clinical decision-making process and thereby reduce the difficulties that are typically faced. This study focuses on easing some of those difficulties by presenting real-time data mining models capable of predicting whether a monitored patient, typically admitted to intensive care, will need to take vasopressors. Data mining models were induced using clinical variables such as vital signs and laboratory analyses, among others. The best model presented a sensitivity of 94.94%. With this model it is possible to reduce the misuse of vasopressors, acting as prevention. At the same time, better care is offered to patients by anticipating their treatment with vasopressors.
Abstract:
An unsuitable patient flow, as well as prolonged waiting lists in the emergency room of a maternity unit regarding gynecology and obstetrics care, can affect the mother's and child's health, leading to adverse events and consequences for their safety and satisfaction. Predicting the patients' waiting time in the emergency room is a means to avoid this problem. This study aims to predict the pre-triage waiting time in the emergency care of gynecology and obstetrics of Centro Materno Infantil do Norte (CMIN), the maternal and perinatal care unit of Centro Hospitalar of Oporto, situated in the north of Portugal. Data mining techniques were induced using information collected from the information systems and technologies available in CMIN. The models developed presented good results, reaching accuracy and specificity values of approximately 74% and 94%, respectively. Additionally, the number of patients and triage professionals working in the emergency room, as well as some temporal variables, were identified as direct enhancers of the pre-triage waiting time. The implementation of the attained knowledge in the decision support system and business intelligence platform deployed in CMIN leads to the optimization of the patient flow through the emergency room and to improved quality of services.
Abstract:
A patient's blood pressure is an important vital sign for physicians to make decisions and to better understand the patient's condition. In Intensive Care Units it is possible to monitor blood pressure because the patient is under continuous monitoring through bedside monitors and sensors. However, intensivists only have access to the vital sign values when they look at the monitor or consult the hourly collected values. Most important is the sequence of the values collected, i.e., a run of very high or very low values can signify a critical event and bring future complications to a patient, such as hypotension or hypertension. These complications can trigger a set of dangerous diseases and side effects. The main goal of this work is to predict the probability that a patient will have a blood pressure critical event in the next hours by combining a set of patient data collected in real time with data mining classification techniques. As output, the models indicate the probability (%) of a patient having a blood pressure critical event in the next hour. The achieved results are very promising, presenting a sensitivity of around 95%.
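A classifier that emits a probability, as this abstract describes, can be pictured with a logistic score over monitored features. The feature names and weights below are invented placeholders, not the study's model; a real system would learn the weights from historical patient data:

```python
import math

def critical_event_probability(features, weights, bias):
    """Logistic score: maps a weighted sum of vital-sign features to a probability in (0, 1)."""
    z = bias + sum(w * features[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights (illustrative only).
weights = {"mean_bp_last_hour": -0.05, "bp_trend": -1.2, "heart_rate": 0.01}
bias = 3.0
patient = {"mean_bp_last_hour": 62.0, "bp_trend": -1.5, "heart_rate": 95.0}
p = critical_event_probability(patient, weights, bias)
print(f"{100 * p:.1f}% chance of a blood pressure critical event in the next hour")
```

The negative weight on the blood pressure trend means a falling pressure raises the predicted risk, mirroring the abstract's point that the sequence of values, not a single reading, signals a coming event.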
Abstract:
Expectations are central to behaviour. Despite the existence of subjective expectations data, the standard approach is to ignore these, to posit a model of behaviour and to infer expectations from realisations. In the context of income models, we reveal the informational gain obtained from using both a canonical model and subjective expectations data. We propose a test for this informational gain and illustrate our approach with an application to the problem of measuring income risk.
Abstract:
Based on a behavioral equilibrium exchange rate model, this paper examines the determinants of the real effective exchange rate and evaluates the degree of misalignment of a group of currencies since 1980. Within a panel cointegration setting, we estimate the relationship between the exchange rate and a set of economic fundamentals, such as traded–nontraded productivity differentials and the stock of foreign assets. Having ascertained that the variables are integrated and cointegrated, the long-run equilibrium values of the fundamentals are estimated and used to derive equilibrium exchange rates and misalignments. Although there is statistical homogeneity, some structural differences were found to exist between advanced and emerging economies.
Abstract:
We present experimental and theoretical analyses of the data requirements for haplotype inference algorithms. Our experiments include a broad range of problem sizes under two standard models of tree distribution and were designed to yield statistically robust results despite the size of the sample space. Our results validate Gusfield's conjecture that a population size of n log n is required to give (with high probability) sufficient information to deduce the n haplotypes and their complete evolutionary history. These experimental results inspired us to complement them with theoretical bounds on the population size. We also analyze the population size required to deduce some fixed fraction of the evolutionary history of a set of n haplotypes and establish linear bounds on the required sample size; these linear bounds are likewise shown theoretically.
Abstract:
There is recent interest in the generalization of classical factor models in which the idiosyncratic factors are assumed to be orthogonal and there are identification restrictions on cross-sectional and time dimensions. In this study, we describe and implement a Bayesian approach to generalized factor models. A flexible framework is developed to determine the variations attributed to common and idiosyncratic factors. We also propose a unique methodology to select the (generalized) factor model that best fits a given set of data. Applying the proposed methodology to the simulated data and the foreign exchange rate data, we provide a comparative analysis between the classical and generalized factor models. We find that when there is a shift from classical to generalized, there are significant changes in the estimates of the structures of the covariance and correlation matrices while there are less dramatic changes in the estimates of the factor loadings and the variation attributed to common factors.
Abstract:
Among the largest resources of biological sequence data is the vast number of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting such regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
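The decoding step of an HMM like the one this abstract integrates can be sketched with a generic Viterbi decoder. The two-state toy model below (coding vs. non-coding, with invented transition and emission probabilities) only illustrates the mechanism; it is not the authors' mRNA model:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log-space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor state for s at this step.
            best_prev = max(states, key=lambda p: V[-2][p] + math.log(trans_p[p][s]))
            V[-1][s] = V[-2][best_prev] + math.log(trans_p[best_prev][s]) + math.log(emit_p[s][o])
            new_path[s] = path[best_prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Toy two-state model: "C" (coding) favours G/C, "N" (non-coding) favours A/T.
states = ("C", "N")
start_p = {"C": 0.5, "N": 0.5}
trans_p = {"C": {"C": 0.9, "N": 0.1}, "N": {"C": 0.1, "N": 0.9}}
emit_p = {"C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
          "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1}}
print(viterbi("GCGCATAT", states, start_p, trans_p, emit_p))
```

Working in log-space avoids numerical underflow on long sequences, which matters for transcript-length ESTs.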
Abstract:
A large fraction of genome variation between individuals consists of submicroscopic copy number variation of genomic DNA segments. We assessed the relative contribution of structural changes and gene dosage alterations to phenotypic outcomes with mouse models of Smith-Magenis and Potocki-Lupski syndromes. We phenotyped mice with 1n (Deletion/+), 2n (+/+), 3n (Duplication/+), and balanced 2n compound heterozygous (Deletion/Duplication) copies of the same region. Parallel to the observations made in humans, such variation in gene copy number was sufficient to generate phenotypic consequences: in a number of cases diametrically opposing phenotypes were associated with gain versus loss of gene content. Surprisingly, some neurobehavioral traits were not rescued by restoration of the normal gene copy number. Transcriptome profiling showed that a highly significant proportion of transcriptional changes maps to the engineered interval in the five assessed tissues. A statistically significant overrepresentation of the genes mapping to the entire length of the engineered chromosome was also found among the top-ranked differentially expressed genes in the mice containing rearranged chromosomes, regardless of the nature of the rearrangement, an observation robust across different cell lineages of the central nervous system. Our data indicate that a structural change at a given position of the human genome may affect not only locus and adjacent gene expression but also "genome regulation." Furthermore, a structural change can cause the same perturbation in particular pathways regardless of gene dosage. Thus, the presence of a genomic structural change, as well as gene dosage imbalance, contributes to the ultimate phenotype.
Abstract:
This study aimed at identifying clinical factors for predicting hematologic toxicity after radioimmunotherapy with ⁹⁰Y-ibritumomab tiuxetan or ¹³¹I-tositumomab in clinical practice. Hematologic data were available from 14 non-Hodgkin lymphoma patients treated with ⁹⁰Y-ibritumomab tiuxetan and 18 who received ¹³¹I-tositumomab. The percentage baseline at nadir and 4 wk post nadir and the time to nadir were selected as the toxicity indicators for both platelets and neutrophils. Multiple linear regression analysis was performed to identify significant predictors (P < 0.05) of each indicator. For both platelets and neutrophils, pooled and separate analyses of ⁹⁰Y-ibritumomab tiuxetan and ¹³¹I-tositumomab data yielded the time elapsed since the last chemotherapy as the only significant predictor of the percentage baseline at nadir. The extent of bone marrow involvement was not a significant factor in this study, possibly because of the short time elapsed since the last chemotherapy of the 7 patients with bone marrow involvement. Because both treatments were designed to deliver a comparable bone marrow dose, this factor also was not significant. None of the 14 factors considered was predictive of the time to nadir. The R² value for the model predicting percentage baseline at nadir was 0.60 for platelets and 0.40 for neutrophils. This model predicted the platelet and neutrophil toxicity grade to within ±1 for 28 and 30 of the 32 patients, respectively. For the 7 patients predicted with grade I thrombocytopenia, 6 of whom had actual grade I-II, dosing might be increased to improve treatment efficacy. The elapsed time since the last chemotherapy can be used to predict hematologic toxicity and customize the current dosing method in radioimmunotherapy.
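The kind of single-predictor linear model this abstract reports (predictor: time since last chemotherapy; outcome: percentage baseline at nadir) can be sketched with a closed-form least-squares fit. The data points below are invented placeholders, not the study's patients:

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: y ≈ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

def r_squared(x, y, a, b):
    """Fraction of variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: months since last chemotherapy vs. platelet % baseline at nadir.
months = [2, 6, 12, 18, 24, 36]
pct_baseline = [20, 35, 50, 60, 70, 85]
a, b = ols_fit(months, pct_baseline)
print(f"intercept={a:.1f}, slope={b:.2f}, R²={r_squared(months, pct_baseline, a, b):.2f}")
```

A positive slope matches the abstract's finding: the longer the marrow has had to recover since chemotherapy, the milder the predicted toxicity.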
Abstract:
This paper does two things. First, it presents alternative approaches to the standard methods of estimating productive efficiency using a production function. It favours a parametric approach (viz. the stochastic production frontier approach) over a nonparametric approach (e.g. data envelopment analysis); and, further, one that provides a statistical explanation of efficiency, as well as an estimate of its magnitude. Second, it illustrates the favoured approach (i.e. the ‘single stage procedure’) with estimates of two models of explained inefficiency, using data from the Thai manufacturing sector, after the crisis of 1997. Technical efficiency is modelled as being dependent on capital investment in three major areas (viz. land, machinery and office appliances) where land is intended to proxy the effects of unproductive, speculative capital investment; and both machinery and office appliances are intended to proxy the effects of productive, non-speculative capital investment. The estimates from these models cast new light on the five-year long, post-1997 crisis period in Thailand, suggesting a structural shift from relatively labour intensive to relatively capital intensive production in manufactures from 1998 to 2002.
Abstract:
Background: Alzheimer's disease (AD) is the leading form of dementia worldwide. The Aβ peptide is believed to be the major pathogenic compound of the disease. For several years it has been hypothesized that Aβ impacts the Wnt signaling cascade, and activation of this signaling pathway has therefore been proposed to rescue the neurotoxic effect of Aβ. Findings: Expression of human Aβ42 in the Drosophila nervous system leads to a drastically shortened life span. We found that the action of Aβ42 specifically in the glutamatergic motoneurons is responsible for the reduced survival. However, we find that the morphology of the glutamatergic larval neuromuscular junctions, which are widely used as a model for mammalian central nervous system synapses, is not affected by Aβ42 expression. We furthermore demonstrate that genetic activation of the Wnt signal transduction pathway in the nervous system is not able to rescue the shortened life span or a rough-eye phenotype in Drosophila. Conclusions: Our data confirm that life span is a useful readout of Aβ42-induced neurotoxicity in Drosophila; the neuromuscular junction, however, does not seem to be an appropriate model to study AD in flies. Additionally, our results challenge the hypothesis that Wnt signaling is implicated in Aβ42 toxicity and might serve as a drug target against AD.