958 results for multivariate binary data
Abstract:
Thesis (Master's)--University of Washington, 2016-06
Abstract:
The current anode quality control strategy is inadequate for detecting defective anodes before they are set in the electrolysis cells. Previous work focused on modeling the anode manufacturing process in order to predict anode properties directly after baking using multivariate statistical methods. The anode coring strategy used at the partner plant means that this model can only be used to predict the properties of anodes baked at the hottest and coldest positions of the baking furnace. The present work proposes a strategy for taking into account the thermal history of anodes baked at any position, making it possible to predict their properties. It is shown that by combining binary variables defining the pit and the baking position with the routine data measured on the baking furnace, the temperature profiles of anodes baked at different positions can be predicted. These data were also included in the model for predicting anode properties. The prediction results were validated through additional coring, and the model's performance is conclusive for apparent and real density, compressive strength, air reactivity and Lc, regardless of the baking position.
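The abstract does not name the specific multivariate method used, so the sketch below only illustrates the general idea: one-hot (binary) pit/position indicators combined with routine furnace variables in a latent-variable regression (PLS is assumed here for illustration). All column names and data are hypothetical.

```python
# Minimal sketch (not the thesis's actual model): binary pit/position indicators
# combined with routine furnace measurements in a PLS regression.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "pit": rng.integers(1, 9, n).astype(str),        # baking pit (categorical, hypothetical)
    "position": rng.integers(1, 7, n).astype(str),   # position within the pit
    "max_flue_temp": rng.normal(1200, 30, n),         # routine furnace data (invented)
    "soak_time_h": rng.normal(48, 4, n),
})
y = rng.normal(1.55, 0.02, n)                          # e.g. baked apparent density (invented)

pre = ColumnTransformer(
    [("bin", OneHotEncoder(handle_unknown="ignore"), ["pit", "position"]),  # binary variables
     ("num", StandardScaler(), ["max_flue_temp", "soak_time_h"])],
    sparse_threshold=0.0,  # keep a dense matrix for PLSRegression
)
model = Pipeline([("pre", pre), ("pls", PLSRegression(n_components=3))])
model.fit(X, y)
print("R2 on training data:", model.score(X, y))
```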
Abstract:
BACKGROUND: Over the past decade, physician-rating websites have been gaining attention in scientific literature and in the media. However, little knowledge is available about the awareness and the impact of using such sites on health care professionals. It also remains unclear what key predictors are associated with the knowledge and the use of physician-rating websites. OBJECTIVE: To estimate the current level of awareness and use of physician-rating websites in Germany and to determine their impact on the choice of physician and the key predictors which are associated with the knowledge and the use of physician-rating websites. METHODS: This study was designed as a cross-sectional survey. An online panel was consulted in January 2013. A questionnaire was developed containing 28 questions; a pretest was carried out to assess the comprehension of the questionnaire. Several sociodemographic (eg, age, gender, health insurance status, Internet use) and 2 health-related independent variables (ie, health status and health care utilization) were included. Data were analyzed using descriptive statistics, chi-square tests, and t tests. Binary multivariate logistic regression models were fitted to characterize physician-rating website users. Results from the logistic regression are presented for both the observed and weighted sample. RESULTS: In total, 1505 respondents (mean age 43.73 years, SD 14.39; 857/1505, 57.25% female) completed our survey. Of all respondents, 32.09% (483/1505) had heard of physician-rating websites and 25.32% (381/1505) had already used such a website when searching for a physician. Furthermore, 11.03% (166/1505) had already posted a rating on a physician-rating website. Of these users, 65.35% (249/381) had consulted a particular physician based on the ratings shown on the websites; in contrast, 52.23% (199/381) had not consulted a particular physician because of the publicly reported ratings. Significantly higher likelihoods for being aware of the websites could be demonstrated for female participants (P<.001), those who were widowed (P=.01), covered by statutory health insurance (P=.02), and with higher health care utilization (P<.001). Health care utilization was significantly associated with all dependent variables in our multivariate logistic regression models (P<.001). Furthermore, significantly higher scores could be shown for health insurance status in the unweighted and Internet use in the weighted models. CONCLUSIONS: Neither health policy makers nor physicians should underestimate the influence of physician-rating websites. They already play an important role in providing information to help patients decide on an appropriate physician. Assuming there will be a rising level of public awareness, the influence of their use will increase well into the future. Future studies should assess the impact of physician-rating websites under experimental conditions and investigate whether physician-rating websites have the potential to reflect the quality of care offered by health care providers.
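As a minimal, hypothetical sketch of the modelling strategy described (a binary logistic regression fitted on the observed sample and on a weighted sample), the snippet below uses statsmodels with invented variable names and simulated data; it is not the study's actual model or data, and incorporating the survey weights as variance weights is an assumption of this sketch.

```python
# Illustrative only: binary logistic regression of "aware of physician-rating websites"
# on hypothetical predictors, unweighted and with survey weights.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1505
df = pd.DataFrame({
    "aware": rng.integers(0, 2, n),             # dependent variable (yes/no), simulated
    "female": rng.integers(0, 2, n),
    "age": rng.normal(44, 14, n),
    "statutory_insurance": rng.integers(0, 2, n),
    "visits_last_year": rng.poisson(4, n),      # stand-in for health care utilization
    "weight": rng.uniform(0.5, 2.0, n),         # invented survey calibration weight
})

X = sm.add_constant(df[["female", "age", "statutory_insurance", "visits_last_year"]])

unweighted = sm.GLM(df["aware"], X, family=sm.families.Binomial()).fit()
weighted = sm.GLM(df["aware"], X, family=sm.families.Binomial(),
                  var_weights=df["weight"]).fit()

# Odds ratios for the observed and weighted models.
print(np.exp(unweighted.params))
print(np.exp(weighted.params))
```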
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
This dissertation contains four essays that all share a common purpose: developing new methodologies to exploit the potential of high-frequency data for the measurement, modeling and forecasting of the volatility and correlations of financial assets. The first two chapters provide useful tools for univariate applications while the last two chapters develop multivariate methodologies. In chapter 1, we introduce a new class of univariate volatility models named FloGARCH models. FloGARCH models provide a parsimonious joint model for low-frequency returns and realized measures, and are sufficiently flexible to capture long memory as well as asymmetries related to leverage effects. We analyze the performance of the models in a realistic numerical study and on the basis of a data set composed of 65 equities. Using more than 10 years of high-frequency transactions, we document significant statistical gains related to the FloGARCH models in terms of in-sample fit, out-of-sample fit and forecasting accuracy compared to classical and Realized GARCH models. In chapter 2, using 12 years of high-frequency transactions for 55 U.S. stocks, we argue that combining low-frequency exogenous economic indicators with high-frequency financial data improves the ability of conditionally heteroskedastic models to forecast the volatility of returns, their full multi-step-ahead conditional distribution and the multi-period Value-at-Risk. Using a refined version of the Realized LGARCH model allowing for a time-varying intercept and implemented with realized kernels, we document that nominal corporate profits and term spreads have strong long-run predictive ability and generate accurate risk-measure forecasts over long horizons. The results are based on several loss functions and tests, including the Model Confidence Set. Chapter 3 is a joint work with David Veredas. We study the class of disentangled realized estimators for the integrated covariance matrix of Brownian semimartingales with finite activity jumps. These estimators separate correlations and volatilities. We analyze different combinations of quantile- and median-based realized volatilities, and four estimators of realized correlations with three synchronization schemes. Their finite sample properties are studied under four data generating processes, in the presence or absence of microstructure noise, and under synchronous and asynchronous trading. The main finding is that the pre-averaged version of disentangled estimators based on Gaussian ranks (for the correlations) and median deviations (for the volatilities) provides a precise, computationally efficient, and easy alternative for measuring integrated covariances on the basis of noisy and asynchronous prices. Along these lines, a minimum variance portfolio application shows the superiority of this disentangled realized estimator in terms of numerous performance metrics. Chapter 4 is co-authored with Niels S. Hansen, Asger Lunde and Kasper V. Olesen, all affiliated with CREATES at Aarhus University. We propose to use the Realized Beta GARCH model to exploit the potential of high-frequency data in commodity markets. The model produces high-quality forecasts of pairwise correlations between commodities, which can be used to construct a composite covariance matrix. We evaluate the quality of this matrix in a portfolio context and compare it to models used in the industry. We demonstrate significant economic gains in a realistic setting including short-selling constraints and transaction costs.
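The FloGARCH and Realized (L)GARCH specifications are the chapters' own models and are not reproduced here; as a hedged point of reference only, the snippet below fits the plain daily GARCH(1,1) baseline such models are typically benchmarked against, using the arch package on simulated returns.

```python
# Not the FloGARCH / Realized GARCH models from the abstract -- just a plain daily
# GARCH(1,1) baseline, fitted to simulated percentage returns.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(2)
returns = pd.Series(rng.standard_t(df=6, size=2500) * 0.8)  # simulated daily % returns

am = arch_model(returns, vol="GARCH", p=1, q=1)
res = am.fit(disp="off")
print(res.params)

# Multi-step-ahead conditional variance forecasts (5 trading days).
fc = res.forecast(horizon=5)
print(fc.variance.iloc[-1])
```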
Abstract:
Increasing the size of the training data has proven to be very effective in many computer vision tasks. Using large-scale image datasets (e.g., ImageNet) with simple learning techniques (e.g., linear classifiers), one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large-scale image sets is demanding in itself. They occupy a significant amount of memory, which makes it impossible to process the images with complex algorithms on single-CPU machines. Finding an efficient image representation is key to attacking this problem. Efficiency alone, however, is not enough for image understanding: the representation should also be comprehensive and rich in semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large-scale image classification. We present techniques to learn the binary features from supervised image sets (with different types of semantic supervision: class labels, textual descriptions). We propose several problems that are central to finding and using efficient image representations.
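The thesis learns its binary codes from semantic supervision; the unsupervised random-projection sketch below is only meant to illustrate why compact binary codes make large-scale image search cheap. Dimensions and data are invented.

```python
# Minimal sketch of binary image codes via random hyperplane hashing (LSH-style),
# followed by a brute-force Hamming-distance search.
import numpy as np

rng = np.random.default_rng(3)
n_images, dim, n_bits = 10_000, 512, 64

features = rng.normal(size=(n_images, dim))      # stand-in for image descriptors
hyperplanes = rng.normal(size=(dim, n_bits))     # random projection directions

# One 64-bit binary code per image: the sign pattern of the projections.
codes = (features @ hyperplanes > 0).astype(np.uint8)

def hamming_search(query_code, codes, k=5):
    """Return indices of the k codes closest to the query in Hamming distance."""
    dists = (codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

query = (features[0] @ hyperplanes > 0).astype(np.uint8)
print(hamming_search(query, codes))
```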
Abstract:
Background Many acute stroke trials have given neutral results. Sub-optimal statistical analyses may be failing to detect efficacy. Methods which take account of the ordinal nature of functional outcome data are more efficient. We compare sample size calculations for dichotomous and ordinal outcomes for use in stroke trials. Methods Data from stroke trials studying the effects of interventions known to positively or negatively alter functional outcome – Rankin Scale and Barthel Index – were assessed. Sample size was calculated using comparisons of proportions, means, medians (according to Payne), and ordinal data (according to Whitehead). The sample sizes gained from each method were compared using Friedman two-way ANOVA. Results Fifty-five comparisons (54 173 patients) of active vs. control treatment were assessed. Estimated sample sizes differed significantly depending on the method of calculation (P<0.0001). The ordering of the methods showed that the ordinal method of Whitehead and comparison of means produced significantly lower sample sizes than the other methods. The ordinal data method on average reduced sample size by 28% (inter-quartile range 14–53%) compared with the comparison of proportions; however, a 22% increase in sample size was seen with the ordinal method for trials assessing thrombolysis. The comparison of medians method of Payne gave the largest sample sizes. Conclusions Choosing an ordinal rather than binary method of analysis allows most trials to be, on average, smaller by approximately 28% for a given statistical power. Smaller trial sample sizes may help by reducing time to completion, complexity, and financial expense. However, ordinal methods may not be optimal for interventions which both improve functional outcome
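As a hedged illustration of the two calculation routes compared above, the sketch below contrasts Whitehead's sample-size formula for ordered categorical outcomes with the standard two-proportion formula for a dichotomized outcome. The outcome distributions are invented, and summarizing the cumulative odds ratios by their mean is a simplification of the proportional-odds assumption; this is not a reproduction of the paper's procedure or data.

```python
# Ordinal (Whitehead) versus binary sample-size calculation, illustrative numbers only.
import numpy as np
from scipy.stats import norm

alpha, power = 0.05, 0.90
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

# Anticipated 3-level ordinal outcome (e.g. dead / dependent / independent),
# control and treated groups -- made-up proportions.
p_ctrl = np.array([0.20, 0.45, 0.35])
p_trt = np.array([0.15, 0.40, 0.45])

def cumulative_or(p_a, p_b, cut):
    """Odds ratio for the outcome being below the given cut-point."""
    a, b = p_a[:cut].sum(), p_b[:cut].sum()
    return (a / (1 - a)) / (b / (1 - b))

# Rough common odds ratio across cut-points (proportional-odds style summary).
log_or = np.log(np.mean([cumulative_or(p_ctrl, p_trt, c) for c in (1, 2)]))
p_bar = (p_ctrl + p_trt) / 2

# Whitehead's formula for the total sample size with 1:1 allocation.
n_ordinal = 6 * z**2 / (log_or**2 * (1 - np.sum(p_bar**3)))

# Binary comparison of proportions after dichotomizing (dead or dependent vs independent).
p1, p2 = p_ctrl[:2].sum(), p_trt[:2].sum()
n_binary = 2 * z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

print(f"ordinal (Whitehead) total N ~ {n_ordinal:.0f}, binary total N ~ {n_binary:.0f}")
```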
Abstract:
Background and Purpose—Vascular prevention trials mostly count “yes/no” (binary) outcome events, eg, stroke/no stroke. Analysis of ordered categorical vascular events (eg, fatal stroke/nonfatal stroke/no stroke) is clinically relevant and could be more powerful statistically. Although this is not a novel idea in the statistical community, ordinal outcomes have not been applied to stroke prevention trials in the past. Methods—Summary data on stroke, myocardial infarction, combined vascular events, and bleeding were obtained by treatment group from published vascular prevention trials. Data were analyzed using 10 statistical approaches which allow comparison of 2 ordinal or binary treatment groups. The results for each statistical test for each trial were then compared using Friedman 2-way analysis of variance with multiple comparison procedures. Results—Across 85 trials (335 305 subjects) the test results differed substantially so that approaches which used the ordinal nature of stroke events (fatal/nonfatal/no stroke) were more efficient than those which combined the data to form 2 groups (P<0.0001). The most efficient tests were bootstrapping the difference in mean rank, Mann–Whitney U test, and ordinal logistic regression; 4- and 5-level data were more efficient still. Similar findings were obtained for myocardial infarction, combined vascular outcomes, and bleeding. The findings were consistent across different types, designs and sizes of trial, and for the different types of intervention. Conclusions—When analyzing vascular events from prevention trials, statistical tests which use ordered categorical data are more efficient and are more likely to yield reliable results than binary tests. This approach gives additional information on treatment effects by severity of event and will allow trials to be smaller. (Stroke. 2008;39:000-000.)
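A minimal sketch of the contrast discussed above, assuming invented event counts: the same two-arm comparison analysed with the Mann-Whitney U test on a 3-level ordinal outcome and with a chi-square test after collapsing the outcome to binary.

```python
# Ordinal versus dichotomized analysis of the same (invented) trial data.
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

# Ordinal coding: 0 = no stroke, 1 = nonfatal stroke, 2 = fatal stroke.
control = np.repeat([0, 1, 2], [900, 70, 30])
treated = np.repeat([0, 1, 2], [930, 52, 18])

u_stat, p_ordinal = mannwhitneyu(treated, control, alternative="two-sided")

# Dichotomized analysis: any stroke versus no stroke.
table = np.array([[(control > 0).sum(), (control == 0).sum()],
                  [(treated > 0).sum(), (treated == 0).sum()]])
chi2, p_binary, _, _ = chi2_contingency(table)

print(f"ordinal Mann-Whitney P = {p_ordinal:.4f}, binary chi-square P = {p_binary:.4f}")
```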
Abstract:
The general purpose of this work is to describe and analyse the financing phenomenon of crowdfunding and to investigate the relations among crowdfunders, project creators and crowdfunding websites. More specifically, it also intends to describe the profile differences between major crowdfunding platforms, such as Kickstarter and Indiegogo. The findings are supported by literature gathered from different scientific research papers. In the empirical part, data about Kickstarter and Indiegogo were collected from their websites and complemented with further data from other statistical websites. To obtain specific information, such as the satisfaction of entrepreneurs on both platforms, a satisfaction survey was administered to 200 entrepreneurs from different countries. To identify the profile of users of the Kickstarter and Indiegogo platforms, a multivariate analysis was performed, using a hierarchical cluster analysis for each platform under study. Descriptive analysis was used to explore the popularity of the platforms, the average cost and most popular project categories, the profile of users, and the platforms' future opportunities. To assess differences between groups and associations between variables, and to answer the research hypotheses, inferential analysis was applied. The results showed that Kickstarter and Indiegogo are among the most popular crowdfunding platforms. Both have thousands of generally satisfied users, and each takes an individual approach to crowdfunders. Even so, both could benefit from further improving their services. Furthermore, according to the results, there is a direct and positive relationship, per platform, between the money needed for the projects and the money collected from investors.
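The following sketch illustrates the hierarchical cluster analysis step on a hypothetical user-profile matrix for one platform; the variables and values are invented and not taken from the study.

```python
# Hierarchical (Ward-linkage) clustering of an invented user-profile matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
profiles = np.column_stack([
    rng.lognormal(3.5, 0.8, 200),   # average pledge (currency units, hypothetical)
    rng.poisson(5, 200),            # number of projects backed
    rng.integers(1, 6, 200),        # satisfaction on a 1-5 scale
])

# Standardize, build the dendrogram, then cut it into three clusters.
z_scores = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0)
tree = linkage(z_scores, method="ward")
clusters = fcluster(tree, t=3, criterion="maxclust")
print(np.bincount(clusters))   # cluster sizes
```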
Abstract:
Case-control studies evaluating the factors associated with childhood obesity are scarce in Brazil. We aimed to analyze the factors associated with obesity in Brazilian schoolchildren enrolled in the School Health Program. A case-control study was conducted on 80 schoolchildren aged 7 to 9 years, 40 of them obese and 40 of normal weight according to the cut-off points established by the World Health Organization (2007). Weight, height and waist circumference were obtained. Socioeconomic, demographic, health, eating behavior and lifestyle data were collected by applying a questionnaire to the person responsible and by determining his/her nutritional status. A binary unconditional logistic regression model (univariate and multivariate) was used for data analysis. The prevalence of obesity was 7.21%. The final model showed that duration of breast-feeding ≥6 months of age (OR 5.3; 95% CI: 1.3-22.1), excess weight of the person responsible (OR 7.1; 95% CI: 1.2-40.2), a sedentary level of physical activity (OR 4.1; 95% CI: 1.1-15.5), and fast chewing (OR 7.4; 95% CI: 2.1-26.9) were significantly associated with childhood obesity. The factors associated with obesity in schoolchildren were duration of breast-feeding ≥6 months, persons responsible with excess weight, and sedentary children who chew fast. The present study contributes information to be used for the health actions planned by the School Health Program.
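A small sketch of the reported modelling pattern, binary logistic regression with odds ratios and 95% confidence intervals, using a hypothetical 40/40 case-control data frame rather than the study's data; all variable names are invented stand-ins.

```python
# Binary logistic regression with odds ratios and 95% CIs on invented case-control data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 80
df = pd.DataFrame({
    "obese": np.repeat([1, 0], [40, 40]),          # case-control indicator
    "breastfed_6m": rng.integers(0, 2, n),
    "caregiver_excess_weight": rng.integers(0, 2, n),
    "sedentary": rng.integers(0, 2, n),
    "fast_chewing": rng.integers(0, 2, n),
})

X = sm.add_constant(df.drop(columns="obese"))
fit = sm.Logit(df["obese"], X).fit(disp=0)

odds_ratios = pd.DataFrame({
    "OR": np.exp(fit.params),
    "CI 2.5%": np.exp(fit.conf_int()[0]),
    "CI 97.5%": np.exp(fit.conf_int()[1]),
})
print(odds_ratios.drop(index="const"))
```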
Abstract:
Animal welfare has received much attention, not only to meet the requirements of farmed animals, but also to address ethical and cultural public concerns. Daily collected information, as well as the systematic follow-up of production stages, produces important statistical data for production assessment and control, and for identifying improvement possibilities. In this scenario, this study analyzed behavioral, production, and environmental data using multivariate principal component analysis, correlating observed behaviors, recorded using video cameras and electronic identification, with performance parameters of female broiler breeders. The aim was to start building a system to support decision-making in broiler breeder housing, based on bird behavioral parameters. Birds were housed in an environmental chamber, with three pens with different controlled environments. Bird sensitivity to environmental conditions was indicated by their behaviors, stressing the importance of behavioral observations for modern poultry management. A strong association was found between performance parameters and behavior at the nest, suggesting that this behavior may be used to predict productivity. The behaviors of ruffling feathers, opening wings, preening, and at the drinker were negatively correlated with environmental temperature, suggesting that an increase in the frequency of these behaviors indicates an improvement in thermal welfare.
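For illustration only, assuming invented pen-level variables, the sketch below shows a principal component analysis of this kind and the behaviour-temperature correlations that such a study reports; it is not the study's data or exact procedure.

```python
# PCA on invented behaviour/performance/environment data, plus simple correlations.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 90  # hypothetical daily pen-level observations
data = pd.DataFrame({
    "temp_C": rng.normal(26, 3, n),
    "at_nest": rng.poisson(20, n),
    "preening": rng.poisson(8, n),
    "open_wings": rng.poisson(3, n),
    "eggs_per_hen": rng.normal(0.6, 0.1, n),
})

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(data))
print("first principal component scores (first 3 rows):", scores[:3, 0])

# Behaviour-environment correlations of the kind discussed (e.g. preening vs temperature).
print(data.corr().loc["temp_C", ["preening", "open_wings", "at_nest"]])
```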
Abstract:
Isobaric vapor-liquid equilibria of binary mixtures of isopropyl acetate plus an alkanol (1-propanol, 2-propanol, 1-butanol, or 2-butanol) were measured at 101.32 kPa, using a dynamic recirculating still. An azeotropic behavior was observed only in the mixtures of isopropyl acetate + 2-propanol and isopropyl acetate + 1-propanol. The application of four thermodynamic consistency tests (the Herington test, the Van Ness test, the infinite dilution test, and the pure component test) showed the high quality of the experimental data. Finally, both NRTL and UNIQUAC activity coefficient models were successfully applied in the correlation of the measured data, with the average absolute deviations in vapor phase composition and temperature of 0.01 and 0.16 K, respectively.
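For reference, the binary NRTL activity-coefficient equations used in correlations of this kind can be evaluated as below; the interaction parameters shown are placeholders, not the fitted values from this work.

```python
# Binary NRTL activity coefficients; tau12, tau21 and alpha are illustrative placeholders.
import numpy as np

def nrtl_binary(x1, tau12, tau21, alpha=0.3):
    """Activity coefficients (gamma1, gamma2) for a binary mixture from the NRTL model."""
    x2 = 1.0 - x1
    G12, G21 = np.exp(-alpha * tau12), np.exp(-alpha * tau21)
    ln_g1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                     + tau12 * G12 / (x2 + x1 * G12)**2)
    ln_g2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                     + tau21 * G21 / (x1 + x2 * G21)**2)
    return np.exp(ln_g1), np.exp(ln_g2)

x1 = np.linspace(0.01, 0.99, 5)
g1, g2 = nrtl_binary(x1, tau12=0.45, tau21=0.85)   # hypothetical parameters
print(np.column_stack([x1, g1, g2]))
```

At an azeotrope the liquid and vapor compositions coincide, so under modified Raoult's law the azeotropic point satisfies γ1·P1sat = γ2·P2sat.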
Abstract:
This paper applies two measures to assess spillovers across markets: the Diebold and Yilmaz (2012) Spillover Index and the Hafner and Herwartz (2006) analysis of multivariate GARCH models using volatility impulse response analysis. We use two sets of data: daily realized volatility estimates taken from the Oxford-Man RV library, running from the beginning of 2000 to October 2016, for the S&P500 and the FTSE, plus ten years of daily returns series for the New York Stock Exchange Index and the FTSE 100 index, from 3 January 2005 to 31 January 2015. Both data sets capture both the Global Financial Crisis (GFC) and the subsequent European Sovereign Debt Crisis (ESDC). The spillover index captures the transmission of volatility to and from markets, plus net spillovers. The key difference between the measures is that the spillover index captures an average of spillovers over a period, whilst volatility impulse responses (VIRF) have to be calibrated to conditional volatility estimated at a particular point in time. The VIRF provide information about the impact of independent shocks on volatility. In the latter analysis, we explore the impact of three different shocks. The first is the onset of the GFC, which we date as 9 August 2007 (GFC1). It took a year for the financial crisis to come to a head, but it did so on 15 September 2008 (GFC2). The third shock is 9 May 2010. Our modelling includes leverage and asymmetric effects in the context of a multivariate GARCH framework, analysed using both BEKK and diagonal BEKK (DBEKK) models. A key result is that the impact of negative shocks is larger, in terms of the effects on variances and covariances, but shorter in duration, in this case a difference between three and six months.
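As a rough, simplified sketch of a spillover index of this kind: the snippet below fits a small VAR to two simulated realized-volatility series and sums the off-diagonal shares of the forecast-error variance decomposition. Note that statsmodels' fevd() uses a Cholesky (orthogonalized) decomposition, whereas the Diebold and Yilmaz (2012) index uses the generalized decomposition, so this is an illustration rather than the paper's method.

```python
# Back-of-the-envelope spillover index from a VAR variance decomposition (simulated data).
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
n = 1000
shocks = rng.normal(size=(n, 2))
x = np.zeros((n, 2))
for t in range(1, n):   # two cross-correlated AR(1)-style volatility proxies
    x[t] = 0.6 * x[t - 1] + 0.2 * x[t - 1][::-1] + shocks[t]

data = pd.DataFrame(x, columns=["SP500_RV", "FTSE_RV"])
res = VAR(data).fit(maxlags=5, ic="aic")

# H-step-ahead variance decomposition; each variable's row sums to one.
H = 10
decomp = res.fevd(H).decomp[:, -1, :]   # shape: (variables, shock sources)
spillover_index = 100 * (decomp.sum() - np.trace(decomp)) / decomp.sum()
print(f"total spillover index at horizon {H}: {spillover_index:.1f}%")
```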
Abstract:
This work used multivariate and mathematical methods to integrate chemical and ecotoxicological data obtained for the Santos Estuarine System and for the region near the discharge zone of the Santos submarine outfall, in order to establish the environmental risks more accurately and thereby identify priority areas and guide control programs and public policies. For both data sets, violations of numerical sediment quality guidelines tended to be associated with the occurrence of toxicity. For the estuary, this trend was corroborated by the correlations between toxicity and the concentrations of PAHs and Cu, whereas for the outfall region it was corroborated by the correlation between toxicity and the mercury content of the sediment. Mean-normalized values were calculated for each sample, allowing the samples to be ranked according to toxicity and contamination. Cluster analyses confirmed the classification results. For the estuarine system data, the samples separated into three categories: stations SSV-2, SSV-3 and SSV-4 are at greater risk, followed by station SSV-6. Stations SSV-1 and SSV-5 showed better conditions. For the outfall region, samples 1 and 2 showed better conditions, whereas station 5 appeared to be at greater risk, followed by stations 3 and 4, which showed only some signs of alteration.
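A minimal sketch of the mean-normalization step described above, with invented station names and concentrations; it only illustrates dividing each variable by its mean across samples and averaging within each sample to obtain a ranking score.

```python
# Mean-normalized contamination index per station, with invented values.
import pandas as pd

samples = pd.DataFrame(
    {"PAHs": [120, 340, 410, 380, 90, 150],
     "Cu":   [18, 42, 55, 47, 12, 20],
     "Hg":   [0.05, 0.20, 0.25, 0.22, 0.04, 0.08]},
    index=["SSV-1", "SSV-2", "SSV-3", "SSV-4", "SSV-5", "SSV-6"],
)

normalized = samples / samples.mean()            # mean-normalized values per variable
contamination_index = normalized.mean(axis=1)    # one summary score per station
print(contamination_index.sort_values(ascending=False))
```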
Abstract:
The multivariate normal distribution is commonly encountered in many fields, and missing values are a frequent issue in practice. The purpose of this research was to estimate the parameters of a three-dimensional normal distribution with permutation-symmetric covariance, using complete data and all possible patterns of incomplete data. In this study, maximum likelihood estimators (MLEs) under missing data were derived, and the properties of the MLEs as well as their sampling distributions were obtained. A Monte Carlo simulation study was used to evaluate the performance of the considered estimators both when ρ was known and when it was unknown. All results indicated that, compared to estimators obtained by omitting observations with missing data, the estimators derived in this article performed better. Furthermore, when ρ was unknown, using the estimate of ρ led to the same conclusion.
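As a complete-data sketch only (the thesis additionally derives estimators for every pattern of missing data), the snippet below computes closed-form MLEs for a normal distribution with permutation-symmetric (equicorrelated) covariance σ²[(1−ρ)I + ρJ], using the structured matrix's two distinct eigenvalues; this derivation is an assumption of the sketch, not quoted from the thesis.

```python
# Complete-data MLEs for a compound-symmetric (permutation-symmetric) covariance.
import numpy as np

def equicorrelated_mle(X):
    """MLE of (mu, sigma^2, rho) under sigma^2[(1-rho)I + rho J], complete data only.
    Valid when the implied estimates satisfy the positive-definiteness constraints."""
    n, p = X.shape
    mu_hat = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # MLE-scaled sample covariance
    ones = np.ones(p)
    lam1 = ones @ S @ ones / p                    # estimate of sigma^2 * (1 + (p-1)rho)
    lam2 = (np.trace(S) - lam1) / (p - 1)         # estimate of sigma^2 * (1 - rho)
    sigma2_hat = np.trace(S) / p
    rho_hat = (lam1 - lam2) / (p * sigma2_hat)
    return mu_hat, sigma2_hat, rho_hat

# Quick check on simulated three-dimensional data with sigma^2 = 2 and rho = 0.5.
rng = np.random.default_rng(8)
true_cov = 2.0 * (0.5 * np.eye(3) + 0.5 * np.ones((3, 3)))
X = rng.multivariate_normal(mean=[1.0, 2.0, 3.0], cov=true_cov, size=5000)
print(equicorrelated_mle(X))
```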