905 results for partial least-squares regression
Abstract:
The photometric determination of ascorbic acid with the E. E. L. portable colorimeter can be carried out rapidly and conveniently using either 3% HPO3 or 0.4% (COOH)2 as protective agent. The standards should contain from 2 to 20 micrograms of ascorbic acid per ml of metaphosphoric or oxalic acid solution. Ten ml of these solutions are mixed with 3 ml of the appropriate citrate buffer solution, and 5 ml of the resulting mixture are pipetted into a matched test tube containing 5 ml of sodium 2,6-dichlorobenzenoneindophenol (80 mg per liter); the tube is shaken well, and after 15 seconds the extinction is read using a green filter. The readings are subtracted from the blank reading. Designating the differences by x and the concentrations of ascorbic acid per ml in the standards by y, we obtain, with the aid of the method of least squares, the following regression equations: Y = 0.543x + 0.629 for metaphosphoric acid and Y = 0.516x + 0.422 for oxalic acid, which permit, by interpolation, the determination of the ascorbic acid content of plant materials.
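The calibration step described above is a plain least-squares line fit followed by interpolation. A minimal sketch in Python with NumPy; the extinction differences below are synthetic, constructed so that the reported metaphosphoric acid line is recovered, and the variable names are illustrative rather than from the paper:

```python
import numpy as np

# Standards: 2 to 20 micrograms of ascorbic acid per ml, as in the abstract.
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

# Hypothetical extinction differences x (blank minus sample reading),
# synthesized so the reported line Y = 0.543x + 0.629 is recovered.
x = (y - 0.629) / 0.543

# Least-squares fit of the calibration line y = a*x + b.
a, b = np.polyfit(x, y, 1)

def ascorbic_acid(x_reading):
    """Interpolate a concentration (micrograms/ml) from an extinction difference."""
    return a * x_reading + b
```

Unknown samples are then read off the fitted line, exactly as the abstract's "by interpolation" indicates.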
Abstract:
Leaders must scan the internal and external environment, chart strategic and task objectives, and provide performance feedback. These instrumental leadership (IL) functions go beyond the motivational and quid pro quo leader behaviors that comprise the full-range leadership model (transformational, transactional, and laissez-faire leadership). In four studies we examined the construct validity of IL. We found evidence for a four-factor IL model that was highly prototypical of good leadership. IL predicted top-level leader emergence controlling for the full-range factors, initiating structure, and consideration. It also explained unique variance in outcomes beyond the full-range factors; the effects of transformational leadership were vastly overstated when IL was omitted from the model. We discuss the importance of a "fuller full-range" leadership theory for theory and practice. We also showcase our methodological contributions regarding corrections for common-method-variance (i.e., endogeneity) bias using two-stage least squares (2SLS) regression and Monte Carlo split-sample designs.
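Two-stage least squares, as invoked above for the endogeneity correction, replaces an endogenous regressor with its fitted values from a first-stage regression on an instrument. A minimal numeric sketch with synthetic data; every coefficient and the instrument here are invented for illustration, not taken from the studies:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data with an endogenous regressor: the confounder u feeds into
# both the outcome y and the regressor x, which biases ordinary least squares.
z = rng.normal(size=n)                  # instrument: drives x, not u
u = rng.normal(size=n)                  # unobserved confounder
x = 0.8 * z + 0.6 * u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)    # true causal coefficient is 2.0

def add_const(v):
    return np.column_stack([np.ones(len(v)), v])

# Stage 1: regress the endogenous x on the instrument z, keep fitted values.
x_hat = add_const(z) @ np.linalg.lstsq(add_const(z), x, rcond=None)[0]

# Stage 2: regress y on the fitted values; the slope is the 2SLS estimate.
beta_2sls = np.linalg.lstsq(add_const(x_hat), y, rcond=None)[0][1]

# Naive OLS for comparison (upward-biased because x correlates with u).
beta_ols = np.linalg.lstsq(add_const(x), y, rcond=None)[0][1]
```

On this synthetic design the OLS slope overshoots the true value of 2.0 while the 2SLS slope recovers it, which is the kind of bias correction the abstract refers to.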
Abstract:
Drainage-basin and channel-geometry multiple-regression equations are presented for estimating design-flood discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at stream sites on rural, unregulated streams in Iowa. Design-flood discharge estimates determined by Pearson Type-III analyses using data collected through the 1990 water year are reported for the 188 streamflow-gaging stations used in either the drainage-basin or channel-geometry regression analyses. Ordinary least-squares multiple-regression techniques were used to identify selected drainage-basin and channel-geometry regions. Weighted least-squares multiple-regression techniques, which account for differences in the variance of flows at different gaging stations and for variable lengths in station records, were used to estimate the regression parameters. Statewide drainage-basin equations were developed from analyses of 164 streamflow-gaging stations. Drainage-basin characteristics were quantified using a geographic-information-system (GIS) procedure to process topographic maps and digital cartographic data. The significant characteristics identified for the drainage-basin equations included contributing drainage area, relative relief, drainage frequency, and 2-year, 24-hour precipitation intensity. The average standard errors of prediction for the drainage-basin equations ranged from 38.6% to 50.2%. The GIS procedure expanded the capability to quantitatively relate drainage-basin characteristics to the magnitude and frequency of floods for stream sites in Iowa and provides a flood-estimation method that is independent of hydrologic regionalization. Statewide and regional channel-geometry regression equations were developed from analyses of 157 streamflow-gaging stations. Channel-geometry characteristics were measured on site and on topographic maps. 
Statewide and regional channel-geometry regression equations that are dependent on whether a stream has been channelized were developed on the basis of bankfull and active-channel characteristics. The significant channel-geometry characteristics identified for the statewide and regional regression equations included bankfull width and bankfull depth for natural channels unaffected by channelization, and active-channel width for stabilized channels affected by channelization. The average standard errors of prediction ranged from 41.0% to 68.4% for the statewide channel-geometry equations and from 30.3% to 70.0% for the regional channel-geometry equations. Procedures provided for applying the drainage-basin and channel-geometry regression equations depend on whether the design-flood discharge estimate is for a site on an ungaged stream, an ungaged site on a gaged stream, or a gaged site. When both a drainage-basin and a channel-geometry regression-equation estimate are available for a stream site, a procedure is presented for determining a weighted average of the two flood estimates.
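Weighted least squares of the kind described above, downweighting stations with noisier or shorter records, reduces to ordinary least squares after scaling each observation by the square root of its weight. A generic sketch; the data here are synthetic, not the Iowa gaging records:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - X_i @ beta)**2.

    Scaling each row of X and each y_i by sqrt(w_i) reduces the problem
    to ordinary least squares on the transformed data.
    """
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Small exact example: y is exactly linear in X, so any positive weights
# (here, say, proportional to record length) recover the same coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 2.0, 3.0])
beta_wls = weighted_least_squares(X, y, rng.uniform(0.5, 2.0, size=50))
```

In the report's setting the weights would reflect the flow variance and record length at each gaging station.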
Abstract:
BACKGROUND AND PURPOSE: Knowledge of cerebral blood flow (CBF) alterations in cases of acute stroke could be valuable in the early management of these cases. Among imaging techniques affording evaluation of cerebral perfusion, perfusion CT involves sequential acquisition of cerebral CT sections in an axial mode during the IV administration of iodinated contrast material. It is thus very easy to perform in emergency settings. Perfusion CT values of CBF have proved to be accurate in animals, and perfusion CT affords plausible values in humans. The purpose of this study was to validate perfusion CT studies of CBF by comparison with the results provided by stable xenon CT, which have been reported to be accurate, and to evaluate acquisition and processing modalities of the CT data, notably the possible deconvolution methods and the selection of the reference artery. METHODS: Twelve stable xenon CT and perfusion CT cerebral examinations were performed within an interval of a few minutes in patients with various cerebrovascular diseases. CBF maps were obtained from perfusion CT data by deconvolution using singular value decomposition and least mean square methods. The CBF values were compared with the stable xenon CT results in multiple regions of interest through linear regression analysis and bilateral t tests for matched variables. RESULTS: Linear regression analysis showed good correlation between perfusion CT and stable xenon CT CBF values (singular value decomposition method: R² = 0.79, slope = 0.87; least mean square method: R² = 0.67, slope = 0.83). Bilateral t tests for matched variables did not identify a significant difference between the two imaging methods (P > .1). The two deconvolution methods were equivalent (P > .1). The choice of the reference artery is a major concern and strongly influences the final perfusion CT CBF map.
CONCLUSION: Perfusion CT studies of CBF achieved with adequate acquisition parameters and processing lead to accurate and reliable results.
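The SVD-based deconvolution mentioned in the methods can be sketched as follows: build a lower-triangular convolution matrix from the arterial input function, then invert it with small singular values truncated. This is a generic illustration on synthetic, noise-free curves, not the authors' processing pipeline; the curves, time step and threshold are all invented:

```python
import numpy as np

def conv_matrix(aif, dt):
    """Lower-triangular Toeplitz matrix A such that A @ r discretizes the
    convolution of the arterial input function (AIF) with a residue function."""
    n = len(aif)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, :i + 1] = aif[i::-1]
    return dt * A

def svd_deconvolve(aif, tissue, dt, threshold=0.1):
    """Truncated-SVD deconvolution: singular values below threshold * s_max
    are discarded, which stabilizes the inversion against noise."""
    U, s, Vt = np.linalg.svd(conv_matrix(aif, dt))
    s_inv = np.where(s > threshold * s[0], 1.0 / s, 0.0)
    return Vt.T @ (s_inv * (U.T @ tissue))

# Noise-free synthetic check: an exponential residue function convolved with
# an exponential AIF should be recovered almost exactly.
dt = 1.0
t = np.arange(40) * dt
aif = np.exp(-t / 3.0)
r_true = np.exp(-t / 5.0)
tissue = conv_matrix(aif, dt) @ r_true
r_est = svd_deconvolve(aif, tissue, dt, threshold=1e-9)
```

With real, noisy curves a much larger truncation threshold (on the order of the default 0.1) is what keeps the inversion stable.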
Abstract:
The aim of this thesis is to examine the weak-form efficiency of the stock markets of Russia, Slovakia, the Czech Republic, Romania, Bulgaria, Hungary and Poland. This thesis is a quantitative study, and the daily index closing values were collected from the Datastream database. The data were collected from each exchange's first trading day through the end of August 2006. To sharpen the analysis, the data were examined over the full period as well as over two sub-periods. Market efficiency was tested with four statistical methods, including an autocorrelation test and the non-parametric runs test. A further aim is to determine whether a day-of-the-week anomaly exists in these markets; its presence is examined using ordinary least squares (OLS) regression. The day-of-the-week anomaly is found in all of the above markets except the Czech market. Significant positive or negative autocorrelation is found in all of the stock markets, and the Ljung-Box test likewise indicates inefficiency in all markets over the full period. The random walk hypothesis is rejected on the basis of the runs test for all markets except Slovakia, at least over the full sample and the first sub-period. Moreover, the data are not normally distributed for any index or period. These findings indicate that the markets in question are not weak-form efficient.
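One of the statistical methods applied in this kind of efficiency testing is the non-parametric runs test, which compares the number of runs of same-signed observations with its expectation under randomness. A generic textbook sketch (median-split form), not necessarily the exact variant used in the thesis:

```python
import numpy as np

def runs_test_z(returns):
    """Non-parametric runs test for randomness of a return series.

    Splits observations at the median, counts runs of consecutive
    same-side observations, and returns the z-statistic comparing the
    run count with its expectation under randomness.
    """
    signs = returns > np.median(returns)
    n1 = int(signs.sum())              # observations above the median
    n2 = len(signs) - n1               # observations at or below it
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1.0
    var = (2.0 * n1 * n2 * (2.0 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1.0)))
    return (runs - mean) / np.sqrt(var)
```

A |z| above 1.96 rejects randomness at the 5% level: strongly alternating series give large positive z (too many runs), trending series give large negative z (too few).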
Abstract:
The purpose of this study is to examine whether calendar anomalies exist in the Russian stock market. The study focuses on the Halloween, month-of-the-year, turn-of-the-month, day-of-the-week and holiday anomalies. The RTS (Russian Trading System) index is used as the research data. The sample period runs from September 1, 1995 to December 31, 2005, giving 2,584 observations in total. Ordinary least squares (OLS) regression is used as the research method. The results show that the Halloween, turn-of-the-month and day-of-the-week anomalies are present in the Russian stock market, whereas the month-of-the-year and holiday anomalies are not. The results further show that most of the anomalies are more pronounced today than during the first years of the Russian stock market. On the basis of these results, the Russian stock market cannot yet be considered efficient.
Abstract:
The aim of this thesis is to examine the efficiency of the Chinese stock markets and the validity of the random walk hypothesis, and also to determine whether a day-of-the-week anomaly exists in these markets. The data consist of the daily logarithmic returns of the Shanghai Stock Exchange A-share, B-share and composite indices for February 21, 1992 to December 30, 2005, and of the Shenzhen Stock Exchange A-share and B-share indices for October 5, 1992 to December 30, 2005. Four statistical methods are used, including an autocorrelation test, the non-parametric runs test, a variance ratio test and the Augmented Dickey-Fuller unit root test. The presence of the day-of-the-week anomaly is examined using ordinary least squares (OLS) regression. Tests are run on the full sample as well as on three separate sub-periods. The empirical results of this thesis support earlier findings of inefficiency in the Chinese stock markets. With the exception of the unit root test results, the random walk hypothesis is rejected for both Chinese stock markets on the basis of the autocorrelation, runs and variance ratio tests. The results show that on both exchanges the behaviour of the B-share indices has deviated from the random walk hypothesis considerably more than that of the A-share indices. Except for the B-share markets, the efficiency of both Chinese stock markets also appeared to improve after the market boom of 2001. The results further show a day-of-the-week anomaly on the Shanghai Stock Exchange, but not on the Shenzhen Stock Exchange, over the full sample period.
Abstract:
Metastatic melanomas are frequently refractory to most adjuvant therapies such as chemotherapy and radiotherapy. Recently, immunotherapies have shown good results in the treatment of some metastatic melanomas. Immune cell infiltration in the tumor has been associated with successful immunotherapy. More generally, tumor infiltrating lymphocytes (TILs) in the primary tumor and in metastases of melanoma patients have been shown to correlate positively with favorable clinical outcomes. Altogether, these findings suggest the importance of being able to identify, quantify and characterize immune infiltration at the tumor site for better diagnosis and treatment choice. In this paper, we used Fourier Transform Infrared (FTIR) imaging to identify and quantify different subpopulations of T cells: cytotoxic T cells (CD8+), helper T cells (CD4+) and regulatory T cells (Treg). As a proof of concept, we investigated pure populations isolated from the peripheral blood of 6 healthy human donors. These subpopulations were isolated from blood samples by magnetic labeling, and purities were assessed by Fluorescence Activated Cell Sorting (FACS). The results presented here show that FTIR imaging followed by supervised Partial Least Squares Discriminant Analysis (PLS-DA) allows accurate identification of CD4+ T cells and CD8+ T cells (>86%). We then developed a PLS regression allowing the quantification of Treg in a different mix of immune cells (e.g. Peripheral Blood Mononuclear Cells, PBMCs). Altogether, these results demonstrate the sensitivity of infrared imaging to detect the low biological variability observed in T cell subpopulations.
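The PLS regression underlying approaches like PLS-DA can be sketched with the classical NIPALS algorithm. The implementation below is a generic single-response PLS1, not the authors' spectral pipeline, and the demonstration data are random rather than FTIR spectra:

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Minimal PLS1 (NIPALS): returns regression coefficients and centering
    terms so that prediction is y_mean + (X_new - x_mean) @ coef."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)           # weight vector
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        W.append(w); P.append(p); q.append((yk @ t) / tt)
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - q[-1] * t              # deflate y
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def pls1_predict(X, coef, x_mean, y_mean):
    return y_mean + (X - x_mean) @ coef

# Sanity check on synthetic data: with as many components as features,
# PLS1 reproduces an exactly linear response.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])
coef, x_mean, y_mean = pls1_fit(X, y, n_components=6)
y_hat = pls1_predict(X, coef, x_mean, y_mean)
```

In a spectroscopic setting, X would hold spectra and y the quantity of interest (for example, a Treg fraction), with the number of components chosen by cross-validation.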
Abstract:
Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious to program by human experts. Tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question; however, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. 
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.
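Fast cross-validation shortcuts of the kind referred to here exploit the fact that regularized least squares admits closed-form leave-one-out residuals: one fit replaces n refits. A generic ridge-regression sketch (the thesis' own algorithm may differ in detail), with a naive reference implementation for comparison:

```python
import numpy as np

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_loo_residuals(X, y, lam):
    """Closed-form leave-one-out residuals for ridge regression:
    e_i = (y_i - yhat_i) / (1 - h_ii), where h_ii is the leverage of
    point i.  A single fit replaces n refits."""
    d = X.shape[1]
    beta = ridge_fit(X, y, lam)
    M = np.linalg.inv(X.T @ X + lam * np.eye(d))
    h = np.einsum('ij,jk,ik->i', X, M, X)       # leverages h_ii
    return (y - X @ beta) / (1.0 - h)

def ridge_loo_residuals_naive(X, y, lam):
    """Reference implementation: n explicit refits."""
    n = X.shape[0]
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta = ridge_fit(X[mask], y[mask], lam)
        out[i] = y[i] - X[i] @ beta
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.1 * rng.normal(size=25)
fast = ridge_loo_residuals(X, y, lam=1.0)
slow = ridge_loo_residuals_naive(X, y, lam=1.0)
```

The shortcut is exact for ridge regression, since ridge is ordinary least squares on an augmented data matrix whose regularization rows are never left out.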
Abstract:
Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, used to take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics. 
Training kernel-based ranking algorithms can be infeasible when the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
Abstract:
Least-squares support vector machines (LS-SVM) were used as an alternative multivariate calibration method for the simultaneous quantification of some common adulterants found in powdered milk samples, using near-infrared spectroscopy. Excellent models were built using LS-SVM, as judged by their R², RMSECV and RMSEP values. LS-SVMs showed superior performance for quantifying starch, whey and sucrose in powdered milk samples relative to PLSR. This study shows that it is possible to precisely determine the amount of one or two common adulterants simultaneously in powdered milk samples using LS-SVM and NIR spectra.
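An LS-SVM replaces the quadratic program of a standard SVM with a single linear system of KKT equations. A minimal regression sketch with an RBF kernel; this is a generic illustration on a toy one-dimensional function, not the paper's calibration models, and the gamma and sigma values are purely illustrative:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM regression: solve the linear KKT system
        [ 0   1^T         ] [b]       [0]
        [ 1   K + I/gamma ] [alpha] = [y]
    instead of a quadratic program."""
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                  # bias b, dual coefficients alpha

def lssvm_predict(X_new, X, b, alpha, sigma=1.0):
    return rbf_kernel(X_new, X, sigma) @ alpha + b

# Toy demonstration: fit a smooth function with a near-interpolating model.
X = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=1e4, sigma=0.3)
y_hat = lssvm_predict(X, X, b, alpha, sigma=0.3)
```

In a calibration setting like the paper's, X would hold NIR spectra, y an adulterant concentration, and gamma and sigma would be tuned by cross-validation.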
Abstract:
The purpose of the thesis is to analyze whether the returns of the general stock market indices of Estonia, Latvia and Lithuania follow the random walk hypothesis (RWH) and, in addition, whether they are consistent with the weak-form efficiency criterion. The existence of the day-of-the-week anomaly is also examined in the same regional markets. The data consist of daily closing quotes of the OMX Tallinn, Riga and Vilnius total return indices for the sample period from January 3, 2000 to August 28, 2009. Moreover, the full sample period is also divided into two sub-periods. The RWH is tested by applying three quantitative methods (the Augmented Dickey-Fuller unit root test, a serial correlation test and the non-parametric runs test). Ordinary least squares (OLS) regression with dummy variables is employed to detect day-of-the-week anomalies. The RWH is rejected in the Estonian and Lithuanian stock markets. The Latvian stock market exhibits more efficient behaviour, although some evidence of inefficiency is also found, mostly during the first sub-period from 2000 to 2004. Day-of-the-week anomalies are detected in every stock market examined, though no longer during the later sub-period.
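The dummy-variable OLS regression for the day-of-the-week anomaly can be sketched as follows: with one dummy per weekday and no intercept, each coefficient is simply that weekday's mean return. The data below are stylized, not the OMX series:

```python
import numpy as np

def day_of_week_ols(returns, weekdays):
    """Regress daily returns on five weekday dummies (no intercept):
    r_t = sum_d beta_d * D_{d,t} + e_t.  Each coefficient equals the
    mean return of the corresponding weekday."""
    D = np.zeros((len(returns), 5))
    D[np.arange(len(returns)), weekdays] = 1.0   # weekdays coded 0=Mon..4=Fri
    beta, *_ = np.linalg.lstsq(D, returns, rcond=None)
    return beta

# Stylized example with an artificial "Monday effect".
weekdays = np.tile(np.arange(5), 20)
returns = np.where(weekdays == 0, -0.1, 0.05)
beta = day_of_week_ols(returns, weekdays)
```

In practice the anomaly is then judged from t-tests on the coefficients (or on their differences), which the closed-form OLS standard errors provide.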
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how these techniques can be implemented efficiently. The contributions of this thesis are as follows. 
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
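The pairwise least-squares loss behind a method like RankRLS has a convenient closed form: the sum of squared pairwise differences equals an ordinary residual weighted by the centering matrix L = nI - 11^T. A minimal linear sketch under that identity (a simplified illustration of the idea, with invented data and hyperparameters, not the thesis' kernelized algorithm):

```python
import numpy as np

def rankrls_fit(X, y, lam=1.0):
    """Linear pairwise least-squares ranking: minimize
        sum_{i<j} ((w@x_i - w@x_j) - (y_i - y_j))**2 + lam * ||w||**2.
    Since sum_{i<j} (r_i - r_j)**2 = r @ L @ r with L = n*I - 1 1^T,
    the solution is available in closed form."""
    n, d = X.shape
    L = n * np.eye(n) - np.ones((n, n))
    return np.linalg.solve(X.T @ L @ X + lam * np.eye(d), X.T @ L @ y)

# Toy check: scores learned from an exactly linear utility should
# reproduce the target ordering.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = X @ np.array([1.0, 2.0])
w = rankrls_fit(X, y, lam=1e-6)
scores = X @ w
```

The same L-weighted formulation is what makes the matrix-algebra shortcuts for cross-validation and multi-output learning possible in the kernelized setting.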
Abstract:
A model for predicting temperature evolution for automatic control systems in manufacturing processes requiring the coiling of bars in the transfer table is presented. Although the method is of a general nature, the presentation in this work refers to the manufacturing of steel plates in hot rolling mills. The prediction strategy is based on a mathematical model of the evolution of temperature in a coiling and uncoiling bar, presented in the form of a parabolic partial differential equation on a shape-changing domain. The mathematical model is solved numerically by space discretization via geometrically adaptive finite elements, which accommodate the change in shape of the domain, using a computationally novel treatment of the thermal contact problem that results from coiling. Time is discretized according to a Crank-Nicolson scheme. Since the actual physical process takes less time than the process-controlling computer requires to solve the full mathematical model, a special predictive device was developed in the form of a set of least-squares polynomials based on the off-line numerical solution of the mathematical model.
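The "special predictive device" is a surrogate: least-squares polynomials fitted off-line to the numerical solution, cheap enough to evaluate on-line. A toy sketch of that idea with a stand-in quadratic temperature history; all numbers here are invented for illustration, not taken from the rolling-mill model:

```python
import numpy as np

# Pretend these (time, temperature) pairs come from the off-line
# finite-element solution of the coiling model.
t = np.linspace(0.0, 60.0, 50)
temp = 900.0 - 3.5 * t + 0.02 * t ** 2      # stand-in for the numerical solution

# Condense the expensive solution into a least-squares polynomial.
coeffs = np.polyfit(t, temp, 2)

def predict_temperature(t_query):
    """Fast on-line surrogate: evaluate the fitted polynomial."""
    return np.polyval(coeffs, t_query)
```

Evaluating a low-degree polynomial takes microseconds, which is what lets the control computer keep up with the physical process.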
Abstract:
The purpose of this thesis is to investigate whether different private equity fund characteristics influence fund performance. Fund characteristics include fund type (venture capital or buyout), fund size (divided into six ranges), fund investment industry, fund sequence (first fund or follow-on fund) and investment market (US or EMEA). Fund performance is measured by the internal rate of return (IRR) and tested by cross-sectional regression analysis using ordinary least squares. The data comprise the performance and characteristics of 997 private equity funds between 1985 and 2008. We find that fund type has an effect on fund performance: the average IRR of venture capital funds is 2.7% lower than that of buyout funds. However, we find no relationship between fund size and performance, or between fund sequence and performance. Funds based in the US market perform better than funds based in the EMEA market. Fund performance also differs across industries: the average IRRs of the industrial/energy, consumer-related, communications and media, and medical/health industries are higher than the average IRR of other industries.