986 results for Instrumental variable regression
Abstract:
Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event, and correlated lifetime data, when an individual is followed for the occurrence of two or more types of events or when distinct individuals have dependent event times. In most studies there are covariates, such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models. The well-known Cox proportional hazards model and its variations, which use the marginal hazard functions employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2 we introduced a bivariate proportional hazards model using the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations for studying the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used to estimate the parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a superior concept to the hazard function. Motivated by this, in Chapter 4 we considered a new semi-parametric model, the bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap times of recurrent events.
The counting process approach is used for the inference procedures for the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the parameter estimators for the models developed in the previous chapters were studied. The proposed models were applied to various real-life situations.
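The partial likelihood approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard univariate Cox model with a single covariate and no tied event times, not the bivariate models of the thesis; the function name `cox_partial_loglik` and the toy data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood of a one-covariate Cox model (no tied event times).

    times      -- observed times (event or censoring)
    events     -- 1 if the time is an observed event, 0 if right-censored
    covariates -- one covariate value per subject
    """
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(
            math.exp(beta * covariates[j])
            for j, t_j in enumerate(times)
            if t_j >= t_i
        )
        loglik += beta * covariates[i] - math.log(risk_sum)
    return loglik

# Hypothetical toy data: four subjects, one of them censored at t = 3.
toy_times = [2.0, 3.0, 5.0, 7.0]
toy_events = [1, 0, 1, 1]
toy_covariates = [1.0, 0.0, 1.0, 0.0]

loglik_at_zero = cox_partial_loglik(0.0, toy_times, toy_events, toy_covariates)
loglik_at_half = cox_partial_loglik(0.5, toy_times, toy_events, toy_covariates)
```

With `beta = 0` every subject in a risk set is equally likely to fail, so the value reduces to minus the sum of the logs of the risk-set sizes. In practice the function would be maximised over `beta` with a numerical optimiser.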
Abstract:
An improved color video super-resolution technique using kernel regression and fuzzy enhancement is presented in this paper. A high-resolution frame is computed from a set of low-resolution video frames by kernel regression using an adaptive Gaussian kernel. A fuzzy smoothing filter is proposed to enhance the regression output. The proposed technique is a low-cost software solution for resolution enhancement of color video in multimedia applications. The performance of the proposed technique is evaluated using several color videos, and it is found to be better than other techniques at producing high-quality, high-resolution color videos.
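As a rough illustration of kernel regression with a Gaussian kernel, the sketch below computes a one-dimensional Nadaraya-Watson estimate with a fixed bandwidth. This is a simplification chosen for illustration: the abstract describes an adaptive kernel applied to video frames, and the sample values, grid, and bandwidth here are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)

def kernel_regression(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: a kernel-weighted average of the samples."""
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical low-resolution samples of one image row (position, intensity).
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
intensities = [10.0, 12.0, 20.0, 22.0, 30.0]

# Estimate intensities on a grid twice as dense as the input.
fine_grid = [i * 0.5 for i in range(9)]
upsampled = [
    kernel_regression(positions, intensities, x, bandwidth=0.7) for x in fine_grid
]
```

Because each estimate is a weighted average with non-negative weights, the output always stays within the range of the input intensities. An adaptive variant would let the bandwidth (or the kernel shape) depend on the local image structure.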
Abstract:
In our study we use a kernel-based learning technique, Support Vector Machine Regression, to predict the melting point of drug-like compounds in terms of Topological Descriptors, Topological Charge Indices, Connectivity Indices and 2D Auto Correlations. The machine learning model was designed, trained and tested using a dataset of 100 compounds, and it was found that an SVMReg model with an RBF kernel could predict the melting point with a mean absolute error of 15.5854 and a root mean squared error of 19.7576.
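The two error measures quoted in this abstract can be computed as in the following sketch. The measured and predicted values are hypothetical placeholders, not data from the study, and the model fitting itself is not shown.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared prediction error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Hypothetical measured and predicted melting points for four compounds.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 230.0, 250.0]

mae = mean_absolute_error(measured, predicted)
rmse = root_mean_squared_error(measured, predicted)
```

The RMSE is never smaller than the MAE, and the gap widens when a few predictions are far off, which is consistent with the reported pair of values (19.7576 versus 15.5854).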
Abstract:
The problem of using information available from one variable X to make inference about another variable Y is classical in many physical and social sciences. In statistics this is often done via regression analysis, where the mean response is used to model the data. One stipulates the model Y = µ(X) + ɛ. Here µ(x) is the mean response at the predictor value X = x, and ɛ = Y - µ(X) is the error. In classical regression analysis, both X and Y are observable, and one then proceeds to make inference about the mean response function µ(X). In practice there are numerous examples where X is not available, but a variable Z is observed that provides an estimate of X. As an example, consider the herbicide study of Rudemo et al. [3], in which a nominal measured amount Z of herbicide was applied to a plant but the actual amount X absorbed by the plant is unobservable. As another example, from Wang [5], an epidemiologist studies the severity of a lung disease, Y, among the residents of a city in relation to the amount of certain air pollutants. The amount of the air pollutants, Z, can be measured at certain observation stations in the city, but the actual exposure of the residents to the pollutants, X, is unobservable and may vary randomly from the Z-values. In both cases X = Z + error. This is the so-called Berkson measurement error model. In the more classical measurement error model, one observes an unbiased estimator W of X and stipulates the relation W = X + error. An example of this model occurs when assessing the effect of nutrition X on a disease: measuring nutrition intake precisely within 24 hours is almost impossible. There are many similar examples in agricultural or medical studies; see, e.g., Carroll, Ruppert and Stefanski [1] and Fuller [2], among others.
In this talk we shall address the question of fitting a parametric model to the regression function µ(X) in the Berkson measurement error model: Y = µ(X) + ɛ, X = Z + η, where η and ɛ are random errors with E(ɛ) = 0, X and η are d-dimensional, and Z is the observable d-dimensional r.v.
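The difference between the Berkson model (X = Z + η) and the classical model (W = X + error) can be seen in a small simulation. The sketch below assumes a scalar predictor, a linear mean response µ(x) = βx, and Gaussian errors; all of these are simplifying assumptions for illustration, and the sample size, seed, and variances are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000
beta = 2.0              # assumed true slope of the linear mean response

# Berkson model: Z is observed, the true predictor is X = Z + eta.
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_berkson = [zi + rng.gauss(0.0, 1.0) for zi in z]
y_berkson = [beta * xi + rng.gauss(0.0, 0.5) for xi in x_berkson]
slope_berkson = ols_slope(z, y_berkson)

# Classical model: X is the truth, W = X + u is observed instead.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + rng.gauss(0.0, 1.0) for xi in x_true]
y_classical = [beta * xi + rng.gauss(0.0, 0.5) for xi in x_true]
slope_classical = ols_slope(w, y_classical)
```

Under these assumptions, regressing Y on Z in the Berkson case recovers a slope close to β, whereas regressing Y on W in the classical case is attenuated toward β·Var(X)/(Var(X) + Var(u)), here about half of β. For a nonlinear µ the naive fit on Z is generally biased even in the Berkson case, which is why dedicated model-fitting methods are needed.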
Abstract:
The study of variable stars is an important topic in modern astrophysics. Since the invention of powerful telescopes and high-resolution CCDs, variable star data have been accumulating on the order of petabytes. This huge amount of data requires many automated methods as well as human experts. This thesis is devoted to the analysis of astronomical time series data of variable stars and hence belongs to the interdisciplinary field of astrostatistics. For an observer on Earth, stars whose apparent brightness changes over time are called variable stars. The variation in brightness may be regular (periodic), quasi-periodic (semi-periodic) or irregular (aperiodic), and has various causes. In some cases the variation is due to internal thermonuclear processes, and such stars are generally known as intrinsic variables; in other cases it is due to external processes such as eclipses or rotation, and such stars are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospherical stars. Pulsating variables can in turn be classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira, etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which occur rarely and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed in many ways, such as photometry, spectrophotometry and spectroscopy. A sequence of photometric observations of a variable star produces time series data containing time, magnitude and error. The plot of a variable star's apparent magnitude against time is known as a light curve. If the time series data are folded on a period, the plot of apparent magnitude against phase is known as a phased light curve. The unique shape of the phased light curve is characteristic of each type of variable star.
One way to identify and classify the type of a variable star is visual inspection of the phased light curve by an expert. For the last several years, automated algorithms have been used to classify groups of variable stars with the help of computers. Research on variable stars can be divided into different stages: observation, data reduction, data analysis, modeling and classification. Modeling of variable stars helps to determine short-term and long-term behaviour, to construct theoretical models (e.g., the Wilson-Devinney model for eclipsing binaries) and to derive stellar properties such as mass, radius, luminosity, temperature, internal and external structure, chemical composition and evolution. Classification requires the determination of basic parameters such as period, amplitude and phase, as well as some other derived parameters. Of these, the period is the most important parameter, since wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data in order to quantify the variation, understand the nature of time-varying phenomena, gain physical understanding of the system and predict its future behavior. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and the possibility of large gaps. For ground-based observations this is due to the daily variation of daylight and to weather conditions, while observations from space may suffer from the impact of cosmic ray particles. Many large-scale astronomical surveys, such as MACHO, OGLE, EROS, ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Kepler, ESA, Gaia, LSST and CRTS, provide time series data on variable stars, even though their primary purpose is not variable star observation.
The Center for Astrostatistics at Pennsylvania State University was established to help the astronomical community with statistical tools for harvesting and analysing archival data. Most of these surveys release their data to the public for further analysis. There exist many period search algorithms for astronomical time series analysis, which can be classified into parametric methods (which assume some underlying distribution for the data) and non-parametric methods (which do not assume any statistical model, such as a Gaussian). Many of the parametric methods are based on variations of the discrete Fourier transform, such as the Generalised Lomb-Scargle periodogram (GLSP) by Zechmeister (2009) and Significant Spectrum (SigSpec) by Reegen (2007). Non-parametric methods include Phase Dispersion Minimisation (PDM) by Stellingwerf (1978) and the cubic spline method by Akerlof (1994). Even though most of these methods can be automated, none of them can fully recover the true periods. Wrong period detection can have several causes, such as power leakage to other frequencies, which is due to the finite total interval, the finite sampling interval and the finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. In addition, spurious periods appear because of long gaps, and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence, obtaining the exact period of a variable star from its time series data is still a difficult problem for huge databases subjected to automation. As Matthew Templeton, AAVSO, states, “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. (2007) and Deb et al. (2010) state, “The processing of huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”.
It would benefit the variable star community if basic parameters such as period, amplitude and phase were obtained more accurately when huge time series databases are subjected to automation. In the present thesis, the theories of four popular period search methods are studied, the strengths and weaknesses of these methods are evaluated by applying them to two survey databases, and finally a modified form of the cubic spline method is introduced to confirm the exact period of a variable star. To classify newly discovered variable stars and enter them in the “General Catalogue of Variable Stars” or other databases such as the “Variable Star Index”, the characteristics of the variability have to be quantified in terms of variable star parameters.
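Phase folding and a period search in the spirit of Phase Dispersion Minimisation can be sketched as follows. This is a simplified illustration, not the thesis's method: it uses non-overlapping phase bins, a synthetic sinusoidal light curve with an assumed period of 0.57 days, and a hypothetical trial-period grid.

```python
import math
import random

def fold(times, period, epoch=0.0):
    """Phase in [0, 1) of each observation for a trial period."""
    return [((t - epoch) / period) % 1.0 for t in times]

def pdm_theta(times, mags, period, n_bins=10):
    """Simplified PDM statistic: pooled within-bin variance / total variance."""
    n = len(mags)
    mean = sum(mags) / n
    total_var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    bins = [[] for _ in range(n_bins)]
    for phase, mag in zip(fold(times, period), mags):
        bins[min(int(phase * n_bins), n_bins - 1)].append(mag)
    pooled_ss, dof = 0.0, 0
    for members in bins:
        if len(members) > 1:
            bin_mean = sum(members) / len(members)
            pooled_ss += sum((m - bin_mean) ** 2 for m in members)
            dof += len(members) - 1
    if dof == 0 or total_var == 0.0:
        raise ValueError("not enough data to compute the statistic")
    return (pooled_ss / dof) / total_var

# Synthetic, unevenly sampled light curve (all values are assumptions).
rng = random.Random(1)
true_period = 0.57
times = sorted(rng.uniform(0.0, 30.0) for _ in range(300))
mags = [
    12.0 + 0.3 * math.sin(2.0 * math.pi * t / true_period) + rng.gauss(0.0, 0.02)
    for t in times
]

# Scan trial periods and keep the one with the smallest phase dispersion.
trial_periods = [0.300 + 0.001 * k for k in range(701)]
best_period = min(trial_periods, key=lambda p: pdm_theta(times, mags, p))
```

A trial period close to the true one lines the observations up in phase, so the scatter inside each bin is small and the statistic drops well below 1. The grid here deliberately excludes multiples of the true period; on real survey data such subharmonics, aliases and gaps are exactly the sources of wrong periods that the abstract describes.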
Abstract:
Medicinal herbs are exposed to numerous influences during the drying process that decisively affect the quality of the final product. This research deals with the drying of lemon balm (Melissa officinalis L.) to obtain a high-quality final product. Drying strategies are proposed that incorporate experimental and mathematical aspects in order to achieve, at adequate productivity, the required quality characteristics with regard to colour change and essential oil content. Because of various problems during drying, dried lemon balm currently cannot always meet the high quality requirements of the market. There is no standardised information on the individual, complex drying parameters. In practice, drying relies on empirical values, or procedures used for drying other plants are copied, and drying is often not reproducible or is based on subjective approximations. As a consequence of this poorly adapted choice of drying parameters, problems frequently arise, such as over-drying, which leads to increased breakage losses of the leaf mass, or insufficient drying, which in turn results in an excessively high final moisture content in the product. This inevitably leads to an unacceptable colour change and an excessive loss of essential oils. Because of the different thermal and mechanical properties of leaves and stems, uneven drying is the rule. In addition, an unnecessarily long drying time is observed, which leads to increased energy consumption. Drying in solar tunnel dryers entails the following problem: because of the uncontrolled incident radiation, it is difficult to regulate the drying temperature. Radiation also affects the colour of the product through photochemical reactions.
In addition, the large fluctuations in radiation, temperature and air humidity create unstable conditions for uniform and controllable drying. In view of these problems, this work sets the following research priorities: new strategies for improving quality are developed, with the aim of reducing drying time and energy consumption. To propose a methodology based on optimal drying parameters, temperature and air humidity were considered as variables in relation to drying time, essential oil content, colour change and required energy. In addition, these parameters and their effects on the quality characteristics were analysed in solar tunnel dryers. Different approaches were pursued to achieve these goals. The sorption isotherms and drying kinetics of lemon balm were determined and fitted to various mathematical models. An alternative staged drying process in graduated steps was also carried out in order to raise the quality of the final product while lowering total energy consumption. Furthermore, a statistical experimental design based on CCD (Central Composite Design) and RSM (Response Surface Methodology) was proposed in order to achieve the desired quality characteristics and the necessary energy input as a function of air temperature and air humidity. Regression models were derived from the data obtained, and the behaviour of the drying process was described. Finally, a statistical design of experiments (DOE) was applied to evaluate the influence of the parameters on the achievable product quality in a solar tunnel dryer.
The effects of shading, position in the tunnel, degree of loading and air velocity on drying time, colour change and essential oil content were analysed. Corresponding regression models for use in solar tunnel dryers were also developed. The main results are analysed with regard to optimal drying parameters in terms of quality and energy consumption.
Abstract:
Cubicles should provide good resting comfort as well as clean udders. Dairy cows in cubicle houses often face a restrictive environment with regard to resting behaviour, whereas cleanliness may still be impaired. This study aimed to determine reliable behavioural measures of resting comfort applicable in on-farm welfare assessments. Furthermore, relationships between cubicle design, cow sizes, management factors and udder cleanliness (namely teats and teat tips) were investigated. Altogether 15 resting measures were examined in terms of feasibility, inter-observer reliability (IOR) and consistency of results per farm over time. They were recorded during three farm visits on farms in Germany and Austria with cubicle, deep litter and tie stall systems. Seven measures occurred too infrequently to allow reliable recording within a limited observation time. IOR was generally acceptable to excellent except for 'collisions during lying down', which only showed good IOR after improvement of the definition. Only three measures were acceptably repeatable over time: 'duration of lying down', 'percentage of collisions during lying down' and 'percentage of cows lying partly or completely outside lying area'. These measures were evaluated as suitable animal-based welfare measures of resting behaviour in the framework of an on-farm welfare assessment protocol. The second part of the thesis comprises a cross-sectional study on resting comfort and cow cleanliness including 23 Holstein Friesian dairy herds with very low within-farm variation in cubicle measures. Height at withers, shoulder width and diagonal body length were measured in 79-100 % of the cows (herd size 30 to 115 cows). Based on the 25 % largest animals, compliance with recommendations for cubicle measures was calculated. Cleanliness of different body parts, the udder, teats and teat tips was assessed for each cow in the herd prior to morning milking.
No significant correlation was found between udder soiling and teat or teat tip soiling at herd level. The final model of a stepwise regression on the percentage of dirty teats per farm explained 58.5 % of the variance and contained four factors. Teat dipping after milking (which might be associated with an overall clean and accurate management style), deep bedded cubicles, increasing cubicle maintenance times and decreasing compliance concerning total cubicle length predicted lower teat soiling. The final model concerning teat tip soiling explained 46.0 % of the variance and contained three factors. Increasing litter height in the rear part of the cubicle and increased alley soiling, which is difficult to explain, predicted less soiled teat tips, whereas increasing compliance concerning resting length was associated with higher percentages of dirty teat tips. The dependent variable ‘duration of lying down’ was again analysed using stepwise regression. The final model explained 54.8 % of the total variance. Lying down duration was significantly shorter in deep bedded cubicles. Further explanatory, though not significant, factors in the model were neck-rail height, deep bedding or comfort mattresses versus concrete floor or rubber mats, and clearance height of side partitions. In an attempt to create a more comprehensive lying down measure, another analysis was carried out with the percentage of ‘impaired lying down’ (i.e. events exceeding 6.3 seconds, with collisions or being interrupted) as the dependent variable. The explanatory value of this final model was 41.3 %. An increase in partition length, an increase in compliance concerning cubicle width and the presence of straw within the bedding predicted a lower proportion of impaired lying down. The effect of partition length is difficult to interpret, but partition length and height were positively correlated on the study farms, possibly leading to a bigger zone of clear space for pelvis freedom.
No associations could be found between impaired lying down and teat or teat tip soiling. Altogether, in agreement with earlier studies it was found that cubicle dimensions in practice are often inadequate with regard to the body dimensions of the cows, leading to high proportions of impaired lying down behaviour, whereas teat cleanliness is still unsatisfactory. Connections between cleanliness and cow comfort are far from simplistic. Especially the relationship between cubicle characteristics and lying down behaviour apparently is very complex, so that it is difficult to identify single influential factors that are valid for all farm situations. However, based on the results of the present study the use of deep bedded cubicles can be recommended as well as improved management with special regard to cubicle and litter maintenance in order to achieve both better resting comfort and teat cleanliness.
Abstract:
The aim is to reduce absenteeism and early school leaving and to make the elements of the curriculum accessible to all pupils so that, through integration, all of them experience their time at school as useful time, without despair and with the prospect of obtaining the Graduado de ESO.
Abstract:
This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors.
Abstract:
This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.
Abstract:
We study the relation between support vector machines (SVMs) for regression (SVMR) and SVMs for classification (SVMC). We show that for a given SVMC solution there exists an SVMR solution which is equivalent for a certain choice of the parameters. In particular, our result is that for $\epsilon$ sufficiently close to one, the optimal hyperplane and threshold for the SVMC problem with regularization parameter $C_c$ are equal to $(1-\epsilon)^{-1}$ times the optimal hyperplane and threshold for SVMR with regularization parameter $C_r = (1-\epsilon)C_c$. A direct consequence of this result is that SVMC can be seen as a special case of SVMR.
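The scaling relation stated in this abstract can be written compactly as follows. The symbols $w$ and $b$ for the hyperplane and threshold, with subscripts $c$ (classification) and $r$ (regression), are notation assumed here for illustration; the abstract itself names only $\epsilon$, $C_c$ and $C_r$.

```latex
% Assumed notation: (w_c, b_c) solves SVMC with parameter C_c,
% (w_r, b_r) solves SVMR with parameter C_r and tube width \epsilon.
\[
  C_r = (1-\epsilon)\, C_c
  \quad\Longrightarrow\quad
  w_c = \frac{1}{1-\epsilon}\, w_r ,
  \qquad
  b_c = \frac{1}{1-\epsilon}\, b_r ,
  \qquad \text{for } \epsilon \text{ sufficiently close to } 1 .
\]
```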
Abstract:
Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik's $\epsilon$-insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.
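The three loss functions compared in this abstract can be written down directly. The sketch below uses the standard textbook definitions; the parameter values in the example calls are hypothetical.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """Vapnik's loss: zero inside the tube |r| <= epsilon, linear outside."""
    return max(0.0, abs(residual) - epsilon)

def quadratic_loss(residual):
    """Usual squared-error loss."""
    return residual * residual

def huber_loss(residual, delta):
    """Huber's robust loss: quadratic near zero, linear in the tails."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude * magnitude
    return delta * (magnitude - 0.5 * delta)

# Small residuals cost nothing under the epsilon-insensitive loss...
inside_tube = epsilon_insensitive_loss(0.05, epsilon=0.1)
# ...while large residuals grow linearly rather than quadratically.
outside_tube = epsilon_insensitive_loss(-0.5, epsilon=0.1)
```

Both the $\epsilon$-insensitive and the Huber loss grow linearly for large residuals, which is what makes them less sensitive to outliers than the quadratic loss.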
Abstract:
This paper presents a computation of the $V_\gamma$ dimension for regression in bounded subspaces of Reproducing Kernel Hilbert Spaces (RKHS) for the Support Vector Machine (SVM) regression $\epsilon$-insensitive loss function, and general $L_p$ loss functions. Finiteness of the $V_\gamma$ dimension is shown, which also proves uniform convergence in probability for regression machines in RKHS subspaces that use the $L_\epsilon$ or general $L_p$ loss functions. This paper also presents a novel proof of this result for the case in which a bias is added to the functions in the RKHS.
Abstract:
Time series regression models are especially suitable in epidemiology for evaluating short-term effects of time-varying exposures on health. The problem is that the potential for confounding in time series regression is very high. Thus, it is important that trend and seasonality are properly accounted for. Our paper reviews the statistical models commonly used in time-series regression, especially those allowing for serial correlation, which makes them potentially useful for selected epidemiological purposes. In particular, we discuss the use of time-series regression for counts using a wide range of Generalised Linear Models as well as Generalised Additive Models. In addition, critical points in using statistical software for GAM were recently stressed, and reanalyses of time series data on air pollution and health were performed in order to update already published results. Applications are offered through an example on the relationship between asthma emergency admissions and photochemical air pollutants.
Abstract:
To verify the relationship between the teaching of the Asturian language and instrumental learning in mathematics and language, as well as its influence on the cultural and metalinguistic learning specific to the Autonomous Community. The sample comprised 323 pupils in the sixth year of primary education, divided into an experimental group of pupils from 9 schools 'with Asturian', chosen at random from among those offering the subject Asturian Language, and a control group from 9 schools 'without Asturian', chosen for their geographical proximity to the schools in the other group; both groups were stratified by setting: urban, semi-urban and rural. The work comprised an overall research plan, development of the data collection instruments and selection of the sample, fieldwork, data analysis and writing of the report. The statistical analysis considered variables such as sex, place of birth, setting in which the school is located, and the parents' occupations and places of birth. The authors developed a battery of instrumental performance tests, some adapted from the Pruebas Psicopedagógicas de Aprendizajes Instrumentales by Canals (1991) and others of their own design: scales of reading comprehension in Spanish, spelling in Spanish, calculation speed, reasoning and problem solving, knowledge of the social and cultural environment of the autonomous community, appreciation and esteem for the Asturian language and culture, and finally a scale of linguistic competence in Asturian. After the central tendency values (mean and standard deviation) had been analysed, it was decided, for the treatment of the nominal variables, to establish response categories (grouping within three standard deviation units above and below the mean). The Chi-square statistical test was then applied.
To establish a criterion for comparison, a point-biserial correlation was computed between the variables, one of them continuous and the other dichotomous. There is a general tendency for the mean scores of the group of pupils with Asturian to exceed the corresponding means of the group without Asturian, after a statistical test was applied as a correction. In addition, regarding pupils from the semi-urban or urban setting: in a descriptive analysis based on the school variable, it can be concluded that the schools achieving the highest linguistic competence in Asturian are those where Asturian is taught and that are in rural and semi-rural settings; in the schools with the highest mean in knowledge of the Asturian environment, the Asturian language is taught. The teaching of Asturian has a notable effect on school performance in instrumental subjects and facilitates better knowledge of the social and natural environment of the community. The results of this study dispel doubts about the possible negative effects that the teaching of Asturian might have on pupils' general language learning. Bibliography pp. 107-109.