1000 results for stationary distribution


Relevance:

30.00%

Publisher:

Abstract:

In most epidemiological studies, historical monitoring data are scant and must be pooled to identify occupational groups with homogeneous exposures. Homogeneity of exposure is generally assessed in a group of workers who share a common job title or work in a common area. While published results suggest that the degree of homogeneity varies widely across job groups, less is known about whether such variation differs across industrial sectors, classes of contaminants, or methods used to group workers. Relying upon a compilation of results presented in the literature, patterns of homogeneity among nearly 500 occupational groups of workers were evaluated on the basis of type of industry and agent. Additionally, the effects of the characteristics of the sampling strategy on estimated indicators of homogeneity of exposure were assessed.

Exposure profiles for occupational groups of workers have typically been assessed under the assumption of stationarity, i.e., that the mean exposure level and variance of the distribution describing the underlying population of exposures are constant over time. Yet the literature has shown that occupational exposures have declined in recent decades, rendering traditional methods for describing exposure profiles inadequate. Thus, work was needed to develop appropriate methods to assess homogeneity for groups of workers whose exposures have changed over time. A study was carried out applying mixed effects models with a term for temporal trend to appropriately describe exposure profiles of groups of workers in the nickel-producing industry over a 20-year period. Using a subset of groups of nickel-exposed workers, another study was conducted to develop and apply a framework for evaluating the assumption of stationarity of the variances in the presence of systematic changes in exposure levels over time.
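As a hedged illustration of the modeling approach described above, the sketch below fits a mixed effects model with a fixed temporal trend and random worker intercepts to log-transformed exposures; the file and column names (nickel_exposures.csv, worker, year, exposure) are hypothetical, not taken from the study.

```python
# A minimal sketch (not the study's actual code) of a mixed effects model
# with a temporal trend for log-transformed exposure measurements.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("nickel_exposures.csv")        # hypothetical input file
df["log_exposure"] = np.log(df["exposure"])     # exposures are ~lognormal

# A fixed effect for calendar year captures the downward temporal trend;
# a random intercept per worker captures between-worker variability.
model = smf.mixedlm("log_exposure ~ year", df, groups=df["worker"])
fit = model.fit()

between_var = float(fit.cov_re.iloc[0, 0])  # between-worker variance
within_var = fit.scale                      # within-worker (residual) variance
print(fit.summary())
print("between/within variance ratio:", between_var / within_var)
```

Under such a model, the ratio of between-worker to within-worker variance is one common indicator of how homogeneous a group's exposures are once the trend is accounted for.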

Relevance:

30.00%

Publisher:

Abstract:

Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new incoming unseen data instances using the learned classifier.
Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing in the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or it can be categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying them. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly use Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data, proposed from two different points of view. The first method, named CB-MBC, is based on a greedy forward wrapper selection approach, while the second, named MB-MBC, is a constraint-based filter approach built on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson's disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy, and the analysis of the learned MBC graphical structures identifies known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting nature; that is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model obsolete and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested on synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware.
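The drift detection step just described can be illustrated with a short, hedged sketch (not the authors' implementation): it estimates the Kullback-Leibler divergence between a reference window and a current window of a one-dimensional feature using histograms, and calibrates the detection threshold by bootstrapping under the no-drift null. Window sizes, bin counts, and the significance level are all assumptions.

```python
# Hedged sketch of KL-divergence drift detection with a bootstrap threshold,
# in the spirit of CPL-DS. Tests feature drift between two windows.
import numpy as np

def kl_divergence(p_samples, q_samples, bins=20):
    """Histogram-based KL(P || Q) estimate with additive smoothing."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, _ = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi))
    p = (p + 1) / (p + 1).sum()   # smooth to avoid zero-count bins
    q = (q + 1) / (q + 1).sum()
    return float(np.sum(p * np.log(p / q)))

def drift_detected(reference, current, n_boot=500, alpha=0.05, rng=None):
    """Bootstrap the no-drift null by permuting the pooled windows."""
    rng = rng or np.random.default_rng(0)
    observed = kl_divergence(reference, current)
    pooled = np.concatenate([reference, current])
    null_kls = []
    for _ in range(n_boot):
        resampled = rng.permutation(pooled)
        null_kls.append(kl_divergence(resampled[:len(reference)],
                                      resampled[len(reference):]))
    threshold = np.quantile(null_kls, 1 - alpha)
    return observed > threshold
```

In the full method, applying such a test to the feature distribution, to the class posterior, or to both would distinguish feature, conditional, and dual drift.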
Experimental results show that our approach is effective at detecting different kinds of drift from partially labeled data streams while maintaining good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out on synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
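The Page-Hinkley monitoring described above can be sketched in its generic textbook form, with assumed parameter values (this is not the thesis code). It flags a sustained drop in a monitored score, here the per-batch average log-likelihood.

```python
# Page-Hinkley test tuned to flag a sustained DROP in a monitored score.
class PageHinkley:
    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta   # tolerated magnitude of fluctuation
        self.lam = lam       # alarm threshold
        self.mean = 0.0      # running mean of the score
        self.n = 0
        self.cum = 0.0       # cumulative sum of (mean - x - delta)
        self.min_cum = 0.0

    def update(self, x):
        """Feed one average log-likelihood; returns True if drift is flagged."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # When x falls persistently below its running mean, cum grows.
        self.cum += self.mean - x - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.lam
```

On an alarm, LA-MB-MBC would relearn only the local structure around the changed nodes, while GA-MB-MBC would relearn the whole classifier from scratch.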

Relevance:

30.00%

Publisher:

Abstract:

The study of the large-sample distribution of the canonical correlations and variates in cointegrated models is extended from the first-order autoregression model to autoregression of any (finite) order. The cointegrated process considered here is nonstationary in some directions and stationary in others, but the first difference (the “error-correction form”) is stationary. The asymptotic distribution of the canonical correlations between the first differences and the predictor variables, as well as of the corresponding canonical variables, is obtained under the assumption that the process is Gaussian. The method of analysis is similar to that used for the first-order process.
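As a numerical illustration of the quantities studied, the sketch below computes sample canonical correlations between the first differences and the lagged levels of a vector process in the first-order case. This is our own minimal example, not the paper's derivation.

```python
# Hedged sketch: sample canonical correlations between the first differences
# dy_t and the lagged levels y_{t-1} of a VAR(1), first-order case.
import numpy as np

def canonical_correlations(y):
    """y: (T, k) array of observations."""
    dy = np.diff(y, axis=0)          # first differences, shape (T-1, k)
    ylag = y[:-1]                    # lagged levels, shape (T-1, k)
    dy = dy - dy.mean(axis=0)
    ylag = ylag - ylag.mean(axis=0)
    n = len(dy)
    s00 = dy.T @ dy / n              # Cov of differences
    s11 = ylag.T @ ylag / n          # Cov of lagged levels
    s01 = dy.T @ ylag / n            # cross-covariance
    # Squared canonical correlations are the eigenvalues of
    # S00^{-1} S01 S11^{-1} S01'.
    m = np.linalg.solve(s00, s01) @ np.linalg.solve(s11, s01.T)
    eigvals = np.sort(np.linalg.eigvals(m).real)[::-1]
    return np.sqrt(np.clip(eigvals, 0, 1))

# Demo: a bivariate system sharing one random walk has one cointegrating
# relation, so one canonical correlation is large and the other small.
rng = np.random.default_rng(1)
rw = np.cumsum(rng.normal(size=(500, 1)), axis=0)
noise = rng.normal(size=(500, 1))
y = np.hstack([rw + noise, rw - noise])
print(canonical_correlations(y))
```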

Relevance:

30.00%

Publisher:

Abstract:

A distribution of tumor size at detection is derived within the framework of a mechanistic model of carcinogenesis, with the aim of estimating biologically meaningful parameters of tumor latency. Its limiting form appears to be a generalization of the distribution that arises in length-biased sampling from stationary point processes. The model renders the associated estimation problems tractable. The usefulness of the proposed approach is illustrated with an application to clinical data on premenopausal breast cancer.

Relevance:

30.00%

Publisher:

Abstract:

Bibliography: p. 146.

Relevance:

30.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 60G70, 60F12, 60G10.

Relevance:

30.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: Primary 60J80, Secondary 62F12, 60G99.

Relevance:

30.00%

Publisher:

Abstract:

The strategy to control the noxious water hyacinth requires a reliable database on the magnitude of the weed problem. This report, which presents up-to-date (1997) information on the distribution, coverage and movement of the weed in Uganda, is intended to provide this basis as a supplement to other inputs for control planning. The report is the first in a series being prepared by FIRI from results of the "Water hyacinth research project", whose major goal is to supply the information needed to define and focus the scope of the control process for water hyacinth, in view of the fact that the weed is expected to remain a permanent feature of the aquatic landscape of Uganda. Water hyacinth is firmly established in lakes Victoria, Kyoga and Albert and along the River Nile, distributed in two distinct forms: as stationary fringes along the shoreline, and as mobile mats and large "fields" usually found in sheltered bays.

Relevance:

30.00%

Publisher:

Abstract:

Statistical approaches to study extreme events require, by definition, long time series of data. In many scientific disciplines, these series are often subject to variations at different temporal scales that affect the frequency and intensity of their extremes. Therefore, the assumption of stationarity is violated and alternative methods to conventional stationary extreme value analysis (EVA) must be adopted. Using the example of environmental variables subject to climate change, in this study we introduce the transformed-stationary (TS) methodology for non-stationary EVA. This approach consists of (i) transforming a non-stationary time series into a stationary one, to which the stationary EVA theory can be applied, and (ii) reverse transforming the result into a non-stationary extreme value distribution. As a transformation, we propose and discuss a simple time-varying normalization of the signal and show that it enables a comprehensive formulation of non-stationary generalized extreme value (GEV) and generalized Pareto distribution (GPD) models with a constant shape parameter. A validation of the methodology is carried out on time series of significant wave height, residual water level, and river discharge, which show varying degrees of long-term and seasonal variability. The results from the proposed approach are comparable with the results from (a) a stationary EVA on quasi-stationary slices of non-stationary series and (b) the established method for non-stationary EVA. However, the proposed technique comes with advantages in both cases. For example, in contrast to (a), the proposed technique uses the whole time horizon of the series for the estimation of the extremes, allowing for a more accurate estimation of large return levels. Furthermore, with respect to (b), it decouples the detection of non-stationary patterns from the fitting of the extreme value distribution. As a result, the steps of the analysis are simplified and intermediate diagnostics are possible. In particular, the transformation can be carried out by means of simple statistical techniques such as low-pass filters based on the running mean and the standard deviation, and the fitting procedure is a stationary one with a few degrees of freedom and is easy to implement and control. An open-source MATLAB toolbox has been developed to cover this methodology, which is available at https://github.com/menta78/tsEva/ (Mentaschi et al., 2016).
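A minimal sketch of the TS workflow, under simplifying assumptions (a datetime-indexed hourly series, annual block maxima, rolling-window normalization), might look as follows. This illustrates the idea only; it is not the tsEva toolbox itself.

```python
# Hedged sketch of the transformed-stationary (TS) idea: normalize by a
# running mean and standard deviation, fit a stationary GEV to the residual,
# then map the return level back to the original, time-varying scale.
import numpy as np
import pandas as pd
from scipy.stats import genextreme

def ts_eva_return_level(series, window=365 * 24, block="Y", return_period=100):
    """series: datetime-indexed pd.Series, e.g. hourly significant wave height."""
    # (i) transform: time-varying normalization with low-pass statistics
    mu_t = series.rolling(window, center=True, min_periods=1).mean()
    sigma_t = series.rolling(window, center=True, min_periods=1).std()
    stationary = (series - mu_t) / sigma_t

    # stationary EVA: fit a GEV to block (here annual) maxima
    maxima = stationary.resample(block).max().dropna()
    shape, loc, scale = genextreme.fit(maxima)
    z = genextreme.ppf(1 - 1 / return_period, shape, loc=loc, scale=scale)

    # (ii) reverse transform: a non-stationary return level curve
    return mu_t + sigma_t * z
```

Because the normalization absorbs the long-term and seasonal variability, the GEV fit itself stays stationary with few degrees of freedom, which mirrors the decoupling the abstract emphasizes.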

Relevance:

30.00%

Publisher:

Abstract:

Sequential panel selection methods (spsms, procedures that sequentially use conventional panel unit root tests to identify I(0) time series in panels) are increasingly used in the empirical literature. We check the reliability of spsms by using Monte Carlo simulations based on directly generating the individual asymptotic p values to be combined into the panel unit root tests, in this way isolating the classification abilities of the procedures from the small sample properties of the underlying univariate unit root tests. The simulations consider both independent and cross-dependent individual test statistics. Results suggest that spsms may offer advantages over time series tests only under special conditions.
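The simulation design can be illustrated with a hedged sketch of one Monte Carlo replication: individual asymptotic p values are drawn directly (Uniform(0,1) for I(1) series and, as an arbitrary choice, Beta(0.3, 1), concentrated near zero, for I(0) series), combined with Fisher's statistic, and the series with the smallest p value is removed while the panel test rejects. This is our own minimal example, not the paper's exact design.

```python
# Hedged sketch of a sequential panel selection procedure (spsm) driven by
# directly generated p values, in the spirit of the simulation described.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)

def spsm(p_values, alpha=0.05):
    """Return indices classified as I(0) by sequential selection."""
    remaining = list(range(len(p_values)))
    selected = []
    while remaining:
        p = p_values[remaining]
        fisher = -2 * np.sum(np.log(p))            # Fisher combination
        panel_p = chi2.sf(fisher, df=2 * len(remaining))
        if panel_p > alpha:                        # panel no longer rejects
            break
        drop = remaining[int(np.argmin(p))]        # most stationary-looking
        selected.append(drop)
        remaining.remove(drop)
    return selected

# One replication: 5 stationary series among 20.
n, n_i0 = 20, 5
p_vals = np.concatenate([rng.beta(0.3, 1, n_i0), rng.uniform(0, 1, n - n_i0)])
print("classified I(0):", spsm(p_vals), "(true I(0): indices 0-4)")
```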

Relevance:

30.00%

Publisher:

Abstract:

Anomaly detection in a wireless sensor network (WSN) is an important aspect of data analysis, serving to identify data items that differ significantly from normal data. A characteristic of the data generated by a WSN is that the data distribution may alter over the lifetime of the network due to the changing nature of the phenomenon being observed. Anomaly detection techniques must therefore be able to adapt to a non-stationary data distribution in order to perform optimally. In this survey, we provide a comprehensive overview of approaches to anomaly detection in a WSN and their operation in a non-stationary environment.
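As a toy illustration of why adaptation matters (our own example, not drawn from the survey), an exponentially weighted mean and variance let an anomaly threshold track a drifting sensor signal:

```python
# Minimal adaptive anomaly detector: the baseline follows a non-stationary
# signal via exponentially weighted moments, so the z-score threshold adapts.
import numpy as np

class AdaptiveDetector:
    def __init__(self, alpha=0.01, z_threshold=4.0):
        self.alpha = alpha              # forgetting factor
        self.z = z_threshold
        self.mean = None
        self.var = 1.0

    def update(self, x):
        """Return True if x is anomalous relative to the adaptive baseline."""
        if self.mean is None:
            self.mean = x
            return False
        score = abs(x - self.mean) / np.sqrt(self.var)
        anomalous = score > self.z
        if not anomalous:               # adapt only on normal readings
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff ** 2)
        return anomalous
```

A detector with a fixed baseline would, by contrast, raise ever more false alarms as the observed phenomenon drifts away from the conditions it was calibrated on.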

Relevance:

30.00%

Publisher:

Abstract:

We use a probing strategy to estimate the time-dependent traffic intensity in an Mt/Gt/1 queue, where the arrival rate and the general service-time distribution change from one time interval to another, and we derive statistical properties of the proposed estimator. We present a method to detect a switch from one stationary interval to another using a sequence of probes, in order to improve the estimation. Finally, we compare our results with two estimators proposed in the literature for the M/G/1 queue.
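One simple probing idea can be sketched as follows (our own illustration, not the paper's estimator): by the PASTA property, Poisson probes find the server busy with probability equal to the traffic intensity, so the fraction of delayed probes estimates it within each stationary interval, and a crude two-sample proportion test between consecutive probe batches can flag a switch.

```python
# Hedged sketch: probe-based estimation of traffic intensity and a simple
# switch detector between consecutive probe batches.
import numpy as np

def estimate_rho(probe_delays):
    """probe_delays: waiting times of Poisson probes; a positive wait means
    the probe found the server busy, which (by PASTA) happens with
    probability rho in a stable interval."""
    return float((np.asarray(probe_delays) > 0).mean())

def detect_switch(probe_delays, split, z=3.0):
    """Two-sample proportion test between batches before/after `split`."""
    a = np.asarray(probe_delays[:split]) > 0
    b = np.asarray(probe_delays[split:]) > 0
    p = np.concatenate([a, b]).mean()
    se = np.sqrt(p * (1 - p) * (1 / len(a) + 1 / len(b)))
    return abs(a.mean() - b.mean()) > z * se
```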