83 resultados para Transformation-based semi-parametric estimators
em Reposit
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Concept drift is a problem of increasing importance in machine learning and data mining. Data sets under analysis are no longer only static databases, but also data streams in which concepts and data distributions may not be stable over time. However, most learning algorithms produced so far are based on the assumption that data comes from a fixed distribution, so they are not suitable to handle concept drifts. Moreover, some concept drifts applications requires fast response, which means an algorithm must always be (re) trained with the latest available data. But the process of labeling data is usually expensive and/or time consuming when compared to unlabeled data acquisition, thus only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are also based on the assumption that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenge in machine learning. Recently, a particle competition and cooperation approach was used to realize graph-based semi-supervised learning from static data. In this paper, we extend that approach to handle data streams and concept drift. The result is a passive algorithm using a single classifier, which naturally adapts to concept changes, without any explicit drift detection mechanism. Its built-in mechanisms provide a natural way of learning from new data, gradually forgetting older knowledge as older labeled data items became less influent on the classification of newer data items. Some computer simulation are presented, showing the effectiveness of the proposed method.
Resumo:
Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion or even the entire data set. This paper aims to address this problem by presenting a graph-based (network-based) semi-supervised learning method, specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and it incorporates some features to make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentage of mislabeled data, in networks of different sizes and average node degree. Importantly, these simulations reveals the existence of the critical points of the mislabeled subset size, below which the network is free of wrong label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made among the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method has increasing better performance than the others as the percentage of mislabeled samples is getting larger. © 2012 IEEE.
Resumo:
Concept drift, which refers to non stationary learning problems over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But the process of data labeling is usually expensive and/or time consuming when compared to acquisition of unlabeled data, thus usually only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are based on assumptions that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to realize graph-based semi-supervised learning from static data. We have extend that approach to handle data streams and concept drift. The result is a passive algorithm which uses a single classifier approach, naturally adapted to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items are no longer useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 Data of network intrusion, showing its effectiveness.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The GPS observables are subject to several errors. Among them, the systematic ones have great impact, because they degrade the accuracy of the accomplished positioning. These errors are those related, mainly, to GPS satellites orbits, multipath and atmospheric effects. Lately, a method has been suggested to mitigate these errors: the semiparametric model and the penalised least squares technique (PLS). In this method, the errors are modeled as functions varying smoothly in time. It is like to change the stochastic model, in which the errors functions are incorporated, the results obtained are similar to those in which the functional model is changed. As a result, the ambiguities and the station coordinates are estimated with better reliability and accuracy than the conventional least square method (CLS). In general, the solution requires a shorter data interval, minimizing costs. The method performance was analyzed in two experiments, using data from single frequency receivers. The first one was accomplished with a short baseline, where the main error was the multipath. In the second experiment, a baseline of 102 km was used. In this case, the predominant errors were due to the ionosphere and troposphere refraction. In the first experiment, using 5 minutes of data collection, the largest coordinates discrepancies in relation to the ground truth reached 1.6 cm and 3.3 cm in h coordinate for PLS and the CLS, respectively, in the second one, also using 5 minutes of data, the discrepancies were 27 cm in h for the PLS and 175 cm in h for the CLS. In these tests, it was also possible to verify a considerable improvement in the ambiguities resolution using the PLS in relation to the CLS, with a reduced data collection time interval. © Springer-Verlag Berlin Heidelberg 2007.
Resumo:
Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. © 2012 Springer-Verlag.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Pós-graduação em Agronomia (Energia na Agricultura) - FCA
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
A new series of high temperature copper based shape memory alloys has recently been patented. These alloys contain 8-20 wt% Al, 1-20 wt% Ag, 0-2 wt% of a minor element (preferably Co), balance copper. The martensitic start transformation temperatures of these alloys are above 200 degrees C and, in some cases, they have good high temperature stability and may be useful in commercial applications where higher operating temperatures than those obtained from Cu-Zn-Al and Cu-Al-Ni shape memory alloys are required.
Resumo:
In this paper is proposed a methodology for semiautomatic CBERS image orientation using roads as ground control. It is based on an iterative strategy involving three steps. In the first step, an operator identifies on the image the ground control roads and supplies along them a few seed points, which could be sparsely and coarsely distributed. These seed points are used by the dynamic programming algorithm for extracting the ground control roads from the image. In the second step, it is established the correspondences between points describing the ground control roads and the corresponding ones extracted from the image. In the last step, the corresponding points are used to orient the CBERS image by using the DLT (Direct Linear Transformation). The two last steps are iterated until the convergence of the orientation process is verified. Experimental results showed that the proposed methodology was efficient with several test images. In all cases the orientation process converged. Moreover, the estimated orientation parameters allowed the registration of check roads with pixel accuracy or better.
Resumo:
Parametric VaR (Value-at-Risk) is widely used due to its simplicity and easy calculation. However, the normality assumption, often used in the estimation of the parametric VaR, does not provide satisfactory estimates for risk exposure. Therefore, this study suggests a method for computing the parametric VaR based on goodness-of-fit tests using the empirical distribution function (EDF) for extreme returns, and compares the feasibility of this method for the banking sector in an emerging market and in a developed one. The paper also discusses possible theoretical contributions in related fields like enterprise risk management (ERM). © 2013 Elsevier Ltd.
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
The Box-Cox transformation is a technique mostly utilized to turn the probabilistic distribution of a time series data into approximately normal. And this helps statistical and neural models to perform more accurate forecastings. However, it introduces a bias when the reversion of the transformation is conducted with the predicted data. The statistical methods to perform a bias-free reversion require, necessarily, the assumption of Gaussianity of the transformed data distribution, which is a rare event in real-world time series. So, the aim of this study was to provide an effective method of removing the bias when the reversion of the Box-Cox transformation is executed. Thus, the developed method is based on a focused time lagged feedforward neural network, which does not require any assumption about the transformed data distribution. Therefore, to evaluate the performance of the proposed method, numerical simulations were conducted and the Mean Absolute Percentage Error, the Theil Inequality Index and the Signal-to-Noise ratio of 20-step-ahead forecasts of 40 time series were compared, and the results obtained indicate that the proposed reversion method is valid and justifies new studies. (C) 2014 Elsevier B.V. All rights reserved.