5 resultados para time complexity
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
Resumo:
Complexity in time series is an intriguing feature of living dynamical systems, with potential use for identification of system state. Although various methods have been proposed for measuring physiologic complexity, uncorrelated time series are often assigned high values of complexity, errouneously classifying them as a complex physiological signals. Here, we propose and discuss a method for complex system analysis based on generalized statistical formalism and surrogate time series. Sample entropy (SampEn) was rewritten inspired in Tsallis generalized entropy, as function of q parameter (qSampEn). qSDiff curves were calculated, which consist of differences between original and surrogate series qSampEn. We evaluated qSDiff for 125 real heart rate variability (HRV) dynamics, divided into groups of 70 healthy, 44 congestive heart failure (CHF), and 11 atrial fibrillation (AF) subjects, and for simulated series of stochastic and chaotic process. The evaluations showed that, for nonperiodic signals, qSDiff curves have a maximum point (qSDiff(max)) for q not equal 1. Values of q where the maximum point occurs and where qSDiff is zero were also evaluated. Only qSDiff(max) values were capable of distinguish HRV groups (p-values 5.10 x 10(-3); 1.11 x 10(-7), and 5.50 x 10(-7) for healthy vs. CHF, healthy vs. AF, and CHF vs. AF, respectively), consistently with the concept of physiologic complexity, and suggests a potential use for chaotic system analysis. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4758815]
Resumo:
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
Resumo:
Stochastic methods based on time-series modeling combined with geostatistics can be useful tools to describe the variability of water-table levels in time and space and to account for uncertainty. Monitoring water-level networks can give information about the dynamic of the aquifer domain in both dimensions. Time-series modeling is an elegant way to treat monitoring data without the complexity of physical mechanistic models. Time-series model predictions can be interpolated spatially, with the spatial differences in water-table dynamics determined by the spatial variation in the system properties and the temporal variation driven by the dynamics of the inputs into the system. An integration of stochastic methods is presented, based on time-series modeling and geostatistics as a framework to predict water levels for decision making in groundwater management and land-use planning. The methodology is applied in a case study in a Guarani Aquifer System (GAS) outcrop area located in the southeastern part of Brazil. Communication of results in a clear and understandable form, via simulated scenarios, is discussed as an alternative, when translating scientific knowledge into applications of stochastic hydrogeology in large aquifers with limited monitoring network coverage like the GAS.
Resumo:
Semi-qualitative probabilistic networks (SQPNs) merge two important graphical model formalisms: Bayesian networks and qualitative probabilistic networks. They provade a very Complexity of inferences in polytree-shaped semi-qualitative probabilistic networks and qualitative probabilistic networks. They provide a very general modeling framework by allowing the combination of numeric and qualitative assessments over a discrete domain, and can be compactly encoded by exploiting the same factorization of joint probability distributions that are behind the bayesian networks. This paper explores the computational complexity of semi-qualitative probabilistic networks, and takes the polytree-shaped networks as its main target. We show that the inference problem is coNP-Complete for binary polytrees with multiple observed nodes. We also show that interferences can be performed in time linear in the number of nodes if there is a single observed node. Because our proof is construtive, we obtain an efficient linear time algorithm for SQPNs under such assumptions. To the best of our knowledge, this is the first exact polynominal-time algorithm for SQPn. Together these results provide a clear picture of the inferential complexity in polytree-shaped SQPNs.