924 resultados para Automatic Analysis of Multivariate Categorical Data Sets
Resumo:
This thesis proposes a framework for identifying the root-cause of a voltage disturbance, as well as, its source location (upstream/downstream) from the monitoring place. The framework works with three-phase voltage and current waveforms collected in radial distribution networks without distributed generation. Real-world and synthetic waveforms are used to test it. The framework involves features that are conceived based on electrical principles, and assuming some hypothesis on the analyzed phenomena. Features considered are based on waveforms and timestamp information. Multivariate analysis of variance and rule induction algorithms are applied to assess the amount of meaningful information explained by each feature, according to the root-cause of the disturbance and its source location. The obtained classification rates show that the proposed framework could be used for automatic diagnosis of voltage disturbances collected in radial distribution networks. Furthermore, the diagnostic results can be subsequently used for supporting power network operation, maintenance and planning.
Resumo:
We compare European Centre for Medium-Range Weather Forecasts 15-year reanalysis (ERA-15) moisture over the tropical oceans with satellite observations and the U.S. National Centers for Environmental Prediction (NCEP) National Center for Atmospheric Research 40-year reanalysis. When systematic differences in moisture between the observational and reanalysis data sets are removed, the NCEP data show excellent agreement with the observations while the ERA-15 variability exhibits remarkable differences. By forcing agreement between ERA-15 column water vapor and the observations, where available, by scaling the entire moisture column accordingly, the height-dependent moisture variability remains unchanged for all but the 550–850 hPa layer, where the moisture variability reduces significantly. Thus the excess variation of column moisture in ERA-15 appears to originate in this layer. The moisture variability provided by ERA-15 is not deemed of sufficient quality for use in the validation of climate models.
Resumo:
The complexity inherent in climate data makes it necessary to introduce more than one statistical tool to the researcher to gain insight into the climate system. Empirical orthogonal function (EOF) analysis is one of the most widely used methods to analyze weather/climate modes of variability and to reduce the dimensionality of the system. Simple structure rotation of EOFs can enhance interpretability of the obtained patterns but cannot provide anything more than temporal uncorrelatedness. In this paper, an alternative rotation method based on independent component analysis (ICA) is considered. The ICA is viewed here as a method of EOF rotation. Starting from an initial EOF solution rather than rotating the loadings toward simplicity, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain. If the underlying climate signals have an independent forcing, one can expect to find loadings with interpretable patterns whose time coefficients have properties that go beyond simple noncorrelation observed in EOFs. The methodology is presented and an application to monthly means sea level pressure (SLP) field is discussed. Among the rotated (to independence) EOFs, the North Atlantic Oscillation (NAO) pattern, an Arctic Oscillation–like pattern, and a Scandinavian-like pattern have been identified. There is the suggestion that the NAO is an intrinsic mode of variability independent of the Pacific.
Resumo:
There are now considerable expectations that semi-distributed models are useful tools for supporting catchment water quality management. However, insufficient attention has been given to evaluating the uncertainties inherent to this type of model, especially those associated with the spatial disaggregation of the catchment. The Integrated Nitrogen in Catchments model (INCA) is subjected to an extensive regionalised sensitivity analysis in application to the River Kennet, part of the groundwater-dominated upper Thames catchment, UK The main results are: (1) model output was generally insensitive to land-phase parameters, very sensitive to groundwater parameters, including initial conditions, and significantly sensitive to in-river parameters; (2) INCA was able to produce good fits simultaneously to the available flow, nitrate and ammonium in-river data sets; (3) representing parameters as heterogeneous over the catchment (206 calibrated parameters) rather than homogeneous (24 calibrated parameters) produced a significant improvement in fit to nitrate but no significant improvement to flow and caused a deterioration in ammonium performance; (4) the analysis indicated that calibrating the flow-related parameters first, then calibrating the remaining parameters (as opposed to calibrating all parameters together) was not a sensible strategy in this case; (5) even the parameters to which the model output was most sensitive suffered from high uncertainty due to spatial inconsistencies in the estimated optimum values, parameter equifinality and the sampling error associated with the calibration method; (6) soil and groundwater nutrient and flow data are needed to reduce. uncertainty in initial conditions, residence times and nitrogen transformation parameters, and long-term historic data are needed so that key responses to changes in land-use management can be assimilated. The results indicate the general, difficulty of reconciling the questions which catchment nutrient models are expected to answer with typically limited data sets and limited knowledge about suitable model structures. The results demonstrate the importance of analysing semi-distributed model uncertainties prior to model application, and illustrate the value and limitations of using Monte Carlo-based methods for doing so. (c) 2005 Elsevier B.V. All rights reserved.
Resumo:
The Representative Soil Sampling Scheme of England and Wales has recorded information on the soil of agricultural land in England and Wales since 1969. It is a valuable source of information about the soil in the context of monitoring for sustainable agricultural development. Changes in soil nutrient status and pH were examined over the period 1971-2001. Several methods of statistical analysis were applied to data from the surveys during this period. The main focus here is on the data for 1971, 1981, 1991 and 2001. The results of examining change over time in general show that levels of potassium in the soil have increased, those of magnesium have remained fairly constant, those of phosphorus have declined and pH has changed little. Future sampling needs have been assessed in the context of monitoring, to determine the mean at a given level of confidence and tolerable error and to detect change in the mean over time at these same levels over periods of 5 and 10 years. The results of a non-hierarchical multivariate classification suggest that England and Wales could be stratified to optimize future sampling and analysis. To monitor soil quality and health more generally than for agriculture, more of the country should be sampled and a wider range of properties recorded.
Resumo:
The elucidation of spatial variation in the landscape can indicate potential wildlife habitats or breeding sites for vectors, such as ticks or mosquitoes, which cause a range of diseases. Information from remotely sensed data could aid the delineation of vegetation distribution on the ground in areas where local knowledge is limited. The data from digital images are often difficult to interpret because of pixel-to-pixel variation, that is, noise, and complex variation at more than one spatial scale. Landsat Thematic Mapper Plus (ETM+) and Satellite Pour l'Observation de La Terre (SPOT) image data were analyzed for an area close to Douna in Mali, West Africa. The variograms of the normalized difference vegetation index (NDVI) from both types of image data were nested. The parameters of the nested variogram function from the Landsat ETM+ data were used to design the sampling for a ground survey of soil and vegetation data. Variograms of the soil and vegetation data showed that their variation was anisotropic and their scales of variation were similar to those of NDVI from the SPOT data. The short- and long-range components of variation in the SPOT data were filtered out separately by factorial kriging. The map of the short-range component appears to represent the patterns of vegetation and associated shallow slopes and drainage channels of the tiger bush system. The map of the long-range component also appeared to relate to broader patterns in the tiger bush and to gentle undulations in the topography. The results suggest that the types of image data analyzed in this study could be used to identify areas with more moisture in semiarid regions that could support wildlife and also be potential vector breeding sites.
Resumo:
We use the third perihelion pass by the Ulysses spacecraft to illustrate and investigate the “flux excess” effect, whereby open solar flux estimates from spacecraft increase with increasing heliocentric distance. We analyze the potential effects of small-scale structure in the heliospheric field (giving fluctuations in the radial component on timescales smaller than 1 h) and kinematic time-of-flight effects of longitudinal structure in the solar wind flow. We show that the flux excess is explained by neither very small-scale structure (timescales < 1 h) nor by the kinematic “bunching effect” on spacecraft sampling. The observed flux excesses is, however, well explained by the kinematic effect of larger-scale (>1 day) solar wind speed variations on the frozen-in heliospheric field. We show that averaging over an interval T (that is long enough to eliminate structure originating in the heliosphere yet small enough to avoid cancelling opposite polarity radial field that originates from genuine sector structure in the coronal source field) is only an approximately valid way of allowing for these effects and does not adequately explain or account for differences between the streamer belt and the polar coronal holes.
Resumo:
Genetic association analyses of family-based studies with ordered categorical phenotypes are often conducted using methods either for quantitative or for binary traits, which can lead to suboptimal analyses. Here we present an alternative likelihood-based method of analysis for single nucleotide polymorphism (SNP) genotypes and ordered categorical phenotypes in nuclear families of any size. Our approach, which extends our previous work for binary phenotypes, permits straightforward inclusion of covariate, gene-gene and gene-covariate interaction terms in the likelihood, incorporates a simple model for ascertainment and allows for family-specific effects in the hypothesis test. Additionally, our method produces interpretable parameter estimates and valid confidence intervals. We assess the proposed method using simulated data, and apply it to a polymorphism in the c-reactive protein (CRP) gene typed in families collected to investigate human systemic lupus erythematosus. By including sex interactions in the analysis, we show that the polymorphism is associated with anti-nuclear autoantibody (ANA) production in females, while there appears to be no effect in males.