796 results for Empirical Algorithm Analysis
Abstract:
The complexity inherent in climate data makes it necessary to offer the researcher more than one statistical tool to gain insight into the climate system. Empirical orthogonal function (EOF) analysis is one of the most widely used methods for analyzing weather/climate modes of variability and for reducing the dimensionality of the system. Simple structure rotation of EOFs can enhance the interpretability of the obtained patterns but cannot provide anything more than temporal uncorrelatedness. In this paper, an alternative rotation method based on independent component analysis (ICA) is considered. ICA is viewed here as a method of EOF rotation. Starting from an initial EOF solution, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain, rather than rotating the loadings toward simplicity. If the underlying climate signals have independent forcing, one can expect to find loadings with interpretable patterns whose time coefficients have properties going beyond the simple noncorrelation observed in EOFs. The methodology is presented and an application to the monthly mean sea level pressure (SLP) field is discussed. Among the rotated (to independence) EOFs, the North Atlantic Oscillation (NAO) pattern, an Arctic Oscillation–like pattern, and a Scandinavian-like pattern are identified. The results suggest that the NAO is an intrinsic mode of variability independent of the Pacific.
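A minimal sketch of the ICA-as-EOF-rotation idea on synthetic data: compute EOFs by PCA, then rotate the retained subspace toward temporal independence with FastICA. The field dimensions, component count, and all data are illustrative assumptions, not values from the paper.

```python
# Sketch only: EOF analysis followed by ICA rotation on synthetic data.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Hypothetical anomaly field: 480 monthly time steps x 200 grid points.
X = rng.standard_normal((480, 200))
X -= X.mean(axis=0)                        # remove the time mean

# Step 1: conventional EOF analysis (PCA of the anomaly field).
pca = PCA(n_components=5)
pcs = pca.fit_transform(X)                 # time coefficients (PCs)
eofs = pca.components_                     # spatial loading patterns

# Step 2: rotate the retained subspace toward temporal independence,
# rather than toward spatial simplicity as in varimax-style rotation.
ica = FastICA(n_components=5, whiten="unit-variance", random_state=0)
ics = ica.fit_transform(pcs)               # independent time coefficients
rotated_patterns = ica.mixing_.T @ eofs    # correspondingly rotated loadings
```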
Phosphorus dynamics and export in streams draining micro-catchments: Development of empirical models
Abstract:
Annual total phosphorus (TP) export data from 108 European micro-catchments were analyzed against descriptive catchment data on climate (runoff), soil types, catchment size, and land use. The best empirical model developed included runoff, the proportion of agricultural land, and catchment size as explanatory variables, but explained little of the variance in the dataset (R² = 0.37). Improved country-specific empirical models could be developed in some cases. The best example was from Norway, where an analysis of TP export data from 12 predominantly agricultural micro-catchments revealed a relationship explaining 96% of the variance in TP export. The explanatory variables in this case were soil P status (P-AL), the proportion of organic soil, and the export of suspended sediment. Another example is from Denmark, where an empirical model was established for the annual average TP export from 24 catchments, with the percentage of sandy soils, percentage of organic soils, runoff, and application of phosphorus in fertilizer and animal manure as explanatory variables (R² = 0.97).
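A minimal sketch of this kind of empirical export model: ordinary least squares with runoff, proportion of agricultural land, and catchment size as explanatory variables. All data values below are invented stand-ins; the paper's coefficients are not reproduced.

```python
# Sketch only: OLS export model with three catchment descriptors.
import numpy as np

rng = np.random.default_rng(1)
n = 108                                    # number of micro-catchments
runoff = rng.uniform(100, 900, n)          # mm / yr
agri = rng.uniform(0, 100, n)              # % agricultural land
area = rng.uniform(0.1, 30, n)             # km^2
tp_export = 0.02 * runoff + 0.5 * agri - 0.8 * area + rng.normal(0, 20, n)

X = np.column_stack([np.ones(n), runoff, agri, area])
beta, *_ = np.linalg.lstsq(X, tp_export, rcond=None)
pred = X @ beta
r2 = 1 - ((tp_export - pred) ** 2).sum() / ((tp_export - tp_export.mean()) ** 2).sum()
print(f"coefficients: {np.round(beta, 3)}, R^2 = {r2:.2f}")
```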
Abstract:
Across Europe, elevated phosphorus (P) concentrations in lowland rivers have made them particularly susceptible to eutrophication. This is compounded in the southern and central UK by increasing pressures on water resources, which may be further enhanced by the potential effects of climate change. The EU Water Framework Directive requires an integrated approach to water resources management at the catchment scale and highlights the need for modelling tools that can distinguish the relative contributions from multiple nutrient sources and are consistent with the information content of the available data. Two such models are introduced and evaluated within a stochastic framework using daily flow and total phosphorus concentrations recorded in a clay catchment typical of many areas of the lowland UK. Both models disaggregate empirical annual load estimates, derived from land use data, as a function of surface/near-surface runoff generated using a simple conceptual rainfall-runoff model. Estimates of the daily load from agricultural land, together with those from baseflow and point sources, feed into an in-stream routing algorithm. The first model assumes constant concentrations in runoff via surface/near-surface pathways and incorporates an additional P store in the river-bed sediments, depleted above a critical discharge, to explicitly simulate resuspension. The second, simpler model simulates P concentrations as a function of surface/near-surface runoff, thus emphasising the influence of non-point source loads during flow peaks and the mixing of baseflow and point sources during low flows. The temporal consistency of parameter estimates, and thus the suitability of each approach, is assessed dynamically following a new approach based on Monte Carlo analysis.
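A minimal sketch of a Monte Carlo consistency check in the spirit described: sample a single runoff-concentration parameter, score it on two sub-periods separately, and compare the behavioural ranges. The one-parameter model form (load = c x runoff) and the synthetic data are assumptions for illustration, not the paper's models.

```python
# Sketch only: Monte Carlo check of parameter consistency across periods.
import numpy as np

rng = np.random.default_rng(2)
runoff = rng.gamma(2.0, 1.5, 730)                   # two years of daily runoff
obs_load = 0.4 * runoff + rng.normal(0, 0.2, 730)   # synthetic "observed" P load

candidates = rng.uniform(0.0, 1.0, 5000)            # sampled parameter values
for name, period in [("year 1", slice(0, 365)), ("year 2", slice(365, 730))]:
    sse = np.array([np.sum((obs_load[period] - c * runoff[period]) ** 2)
                    for c in candidates])
    behavioural = candidates[sse <= np.quantile(sse, 0.05)]
    print(f"{name}: behavioural range {behavioural.min():.3f}-{behavioural.max():.3f}")
# Overlapping behavioural ranges across periods indicate temporally
# consistent parameters; disjoint ranges flag structural inadequacy.
```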
Abstract:
The aim of the study was to establish and verify a predictive vegetation model for plant community distribution in the alti-Mediterranean zone of the Lefka Ori massif, western Crete. Based on previous work, three variables were identified as significant determinants of plant community distribution, namely altitude, slope angle, and geomorphic landform. The response of four community types to these variables was tested using classification tree analysis in order to model community type occurrence. V-fold cross-validation plots were used to determine the length of the best-fitting tree. The final nine-node tree selected classified 92.5% of the samples correctly. The results were used to provide decision rules for the construction of a spatial model for each community type. The model was implemented within a Geographical Information System (GIS) to predict the distribution of each community type in the study site. Evaluation of the model in the field using an error matrix gave an overall accuracy of 71%. The user's accuracy was higher for the Crepis-Cirsium (100%) and Telephium-Herniaria (66.7%) community types and relatively lower for the Peucedanum-Alyssum and Dianthus-Lomelosia community types (63.2% and 62.5%, respectively). Misclassification and field validation point to the need for improved geomorphological mapping and suggest the presence of transitional communities between the existing community types.
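A minimal sketch of classification-tree modelling with V-fold cross-validation used to pick the tree size, assuming synthetic predictors (altitude, slope angle, landform class) and random community labels in place of the field data.

```python
# Sketch only: classification tree with 10-fold cross-validated size choice.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([
    rng.uniform(1800, 2400, n),    # altitude (m)
    rng.uniform(0, 45, n),         # slope angle (degrees)
    rng.integers(0, 4, n),         # geomorphic landform class
])
y = rng.integers(0, 4, n)          # four community types (random here)

# Grow trees of increasing size and keep the best cross-validated one.
for leaves in (4, 6, 9, 12):
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0)
    score = cross_val_score(tree, X, y, cv=10).mean()
    print(f"{leaves}-leaf tree: mean CV accuracy = {score:.3f}")
```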
Abstract:
Empirical orthogonal function (EOF) analysis is a powerful tool for data compression and dimensionality reduction used broadly in meteorology and oceanography. Often in the literature, EOF modes are interpreted individually, independent of other modes. In fact, it can be shown that no such attribution can generally be made. This review demonstrates that in general individual EOF modes (i) will not correspond to individual dynamical modes, (ii) will not correspond to individual kinematic degrees of freedom, (iii) will not be statistically independent of other EOF modes, and (iv) will be strongly influenced by the nonlocal requirement that modes maximize variance over the entire domain. The goal of this review is not to argue against the use of EOF analysis in meteorology and oceanography; rather, it is to demonstrate the care that must be taken in the interpretation of individual modes in order to distinguish the medium from the message.
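A small synthetic demonstration of the central caution: when a field is generated by two correlated underlying patterns, the leading EOF is a mixture of them rather than either pattern alone. The patterns and amplitudes below are invented for illustration.

```python
# Sketch only: EOFs of a field driven by two correlated patterns.
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0, np.pi, 100)
p1, p2 = np.sin(x), np.sin(2 * x)            # two "dynamical" patterns
a = rng.normal(0, 1, 500)                    # amplitude of pattern 1
b = 0.6 * a + 0.8 * rng.normal(0, 1, 500)    # correlated amplitude of pattern 2
field = np.outer(a, p1) + np.outer(b, p2)

# EOFs via SVD of the centred data matrix.
_, _, vt = np.linalg.svd(field - field.mean(axis=0), full_matrices=False)
eof1 = vt[0]                                 # leading EOF (unit norm)
proj = [abs(eof1 @ p) / np.linalg.norm(p) for p in (p1, p2)]
print("EOF1 projection onto p1 and p2:", proj)
# Both projections are substantially non-zero: EOF1 mixes the two modes.
```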
Abstract:
Space weather effects on technological systems originate with energy carried from the Sun to the terrestrial environment by the solar wind. In this study, we present results of modeling of solar corona-heliosphere processes to predict solar wind conditions at the L1 Lagrangian point upstream of Earth. In particular, we calculate performance metrics for (1) empirical, (2) hybrid empirical/physics-based, and (3) fully physics-based coupled corona-heliosphere models over an 8-year period (1995–2002). L1 measurements of the radial solar wind speed are the primary basis for validation of the coronal and heliosphere models studied, though other solar wind parameters are also considered. The models are from the Center for Integrated Space-Weather Modeling (CISM), which has developed a coupled model of the whole Sun-to-Earth system, from the solar photosphere to the terrestrial thermosphere. Simple point-by-point analysis techniques, such as mean-square error and correlation coefficients, indicate that the empirical corona-heliosphere model currently gives the best forecast of solar wind speed at 1 AU. A more detailed analysis shows that errors in the physics-based models are predominantly the result of small timing offsets in solar wind structures and that the large-scale features of the solar wind are actually well modeled. We suggest that additional “tuning” of the coupling between the coronal and heliosphere models could lead to a significant improvement in their accuracy. Furthermore, we note that the physics-based models accurately capture dynamic effects at solar wind stream interaction regions, such as magnetic field compression, flow deflection, and density buildup, which the empirical scheme cannot.
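A minimal sketch of the point-by-point validation metrics mentioned (mean-square error and linear correlation), applied to synthetic solar wind speed series; the deliberate timing offset illustrates why such metrics penalise models whose structures arrive slightly early or late even when the large-scale features are right.

```python
# Sketch only: point-by-point metrics on synthetic solar wind speed series.
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 20, 2000)
observed = 400 + 100 * np.sin(t) + rng.normal(0, 30, 2000)   # km/s
modelled = np.roll(observed, 12) + rng.normal(0, 40, 2000)   # small timing offset

mse = np.mean((observed - modelled) ** 2)
corr = np.corrcoef(observed, modelled)[0, 1]
print(f"MSE = {mse:.0f} (km/s)^2, correlation = {corr:.2f}")
# A structure arriving a few steps late degrades both metrics sharply,
# even though the large-scale features are reproduced.
```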
Abstract:
Nitrogen oxide biogenic emissions from soils are driven by soil and environmental parameters, and the relationship between these parameters and NO fluxes is highly nonlinear. A new algorithm, based on a neural network calculation, is used to reproduce the NO biogenic emissions linked to precipitation in the Sahel on 6 August 2006 during the AMMA campaign. This algorithm has been coupled into the surface scheme of a coupled chemistry-dynamics model (MesoNH Chemistry) to estimate the impact of the NO emissions on NOx and O3 formation in the lower troposphere for this particular episode. Four different simulations over the same domain and period are compared: one with anthropogenic emissions only; one with soil NO emissions from a static inventory at low temporal and spatial resolution; one with NO emissions from the neural network; and one with NO from the neural network plus lightning NOx. The influence of NOx from lightning is limited to the upper troposphere. The NO emission from soils calculated with the neural network responds to changes in soil moisture, giving enhanced emissions over the wetted soil, as observed by aircraft measurements after the passage of a convective system. The subsequent enhancement of NOx and ozone is limited to the lowest layers of the atmosphere in the model, whereas measurements show higher concentrations above 1000 m. The neural network algorithm, applied in the Sahel region for one particular day of the wet season, allows an immediate response of fluxes to environmental parameters, unlike static emission inventories. Stewart et al. (2008) is a companion paper that looks at NOx and ozone concentrations in the boundary layer as measured on a research aircraft, examines how they vary with soil moisture, as indicated by surface temperature anomalies, and deduces NOx fluxes. In the current paper, the model-derived results are compared to the observations and calculated fluxes presented by Stewart et al. (2008).
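A minimal sketch of a neural-network emission algorithm of the kind described: a small multilayer perceptron mapping soil variables to NO flux. The input choice (soil moisture and temperature), the pulse-like synthetic response, and the network size are illustrative assumptions, not the paper's configuration.

```python
# Sketch only: MLP emission algorithm trained on synthetic soil data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
n = 1000
soil_moisture = rng.uniform(0.0, 0.4, n)     # volumetric fraction
soil_temp = rng.uniform(15, 45, n)           # deg C
X = np.column_stack([soil_moisture, soil_temp])
# Synthetic pulse-like response: flux peaks over freshly wetted warm soil.
no_flux = 10 * soil_moisture * np.exp(0.05 * soil_temp) + rng.normal(0, 0.5, n)

net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, no_flux)
print("predicted NO flux over wet, warm soil:", net.predict([[0.35, 40.0]]))
```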
Abstract:
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to the landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index that is correlated with its topological locality. A popular dimensionality reduction technique is the space-filling Hilbert curve, as it possesses good locality-preserving properties. However, little comparison exists between the Hilbert curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. The Hilbert curve, Sammon's mapping, and Principal Component Analysis have been used to generate a one-dimensional space with locality-preserving properties. This work provides empirical evidence to support the use of the Hilbert curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial two-dimensional network model and with a realistic network topology model with the typical power-law distribution of node connectivity in the Internet. Nearest-neighbour analysis confirms the Hilbert curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvement, and better techniques to preserve locality information are required.
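A minimal sketch of deriving a locality-preserving scalar peer identifier from a landmark latency vector with the Hilbert curve, assuming just two landmarks and an illustrative latency range; xy2d is the classic bit-manipulation mapping of a grid cell to its position along the curve.

```python
# Sketch only: Hilbert-curve index from a two-landmark latency vector.
def xy2d(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its
    distance along the Hilbert curve (classic bit-manipulation form)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                  # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

GRID = 256                           # 8 bits of resolution per dimension

def peer_id(lat_a_ms: float, lat_b_ms: float, max_ms: float = 500.0) -> int:
    """Quantise two landmark latencies and return a locality-preserving
    scalar identifier (grid size and latency range are illustrative)."""
    qa = min(GRID - 1, int(lat_a_ms / max_ms * GRID))
    qb = min(GRID - 1, int(lat_b_ms / max_ms * GRID))
    return xy2d(GRID, qa, qb)

# Peers with similar latency vectors receive nearby identifiers.
print(peer_id(40, 120), peer_id(42, 118), peer_id(300, 20))
```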
Abstract:
Capturing the pattern of structural change is a relevant task in applied demand analysis, as consumer preferences may vary significantly over time. Filtering and smoothing techniques have recently come to play an increasingly important role. A dynamic Almost Ideal Demand System with random-walk parameters is estimated in order to detect modifications in consumer habits and preferences, as well as changes in the behavioural response to prices and income. System-wide estimation, consistent with the underlying constraints from economic theory, is achieved through the EM algorithm. The proposed model is applied to UK aggregate consumption of alcohol and tobacco, using quarterly data from 1963 to 2003. Increased alcohol consumption is explained by a preference shift, addictive behaviour, and a lower price elasticity. The dynamic and time-varying specification is consistent with the theoretical requirements imposed at each sample point.
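A minimal sketch of the core device behind such time-varying specifications: a scalar regression coefficient following a random walk, tracked with a Kalman filter. The paper estimates the full demand system via the EM algorithm; the single-equation setup, variances, and series here are illustrative assumptions only.

```python
# Sketch only: Kalman filtering of a random-walk regression coefficient.
import numpy as np

rng = np.random.default_rng(6)
T = 160                                                # quarterly observations
true_beta = -0.5 + np.cumsum(rng.normal(0, 0.02, T))   # drifting elasticity
x = rng.normal(0, 1, T)                                # e.g. log relative price
y = true_beta * x + rng.normal(0, 0.1, T)              # e.g. budget-share response

q, r = 0.02 ** 2, 0.1 ** 2                    # state / observation variances
beta, p = 0.0, 1.0                            # prior mean and variance
filtered = []
for t in range(T):
    p += q                                    # predict: random-walk state
    k = p * x[t] / (x[t] ** 2 * p + r)        # Kalman gain
    beta += k * (y[t] - beta * x[t])          # update with the new datum
    p *= 1 - k * x[t]
    filtered.append(beta)
print(f"final filtered beta = {filtered[-1]:.3f}, true = {true_beta[-1]:.3f}")
```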
Abstract:
Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of constructing the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is carried out for any number of components, up to the global nonparametric maximum likelihood bound, using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered, along with some generalisations of Zelterman's estimator. All computations are done with CAMCR, a software package developed specifically for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems in using the mixture model-based estimators are highlighted.
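A minimal sketch of two of the simpler estimators reviewed, Chao's lower bound and Zelterman's truncated-Poisson estimator, computed from illustrative frequency-of-frequencies counts (the nonparametric mixture/EM machinery of CAMCR is not reproduced here).

```python
# Sketch only: Chao and Zelterman estimators from frequency counts.
import numpy as np

# f[k] = number of units observed exactly k times (f_0 is the unknown).
f = np.array([0, 45, 18, 7, 3, 1])            # illustrative counts, k = 0..5
n_observed = f[1:].sum()
f1, f2 = f[1], f[2]

n_chao = n_observed + f1 ** 2 / (2 * f2)       # Chao's lower bound
lam = 2 * f2 / f1                              # Zelterman's Poisson rate estimate
n_zelterman = n_observed / (1 - np.exp(-lam))  # Zelterman's estimator
print(f"observed: {n_observed}, Chao: {n_chao:.1f}, Zelterman: {n_zelterman:.1f}")
```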
Abstract:
Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.
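A minimal simulation of the multiple-testing effect identified above: if each clade yields several test statistics and only one needs to be significant for the inference key to be applied, the per-clade false-positive rate rises well above the nominal level. The number of statistics per clade is an illustrative assumption.

```python
# Sketch only: inflation of the per-clade false-positive rate.
import numpy as np

rng = np.random.default_rng(7)
n_clades, n_tests = 100_000, 4                 # several statistics per clade
p = rng.uniform(0, 1, (n_clades, n_tests))     # p-values under the null
rate = (p < 0.05).any(axis=1).mean()           # any one test significant
print(f"per-clade false-positive rate: {rate:.3f}")
# Analytically 1 - 0.95**4 is about 0.185, far above the nominal 0.05.
```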
Abstract:
Accurately and reliably identifying the actual number of clusters present in a dataset of gene expression profiles, when no additional information on cluster structure is available, is a problem addressed by few algorithms. GeneMCL transforms microarray analysis data into a graph consisting of nodes connected by edges, where the nodes represent genes and the edges represent the similarity in expression of those genes, as given by a proximity measurement. This measurement is taken to be the Pearson correlation coefficient combined with a local non-linear rescaling step. The resulting graph is input to the Markov Cluster (MCL) algorithm, an elegant, deterministic, non-specific, and scalable method that models stochastic flow through the graph. The algorithm is inherently sensitive to any cluster structure present and rapidly decomposes a graph into cohesive clusters. The potential of the GeneMCL algorithm is demonstrated with a 5730-gene subset (IGS) of the Van't Veer breast cancer database, for which the clusterings are shown to reflect underlying biological mechanisms.
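A minimal sketch of the MCL iteration at the heart of GeneMCL: alternate expansion (matrix powers) and inflation (entrywise powers followed by column renormalisation) on a column-stochastic similarity matrix until flow separates into clusters. The toy similarity matrix stands in for the rescaled Pearson correlations of the paper.

```python
# Sketch only: the Markov Cluster (MCL) iteration on a toy graph.
import numpy as np

def mcl(sim, expansion=2, inflation=2.0, iters=50):
    m = sim + np.eye(len(sim))                    # add self-loops
    m = m / m.sum(axis=0)                         # make columns stochastic
    for _ in range(iters):
        m = np.linalg.matrix_power(m, expansion)  # expansion: flow spreads
        m = m ** inflation                        # inflation: strong flow wins
        m = m / m.sum(axis=0)                     # renormalise columns
    return m

# Two obvious three-node groups joined by one weak edge.
sim = np.array([[0, 1, 1, 0.0, 0, 0],
                [1, 0, 1, 0.0, 0, 0],
                [1, 1, 0, 0.1, 0, 0],
                [0, 0, 0.1, 0, 1, 1],
                [0, 0, 0.0, 1, 0, 1],
                [0, 0, 0.0, 1, 1, 0]], dtype=float)
steady = mcl(sim)
# Non-empty rows of the limit matrix act as cluster "attractors"; their
# non-zero columns list the cluster members.
clusters = {tuple(np.nonzero(row > 1e-6)[0]) for row in steady if row.sum() > 1e-6}
print(clusters)   # expected: {(0, 1, 2), (3, 4, 5)}
```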
Abstract:
Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm, to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than the other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions about marker, demographic, and historical features. The method presented here is the only robust, model-based method available so far that allows inference of complex population dynamics over a short time scale. It also provides a basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
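A minimal sketch of rejection-regression approximate Bayesian computation for a single parameter and a single summary statistic (the study uses many of each, plus rich demographic simulators); the adjustment step follows the standard local-linear correction of accepted draws. All values are illustrative.

```python
# Sketch only: rejection-regression ABC for one parameter, one summary.
import numpy as np

rng = np.random.default_rng(8)
obs_summary = 0.62                        # e.g. an observed genetic summary

theta = rng.uniform(0, 1, 100_000)        # draws from a uniform prior
sim_summary = theta + rng.normal(0, 0.1, theta.size)   # simulated summaries

# Rejection step: keep draws whose summaries fall closest to the data.
dist = np.abs(sim_summary - obs_summary)
keep = dist <= np.quantile(dist, 0.01)
th, s = theta[keep], sim_summary[keep]

# Regression step: local-linear adjustment of the accepted draws.
slope = np.cov(s, th)[0, 1] / np.var(s)
th_adjusted = th - slope * (s - obs_summary)
print(f"approximate posterior mean: {th_adjusted.mean():.3f}")
```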
Abstract:
Uncertainty plays a major part in the accuracy of a decision-making process, and its inconsistency is difficult to resolve with existing decision-making tools. Entropy has proved useful for evaluating the inconsistency of uncertainty among different respondents. This study demonstrates an entropy-based financial decision support system called e-FDSS. This integrated system provides decision support for evaluating the attributes (funding options and multiple risks) present in projects. Fuzzy logic theory is included in the system to deal with the qualitative aspects of these options and risks. An adaptive genetic algorithm (AGA) is also employed to solve the decision algorithm in the system, in order to provide optimal and consistent rates for these attributes. Seven simplified and parallel projects from a Hong Kong construction small and medium enterprise (SME) were assessed to evaluate the system. The results show that the system calculates the risk-adjusted discount rates (RADR) of projects in an objective way, and these rates discount project cash flows impartially. Inconsistency of uncertainty is also successfully evaluated through the entropy method. Finally, the system identifies favourable funding options managed by a scheme called the SME Loan Guarantee Scheme (SGS). Based on these results, resource allocation can then be optimized and the best time to start a new project identified across the overall project life cycle.
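A minimal sketch of using normalised Shannon entropy to gauge the inconsistency of uncertain ratings across respondents, the role entropy plays in e-FDSS; the rating scale and the responses below are invented for illustration.

```python
# Sketch only: normalised Shannon entropy as an inconsistency measure.
import numpy as np

def inconsistency(ratings, levels=5):
    """Normalised Shannon entropy of categorical ratings:
    0 = full agreement among respondents, 1 = maximal disagreement."""
    counts = np.bincount(ratings, minlength=levels)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(levels))

agree = np.array([3, 3, 3, 4, 3])        # respondents largely agree
disagree = np.array([0, 4, 1, 3, 2])     # respondents scatter across the scale
print(inconsistency(agree), inconsistency(disagree))
```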