Biblioteca Digital

945 resultados para Mean squared error

Biclustering Gene Expression Data using MSR Difference Threshold

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Biclustering is simultaneous clustering of both rows and columns of a data matrix. A measure called Mean Squared Residue (MSR) is used to simultaneously evaluate the coherence of rows and columns within a submatrix. In this paper a novel algorithm is developed for biclustering gene expression data using the newly introduced concept of MSR difference threshold. In the first step high quality bicluster seeds are generated using K-Means clustering algorithm. Then more genes and conditions (node) are added to the bicluster. Before adding a node the MSR X of the bicluster is calculated. After adding the node again the MSR Y is calculated. The added node is deleted if Y minus X is greater than MSR difference threshold or if Y is greater than MSR threshold which depends on the dataset. The MSR difference threshold is different for gene list and condition list and it depends on the dataset also. Proper values should be identified through experimentation in order to obtain biclusters of high quality. The results obtained on bench mark dataset clearly indicate that this algorithm is better than many of the existing biclustering algorithms

Quadratic Predictor based Differential Encoding and Decoding of Speech Signals

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Modeling nonlinear systems using Volterra series is a century old method but practical realizations were hampered by inadequate hardware to handle the increased computational complexity stemming from its use. But interest is renewed recently, in designing and implementing filters which can model much of the polynomial nonlinearities inherent in practical systems. The key advantage in resorting to Volterra power series for this purpose is that nonlinear filters so designed can be made to work in parallel with the existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (with nonlinearity order 2) with a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanisms but focuses on high fidelity reconstruction at the receiver. The widely used method of differential pnlse code modulation (DPCM) for speech transmission uses a linear predictor to estimate the next possible value of the input speech signal. But this linear system do not account for the inherent nonlinearities in speech signals arising out of multiple reflections in the vocal tract. So a quadratic predictor is designed and implemented in parallel with the linear predictor to yield improved mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white gaussian noise (AWGN) channel.

On the Noise Model of Support Vector Machine Regression

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Support Vector Machines Regression (SVMR) is a regression technique which has been recently introduced by V. Vapnik and his collaborators (Vapnik, 1995; Vapnik, Golowich and Smola, 1996). In SVMR the goodness of fit is measured not by the usual quadratic loss function (the mean square error), but by a different loss function called Vapnik"s $epsilon$- insensitive loss function, which is similar to the "robust" loss functions introduced by Huber (Huber, 1981). The quadratic loss function is well justified under the assumption of Gaussian additive noise. However, the noise model underlying the choice of Vapnik's loss function is less clear. In this paper the use of Vapnik's loss function is shown to be equivalent to a model of additive and Gaussian noise, where the variance and mean of the Gaussian are random variables. The probability distributions for the variance and mean will be stated explicitly. While this work is presented in the framework of SVMR, it can be extended to justify non-quadratic loss functions in any Maximum Likelihood or Maximum A Posteriori approach. It applies not only to Vapnik's loss function, but to a much broader class of loss functions.

CODAMAT: a modern analogue techinque for compositional data

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The quantitative estimation of Sea Surface Temperatures from fossils assemblages is a fundamental issue in palaeoclimatic and paleooceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern coretop samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as distance measure. Modern coretop datasets are characterised by a large amount of zeros. The zero replacement was carried out by adopting a Bayesian approach to the zero replacement, based on a posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which reconstructing the SST was determined by means of a multiple approach by considering the Proxies correlation matrix, Standardized Residual Sum of Squares and Mean Squared Distance. This new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Kew words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares

Sampling uncertainties in surface radiation budget calculations in RADAGAST

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the Radiative Atmospheric Divergence Using ARM Mobile Facility GERB and AMMA Stations (RADAGAST) project we calculate the divergence of radiative flux across the atmosphere by comparing fluxes measured at each end of an atmospheric column above Niamey, in the African Sahel region. The combination of broadband flux measurements from geostationary orbit and the deployment for over 12 months of a comprehensive suite of active and passive instrumentation at the surface eliminates a number of sampling issues that could otherwise affect divergence calculations of this sort. However, one sampling issue that challenges the project is the fact that the surface flux data are essentially measurements made at a point, while the top-of-atmosphere values are taken over a solid angle that corresponds to an area at the surface of some 2500 km2. Variability of cloud cover and aerosol loading in the atmosphere mean that the downwelling fluxes, even when averaged over a day, will not be an exact match to the area-averaged value over that larger area, although we might expect that it is an unbiased estimate thereof. The heterogeneity of the surface, for example, fixed variations in albedo, further means that there is a likely systematic difference in the corresponding upwelling fluxes. In this paper we characterize and quantify this spatial sampling problem. We bound the root-mean-square error in the downwelling fluxes by exploiting a second set of surface flux measurements from a site that was run in parallel with the main deployment. The differences in the two sets of fluxes lead us to an upper bound to the sampling uncertainty, and their correlation leads to another which is probably optimistic as it requires certain other conditions to be met. For the upwelling fluxes we use data products from a number of satellite instruments to characterize the relevant heterogeneities and so estimate the systematic effects that arise from the flux measurements having to be taken at a single point. The sampling uncertainties vary with the season, being higher during the monsoon period. We find that the sampling errors for the daily average flux are small for the shortwave irradiance, generally less than 5 W m−2, under relatively clear skies, but these increase to about 10 W m−2 during the monsoon. For the upwelling fluxes, again taking daily averages, systematic errors are of order 10 W m−2 as a result of albedo variability. The uncertainty on the longwave component of the surface radiation budget is smaller than that on the shortwave component, in all conditions, but a bias of 4 W m−2 is calculated to exist in the surface leaving longwave flux.

Perturbation growth at the convective scale for CSIP IOP18

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The Met Office Unified Model is run for a case observed during Intensive Observation Period 18 (IOP18) of the Convective Storms Initiation Project (CSIP). The aims are to identify the physical processes that lead to perturbation growth at the convective scale in response to model-state perturbations and to determine their sensitivity to the character of the perturbations. The case is strongly upper-level forced but with detailed mesoscale/convective-scale evolution that is dependent on smaller-scale processes. Potential temperature is perturbed within the boundary layer. The effects on perturbation growth of both the amplitude and typical scalelength of the perturbations are investigated and perturbations are applied either sequentially (every 30 min throughout the simulation) or at specific times. The direct effects (within one timestep) of the perturbations are to generate propagating Lamb and acoustic waves and produce generally small changes in cloud parameters and convective instability. In exceptional cases a perturbation at a specific gridpoint leads to switching of the diagnosed boundary-layer type or discontinuous changes in convective instability, through the generation or removal of a lid. The indirect effects (during the entire simulation) are changes in the intensity and location of precipitation and in the cloud size distribution. Qualitatively different behaviour is found for strong (1K amplitude) and weak (0.01K amplitude) perturbations, with faster growth after sunrise found only for the weaker perturbations. However, the overall perturbation growth (as measured by the root-mean-square error of accumulated precipitation) reaches similar values at saturation, regardless of the perturbation characterisation.

Co-kriging when the soil and ancillary data are not co-located

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Data such as digitized aerial photographs, electrical conductivity and yield are intensive and relatively inexpensive to obtain compared with collecting soil data by sampling. If such ancillary data are co-regionalized with the soil data they should be suitable for co-kriging. The latter requires that information for both variables is co-located at several locations; this is rarely so for soil and ancillary data. To solve this problem, we have derived values for the ancillary variable at the soil sampling locations by averaging the values within a radius of 15 m, taking the nearest-neighbour value, kriging over 5 m blocks, and punctual kriging. The cross-variograms from these data with clay content and also the pseudo cross-variogram were used to co-krige to validation points and the root mean squared errors (RMSEs) were calculated. In general, the data averaged within 15m and the punctually kriged values resulted in more accurate predictions.

Scalability of efficient parallel K-Means

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K-clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.

Metrics for solar wind prediction models: Comparison of empirical, hybrid, and physics-based schemes with 8 years of L1 observations

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Space weather effects on technological systems originate with energy carried from the Sun to the terrestrial environment by the solar wind. In this study, we present results of modeling of solar corona-heliosphere processes to predict solar wind conditions at the L1 Lagrangian point upstream of Earth. In particular we calculate performance metrics for (1) empirical, (2) hybrid empirical/physics-based, and (3) full physics-based coupled corona-heliosphere models over an 8-year period (1995–2002). L1 measurements of the radial solar wind speed are the primary basis for validation of the coronal and heliosphere models studied, though other solar wind parameters are also considered. The models are from the Center for Integrated Space-Weather Modeling (CISM) which has developed a coupled model of the whole Sun-to-Earth system, from the solar photosphere to the terrestrial thermosphere. Simple point-by-point analysis techniques, such as mean-square-error and correlation coefficients, indicate that the empirical coronal-heliosphere model currently gives the best forecast of solar wind speed at 1 AU. A more detailed analysis shows that errors in the physics-based models are predominately the result of small timing offsets to solar wind structures and that the large-scale features of the solar wind are actually well modeled. We suggest that additional “tuning” of the coupling between the coronal and heliosphere models could lead to a significant improvement of their accuracy. Furthermore, we note that the physics-based models accurately capture dynamic effects at solar wind stream interaction regions, such as magnetic field compression, flow deflection, and density buildup, which the empirical scheme cannot.

A kinematically distorted flux rope model for magnetic clouds

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Constant-α force-free magnetic flux rope models have proven to be a valuable first step toward understanding the global context of in situ observations of magnetic clouds. However, cylindrical symmetry is necessarily assumed when using such models, and it is apparent from both observations and modeling that magnetic clouds have highly noncircular cross sections. A number of approaches have been adopted to relax the circular cross section approximation: frequently, the cross-sectional shape is allowed to take an arbitrarily chosen shape (usually elliptical), increasing the number of free parameters that are fit between data and model. While a better “fit” may be achieved in terms of reducing the mean square error between the model and observed magnetic field time series, it is not always clear that this translates to a more accurate reconstruction of the global structure of the magnetic cloud. We develop a new, noncircular cross section flux rope model that is constrained by observations of CMEs/ICMEs and knowledge of the physical processes acting on the magnetic cloud: The magnetic cloud is assumed to initially take the form of a force-free flux rope in the low corona but to be subsequently deformed by a combination of axis-centered self-expansion and heliocentric radial expansion. The resulting analytical solution is validated by fitting to artificial time series produced by numerical MHD simulations of magnetic clouds and shown to accurately reproduce the global structure.

An event-based approach to validating solar wind speed predictions: high-speed enhancements in the Wang-Sheeley-Arge model

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One of the primary goals of the Center for Integrated Space Weather Modeling (CISM) effort is to assess and improve prediction of the solar wind conditions in near‐Earth space, arising from both quasi‐steady and transient structures. We compare 8 years of L1 in situ observations to predictions of the solar wind speed made by the Wang‐Sheeley‐Arge (WSA) empirical model. The mean‐square error (MSE) between the observed and model predictions is used to reach a number of useful conclusions: there is no systematic lag in the WSA predictions, the MSE is found to be highest at solar minimum and lowest during the rise to solar maximum, and the optimal lead time for 1 AU solar wind speed predictions is found to be 3 days. However, MSE is shown to frequently be an inadequate “figure of merit” for assessing solar wind speed predictions. A complementary, event‐based analysis technique is developed in which high‐speed enhancements (HSEs) are systematically selected and associated from observed and model time series. WSA model is validated using comparisons of the number of hit, missed, and false HSEs, along with the timing and speed magnitude errors between the forecasted and observed events. Morphological differences between the different HSE populations are investigated to aid interpretation of the results and improvements to the model. Finally, by defining discrete events in the time series, model predictions from above and below the ecliptic plane can be used to estimate an uncertainty in the predicted HSE arrival times.

Evaluation of the S(T) assimilation method with the Argo dataset

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent observations from the Argo dataset of temperature and salinity profiles are used to evaluate a series of 3-year data assimilation experiments in a global ice–ocean general circulation model. The experiments are designed to evaluate a new data assimilation system whereby salinity is assimilated along isotherms, S(T ). In addition, the role of a balancing salinity increment to maintain water mass properties is investigated. This balancing increment is found to effectively prevent spurious mixing in tropical regions induced by univariate temperature assimilation, allowing the correction of isotherm geometries without adversely influencing temperature–salinity relationships. In addition, the balancing increment is able to correct a fresh bias associated with a weak subtropical gyre in the North Atlantic using only temperature observations. The S(T ) assimilation method is found to provide an important improvement over conventional depth level assimilation, with lower root-mean-squared forecast errors over the upper 500 m in the tropical Atlantic and Pacific Oceans. An additional set of experiments is performed whereby Argo data are withheld and used for independent evaluation. The most significant improvements from Argo assimilation are found in less well-observed regions (Indian, South Atlantic and South Pacific Oceans). When Argo salinity data are assimilated in addition to temperature, improvements to modelled temperature fields are obtained due to corrections to model density gradients and the resulting circulation. It is found that observations from the Argo array provide an invaluable tool for both correcting modelled water mass properties through data assimilation and for evaluating the assimilation methods themselves.

Flux comparison of Eulerian and Lagrangian estimates of Agulhas leakage: A case study using a numerical model

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Estimating the magnitude of Agulhas leakage, the volume flux of water from the Indian to the Atlantic Ocean, is difficult because of the presence of other circulation systems in the Agulhas region. Indian Ocean water in the Atlantic Ocean is vigorously mixed and diluted in the Cape Basin. Eulerian integration methods, where the velocity field perpendicular to a section is integrated to yield a flux, have to be calibrated so that only the flux by Agulhas leakage is sampled. Two Eulerian methods for estimating the magnitude of Agulhas leakage are tested within a high-resolution two-way nested model with the goal to devise a mooring-based measurement strategy. At the GoodHope line, a section halfway through the Cape Basin, the integrated velocity perpendicular to that line is compared to the magnitude of Agulhas leakage as determined from the transport carried by numerical Lagrangian floats. In the first method, integration is limited to the flux of water warmer and more saline than specific threshold values. These threshold values are determined by maximizing the correlation with the float-determined time series. By using the threshold values, approximately half of the leakage can directly be measured. The total amount of Agulhas leakage can be estimated using a linear regression, within a 90% confidence band of 12 Sv. In the second method, a subregion of the GoodHope line is sought so that integration over that subregion yields an Eulerian flux as close to the float-determined leakage as possible. It appears that when integration is limited within the model to the upper 300 m of the water column within 900 km of the African coast the time series have the smallest root-mean-square difference. This method yields a root-mean-square error of only 5.2 Sv but the 90% confidence band of the estimate is 20 Sv. It is concluded that the optimum thermohaline threshold method leads to more accurate estimates even though the directly measured transport is a factor of two lower than the actual magnitude of Agulhas leakage in this model.

Design and optimisation of a large-area process-based model for annual crops

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The formulation of a new process-based crop model, the general large-area model (GLAM) for annual crops is presented. The model has been designed to operate on spatial scales commensurate with those of global and regional climate models. It aims to simulate the impact of climate on crop yield. Procedures for model parameter determination and optimisation are described, and demonstrated for the prediction of groundnut (i.e. peanut; Arachis hypogaea L.) yields across India for the period 1966-1989. Optimal parameters (e.g. extinction coefficient, transpiration efficiency, rate of change of harvest index) were stable over space and time, provided the estimate of the yield technology trend was based on the full 24-year period. The model has two location-specific parameters, the planting date, and the yield gap parameter. The latter varies spatially and is determined by calibration. The optimal value varies slightly when different input data are used. The model was tested using a historical data set on a 2.5degrees x 2.5degrees grid to simulate yields. Three sites are examined in detail-grid cells from Gujarat in the west, Andhra Pradesh towards the south, and Uttar Pradesh in the north. Agreement between observed and modelled yield was variable, with correlation coefficients of 0.74, 0.42 and 0, respectively. Skill was highest where the climate signal was greatest, and correlations were comparable to or greater than correlations with seasonal mean rainfall. Yields from all 35 cells were aggregated to simulate all-India yield. The correlation coefficient between observed and simulated yields was 0.76, and the root mean square error was 8.4% of the mean yield. The model can be easily extended to any annual crop for the investigation of the impacts of climate variability (or change) on crop yield over large areas. (C) 2004 Elsevier B.V. All rights reserved.

Simulation of crop yields using ERA-40: Limits to skill and nonstationarity in weather-yield relationships

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Reanalysis data provide an excellent test bed for impacts prediction systems. because they represent an upper limit on the skill of climate models. Indian groundnut (Arachis hypogaea L.) yields have been simulated using the General Large-Area Model (GLAM) for annual crops and the European Centre for Medium-Range Weather Forecasts (ECMWF) 40-yr reanalysis (ERA-40). The ability of ERA-40 to represent the Indian summer monsoon has been examined. The ability of GLAM. when driven with daily ERA-40 data, to model both observed yields and observed relationships between subseasonal weather and yield has been assessed. Mean yields "were simulated well across much of India. Correlations between observed and modeled yields, where these are significant. are comparable to correlations between observed yields and ERA-40 rainfall. Uncertainties due to the input planting window, crop duration, and weather data have been examined. A reduction in the root-mean-square error of simulated yields was achieved by applying bias correction techniques to the precipitation. The stability of the relationship between weather and yield over time has been examined. Weather-yield correlations vary on decadal time scales. and this has direct implications for the accuracy of yield simulations. Analysis of the skewness of both detrended yields and precipitation suggest that nonclimatic factors are partly responsible for this nonstationarity. Evidence from other studies, including data on cereal and pulse yields, indicates that this result is not particular to groundnut yield. The detection and modeling of nonstationary weather-yield relationships emerges from this study as an important part of the process of understanding and predicting the impacts of climate variability and change on crop yields.

«
1
2
...
23
24
25
26
27
28
29
...
62
63
»