876 results for variance analysis
Abstract:
Computational Biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. The data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression level of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix, in which rows represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points, and the entries of the matrix are real values. Through the analysis of gene expression data it is possible to determine behavioural patterns of genes, such as the similarity of their behaviour, the nature of their interactions, their respective contributions to the same pathways, and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research; in the medical domain they aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. Data mining techniques are essential to identify such patterns from gene expression data. Clustering is an important data mining technique for the analysis of gene expression data, and biclustering was introduced to overcome the problems associated with clustering. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix: clustering is a global model, whereas biclustering is a local one. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix; its rows and columns need not be contiguous in the original matrix, and biclusters need not be disjoint. Computing biclusters is costly because all combinations of rows and columns must be considered to find all the biclusters: the search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions respectively, and usually m+n exceeds 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering: ten algorithms are developed for the identification of coherent biclusters from gene expression data. All of these algorithms use a measure called the mean squared residue to search for biclusters, the objective being to identify biclusters of maximum size with mean squared residue below a given threshold.
All of these algorithms begin the search from tightly coregulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint based, greedy and metaheuristic. The constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. The metaheuristic approaches use Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. All of them identify biologically relevant and statistically significant biclusters, which are validated against the Gene Ontology database, and all are compared with other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. With some of them, biclusters with very high row variance, higher than the row variance achieved by any other algorithm using the mean squared residue, are identified from both the Yeast and Lymphoma datasets. Such biclusters, which capture significant changes in expression level, are highly relevant biologically.
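For reference, the mean squared residue (MSR) that all ten algorithms optimise is the measure of Cheng and Church (2000). A minimal sketch of its computation, assuming a NumPy expression matrix and integer index lists; the function name and layout are illustrative, not the thesis code:

```python
import numpy as np

def mean_squared_residue(A, rows, cols):
    """MSR of the bicluster defined by index lists `rows` and `cols`
    of expression matrix A (Cheng & Church, 2000)."""
    sub = A[np.ix_(rows, cols)]                  # the candidate bicluster
    row_means = sub.mean(axis=1, keepdims=True)  # a_iJ, one per row
    col_means = sub.mean(axis=0, keepdims=True)  # a_Ij, one per column
    overall = sub.mean()                         # a_IJ, bicluster mean
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())
```

A search accepts a submatrix as a coherent bicluster when this value falls below the chosen MSR threshold; the row variance of `sub` can be computed alongside it to favour biclusters with large expression changes.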
Abstract:
Atmospheric surface boundary layer parameters varied anomalously in response to the annular solar eclipse of 15 January 2010 over Cochin. It was the longest annular solar eclipse to occur over South India, and of high intensity. Because it occurred during the noon hours, it is considered particularly significant, with effects in all regions of the atmosphere including the ionosphere. Since insolation is the main driving factor in the surface layer, an eclipse coinciding with noon offers an unusual opportunity to understand the dynamics of the atmosphere during the eclipse period. The sonic anemometer provides zonal, meridional and vertical wind as well as air temperature at a temporal resolution of 1 s. Surface boundary layer parameters and turbulent fluxes were computed by applying the eddy correlation technique to these high-resolution station data. The parameters computed from the sonic anemometer data during the period are momentum flux, sensible heat flux, turbulent kinetic energy, friction velocity (u*), the variance of temperature, and the variances of the u, v and w wind components. For comparison, a control run was carried out using data from the previous day and the following day. Over the period of the annular solar eclipse, all of the above surface boundary layer parameters varied anomalously relative to the control run. Momentum flux during the eclipse was 0.1 N m⁻² instead of the mean value of 0.2 N m⁻². Sensible heat flux decreased anomalously to 50 W m⁻² from the mean value of 200 W m⁻² at the time of the eclipse. Turbulent kinetic energy decreased to 0.2 m² s⁻² from the mean value of 1 m² s⁻². Friction velocity decreased to 0.05 m s⁻¹ from the mean value of 0.2 m s⁻¹. The present study aimed at understanding the dynamics of the surface layer over a tropical coastal station in response to an annular solar eclipse occurring during the noon hours. Key words: annular solar eclipse, surface boundary layer, sonic anemometer
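For concreteness, a minimal sketch of the eddy correlation estimates listed above, assuming 1 Hz NumPy series of the wind components and sonic temperature over the averaging window; the constant density and the names are illustrative assumptions, not the study's code:

```python
import numpy as np

RHO = 1.2     # air density [kg m^-3], assumed constant for the sketch
CP = 1004.0   # specific heat of dry air at constant pressure [J kg^-1 K^-1]

def surface_layer_parameters(u, v, w, T):
    """Eddy correlation estimates from 1 Hz series: zonal u, meridional v,
    vertical w winds [m s^-1] and air temperature T [K]."""
    up, vp, wp, Tp = (x - x.mean() for x in (u, v, w, T))  # fluctuations
    uw, vw = (up * wp).mean(), (vp * wp).mean()            # covariances with w
    return {
        "momentum_flux": -RHO * uw,                         # [N m^-2]
        "sensible_heat_flux": RHO * CP * (wp * Tp).mean(),  # [W m^-2]
        "friction_velocity": (uw**2 + vw**2) ** 0.25,       # u* [m s^-1]
        "tke": 0.5 * (up.var() + vp.var() + wp.var()),      # [m^2 s^-2]
        "temperature_variance": Tp.var(),                   # [K^2]
    }
```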
Abstract:
Metal matrix composites (MMC) having aluminium (Al) in the matrix phase and silicon carbide particles (SiCp) in the reinforcement phase, i.e. Al-SiCp type MMC, have gained popularity in the recent past. In this competitive age, manufacturing industries strive to produce superior quality products at a reasonable price. This is possible by achieving higher productivity while performing machining at optimum combinations of process variables. The low weight and high strength of MMC make them suitable for a variety of components.
Abstract:
The classical methods of analysing time series by the Box-Jenkins approach assume that the observed series fluctuates around changing levels with constant variance; that is, the time series is assumed to be homoscedastic. However, financial time series exhibit heteroscedasticity in the sense that they possess non-constant conditional variance given the past observations. The analysis of financial time series therefore requires the modelling of such variances, which may depend on some time-dependent factors or on their own past values. This led to the introduction of several classes of models to study the behaviour of financial time series; see Taylor (1986), Tsay (2005), Rachev et al. (2007). The class of models used to describe the evolution of conditional variances is referred to as stochastic volatility models. The stochastic models available to analyse conditional variances are based on either normal or log-normal distributions. One of the objectives of the present study is to explore the possibility of employing some non-Gaussian distributions to model the volatility sequences and then study the behaviour of the resulting return series. This led us to work on the related problem of statistical inference, which is the main contribution of the thesis.
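For reference, a minimal simulation sketch of the canonical log-normal stochastic volatility model that such studies take as the Gaussian baseline; the parameter values are illustrative only:

```python
import numpy as np

def simulate_sv(n, mu=-1.0, phi=0.95, sigma_eta=0.2, seed=0):
    """Simulate h_t = mu + phi*(h_{t-1} - mu) + sigma_eta*eta_t and
    r_t = exp(h_t / 2) * eps_t, with i.i.d. standard normal eta_t, eps_t."""
    rng = np.random.default_rng(seed)
    h = np.empty(n)
    h[0] = mu                       # start log-volatility at its mean level
    for t in range(1, n):
        h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.standard_normal()
    returns = np.exp(h / 2) * rng.standard_normal(n)
    return returns, h
```

Replacing the normal innovations with a non-Gaussian law is the kind of extension explored here; the inference problem is then to recover (mu, phi, sigma_eta) from the returns alone, since the volatility sequence h is latent.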
Abstract:
Hydrogeological research usually includes some statistical studies devised to elucidate the mean background state, characterise relationships among different hydrochemical parameters, and show the influence of human activities. These goals are achieved either by means of a statistical approach or by mixing models between end-members. Compositional data analysis has proved to be effective with the first approach, but there is no commonly accepted solution to the end-member problem in a compositional framework. We present here a possible solution based on factor analysis of compositions, illustrated with a case study. We find two factors on the compositional bi-plot by fitting two non-centred orthogonal axes to the most representative variables. Each of these axes defines a subcomposition, grouping those variables that lie nearest to it. With each subcomposition a log-contrast is computed and rewritten as an equilibrium equation. These two factors can be interpreted as the isometric log-ratio (ilr) coordinates of three hidden components, which can be plotted in a ternary diagram. These hidden components might be interpreted as end-members. We have analysed 14 molarities at 31 sampling stations along the Llobregat River and its tributaries, measured monthly over two years. We have obtained a bi-plot with 57% of the total variance explained, from which we have extracted two factors: factor G, reflecting geological background enhanced by potash mining; and factor A, essentially controlled by urban and/or farming wastewater. Graphical representation of these two factors allows us to identify three extreme samples, corresponding to pristine waters, potash mining influence and urban sewage influence. To confirm this, we have available analyses of diffuse and widespread point sources identified in the area: springs, potash mining lixiviates, sewage, and fertilisers. Each of these sources shows a clear link with one of the extreme samples, except fertilisers, due to the heterogeneity of their composition. This approach is a useful tool to distinguish and characterise end-members, an issue generally difficult to solve. It is worth noting that the end-member composition cannot be fully estimated but only characterised through log-ratio relationships among components. Moreover, the influence of each end-member in a given sample must be evaluated relative to the other samples. These limitations are intrinsic to the relative nature of compositional data.
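As a reminder of the construction behind the two factors, a minimal sketch of isometric log-ratio (ilr) coordinates for a 3-part composition, using one standard orthonormal balance; this is generic compositional data analysis, not the paper's code:

```python
import numpy as np

def ilr_3part(x):
    """ilr coordinates of a 3-part composition x of shape (..., 3).
    Only ratios of parts enter, so closure of x is irrelevant."""
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    z1 = np.log(x1 / x2) / np.sqrt(2.0)          # balance of x1 vs x2
    z2 = np.log(x1 * x2 / x3**2) / np.sqrt(6.0)  # balance of (x1, x2) vs x3
    return np.stack([z1, z2], axis=-1)
```

Each coordinate is a log-contrast; inverting the map sends points in the (z1, z2) plane back to the ternary diagram of the three hidden components.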
Abstract:
Models of the dynamics of nitrogen in soil (soil-N) can be used to aid the fertilizer management of a crop. The predictions of soil-N models can be validated by comparison with observed data. Validation generally involves calculating non-spatial statistics of the observations and predictions, such as their means, their mean squared difference, and their correlation. However, when the model predictions are spatially distributed across a landscape the model requires validation with spatial statistics. There are three reasons for this: (i) the model may be more or less successful at reproducing the variance of the observations at different spatial scales; (ii) the correlation of the predictions with the observations may be different at different spatial scales; (iii) the spatial pattern of model error may be informative. In this study we used a model, parameterized with spatially variable input information about the soil, to predict the mineral-N content of soil in an arable field, and compared the results with observed data. We validated the performance of the N model spatially with a linear mixed model of the observations and model predictions, estimated by residual maximum likelihood. This novel approach allowed us to describe the joint variation of the observations and predictions as: (i) independent random variation that occurred at a fine spatial scale; (ii) correlated random variation that occurred at a coarse spatial scale; (iii) systematic variation associated with a spatial trend. The linear mixed model revealed that, in general, the performance of the N model changed depending on the spatial scale of interest. At the scales associated with random variation, the N model underestimated the variance of the observations, and the predictions were correlated poorly with the observations. At the scale of the trend, the predictions and observations shared a common surface. The spatial pattern of the error of the N model suggested that the observations were affected by the local soil condition, but this was not accounted for by the N model. In summary, the N model would be well suited to field-scale management of soil nitrogen, but poorly suited to management at finer spatial scales. This information was not apparent with a non-spatial validation.
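For contrast with the spatial approach, a minimal sketch of the non-spatial validation statistics mentioned above; the function name is ours:

```python
import numpy as np

def nonspatial_validation(obs, pred):
    """Means, mean squared difference and Pearson correlation of
    observed vs predicted mineral-N, ignoring spatial structure."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return {
        "mean_obs": obs.mean(),
        "mean_pred": pred.mean(),
        "msd": ((obs - pred) ** 2).mean(),
        "correlation": np.corrcoef(obs, pred)[0, 1],
    }
```

The point of the study is that these summaries collapse all spatial scales into single numbers, whereas the linear mixed model separates fine-scale, coarse-scale and trend components.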
Abstract:
Genetic parameters and breeding values for dairy cow fertility were estimated from 62 443 lactation records. Two-trait analysis of fertility and milk yield was investigated as a method to estimate fertility breeding values when culling or selection based on milk yield in early lactation determines the presence or absence of fertility observations in later lactations. Fertility traits were calving interval, intervals from calving to first service, calving to conception and first to last service, conception success at first service and number of services per conception. Milk production traits were 305-day milk, fat and protein yield. For fertility traits, estimates of heritability (h²) ranged from 0.012 to 0.028 and of permanent environmental variance (c²) from 0.016 to 0.032. Genetic correlations (r_g) among fertility traits were generally high (>0.70). Genetic correlations of fertility with milk production traits were unfavourable (range -0.11 to 0.46). Single- and two-trait analyses of fertility were compared using the same data set. The estimates of h² and c² were similar for the two types of analysis. However, there were differences between estimated breeding values and rankings for the same trait from single- versus multi-trait analyses. The rank correlation ranged from 0.69 to 0.83 for all animals in the pedigree and from 0.89 to 0.96 for sires with more than 25 daughters. As the single-trait method is biased by selection on milk yield, a multi-trait evaluation of fertility with milk yield is recommended.
Abstract:
Objectives: To assess the potential source of variation that the surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy, involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons, giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted by maximum likelihood in SAS. Results: There were many convergence problems. These were resolved using a variety of approaches, including: treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale; and centring continuous covariates. The initial model building indicated no significant 'type of operation' by surgeon interaction effect in either trial; the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect, but it is difficult to conclude that there was no difference between surgeons. The statistical test may have lacked sufficient power; the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
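In model form (our reconstruction from the description above; the notation is illustrative): for patient $i$ treated by surgeon $j$,

$$\operatorname{logit}\Pr(Y_{ij}=1) \;=\; \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j, \qquad u_j \sim N(0,\ \sigma^2_{\text{surgeon}}),$$

where $Y_{ij}$ indicates at least one major complication, $\mathbf{x}_{ij}$ carries the fixed effects such as type of operation, and the random-effect variance $\sigma^2_{\text{surgeon}}$ measures the between-surgeon variation at issue.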
Abstract:
The analysis-error variance of a 3D-FGAT assimilation is examined analytically using a simple scalar equation. It is shown that the analysis-error variance may be greater than the error variances of the inputs. The results are illustrated numerically with a scalar example and a shallow-water model.
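The paper's scalar equation is not reproduced here, but the headline result is easy to illustrate generically: for a scalar analysis x_a = x_b + k(y - x_b) with uncorrelated background and observation errors, the analysis-error variance is (1 - k)^2 sigma_b^2 + k^2 sigma_o^2, which exceeds both input variances when the gain k is far from optimal. A hypothetical numerical check (not the paper's model):

```python
# Scalar illustration: with sig_b2 = sig_o2 = 1, the optimal gain
# k = 0.5 gives analysis-error variance 0.5, below both inputs,
# while a mis-specified k = 1.5 gives 2.5, above both.
sig_b2, sig_o2 = 1.0, 1.0
for k in (0.5, 1.5):
    var_a = (1 - k) ** 2 * sig_b2 + k ** 2 * sig_o2
    print(f"k = {k}: analysis-error variance = {var_a}")
```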
Abstract:
Aims: We conducted a systematic review of studies examining relationships between measures of beverage alcohol tax or price levels and alcohol sales or self-reported drinking. A total of 112 studies of alcohol tax or price effects were found, containing 1003 estimates of the tax/price-consumption relationship. Design: Studies included analyses of alternative outcome measures, varying subgroups of the population, several statistical models, and different units of analysis. Multiple estimates were coded from each study, along with numerous study characteristics. Using reported estimates, standard errors, t-ratios, sample sizes and other statistics, we calculated the partial correlation for the relationship between alcohol price or tax and sales or drinking measures for each major model or subgroup reported within each study. Random-effects models were used to combine studies into inverse-variance-weighted overall estimates of the magnitude and significance of the relationship between alcohol tax/price and drinking. Findings: Simple means of reported elasticities are -0.46 for beer, -0.69 for wine and -0.80 for spirits. Meta-analytical results document highly significant relationships (P < 0.001) between alcohol tax or price measures and indices of sales or consumption of alcohol (aggregate-level r = -0.17 for beer, -0.30 for wine, -0.29 for spirits and -0.44 for total alcohol). Price/tax also affects heavy drinking significantly (mean reported elasticity = -0.28, individual-level r = -0.01, P < 0.01), but the magnitude of effect is smaller than for overall drinking. Conclusions: A large literature establishes that beverage alcohol prices and taxes are related inversely to drinking. Effects are large compared with other prevention policies and programs. Public policies that raise the price of alcohol are an effective means to reduce drinking.
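A minimal sketch of the pooling step, assuming the common DerSimonian-Laird random-effects estimator (the specific estimator is our assumption), together with the standard conversion of a reported t-ratio to a partial correlation:

```python
import numpy as np

def partial_r(t, df):
    """Partial correlation recovered from a reported t-ratio."""
    return t / np.sqrt(t**2 + df)

def random_effects_meta(effects, variances):
    """Inverse-variance-weighted random-effects pooling, with the
    DerSimonian-Laird estimate of between-study variance tau^2."""
    e, v = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / v                                     # fixed-effect weights
    fe = np.sum(w * e) / w.sum()                    # fixed-effect pooled mean
    q = np.sum(w * (e - fe) ** 2)                   # heterogeneity statistic
    c = w.sum() - np.sum(w**2) / w.sum()
    tau2 = max(0.0, (q - (len(e) - 1)) / c)         # between-study variance
    w_re = 1.0 / (v + tau2)                         # random-effects weights
    pooled = np.sum(w_re * e) / w_re.sum()
    return pooled, np.sqrt(1.0 / w_re.sum()), tau2  # estimate, SE, tau^2
```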
Abstract:
Decision theory is the study of models of judgement involved in, and leading to, deliberate and (usually) rational choice. In real estate investment there are normative models for the allocation of assets. These asset allocation models suggest an optimum allocation between the respective asset classes based on the investors' judgements of performance and risk. Real estate is selected, like other assets, on the basis of some criterion, commonly its marginal contribution to the production of a mean-variance efficient multi-asset portfolio, subject to the investor's objectives and capital rationing constraints. However, decisions are made relative to current expectations and current business constraints. Whilst a decision maker may believe in the optimum exposure levels dictated by an asset allocation model, the final decision may well be influenced by factors outside the parameters of the mathematical model. This paper discusses investors' perceptions of and attitudes toward real estate and highlights the important difference between theoretical exposure levels and pragmatic business considerations. It develops a model to identify "soft" parameters in decision making which influence the optimal allocation for that asset class. This "soft" information may relate to behavioural issues such as the tendency to mirror competitors, a desire to meet weight-of-money objectives, a desire to retain the status quo, and many other non-financial considerations. The paper aims to establish the place of property in multi-asset portfolios in the UK and to examine the asset allocation process in practice, with a view to understanding the decision-making process and investors' perceptions, based on an historic analysis of market expectations, a comparison with historic data, and an analysis of actual performance.
Abstract:
To achieve CO2 emissions reductions the UK Building Regulations require developers of new residential buildings to calculate expected CO2 emissions arising from energy consumption using a methodology such as the Standard Assessment Procedure (SAP 2005) or, more recently, SAP 2009. SAP encompasses all domestic heat consumption and a limited proportion of the electricity consumption. However, these calculations are rarely verified against real energy consumption and the related CO2 emissions. This paper presents the results of an analysis based on weekly heat demand data for more than 200 individual flats. The data were collected from a recently built residential development connected to a district heating network. A methodology for separating out domestic hot water use (DHW) and space heating demand (SH) has been developed, and the measured values are compared with the demand calculated using the SAP 2005 and 2009 methodologies. The analysis also shows the variance in DHW and SH consumption across both flat size and tenure (privately owned or housing association). Evaluation of the space heating consumption also includes an estimate of the heating degree day (HDD) base temperature for each block of flats and its comparison with the average base temperature calculated using the SAP 2005 methodology.
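For reference, heating degree days accumulate the daily shortfall of outdoor temperature below a base temperature; a minimal sketch, using the conventional UK base of 15.5 °C as a default (the study instead estimates a base per block of flats):

```python
import numpy as np

def heating_degree_days(daily_mean_temp_c, base_temp_c=15.5):
    """HDD over a period: sum over days of max(0, base - outdoor mean)."""
    t = np.asarray(daily_mean_temp_c, float)
    return float(np.clip(base_temp_c - t, 0.0, None).sum())
```

Regressing weekly SH demand on weekly HDD and varying the base temperature to maximise the fit is one common way to estimate the base; the abstract does not specify which variant was used.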
Abstract:
Modern Portfolio Theory (MPT) has been advocated as a more rational approach to the construction of real estate portfolios. The application of MPT can now be achieved with relative ease using the powerful facilities of modern spreadsheets, and does not necessarily need specialist software. This capability is found in an add-in tool, called an Optimiser or Solver, now included in several spreadsheet packages. The value of using this kind of more sophisticated spreadsheet analysis is increasingly difficult to ignore. This paper examines the use of the spreadsheet Optimiser in handling asset allocation problems. Using the Markowitz mean-variance approach, the paper introduces the necessary calculations and shows, by means of an elementary example implemented in Microsoft Excel, how the Optimiser may be used. Emphasis is placed on understanding the inputs to and outputs from the portfolio optimisation process, and the danger of treating the Optimiser as a black box is discussed.
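The computation the Optimiser performs numerically has, in the unconstrained fully invested case, a closed form; a minimal sketch for the global minimum-variance portfolio, with a hypothetical covariance matrix for illustration:

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights w = C^{-1} 1 / (1' C^{-1} 1),
    fully invested, short sales allowed (no other constraints)."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()

# Hypothetical 3-asset covariance matrix (illustrative numbers only).
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
w = min_variance_weights(cov)
print(w, w @ cov @ w)  # weights and the resulting portfolio variance
```

Adding no-short-sale or target-return constraints removes the closed form, which is exactly where a spreadsheet Solver earns its keep.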
Abstract:
In this paper, we present a polynomial-based noise variance estimator for multiple-input multiple-output single-carrier block transmission (MIMO-SCBT) systems. It is shown that the optimal pilots for noise variance estimation satisfy the same condition as that for channel estimation. Theoretical analysis indicates that the proposed estimator is statistically more efficient than the conventional sum of squared residuals (SSR) based estimator. Furthermore, we obtain an efficient implementation of the estimator by exploiting its special structure. Numerical results confirm our theoretical analysis.
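The polynomial-based estimator itself is not reproduced here; for context, a generic textbook sketch of the conventional SSR-based baseline it is compared against, assuming a known channel matrix H and a symbol estimate x_hat (our notation, not the paper's):

```python
import numpy as np

def ssr_noise_variance(y, H, x_hat):
    """Noise variance from the sum of squared residuals, normalized
    by the residual degrees of freedom."""
    r = y - H @ x_hat                          # residual vector
    dof = y.size - np.linalg.matrix_rank(H)    # residual degrees of freedom
    return float(np.real(r.conj() @ r)) / dof
```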
Abstract:
The paper analyses the impact of a priori determinants of the biosecurity behaviour of farmers in Great Britain. We use a dataset collected through a stratified telephone survey of 900 cattle and sheep farmers in Great Britain (400 in England and 250 each in Wales and Scotland), which took place between 25 March 2010 and 18 June 2010. The survey was stratified by farm type, farm size and region. To test the influence of a priori determinants on biosecurity behaviour we used a behavioural economics method: structural equation modelling (SEM) with observed and latent variables. SEM is a statistical technique for testing and estimating causal relationships amongst variables, some of which may be latent, using a combination of statistical data and qualitative causal assumptions. Thirteen latent variables were identified and extracted, expressing the behaviour and the underlying determining factors. The variables were: experience; economic factors; organic certification of the farm; membership in a cattle/sheep health scheme; perceived usefulness of biosecurity information sources; knowledge about biosecurity measures; perceived importance of specific biosecurity strategies; perceived effect (on the farm business in the past five years) of welfare/health regulation; perceived effect of severe outbreaks of animal diseases; attitudes towards livestock biosecurity; attitudes towards animal welfare; influence on the decision to apply biosecurity measures; and biosecurity behaviour. The SEM model applied to the Great Britain sample has an adequate fit according to measures of absolute, incremental and parsimonious fit. The results suggest that farmers' perceived importance of specific biosecurity strategies, organic certification of the farm, knowledge about biosecurity measures, attitudes towards animal welfare, perceived usefulness of biosecurity information sources, perceived effect on business during the past five years of severe outbreaks of animal diseases, membership in a cattle/sheep health scheme, attitudes towards livestock biosecurity, influence on the decision to apply biosecurity measures, experience and economic factors significantly influence behaviour (overall explaining 64% of the variance in behaviour).
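For orientation only, a minimal sketch of how such an SEM can be specified and fitted in Python with the third-party semopy package; the latent variables, indicator names and data file here are hypothetical stand-ins, not the paper's thirteen-variable model:

```python
import pandas as pd
from semopy import Model  # third-party SEM package, one of several options

# Hypothetical lavaan-style specification: two latent factors measured
# by survey items, plus a structural path to biosecurity behaviour.
DESC = """
attitudes =~ att1 + att2 + att3
knowledge =~ kn1 + kn2
behaviour ~ attitudes + knowledge + experience
"""

data = pd.read_csv("survey.csv")  # hypothetical item-level survey data
model = Model(DESC)
model.fit(data)
print(model.inspect())            # path coefficients, SEs and p-values
```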