912 results for mean squared residue
Abstract:
Computational Biology is the research area that contributes to the analysis of biological data through the development of algorithms which address significant research problems. The data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression level of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix, in which rows represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine behavioral patterns of genes, such as the similarity of their behavior, the nature of their interaction, their respective contributions to the same pathways, and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research; in the medical domain they aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. Data mining techniques are essential to identify such patterns from gene expression data. Clustering is an important data mining technique for the analysis of gene expression data, and biclustering was introduced to overcome the problems associated with clustering. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix. Clustering is a global model, whereas biclustering is a local one. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix; its rows and columns need not be contiguous in the original matrix, and biclusters are not disjoint. Computing biclusters is costly because all combinations of rows and columns must be considered in order to find them all: the search space for the biclustering problem is 2^(m+n), where m and n are the numbers of genes and conditions respectively, and usually m + n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms make use of a measure called mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size with mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly coregulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. Constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. All of them identify biologically relevant and statistically significant biclusters, validated against the Gene Ontology database, and all are compared with other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. Some of them identify, from both the Yeast and Lymphoma datasets, biclusters with very high row variance, higher than that of any other algorithm using mean squared residue. Such biclusters, which capture significant changes in expression level, are highly relevant biologically.
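For reference, the mean squared residue used throughout these abstracts is the Cheng–Church coherence score. A minimal numpy sketch, assuming an additive coherence model (function names are my own), of how the MSR and the row variance of a candidate submatrix could be computed:

```python
import numpy as np

def mean_squared_residue(sub: np.ndarray) -> float:
    """Cheng-Church-style mean squared residue (MSR) of a bicluster
    submatrix: low MSR means rows and columns vary coherently."""
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij
    overall   = sub.mean()                        # a_IJ
    residue = sub - row_means - col_means + overall
    return float((residue ** 2).mean())

def row_variance(sub: np.ndarray) -> float:
    """Average row variance; high values flag biclusters whose genes
    show strong changes in expression level across conditions."""
    return float(((sub - sub.mean(axis=1, keepdims=True)) ** 2).mean())
```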
Abstract:
A large number of ridge regression estimators have been proposed and used with little knowledge of their true distributions. Because of this lack of knowledge, these estimators cannot be used to test hypotheses or to form confidence intervals. This paper presents a basic technique for deriving the exact distribution functions for a class of generalized ridge estimators. The technique is applied to five prominent generalized ridge estimators. Graphs of the resulting distribution functions are presented. The actual behavior of these estimators is found to be considerably different from the behavior which is generally assumed for ridge estimators. This paper also uses the derived distributions to examine the mean squared error properties of the estimators. A technique for developing confidence intervals based on the generalized ridge estimators is also presented.
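For context, a minimal sketch of the usual generalized ridge form, where different estimators correspond to different data-driven choices of the penalty matrix K (the five estimators studied in the paper are not reproduced here):

```python
import numpy as np

def generalized_ridge(X: np.ndarray, y: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Generalized ridge estimator: beta_hat = (X'X + K)^{-1} X'y,
    with K a positive semi-definite (usually diagonal) matrix.
    K = k * I recovers ordinary ridge regression."""
    p = X.shape[1]
    assert K.shape == (p, p)
    return np.linalg.solve(X.T @ X + K, X.T @ y)
```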
Abstract:
Biclustering is the simultaneous clustering of both rows and columns of a data matrix. A measure called Mean Squared Residue (MSR) is used to simultaneously evaluate the coherence of rows and columns within a submatrix. In this paper a novel algorithm is developed for biclustering gene expression data using the newly introduced concept of an MSR difference threshold. In the first step, high quality bicluster seeds are generated using the K-Means clustering algorithm. Then more genes and conditions (nodes) are added to the bicluster. Before a node is added, the MSR of the bicluster, X, is calculated; after adding the node, the MSR, Y, is calculated again. The added node is deleted if Y minus X is greater than the MSR difference threshold, or if Y is greater than the MSR threshold, which depends on the dataset. The MSR difference threshold differs between the gene list and the condition list, and it also depends on the dataset; proper values should be identified through experimentation in order to obtain biclusters of high quality. The results obtained on a benchmark dataset clearly indicate that this algorithm is better than many of the existing biclustering algorithms.
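A minimal sketch of the node-addition rule described above. The seed generation and the exact traversal order are not specified in the abstract, so the single greedy pass and all names below are assumptions:

```python
import numpy as np

def msr(sub):
    """Mean squared residue of a submatrix."""
    r = sub - sub.mean(1, keepdims=True) - sub.mean(0, keepdims=True) + sub.mean()
    return float((r ** 2).mean())

def grow_bicluster(data, rows, cols, delta, tau_rows, tau_cols):
    """Greedy node addition with an MSR difference threshold.

    data      : full expression matrix (genes x conditions)
    rows/cols : index lists of the seed bicluster
    delta     : MSR threshold (dataset dependent)
    tau_*     : MSR difference thresholds for genes / conditions
    """
    rows, cols = list(rows), list(cols)
    for i in range(data.shape[0]):
        if i in rows:
            continue
        x = msr(data[np.ix_(rows, cols)])            # MSR before adding
        y = msr(data[np.ix_(rows + [i], cols)])      # MSR after adding
        # keep the gene only if MSR stays low and grows slowly
        if y <= delta and (y - x) <= tau_rows:
            rows.append(i)
    for j in range(data.shape[1]):
        if j in cols:
            continue
        x = msr(data[np.ix_(rows, cols)])
        y = msr(data[np.ix_(rows, cols + [j])])
        if y <= delta and (y - x) <= tau_cols:
            cols.append(j)
    return rows, cols
```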
Abstract:
Despite the large applicability of the field capacity (FC) concept in hydrology and engineering, it presents various ambiguities and inconsistencies due to a lack of methodological procedure standardization. Experimental field and laboratory protocols taken from the literature were used in this study to determine the value of FC for different depths in 29 soil profiles, totaling 209 soil samples. The volumetric water content (θ) values were also determined at three suction values (6 kPa, 10 kPa, 33 kPa), along with bulk density (BD), texture (T) and organic matter content (OM). The protocols were devised based on the water processes involved in the FC concept, aiming at minimizing hydraulic inconsistencies and procedural difficulty while maintaining the practical meaning of the concept. A high correlation between FC and θ(6 kPa) allowed the development of a pedotransfer function (Equation 3), quadratic in θ(6 kPa), resulting in an accurate and nearly bias-free calculation of FC for the four database geographic areas, with a global root mean squared residue (RMSR) of 0.026 m³·m⁻³. At the individual soil profile scale, the maximum RMSR was only 0.040 m³·m⁻³. The BD, T and OM data were generally of a low predicting quality regarding FC when not accompanied by the moisture variables. As all the FC values were obtained by the same experimental protocol, and as the predicting quality of Equation 3 was clearly better than that of the classical method, which considers FC equal to θ(6), θ(10) or θ(33), we recommend using Equation 3 rather than the classical method, as well as the protocol presented here, to determine in-situ FC.
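A hedged sketch of the shape of such a calculation: the abstract gives only the functional form of Equation 3 (quadratic in θ(6 kPa)), so the coefficients below are placeholders, not the fitted values:

```python
import numpy as np

# Placeholder coefficients: the abstract states only that Equation 3
# is quadratic in theta(6 kPa); a, b, c must be fitted to the database.
A, B, C = 0.0, 1.0, 0.0

def field_capacity(theta_6kpa: np.ndarray) -> np.ndarray:
    """Pedotransfer function of the form FC = a + b*theta6 + c*theta6**2,
    with theta in m^3 m^-3 (volumetric water content at 6 kPa suction)."""
    return A + B * theta_6kpa + C * theta_6kpa ** 2

def rmsr(pred, obs):
    """Root mean squared residue between predicted and measured FC."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(obs)) ** 2)))
```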
Abstract:
Time evolution of mean-squared displacement based on molecular dynamics for a variety of adsorbate-zeolite systems is reported. Transition from ballistic to diffusive behavior is observed for all the systems. The transition times are found to be system dependent and show different types of dependence on temperature. Model calculations on a one-dimensional system are carried out which show that the characteristic length and transition times are dependent on the distance between the barriers, their heights, and temperature. In light of these findings, it is shown that it is possible to obtain valuable information about the average potential energy surface sampled under specific external conditions.
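For reference, a compact numpy sketch of how the mean-squared displacement and the ballistic-to-diffusive crossover could be extracted from a trajectory (the array layout and function names are assumptions):

```python
import numpy as np

def mean_squared_displacement(traj: np.ndarray) -> np.ndarray:
    """MSD(t) = <|r(t0 + t) - r(t0)|^2>, averaged over particles and
    time origins; traj has shape (n_frames, n_particles, 3)."""
    n = traj.shape[0]
    msd = np.zeros(n)
    for dt in range(1, n):
        disp = traj[dt:] - traj[:-dt]
        msd[dt] = (disp ** 2).sum(axis=-1).mean()
    return msd

def loglog_slope(t, msd):
    """Local slope of log(MSD) vs log(t): ~2 in the ballistic regime,
    ~1 in the diffusive regime; the crossover marks the transition
    time (skips t=0 where MSD is zero)."""
    return np.gradient(np.log(msd[1:]), np.log(t[1:]))
```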
Abstract:
We report an experimental study of a new type of turbulent flow that is driven purely by buoyancy. The flow is due to an unstable density difference, created using brine and water, across the ends of a long (length/diameter = 9) vertical pipe. The Schmidt number Sc is 670, and the Rayleigh number (Ra) based on the density gradient and diameter is about 10^8. Under these conditions the convection is turbulent, and the time-averaged velocity at any point is 'zero'. The Reynolds number based on the Taylor microscale, Re_λ, is about 65. The pipe is long enough for there to be an axially homogeneous region, with a linear density gradient, about 6–7 diameters long in the midlength of the pipe. In the absence of a mean flow and, therefore, mean shear, turbulence is sustained just by buoyancy. The flow can thus be considered to be an axially homogeneous turbulent natural convection driven by a constant (unstable) density gradient. We characterize the flow using flow visualization and particle image velocimetry (PIV). Measurements show that the mean velocities and the Reynolds shear stresses are zero across the cross-section; the root mean squared (r.m.s.) value of the vertical velocity is larger than those of the lateral velocities (by about one and a half times at the pipe axis). We identify some features of the turbulent flow using velocity correlation maps and the probability density functions of velocities and velocity differences. The flow away from the wall, affected mainly by buoyancy, consists of vertically moving fluid masses continually colliding and interacting, while the flow near the wall appears similar to that in wall-bound shear-free turbulence. The turbulence is anisotropic, with the anisotropy increasing to large values as the wall is approached. A mixing length model with the diameter of the pipe as the length scale predicts well the scalings for velocity fluctuations and the flux. This model implies that the Nusselt number would scale as Ra^{1/2}Sc^{1/2}, and the Reynolds number would scale as Ra^{1/2}Sc^{−1/2}. The velocity and the flux measurements appear to be consistent with the Ra^{1/2} scaling, although it must be pointed out that the Rayleigh number range was less than 10. The Schmidt number was not varied to check the Sc scaling. The fluxes and the Reynolds numbers obtained in the present configuration are much higher compared to what would be obtained in Rayleigh–Bénard (R–B) convection for similar density differences.
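A compact version of the dimensional argument behind those scalings, under the stated mixing-length assumption l = D (a sketch, not necessarily the authors' exact derivation):

```latex
% Velocity scale from the buoyancy frequency N, with mixing length l = D:
%   N^2 = \frac{g}{\rho}\frac{d\bar{\rho}}{dz}, \qquad w \sim N D .
% With Ra = \frac{g}{\rho}\frac{d\bar{\rho}}{dz}\frac{D^4}{\nu\kappa}
% and Sc = \nu/\kappa, it follows that
\mathrm{Re} = \frac{wD}{\nu}
  \sim \left(\frac{g}{\rho}\frac{d\bar{\rho}}{dz}\frac{D^4}{\nu^2}\right)^{1/2}
  = \left(\frac{Ra\,\kappa}{\nu}\right)^{1/2} = Ra^{1/2}\,Sc^{-1/2},
\qquad
\mathrm{Nu} \sim \frac{wD}{\kappa} = \mathrm{Re}\,Sc = Ra^{1/2}\,Sc^{1/2}.
```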
Abstract:
Adaptive filtering is a primary method for denoising the electrocardiogram (ECG), because it does not require knowledge of the signal's statistical characteristics. In this paper, an adaptive filtering technique for denoising the ECG based on a Genetic Algorithm (GA) tuned Sign-Data Least Mean Square (SD-LMS) algorithm is proposed. This technique minimizes the mean squared error between the primary input, which is a noisy ECG, and a reference input, which can be either noise that is correlated in some way with the noise in the primary input or a signal that is correlated only with the ECG in the primary input. Noise is used as the reference signal in this work. The algorithm was applied to records from the MIT-BIH Arrhythmia database for removing baseline wander and 60 Hz power line interference. The proposed algorithm gave an average signal-to-noise ratio improvement of 10.75 dB for baseline wander and 24.26 dB for power line interference, which is better than previously reported works.
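A minimal sketch of the sign-data LMS core in the noise-canceller configuration described above. The GA tuning of the step size is omitted, so μ is fixed here, and all names are my own:

```python
import numpy as np

def sd_lms(d, x, n_taps=8, mu=1e-3):
    """Sign-data LMS noise canceller: the tap update uses sign(x)
    instead of x, i.e. w(n+1) = w(n) + mu * e(n) * sgn(x(n)).

    d : primary input (noisy ECG)
    x : reference input (noise correlated with the ECG's noise),
        same length as d
    Returns the error signal e, which is the cleaned ECG."""
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for n in range(n_taps, len(d)):
        xn = x[n - n_taps:n][::-1]      # reference tap vector
        y = w @ xn                      # noise estimate
        e[n] = d[n] - y                 # cleaned ECG sample
        w += mu * e[n] * np.sign(xn)    # sign-data update
    return e
```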
Abstract:
The Gram-Schmidt (GS) orthogonalisation procedure has been used to improve the convergence speed of least mean square (LMS) adaptive code-division multiple-access (CDMA) detectors. However, this algorithm updates two sets of parameters, namely the GS transform coefficients and the tap weights, simultaneously. Because of the additional adaptation noise introduced by the former, it is impossible to achieve the same performance as the ideal orthogonalised LMS filter, unlike the result implied in an earlier paper. The authors provide a lower bound on the minimum achievable mean squared error (MSE) as a function of the forgetting factor λ used in finding the GS transform coefficients, and propose a variable-λ algorithm to balance the conflicting requirements of good tracking and low misadjustment.
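Schematically, the coupled adaptation looks like the sketch below, with exponentially weighted estimates (forgetting factor λ) tracking the GS transform coefficients while LMS adapts the tap weights. This is an illustrative reconstruction under standard assumptions, not the paper's exact recursion or its variable-λ rule:

```python
import numpy as np

def gs_lms(U, d, lam=0.99, mu=0.01, eps=1e-8):
    """LMS detection with an adaptive Gram-Schmidt (GS) pre-stage.

    U   : (n_samples, K) array of received input vectors
    d   : desired response (training symbols)
    lam : forgetting factor for tracking the GS transform coefficients

    Both parameter sets (GS coefficients and tap weights) adapt
    simultaneously, which is what introduces the extra adaptation
    noise discussed in the abstract."""
    n, K = U.shape
    P = np.zeros((K, K))        # estimates of E[u_k z_j]
    q = np.full(K, eps)         # estimates of E[z_j^2]
    w = np.zeros(K)             # LMS tap weights
    e = np.zeros(n)
    for t in range(n):
        u = U[t]
        z = np.zeros(K)
        for k in range(K):      # Gram-Schmidt orthogonalisation
            z[k] = u[k] - sum(P[k, j] / q[j] * z[j] for j in range(k))
            for j in range(k):  # track transform coefficients
                P[k, j] = lam * P[k, j] + (1 - lam) * u[k] * z[j]
            q[k] = lam * q[k] + (1 - lam) * z[k] ** 2
        e[t] = d[t] - w @ z     # filter error
        w += mu * e[t] * z      # LMS tap-weight update
    return e, w
```

A large λ gives low-variance (low-misadjustment) coefficient estimates but slow tracking, and a small λ the reverse, which is the trade-off the proposed variable-λ algorithm balances.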
Abstract:
The detection and diagnosis of faults, i.e., finding out how, where and why failures occur, has been an important area of study since machines began to replace human labor. However, no technique studied to date solves the problem definitively. Differences among dynamic systems, whether linear or nonlinear, time-variant or time-invariant, with physical or analytical redundancy, hamper research aimed at a unique solution. In this paper, a technique for fault detection and diagnosis (FDD) in dynamic systems is presented, using state observers in conjunction with other tools to create a hybrid FDD scheme. A modified state observer is used to create a residue that also allows the detection and diagnosis of faults. A bank of fault signatures is created using statistical tools, and finally an approach using the mean squared error (MSE) assists in the study of the behavior of fault diagnosis even in the presence of noise. The methodology is then applied to an educational plant with coupled tanks and to another with industrial instrumentation, to validate the system.
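A minimal sketch of the MSE-based matching of an observer residue against the signature bank (the data layout and names are assumptions; the observer itself and the statistical construction of the signatures are not reproduced):

```python
import numpy as np

def diagnose(residue: np.ndarray, signatures: dict) -> str:
    """Match an observer residue against a bank of fault signatures.

    residue    : residual sequence produced by the modified state observer
    signatures : {fault_name: reference residual sequence of equal length}

    Returns the fault whose signature has the smallest mean squared
    error (MSE) relative to the observed residue; averaging over the
    sequence keeps the comparison usable in the presence of noise."""
    mse = {name: float(np.mean((residue - ref) ** 2))
           for name, ref in signatures.items()}
    return min(mse, key=mse.get)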
Abstract:
Retinal image properties such as contrast and spatial frequency play important roles in the development of normal vision. For example, visual environments comprised solely of low contrast and/or low spatial frequencies induce myopia. The visual image is processed by the retina, which then locally controls eye growth. In terms of the retinal neurotransmitters that link visual stimuli to eye growth, there is strong evidence to suggest involvement of the retinal dopamine (DA) system. For example, effectively increasing retinal DA levels by using DA agonists can suppress the development of form-deprivation myopia (FDM). However, whether visual feedback controls eye growth by modulating retinal DA release, and/or some other factors, is still being elucidated. This thesis is chiefly concerned with the relationship between the dopaminergic system and retinal image properties in eye growth control. More specifically, whether the amount of retinal DA release reduces as the complexity of the image degrades was determined. For example, we investigated whether the level of retinal DA release decreased as image contrast decreased. In addition, the effects of spatial frequency, spatial energy distribution slope, and spatial phase on retinal DA release and eye growth were examined. When chicks were 8 days old, a cone-lens imaging system was applied monocularly (+30 D, 3.3 cm cone). A short-term treatment period (6 hr) and a longer-term treatment period (4.5 days) were used. The short-term treatment tests for an acute reduction in DA release by the visual stimulus, as is seen with diffusers and lenses, whereas the 4.5-day point tests for a reduction in DA release after more prolonged exposure to the visual stimulus. In the contrast study, 1.35 cyc/deg square wave grating targets of 95%, 67%, 45%, 12% or 4.2% contrast were used. Blank (0% contrast) targets were included for comparison. In the spatial frequency study, both sine and square wave grating targets with either 0.017 cyc/deg or 0.13 cyc/deg fundamental spatial frequencies and 95% contrast were used. In the spectral slope study, 30% root-mean-squared (RMS) contrast fractal noise targets with spectral fall-off of 1/f^0.5, 1/f and 1/f^2 were used. In the spatial alignment study, a structured Maltese cross (MX) target, a structured circular patterned (C) target and the scrambled versions of these two targets (SMX and SC) were used. Each treatment group comprised 6 chicks for ocular biometry (refraction and ocular dimension measurement) and 4 for analysis of retinal DA release. Vitreal dihydroxyphenylacetic acid (DOPAC) was analysed through ion-paired reversed phase high performance liquid chromatography with electrochemical detection (HPLC-ED), as a measure of retinal DA release. For the comparison between retinal DA release and eye growth, large reductions in retinal DA release, possibly due to the decreased light level inside the cone-lens imaging system, were observed across all treated eyes, while only those exposed to low contrast, low spatial frequency sine wave grating, 1/f^2, C and SC targets had myopic shifts in refraction. Amongst these treatment groups, no acute effect was observed, and longer-term effects were only found in the low contrast and 1/f^2 groups. These findings suggest that retinal DA release does not causally link visual stimulus properties to eye growth, and that these target-induced changes in refractive development are not dependent on the level of retinal DA release.
Retinal dopaminergic cells might be affected indirectly via other retinal cells that immediately respond to changes in the image contrast of the retinal image.
Abstract:
Purpose: To investigate the effect of orthokeratology on peripheral aberrations in two myopic volunteers. Methods: The subjects wore reverse geometry orthokeratology lenses overnight and were monitored for 2 weeks of wear. They underwent corneal topography, peripheral refraction (out to ±34° along the horizontal visual field) and peripheral aberration measurements across the 42° × 32° central visual field using a modified Hartmann-Shack aberrometer. Results: Spherical equivalent refraction was corrected for the central 25° of the visual field, beyond which it gradually returned to its pre-orthokeratology values. There were increases in axial coma, spherical aberration, higher-order root mean square (RMS) aberrations, and total RMS aberrations (excluding defocus). The rates of change of vertical and horizontal coma across the field changed in sign. Total RMS aberrations showed a quadratic rate of change across the visual field which was greater after orthokeratology. Conclusion: Although orthokeratology can correct peripheral relative hypermetropia, it induces dramatic increases in higher-order aberrations across the field.
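For reference, RMS wavefront error is conventionally computed from orthonormal Zernike coefficients as the root of the sum of squares; a small sketch (the coefficient ordering and names are assumptions, not the aberrometer's API):

```python
import numpy as np

def rms(coeffs):
    """RMS wavefront error from orthonormal Zernike coefficients:
    the square root of the sum of the squared coefficients."""
    return float(np.sqrt(np.sum(np.square(coeffs))))

def higher_order_rms(coeffs, radial_orders):
    """Higher-order RMS keeps only terms of radial order >= 3;
    'total RMS excluding defocus' instead drops just the Z(2,0) term."""
    c = np.asarray(coeffs, dtype=float)
    n = np.asarray(radial_orders)
    return rms(c[n >= 3])
```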
Abstract:
Biased estimation has the advantage of reducing the mean squared error (MSE) of an estimator. The question of interest is how biased estimation affects model selection. In this paper, we introduce biased estimation to a range of model selection criteria. Specifically, we analyze the performance of the minimum description length (MDL) criterion based on biased and unbiased estimation and compare it against modern model selection criteria such as Kay's conditional model order estimator (CME), the bootstrap and the more recently proposed hook-and-loop resampling based model selection. The advantages and limitations of the considered techniques are discussed. The results indicate that, in some cases, biased estimators can slightly improve the selection of the correct model. We also give an example for which the CME with an unbiased estimator fails, but could regain its power when a biased estimator is used.
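For orientation, the two-part MDL criterion in its common form; the paper's biased variant can be read as evaluating the likelihood at a biased (e.g. shrinkage) estimate rather than the maximum likelihood estimate (a sketch of the comparison, not the paper's exact formulation):

```latex
\mathrm{MDL}(k) \;=\; -\log L\!\big(\hat{\theta}_k\big) \;+\; \frac{k}{2}\log N,
\qquad
\hat{k} \;=\; \arg\min_{k} \ \mathrm{MDL}(k),
```

where N is the number of samples, k the candidate model order, and the biased version replaces the unbiased/ML estimate \hat{\theta}_k by a biased estimate \tilde{\theta}_k with lower MSE.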
Abstract:
The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models) for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and from health service planning requirements. Data from a large probabilistically-linked database from 1990 to 2004, consisting of fields from two separate registries, the Birth Defect Registry (BDR) and the Midwives Data Collection (MDC), were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix affects the smoothing properties of the CAR model, and this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component as well as a shared-component model in terms of modeling a sparse outcome, and this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual level data for a hybrid ecological spatial model, and this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model and, along with demographic projections, provide forecasts for birth defects at the SLA level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity by an important covariate (i.e. maternal age) helped improve the model's ability to recover the underlying risk, as compared to the traditional adjacency (specifically the Queen) method of applying weights. Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared a few models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared-component model to improve the estimation of sparse counts through borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criterion (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes, as well as accommodating both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen, along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts, to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken. By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes, and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data, and sample size considerations for rare diseases, will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
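For orientation, the two standard building blocks named above, in their textbook forms (the thesis's exact specification, e.g. its covariate-based weights and shared-component extension, may differ):

```latex
% Intrinsic CAR prior for the spatial random effect b_i of area i,
% given neighbourhood weights w_{ij}:
b_i \mid b_{-i} \;\sim\; \mathcal{N}\!\left(
  \frac{\sum_j w_{ij}\, b_j}{\sum_j w_{ij}},\;
  \frac{\sigma_b^2}{\sum_j w_{ij}}
\right)
% Zero-inflated Poisson (ZIP) likelihood for a sparse count y_i,
% mixing a structural-zero probability \pi_i with a Poisson(\mu_i):
P(y_i = 0) = \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \qquad
P(y_i = k) = (1 - \pi_i)\, \frac{\mu_i^{\,k}\, e^{-\mu_i}}{k!}, \quad k \ge 1.
```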