845 resultados para Sign Data LMS algorithm.
Resumo:
This work present the application of a computer package for generating of projection data for neutron computerized tomography, and in second part, discusses an application of neutron tomography, using the projection data obtained by Monte Carlo technique, for the detection and localization of light materials such as those containing hydrogen, concealed by heavy materials such as iron and lead. For tomographic reconstructions of the samples simulated use was made of only six equal projection angles distributed between 0º and 180º, with reconstruction making use of an algorithm (ARIEM), based on the principle of maximum entropy. With the neutron tomography it was possible to detect and locate polyethylene and water hidden by lead and iron (with 1cm-thick). Thus, it is demonstrated that thermal neutrons tomography is a viable test method which can provide important interior information about test components, so, extremely useful in routine industrial applications.
Resumo:
In the present study, we modeled a reaching task as a two-link mechanism. The upper arm and forearm motion trajectories during vertical arm movements were estimated from the measured angular accelerations with dual-axis accelerometers. A data set of reaching synergies from able-bodied individuals was used to train a radial basis function artificial neural network with upper arm/forearm tangential angular accelerations. The trained radial basis function artificial neural network for the specific movements predicted forearm motion from new upper arm trajectories with high correlation (mean, 0.9149-0.941). For all other movements, prediction was low (range, 0.0316-0.8302). Results suggest that the proposed algorithm is successful in generalization over similar motions and subjects. Such networks may be used as a high-level controller that could predict forearm kinematics from voluntary movements of the upper arm. This methodology is suitable for restoring the upper limb functions of individuals with motor disabilities of the forearm, but not of the upper arm. The developed control paradigm is applicable to upper-limb orthotic systems employing functional electrical stimulation. The proposed approach is of great significance particularly for humans with spinal cord injuries in a free-living environment. The implication of a measurement system with dual-axis accelerometers, developed for this study, is further seen in the evaluation of movement during the course of rehabilitation. For this purpose, training-related changes in synergies apparent from movement kinematics during rehabilitation would characterize the extent and the course of recovery. As such, a simple system using this methodology is of particular importance for stroke patients. The results underlie the important issue of upper-limb coordination.
Resumo:
The objective of this study was to predict by means of Artificial Neural Network (ANN), multilayer perceptrons, the texture attributes of light cheesecurds perceived by trained judges based on instrumental texture measurements. Inputs to the network were the instrumental texture measurements of light cheesecurd (imitative and fundamental parameters). Output variables were the sensory attributes consistency and spreadability. Nine light cheesecurd formulations composed of different combinations of fat and water were evaluated. The measurements obtained by the instrumental and sensory analyses of these formulations constituted the data set used for training and validation of the network. Network training was performed using a back-propagation algorithm. The network architecture selected was composed of 8-3-9-2 neurons in its layers, which quickly and accurately predicted the sensory texture attributes studied, showing a high correlation between the predicted and experimental values for the validation data set and excellent generalization ability, with a validation RMSE of 0.0506.
Resumo:
Most of the applications of airborne laser scanner data to forestry require that the point cloud be normalized, i.e., each point represents height from the ground instead of elevation. To normalize the point cloud, a digital terrain model (DTM), which is derived from the ground returns in the point cloud, is employed. Unfortunately, extracting accurate DTMs from airborne laser scanner data is a challenging task, especially in tropical forests where the canopy is normally very thick (partially closed), leading to a situation in which only a limited number of laser pulses reach the ground. Therefore, robust algorithms for extracting accurate DTMs in low-ground-point-densitysituations are needed in order to realize the full potential of airborne laser scanner data to forestry. The objective of this thesis is to develop algorithms for processing airborne laser scanner data in order to: (1) extract DTMs in demanding forest conditions (complex terrain and low number of ground points) for applications in forestry; (2) estimate canopy base height (CBH) for forest fire behavior modeling; and (3) assess the robustness of LiDAR-based high-resolution biomass estimation models against different field plot designs. Here, the aim is to find out if field plot data gathered by professional foresters can be combined with field plot data gathered by professionally trained community foresters and used in LiDAR-based high-resolution biomass estimation modeling without affecting prediction performance. The question of interest in this case is whether or not the local forest communities can achieve the level technical proficiency required for accurate forest monitoring. The algorithms for extracting DTMs from LiDAR point clouds presented in this thesis address the challenges of extracting DTMs in low-ground-point situations and in complex terrain while the algorithm for CBH estimation addresses the challenge of variations in the distribution of points in the LiDAR point cloud caused by things like variations in tree species and season of data acquisition. These algorithms are adaptive (with respect to point cloud characteristics) and exhibit a high degree of tolerance to variations in the density and distribution of points in the LiDAR point cloud. Results of comparison with existing DTM extraction algorithms showed that DTM extraction algorithms proposed in this thesis performed better with respect to accuracy of estimating tree heights from airborne laser scanner data. On the other hand, the proposed DTM extraction algorithms, being mostly based on trend surface interpolation, can not retain small artifacts in the terrain (e.g., bumps, small hills and depressions). Therefore, the DTMs generated by these algorithms are only suitable for forestry applications where the primary objective is to estimate tree heights from normalized airborne laser scanner data. On the other hand, the algorithm for estimating CBH proposed in this thesis is based on the idea of moving voxel in which gaps (openings in the canopy) which act as fuel breaks are located and their height is estimated. Test results showed a slight improvement in CBH estimation accuracy over existing CBH estimation methods which are based on height percentiles in the airborne laser scanner data. However, being based on the idea of moving voxel, this algorithm has one main advantage over existing CBH estimation methods in the context of forest fire modeling: it has great potential in providing information about vertical fuel continuity. This information can be used to create vertical fuel continuity maps which can provide more realistic information on the risk of crown fires compared to CBH.
Resumo:
Solid state nuclear magnetic resonance (NMR) spectroscopy is a powerful technique for studying structural and dynamical properties of disordered and partially ordered materials, such as glasses, polymers, liquid crystals, and biological materials. In particular, twodimensional( 2D) NMR methods such as ^^C-^^C correlation spectroscopy under the magicangle- spinning (MAS) conditions have been used to measure structural constraints on the secondary structure of proteins and polypeptides. Amyloid fibrils implicated in a broad class of diseases such as Alzheimer's are known to contain a particular repeating structural motif, called a /5-sheet. However, the details of such structures are poorly understood, primarily because the structural constraints extracted from the 2D NMR data in the form of the so-called Ramachandran (backbone torsion) angle distributions, g{^,'4)), are strongly model-dependent. Inverse theory methods are used to extract Ramachandran angle distributions from a set of 2D MAS and constant-time double-quantum-filtered dipolar recoupling (CTDQFD) data. This is a vastly underdetermined problem, and the stability of the inverse mapping is problematic. Tikhonov regularization is a well-known method of improving the stability of the inverse; in this work it is extended to use a new regularization functional based on the Laplacian rather than on the norm of the function itself. In this way, one makes use of the inherently two-dimensional nature of the underlying Ramachandran maps. In addition, a modification of the existing numerical procedure is performed, as appropriate for an underdetermined inverse problem. Stability of the algorithm with respect to the signal-to-noise (S/N) ratio is examined using a simulated data set. The results show excellent convergence to the true angle distribution function g{(j),ii) for the S/N ratio above 100.
Resumo:
Spatial data representation and compression has become a focus issue in computer graphics and image processing applications. Quadtrees, as one of hierarchical data structures, basing on the principle of recursive decomposition of space, always offer a compact and efficient representation of an image. For a given image, the choice of quadtree root node plays an important role in its quadtree representation and final data compression. The goal of this thesis is to present a heuristic algorithm for finding a root node of a region quadtree, which is able to reduce the number of leaf nodes when compared with the standard quadtree decomposition. The empirical results indicate that, this proposed algorithm has quadtree representation and data compression improvement when in comparison with the traditional method.
Resumo:
DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near optimal assembly solutions are available for bacterial and small genomes, however assembling large and complex genomes especially the human genome using Next-Generation-Sequencing (NGS) technologies is shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage and tools that are not specifically built for human genomes. Moreover, many algorithms are not even scalable to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems including DNA data error detection and correction, contig creation, scaffolding and contigs orientation; each can be seen as a distinct research area. This thesis specifically focuses on creating contigs from the short reads and combining them with outputs from other tools in order to obtain better results. Three different assemblers including SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11] are selected for comparative purposes in this thesis. Obtained results show that this thesis’ work produces comparable results to other assemblers and combining our contigs to outputs from other tools, produces the best results outperforming all other investigated assemblers.
Resumo:
Understanding the relationship between genetic diseases and the genes associated with them is an important problem regarding human health. The vast amount of data created from a large number of high-throughput experiments performed in the last few years has resulted in an unprecedented growth in computational methods to tackle the disease gene association problem. Nowadays, it is clear that a genetic disease is not a consequence of a defect in a single gene. Instead, the disease phenotype is a reflection of various genetic components interacting in a complex network. In fact, genetic diseases, like any other phenotype, occur as a result of various genes working in sync with each other in a single or several biological module(s). Using a genetic algorithm, our method tries to evolve communities containing the set of potential disease genes likely to be involved in a given genetic disease. Having a set of known disease genes, we first obtain a protein-protein interaction (PPI) network containing all the known disease genes. All the other genes inside the procured PPI network are then considered as candidate disease genes as they lie in the vicinity of the known disease genes in the network. Our method attempts to find communities of potential disease genes strongly working with one another and with the set of known disease genes. As a proof of concept, we tested our approach on 16 breast cancer genes and 15 Parkinson's Disease genes. We obtained comparable or better results than CIPHER, ENDEAVOUR and GPEC, three of the most reliable and frequently used disease-gene ranking frameworks.
Resumo:
La microscopie par fluorescence de cellules vivantes produit de grandes quantités de données. Ces données sont composées d’une grande diversité au niveau de la forme des objets d’intérêts et possèdent un ratio signaux/bruit très bas. Pour concevoir un pipeline d’algorithmes efficaces en traitement d’image de microscopie par fluorescence, il est important d’avoir une segmentation robuste et fiable étant donné que celle-ci constitue l’étape initiale du traitement d’image. Dans ce mémoire, je présente MinSeg, un algorithme de segmentation d’image de microscopie par fluorescence qui fait peu d’assomptions sur l’image et utilise des propriétés statistiques pour distinguer le signal par rapport au bruit. MinSeg ne fait pas d’assomption sur la taille ou la forme des objets contenus dans l’image. Par ce fait, il est donc applicable sur une grande variété d’images. Je présente aussi une suite d’algorithmes pour la quantification de petits complexes dans des expériences de microscopie par fluorescence de molécules simples utilisant l’algorithme de segmentation MinSeg. Cette suite d’algorithmes a été utilisée pour la quantification d’une protéine nommée CENP-A qui est une variante de l’histone H3. Par cette technique, nous avons trouvé que CENP-A est principalement présente sous forme de dimère.
Resumo:
There are many ways to generate geometrical models for numerical simulation, and most of them start with a segmentation step to extract the boundaries of the regions of interest. This paper presents an algorithm to generate a patient-specific three-dimensional geometric model, based on a tetrahedral mesh, without an initial extraction of contours from the volumetric data. Using the information directly available in the data, such as gray levels, we built a metric to drive a mesh adaptation process. The metric is used to specify the size and orientation of the tetrahedral elements everywhere in the mesh. Our method, which produces anisotropic meshes, gives good results with synthetic and real MRI data. The resulting model quality has been evaluated qualitatively and quantitatively by comparing it with an analytical solution and with a segmentation made by an expert. Results show that our method gives, in 90% of the cases, as good or better meshes as a similar isotropic method, based on the accuracy of the volume reconstruction for a given mesh size. Moreover, a comparison of the Hausdorff distances between adapted meshes of both methods and ground-truth volumes shows that our method decreases reconstruction errors faster. Copyright © 2015 John Wiley & Sons, Ltd.
Resumo:
Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.
Resumo:
This thesis Entitled “modelling and analysis of recurrent event data with multiple causes.Survival data is a term used for describing data that measures the time to occurrence of an event.In survival studies, the time to occurrence of an event is generally referred to as lifetime.Recurrent event data are commonly encountered in longitudinal studies when individuals are followed to observe the repeated occurrences of certain events. In many practical situations, individuals under study are exposed to the failure due to more than one causes and the eventual failure can be attributed to exactly one of these causes.The proposed model was useful in real life situations to study the effect of covariates on recurrences of certain events due to different causes.In Chapter 3, an additive hazards model for gap time distributions of recurrent event data with multiple causes was introduced. The parameter estimation and asymptotic properties were discussed .In Chapter 4, a shared frailty model for the analysis of bivariate competing risks data was presented and the estimation procedures for shared gamma frailty model, without covariates and with covariates, using EM algorithm were discussed. In Chapter 6, two nonparametric estimators for bivariate survivor function of paired recurrent event data were developed. The asymptotic properties of the estimators were studied. The proposed estimators were applied to a real life data set. Simulation studies were carried out to find the efficiency of the proposed estimators.
Resumo:
Microarray data analysis is one of data mining tool which is used to extract meaningful information hidden in biological data. One of the major focuses on microarray data analysis is the reconstruction of gene regulatory network that may be used to provide a broader understanding on the functioning of complex cellular systems. Since cancer is a genetic disease arising from the abnormal gene function, the identification of cancerous genes and the regulatory pathways they control will provide a better platform for understanding the tumor formation and development. The major focus of this thesis is to understand the regulation of genes responsible for the development of cancer, particularly colorectal cancer by analyzing the microarray expression data. In this thesis, four computational algorithms namely fuzzy logic algorithm, modified genetic algorithm, dynamic neural fuzzy network and Takagi Sugeno Kang-type recurrent neural fuzzy network are used to extract cancer specific gene regulatory network from plasma RNA dataset of colorectal cancer patients. Plasma RNA is highly attractive for cancer analysis since it requires a collection of small amount of blood and it can be obtained at any time in repetitive fashion allowing the analysis of disease progression and treatment response.
Resumo:
In the current study, epidemiology study is done by means of literature survey in groups identified to be at higher potential for DDIs as well as in other cases to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in data mining the drug-drug interactions. The necessary pre-processing algorithms are developed based on the analysis and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared using standard drug interaction database for validation. 31% of the associations obtained were identified to be new and the match with existing interactions was 69%. This match clearly indicates the validity of the methodology and its applicability to similar databases. Formulation of the results using the generic names expanded the relevance of the results to a global scale. The global applicability helps the health care professionals worldwide to observe caution during various stages of drug administration thus considerably enhancing pharmacovigilance
Resumo:
Biclustering is simultaneous clustering of both rows and columns of a data matrix. A measure called Mean Squared Residue (MSR) is used to simultaneously evaluate the coherence of rows and columns within a submatrix. In this paper a novel algorithm is developed for biclustering gene expression data using the newly introduced concept of MSR difference threshold. In the first step high quality bicluster seeds are generated using K-Means clustering algorithm. Then more genes and conditions (node) are added to the bicluster. Before adding a node the MSR X of the bicluster is calculated. After adding the node again the MSR Y is calculated. The added node is deleted if Y minus X is greater than MSR difference threshold or if Y is greater than MSR threshold which depends on the dataset. The MSR difference threshold is different for gene list and condition list and it depends on the dataset also. Proper values should be identified through experimentation in order to obtain biclusters of high quality. The results obtained on bench mark dataset clearly indicate that this algorithm is better than many of the existing biclustering algorithms