5 resultados para sequences analysis technology
em Cochin University of Science
Resumo:
The classical methods of analysing time series by Box-Jenkins approach assume that the observed series uctuates around changing levels with constant variance. That is, the time series is assumed to be of homoscedastic nature. However, the nancial time series exhibits the presence of heteroscedasticity in the sense that, it possesses non-constant conditional variance given the past observations. So, the analysis of nancial time series, requires the modelling of such variances, which may depend on some time dependent factors or its own past values. This lead to introduction of several classes of models to study the behaviour of nancial time series. See Taylor (1986), Tsay (2005), Rachev et al. (2007). The class of models, used to describe the evolution of conditional variances is referred to as stochastic volatility modelsThe stochastic models available to analyse the conditional variances, are based on either normal or log-normal distributions. One of the objectives of the present study is to explore the possibility of employing some non-Gaussian distributions to model the volatility sequences and then study the behaviour of the resulting return series. This lead us to work on the related problem of statistical inference, which is the main contribution of the thesis
Resumo:
Three dimensional (3D) composites are strong contenders for the structural applications in situations like aerospace,aircraft and automotive industries where multidirectional thermal and mechanical stresses exist. The presence of reinforcement along the thickness direction in 3D composites,increases the through the thickness stiffness and strength properties.The 3D preforms can be manufactured with numerous complex architecture variations to meet the needs of specific applications.For hot structure applications Carbon-Carbon(C-C) composites are generally used,whose property variation with respect to temperature is essential for carrying out the design of hot structures.The thermomechanical behavior of 3D composites is not fully understood and reported.The methodology to find the thermomechanical properties using analytical modelling of 3D woven,3D 4-axes braided and 3D 5-axes braided composites from Representative Unit Cells(RUC's) based on constitutive equations for 3D composites has been dealt in the present study.High Temperature Unidirectional (UD) Carbon-Carbon material properties have been evaluated using analytical methods,viz.,Composite cylinder assemblage Model and Method of Cells based on experiments carried out on Carbon-Carbon fabric composite for a temparature range of 300 degreeK to 2800degreeK.These properties have been used for evaluating the 3D composite properties.From among the existing methods of solution sequences for 3D composites,"3D composite Strength Model" has been identified as the most suitable method.For thegeneration of material properies of RUC's od 3D composites,software has been developed using MATLAB.Correlaton of the analytically determined properties with test results available in literature has been established.Parametric studies on the variation of all the thermomechanical constants for different 3D performs of Carbon-Carbon material have been studied and selection criteria have been formulated for their applications for the hot structures.Procedure for the structural design of hot structures made of 3D Carbon-Carbon composites has been established through the numerical investigations on a Nosecap.Nonlinear transient thermal and nonlinear transient thermo-structural analysis on the Nosecap have been carried out using finite element software NASTRAN.Failure indices have been established for the identified performs,identification of suitable 3D composite based on parametric studies on strength properties and recommendation of this material for Nosecap of RLV based on structural performance have been carried out in this Study.Based on the 3D failure theory the best perform for the Nosecap has been identified as 4-axis 15degree braided composite.
Resumo:
Modern computer systems are plagued with stability and security problems: applications lose data, web servers are hacked, and systems crash under heavy load. Many of these problems or anomalies arise from rare program behavior caused by attacks or errors. A substantial percentage of the web-based attacks are due to buffer overflows. Many methods have been devised to detect and prevent anomalous situations that arise from buffer overflows. The current state-of-art of anomaly detection systems is relatively primitive and mainly depend on static code checking to take care of buffer overflow attacks. For protection, Stack Guards and I-leap Guards are also used in wide varieties.This dissertation proposes an anomaly detection system, based on frequencies of system calls in the system call trace. System call traces represented as frequency sequences are profiled using sequence sets. A sequence set is identified by the starting sequence and frequencies of specific system calls. The deviations of the current input sequence from the corresponding normal profile in the frequency pattern of system calls is computed and expressed as an anomaly score. A simple Bayesian model is used for an accurate detection.Experimental results are reported which show that frequency of system calls represented using sequence sets, captures the normal behavior of programs under normal conditions of usage. This captured behavior allows the system to detect anomalies with a low rate of false positives. Data are presented which show that Bayesian Network on frequency variations responds effectively to induced buffer overflows. It can also help administrators to detect deviations in program flow introduced due to errors.
Resumo:
Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.
Resumo:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel potentially useful and ultimately understandable patterns from data. The term Data mining refers to the process which does the exploratory analysis on the data and builds some model on the data. To infer patterns from data, data mining involves different approaches like association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group the related data for assessing properties and drawing conclusions. Most of the clustering algorithms act on a dataset with uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding out the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert different formats into a uniform format. The research study explores the various techniques to convert the mixed data sets to a numerical equivalent, so as to make it equipped for applying the statistical and similar algorithms. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques like scatter plot, or projection plots are available, but none of the techniques display the result projecting the whole database but rather demonstrate attribute-pair wise analysis