11 results for process data
at Cochin University of Science
Abstract:
Department of Marine Geology and Geophysics, Cochin University of Science and Technology
Abstract:
An attempt is made to study the possible relationship between the process of upwelling and zooplankton biomass in the shelf waters along the southwest coast of India between Cape Comorin and Ratnagiri, based on the oceanographic and zooplankton data collected by the erstwhile FAO/UNDP Pelagic Fishery Project, Cochin, between 1973 and 1978. Different factors, such as the depth from which the bottom waters are induced upwards during the process of upwelling, the depth to which the bottom waters are drawn, the vertical velocity of upwelling and the resultant zooplankton productivity, were considered while arriving at the deductions. Except for nutrients and phytoplankton productivity, for which simultaneous data are lacking, all the major factors were taken into consideration before concluding on a positive/negative correlation.
Abstract:
Machine tool chatter is an unfavorable phenomenon during metal cutting, which results in heavy vibration of the cutting tool. With an increase in the depth of cut, the cutting regime changes from chatter-free cutting to one with chatter. In this paper, we propose the use of permutation entropy (PE), a conceptually simple and computationally fast measure, to detect the onset of chatter from the time series of a sound signal recorded with a unidirectional microphone. PE can efficiently distinguish the regular and complex nature of any signal and extract information about the dynamics of the process by indicating a sudden change in its value. In situations where the data sets are huge and there is no time for preprocessing and fine-tuning, PE can effectively detect dynamical changes in the system. This makes PE an ideal choice for online detection of chatter, which is not possible with other conventional nonlinear methods. In the present study, the variation of PE under two cutting conditions is analyzed. An abrupt variation in the value of PE with increasing depth of cut indicates the onset of chatter vibrations. The results are verified using the frequency spectra of the signals and the nonlinear measure normalized coarse-grained information rate (NCIR).
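To make the measure concrete, here is a minimal sketch of a normalized permutation entropy computation on a 1-D signal; the embedding order, delay and test signals are illustrative assumptions, not the parameters used in the study.

```python
import numpy as np
from math import factorial

def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D time series."""
    counts = {}
    for i in range(len(signal) - (order - 1) * delay):
        window = signal[i : i + order * delay : delay]
        pattern = tuple(np.argsort(window))     # ordinal pattern of the window
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    # Shannon entropy of the pattern distribution, scaled to [0, 1].
    return -np.sum(p * np.log(p)) / np.log(factorial(order))

# Illustration: a regular signal yields low PE, a noise-like one high PE;
# an abrupt shift in PE along a cutting record would flag a regime change.
t = np.linspace(0, 10, 2000)
print(permutation_entropy(np.sin(2 * np.pi * 5 * t)))   # low: regular signal
print(permutation_entropy(np.random.randn(2000)))       # close to 1: noise
```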
Abstract:
Multivariate lifetime data arise in various forms, including recurrent event data, when individuals are followed to observe the sequence of occurrences of a certain type of event; correlated lifetimes arise when an individual is followed for the occurrence of two or more types of events, or when distinct individuals have dependent event times. In most studies there are covariates, such as treatments, group indicators, individual characteristics or environmental conditions, whose relationship to lifetime is of interest. This leads to the consideration of regression models. The well-known Cox proportional hazards model and its variations, based on the marginal hazard functions employed in the literature for the analysis of multivariate survival data, are not sufficient to explain the complete dependence structure of a pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2 we introduced a bivariate proportional hazards model using the vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effects on the two components of the vector hazard function. The proposed model is useful in real-life situations for studying the dependence structure of a pair of lifetimes on the covariate vector. The well-known partial likelihood approach is used for the estimation of the parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both the marginal and the joint dependence of the distribution of gap times on the covariate vector. In many fields of application, the mean residual life function is considered a more useful concept than the hazard function. Motivated by this, in Chapter 4 we considered a new semi-parametric model, the bivariate proportional mean residual life model, to assess the relationship between mean residual life and covariates for gap times of recurrent events. The counting process approach is used for the inference procedures for the gap times of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of the censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of the models developed in the previous chapters were studied, and the proposed models were applied to various real-life situations.
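For reference, the partial-likelihood machinery the thesis builds on is the standard univariate Cox fit; a minimal sketch using the lifelines library and its bundled Rossi recidivism data is shown below. The bivariate vector-hazard models introduced in Chapters 2 to 6 are not available in standard packages, so this illustrates only the baseline being generalized.

```python
# Minimal univariate Cox proportional hazards fit via partial likelihood,
# using the lifelines library and its bundled example data.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                      # durations, event indicator, covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()                    # estimated regression coefficients
```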
Abstract:
In this paper we try to fit a threshold autoregressive (TAR) model to time series data of monthly coconut oil prices at the Cochin market. The procedure proposed by Tsay [7] for fitting the TAR model is briefly presented. The fitted model is compared with a simple autoregressive (AR) model. The results are in favour of the TAR process; thus the monthly coconut oil prices exhibit a type of non-linearity which can be accounted for by a threshold model.
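As an illustration of the model class (not the paper's fitted model), here is a minimal sketch of a two-regime self-exciting TAR fit by regime-wise least squares; the threshold, delay, order and simulated series are assumptions, and Tsay's full procedure additionally includes a formal threshold-nonlinearity test omitted here.

```python
import numpy as np

def fit_setar(y, threshold, delay=1, order=1):
    """Regime-wise least-squares fit of a two-regime SETAR model:
    the regime of y[t] is chosen by whether y[t - delay] exceeds threshold."""
    start = max(order, delay)
    coefs = {}
    for regime, in_regime in (("low", lambda z: z <= threshold),
                              ("high", lambda z: z > threshold)):
        rows, targets = [], []
        for t in range(start, len(y)):
            if in_regime(y[t - delay]):
                rows.append([1.0] + [y[t - k] for k in range(1, order + 1)])
                targets.append(y[t])
        coefs[regime], *_ = np.linalg.lstsq(np.array(rows), np.array(targets),
                                            rcond=None)
    return coefs  # intercept and AR coefficients for each regime

# Illustration on a simulated series (not actual coconut oil prices).
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    phi = 0.8 if y[t - 1] <= 0.0 else -0.4
    y[t] = phi * y[t - 1] + rng.normal()
print(fit_setar(y, threshold=0.0))
```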
Abstract:
Computational biology is the research area that contributes to the analysis of biological data through the development of algorithms which address significant research problems. The data from molecular biology include DNA, RNA, protein and gene expression data. Gene expression data provide the expression levels of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which in turn are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data are organized in the form of a matrix. Rows in the matrix represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioural patterns of genes, such as the similarity of their behaviour, the nature of their interaction, their respective contributions to the same pathways and so on. Genes participating in the same biological process exhibit similar expression patterns. These patterns have immense relevance and application in bioinformatics and clinical research; they are used in the medical domain to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. To identify such patterns from gene expression data, data mining techniques are essential. Clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering is introduced. Biclustering refers to the simultaneous clustering of both rows and columns of a data matrix. Clustering is a global model, whereas biclustering is a local one. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards developing approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix; its rows and columns need not be contiguous as in the gene expression data matrix, and biclusters are not disjoint. Computation of biclusters is costly because one has to consider all combinations of rows and columns in order to find all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the number of genes and conditions respectively, and usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms make use of a measure called the mean squared residue to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold.
All these algorithms begin the search from tightly co-regulated submatrices called seeds, which are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint-based, greedy and metaheuristic. Constraint-based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. Biologically relevant and statistically significant biclusters identified by all these algorithms are validated against the Gene Ontology database. All these algorithms are compared with other biclustering algorithms, and the algorithms developed in this work overcome some of the problems associated with the existing ones. With the help of some of the algorithms developed in this work, biclusters with very high row variance, higher than that achieved by any other algorithm using the mean squared residue, are identified from both the Yeast and Lymphoma data sets. Such biclusters, which reflect significant changes in expression level, are highly relevant biologically.
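For concreteness, a minimal sketch of the mean squared residue (MSR) score that these algorithms evaluate is given below; the toy matrices are illustrative assumptions.

```python
import numpy as np

def mean_squared_residue(bicluster):
    """MSR of a submatrix: the residue of entry (i, j) is
    a_ij - row_mean_i - col_mean_j + overall_mean."""
    row_means = bicluster.mean(axis=1, keepdims=True)
    col_means = bicluster.mean(axis=0, keepdims=True)
    overall = bicluster.mean()
    residues = bicluster - row_means - col_means + overall
    return np.mean(residues ** 2)

# A perfectly additive (coherent) bicluster has MSR = 0.
coherent = np.array([[1.0, 2.0, 3.0],
                     [2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0]])
print(mean_squared_residue(coherent))              # 0.0 for coherent pattern
print(mean_squared_residue(np.random.rand(3, 3)))  # > 0 for noisy submatrix
```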
Abstract:
Satellite remote sensing is being used effectively to monitor the ocean surface and its overlying atmosphere. Technical growth in the field of satellite sensors has made satellite measurement an inevitable part of oceanographic and atmospheric research. Among the ocean-observing sensors, ocean colour sensors make use of the visible band of the electromagnetic spectrum (shorter wavelengths). The use of shorter wavelengths ensures the fine spatial resolution needed to depict the oceanographic and atmospheric characteristics of any region having significant spatio-temporal variability. The region off the southwest coast of India is one such area, showing very significant spatio-temporal oceanographic and atmospheric variability due to the seasonally reversing surface winds and currents. Consequently, the region is enriched with features like upwelling, sinking, eddies, fronts, etc. Among them, upwelling brings nutrient-rich waters from subsurface layers to the surface layers. During this process primary production is enhanced, which is detected by ocean colour sensors as high values of Chl a. The vertical attenuation depth of incident solar radiation (Kd) and the Aerosol Optical Depth (AOD) are two other parameters provided by ocean colour sensors. Kd also undergoes significant seasonal variability due to changes in the Chl a content of the water column. Moreover, Kd is affected by sediment transport in the upper layers, as the region experiences land drainage resulting from copious rainfall. The wide range of variability in wind speed and direction may also influence aerosol sources and transport and, consequently, AOD. The present doctoral thesis concentrates on the utility of the Chl a, Kd and AOD provided by satellite ocean colour sensors for understanding the oceanographic and atmospheric variability off the southwest coast of India. The thesis is divided into six chapters with further subdivisions.
Abstract:
In this paper we discuss our research in developing a general and systematic method for anomaly detection. The key ideas are to represent normal program behaviour using system call frequencies and to incorporate probabilistic classification techniques to detect anomalies and intrusions. Using experiments on the sendmail system call data, we demonstrate that we can construct concise and accurate classifiers to detect anomalies. We provide an overview of the approach that we have implemented.
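A minimal sketch of the idea, representing each trace as a vector of system-call frequencies and training a probabilistic classifier on it, is shown below; the traces, call names and the choice of a multinomial Naive Bayes model are illustrative assumptions, not the authors' exact implementation.

```python
# Represent each process trace as system-call frequency counts and
# classify probabilistically; a hypothetical illustration of the approach.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

normal_traces = ["open read write close", "open read read close"]
attack_traces = ["open open exec exec exec", "exec write exec exec"]
traces = normal_traces + attack_traces
labels = [0, 0, 1, 1]                    # 0 = normal, 1 = anomalous

vec = CountVectorizer()                  # builds system-call frequency vectors
X = vec.fit_transform(traces)
clf = MultinomialNB().fit(X, labels)

new_trace = ["open exec exec exec close"]
print(clf.predict(vec.transform(new_trace)))  # should flag the exec-heavy trace
```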
Abstract:
In the current study, an epidemiological study is carried out by means of a literature survey in groups identified to be at higher risk of drug-drug interactions (DDIs), as well as in other cases, to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in mining drug-drug interactions. The necessary pre-processing algorithms are developed based on this analysis, and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared against a standard drug interaction database for validation: 31% of the associations obtained were identified to be new, and the remaining 69% matched existing interactions. This match clearly indicates the validity of the methodology and its applicability to similar databases. Formulating the results using generic drug names expanded their relevance to a global scale. This global applicability helps health care professionals worldwide to observe caution during the various stages of drug administration, thus considerably enhancing pharmacovigilance.
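To illustrate the association-mining step, here is a minimal Apriori-style sketch that mines frequent drug pairs from hypothetical adverse-event reports; the reports, drug names and support threshold are invented for illustration, and the modified Apriori and FAERS pre-processing described above are not reproduced.

```python
from itertools import combinations
from collections import Counter

# Hypothetical adverse-event reports, each listing the drugs involved.
reports = [
    {"warfarin", "aspirin"},
    {"warfarin", "aspirin", "omeprazole"},
    {"aspirin", "omeprazole"},
    {"warfarin", "aspirin"},
]
min_support = 0.5  # a pair must appear in at least half of the reports

pair_counts = Counter()
for report in reports:
    for pair in combinations(sorted(report), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: n / len(reports)
                  for pair, n in pair_counts.items()
                  if n / len(reports) >= min_support}
print(frequent_pairs)  # candidate drug pairs to validate against a DDI database
```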
Abstract:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data. The term data mining refers to the process that performs exploratory analysis on the data and builds models from it. To infer patterns from data, data mining involves different approaches such as association rule mining, classification techniques and clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform one. This research study explores various techniques for converting mixed data sets into a numerical equivalent, so as to make them suitable for statistical and similar algorithms. The results of clustering mixed-category data after conversion to a numeric type are demonstrated using a crime data set. The thesis also proposes an extension to a well-known algorithm for handling mixed data types, so that it can deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Moreover, another issue with the clustering process is the visualization of the output. Different geometric techniques, such as scatter plots or projection plots, are available, but none of them displays the result as a projection of the whole database; rather, they support only attribute-pairwise analysis.
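As a sketch of the conversion idea, the snippet below one-hot encodes the categorical attributes of a toy mixed data set, scales the result and clusters it with k-means; the records and the specific encoding are illustrative assumptions rather than the exact conversion proposed in the thesis.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical mixed-attribute records (numerical + categorical).
df = pd.DataFrame({
    "age":    [25, 47, 31, 52, 29, 44],
    "income": [30000, 82000, 45000, 91000, 38000, 76000],
    "city":   ["A", "B", "A", "B", "A", "B"],
})

# Convert categorical attributes to a numerical equivalent via one-hot
# encoding, then scale so no attribute dominates the distance computation.
numeric = pd.get_dummies(df, columns=["city"])
X = StandardScaler().fit_transform(numeric)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # cluster assignment for each record
```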
Abstract:
In a business environment characterized by intense competition, building customer loyalty has become a key area of focus for most financial institutions. The explosion of the services sector, changing customer demographics, deregulation and the emergence of new technology in the financial services industry have had a critical impact on consumers' financial services buying behaviour. These changes have forced banks to modify their service offerings to customers so as to ensure high levels of customer satisfaction and also high levels of customer retention. Banks have historically had difficulty distinguishing their products from one another because of their relative homogeneity; with increasing competition, the problem has only intensified, with no coherent distinguishing theme. Rising wealth, product proliferation, regulatory changes and newer technologies are together making bank switching easier for customers. In order to remain competitive, it is important for banks to retain their customer base. The financial services sector is the foundation of any economy and plays the role of mobilizing resources and allocating them. The retail banking sector in India has emerged as one of the major drivers of the overall banking industry and has witnessed enormous growth. Switching behaviour has a negative impact on banks' market share and profitability, as the costs of acquiring customers are much higher than the costs of retaining them. When customers switch, the business loses the potential for additional profits from the customer, and the initial costs invested in the customer by the business are lost. The objective of the thesis was to examine the relationship among the triggers that customers experience, their perceptions of service quality, consumers' commitment and behavioural intentions in the contemporary Indian retail banking context, through the eyes of the customer. To understand customers' perception of these aspects, data were collected from retail banking customers alone for the purpose of analysis, though the banks' views were considered during the qualitative work carried out prior to the main study. No respondent who is an employee of a banking organization was considered for the final study, to avoid the possibility of any bias that could affect the results adversely. The data for the study were collected from customers who have switched banks and from those who were non-switchers. The study attempted to develop and validate a multidimensional construct of service quality for retail banking from the consumer's perspective. A major conclusion from the empirical research was the confirmation of the multidimensional construct of perceived service quality in the banking context. Switching can be viewed as an optimization problem for customers: customers weigh the potential gains of switching to another service provider against the costs of leaving the current one. As banks do not provide tangible products, their service quality is usually assessed through the service provider's relationship with customers. Thus, banks should pay attention to their employees' skills and knowledge, assessing customers' needs and offering fast and efficient services.