10 results for rule-based algorithms
in Cochin University of Science
Abstract:
Computational Biology is the research area that contributes to the analysis of biological data through the development of algorithms that address significant research problems. The data from molecular biology includes DNA, RNA, protein and gene expression data. Gene expression data provides the expression level of genes under different conditions. Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences, which are later translated into proteins. The number of copies of mRNA produced is called the expression level of a gene. Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns represent experimental conditions. Experimental conditions can be different tissue types or time points. Entries in the gene expression matrix are real values. Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes, such as the similarity of their behavior, the nature of their interaction, their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by genes participating in the same biological process. These patterns have immense relevance and application in bioinformatics and clinical research. They are used in the medical domain to aid more accurate diagnosis, prognosis, treatment planning, drug discovery and protein network analysis. To identify various patterns from gene expression data, data mining techniques are essential. Clustering is an important data mining technique for the analysis of gene expression data. To overcome the problems associated with clustering, biclustering was introduced. Biclustering refers to simultaneous clustering of both rows and columns of a data matrix.
Clustering is a global model whereas biclustering is a local model. Discovering local expression patterns is essential for identifying many genetic pathways that are not apparent otherwise. It is therefore necessary to move beyond the clustering paradigm towards approaches capable of discovering local patterns in gene expression data. A bicluster is a submatrix of the gene expression data matrix. The rows and columns of the submatrix need not be contiguous in the gene expression data matrix. Biclusters are not disjoint. Computation of biclusters is costly because one has to consider all combinations of columns and rows in order to find all the biclusters. The search space for the biclustering problem is 2^(m+n), where m and n are the number of genes and conditions respectively. Usually m+n is more than 3000. The biclustering problem is NP-hard. Biclustering is a powerful analytical tool for the biologist. The research reported in this thesis addresses the problem of biclustering. Ten algorithms are developed for the identification of coherent biclusters from gene expression data. All these algorithms use a measure called mean squared residue (MSR) to search for biclusters. The objective is to identify biclusters of maximum size with a mean squared residue lower than a given threshold.
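The mean squared residue used as the search criterion above can be illustrated with a minimal sketch. It assumes the usual Cheng and Church definition (the abstract does not spell out the formula), and the function name and the toy matrix are hypothetical.

```python
from statistics import mean


def mean_squared_residue(matrix, rows, cols):
    """MSR of the bicluster selected by the row and column index lists.

    Assumes the Cheng-Church definition: each residue is
    a_ij - rowmean_i - colmean_j + overall_mean, computed on the submatrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    overall = mean(row_means)  # equals the submatrix mean (equal row lengths)
    total = 0.0
    for r, row in enumerate(sub):
        for c, value in enumerate(row):
            residue = value - row_means[r] - col_means[c] + overall
            total += residue ** 2
    return total / (len(rows) * len(cols))


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],  # same pattern as gene 0, shifted by +1
    [5.0, 1.0, 9.0],  # unrelated pattern
]
coherent = mean_squared_residue(expression, [0, 1], [0, 1, 2])
whole = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
```

Genes 0 and 1 differ only by a constant shift, so their bicluster has zero residue, while adding the unrelated gene 2 raises the score. A threshold on this score is what separates acceptable biclusters from rejected ones.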
All these algorithms begin the search from tightly coregulated submatrices called seeds. These seeds are generated by the K-Means clustering algorithm. The algorithms developed can be classified as constraint based, greedy and metaheuristic. Constraint based algorithms use one or more constraints, namely the MSR threshold and the MSR difference threshold. The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum. In the metaheuristic approaches, Particle Swarm Optimization (PSO) and variants of the Greedy Randomized Adaptive Search Procedure (GRASP) are used for the identification of biclusters. These algorithms are implemented on the Yeast and Lymphoma datasets. Biologically relevant and statistically significant biclusters are identified by all these algorithms and validated using the Gene Ontology database. All these algorithms are compared with some other biclustering algorithms. The algorithms developed in this work overcome some of the problems associated with existing algorithms. With some of the algorithms developed in this work, biclusters with very high row variance, higher than that obtained by any other algorithm using mean squared residue, are identified from both the Yeast and Lymphoma datasets. Such biclusters, which show significant change in expression level, are highly relevant biologically.
Abstract:
This is a Named Entity Based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give humans convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information Retrieval systems typically return a long list of documents in response to a user’s query, which the user must skim to determine whether they contain an answer. A Question Answering System, by contrast, allows the user to state his/her information need as a natural language question and receive the most appropriate answer as a word, a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. A Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language with relatively free word order. Malayalam also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger and Compound Word Splitter are developed as a part of this research work. No such tools were previously available for the Malayalam language.
Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single sentence questions are used to test the system. The overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased, and the system can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
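The rule-based question classification step mentioned in this abstract can be pictured with a small sketch. The rules, answer-type labels and English cue words below are hypothetical stand-ins; the actual system works on Malayalam interrogatives and its rule set is not given here.

```python
import re

# Hypothetical rules: (pattern, expected answer type). English cue words
# stand in for the Malayalam interrogatives a real system would match.
RULES = [
    (re.compile(r"^\s*who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*when\b", re.IGNORECASE), "DATE"),
    (re.compile(r"^\s*how\s+(many|much)\b", re.IGNORECASE), "NUMBER"),
]


def classify_question(question):
    """Return the answer type of the first matching rule, else 'OTHER'."""
    for pattern, answer_type in RULES:
        if pattern.search(question):
            return answer_type
    return "OTHER"


labels = [
    classify_question("Who wrote Indulekha?"),
    classify_question("How many districts are in Kerala?"),
    classify_question("Explain the monsoon."),
]
```

The predicted answer type would then be matched against the named entity tags in the documents, so that a PERSON question is only answered with a span tagged as a person.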
Abstract:
This thesis summarizes the results of studies on a syntax based approach for translation between Malayalam, one of the Dravidian languages, and English, and on the development of the major modules of a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort in the Malayalam language, unattempted by previous researchers. The computational models chosen for the system are the first of their kind for the Malayalam language. An in-depth study has been carried out on the design of the computational models and data structures needed for the different modules required by the prototype system: a morphological analyzer, a parser, a syntactic structure transfer module and a target language sentence generator. The list of part of speech tags, chunk tags and the hierarchical dependencies among the chunks required for the translation process has also been generated. In the development process, the major goals are: (a) accuracy of translation, (b) speed and (c) space. Accuracy-wise, smart tools for handling transfer grammar and translation standards, including equivalent words, expressions, phrases and styles in the target language, are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, an efficient parsing algorithm, efficient data structures and run-time frequency-based rearrangement of the grammar, which substantially reduces the parsing and generation time, are required. The space requirement also has to be minimised.
Abstract:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data. The term data mining refers to the process which performs exploratory analysis on the data and builds a model of the data. To infer patterns from data, data mining involves different approaches such as association rule mining, classification techniques and clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform format. The research study explores various techniques to convert mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied. The results of clustering mixed category data after conversion to a numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a data set corresponding to breast cancer. Another issue with the clustering process is the visualization of output. Different geometric techniques such as scatter plots or projection plots are available, but none of these techniques displays the result by projecting the whole database; they only demonstrate attribute-pair-wise analysis.
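As an illustration of converting mixed attributes to a numerical equivalent, the sketch below uses one-hot encoding for categorical fields and leaves numerical fields as floats. One-hot encoding is an assumption chosen for simplicity; the abstract does not say which conversion techniques the thesis evaluates, and the record fields are hypothetical.

```python
def encode_mixed(records, categorical_keys):
    """Convert dict records with mixed attributes to purely numeric rows.

    Categorical attributes become one-hot indicator columns (levels sorted
    alphabetically); every other attribute is cast to float.
    """
    levels = {k: sorted({r[k] for r in records}) for k in categorical_keys}
    encoded = []
    for record in records:
        row = []
        for key in record:  # dicts preserve insertion order
            if key in levels:
                row.extend(1.0 if record[key] == lv else 0.0
                           for lv in levels[key])
            else:
                row.append(float(record[key]))
        encoded.append(row)
    return encoded


# Hypothetical records with one numerical and one categorical attribute.
records = [
    {"age": 30, "area": "urban"},
    {"age": 45, "area": "rural"},
]
numeric_rows = encode_mixed(records, ["area"])
```

After this step every row has the same numeric layout, so distance-based clustering can be applied. In practice the numerical columns would usually also be rescaled so that they do not dominate the indicator columns.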
Abstract:
The country has witnessed a tremendous increase in vehicle population and in axle loading during the last decade, leaving its road network overstressed and leading to premature failure. The type of deterioration present in the pavement should be considered for determining whether it has a functional or structural deficiency, so that an appropriate overlay type and design can be developed. Structural failure arises from conditions that adversely affect the load carrying capability of the pavement structure. Inadequate thickness, cracking, distortion and disintegration cause structural deficiency. Functional deficiency arises when the pavement does not provide a smooth riding surface and comfort to the user. This can be due to poor surface friction and texture, hydroplaning and splash from the wheel path, rutting and excess surface distortion such as potholes, corrugation, faulting, blow up, settlement, heaves etc. Functional condition determines the level of service provided by the facility to its users at a particular time and also the Vehicle Operating Costs (VOC), thus influencing the national economy. Prediction of pavement deterioration helps to assess the remaining effective service life (RSL) of the pavement structure on the basis of reduction in performance levels, and to apply various alternative designs and rehabilitation strategies with a long range funding requirement for pavement preservation. In addition, deterioration models can predict the impact of treatment on the condition of the sections. The infrastructure prediction models can be classified into four groups, namely primary response models, structural performance models, functional performance models and damage models. The factors affecting the deterioration of roads are very complex in nature and vary from place to place.
Hence there is need to have a thorough study of the deterioration mechanism under varied climatic zones and soil conditions before arriving at a definite strategy of road improvement. Realizing the need for a detailed study involving all types of roads in the state with varying traffic and soil conditions, the present study has been attempted. This study attempts to identify the parameters that affect the performance of roads and to develop performance models suitable to Kerala conditions. A critical review of the various factors that contribute to the pavement performance has been presented based on the data collected from selected road stretches and also from five corporations of Kerala. These roads represent the urban conditions as well as National Highways, State Highways and Major District Roads in the sub urban and rural conditions. This research work is a pursuit towards a study of the road condition of Kerala with respect to varying soil, traffic and climatic conditions, periodic performance evaluation of selected roads of representative types and development of distress prediction models for roads of Kerala. In order to achieve this aim, the study is focused into 2 parts. The first part deals with the study of the pavement condition and subgrade soil properties of urban roads distributed in 5 Corporations of Kerala; namely Thiruvananthapuram, Kollam, Kochi, Thrissur and Kozhikode. From selected 44 roads, 68 homogeneous sections were studied. The data collected on the functional and structural condition of the surface include pavement distress in terms of cracks, potholes, rutting, raveling and pothole patching. The structural strength of the pavement was measured as rebound deflection using Benkelman Beam deflection studies. In order to collect the details of the pavement layers and find out the subgrade soil properties, trial pits were dug and the in-situ field density was found using the Sand Replacement Method. 
Laboratory investigations were carried out to find out the subgrade soil properties, soil classification, Atterberg limits, Optimum Moisture Content, Field Moisture Content and 4 days soaked CBR. The relative compaction in the field was also determined. The traffic details were also collected by conducting traffic volume count survey and axle load survey. From the data thus collected, the strength of the pavement was calculated which is a function of the layer coefficient and thickness and is represented as Structural Number (SN). This was further related to the CBR value of the soil and the Modified Structural Number (MSN) was found out. The condition of the pavement was represented in terms of the Pavement Condition Index (PCI) which is a function of the distress of the surface at the time of the investigation and calculated in the present study using deduct value method developed by U S Army Corps of Engineers. The influence of subgrade soil type and pavement condition on the relationship between MSN and rebound deflection was studied using appropriate plots for predominant types of soil and for classified value of Pavement Condition Index. The relationship will be helpful for practicing engineers to design the overlay thickness required for the pavement, without conducting the BBD test. Regression analysis using SPSS was done with various trials to find out the best fit relationship between the rebound deflection and CBR, and other soil properties for Gravel, Sand, Silt & Clay fractions. The second part of the study deals with periodic performance evaluation of selected road stretches representing National Highway (NH), State Highway (SH) and Major District Road (MDR), located in different geographical conditions and with varying traffic. 8 road sections divided into 15 homogeneous sections were selected for the study and 6 sets of continuous periodic data were collected. 
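The Structural Number calculation described above (a function of layer coefficient and thickness, then adjusted for subgrade CBR) can be sketched as follows. The sketch assumes the AASHTO-style sum of coefficient times thickness in inches and the commonly cited TRRL/HDM subgrade correction for the Modified Structural Number; the study may use a different form, and the layer values below are hypothetical.

```python
import math


def structural_number(layers):
    """SN as the sum of layer coefficient x thickness.

    layers: iterable of (layer_coefficient, thickness_in_inches) pairs.
    Assumes the AASHTO convention; drainage factors are omitted.
    """
    return sum(coefficient * thickness for coefficient, thickness in layers)


def modified_structural_number(sn, cbr_percent):
    """MSN = SN plus a subgrade contribution from the soaked CBR (%).

    Assumed form (TRRL/HDM relation):
    MSN = SN + 3.51*log10(CBR) - 0.85*(log10(CBR))**2 - 1.43
    """
    log_cbr = math.log10(cbr_percent)
    return sn + 3.51 * log_cbr - 0.85 * log_cbr ** 2 - 1.43


# Hypothetical pavement: bituminous surfacing and granular base.
layers = [(0.40, 2.0), (0.14, 8.0)]
sn = structural_number(layers)
msn = modified_structural_number(sn, cbr_percent=10.0)
```

With these illustrative inputs the subgrade term adds 1.23 to the Structural Number. A plot of MSN against measured rebound deflection is the kind of relationship the study examines by soil type and PCI class.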
The periodic data collected include the functional and structural condition in terms of distress (pothole, pothole patch, cracks, rutting and raveling), skid resistance using a portable skid resistance pendulum, surface unevenness using a Bump Integrator, texture depth using the sand patch method and rebound deflection using the Benkelman Beam. Baseline data of the study stretches were collected as one time data. Pavement history was obtained as secondary data. Pavement drainage characteristics were collected in terms of camber or cross slope using a camber board (slope meter) for the carriage way and shoulders, availability of longitudinal side drain, presence of valley, terrain condition, soil moisture content, water table data, High Flood Level, rainfall data, land use and cross slope of the adjoining land. These data were used for finding out the drainage condition of the study stretches. Traffic studies were conducted, including classified volume count and axle load studies. From the field data thus collected, the progression of each parameter was plotted for all the study roads and validated for accuracy. Structural Number (SN) and Modified Structural Number (MSN) were calculated for the study stretches. Progression of the deflection, distress, unevenness, skid resistance and macro texture of the study roads was evaluated. Since the deterioration of the pavement is a complex phenomenon contributed to by all the above factors, pavement deterioration models were developed as non linear regression models, using SPSS with the periodic data collected for all the above road stretches. General models were developed for cracking progression, raveling progression, pothole progression and roughness progression using SPSS. A model for construction quality was also developed. Calibration of HDM-4 pavement deterioration models for local conditions was done using the data for Cracking, Raveling, Pothole and Roughness. Validation was done using the data collected in 2013.
The application of HDM-4 to compare different maintenance and rehabilitation options was studied considering deterioration parameters such as cracking, pothole and raveling. The alternatives considered for analysis were a base alternative with crack sealing and patching, an overlay with 40 mm BC using ordinary bitumen, an overlay with 40 mm BC using Natural Rubber Modified Bitumen and an overlay of Ultra Thin White Topping. Economic analysis of these options was done considering the Life Cycle Cost (LCC). The average speeds that can be obtained by applying these options were also compared. The results were in favour of Ultra Thin White Topping over flexible pavements. Hence, Design Charts were also plotted for estimation of maximum wheel load stresses for different slab thicknesses under different soil conditions. The design charts showed the maximum stress for a particular slab thickness and different soil conditions incorporating different k values. These charts can be handy for a design engineer. Fuzzy rule based models developed for site specific conditions were compared with regression models developed using SPSS. The Riding Comfort Index (RCI) was calculated and correlated with unevenness to develop a relationship. Relationships were developed between Skid Number and Macro Texture of the pavement. The effort made through this research work will be helpful to highway engineers in understanding the behaviour of flexible pavements in Kerala conditions and for arriving at suitable maintenance and rehabilitation strategies. Key Words: Flexible Pavements – Performance Evaluation – Urban Roads – NH – SH and other roads – Performance Models – Deflection – Riding Comfort Index – Skid Resistance – Texture Depth – Unevenness – Ultra Thin White Topping
Abstract:
Analog-to-digital converters (ADCs) have an important impact on the overall performance of signal processing systems. This research explores efficient techniques for the design of sigma-delta ADCs, especially for multi-standard wireless transceivers. In particular, the aim is to develop novel models and algorithms to address this problem and to implement software tools which are able to assist the designer's decisions in the system-level exploration phase. To this end, this thesis presents a framework of techniques to design sigma-delta analog-to-digital converters. A 2-2-2 reconfigurable sigma-delta modulator is proposed which can meet the design specifications of three wireless communication standards, namely GSM, WCDMA and WLAN. A sigma-delta modulator design tool is developed using the Graphical User Interface Development Environment (GUIDE) in MATLAB. A Genetic Algorithm (GA) based search method is introduced to find the optimum values of the scaling coefficients and to maximize the dynamic range of a sigma-delta modulator.
Abstract:
To ensure the quality of machined products at minimum machining cost and maximum machining effectiveness, it is very important to select optimum parameters when metal cutting machine tools are employed. Traditionally, the experience of the operator plays a major role in the selection of optimum metal cutting conditions. However, attaining optimum values every time is difficult even for a skilled operator. The non-linear nature of the machining process has compelled engineers to search for more effective methods of optimization. The design objective preceding most engineering design activities is simply to minimize the cost of production or to maximize the production efficiency. The main aim of the research work reported here is to build robust optimization algorithms by exploiting ideas that nature has to offer and using them to solve real world optimization problems in manufacturing processes. In this thesis, after an exhaustive literature review, several optimization techniques used in various manufacturing processes have been identified. The selection of optimal cutting parameters, such as depth of cut, feed and speed, is a very important issue for every machining process. Experiments have been designed using the Taguchi technique, and dry turning of SS420 has been performed on a Kirloskar Turn Master 35 lathe. Analyses using S/N ratios and ANOVA were performed to find the optimum level and percentage contribution of each parameter. By using S/N analysis, the optimum machining parameters from the experimentation are obtained. Optimization algorithms begin with one or more design solutions supplied by the user and then iteratively check new design solutions within the relevant search space in order to approach the true optimum solution.
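The S/N analysis mentioned above can be illustrated with the Taguchi smaller-the-better ratio, which is the usual choice when the response is surface roughness. This is a sketch under that assumption; the roughness readings and feed levels are hypothetical, not data from the thesis.

```python
import math


def sn_smaller_is_better(values):
    """Taguchi S/N ratio in dB for a smaller-the-better response.

    S/N = -10 * log10( mean(y_i^2) )
    """
    mean_square = sum(y * y for y in values) / len(values)
    return -10.0 * math.log10(mean_square)


# Hypothetical surface roughness readings (micrometres) per feed level.
roughness_by_feed = {
    "feed_low": [0.8, 0.9, 0.85],
    "feed_mid": [1.2, 1.1, 1.3],
    "feed_high": [2.0, 2.2, 1.9],
}
sn_by_feed = {level: sn_smaller_is_better(ys)
              for level, ys in roughness_by_feed.items()}
# The level with the highest S/N ratio is taken as the optimum.
best_level = max(sn_by_feed, key=sn_by_feed.get)
```

Repeating this per factor (speed, feed, depth of cut) gives the optimum level of each, and ANOVA on the same responses then apportions the percentage contribution.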
A mathematical model has been developed using response surface analysis for surface roughness, and the model was validated using published results from the literature. Optimization methodologies such as Simulated Annealing (SA), Particle Swarm Optimization (PSO), Conventional Genetic Algorithm (CGA) and Improved Genetic Algorithm (IGA) are applied to optimize machining parameters in dry turning of SS420 material. All the above algorithms were tested for efficiency, robustness and accuracy, and were observed to often outperform conventional optimization methods applied to difficult real world problems. The SA, PSO, CGA and IGA codes were developed using MATLAB. For each evolutionary algorithmic method, optimum cutting conditions are provided to achieve better surface finish. The computational results using SA clearly demonstrated that the proposed solution procedure is quite capable of solving such complicated problems effectively and efficiently. Particle Swarm Optimization (PSO) is a relatively recent heuristic search method whose mechanics are inspired by the swarming or collaborative behavior of biological populations. From the results it has been observed that PSO provides better results and is also more computationally efficient. Based on the results obtained using CGA and IGA for the optimization of the machining process, the proposed IGA provides better results than the conventional GA. The improved genetic algorithm, incorporating a stochastic crossover technique and an artificial initial population scheme, is developed to provide a faster search mechanism. Finally, a comparison among these algorithms was made for the specific example of dry turning of SS420 material, arriving at optimum machining parameters of feed, cutting speed, depth of cut and tool nose radius with minimum surface roughness as the criterion.
To summarize, the research work fills in conspicuous gaps between research prototypes and industry requirements, by simulating evolutionary procedures seen in nature that optimize its own systems.
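The PSO search described in this abstract can be sketched in a few lines. The thesis code was written in MATLAB and is not reproduced here; this is a generic global-best PSO in Python, and the objective function, parameter bounds and coefficient values (`w`, `c1`, `c2`) are hypothetical choices for illustration.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the allowed parameter range
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_roughness(x):
    """Hypothetical stand-in for a fitted surface roughness model."""
    feed, speed = x
    return (feed - 0.1) ** 2 + ((speed - 150.0) / 100.0) ** 2


best, best_val = pso_minimize(toy_roughness, [(0.05, 0.3), (50.0, 250.0)])
```

In the setting of the abstract, `toy_roughness` would be replaced by the response surface model for surface roughness, with one dimension per cutting parameter.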
Abstract:
Clustering schemes improve the energy efficiency of wireless sensor networks. The inclusion of mobility as a new criterion for cluster creation and maintenance adds new challenges for these clustering schemes. In most of the algorithms, cluster formation and cluster head selection are done on a stochastic basis. In this paper we introduce a cluster formation and routing algorithm based on a mobility factor. The proposed algorithm is compared with the LEACH-M protocol on the following metrics: number of cluster head transitions, average residual energy, number of alive nodes and number of messages lost.
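For context, the stochastic cluster head selection that the abstract contrasts with can be sketched using the election threshold of the original LEACH protocol. This shows only that baseline, not the proposed mobility-based algorithm, whose mobility factor is not defined in the abstract. The function names and node identifiers are hypothetical.

```python
import random


def leach_threshold(p, round_number, eligible):
    """LEACH election threshold T(n) for one node.

    p: desired fraction of cluster heads per round.
    eligible: False if the node already served as head in the current
    cycle of 1/p rounds, in which case it cannot be elected.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (round_number % period))


def elect_cluster_heads(node_ids, p, round_number, recent_heads, rng):
    """Each node draws a uniform number and becomes head if below T(n)."""
    return [n for n in node_ids
            if rng.random() < leach_threshold(p, round_number,
                                              n not in recent_heads)]


rng = random.Random(1)
# Last round of a 10-round cycle: every node that has not yet served
# has a threshold of about 1 and is therefore elected.
heads = elect_cluster_heads(range(5), p=0.1, round_number=9,
                            recent_heads={1, 2}, rng=rng)
```

Because election depends only on a random draw, a fast-moving node is as likely to become head as a stationary one, which is the gap a mobility-aware criterion is meant to close.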
Abstract:
Cancer treatment is most effective when the cancer is detected early, and progress in treatment will be closely related to the ability to reduce the proportion of misses in the cancer detection task. The effectiveness of algorithms for detecting cancers can be greatly increased if these algorithms work synergistically with those for characterizing normal mammograms. This research work combines computerized image analysis techniques and neural networks to separate out some fraction of the normal mammograms with extremely high reliability, based on normal tissue identification and removal. The presence of clustered microcalcifications is one of the most important, and sometimes the only, sign of cancer on a mammogram. 60% to 70% of non-palpable breast carcinomas demonstrate microcalcifications on mammograms [44], [45], [46]. Wavelet transform (WT) based techniques are applied to the remaining mammograms, those that are obviously abnormal, to detect possible microcalcifications. The goal of this work is to improve the detection performance and throughput of screening mammography, thus providing a ‘second opinion’ to the radiologists. The state-of-the-art DWT computation algorithms are not suitable for practical applications with memory and delay constraints, as the DWT is not a block transform. Hence, in this work, the development of a Block DWT (BDWT) computational structure with a low processing memory requirement has also been taken up.
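To show what a wavelet decomposition contributes to detecting small bright spots, the sketch below computes one level of a 1-D Haar DWT. The Haar wavelet is an assumption made for brevity (the abstract does not name the wavelet used), and this is neither the 2-D transform applied to mammograms nor the proposed BDWT structure. The sample scan line is hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (approximation, detail) coefficient lists, each half the
    input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail


# Hypothetical scan line: flat background with one bright pixel.
scan_line = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
approx, detail = haar_dwt_1d(scan_line)
# The isolated bright pixel shows up as the largest detail coefficient.
spike_index = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

Smooth tissue yields near-zero detail coefficients, while an abrupt local change stands out, which is why detail subbands are a natural place to look for microcalcification candidates.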
Abstract:
Fingerprint based authentication is one of the cost-effective biometric authentication techniques employed for personal identification. As the database population increases, fast identification/recognition algorithms with high accuracy are required. Accuracy can be increased using multimodal evidence collected from multiple biometric traits. In this work, consecutive fingerprint images are taken, global singularities are located using directional field strength, and their local orientation vector is formulated with respect to the base line of the finger. Feature level fusion is carried out and a 32 element feature template is obtained. A matching score is formulated for the identification, and 100% accuracy was obtained for a database of 300 persons. The polygonal feature vector helps to reduce the size of the feature database from the present 70-100 minutiae features to just 32 features, and a lower matching threshold can be fixed compared to single finger based identification.