13 results for 080109 Pattern Recognition and Data Mining
in Cochin University of Science
Abstract:
Data mining is currently one of the most active research areas because of its wide variety of applications in everyday life. It is concerned with finding interesting hidden patterns in a large historical database. For example, from a sales database one can find a pattern such as “people who buy magazines tend to buy newspapers also”. From a sales point of view, the advantage is that these items can be placed together in the shop to increase sales. In this research work, data mining is applied to a domain called placement chance prediction, since making a wise career decision is crucial for everyone. In India, technical manpower analysis is carried out by an organization named the National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises a lead centre in the IAMR, New Delhi, and 21 nodal centres located in different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. The nodal centre collects placement information by sending postal questionnaires to graduates on a regular basis. From the raw data available in the nodal centre, a history database was prepared. Each record in this database includes the entrance rank range, reservation, sector, sex, and a particular engineering branch. For each such combination of attributes in the history database of student records, the corresponding placement chance is computed and stored in the history database. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a new student with one of the above combinations of criteria.
A detailed performance comparison of the various data mining models is also carried out. This research work proposes a combination of data mining models, namely a hybrid stacking ensemble, for better predictions. A strategy to predict the overall absorption rate for various branches, as well as the time it takes for all the students of a particular branch to get placed, is also proposed. Finally, this research work puts forward a new data mining algorithm for numeric data sets, namely C 4.5 * stat, which has been shown to have competitive accuracy on the standard UCI benchmark data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. In summary, this research work covers all four dimensions of a typical data mining research work: application to a domain, development of classifier models, optimization, and ensemble methods.
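The placement-chance table described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tuple layout, the sample records, the branch codes and the function names `placement_chances` and `best_branch` are all hypothetical.

```python
from collections import defaultdict

def placement_chances(records):
    """Map each attribute combination to the fraction of its students placed.

    Each record is a tuple (hypothetical layout):
    (rank_range, reservation, sector, sex, branch, placed)
    """
    totals = defaultdict(int)
    placed_counts = defaultdict(int)
    for *combination, placed in records:
        key = tuple(combination)
        totals[key] += 1
        if placed:
            placed_counts[key] += 1
    return {key: placed_counts[key] / totals[key] for key in totals}

def best_branch(chances, profile):
    """Return the branch with the highest placement chance for a profile.

    profile is (rank_range, reservation, sector, sex); None if unseen.
    """
    candidates = {key[4]: chance for key, chance in chances.items()
                  if key[:4] == profile}
    return max(candidates, key=candidates.get) if candidates else None

# Made-up history records, for illustration only.
history = [
    ("1-2000", "general", "urban", "F", "CS", True),
    ("1-2000", "general", "urban", "F", "CS", False),
    ("1-2000", "general", "urban", "F", "EC", True),
    ("1-2000", "general", "urban", "F", "EC", True),
]
chances = placement_chances(history)
suggested = best_branch(chances, ("1-2000", "general", "urban", "F"))
```

A lookup table like this is the simplest possible predictor; the classifier models mentioned in the abstract would generalise to attribute combinations that are rare or absent in the history data.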
Abstract:
For years, choosing the right career by monitoring the trends and scope of different career paths has been a requirement for young people all over the world. In this paper we provide a scientific, data mining based method for predicting the job absorption rate and the waiting time needed for 100% placement for different engineering courses in India. This will greatly help students in India to decide on the right discipline for a bright future. Information about graduates is obtained from the NTMIS (National Technical Manpower Information System) nodal centre in Kochi, India, located at Cochin University of Science and Technology.
Abstract:
In the current study, an epidemiological study of drug-drug interactions (DDIs) is carried out by means of a literature survey, both in groups identified as having a higher potential for DDIs and in other cases, to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in mining it for drug-drug interactions. The necessary pre-processing algorithms are developed based on this analysis, and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared against a standard drug interaction database for validation. Of the associations obtained, 31% were identified as new and 69% matched existing interactions. This match indicates the validity of the methodology and its applicability to similar databases. Formulating the results using generic names expanded their relevance to a global scale. This global applicability helps health care professionals worldwide to exercise caution during the various stages of drug administration, thus considerably enhancing pharmacovigilance.
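The Apriori idea used in the abstract above can be illustrated with a minimal sketch restricted to drug pairs. It shows only the standard pruning step (a pair can be frequent only if both of its drugs are frequent) and is not the modified algorithm the abstract refers to, whose details are not given. The report contents, drug names and the function name `frequent_drug_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_drug_pairs(reports, min_support):
    """Return {drug pair: support} for pairs co-reported often enough.

    reports: list of lists of drug names, one list per adverse event report.
    Support is the fraction of reports that contain the item or pair.
    """
    n = len(reports)
    # Level 1: keep only drugs that are frequent on their own.
    single_counts = Counter(drug for report in reports for drug in set(report))
    frequent = {drug for drug, count in single_counts.items()
                if count / n >= min_support}
    # Level 2: count pairs built from frequent drugs only (Apriori pruning).
    pair_counts = Counter()
    for report in reports:
        kept = sorted(set(report) & frequent)
        pair_counts.update(combinations(kept, 2))
    return {pair: count / n for pair, count in pair_counts.items()
            if count / n >= min_support}

# Made-up reports, for illustration only.
reports = [
    ["drug_a", "drug_b"],
    ["drug_a", "drug_b", "drug_c"],
    ["drug_c"],
    ["drug_a"],
]
pairs = frequent_drug_pairs(reports, min_support=0.5)
```

Frequent pairs found this way are only candidate associations; as the abstract notes, they still have to be checked against a reference interaction database.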
Abstract:
Handwritten character recognition has long been a frontier area of research in pattern recognition and image processing, and there is a large demand for OCR on handwritten documents. Even though sufficient studies have been performed on foreign scripts such as Chinese, Japanese and Arabic, only very few works can be traced on handwritten character recognition of Indian scripts, especially South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian scripts, namely Malayalam, Tamil, Kannada and Telugu.
Abstract:
Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, this process is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark data sets, the new algorithm has been shown to be competitive with its information gain counterpart, C4.5, and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric data sets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The work also blends the two closely related fields of statistics and data mining.
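The splitting idea in the abstract above can be sketched as follows. The abstract does not say how variance ranks the attributes, so this sketch assumes that the attribute with the largest variance is selected and that the data are split at that attribute's mean. The function names `choose_split` and `partition` are hypothetical, and the code should not be read as the published algorithm.

```python
from statistics import mean, pvariance

def choose_split(rows):
    """Pick a split for purely numeric rows.

    Assumption: the attribute with the largest population variance is chosen,
    and its mean is used as the threshold.
    Returns (attribute index, threshold).
    """
    columns = list(zip(*rows))
    variances = [pvariance(column) for column in columns]
    best = max(range(len(columns)), key=lambda i: variances[i])
    return best, mean(columns[best])

def partition(rows, index, threshold):
    """Split rows into those at or below the threshold and those above it."""
    left = [row for row in rows if row[index] <= threshold]
    right = [row for row in rows if row[index] > threshold]
    return left, right

# Made-up numeric data: two attributes, four rows.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
index, threshold = choose_split(rows)
left, right = partition(rows, index, threshold)
```

No discretisation or entropy computation is needed here, which is the saving the abstract claims; applying `choose_split` recursively to `left` and `right` would grow a tree.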
Abstract:
This paper reports a novel region-based shape descriptor based on orthogonal Legendre moments. The preprocessing steps for invariance improvement of the proposed Improved Legendre Moment Descriptor (ILMD) are discussed. The performance of the ILMD is compared to the MPEG-7 approved region shape descriptor, the angular radial transformation descriptor (ARTD), and to the widely used Zernike moment descriptor (ZMD). Set B of the MPEG-7 CE-1 contour database and all the datasets of the MPEG-7 CE-2 region database were used for experimental validation. The average normalized modified retrieval rate (ANMRR) and precision-recall pairs were employed for benchmarking the performance of the candidate descriptors. The ILMD has lower ANMRR values than ARTD for most of the datasets, and ARTD has a lower value than ZMD. This indicates that the overall performance of the ILMD is better than that of ARTD and ZMD. This result is confirmed by the precision-recall test, where ILMD was found to have better precision rates for most of the datasets tested. Besides having better retrieval accuracy, ILMD is more compact than ARTD and ZMD. The proposed descriptor is useful as a generic shape descriptor for content-based image retrieval (CBIR) applications.
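For readers unfamiliar with Legendre moments, the sketch below computes the standard discrete approximation of the moment of order (p, q) for an image mapped onto the square [-1, 1] x [-1, 1]. It covers only the textbook definition. The ILMD preprocessing and invariance steps mentioned in the abstract are not reproduced, and the function names are hypothetical.

```python
def legendre(n, x):
    """Legendre polynomial P_n(x), computed with the three-term recurrence."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for k in range(1, n):
        previous, current = current, ((2 * k + 1) * x * current - k * previous) / (k + 1)
    return current

def legendre_moment(image, p, q):
    """Approximate the Legendre moment of order (p, q) of a 2-D image.

    image: list of rows of pixel intensities. Pixel centres are mapped to
    [-1, 1] and the double integral is replaced by a sum over pixels.
    """
    rows, cols = len(image), len(image[0])
    total = 0.0
    for i in range(rows):
        y = (2 * i + 1) / rows - 1
        for j in range(cols):
            x = (2 * j + 1) / cols - 1
            total += legendre(p, x) * legendre(q, y) * image[i][j]
    return (2 * p + 1) * (2 * q + 1) * total / (rows * cols)

# A flat 4 x 4 image: only the zeroth-order moment should be non-zero.
flat = [[1] * 4 for _ in range(4)]
m00 = legendre_moment(flat, 0, 0)
m10 = legendre_moment(flat, 1, 0)
```

A shape descriptor of this kind is the vector of moments up to some maximum order; how many orders are kept determines its compactness.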
Abstract:
The basic objective of the present study has been to observe the process and pattern of employment diversification among rural women workers in Ernakulam district. The evidence shows that women workers in the rural areas of the state are increasingly diversifying into the tertiary sector: non-agricultural employment of rural women in Kerala is increasing, with more and more of them moving into the tertiary sector. The women gain more self-esteem and recognition from the work they do. In the urban areas of the state, the Kerala government has already introduced a new scheme under the banner of Kudumbasree as a poverty eradication measure. Another fact noticed in the study is that the sectoral shift of women workers has posed a grave problem to the agricultural sector. The reluctance of workers to do manual jobs on the land and the prevalence of high wages among agricultural labourers have left many cultivable areas fallow or have induced farmers to shift to less labour-intensive crops. The situation is expected to worsen in future, as even the high wages fail to attract the young generation to this sector. To conclude, the study has fulfilled all its objectives, namely highlighting the rural employment structure in Kerala and examining the process, pattern, determinants and consequences of diversification among rural women workers in the sample villages. Being the first of its kind at the micro level in the state, it contributes to the available literature in the area, enriching the database that is crucially lacking for devising projects at the village and block level. There is ample scope for future research of a similar nature in an urban setting, where the secondary data sources hint at a reversal of trends from non-agriculture to agriculture.
Abstract:
A new procedure for the classification of lower case English language characters is presented in this work. The character image is binarised, and the binary image is divided into sixteen smaller areas called Cells. Each cell is assigned a name depending on the contour present in the cell and the occupancy of the image contour in the cell. A data reduction procedure called Filtering is adopted to eliminate undesirable redundant information and so reduce complexity during further processing steps. The filtered data is fed into a primitive extractor, where the primitives are extracted. Syntactic methods are employed for the classification of the character. A decision tree is used for the interaction of the various components in the scheme, such as primitive extraction and character recognition. A character is recognized by the primitive-by-primitive construction of its description. Open-ended inventories are used for including variants of the characters and for adding new members to the general class. A computer implementation of the proposal is discussed at the end, using handwritten character samples. Results are analyzed and suggestions for future studies are made. The advantages of the proposal are discussed in detail.
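The first two steps in the abstract above, binarisation and division into sixteen cells, can be sketched as follows. The abstract does not give the cell layout or the naming rules, so this sketch assumes a 4 x 4 grid and reports only the fraction of foreground pixels in each cell. The threshold value and the function names are hypothetical.

```python
def binarise(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 values) into 0/1 pixels.

    Assumption: dark pixels (below the threshold) are ink and map to 1.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]

def cell_occupancy(image, grid=4):
    """Split a binary image into grid x grid cells and measure occupancy.

    Returns a grid x grid list with the fraction of 1-pixels in each cell.
    Assumes the image height and width are multiples of grid.
    """
    rows, cols = len(image), len(image[0])
    cell_h, cell_w = rows // grid, cols // grid
    result = []
    for gr in range(grid):
        line = []
        for gc in range(grid):
            filled = sum(image[r][c]
                         for r in range(gr * cell_h, (gr + 1) * cell_h)
                         for c in range(gc * cell_w, (gc + 1) * cell_w))
            line.append(filled / (cell_h * cell_w))
        result.append(line)
    return result

# Made-up 8 x 8 grayscale image: white background with a few dark pixels.
gray = [[255] * 8 for _ in range(8)]
for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (7, 7)]:
    gray[r][c] = 0
occupancy = cell_occupancy(binarise(gray))
```

A naming scheme like the one in the abstract would then map each cell's contour shape and occupancy to a symbol for the later syntactic stages.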
Abstract:
This thesis is entitled "Environmental Impact of Sand Mining: A Case Study in the River Catchments of Vembanad Lake, Southwest India". The entire study is addressed in nine chapters. Chapter 1 deals with the general introduction to rivers, the problems of river sand mining, the objectives, the location of the study area and the scope of the study. A detailed review of river classification, classic concepts in riverine studies, the geological work of rivers and channel processes, and the importance of river ecosystems and the need for their management is given in Chapter 2. Chapter 3 gives a comprehensive account of the study area: its location, administrative divisions, physiography, soil, geology, land use, and living and non-living resources. The various methods adopted in the study are dealt with in Chapter 4. Chapter 5 covers river characteristics such as drainage, environmental and geologic setting, channel characteristics, river discharge and water quality of the study area. Chapter 6 gives an account of river sand mining (instream and floodplain mining) in the study area. The various environmental problems caused by river sand mining on the land adjoining the river banks, the river channel, and the water, biotic and social/human environments of the area, together with the data interpretation, are presented in Chapter 7. Chapter 8 deals with the Environmental Impact Assessment (EIA) and Environmental Management Plan (EMP) of sand mining in the river catchments of Vembanad Lake.
Abstract:
Image processing has been a challenging and multidisciplinary research area for decades, with continuing improvements in its various branches, especially medical imaging. The healthcare industry has benefited greatly from advances in image processing techniques for the efficient management of large volumes of clinical data. The popularity and growth of the image processing field attract researchers from many disciplines, including computer science and medical science, because of its applicability to the real world. Meanwhile, computer science is becoming an important driving force for the further development of the medical sciences. The objective of this study is to use the basic concepts of medical image processing to develop methods and tools that assist clinicians. This work is motivated by clinical applications of digital mammograms and placental sonograms, and uses real medical images to propose a method intended to assist radiologists in the diagnostic process. The study covers two domains of pattern recognition: classification and content-based retrieval. Mammogram images of breast cancer patients and placental images are used for this study. Accurate characterization of images using simplified, user-friendly computer-aided diagnosis techniques helps radiologists to detect cancers at an early stage. Breast cancer, which is the major cause of cancer death in women, can be fully cured if detected at an early stage. Studies relating to placental characteristics and abnormalities are important in foetal monitoring. The diagnostic variability in sonographic examination of the placenta can be overcome by detailed placental texture analysis focusing on placental grading. The work aims at early breast cancer detection and placental maturity analysis. This dissertation is a stepping stone towards combining various application domains of healthcare and technology.
Abstract:
Speech is the most natural means of communication among human beings, and speech processing and recognition have been areas of intensive research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the noise is removed using a wavelet denoising method based on soft thresholding. The features are extracted from the signals using Discrete Wavelet Transforms (DWT), which are well suited to processing non-stationary signals such as speech because of their multi-resolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the resulting feature vectors are classified using three classifiers capable of handling multiple classes: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes. During the classification stage, the classifiers are trained on feature vectors of known patterns and then tested using the test data set. The performance of each classifier is evaluated based on recognition accuracy. All three methods produced good recognition accuracy: the DWT and ANN combination achieved 89%, the DWT and SVM combination 86.6%, and the DWT and Naive Bayes combination 83.5%. ANN was found to be the best of the three methods.
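The denoising step in the abstract above can be illustrated with a single-level sketch. The abstract does not name the wavelet family, the number of decomposition levels or the threshold rule, so this example assumes the Haar wavelet, one level and a fixed threshold. All function names are hypothetical.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT of an even-length signal: (approximation, detail)."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient lists."""
    scale = 1 / math.sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) * scale, (a - d) * scale])
    return signal

def soft_threshold(coefficients, threshold):
    """Shrink each coefficient towards zero by threshold; small ones become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coefficients]

def denoise(signal, threshold):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))

# Made-up samples: a large threshold removes all within-pair variation.
samples = [1.0, 3.0, 5.0, 5.0]
smoothed = denoise(samples, threshold=10.0)
```

In a pipeline like the one described, the DWT coefficients of the denoised signal (or statistics computed from them) would then serve as the feature vector passed to the classifiers.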