32 results for principal component analysis
at the Indian Institute of Science - Bangalore - India
Abstract:
This paper presents a new application of two-dimensional Principal Component Analysis (2DPCA) to the problem of online character recognition in Tamil script. A novel set of features employing polynomial fits and quartiles, in combination with conventional features, is derived for each sample point of the Tamil character obtained after smoothing and resampling. These are stacked to form a matrix, from which a covariance matrix is constructed. A subset of the eigenvectors of the covariance matrix is employed to obtain the features in the reduced subspace. Each character is modeled as a separate subspace, and a modified form of the Mahalanobis distance is derived to classify a given test character. Results indicate that the recognition accuracy of the 2DPCA scheme shows an approximately 3% improvement over the conventional PCA technique.
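A minimal numpy sketch of the 2DPCA construction the abstract describes; all shapes and data are invented for illustration (the real system stacks polynomial-fit and quartile features per sample point):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data: 20 character samples, each a 32 x 8 feature matrix
# (32 resampled points x 8 per-point features); shapes are invented
samples = rng.normal(size=(20, 32, 8))

mean_mat = samples.mean(axis=0)
# 2DPCA image covariance matrix (8 x 8), built from the centred matrices
G = sum((A - mean_mat).T @ (A - mean_mat) for A in samples) / len(samples)

# a subset of the eigenvectors of G spans the reduced subspace
evals, evecs = np.linalg.eigh(G)       # eigenvalues in ascending order
X = evecs[:, ::-1][:, :3]              # keep the 3 leading eigenvectors

# project each 32 x 8 feature matrix into a 32 x 3 representation
projected = np.array([A @ X for A in samples])
```

Unlike conventional PCA, the feature matrices are never flattened into vectors, so the covariance matrix stays small (8 x 8 here).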
Abstract:
The transient changes in resistance of Cr0.8Fe0.2NbO4 thick film sensors towards specified concentrations of H-2, NH3, acetonitrile, acetone, alcohol, cyclohexane and petroleum gas at different operating temperatures were recorded. Analyte-specific characteristics such as the slopes of the response and retrace curves, the area under the curve and the sensitivity deduced from the transient curve of each analyte gas were used to construct a data matrix. Principal component analysis (PCA) was applied to these data and the score plot was obtained. Distinguishing one reducing gas from another is demonstrated based on this approach, which otherwise is not possible by measuring relative changes in conductivity. This methodology is extended to an array of three Cr0.8Fe0.2NbO4 thick film sensors operated at different temperatures. (C) 2015 Elsevier B.V. All rights reserved.
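A sketch of the PCA score-plot step on such a data matrix, using numpy only; the seven analytes and four transient-curve descriptors are placeholders for the measured values:

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical data matrix: 7 analyte gases x 4 transient-curve descriptors
# (response slope, retrace slope, area under curve, sensitivity)
data = rng.normal(size=(7, 4))

centred = data - data.mean(axis=0)            # centre each descriptor
cov = centred.T @ centred / (len(centred) - 1)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]

# coordinates of each analyte in the 2-D score plot;
# well-separated points indicate distinguishable gases
scores = centred @ evecs[:, order[:2]]
```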
Abstract:
Principal component analysis is applied to derive patterns of temporal variation of the rainfall at fifty-three stations in peninsular India. The location of the stations in the coordinate space determined by the amplitudes of the two leading eigenvectors is used to delineate them into eight clusters. The clusters obtained seem to be stable with respect to variations in the grid of stations used. Stations within any cluster occur in geographically contiguous areas.
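The delineation step can be sketched as follows: project the stations onto the two leading eigenvectors, then cluster in that plane. The data and the tiny k-means routine are invented stand-ins for the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
# hypothetical records: 53 stations x 40 years of seasonal rainfall totals
rain = rng.normal(size=(53, 40))

centred = rain - rain.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(centred, rowvar=False))
# amplitudes of the two leading eigenvectors place each station in a plane
amps = centred @ evecs[:, np.argsort(evals)[::-1][:2]]

def kmeans(points, k, iters=50, seed=0):
    """Tiny k-means for delineating clusters in the amplitude plane."""
    gen = np.random.default_rng(seed)
    centroids = points[gen.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

clusters = kmeans(amps, 8)    # eight station clusters, as in the abstract
```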
Abstract:
Skew correction of complex document images is a difficult task. We propose an edge-based connected component approach for robust skew correction of documents with complex layout and content. The algorithm essentially consists of two steps - an 'initialization' step to determine the image orientation from the centroids of the connected components and a 'search' step to find the actual skew of the image. During initialization, we choose two different sets of points regularly spaced across the image, one from left to right and the other from top to bottom. The image orientation is determined from the slope between the two successive nearest neighbors of each of the points in the chosen set. The search step finds successive nearest neighbors that satisfy the parameters obtained in the initialization step. The final skew is determined from the slopes obtained in the 'search' step. Unlike other connected component based methods, the proposed method does not require the binarization step that generally precedes connected component analysis. The method works well for scanned documents with complex layout at any skew, with a precision of 0.5 degrees.
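The core idea - recover skew from slopes between centroids of neighbouring connected components - can be sketched on a toy edge map. This is a simplified stand-in, not the paper's two-step algorithm; the synthetic components lie along a known 5-degree baseline:

```python
import numpy as np
from scipy import ndimage

# synthetic "edge map": three small components along a 5-degree baseline
img = np.zeros((100, 200), dtype=bool)
for x in (40, 100, 160):
    y = int(50 + np.tan(np.deg2rad(5)) * (x - 40))
    img[y - 1:y + 2, x - 1:x + 2] = True

labels, n = ndimage.label(img)
# centroids of the connected components, as (row, col) pairs
cents = np.array(ndimage.center_of_mass(img, labels, list(range(1, n + 1))))

# slopes between successive nearest neighbours, scanned left to right
cents = cents[np.argsort(cents[:, 1])]
slopes = np.diff(cents[:, 0]) / np.diff(cents[:, 1])
skew_deg = np.degrees(np.arctan(np.median(slopes)))
```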
Abstract:
Image fusion is a formal framework expressed as means and tools for the alliance of multisensor, multitemporal, and multiresolution data. Multisource data vary in spectral, spatial and temporal resolution, necessitating advanced analytical or numerical techniques for enhanced interpretation capabilities. This paper reviews seven pixel-based image fusion techniques - intensity-hue-saturation, Brovey, high pass filter (HPF), high pass modulation (HPM), principal component analysis, Fourier transform and correspondence analysis. Validation of these techniques on IKONOS data (panchromatic band at 1 m spatial resolution and four multispectral bands at 4 m spatial resolution) reveals that the HPF and HPM methods synthesise the images closest to those the corresponding multisensors would observe at the high resolution level.
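A minimal sketch of the HPF idea on synthetic bands: extract the high-frequency residue of the panchromatic image and inject it into the upsampled multispectral band. Band sizes, the box-filter width and the nearest-neighbour upsampling are illustrative choices, not the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(3)
pan = rng.random((64, 64))    # hypothetical 1 m panchromatic band
ms = rng.random((16, 16))     # one hypothetical 4 m multispectral band

ms_up = np.kron(ms, np.ones((4, 4)))         # resample MS onto the pan grid
detail = pan - uniform_filter(pan, size=5)   # high-pass residue of the pan band
fused = ms_up + detail                       # inject pan detail into the MS band
```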
Abstract:
Angiogenin is a protein belonging to the superfamily of RNase A. The RNase activity of this protein is essential for its angiogenic activity. Although members of the RNase A family carry out RNase activity, they differ markedly in their strength and specificity. In this paper, we address the higher specificity of angiogenin towards cytosine over uracil in the first base binding position. We have carried out extensive nanosecond-level molecular dynamics (MD) computer simulations on native bovine angiogenin and on the CMP and UMP complexes of this protein in aqueous medium with explicit molecular solvent. The structures thus generated were subjected to a rigorous free energy component analysis to arrive at a plausible molecular thermodynamic explanation for the substrate specificity of angiogenin.
Abstract:
Urbanisation is a dynamic complex phenomenon involving large-scale changes in land use at local levels. Analyses of changes in land use in urban environments provide a historical perspective of land use and an opportunity to assess the spatial patterns, correlations, trends, rates and impacts of the change, which would help in better regional planning and good governance of the region. The main objective of this research is to quantify the urban dynamics using temporal remote sensing data with the help of well-established landscape metrics. Bangalore, one of the most rapidly urbanising landscapes in India, has been chosen for this investigation. The complex process of urban sprawl was modelled using spatio-temporal analysis. Land use analyses show 584% growth in built-up area during the last four decades, with a decline of vegetation by 66% and of water bodies by 74%. Analyses of the temporal data reveal an increase in urban built-up area of 342.83% (during 1973-1992), 129.56% (during 1992-1999), 106.7% (1999-2002), 114.51% (2002-2006) and 126.19% from 2006 to 2010. The study area was divided into four zones, and each zone was further divided into 17 concentric circles of incrementally increasing 1 km radius to understand the patterns and extent of urbanisation at local levels. The urban density gradient illustrates a radial pattern of urbanisation for the period 1973-2010. Bangalore grew radially from 1973 to 2010, indicating that urbanisation is intensifying from the central core and has reached the periphery of Greater Bangalore. Shannon's entropy, alpha and beta population densities were computed to understand the level of urbanisation at local levels. Shannon's entropy values of recent times confirm dispersed, haphazard urban growth in the city, particularly in the outskirts. This also illustrates the extent of influence of drivers of urbanisation in various directions. Landscape metrics provided in-depth knowledge about the sprawl.
Principal component analysis helped in prioritizing the metrics for detailed analyses. The results clearly indicate that the whole landscape is aggregating into a large patch in 2010, compared with earlier years dominated by several small patches. The large-scale conversion of small patches into a single large patch can be seen from 2006 to 2010. In 2010 the patches are maximally aggregated, indicating that the city has become more compact and more urbanised in recent years. Bangalore was the most sought-after destination for its climatic conditions and the availability of various facilities (land availability, economy, political factors) compared to other cities. The growth into a single urban patch can be attributed to rapid urbanisation coupled with industrialisation. Monitoring of growth through landscape metrics helps to maintain and manage the natural resources. (C) 2012 Elsevier B.V. All rights reserved.
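The Shannon's entropy measure used above to detect dispersed growth can be sketched in a few lines; the built-up proportions across the 17 concentric rings are invented for illustration:

```python
import numpy as np

# hypothetical built-up proportions in 17 concentric 1 km rings of one zone
builtup = np.array([9.0, 8.5, 7.0, 6.2, 5.0, 4.1, 3.3, 2.8, 2.2,
                    1.9, 1.5, 1.2, 1.0, 0.8, 0.6, 0.5, 0.4])
p = builtup / builtup.sum()

H = -(p * np.log(p)).sum()     # Shannon's entropy of the built-up distribution
H_max = np.log(len(p))         # upper bound: fully dispersed (sprawling) growth
dispersion = H / H_max         # values near 1 indicate dispersed, haphazard growth
```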
Abstract:
Latent variable methods such as PLCA (Probabilistic Latent Component Analysis) have been successfully used for the analysis of non-negative signal representations. In this paper, we formulate PLCS (Probabilistic Latent Component Segmentation), which models each time frame of a spectrogram as a spectral distribution. Given the signal spectrogram, the segmentation boundaries are estimated using a maximum-likelihood approach. For an efficient solution, the algorithm imposes a hard constraint that each segment is modelled by a single latent component. The hard constraint facilitates the solution of ML boundary estimation using dynamic programming. The PLCS framework does not impose a parametric assumption, unlike earlier ML segmentation techniques. PLCS can be naturally extended to model coarticulation between successive phones. Experiments on the TIMIT corpus show that the proposed technique is promising compared to state-of-the-art speech segmentation algorithms.
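PLCS builds on plain PLCA, whose EM updates can be sketched compactly: the normalized spectrogram is treated as a joint distribution P(f, t) and factored as a sum over latent components z of P(f|z) P(z) P(t|z). The spectrogram here is random illustrative data, and this sketch omits the segmentation constraint and dynamic programming of PLCS itself:

```python
import numpy as np

rng = np.random.default_rng(4)
# hypothetical magnitude spectrogram: 30 frequency bins x 40 frames
V = rng.random((30, 40))
Vn = V / V.sum()               # treat the spectrogram as a joint distribution

K = 2                          # number of latent components
Pf = rng.random((30, K)); Pf /= Pf.sum(axis=0)                  # P(f|z)
Pt = rng.random((K, 40)); Pt /= Pt.sum(axis=1, keepdims=True)   # P(t|z)
Pz = np.full(K, 1.0 / K)                                        # P(z)

for _ in range(50):
    # E-step: posterior over latent components for every (f, t) cell
    num = Pf[:, :, None] * Pz[None, :, None] * Pt[None, :, :]
    post = num / num.sum(axis=1, keepdims=True)
    # M-step: reweight the posteriors by the observed spectrogram mass
    w = Vn[:, None, :] * post
    Pz = w.sum(axis=(0, 2))
    Pf = w.sum(axis=2) / Pz
    Pt = w.sum(axis=0) / Pz[:, None]

recon = Pf @ np.diag(Pz) @ Pt   # model of the normalized spectrogram
```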
Abstract:
The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to the three Prakritis. We found that 52 SNPs (p <= 1 x 10^-5) were significantly different between Prakritis, without any confounding effect of stratification, after 10^6 permutations. Principal component analysis (PCA) of these SNPs classified the 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which demonstrates its power of categorization. We further validated our finding with 297 Indian population samples of known ancestry. Subsequently, we found that PGM1 correlates with the phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis, and that its Prakriti-based practice, in vogue for many centuries, resonates with personalized medicine.
Abstract:
The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options to reduce dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection subject to a probabilistic length-preserving transformation is explored further as a computationally light preprocessing step. The experimental results obtained demonstrate the effectiveness of the proposed training process for handling high dimensional large datasets.
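The length-preserving property that makes random projection attractive here can be checked directly: a Gaussian projection matrix scaled by 1/sqrt(k) approximately preserves vector norms (the Johnson-Lindenstrauss effect). Dimensions below are invented; the boosting step itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 200, 5000, 400       # hypothetical sizes: project 5000-D data to 400-D
X = rng.normal(size=(n, d))

# Gaussian random projection; scaling by 1/sqrt(k) preserves lengths
# in expectation, so distances survive the dimensionality reduction
R = rng.normal(scale=1.0 / np.sqrt(k), size=(d, k))
Xp = X @ R

# ratio of projected to original norms concentrates tightly around 1
ratios = np.linalg.norm(Xp, axis=1) / np.linalg.norm(X, axis=1)
```

Computing R and the product X @ R costs far less than an eigendecomposition of a 5000 x 5000 covariance matrix, which is why it serves as the "computationally light" option.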
Abstract:
Ninety-two strong-motion earthquake records from the California region, U.S.A., have been statistically studied using principal component analysis in terms of twelve important standardized strong-motion characteristics. The first two principal components account for about 57 per cent of the total variance. Based on these two components the earthquake records are classified into nine groups in a two-dimensional principal component plane. Also a unidimensional engineering rating scale is proposed. The procedure can be used as an objective approach for classifying and rating future earthquakes.
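The "first two components account for about 57 per cent of the variance" computation can be sketched as follows; the data are synthetic (92 records, 12 standardized characteristics, with two dominant factors mixed in to mimic the reported structure):

```python
import numpy as np

rng = np.random.default_rng(6)
# hypothetical: 92 records x 12 standardized strong-motion characteristics,
# with two dominant underlying factors mixed into the measurements
factors = rng.normal(size=(92, 2))
data = factors @ rng.normal(size=(2, 12)) + 0.4 * rng.normal(size=(92, 12))
Z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardize each column

evals = np.sort(np.linalg.eigvalsh(np.cov(Z, rowvar=False)))[::-1]
explained = evals / evals.sum()
two_pc_share = explained[:2].sum()   # variance captured by the 2-D plane
```

Records can then be plotted in the plane of the first two component scores and grouped, as the abstract describes.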
Abstract:
Nanotechnology is a new technology generating a lot of interest among academicians, practitioners and scientists. Critical research is being carried out in this area all over the world. Governments are creating policy initiatives to promote developments in nanoscale science and technology. Private investment is also showing a rising trend. A large number of academic institutions and national laboratories have set up research centers working on the multiple applications of nanotechnology. Wide ranges of applications are claimed for nanotechnology, from materials, chemicals, textiles and semiconductors to wonder drug delivery systems and diagnostics. Nanotechnology is considered to be the next big wave of technology after information technology and biotechnology. In fact, nanotechnology holds the promise of advances that exceed those achieved in recent decades in computers and biotechnology. Much of the interest in nanotechnology could also be due to the enormous monetary benefits expected from nanotechnology-based products. According to the NSF, revenues from nanotechnology could touch $1 trillion by 2015. However, much of these benefits are projections. Realizing the claimed benefits requires successful development of nanoscience and nanotechnology research efforts; that is, the journey from invention to innovation has to be completed. For this to happen the technology has to flow from laboratory to market. Nanoscience and nanotechnology research efforts have to come out in the form of new products, new processes, and new platforms. India has also started its Nanoscience and Nanotechnology development program under its 10th Five Year Plan, and funds worth Rs. one billion have been allocated for Nanoscience and Nanotechnology research and development. The aim of the paper is to assess Nanoscience and Nanotechnology initiatives in India. We propose a conceptual model derived from the resource-based view of innovation.
We have developed a structured questionnaire to measure the constructs in the conceptual model. Responses have been collected from 115 scientists and engineers working in the field of Nanoscience and Nanotechnology. The responses have been analyzed further by using Principal Component Analysis, Cluster Analysis and Regression Analysis.
Abstract:
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on the volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of 'Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical communities, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
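A small sketch of the RF workflow the abstract describes - classify volatile profiles and rank VOCs by importance to find a minimal discriminating set - using scikit-learn on synthetic data (species counts, VOC counts and the signal structure are all invented):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
# hypothetical volatile profiles: 3 Ficus species x 20 plants x 30 VOCs,
# with only the first 5 VOCs carrying a species-specific signal
species = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 30))
X[:, :5] += 2.0 * species[:, None]   # informative VOCs shift per species

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, species)
# rank VOCs by importance; the top of the ranking is a candidate
# minimal set of volatiles for discrimination
ranked = np.argsort(rf.feature_importances_)[::-1]
top_voc = ranked[0]
```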
Abstract:
Community diversity and the population abundance of a particular group of species are controlled by the immediate environment, inter- and intra-species interactions, landscape conditions, historical events and evolutionary processes. Nestedness is a measure of order in an ecological system, referring to the order in which the number of species is related to area or other factors. In this study we examined the nestedness pattern in stream diatom assemblages at 24 stream sites of the central Western Ghats, and report 98 taxa from the streams of this region. The communities show a highly significant nested pattern. A Mantel test of the matrix revealed a strong relationship between species assemblages and environmental conditions at the sites. Principal component analysis (PCA) indicated that environmental conditions differed markedly across the sampling sites, with the first three components explaining 78% of the variance. The species composition of diatoms is significantly correlated with environmental distance across the geographical extent. The current pattern suggests that the micro-environment at regional levels influences the species composition of epilithic diatoms in streams. The nestedness shown by the diatom community was highly significant even though it had a high proportion of idiosyncratic species, characterized by high numbers of cosmopolitan species, whereas the nested species were dominated by endemic species. PCA identifies ionic parameters and nutrients as the major features determining the characteristics of the sampling sites. Hence the local water quality parameters are the major factors deciding the diatom species assemblages.
Abstract:
The basic characteristic of a chaotic system is its sensitivity to the infinitesimal changes in its initial conditions. A limit to predictability in chaotic system arises mainly due to this sensitivity and also due to the ineffectiveness of the model to reveal the underlying dynamics of the system. In the present study, an attempt is made to quantify these uncertainties involved and thereby improve the predictability by adopting a multivariate nonlinear ensemble prediction. Daily rainfall data of Malaprabha basin, India for the period 1955-2000 is used for the study. It is found to exhibit a low dimensional chaotic nature with the dimension varying from 5 to 7. A multivariate phase space is generated, considering a climate data set of 16 variables. The chaotic nature of each of these variables is confirmed using false nearest neighbor method. The redundancy, if any, of this atmospheric data set is further removed by employing principal component analysis (PCA) method and thereby reducing it to eight principal components (PCs). This multivariate series (rainfall along with eight PCs) is found to exhibit a low dimensional chaotic nature with dimension 10. Nonlinear prediction employing local approximation method is done using univariate series (rainfall alone) and multivariate series for different combinations of embedding dimensions and delay times. The uncertainty in initial conditions is thus addressed by reconstructing the phase space using different combinations of parameters. The ensembles generated from multivariate predictions are found to be better than those from univariate predictions. The uncertainty in predictions is decreased or in other words predictability is increased by adopting multivariate nonlinear ensemble prediction. The restriction on predictability of a chaotic series can thus be altered by quantifying the uncertainty in the initial conditions and also by including other possible variables, which may influence the system. 
(C) 2011 Elsevier B.V. All rights reserved.
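The two core steps above - delay-coordinate phase space reconstruction and local-approximation prediction - can be sketched on a synthetic chaotic series (a logistic map stands in for the rainfall data; the embedding dimension, delay time and neighbour count are illustrative choices):

```python
import numpy as np

# a deterministic chaotic series (logistic map) stands in for the rainfall data
x = np.empty(600)
x[0] = 0.3
for t in range(599):
    x[t + 1] = 3.9 * x[t] * (1.0 - x[t])

m, tau = 3, 1                  # embedding dimension and delay time
N = len(x) - (m - 1) * tau
# delay-coordinate phase space reconstruction (Takens embedding)
emb = np.column_stack([x[i * tau: i * tau + N] for i in range(m)])

# local approximation: average the successors of the nearest past states
states, succ, query = emb[:-1], emb[1:, -1], emb[-1]
nn = np.argsort(np.linalg.norm(states - query, axis=1))[:5]
pred = succ[nn].mean()
true_next = 3.9 * x[-1] * (1.0 - x[-1])
```

Repeating the prediction over several (m, tau) combinations, as the abstract describes, yields an ensemble of forecasts whose spread reflects the uncertainty in the initial conditions.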