32 results for COMPONENT ANALYSIS
Abstract:
Feature extraction in bilingual OCR is handicapped by the increase in the number of classes or characters to be handled. This is evident in the case of Indian languages, whose alphabet sets are large. The complexity of the feature extraction process is expected to increase with the number of classes. Though the best set of features cannot be determined through any quantitative measure, the characteristics of the scripts can help decide on the feature extraction procedure. This paper describes a hierarchical feature extraction scheme for recognition of printed bilingual (Tamil and Roman) text. The scheme divides the combined alphabet set of the two scripts into subsets by extracting certain spatial and structural features. Three kinds of features, viz. geometric moments, DCT-based features and wavelet transform based features, are extracted from the grouped symbols, and a linear transformation is performed on them for efficient representation in the feature space. The transformation is obtained by maximizing certain criterion functions. Three techniques have been employed to estimate the transformation matrix: principal component analysis, maximization of Fisher's ratio and maximization of a divergence measure. It has been observed that the proposed hierarchical scheme allows for easier handling of the alphabets and that there is an appreciable rise in the recognition accuracy as a result of the transformations.
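As an illustration of the first of the three techniques named in the abstract above, the following is a minimal sketch of a PCA-based linear transformation, assuming the extracted features are stacked as rows of a matrix. The function name `pca_transform` and the synthetic feature matrix are assumptions made for this example; they are not taken from the paper, and the Fisher-ratio and divergence criteria are not shown.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the projected data, the projection matrix and the variances
    (covariance eigenvalues) of the retained components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                    # columns are principal directions
    return Xc @ W, W, eigvals[order]

# Synthetic stand-in for a feature matrix (200 symbols, 6 features each).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
Z, W, variances = pca_transform(features, 2)
```

The columns of `W` are orthonormal, so the transformation only rotates and truncates the feature space; the retained components are those with the largest variance.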
Abstract:
Sandalwood is an economically important aromatic tree belonging to the family Santalaceae. The trees are used mainly for their fragrant heartwood and oil, which have immense potential for foreign exchange. Very little information is available on the genetic diversity in this species. Hence studies were initiated and genetic diversity was estimated using RAPD markers in 51 genotypes of Santalum album procured from different geographical regions of India and three exotic lines of S. spicatum from Australia. Eleven selected Operon primers (10-mer) generated a total of 156 consistent and unambiguous amplification products ranging from 200 bp to 4 kb. Rare and genotype-specific bands were identified which could be effectively used to distinguish the genotypes. Genetic relationships within the genotypes were evaluated by generating a dissimilarity matrix based on Ward's method (squared Euclidean distance). The phenetic dendrogram and the principal component analysis separated the 51 Indian genotypes from the three Australian lines. The cluster analysis indicated that sandalwood germplasm within India constitutes a broad genetic base, with values of genetic dissimilarity ranging from 15 to 91%. A core collection of 21 selected individuals revealed the same diversity as the entire population. The results show that RAPD analysis is an efficient marker technology for estimating genetic diversity and relatedness, thereby enabling the formulation of appropriate strategies for conservation, germplasm management, and selection of diverse parents for sandalwood improvement programmes.
Abstract:
Land-use changes influence local biodiversity directly and also contribute cumulatively to regional and global changes in natural systems and quality of life. As a consequence, direct impacts on the natural resources that support the health and integrity of living beings have become evident in recent times. The Western Ghats, one of the global biodiversity hotspots, is reeling under tremendous pressure from human-induced changes in the form of developmental projects such as hydel or thermal power plants, big dams, mining activities, unplanned agricultural practices, monoculture plantations, illegal timber logging, etc. This has fragmented the once contiguous forest habitats into patches, which in turn has led to shrinkage of the original habitat for wildlife, changes in the hydrological regime of the catchment, decreased inflow in streams, human-animal conflicts, etc. Under such circumstances, proper management practice is called for, requiring suitable biological indicators to show the impact of these changes, to set priority regions and to develop models for conservation planning. Amphibians are regarded as one of the best biological indicators because of their sensitivity to even the slightest changes in the environment, and hence they could be used as surrogates in conservation and management practices. They are the predominant vertebrates with a high degree of endemism (78%) in the Western Ghats. The present study is an attempt to bring out the impacts of various land uses on anuran distribution in three river basins. Sampling for amphibians was carried out during all seasons of 2003-2006 in the basins of Sharavathi, Aghanashini and Bedthi. There are as many as 46 species in the region, one of which is new to science, and nearly 59% of them are endemic to the Western Ghats. They belong to nine families, Dicroglossidae being represented by 14 species, followed by Rhacophoridae (9 species) and Ranidae (5 species).
Species richness is highest in the Sharavathi river basin, with 36 species, followed by Bedthi (33) and Aghanashini (27). The impact of land-use changes was investigated in the upper catchment of the Sharavathi river basin. Species diversity indices, relative abundance values and percentage of endemics gave a clear indication of differences among the sub-catchments. Karl Pearson's correlation coefficient (r) was calculated between species richness, endemics, environmental descriptors, land-use classes and fragmentation metrics. Principal component analysis was performed to depict the influence of these variables. Results show that sub-catchments with a lower percentage of forest, low canopy cover, a larger agricultural area and low rainfall have low species richness, fewer endemic species and abundant non-endemic species, whereas endemism, species richness and abundance of endemic species are higher in the sub-catchments with high tree density, endemic trees, canopy cover and rainfall and a smaller area of agricultural fields. This analysis aided in prioritising regions in the Sharavathi river basin for further conservation measures.
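For reference, Pearson's correlation coefficient between two variables x and y is the sum of products of their deviations from the mean, divided by the square root of the product of their summed squared deviations. A minimal sketch follows; the variable names and all numeric values are hypothetical placeholders chosen for illustration and are not data from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()                        # deviations from the mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical values for five sub-catchments (NOT data from the study).
forest_cover = np.array([85.0, 70.0, 55.0, 40.0, 20.0])      # percent forest
species_richness = np.array([30.0, 27.0, 21.0, 16.0, 9.0])   # species count
r = pearson_r(forest_cover, species_richness)
```

A value of `r` close to +1 indicates that the two descriptors rise together; a value close to -1 indicates that one falls as the other rises.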
Abstract:
Climate change vulnerability profiles are developed at the district level for the agriculture, water and forest sectors of the North East region of India, for the current and projected future climates. An index-based approach was used in which a set of indicators representing key sectors of vulnerability (agriculture, forest, water) is selected using the statistical technique of principal component analysis. The impacts of climate change on key sectors, as represented by the changes in the indicators, were derived from impact assessment models. These impacted indicators were used to calculate the future vulnerability to climate change. Results indicate that the majority of the districts in North East India are subject to climate-induced vulnerability currently and in the near future. This is a first-of-its-kind study that ranks the districts of North East India on the basis of their vulnerability index values. The objective of such ranking is to assist in: (i) identifying and prioritizing the most vulnerable sectors and districts; (ii) identifying adaptation interventions; and (iii) mainstreaming adaptation in development programmes.
Abstract:
This paper addresses the problem of on-line recognition and retrieval of relatively weak industrial signals, such as partial discharges (PD), buried in excessive noise. The major bottleneck is the recognition and suppression of stochastic pulsive interference (PI), owing to the overlapping broadband frequency spectra of PI and PD pulses. Therefore, on-line, on-site PD measurement is hardly possible with conventional frequency-based DSP techniques. The observed PD signal is modeled as a linear combination of systematic and random components employing probabilistic principal component analysis (PPCA), and the pdf of the underlying stochastic process is obtained. The PD/PI pulses are assumed to be the mean of the process and are modeled using non-parametric methods based on smooth FIR filters, with a maximum a posteriori probability (MAP) procedure employed to estimate the filter coefficients. The classification of the pulses is undertaken using a simple PCA classifier. The methods proposed by the authors were found to be effective in automatic retrieval of PD pulses, completely rejecting PI.
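To make the PPCA model concrete, the sketch below fits the standard closed-form maximum-likelihood PPCA solution (Tipping and Bishop): the noise variance is the mean of the discarded covariance eigenvalues, and the loading matrix scales the leading eigenvectors accordingly. This is a generic illustration under that assumption, not the authors' PD/PI procedure; the function name `ppca_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def ppca_fit(X, q):
    """Closed-form maximum-likelihood PPCA with q latent dimensions.

    Model: x = W z + mu + noise, z ~ N(0, I), noise ~ N(0, sigma2 * I).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)    # ML covariance (divide by N)
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    sigma2 = eigvals[q:].mean()                    # average discarded variance
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                     # systematic-component loadings
    return mu, W, sigma2

# Synthetic example: a 2-dimensional systematic part in 5 dimensions plus
# isotropic noise with variance 0.01 (illustrative values only).
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2))
X = rng.normal(size=(2000, 2)) @ A.T + 0.1 * rng.normal(size=(2000, 5))
mu, W, sigma2 = ppca_fit(X, q=2)
```

The implied model covariance is `W @ W.T + sigma2 * np.eye(5)`, which separates the systematic part of the signal (captured by `W`) from the random part (captured by `sigma2`).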
Abstract:
Land cover (LC) and land use (LU) dynamics induced by human and natural processes play a major role in global as well as regional patterns of landscapes, influencing biodiversity, hydrology, ecology and climate. Changes in LC features resulting in forest fragmentation have posed direct threats to biodiversity, endangering the sustainability of ecological goods and services. Habitat fragmentation is of added concern as the residual spatial patterns mitigate or exacerbate edge effects. LU dynamics are obtained by classifying temporal remotely sensed satellite imagery of different spatial and spectral resolutions. This paper reviews five different image classification algorithms using spatio-temporal data of a temperate watershed in Himachal Pradesh, India. The Gaussian Maximum Likelihood classifier was found to be apt for analysing spatial patterns at regional scale, based on accuracy assessment through error matrices and ROC (receiver operating characteristic) curves. The LU information thus derived was then used to assess spatial changes from temporal data using image differencing based on principal component analysis and correspondence analysis. The forest area dynamics were further studied by analysing the different types of fragmentation through forest fragmentation models. The computed forest fragmentation and landscape metrics show a decline of interior intact forests, with a substantial increase in patch forest, during 1972-2007.
Abstract:
We address the problem of recognition and retrieval of relatively weak industrial signals, such as partial discharges (PD), buried in excessive noise. The major bottleneck is the recognition and suppression of stochastic pulsive interference (PI), which has time-frequency characteristics similar to those of PD pulses. Therefore, conventional frequency-based DSP techniques are not useful in retrieving PD pulses. We employ statistical signal modeling based on a combination of a long-memory process and probabilistic principal component analysis (PPCA). A parametric analysis of the signal is carried out to extract the features of the desired pulses. We incorporate a wavelet-based bootstrap method for obtaining the noise training vectors from observed data. The procedure adopted in this work is completely different from the research work reported in the literature, which is generally based on the desired signal frequency and the noise frequency.
Abstract:
Detecting and quantifying the presence of human-induced climate change in regional hydrology is important for studying the impacts of such changes on water resources systems as well as for reliable future projections and policy making for adaptation. In this article, a formal fingerprint-based detection and attribution analysis is attempted to study the changes in the observed monsoon precipitation and streamflow in the rain-fed Mahanadi River Basin in India, considering the variability across different climate models. This is achieved through the use of observations, several climate model runs, a statistical downscaling technique based on principal component analysis and regression, and a Genetic Programming based rainfall-runoff model. It is found that the decreases in observed hydrological variables across the second half of the 20th century lie outside the range that is expected from natural internal variability of climate alone at the 95% statistical confidence level, for most of the climate models considered. For several climate models, such changes are consistent with those expected from anthropogenic emissions of greenhouse gases. However, unequivocal attribution to human-induced climate change cannot be claimed across all the climate models, and uncertainties in our detection procedure, arising from various sources including the use of models, cannot be ruled out. Changes in solar irradiance and volcanic activity are considered as other plausible natural external causes of climate change. The time evolution of the anthropogenic climate change "signal" in the hydrological observations, above the natural internal climate variability "noise", shows that the signal is detected earlier in streamflow than in precipitation for most of the climate models, suggesting larger impacts of human-induced climate change on streamflow than on precipitation at the river basin scale.
Abstract:
This paper presents a new hierarchical clustering algorithm for crop stage classification using hyperspectral satellite images. Among the many benefits and uses of remote sensing, one important application is solving the problem of crop stage classification. Modern commercial imaging satellites, owing to their large volume of satellite imagery, offer greater opportunities for automated image analysis. Hence, we propose an unsupervised algorithm, namely the Hierarchical Artificial Immune System (HAIS), consisting of two steps: splitting the cluster centers and merging them. The high dimensionality of the data has been reduced with the help of Principal Component Analysis (PCA). The classification results have been compared with those of the K-means and Artificial Immune System algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm is accurate.
Abstract:
Background: Recent research on glioblastoma (GBM) has focused on deducing gene signatures predicting prognosis. The present study evaluated the mRNA expression of selected genes and correlated it with outcome to arrive at a prognostic gene signature. Methods: Patients with GBM (n = 123) were prospectively recruited, treated with a uniform protocol and followed up. Expression of 175 genes in GBM tissue was determined using qRT-PCR. A supervised principal component analysis followed by derivation of a gene signature was performed. Independent validation of the signature was done using TCGA data. Gene Ontology and KEGG pathway analysis was carried out among patients from the TCGA cohort. Results: A 14-gene signature was identified that predicted outcome in GBM. A weighted gene (WG) score was found to be an independent predictor of survival in multivariate analysis in the present cohort (HR = 2.507; B = 0.919; p < 0.001) and in the TCGA cohort. Risk stratification by standardized WG score classified patients into low and high risk groups, predicting survival both in our cohort (p < 0.001) and in the TCGA cohort (p = 0.001). Pathway analysis using the most differentially regulated genes (n = 76) between the low and high risk groups revealed association of activated inflammatory/immune response pathways and the mesenchymal subtype with the high risk group. Conclusion: We have identified a 14-gene expression signature that can predict survival in GBM patients. A network analysis revealed activation of the inflammatory response pathway specifically in the high risk group. These findings may have implications for the understanding of gliomagenesis, the development of targeted therapies and the selection of high risk cancer patients for alternate adjuvant therapies.
Abstract:
The presence of a large number of spectral bands in the hyperspectral images increases the capability to distinguish between various physical structures. However, they suffer from the high dimensionality of the data. Hence, the processing of hyperspectral images is applied in two stages: dimensionality reduction and unsupervised classification techniques. The high dimensionality of the data has been reduced with the help of Principal Component Analysis (PCA). The selected dimensions are classified using Niche Hierarchical Artificial Immune System (NHAIS). The NHAIS combines the splitting method to search for the optimal cluster centers using niching procedure and the merging method is used to group the data points based on majority voting. Results are presented for two hyperspectral images namely EO-1 Hyperion image and Indian pines image. A performance comparison of this proposed hierarchical clustering algorithm with the earlier three unsupervised algorithms is presented. From the results obtained, we deduce that the NHAIS is efficient.
Abstract:
This study aimed to assess soil nutrient status and heavy metal content and their impact on the predominant soil bacterial communities of mangroves of the Mahanadi Delta. Mangrove soil of the Mahanadi Delta is slightly acidic, and the levels of soil nutrients such as carbon, nitrogen, phosphorus and potash vary with season and site. The seasonal average concentrations (µg/g) of various heavy metals were in the range: 14810-63370 (Fe), 2.8-32.6 (Cu), 13.4-55.7 (Ni), 1.8-7.9 (Cd), 16.6-54.7 (Pb), 24.4-132.5 (Zn) and 13.3-48.2 (Co). Among the different heavy metals analysed, Co, Cu and Cd were above their permissible limits as prescribed by Indian Standards (Co = 17 µg/g, Cu = 30 µg/g, Cd = 3-6 µg/g), indicating pollution in the mangrove soil. A viable plate count revealed the presence of different groups of bacteria in the mangrove soil, i.e. heterotrophs, free-living N2 fixers, nitrifiers, denitrifiers, phosphate solubilisers, cellulose degraders and sulfur oxidisers. Principal component analysis showed a positive relationship between soil nutrients and microbial load, whereas metals such as Cu, Co and Ni showed a negative impact on some of the studied soil bacteria.
Abstract:
The objective of this work is to develop downscaling methodologies to obtain a long time record of inundation extent at high spatial resolution, based on the existing low-spatial-resolution results of the Global Inundation Extent from Multi-Satellites (GIEMS) dataset. In semiarid regions, high-spatial-resolution a priori information can be provided by visible and infrared observations from the Moderate Resolution Imaging Spectroradiometer (MODIS). The study concentrates on the Inner Niger Delta, where MODIS-derived inundation extent has been estimated at a 500-m resolution. The space-time variability is first analyzed using a principal component analysis (PCA). This is particularly effective for understanding the inundation variability, interpolating in time, or filling in missing values. Two innovative methods are developed (linear regression and matrix inversion), both based on the PCA representation. These GIEMS downscaling techniques have been calibrated using the 500-m MODIS data. The downscaled fields show the expected space-time behaviors from MODIS. A 20-yr dataset of the inundation extent at 500 m is derived from this analysis for the Inner Niger Delta. The methods are very general and may be applied to many basins and to variables other than inundation, provided enough a priori high-spatial-resolution information is available. The derived high-spatial-resolution dataset will be used in the framework of the Surface Water Ocean Topography (SWOT) mission to develop and test the instrument simulator as well as to select the calibration and validation sites (with high space-time inundation variability). In addition, once SWOT observations are available, the downscaling methodology will be calibrated on them in order to downscale the GIEMS datasets and to extend the SWOT benefits back in time to 1993.
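One common way a PCA representation can be used to fill in missing values is iterative low-rank reconstruction: fill the gaps with an initial guess, reconstruct the field from its leading components, overwrite only the gaps, and repeat. The sketch below illustrates that generic idea with an uncentered truncated SVD; it is an assumption made for illustration and is not the regression or matrix-inversion method of the paper. The function name `pca_gap_fill` and the toy time-by-pixel matrix are hypothetical.

```python
import numpy as np

def pca_gap_fill(X, rank, n_iter=200):
    """Fill NaN entries of X by repeated rank-limited SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)            # initial guess: column means
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]      # update only the gaps
    return filled

# Toy field: 6 time steps x 5 pixels with exact rank-1 structure and one gap.
truth = np.outer(np.arange(1.0, 7.0), np.arange(1.0, 6.0))
observed = truth.copy()
observed[2, 3] = np.nan                          # true value is 3 * 4 = 12
filled = pca_gap_fill(observed, rank=1)
```

On this noise-free toy example the gap converges to the true value; with real data the choice of `rank` controls how much of the space-time variability is trusted when reconstructing the missing entries.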
Abstract:
Rice landraces are lineages developed by farmers through artificial selection during the long-term domestication process. Despite their huge potential for crop improvement, they are largely understudied in India. Here, we analyse a suite of phenotypic characters from a large number of Indian landraces comprising both aromatic and non-aromatic varieties. Our primary aim was to investigate the major determinants of diversity and the strength of segregation between aromatic and non-aromatic landraces as well as within aromatic landraces. Using principal component analysis, we found that grain length, width and weight, panicle weight and leaf length make the most substantial contributions. Discriminant analysis can effectively distinguish the majority of aromatic from non-aromatic landraces. More interestingly, within aromatic landraces, long-grain traditional Basmati and short-grain non-Basmati aromatics remain morphologically well differentiated. The present research emphasizes the general patterns of phenotypic diversity and identifies the most important characters. It also confirms the existence of unique short-grain aromatic landraces, perhaps carrying signatures of an independent origin of an additional aroma quantitative trait locus in the indica group, unlike the introgression of specific alleles of the BADH2 gene from the japonica group as in Basmati. We presume that this parallel origin and evolution of aroma in short-grain indica landraces is linked to the long history of rice domestication, which involved inheritance of several traits from Oryza nivara in addition to O. rufipogon. We conclude by noting that the insights from the phenotypic analysis essentially comprise the first part of the work, which will likely be validated by subsequent molecular analysis.
Abstract:
We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.
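For orientation, the Perron-Frobenius eigenvector of a nonnegative matrix can be computed centrally by power iteration; the gossip variant described in the abstract above aims to carry out such a computation in a distributed fashion. The sketch below shows only the centralized power iteration, assuming a primitive nonnegative matrix; it is not the reinforcement-learning-based gossip scheme of the paper, and the function name `perron_vector` and the example matrix are hypothetical.

```python
import numpy as np

def perron_vector(A, n_iter=500, tol=1e-12):
    """Power iteration for the dominant eigenpair of a nonnegative matrix.

    Assumes A is primitive, so the dominant eigenvalue is simple and its
    eigenvector is strictly positive. The vector is normalized to sum to 1.
    """
    A = np.asarray(A, dtype=float)
    v = np.full(A.shape[0], 1.0 / A.shape[0])    # positive starting vector
    for _ in range(n_iter):
        w = A @ v
        w /= w.sum()                             # renormalize to sum to 1
        converged = np.abs(w - v).max() < tol
        v = w
        if converged:
            break
    eigenvalue = (A @ v).sum()                   # valid because v sums to 1
    return eigenvalue, v

# Example matrix with eigenvalues 1, 2 and 4; Perron vector is (1, 2, 1) / 4.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
eigenvalue, v = perron_vector(A)
```

The connection to principal component analysis mentioned in the abstract is that the leading principal direction is likewise a dominant eigenvector, in that case of a covariance matrix.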