989 resultados para Mining land-registry
Resumo:
This paper presents hierarchical clustering algorithms for land cover mapping problem using multi-spectral satellite images. In unsupervised techniques, the automatic generation of number of clusters and its centers for a huge database is not exploited to their full potential. Hence, a hierarchical clustering algorithm that uses splitting and merging techniques is proposed. Initially, the splitting method is used to search for the best possible number of clusters and its centers using Mean Shift Clustering (MSC), Niche Particle Swarm Optimization (NPSO) and Glowworm Swarm Optimization (GSO). Using these clusters and its centers, the merging method is used to group the data points based on a parametric method (k-means algorithm). A performance comparison of the proposed hierarchical clustering algorithms (MSC, NPSO and GSO) is presented using two typical multi-spectral satellite images - Landsat 7 thematic mapper and QuickBird. From the results obtained, we conclude that the proposed GSO based hierarchical clustering algorithm is more accurate and robust.
Resumo:
This article compares the land use in solar energy technologies with conventional energy sources. This has been done by introducing two parameters called land transformation and land occupation. It has been shown that the land area transformed by solar energy power generation is small compared to hydroelectric power generation, and is comparable with coal and nuclear energy power generation when life-cycle transformations are considered. We estimate that 0.97% of total land area or 3.1% of the total uncultivable land area of India would be required to generate 3400 TWh/yr from solar energy power systems in conjunction with other renewable energy sources.
Resumo:
A recent modelling study has shown that precipitation and runoff over land would increase when the reflectivity of marine clouds is increased to counter global warming. This implies that large scale albedo enhancement over land could lead to a decrease in runoff over land. In this study, we perform simulations using NCAR CAM3.1 that have implications for Solar Radiation Management geoengineering schemes that increase the albedo over land. We find that an increase in reflectivity over land that mitigates the global mean warming from a doubling of CO2 leads to a large residual warming in the southern hemisphere and cooling in the northern hemisphere since most of the land is located in northern hemisphere. Precipitation and runoff over land decrease by 13.4 and 22.3%, respectively, because of a large residual sinking motion over land triggered by albedo enhancement over land. Soil water content also declines when albedo over land is enhanced. The simulated magnitude of hydrological changes over land are much larger when compared to changes over oceans in the recent marine cloud albedo enhancement study since the radiative forcing over land needed (-8.2 W m(-2)) to counter global mean radiative forcing from a doubling of CO2 (3.3 W m(-2)) is approximately twice the forcing needed over the oceans (-4.2 W m(-2)). Our results imply that albedo enhancement over oceans produce climates closer to the unperturbed climate state than do albedo changes on land when the consequences on land hydrology are considered. Our study also has important implications for any intentional or unintentional large scale changes in land surface albedo such as deforestation/afforestation/reforestation, air pollution, and desert and urban albedo modification.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Resumo:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome sequence.
Resumo:
Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.
Resumo:
This paper presents an improved hierarchical clustering algorithm for land cover mapping problem using quasi-random distribution. Initially, Niche Particle Swarm Optimization (NPSO) with pseudo/quasi-random distribution is used for splitting the data into number of cluster centers by satisfying Bayesian Information Criteria (BIC). Themain objective is to search and locate the best possible number of cluster and its centers. NPSO which highly depends on the initial distribution of particles in search space is not been exploited to its full potential. In this study, we have compared more uniformly distributed quasi-random with pseudo-random distribution with NPSO for splitting data set. Here to generate quasi-random distribution, Faure method has been used. Performance of previously proposed methods namely K-means, Mean Shift Clustering (MSC) and NPSO with pseudo-random is compared with the proposed approach - NPSO with quasi distribution(Faure). These algorithms are used on synthetic data set and multi-spectral satellite image (Landsat 7 thematic mapper). From the result obtained we conclude that use of quasi-random sequence with NPSO for hierarchical clustering algorithm results in a more accurate data classification.
Resumo:
We address the problem of mining targeted association rules over multidimensional market-basket data. Here, each transaction has, in addition to the set of purchased items, ancillary dimension attributes associated with it. Based on these dimensions, transactions can be visualized as distributed over cells of an n-dimensional cube. In this framework, a targeted association rule is of the form {X -> Y} R, where R is a convex region in the cube and X. Y is a traditional association rule within region R. We first describe the TOARM algorithm, based on classical techniques, for identifying targeted association rules. Then, we discuss the concepts of bottom-up aggregation and cubing, leading to the CellUnion technique. This approach is further extended, using notions of cube-count interleaving and credit-based pruning, to derive the IceCube algorithm. Our experiments demonstrate that IceCube consistently provides the best execution time performance, especially for large and complex data cubes.
Resumo:
The rapid growth in the field of data mining has lead to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, dealing with categorical data is still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, this algorithm explores a clustering of the given data, followed by the ranking phase for determining the set of most likely outliers. The proposed algorithm is expected to perform better as it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
Resumo:
This paper primarily intends to develop a GIS (geographical information system)-based data mining approach for optimally selecting the locations and determining installed capacities for setting up distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power, minimizing the cost of transportation of biomass from dispersed sources to power generation system, and cost of distribution of electricity from the power generation system to demand centers or villages. The methodology was validated by using it for developing an optimal plan for implementing distributed biomass-based power systems for meeting the rural electricity needs of Tumkur district in India consisting of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm for the entire search space for different values of k along with demand-supply matching constraints. The optimal value of the k is chosen such that it minimizes the total cost of system installation, costs of transportation of biomass, and transmission and distribution. A smaller region, consisting of 293 villages was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
Resumo:
Mycobacterium tuberculosis owes its high pathogenic potential to its ability to evade host immune responses and thrive inside the macrophage. The outcome of infection is largely determined by the cellular response comprising a multitude of molecular events. The complexity and inter-relatedness in the processes makes it essential to adopt systems approaches to study them. In this work, we construct a comprehensive network of infection-related processes in a human macrophage comprising 1888 proteins and 14,016 interactions. We then compute response networks based on available gene expression profiles corresponding to states of health, disease and drug treatment. We use a novel formulation for mining response networks that has led to identifying highest activities in the cell. Highest activity paths provide mechanistic insights into pathogenesis and response to treatment. The approach used here serves as a generic framework for mining dynamic changes in genome-scale protein interaction networks.
Resumo:
Seasonal rainfall patterns in Bangalore, India, have been reconstructed using stable isotopic ratios in the growth bands of Giant African Land Snail shells. The present study was conducted at Bangalore, India which receives rain during the summer months. The oxygen isotopic record in the rainwater samples collected during different months covering the period of the summer monsoon of the year 2008 is compared with the isotopic ratio in the gastropod growth bands deposited simultaneously. The chronology of the shell growth band is independently established assuming the growth rate observed in a chamber experiment maintaining similar relative humidity and temperature conditions. A consistent pattern observed in the isotopic ratio in the gastropod growth bands and rainwater is demonstrated and provides a novel approach for precipitation reconstruction at seasonal and weekly time scales. This approach of using isotopic ratios in the gastropod growth bands for rainfall can serve as a substitute for filling gaps in rainfall data and for cases where no rain records are available. In addition, they can be used to determine the frequencies and magnitudes of dry spells from the past records. (C) 2013 Elsevier B.V. All rights reserved.
Resumo:
Variable Endmember Constrained Least Square (VECLS) technique is proposed to account endmember variability in the linear mixture model by incorporating the variance for each class, the signals of which varies from pixel to pixel due to change in urban land cover (LC) structures. VECLS is first tested with a computer simulated three class endmember considering four bands having small, medium and large variability with three different spatial resolutions. The technique is next validated with real datasets of IKONOS, Landsat ETM+ and MODIS. The results show that correlation between actual and estimated proportion is higher by an average of 0.25 for the artificial datasets compared to a situation where variability is not considered. With IKONOS, Landsat ETM+ and MODIS data, the average correlation increased by 0.15 for 2 and 3 classes and by 0.19 for 4 classes, when compared to single endmember per class. (C) 2013 COSPAR. Published by Elsevier Ltd. All rights reserved.
Resumo:
Feeding 9-10billion people by 2050 and preventing dangerous climate change are two of the greatest challenges facing humanity. Both challenges must be met while reducing the impact of land management on ecosystem services that deliver vital goods and services, and support human health and well-being. Few studies to date have considered the interactions between these challenges. In this study we briefly outline the challenges, review the supply- and demand-side climate mitigation potential available in the Agriculture, Forestry and Other Land Use AFOLU sector and options for delivering food security. We briefly outline some of the synergies and trade-offs afforded by mitigation practices, before presenting an assessment of the mitigation potential possible in the AFOLU sector under possible future scenarios in which demand-side measures codeliver to aid food security. We conclude that while supply-side mitigation measures, such as changes in land management, might either enhance or negatively impact food security, demand-side mitigation measures, such as reduced waste or demand for livestock products, should benefit both food security and greenhouse gas (GHG) mitigation. Demand-side measures offer a greater potential (1.5-15.6Gt CO2-eq. yr(-1)) in meeting both challenges than do supply-side measures (1.5-4.3Gt CO2-eq. yr(-1) at carbon prices between 20 and 100US$ tCO(2)-eq. yr(-1)), but given the enormity of challenges, all options need to be considered. Supply-side measures should be implemented immediately, focussing on those that allow the production of more agricultural product per unit of input. For demand-side measures, given the difficulties in their implementation and lag in their effectiveness, policy should be introduced quickly, and should aim to codeliver to other policy agenda, such as improving environmental quality or improving dietary health. These problems facing humanity in the 21st Century are extremely challenging, and policy that addresses multiple objectives is required now more than ever.
Resumo:
[1] Evaporative fraction (EF) is a measure of the amount of available energy at the earth surface that is partitioned into latent heat flux. The currently operational thermal sensors like the Moderate Resolution Imaging Spectroradiometer (MODIS) on satellite platforms provide data only at 1000 m, which constraints the spatial resolution of EF estimates. A simple model (disaggregation of evaporative fraction (DEFrac)) based on the observed relationship between EF and the normalized difference vegetation index is proposed to spatially disaggregate EF. The DEFrac model was tested with EF estimated from the triangle method using 113 clear sky data sets from the MODIS sensor aboard Terra and Aqua satellites. Validation was done using the data at four micrometeorological tower sites across varied agro-climatic zones possessing different land cover conditions in India using Bowen ratio energy balance method. The root-mean-square error (RMSE) of EF estimated at 1000 m resolution using the triangle method was 0.09 for all the four sites put together. The RMSE of DEFrac disaggregated EF was 0.09 for 250 m resolution. Two models of input disaggregation were also tried with thermal data sharpened using two thermal sharpening models DisTrad and TsHARP. The RMSE of disaggregated EF was 0.14 for both the input disaggregation models for 250 m resolution. Moreover, spatial analysis of disaggregation was performed using Landsat-7 (Enhanced Thematic Mapper) ETM+ data over four grids in India for contrasted seasons. It was observed that the DEFrac model performed better than the input disaggregation models under cropped conditions while they were marginally similar under non-cropped conditions.