848 results for Spatial data mining
Abstract:
This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.
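As a rough illustration of the all-to-all comparison pattern described above, the following Python sketch enumerates every unordered pair of data items and spreads the comparison tasks over workers. The round-robin policy and the names `assign_pairs` and `items_needed` are assumptions made for this example; they are not the thesis's data distribution strategy or scheduling policy. The sketch also reports which items each worker must hold, which is the quantity that storage usage and data locality depend on.

```python
from itertools import combinations


def assign_pairs(n_items, n_workers):
    """Assign every unordered pair (i, j), i < j, to a worker in round-robin order.

    Round-robin is a simple load-balancing assumption for illustration only.
    """
    tasks = {worker: [] for worker in range(n_workers)}
    for k, pair in enumerate(combinations(range(n_items), 2)):
        tasks[k % n_workers].append(pair)
    return tasks


def items_needed(pairs):
    """Return the sorted set of data items a worker must store for its pairs."""
    return sorted({item for pair in pairs for item in pair})


if __name__ == "__main__":
    tasks = assign_pairs(n_items=4, n_workers=2)
    for worker, pairs in tasks.items():
        print(worker, pairs, items_needed(pairs))
```

With round-robin assignment the task counts are balanced, but a worker may still need most of the data set locally, which is why task scheduling and data distribution have to be considered together.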
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in various real-world applications. Most existing approaches adopt term-based representations for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for multi-document summarization. PBTMSum combines pattern mining techniques with LDA topic modelling to generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the Document Understanding Conference (DUC) 2007 data. The results demonstrate the effectiveness and efficiency of our proposed approach.
Abstract:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.
Abstract:
The urban population in India is growing at around 2.3 percent per annum. This drives urbanisation and often fuels dispersed development on the outskirts of urban and village centres, with impacts such as the loss of agricultural land, open space, and ecologically sensitive habitats. This type of growth, often referred to as sprawl, is prevalent and persistent in most places. The direct implications of such urban sprawl are changes in the land use and land cover of the region and a lack of basic amenities, since planners are unable to visualise these growth patterns. This growth is normally left out of government surveys (even the national population census), as it cannot be grouped under either urban or rural centres. Investigating patterns of growth is crucial from a regional planning point of view in order to provide basic amenities in the region. The growth patterns of urban sprawl can be analysed and understood when temporal multi-sensor, multi-resolution spatial data are available. To make the best use of these spectral and spatial resolutions, image fusion techniques are required. Fusion integrates a lower spatial resolution multispectral (MSS) image (for example, IKONOS MSS bands of 4 m spatial resolution) with a higher spatial resolution panchromatic (PAN) image (IKONOS PAN band of 1 m spatial resolution) using a simple spectral preservation fusion technique, the Smoothing Filter-based Intensity Modulation (SFIM). Spatial details are modulated onto a co-registered lower resolution MSS image, without altering its spectral properties and contrast, by using the ratio between the higher resolution image and its low-pass filtered (smoothed) image. Visual evaluation and statistical analysis confirm that SFIM is a superior fusion technique for improving the spatial detail of MSS images while preserving spectral properties.
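The ratio-based modulation described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the MSS band has already been co-registered and resampled to the PAN grid, it uses a simple box mean as the smoothing filter, and the names `box_mean` and `sfim_fuse` and the `radius` and `eps` parameters are invented for the example.

```python
def box_mean(image, radius):
    """Low-pass filter: mean over a (2*radius+1)^2 window, clipped at the edges."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


def sfim_fuse(mss_band, pan, radius=1, eps=1e-12):
    """Fuse one MSS band with PAN: fused = MSS * PAN / smoothed(PAN).

    Both inputs are 2-D lists of equal shape (MSS already resampled to the
    PAN grid, which is an assumption of this sketch).
    """
    smooth = box_mean(pan, radius)
    return [
        [mss_band[r][c] * pan[r][c] / max(smooth[r][c], eps)
         for c in range(len(pan[0]))]
        for r in range(len(pan))
    ]


if __name__ == "__main__":
    pan = [[10, 10, 10], [10, 20, 10], [10, 10, 10]]
    mss = [[5.0] * 3 for _ in range(3)]
    for row in sfim_fuse(mss, pan):
        print(row)
```

Where PAN is locally flat, the ratio is 1 and the MSS value passes through unchanged, which is the sense in which the method preserves spectral properties; only local PAN detail modulates the result.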
Abstract:
Over the past decade, many powerful data mining techniques have been developed to analyze temporal and sequential data. The time is now fertile for addressing problems of larger scope under the purview of temporal data mining. The fourth SIGKDD workshop on temporal data mining focused on the question: What can we infer about the structure of a complex dynamical system from observed temporal data? The goals of the workshop were to critically evaluate the need in this area by bringing together leading researchers from industry and academia, and to identify promising technologies and methodologies for doing the same. We provide a brief summary of the workshop proceedings and ideas arising out of the discussions.
Abstract:
Fast content addressable data access mechanisms have compelling applications in today's systems. Many of these exploit the powerful wildcard matching capabilities provided by ternary content addressable memories (TCAMs). For example, TCAM-based implementations of important algorithms in data mining have been developed in recent years; these achieve an order of magnitude speedup over prevalent techniques. However, large hardware TCAMs are still prohibitively expensive in terms of power consumption and cost per bit. This has been a barrier to extending their exploitation beyond niche and special purpose systems. We propose an approach to overcome this barrier by extending the traditional virtual memory hierarchy to scale up the user-visible capacity of TCAMs while mitigating the power consumption overhead. By exploiting the notion of content locality (as opposed to spatial locality), we devise a novel combination of software and hardware techniques to provide an abstraction of a large virtual ternary content addressable space. In the long run, such abstractions enable applications to disassociate considerations of spatial locality and contiguity from the way data is referenced. If successful, ideas for making content addressability a first-class abstraction in computing systems can open up a radical shift in the way applications are optimized for memory locality, just as storage class memories are soon expected to shift away from the way in which applications are typically optimized for disk access locality.
Abstract:
A number of ecosystems can exhibit abrupt shifts between alternative stable states. Because of their important ecological and economic consequences, recent research has focused on devising early warning signals for anticipating such abrupt ecological transitions. In particular, theoretical studies show that changes in spatial characteristics of the system could provide early warnings of approaching transitions. However, the empirical validation of these indicators lags behind their theoretical development. Here, we summarize a range of currently available spatial early warning signals, suggest potential null models to interpret their trends, and apply them to three simulated spatial data sets of systems undergoing an abrupt transition. In addition to providing a step-by-step methodology for applying these signals to spatial data sets, we propose a statistical toolbox that may be used to help detect approaching transitions in a wide range of spatial data. We hope that our methodology, together with the computer code, will stimulate the application and testing of spatial early warning signals on real spatial data.
Abstract:
The classification of time series data is an interesting problem in the field of data mining. Although several algorithms have been proposed for time series classification, we have developed an innovative algorithm that is computationally fast and, in several cases, more accurate than the 1NN classifier. Our method calculates the fuzzy membership of each test pattern in each class. We have experimented with 6 benchmark datasets and compared our method with the 1NN classifier.
Abstract:
188 p.
Abstract:
Washington depends on a healthy coastal and marine ecosystem to maintain a thriving economy and vibrant communities. These ecosystems support critical habitats for wildlife and a growing number of often competing ocean activities, such as fishing, transportation, aquaculture, recreation, and energy production. Planners, policy makers and resource managers are being challenged to sustainably balance ocean uses and environmental conservation in a finite space and with limited information. This balancing act can be supported by spatial planning. Marine spatial planning (MSP) is a planning process that enables integrated, forward-looking, and consistent decision making on the human uses of the oceans and coasts. It can improve marine resource management by planning for human uses in locations that reduce conflict, increase certainty, and support a balance among the social, economic, and ecological benefits we receive from ocean resources. In March 2010, the Washington state legislature enacted a marine spatial planning law (RCW §43.372) to address resource use conflicts in Washington waters. In 2011, a report to the legislature and a workshop on human use data provided guidance for the marine spatial planning process. The report outlines a set of recommendations for the State to effectively undertake marine spatial planning, and this work plan will support some of these recommendations, such as federal integration, regional coordination, developing mechanisms to integrate scientific and technical expertise, developing data standards, and accessing and sharing spatial data. In 2012, the Governor amended the existing law to focus funding on mapping and ecosystem assessments for Washington’s Pacific coast, and the legislature provided $2.1 million in funds to begin marine spatial planning off Washington’s coast.
The funds are appropriated through the Washington Department of Natural Resources Marine Resources Stewardship Account with coordination among the State Ocean Caucus, the four Coastal Treaty Tribes, four coastal Marine Resource Committees and the newly formed stakeholder body, the Washington Coastal Marine Advisory Council.
Abstract:
Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied because of additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining that addresses these challenges is presented, together with related research results from previous work and our recent developments in data mining on text-based, web-based, image-based, and network-based construction databases.
Abstract:
IEEE
Abstract:
Addressing geological hazard evaluation (GHE), and using remote sensing and GIS systems as the experimental environment, supported by some programming development, this thesis combines knowledge of geo-hazard mechanisms, statistical learning, remote sensing (RS), hyperspectral recognition, spatial analysis, digital photogrammetry and mineralogy. It selects geo-hazard samples from Hong Kong and the Three Parallel Rivers region as experimental data to study two core questions of GHE: geo-hazard information acquisition and the evaluation model. For landslide information acquisition by RS, three detailed topics are presented: image enhancement for visual interpretation, automatic recognition of landslides, and quantitative mineral mapping. For the evaluation model, a recent and powerful data mining method, the support vector machine (SVM), is introduced to the GHE field, and a series of comparative experiments is carried out to verify its feasibility and efficiency. Furthermore, this thesis proposes a method to forecast the distribution of landslides when future rainfall is known, based on historical rainfall and the corresponding landslide susceptibility map. The details are as follows: (a) Remote sensing image enhancement methods for geo-hazard visual interpretation. The quality of visual interpretation is determined by the RS data and the image enhancement method. The most effective and common technique is merging a high-spatial-resolution image with a multispectral image, but few studies have addressed merging methods for geo-hazard recognition. Through comparative experiments on six mainstream merging methods and combinations of different remote sensing data sources, this thesis presents the merits of each method and qualitatively analyzes the effect of spatial resolution, spectral resolution and time phase on the merged image. (b) Automatic recognition of shallow landslides from RS images.
A landslide inventory is the basis of landslide forecasting and landslide study. If landslide events are collected persistently, the geo-hazard inventory is updated in a timely manner, and the prediction model is improved continually, forecast accuracy can be raised step by step. Given the characteristics of geo-hazard distribution, RS is a feasible way to obtain landslide information. An automatic hierarchical approach is proposed to identify shallow landslides in vegetated regions by combining multispectral RS imagery and DEM derivatives, and an experiment is carried out to test its efficiency. (c) Obtaining hazard-causing factors. Accurate environmental factors are the key to analyzing and predicting the risk of regional geological hazards. For predicting huge debris flows, the main challenge is still to determine the source material and its volume in the debris flow source region. Exploiting the merits of various RS techniques, this thesis presents methods to obtain two important hazard-causing factors, DEM and alteration minerals, and, through spatial analysis, finds the relationship between hydrothermal clay alteration minerals and geo-hazards in the arid-hot valleys of the Three Parallel Rivers region. (d) Applying the support vector machine (SVM) to landslide susceptibility mapping. A recent and powerful statistical learning method, SVM, is introduced to RGHE. SVM, a proven and efficient statistical learning method, can handle both two-class and one-class samples, which avoids the need to produce 'pseudo' samples. Fifty-five years of historical samples from a natural terrain in Hong Kong are used to assess this method, and the susceptibility maps obtained by one-class SVM and two-class SVM are compared with that obtained by logistic regression. It can be concluded that two-class SVM has better prediction efficiency than logistic regression and one-class SVM.
However, one-class SVM, which requires only failed cases, has an advantage over the other two methods because only "failed" case information is usually available in landslide susceptibility mapping. (e) Predicting the distribution of rainfall-induced landslides by time-series analysis. Rainfall is the dominant factor triggering landslides. More than 90% of the losses and casualties from landslides are caused by rainfall, so predicting landslide sites under a given rainfall is an important issue in geological evaluation. Taking full account of the contribution of stable factors (the landslide susceptibility map) and dynamic factors (rainfall), a time-series linear regression analysis between rainfall and the landslide risk map is presented, and experiments based on real samples show that this method performs well in the natural terrain of Hong Kong. The following four practical or original findings are obtained: 1) RS methods to enhance geo-hazard images, automatically recognize shallow landslides, and obtain DEM and mineral information are studied, and the detailed operating steps are given through examples. The conclusions are highly practical. 2) An exploratory study of the relationship between geo-hazards and alteration minerals in the arid-hot valley of the Jinshajiang River is presented. Based on the standard USGS mineral spectra, the distribution of hydrothermal alteration minerals is mapped by the SAM method. Through statistical analysis of debris flows and hazard-causing factors, a strong correlation between debris flows and clay minerals is found and validated. 3) SVM theory (especially one-class SVM) is applied to landslide susceptibility mapping, and its performance is systematically evaluated, which demonstrates the advantages of SVM in this field. 4) A time-series prediction method for the distribution of rainfall-induced landslides is established.
In a natural study area, the distribution of landslides induced by a storm is predicted successfully under a real maximum 24 h rainfall, based on the regression between four historical storms and the corresponding landslides.
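The SAM (spectral angle mapper) step mentioned in finding 2) can be illustrated with a short Python sketch. It treats each spectrum as a vector and measures the angle between a pixel spectrum and a reference spectrum; a smaller angle means a closer match regardless of overall brightness. The reference spectra, the class names, the `max_angle` threshold and the function names below are hypothetical placeholders chosen for the example. They are not USGS library values and not the thesis's actual settings.

```python
import math


def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        raise ValueError("spectra must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)


def classify_pixel(pixel, library, max_angle=0.1):
    """Return the library entry with the smallest angle, or None if none is within max_angle."""
    best_name, best_angle = None, max_angle
    for name, reference in library.items():
        angle = spectral_angle(pixel, reference)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


if __name__ == "__main__":
    # Hypothetical three-band reference spectra, for illustration only.
    library = {"clay": [0.2, 0.4, 0.6], "other": [0.6, 0.4, 0.2]}
    print(classify_pixel([0.1, 0.2, 0.3], library))
```

Because the angle ignores vector length, a pixel that is a darker or brighter copy of a reference spectrum still matches it, which makes the measure relatively insensitive to illumination differences.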
Abstract:
The Xinli mine area of the Sanshandao mine is adjacent to the Bohai Sea, and its main exploitable ore deposit occurs in the undersea rock mass. Once in production, the mine is the largest undersea gold mine in China. The mine area faces a latent danger of water bursting and even sudden seawater inrush. So far there is no mature experience of undersea mining in China. The vein ore deposit is located in the lower wall of a fault; its possible groundwater sources mainly include bittern, Quaternary pore water and modern seawater. To ensure the safety of undersea mining, the key technical problems are to survey the flooding conditions of the ore deposit using proper measures and to study the potential seawater inrush pattern. With the Xinli mine area as a case study, the engineering geological conditions are surveyed in situ, the regional structural pattern and rock mass framework characteristics are determined, the distribution of the structural planes is modeled by a Monte Carlo method, and the connectivity coefficients of the rock mass structural planes are calculated. The regional hydro-geological conditions are analyzed, in-situ hydro-geological investigation and sampling are performed in detail, hydrochemistry and isotope testing and groundwater dynamic monitoring are conducted, the recharge, runoff and discharge conditions are specified, and the sources of flooding are distinguished. Some indices are selected from the testing results to calculate the proportion of each source at some water discharge points and in the total water discharge of the Xinli mine area. The temporal and spatial variations of each water source of the whole ore deposit flooding are analyzed.
According to the special project conditions in the Xinli mine area, the permeability coefficient tensors of the rock mass are calculated with a fracture geometry measurement method, and, based on the connectivity and a few hydraulic testing results, a modified synthetic permeability coefficient is calculated. The hydro-geological conceptual and mathematical models are established, and the water yield of the mine is predicted using the Visual Modflow code. The development of surrounding rock mass deformation and secondary stress is studied by numerical analysis, and the intrinsic mechanism of the fault slip caused by the excavation of the ore deposit is analyzed. The results show that the development of surrounding rock mass deformation and secondary stress for a vein ore deposit in the lower wall of a fault is different from that in a thick, large ore deposit. The secondary stress caused by the excavation of a vein ore deposit in the lower wall of a fault is mainly distributed in the upper wall of the fault, and one surface subsidence center will occur. The influences of the fault on rock mass movement, secondary stress and hydro-geological structures are analyzed: the secondary stress is blocked by the fault and tensile stress concentration occurs in the rock mass near the fault; the original water-blocking structure is destroyed and the permeable structure is reconstructed; the primary structural planes begin to expand and new fissures occur. As a result, the permeability of the original permeable structure is greatly enhanced, and water bursting will probably occur. Based on this knowledge, the possible water inrush pattern and position in the Xinli mine area are predicted. Some computer programs are developed using an object-oriented design method on the Visual Studio .NET development platform.
These programs include a Monte Carlo simulation procedure, a joint diagram plotting procedure, a procedure for calculating the connectivity coefficient of structural planes, a permeability tensor calculation procedure, and a procedure for editing water chemical formulas and calculating water source mixing conditions. A new computer mapping algorithm for joint iso-density diagrams is proposed. Based on the powerful spatial data management and graphics functions of Geographic Information Systems, a management information system for pit water discharge dynamic monitoring data is established with ArcView.
Abstract:
Population research is a frontier area of interest both domestically and internationally, especially research on spatial visualization and geo-visualization system design, which provides a sound basis for understanding and analysing regional differences in population distribution and its spatial patterns. With the development of GIS, the theory of geo-visualization plays an increasingly important role in many research fields, especially in population information visualization, and has made great achievements recently. Nevertheless, current research pays little attention to system design for statistical-geo visualization of population information. This paper explores the design theories and methodologies for a statistical-geo visualization system for population information. The research focuses mainly on the framework, methodologies and techniques for system design and construction. The purpose of the research is to develop a platform for a population atlas by integrating the research group's own copyrighted statistical mapping software. As a modern tool, the system will provide a spatial visual environment in which users can analyze the characteristics of population distribution and differentiate the interrelations of the population components. Firstly, the paper discusses the essential role of geo-visualization for population information and identifies the key issues in statistical-geo visualization system design, based on an analysis of domestic and international trends. Secondly, the design of the geo-visualization system for population, including its structure, functionality, modules and user interface design, is studied on the basis of the theory and technology of geo-visualization. The system design is proposed and further divided into three parts: the support layer, the technical layer and the user layer. The support layer is a basic operating module and a main part of the system.
The technical layer is a core part of the system, supported by database and function modules. The database module mainly includes the integrated population database (comprising spatial data, attribute data and geographical feature information), the cartographic symbol library, the color library, and the statistical analysis model. The function module of the system consists of a thematic map maker component, a statistical graph maker component, a database management component and a statistical analysis component. The user layer is an integrated platform that provides the functions to design and implement a visual interface through which users can query, analyze and manage the statistical data and the electronic map. On this basis, China's E-atlas for population was designed and developed by integrating the fifth national census data with 1:400 million scale spatial data. The atlas illustrates the current level of population development in China with about 200 thematic maps in 10 map categories (environment, population distribution, sex and age, immigration, nation, family and marriage, birth, education, employment, housing). As a scientific reference tool, China's E-atlas for population has been highly rated since its publication in early 2005. Finally, the paper analyzes the sex ratio in China in depth, to show how to use the functions of the system to analyze a specific population problem and how to carry out data mining. The analysis results showed that: 1. The sex ratio has increased in many regions since the fourth census in 1990, except in the cities of the eastern region, and high sex ratios are concentrated in hilly and low-mountain areas with high illiteracy and poverty rates; 2.
The statistical-geo visualization system is a powerful tool for handling population information, which can be used to reflect the regional differences and regional variations of the population in China and to indicate the interrelations of the population with other environmental factors. Although the author tries to put forward an integrated design framework for the statistical-geo visualization system, many problems remain to be resolved as geo-visualization studies develop.