796 results for Data-Mining Techniques
Abstract:
This paper presents image reconstruction using the fan-beam filtered backprojection (FBP) algorithm with no backprojection weight, applied to truncated projection data completed by windowed linear prediction (WLP). Image reconstruction from truncated projections aims to reconstruct the object accurately from the limited projection data available. Because the projection data are incomplete, the reconstructed image contains truncation artifacts that extend into the region of interest (ROI), making the reconstructed image unsuitable for further use. Data completion techniques have been shown to be effective in such situations. We use the windowed linear prediction technique for projection completion and then use the fan-beam FBP algorithm with no backprojection weight for the 2-D image reconstruction. We evaluate the quality of the image reconstructed using the fan-beam FBP algorithm with no backprojection weight after WLP completion.
Abstract:
The introduction of processor-based instruments in power systems is resulting in rapid growth of the measured data volume. The present practice in most utilities is to store only some of the important data in a retrievable fashion for a limited period. Subsequently, even this data is either deleted or stored on backup devices. The investigations presented here explore the application of lossless data compression techniques for archiving all the operational data, so that it can be put to more effective use. Four arithmetic coding methods, suitably modified for handling power system steady-state operational data, are proposed here. The performance of the proposed methods is evaluated using actual data pertaining to the Southern Regional Grid of India. (C) 2012 Elsevier Ltd. All rights reserved.
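For readers unfamiliar with arithmetic coding, the following is a minimal sketch of the textbook static-model coder using exact fractions. It is not one of the four modified methods the abstract proposes, whose details are not given here. The function names and the sample readings are hypothetical and serve only to show that the encoding is lossless.

```python
from collections import Counter
from fractions import Fraction


def build_model(symbols):
    """Map each symbol to its cumulative-probability interval [low, high)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    model, cumulative = {}, 0
    for sym in sorted(counts):
        model[sym] = (Fraction(cumulative, total),
                      Fraction(cumulative + counts[sym], total))
        cumulative += counts[sym]
    return model


def encode(symbols, model):
    """Narrow [0, 1) once per symbol; return a number inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for sym in symbols:
        sym_low, sym_high = model[sym]
        low += width * sym_low
        width *= sym_high - sym_low
    return low + width / 2


def decode(code, length, model):
    """Replay the interval narrowing to recover exactly `length` symbols."""
    out = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        target = (code - low) / width
        for sym, (sym_low, sym_high) in model.items():
            if sym_low <= target < sym_high:
                out.append(sym)
                low += width * sym_low
                width *= sym_high - sym_low
                break
    return out


# Hypothetical quantized readings (made up for illustration, not grid data).
readings = [50, 50, 50, 49, 50, 51, 50, 50]
model = build_model(readings)
code = encode(readings, model)
assert decode(code, len(readings), model) == readings
```

Exact fractions keep the sketch short and obviously correct; a practical coder would use fixed-width integer arithmetic with renormalization instead.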
Abstract:
Fast content addressable data access mechanisms have compelling applications in today's systems. Many of these exploit the powerful wildcard matching capabilities provided by ternary content addressable memories (TCAMs). For example, TCAM-based implementations of important data mining algorithms have been developed in recent years; these achieve an order of magnitude speedup over prevalent techniques. However, large hardware TCAMs are still prohibitively expensive in terms of power consumption and cost per bit. This has been a barrier to extending their exploitation beyond niche and special purpose systems. We propose an approach to overcome this barrier by extending the traditional virtual memory hierarchy to scale up the user-visible capacity of TCAMs while mitigating the power consumption overhead. By exploiting the notion of content locality (as opposed to spatial locality), we devise a novel combination of software and hardware techniques to provide an abstraction of a large virtual ternary content addressable space. In the long run, such abstractions enable applications to dissociate considerations of spatial locality and contiguity from the way data is referenced. If successful, ideas for making content addressability a first class abstraction in computing systems can open up a radical shift in the way applications are optimized for memory locality, just as storage class memories are soon expected to shift away from the way in which applications are typically optimized for disk access locality.
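The wildcard matching that a TCAM provides can be illustrated with a small software emulation. This is only a sketch of ternary matching itself, not of the virtual memory scheme the abstract proposes. The names `parse_pattern` and `tcam_lookup` and the four-bit patterns are hypothetical.

```python
def parse_pattern(pattern):
    """Turn a ternary string such as '10*1' into (value, care_mask) integers.

    A mask bit of 1 means the position must match; '*' clears the mask bit.
    """
    value = mask = 0
    for ch in pattern:
        value <<= 1
        mask <<= 1
        if ch == "1":
            value |= 1
            mask |= 1
        elif ch == "0":
            mask |= 1
        elif ch != "*":
            raise ValueError(f"invalid ternary digit: {ch!r}")
    return value, mask


def tcam_lookup(table, key):
    """Return the index of the first entry matching `key`, or None.

    Hardware compares every entry in parallel; this loop emulates that
    sequentially and keeps the usual lowest-index-wins priority.
    """
    for index, (value, mask) in enumerate(table):
        if (key ^ value) & mask == 0:
            return index
    return None


table = [parse_pattern(p) for p in ["1010", "10**", "****"]]
assert tcam_lookup(table, 0b1010) == 0  # exact entry wins
assert tcam_lookup(table, 0b1001) == 1  # matches the '10**' prefix
assert tcam_lookup(table, 0b0111) == 2  # falls through to the catch-all
```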
Abstract:
Learning from Positive and Unlabelled examples (LPU) has emerged as an important problem in data mining and information retrieval applications. Existing techniques are not ideally suited for real world scenarios where the datasets are linearly inseparable, as they either build linear classifiers or the non-linear classifiers fail to achieve the desired performance. In this work, we propose to extend maximum margin clustering ideas and present an iterative procedure to design a non-linear classifier for LPU. In particular, we build a least squares support vector classifier, suitable for handling this problem due to symmetry of its loss function. Further, we present techniques for appropriately initializing the labels of unlabelled examples and for enforcing the ratio of positive to negative examples while obtaining these labels. Experiments on real-world datasets demonstrate that the non-linear classifier designed using the proposed approach gives significantly better generalization performance than the existing relevant approaches for LPU.
Abstract:
The classification of time series data is an interesting problem in the field of data mining. Although several algorithms have been proposed for time series classification, we have developed an innovative algorithm that is computationally fast and, in several cases, accurate when compared with the 1NN classifier. Our method calculates the fuzzy membership of each test pattern in each class. We experimented with 6 benchmark datasets and compared our method with the 1NN classifier.
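The abstract does not say how the fuzzy memberships are computed. As one possible illustration, the sketch below assumes a single mean prototype per class and the fuzzy c-means membership formula with fuzzifier `m`; both choices are assumptions and not the authors' algorithm. All names and the toy series are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(train):
    """Average the training series of each class into one prototype (assumed design)."""
    groups = {}
    for series, label in train:
        groups.setdefault(label, []).append(series)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def fuzzy_memberships(series, prototypes, m=2.0):
    """Fuzzy c-means style memberships: inverse-distance weights that sum to 1."""
    dists = {label: euclidean(series, proto) for label, proto in prototypes.items()}
    for label, d in dists.items():
        if d == 0:  # the series coincides with a prototype: full membership there
            return {other: 1.0 if other == label else 0.0 for other in dists}
    power = 2.0 / (m - 1.0)
    inverse = {label: d ** -power for label, d in dists.items()}
    total = sum(inverse.values())
    return {label: value / total for label, value in inverse.items()}


# Toy training set, made up for illustration.
train = [([0, 0, 1], "a"), ([0, 0, 3], "a"), ([5, 5, 5], "b")]
prototypes = class_prototypes(train)
memberships = fuzzy_memberships([1, 1, 2], prototypes)
predicted = max(memberships, key=memberships.get)
assert predicted == "a"
```

Under this assumption a test pattern is compared with one prototype per class rather than with every training series, as 1NN requires.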
Abstract:
As academic libraries are increasingly supported by a matrix of database functions, the use of data mining and visualization techniques offers significant potential for future collection development and service initiatives based on quantifiable data. While data collection techniques are still not standardized and results may be skewed because of granularity problems, faulty algorithms, and a host of other factors, useful baseline data is extractable and broad trends can be identified. The purpose of the current study is to provide an initial assessment of data associated with the science monograph collection at the Marston Science Library (MSL), University of Florida. These sciences fall within the major Library of Congress Classification schedules of Q, S, and T, excluding R, TN, TR, and TT. The overall strategy of this project is to look at the potential science audiences within the university community and analyze data related to purchasing and circulation patterns, e-book usage, and interlibrary loan statistics. While a longitudinal study from 2004 to the present would be ideal, this paper presents the results from the academic year July 1, 2008 to June 30, 2009, which was chosen as the pilot period because all data reservoirs identified above were available.
Abstract:
Techniques are developed for estimating activity profiles in fixed bed reactors and catalyst deactivation parameters from operating reactor data. These techniques are applicable, in general, to most industrial catalytic processes. The catalytic reforming of naphthas is taken as a broad example to illustrate the estimation schemes and to signify the physical meaning of the kinetic parameters of the estimation equations. The work is described in two parts. Part I deals with the modeling of kinetic rate expressions and the derivation of the working equations for estimation. Part II concentrates on developing various estimation techniques.
Part I: The reactions used to describe naphtha reforming are dehydrogenation and dehydroisomerization of cycloparaffins; isomerization, dehydrocyclization and hydrocracking of paraffins; and the catalyst deactivation reactions, namely coking on alumina sites and sintering of platinum crystallites. The rate expressions for the above reactions are formulated, and the effects of transport limitations on the overall reaction rates are discussed in the appendices. Moreover, various types of interaction between the metallic and acidic active centers of reforming catalysts are discussed as characterizing the different types of reforming reactions.
Part II: In catalytic reactor operation, the activity distribution along the reactor determines the kinetics of the main reaction and is needed for predicting the effect of changes in the feed state and the operating conditions on the reactor output. In the case of a monofunctional catalyst and of bifunctional catalysts in limiting conditions, the cumulative activity is sufficient for predicting steady reactor output. The estimation of this cumulative activity can be carried out easily from measurements at the reactor exit. For a general bifunctional catalytic system, the detailed activity distribution is needed for describing the reactor operation, and some approximation must be made to obtain practicable estimation schemes. This is accomplished by parametrization techniques using measurements at a few points along the reactor. Such parametrization techniques are illustrated numerically with a simplified model of naphtha reforming.
To determine long term catalyst utilization and regeneration policies, it is necessary to estimate catalyst deactivation parameters from the current operating data. For a first order deactivation model with a monofunctional catalyst or with a bifunctional catalyst in special limiting circumstances, analytical techniques are presented to transform the partial differential equations to ordinary differential equations which admit more feasible estimation schemes. Numerical examples include the catalytic oxidation of butene to butadiene and a simplified model of naphtha reforming. For a general bifunctional system or in the case of a monofunctional catalyst subject to general power law deactivation, the estimation can only be accomplished approximately. The basic feature of an appropriate estimation scheme involves approximating the activity profile by certain polynomials and then estimating the deactivation parameters from the integrated form of the deactivation equation by regression techniques. Different bifunctional systems must be treated by different estimation algorithms, which are illustrated by several cases of naphtha reforming with different feed or catalyst composition.
Abstract:
Nowadays, most operations carried out by companies and organizations are stored in databases that researchers can explore to obtain useful information in support of decision making. Because of the large volume involved, extracting and analyzing the data is not a simple task. The overall process of converting raw data into useful information is called Knowledge Discovery in Databases (KDD). One of the steps of this process is Data Mining, which consists of applying algorithms and statistical techniques to explore information implicitly contained in large databases. Many fields use the KDD process to facilitate the recognition of patterns or models in their information bases. This work presents a practical application of the KDD process using the database of 9th-grade basic education students in the State of Rio de Janeiro, made available on the INEP website, with the aim of discovering interesting patterns between a student's socioeconomic profile and their performance in Mathematics in Prova Brasil 2011. Using the tool Weka (Waikato Environment for Knowledge Analysis), the data mining task known as association was applied, with rules extracted by means of the Apriori algorithm. The study found, for example, that students who have already been held back once tend to score lower on the mathematics test, while students who have never been held back performed better. Other factors, such as the student's future plans, the parents' level of schooling, the student's liking for mathematics, the ethnic group to which the student belongs, and whether the student frequently reads websites, also influence learning positively or negatively.
An analysis was also carried out according to the infrastructure of the school the student attends, from which it could be stated that the discovered patterns occur regardless of whether the students attend schools with good or poor infrastructure. The results can be used to draw up profiles of students who perform better or worse in mathematics and to design public education policies aimed at elementary education (ensino fundamental).
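The association task described above can be sketched in a few lines of Python. This is a minimal, unoptimized Apriori, not the Weka implementation used in the study. The six records and the attribute labels are invented for illustration and are not INEP data, and the thresholds are arbitrary.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


# Invented toy records (hypothetical attribute labels, not real student data).
records = [
    {"retained=yes", "math=low"},
    {"retained=yes", "math=low"},
    {"retained=yes", "math=high"},
    {"retained=no", "math=high"},
    {"retained=no", "math=high"},
    {"retained=no", "math=low"},
]
frequent = apriori(records, min_support=0.3)
for lhs, rhs, confidence in rules(frequent, min_confidence=0.6):
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

Support is the fraction of records containing an itemset, and confidence is the support of the whole rule divided by the support of its antecedent.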
Abstract:
Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges is presented, together with related research results from previous work, as well as our recent developments in data mining on text-based, web-based, image-based, and network-based construction databases.
Abstract:
IEEE
Abstract:
National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences
Abstract:
Population research is a frontier field of interest both in China and abroad, particularly research on the spatial visualization of population and on geo-visualization system design, which provides a sound basis for understanding and analyzing regional differences in population distribution and its spatial patterns. With the development of GIS, geo-visualization theory plays an increasingly important role in many research fields, especially population information visualization, where substantial progress has been made recently. Nevertheless, current research pays little attention to system design for statistical geo-visualization of population information. This paper explores design theories and methodologies for a statistical geo-visualization system for population information. The research focuses on the framework, methodologies, and techniques for system design and construction. Its purpose is to develop a platform for a population atlas by integrating the statistical mapping software previously developed and copyrighted by the research group. As a modern tool, the system will provide a spatial visual environment in which users can analyze the characteristics of population distribution and distinguish the interrelations among population components. First, the paper discusses the necessity of geo-visualization for population information and identifies the key issues in statistical geo-visualization system design, based on an analysis of domestic and international trends. Second, the design of the geo-visualization system for population, including its structure, functionality, modules, and user interface, is studied on the basis of geo-visualization theory and technology. The proposed system design is divided into three parts: the support layer, the technical layer, and the user layer. The support layer is a basic operation module and the main part of the system. 
The technical layer is the core of the system, supported by database and function modules. The database module mainly includes the integrated population database (comprising spatial data, attribute data, and geographical feature information), the cartographic symbol library, the color library, and the statistical analysis model. The function module consists of a thematic map maker component, a statistical graph maker component, a database management component, and a statistical analysis component. The user layer is an integrated platform that provides a visual interface through which users can query, analyze, and manage the statistical data and the electronic map. On this basis, China's E-atlas for population was designed and developed by integrating the national fifth census data with 1:400 million scaled spatial data. The atlas illustrates the current level of population development in China through about 200 thematic maps in 10 map categories (environment, population distribution, sex and age, immigration, nation, family and marriage, birth, education, employment, housing). As a scientific reference tool, China's E-atlas for population has been highly rated since its publication in early 2005. Finally, the paper presents an in-depth analysis of the sex ratio in China to show how the system's functions can be used to analyze a specific population problem and to carry out data mining. The analysis results showed that: 1. The sex ratio has increased in many regions since the fourth census in 1990, except in the cities of the eastern region, and high sex ratios are concentrated in hilly and low mountain areas with high illiteracy and poverty rates; 2. 
The statistical geo-visualization system is a powerful tool for handling population information; it can be used to reflect regional differences and variations of population in China and to indicate the interrelations between population and other environmental factors. Although the author attempts to put forward an integrated design framework for the statistical geo-visualization system, many problems remain to be resolved as geo-visualization studies develop.
Abstract:
Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining" problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.
Abstract:
Clare, A., Williams, H. E. and Lester, N. M. (2004) Scalable Multi-Relational Association Mining. In proceedings of the 4th International Conference on Data Mining ICDM '04.
Abstract:
Ferré, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear.