Biblioteca Digital

774 resultados para outlier detection, data mining, gpgpu, gpu computing, supercomputing

Tensor regression based on linked multiway parameter analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Classical regression methods take vectors as covariates and estimate the corresponding vectors of regression parameters. When addressing regression problems on covariates of more complex form such as multi-dimensional arrays (i.e. tensors), traditional computational models can be severely compromised by ultrahigh dimensionality as well as complex structure. By exploiting the special structure of tensor covariates, the tensor regression model provides a promising solution to reduce the model’s dimensionality to a manageable level, thus leading to efficient estimation. Most of the existing tensor-based methods independently estimate each individual regression problem based on tensor decomposition which allows the simultaneous projections of an input tensor to more than one direction along each mode. As a matter of fact, multi-dimensional data are collected under the same or very similar conditions, so that data share some common latent components but can also have their own independent parameters for each regression task. Therefore, it is beneficial to analyse regression parameters among all the regressions in a linked way. In this paper, we propose a tensor regression model based on Tucker Decomposition, which identifies not only the common components of parameters across all the regression tasks, but also independent factors contributing to each particular regression task simultaneously. Under this paradigm, the number of independent parameters along each mode is constrained by a sparsity-preserving regulariser. Linked multiway parameter analysis and sparsity modeling further reduce the total number of parameters, with lower memory cost than their tensor-based counterparts. The effectiveness of the new method is demonstrated on real data sets.

A scalable expressive ensemble learning using Random Prism: a MapReduce approach

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The induction of classification rules from previously unseen examples is one of the most important data mining tasks in science as well as commercial applications. In order to reduce the influence of noise in the data, ensemble learners are often applied. However, most ensemble learners are based on decision tree classifiers which are affected by noise. The Random Prism classifier has recently been proposed as an alternative to the popular Random Forests classifier, which is based on decision trees. Random Prism is based on the Prism family of algorithms, which is more robust to noise. However, like most ensemble classification approaches, Random Prism also does not scale well on large training data. This paper presents a thorough discussion of Random Prism and a recently proposed parallel version of it called Parallel Random Prism. Parallel Random Prism is based on the MapReduce programming paradigm. The paper provides, for the first time, novel theoretical analysis of the proposed technique and in-depth experimental study that show that Parallel Random Prism scales well on a large number of training examples, a large number of data features and a large number of processors. Expressiveness of decision rules that our technique produces makes it a natural choice for Big Data applications where informed decision making increases the user’s trust in the system.

Low rank representation on Riemannian manifold of symmetric positive definite matrices

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sparse coding aims to find a more compact representation based on a set of dictionary atoms. A well-known technique looking at 2D sparsity is the low rank representation (LRR). However, in many computer vision applications, data often originate from a manifold, which is equipped with some Riemannian geometry. In this case, the existing LRR becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to applications. In this paper, we generalize the LRR over the Euclidean space to the LRR model over a specific Rimannian manifold—the manifold of symmetric positive matrices (SPD). Experiments on several computer vision datasets showcase its noise robustness and superior performance on classification and segmentation compared with state-of-the-art approaches.

A Survey on Graphical Methods for Classification Predictive Performance Evaluation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Predictive performance evaluation is a fundamental issue in design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to its simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Due to this, various graphical performance evaluation methods are increasingly drawing the attention of machine learning, data mining, and pattern recognition communities. The main advantage of these types of methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing these aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to appropriately select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods in the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.

ANFIS BASED MODELS FOR ACCESSING QUALITY OF WIKIPEDIA ARTICLES

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Due to the free nature of Wikipedia and allowing open access to everyone to edit articles the quality of articles may be affected. As all people don’t have equal level of knowledge and also different people have different opinions about a topic so there may be difference between the contributions made by different authors. To overcome this situation it is very important to classify the articles so that the articles of good quality can be separated from the poor quality articles and should be removed from the database. The aim of this study is to classify the articles of Wikipedia into two classes class 0 (poor quality) and class 1(good quality) using the Adaptive Neuro Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS are built using the Fuzzy Logic Toolbox [1] available in Matlab. The first ANFIS is based on the rules obtained from J48 classifier in WEKA while the other one was built by using the expert’s knowledge. The data used for this research work contains 226 article’s records taken from the German version of Wikipedia. The dataset consists of 19 inputs and one output. The data was preprocessed to remove any similar attributes. The input variables are related to the editors, contributors, length of articles and the lifecycle of articles. In the end analysis of different methods implemented in this research is made to analyze the performance of each classification method used.

On Occurrence Of Plagiarism In Published Computer Science Thesis Reports At Swedish Universities

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, it has been observed that software clones and plagiarism are becoming an increased threat for one?s creativity. Clones are the results of copying and using other?s work. According to the Merriam – Webster dictionary, “A clone is one that appears to be a copy of an original form”. It is synonym to duplicate. Clones lead to redundancy of codes, but not all redundant code is a clone.On basis of this background knowledge ,in order to safeguard one?s idea and to avoid intentional code duplication for pretending other?s work as if their owns, software clone detection should be emphasized more. The objective of this paper is to review the methods for clone detection and to apply those methods for finding the extent of plagiarism occurrence among the Swedish Universities in Master level computer science department and to analyze the results.The rest part of the paper, discuss about software plagiarism detection which employs data analysis technique and then statistical analysis of the results.Plagiarism is an act of stealing and passing off the idea?s and words of another person?s as one?s own. Using data analysis technique, samples(Master level computer Science thesis report) were taken from various Swedish universities and processed in Ephorus anti plagiarism software detection. Ephorus gives the percentage of plagiarism for each thesis document, from this results statistical analysis were carried out using Minitab Software.The results gives a very low percentage of Plagiarism extent among the Swedish universities, which concludes that Plagiarism is not a threat to Sweden?s standard of education in computer science.This paper is based on data analysis, intelligence techniques, EPHORUS software plagiarism detection tool and MINITAB statistical software analysis.

Unsupervised learning to cluster the disease stages in parkinson's disease

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Parkinson's disease (PD) is the second most common neurodegenerative disorder (after Alzheimer's disease) and directly affects upto 5 million people worldwide. The stages (Hoehn and Yaar) of disease has been predicted by many methods which will be helpful for the doctors to give the dosage according to it. So these methods were brought up based on the data set which includes about seventy patients at nine clinics in Sweden. The purpose of the work is to analyze unsupervised technique with supervised neural network techniques in order to make sure the collected data sets are reliable to make decisions. The data which is available was preprocessed before calculating the features of it. One of the complex and efficient feature called wavelets has been calculated to present the data set to the network. The dimension of the final feature set has been reduced using principle component analysis. For unsupervised learning k-means gives the closer result around 76% while comparing with supervised techniques. Back propagation and J4 has been used as supervised model to classify the stages of Parkinson's disease where back propagation gives the variance percentage of 76-82%. The results of both these models have been analyzed. This proves that the data which are collected are reliable to predict the disease stages in Parkinson's disease.

Predictive models for chronic renal disease using decision trees, naïve bayes and case-based methods

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining can be used in healthcare industry to “mine” clinical data to discover hidden information for intelligent and affective decision making. Discovery of hidden patterns and relationships often goes intact, yet advanced data mining techniques can be helpful as remedy to this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). Data covers blood, urine test, and external symptoms applied to predict chronic renal disease. Data from the database is initially transformed to Weka (3.6) and Chi-Square method is used for features section. After normalizing data, three classifiers were applied and efficiency of output is evaluated. Mainly, three classifiers are analyzed: Decision Tree, Naïve Bayes, K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. Efficiency of Decision Tree and KNN was almost same but Naïve Bayes proved a comparative edge over others. Further sensitivity and specificity tests are used as statistical measures to examine the performance of a binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives which are correctly identified while Specificity measures the proportion of negatives which are correctly identified. CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

The Potential Of Kohonen Method For Classifying Rivers Focused On Regional Environmental Flow Determination With ELOHA

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article highlights the potential benefits that the Kohonen method has for the classification of rivers with similar characteristics by determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Currently, there are many methodologies for the classification of rivers, however none of them include the characteristics found in Kohonen method such as (i) providing the number of groups that actually underlie the information presented, (ii) used to make variable importance analysis, (iii) which in any case can display two-dimensional classification process, and (iv) that regardless of the parameters used in the model the clustering structure remains. In order to evaluate the potential benefits of the Kohonen method, 174 flow stations distributed along the great river basin “Magdalena-Cauca” (Colombia) were analyzed. 73 variables were obtained for the classification process in each case. Six trials were done using different combinations of variables and the results were validated against reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. In the process of validation it was found that two of the tested models reproduced a level higher than 80% of the reference classification with the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.

Newly Automated System For Integrated Assessment Of The Conditions Of Underwater Gas Pipelines

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An underwater gas pipeline is the portion of the pipeline that crosses a river beneath its bottom. Underwater gas pipelines are subject to increasing dangers as time goes by. An accident at an underwater gas pipeline can lead to technological and environmental disaster on the scale of an entire region. Therefore, timely troubleshooting of all underwater gas pipelines in order to prevent any potential accidents will remain a pressing task for the industry. The most important aspect of resolving this challenge is the quality of the automated system in question. Now the industry doesn't have any automated system that fully meets the needs of the experts working in the field maintaining underwater gas pipelines. Principle Aim of this Research: This work aims to develop a new system of automated monitoring which would simplify the process of evaluating the technical condition and decision making on planning and preventive maintenance and repair work on the underwater gas pipeline. Objectives: Creation a shared model for a new, automated system via IDEF3; Development of a new database system which would store all information about underwater gas pipelines; Development a new application that works with database servers, and provides an explanation of the results obtained from the server; Calculation of the values MTBF for specified pipelines based on quantitative data obtained from tests of this system. Conclusion: The new, automated system PodvodGazExpert has been developed for timely and qualitative determination of the physical conditions of underwater gas pipeline; The basis of the mathematical analysis of this new, automated system uses principal component analysis method; The process of determining the physical condition of an underwater gas pipeline with this new, automated system increases the MTBF by a factor of 8.18 above the existing system used today in the industry.

Business Intelligence: comparação de ferramentas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cada vez mais o tempo acaba sendo o diferencial de uma empresa para outra. As empresas, para serem bem sucedidas, precisam da informação certa, no momento certo e para as pessoas certas. Os dados outrora considerados importantes para a sobrevivência das empresas hoje precisam estar em formato de informações para serem utilizados. Essa é a função das ferramentas de “Business Intelligence”, cuja finalidade é modelar os dados para obter informações, de forma que diferencie as ações das empresas e essas consigam ser mais promissoras que as demais. “Business Intelligence” é um processo de coleta, análise e distribuição de dados para melhorar a decisão de negócios, que leva a informação a um número bem maior de usuários dentro da corporação. Existem vários tipos de ferramentas que se propõe a essa finalidade. Esse trabalho tem como objetivo comparar ferramentas através do estudo das técnicas de modelagem dimensional, fundamentais nos projetos de estruturas informacionais, suporte a “Data Warehouses”, “Data Marts”, “Data Mining” e outros, bem como o mercado, suas vantagens e desvantagens e a arquitetura tecnológica utilizada por estes produtos. Assim sendo, foram selecionados os conjuntos de ferramentas de “Business Intelligence” das empresas Microsoft Corporation e Oracle Corporation, visto as suas magnitudes no mundo da informática.

Como tomadores de decisão experts percebem cenários complexos?

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors of experts and beginners chess players in experiments to reproduce chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed with the use of statistical methods data mining. The results challenge significantly, the current theories of expertise, memory and decision making in this area, because the present theory predicts piece on square encoding, in which players can recognize the strategic situation reproducing it faithfully, but commit several errors that the theory can¿t explain. The current theory can¿t fully explain the encoding used by players to register a board. The errors of intermediary players preserved fragments of the strategic situation, although they have committed a series of errors in the reconstruction of the positions. The encoding of chunks therefore includes more information than that predicted by current theories. Currently, research on perception, trial and decision is heavily concentrated on the idea of pattern recognition". Based on the results of this research, we explore a change of perspective. The idea of "pattern recognition" presupposes that the processing of relevant information is on "patterns" (or data) that exist independently of any interpretation. We propose that the theory suggests the vision of decision-making via the recognition of experience.

Autodeterminação local e desenvolvimento: uma análise da dinâmica social no município de São Roque de Minas

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this thesis, the basic research of Chase and Simon (1973) is questioned, and we seek new results by analyzing the errors of experts and beginners chess players in experiments to reproduce chess positions. Chess players with different levels of expertise participated in the study. The results were analyzed by a Brazilian grandmaster, and quantitative analysis was performed with the use of statistical methods data mining. The results challenge significantly, the current theories of expertise, memory and decision making in this area, because the present theory predicts piece on square encoding, in which players can recognize the strategic situation reproducing it faithfully, but commit several errors that the theory can¿t explain. The current theory can¿t fully explain the encoding used by players to register a board. The errors of intermediary players preserved fragments of the strategic situation, although they have committed a series of errors in the reconstruction of the positions. The encoding of chunks therefore includes more information than that predicted by current theories. Currently, research on perception, trial and decision is heavily concentrated on the idea of 'pattern recognition'. Based on the results of this research, we explore a change of perspective. The idea of 'pattern recognition' presupposes that the processing of relevant information is on 'patterns' (or data) that exist independently of any interpretation. We propose that the theory suggests the vision of decision-making via the recognition of experience.

Aplicação de modelos gráficos probabilísticos computacionais em economia

Relevância:

100.00% 100.00%

Publicador:

Resumo:

O objetivo deste trabalho é testar a aplicação de um modelo gráfico probabilístico, denominado genericamente de Redes Bayesianas, para desenvolver modelos computacionais que possam ser utilizados para auxiliar a compreensão de problemas e/ou na previsão de variáveis de natureza econômica. Com este propósito, escolheu-se um problema amplamente abordado na literatura e comparou-se os resultados teóricos e experimentais já consolidados com os obtidos utilizando a técnica proposta. Para tanto,foi construído um modelo para a classificação da tendência do "risco país" para o Brasil a partir de uma base de dados composta por variáveis macroeconômicas e financeiras. Como medida do risco adotou-se o EMBI+ (Emerging Markets Bond Index Plus), por ser um indicador amplamente utilizado pelo mercado.

GVwise: uma aplicação de learning analytics para a redução da evasão na educação à distância

Relevância:

100.00% 100.00%

Publicador:

«
1
2
...
44
45
46
47
48
49
50
51
52
»