786 results for Data mining models


Relevance:

80.00%

Publisher:

Abstract:

The aim is to present methods and techniques applicable to the study of History in order to encourage its quantitative aspects. Several currently available technologies are reviewed, with emphasis on some of them, such as text-data mining, neural networks, cellular automata, autonomous agents and Java. The work attempts to extrapolate theories and developments from artificial life, complexity theory and chaos theory so that they can be incorporated into the analysis of History. The need to visualise the space of History leads to a description of several techniques and software tools for representing social events, such as GIS, fractals and the AVIDA software. The Internet is considered as a collaboration tool and a source of resources, with emphasis on the need to know how to search the web. A large number of software and bibliographic resources and web addresses are presented for this purpose. The work stresses the need for interdisciplinarity and the hybridisation of knowledge as a process for developing History as a science, and the need to incorporate the teaching of these techniques into curricula.

Relevance:

80.00%

Publisher:

Abstract:

Abstract taken from the publication

Relevance:

80.00%

Publisher:

Abstract:

The work developed in this thesis presents an in-depth study and provides innovative solutions in the field of recommender systems. The methods these systems use to make recommendations, such as Content-Based Filtering (CBF), Collaborative Filtering (CF) and Knowledge-Based Filtering (KBF), require information about users in order to predict their preferences for certain products. This information may be demographic (gender, age, address, etc.), ratings given to products bought in the past, or information about the users' interests. There are two ways of obtaining this information: users provide it explicitly, or the system acquires the implicit information available in users' transactions or search history. For example, the movie recommender system MovieLens (http://movielens.umn.edu/login) asks users to rate at least 15 movies on a scale from * to * * * * * (awful, ..., must see). The system generates recommendations on the basis of these ratings. When users are not registered and the system has no information about them, some systems make recommendations based on browsing history. Amazon.com (http://www.amazon.com) makes recommendations based on the searches a user has made, or recommends the best-selling product. Nevertheless, these systems suffer from a certain lack of information. This problem is generally solved by acquiring additional information: users are asked about their interests, or the information is sought in additional sources. The solution proposed in this thesis is to look for this information in various sources, specifically those that contain implicit information about user preferences.
These sources may be structured, such as databases with purchase information, or unstructured, such as web pages where users leave their opinions about a product they bought or own. We identify three fundamental problems in achieving this goal: 1. Identifying sources with information suitable for recommender systems. 2. Defining criteria that allow the most suitable sources to be compared and selected. 3. Retrieving information from unstructured sources. To this end, the thesis has developed: 1. A methodology for identifying and selecting the most suitable sources. Criteria based on the characteristics of the sources, together with a trust measure, are used to solve the identification and selection problem. 2. A mechanism for retrieving unstructured user information available on the web. Text mining techniques and ontologies are used to extract the information and structure it appropriately for use by recommenders.
The contributions of this doctoral thesis are: 1. Definition of a set of characteristics for classifying sources relevant to recommender systems. 2. Development of a relevance measure for sources, calculated from the defined characteristics. 3. Application of a trust measure to obtain the most reliable sources. Trust is defined from the perspective of improving recommendation: a reliable source is one that allows recommendations to be improved. 4. Development of an algorithm for selecting, from a set of possible sources, the most relevant and reliable ones, using the measures mentioned in the previous points. 5. Definition of an ontology for structuring the information about user preferences available on the Internet. 6. Creation of a mapping process that automatically extracts information about user preferences available on the web and places it in the ontology. These contributions achieve two important objectives: 1. Improving recommendations by using alternative information sources that are relevant and reliable. 2. Obtaining implicit user information available on the Internet.

Relevance:

80.00%

Publisher:

Abstract:

Although the aim of reducing occupational accidents is frequently cited to justify preventive drug and alcohol testing at work, there is little statistically significant evidence of the assumed causal relationship and negative correlation between exposure to testing and subsequent accidents. Data on tests and accidents involving employees of a Portuguese transportation company in recent years are mined in search of relationships between these and other biographical variables. Preliminary results indicate that being subjected to random testing in the workplace is associated with fewer subsequent accidents than occur in the absence of such tests, and that there is an optimum frequency of testing, above which there is no reduction in accidents that would justify further investment in testing.

Relevance:

80.00%

Publisher:

Abstract:

Systems Engineering often involves computer modelling of the behaviour of proposed systems and their components. Where a component is human, fallibility must be modelled by a stochastic agent. The identification of a model of decision-making over quantifiable options is investigated using the game-domain of Chess. Bayesian methods are used to infer the distribution of players’ skill levels from the moves they play rather than from their competitive results. The approach is used on large sets of games by players across a broad FIDE Elo range, and is in principle applicable to any scenario where high-value decisions are being made under pressure.
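
As an illustration of inferring skill from moves rather than results, the following is a minimal sketch and not the model used in the paper. It assumes a hypothetical discrete set of skill levels, a hypothetical "loss" value for each candidate move (how much worse it is than the best move), and a softmax choice rule in which higher skill concentrates probability on low-loss moves. The names `move_probabilities` and `update_posterior` are invented for this example.

```python
import math


def move_probabilities(losses, skill):
    """Assumed choice model: probability of each move under a softmax
    over negative losses, sharpened by the skill parameter."""
    weights = [math.exp(-skill * loss) for loss in losses]
    total = sum(weights)
    return [w / total for w in weights]


def update_posterior(prior, skills, losses, chosen):
    """One Bayesian update of the belief over skill levels after
    observing that the player chose the move at index `chosen`."""
    unnormalised = [
        p * move_probabilities(losses, s)[chosen]
        for p, s in zip(prior, skills)
    ]
    evidence = sum(unnormalised)
    return [u / evidence for u in unnormalised]


# Hypothetical data: three skill levels, uniform prior, one position
# with three candidate moves whose losses are 0, 1 and 2.
skills = [0.0, 1.0, 5.0]
prior = [1.0 / 3.0] * 3
posterior = update_posterior(prior, skills, [0.0, 1.0, 2.0], chosen=0)
```

Applying `update_posterior` repeatedly, once per observed move, accumulates evidence over a whole set of games; choosing the best move shifts belief towards higher skill, and choosing a poor move shifts it the other way.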

Relevance:

80.00%

Publisher:

Abstract:

Facilitating the visual exploration of scientific data has received increasing attention in the past decade or so. Especially in life-science-related application areas, the amount of available data has grown at a breathtaking pace. In this paper we describe an approach that allows for visual inspection of large collections of molecular compounds. In contrast to classical visualizations of such spaces, we incorporate a specific focus of analysis, for example the outcome of a biological experiment such as high-throughput screening results. The presented method uses this experimental data to select molecular fragments of the underlying molecules that have interesting properties, and uses the resulting space to generate a two-dimensional map based on a singular value decomposition algorithm and a self-organizing map. Experiments on real datasets show that the resulting visual landscape groups molecules of similar chemical properties in densely connected regions.

Relevance:

80.00%

Publisher:

Abstract:

Clustering is defined as the grouping of similar items in a set, and is an important process within the field of data mining. As the amount of data for various applications continues to increase, in terms of its size and dimensionality, it is necessary to have efficient clustering methods. A popular clustering algorithm is K-Means, which adopts a greedy approach to produce a set of K clusters with associated centres of mass, and uses a squared error distortion measure to determine convergence. Methods for improving the efficiency of K-Means have largely been explored in two main directions. The amount of computation can be significantly reduced by adopting a more efficient data structure, notably a multi-dimensional binary search tree (KD-Tree) to store either centroids or data points. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient K-Means techniques in parallel computational environments. In this work, we provide a parallel formulation for the KD-Tree based K-Means algorithm and address its load balancing issues.
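
For reference, the basic sequential K-Means loop described above (greedy assignment, centre-of-mass update, squared error distortion as the convergence test) can be sketched as follows. This is a plain textbook version, not the KD-Tree based or parallel formulation of the paper; the function names and the optional `initial_centroids` parameter are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, tol=1e-9, seed=0, initial_centroids=None):
    """Basic K-Means: returns (centroids, labels, distortion)."""
    rng = random.Random(seed)
    if initial_centroids is not None:
        centroids = list(initial_centroids)
    else:
        centroids = rng.sample(points, k)
    previous = float("inf")
    labels, distortion = [], previous
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Squared error distortion; stop once it no longer decreases.
        distortion = sum(
            squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
        )
        if previous - distortion <= tol:
            break
        previous = distortion
    return centroids, labels, distortion
```

The assignment step costs one distance computation per point per centroid in every iteration, which is the cost that KD-Tree based variants aim to cut down.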

Relevance:

80.00%

Publisher:

Abstract:

The existence of endgame databases challenges us to extract higher-grade information and knowledge from their basic data content. Chess players, for example, would like simple and usable endgame theories, if such a holy grail exists; endgame experts would like to provide such insights and be inspired by computers to do so. Here, we investigate the use of artificial neural networks (NNs) to mine these databases and report on a first use of NNs on KPK (king and pawn versus king). The results encourage us to suggest further work on chess applications of neural networks and other data-mining techniques.

Relevance:

80.00%

Publisher:

Abstract:

The van der Heijden Studies Database has been reviewed to identify 'Draw Studies' with sub-7-man positions in the main line which are not draws. The data-mining method is described. Some 1,500 studies were faulted, 700 for the first time: 14 of the more interesting faults are highlighted and discussed.

Relevance:

80.00%

Publisher:

Abstract:

This is a review of progress in the Chess Endgame field. It includes news of the promulgation of Endgame Tables, their use, non-use and potential runtime creation. It includes news of data-mining achievements related to 7-man chess and to the field of Chess Studies. It includes news of an algorithm to create Endgame Tables for variants of the normal game of chess.

Relevance:

80.00%

Publisher:

Abstract:

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have largely been explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
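
To make the role of the KD-Tree concrete, here is a minimal sketch of one way such a tree can save distance computations: the centroids are stored in a KD-Tree and the nearest centroid to a point is found with branch pruning. It is a generic nearest-neighbour search, not the specific filtering algorithm or the parallel formulation of the paper, and the names `build_kd_tree` and `nearest_centroid` are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kd_tree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kd_tree(ordered[:mid], depth + 1),
        "right": build_kd_tree(ordered[mid + 1:], depth + 1),
    }


def nearest_centroid(node, query, best=None):
    """Return (squared distance, centroid) for the centroid nearest to query."""
    if node is None:
        return best
    d = squared_distance(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest_centroid(near, query, best)
    # Geometric constraint: the far side can only hold a closer centroid
    # if the splitting plane is nearer than the best distance so far.
    if diff * diff < best[0]:
        best = nearest_centroid(far, query, best)
    return best
```

Because whole subtrees are skipped when the pruning test fails, the work per query depends on where the point falls, which hints at why such methods distribute load irregularly across processing nodes.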

Relevance:

80.00%

Publisher:

Abstract:

This paper analyses the cut flower market as an example of an invasion pathway along which species of non-indigenous plant pests can travel to reach new areas. The paper examines the probability of pest detection by assessing information on pest detection and detection effort associated with the import of cut flowers. We test the link between the probability of plant pest arrivals, as a precursor to potential invasion, and the volume of traded flowers using count data regression models. The analysis is applied to the UK import of specific genera of cut flowers from Kenya between 1996 and 2004. There is a link between pest detection and the Genus of cut flower imported. Hence, pest detection efforts should focus on identifying and targeting those imported plants with a high risk of carrying pest species. For most of the plants studied, efforts allocated to inspection have a significant influence on the probability of pest detection. However, it is shown that, by better targeting inspection efforts, plant inspection effort could be reduced without increasing the risk of pest entry. Similarly, for most of the plants analysed, an increase in volume traded will not necessarily lead to an increase in the number of pests entering the UK. For some species, such as conclude that analysis at the rank of plant Genus is important both to understand the effectiveness of plant pest detection efforts and consequently to manage the risk of introduction of non-indigenous species.
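
A count data regression of the kind mentioned above can be illustrated with a Poisson model, in which the expected count is exp(b0 + b1 * x). The sketch below fits the two coefficients by Newton's method. It is only an illustration with a single hypothetical predictor (for example, trade volume); the paper does not state which count model or covariates were used, and the function name `fit_poisson` is invented for this example.

```python
import math


def fit_poisson(x, y, iterations=50, tol=1e-12):
    """Fit log(mean count) = b0 + b1 * x by Newton's method.

    Assumes the counts have a positive mean and x is not constant.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (Fisher information), a 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

In this parameterisation a fitted `b1` close to zero would correspond to the case described above, where a larger traded volume does not necessarily mean more pest arrivals.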

Relevance:

80.00%

Publisher:

Abstract:

Current feed evaluation systems for dairy cattle aim to match nutrient requirements with nutrient intake at pre-defined production levels. These systems were not developed to address, and are not suitable to predict, the responses to dietary changes in terms of production level and product composition, excretion of nutrients to the environment, and nutrition related disorders. The change from a requirement to a response system to meet the needs of various stakeholders requires prediction of the profile of absorbed nutrients and its subsequent utilisation for various purposes. This contribution examines the challenges to predicting the profile of nutrients available for absorption in dairy cattle and provides guidelines for further improved prediction with regard to animal production responses and environmental pollution. The profile of nutrients available for absorption comprises volatile fatty acids, long-chain fatty acids, amino acids and glucose. Thus the importance of processes in the reticulo-rumen is obvious. Much research into rumen fermentation is aimed at determination of substrate degradation rates. Quantitative knowledge on rates of passage of nutrients out of the rumen is rather limited compared with that on degradation rates, and thus should be an important theme in future research. Current systems largely ignore microbial metabolic variation, and extant mechanistic models of rumen fermentation give only limited attention to explicit representation of microbial metabolic activity. Recent molecular techniques indicate that knowledge on the presence and activity of various microbial species is far from complete. Such techniques may give a wealth of information, but to include such findings in systems predicting the nutrient profile requires close collaboration between molecular scientists and mathematical modellers on interpreting and evaluating quantitative data. Protozoal metabolism is of particular interest here given the paucity of quantitative data. 
Empirical models lack the biological basis necessary to evaluate mitigation strategies to reduce excretion of waste, including nitrogen, phosphorus and methane. Such models may have little predictive value when comparing various feeding strategies. Examples include the Intergovernmental Panel on Climate Change (IPCC) Tier II models to quantify methane emissions and current protein evaluation systems to evaluate low protein diets to reduce nitrogen losses to the environment. Nutrient based mechanistic models can address such issues. Since environmental issues generally attract more funding from governmental offices, further development of nutrient based models may well take place within an environmental framework.

Relevance:

80.00%

Publisher:

Abstract:

In many data mining applications, automated text and text-and-image retrieval of information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. Results of experiments on textual retrieval of Web documents are presented, together with a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
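
For readers unfamiliar with singular triplets, the sketch below computes the leading triplet (sigma, u, v) of a small term-by-document matrix with deterministic power iteration on A^T A. This is a standard baseline technique swapped in for illustration; it is not the Monte Carlo method proposed in the paper, and the function names are assumptions made for this example.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def normalise(vector):
    """Scale a non-zero vector to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def leading_singular_triplet(matrix, iterations=1000, tol=1e-12):
    """Power iteration on A^T A; assumes a non-zero matrix and a start
    vector (all ones) that is not orthogonal to the leading direction."""
    transposed = [list(col) for col in zip(*matrix)]
    v = normalise([1.0] * len(matrix[0]))
    sigma = 0.0
    for _ in range(iterations):
        # One step: v <- A^T A v, rescaled to unit length.
        v = normalise(mat_vec(transposed, mat_vec(matrix, v)))
        # With v of unit length, ||A v|| estimates the singular value.
        new_sigma = math.sqrt(sum(x * x for x in mat_vec(matrix, v)))
        converged = abs(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    u = [x / sigma for x in mat_vec(matrix, v)]
    return sigma, u, v
```

The largest eigenvalue of A^T A is the square of the largest singular value of A, which is the link between the singular-triplet and eigenvalue formulations mentioned in the abstract.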