852 results for 080109 Pattern Recognition and Data Mining


Relevance:

100.00%

Abstract:

Systems engineering often involves computer modelling of the behaviour of proposed systems and their components. Where a component is human, fallibility must be modelled by a stochastic agent. The identification of a model of decision-making over quantifiable options is investigated using the game domain of chess. Bayesian methods are used to infer the distribution of players' skill levels from the moves they play rather than from their competitive results. The approach is applied to large sets of games by players across a broad FIDE Elo range, and is in principle applicable to any scenario where high-value decisions are made under pressure.
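As a rough illustration of the inference described above, the following is a minimal sketch, assuming (hypothetically) that each candidate move has an engine evaluation and that a player of a given skill chooses among moves via a softmax whose sharpness grows with skill; the skill grid, the `beta` mapping and the toy data are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Hypothetical skill grid and a toy move-likelihood model: the probability
# that a player of skill s picks a move is a softmax over engine evaluations,
# with higher skill concentrating mass on stronger moves.
skills = np.linspace(1000, 2800, 19)             # candidate Elo levels
prior = np.full(len(skills), 1.0 / len(skills))  # uniform prior over skill

def move_likelihood(evals, chosen, skill):
    """P(chosen move | skill): softmax over move evaluations (in pawns),
    sharpened as skill grows. Purely illustrative, not the paper's model."""
    beta = skill / 400.0                         # assumed skill-to-sharpness map
    w = np.exp(beta * np.asarray(evals))
    return w[chosen] / w.sum()

def posterior(observed):
    """Bayesian update: multiply the prior by each observed move's likelihood."""
    log_post = np.log(prior)
    for evals, chosen in observed:
        log_post += np.log([move_likelihood(evals, chosen, s) for s in skills])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Two toy positions: engine evals of the legal moves, and the index played.
games = [([0.8, 0.3, -0.5], 0), ([0.1, 0.6, -1.2, 0.0], 1)]
print(posterior(games))                          # distribution over skill levels
```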

Relevance:

100.00%

Abstract:

Many weeds occur in patches but farmers frequently spray whole fields to control the weeds in these patches. Given a geo-referenced weed map, technology exists to confine spraying to these patches. Adoption of patch spraying by arable farmers has, however, been negligible, partly due to the difficulty of constructing weed maps. Building on previous DEFRA and HGCA projects, this proposal aims to develop and evaluate a machine vision system to automate the weed mapping process. The project thereby addresses the principal technical stumbling block to widespread adoption of site-specific weed management (SSWM). The accuracy of weed identification by machine vision based on a single field survey may be inadequate to create herbicide application maps. We therefore propose to test the hypothesis that sufficiently accurate weed maps can be constructed by integrating information from geo-referenced images captured automatically at different times of the year during normal field activities. Accuracy of identification will also be increased by utilising a priori knowledge of the weeds present in fields. To prove this concept, images will be captured from arable fields on two farms and processed offline to identify and map the weeds, focussing especially on black-grass, wild oats, barren brome, couch grass and cleavers. As advocated by Lutman et al. (2002), the approach uncouples the weed mapping and treatment processes and builds on the observation that patches of these weeds are quite stable in arable fields.

There are three main aspects to the project.

1) Machine vision hardware. The hardware components of the system are one or more cameras connected to a single-board computer (Concurrent Solutions LLC) and interfaced with an accurate Global Positioning System (GPS) supplied by Patchwork Technology. The camera(s) will take separate measurements for each of the three primary colours of visible light (red, green and blue) in each pixel. The basic proof of concept can be achieved in principle using a single-camera system, but in practice systems with more than one camera may need to be installed so that larger fractions of each field can be photographed. Hardware will be reviewed regularly during the project in response to feedback from the other work packages and updated as required.

2) Image capture and weed identification software. The machine vision system will be attached to the toolbars of farm machinery so that images can be collected during different field operations. Images will be captured at different ground speeds, in different directions and at different crop growth stages, as well as against different crop backgrounds. Having captured geo-referenced images in the field, image analysis software will be developed by Murray State and Reading Universities, with advice from The Arable Group, to identify weed species. A wide range of pattern recognition techniques, in particular Bayesian networks, will be used to advance the state of the art in machine vision-based weed identification and mapping. Weed identification algorithms used by others are inadequate for this project, as we intend to collect and correlate images captured at different growth stages. Plants grown for this purpose by Herbiseed will be used in the first instance. In addition, our image capture and analysis system will include plant characteristics such as leaf shape, size, vein structure, colour and textural pattern, some of which are not detectable by other machine vision systems or are omitted by their algorithms. Using such a list of features observable by our machine vision system, we will determine those that can be used to distinguish the weed species of interest.

3) Weed mapping. Geo-referenced maps of weeds in arable fields (Reading University and Syngenta) will be produced with advice from The Arable Group and Patchwork Technology. Natural infestations will be mapped in the fields, but we will also introduce specimen plants in pots to facilitate more rigorous system evaluation and testing. Manual weed maps of the same fields will be generated by Reading University, Syngenta and Peter Lutman so that the accuracy of automated mapping can be assessed.

The principal hypothesis and concept to be tested is that, by combining maps from several surveys, a weed map with acceptable accuracy for end-users can be produced. If the concept is proved and can be commercialised, systems could be retrofitted at low cost onto existing farm machinery. The outputs of the weed mapping software would then link with the precision farming options already built into many commercial sprayers, allowing their use for targeted, site-specific herbicide applications. Immediate economic benefits would therefore arise directly from reducing herbicide costs. SSWM will also reduce the overall pesticide load on the crop and so may reduce pesticide residues in food and drinking water, and reduce the adverse impacts of pesticides on non-target species and beneficials. Farmers may even choose to leave unsprayed some non-injurious, environmentally beneficial, low-density weed infestations. These benefits fit very well with the anticipated legislation emerging from the new EU Thematic Strategy for Pesticides, which will encourage more targeted use of pesticides and greater uptake of Integrated Crop (Pest) Management approaches, and also with the requirements of the Water Framework Directive to reduce levels of pesticides in water bodies. The greater precision of weed management offered by SSWM is therefore a key element in preparing arable farming systems for a future in which policy makers and consumers want to minimise pesticide use and the carbon footprint of farming while maintaining food production and security. The mapping technology could also be used on organic farms to identify areas of fields needing mechanical weed control, thereby reducing both carbon footprints and damage to crops by, for example, spring tines.

Objectives: i. To develop a prototype machine vision system for automated image capture during agricultural field operations; ii. To prove the concept that images captured by the machine vision system over a series of field operations can be processed to identify and geo-reference specific weeds in the field; iii. To generate weed maps from the geo-referenced weed plants/patches identified in objective (ii).
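The principal hypothesis above, that combining maps from several surveys yields an acceptably accurate weed map, could look roughly like the following sketch, which fuses several geo-referenced detection grids with an independent-survey Bayes update; the grid size, error rates, prior and the 0.5 spraying threshold are all assumptions for illustration, not the project's design.

```python
import numpy as np

# Hedged sketch of the map-fusion idea: several geo-referenced surveys of the
# same field, each an occupancy grid of per-cell weed detections with assumed
# per-survey error rates, are fused into a single posterior weed map.

def fuse_surveys(detections, p_hit=0.8, p_false=0.1, prior=0.2):
    """detections: list of boolean grids (True = weed detected in cell).
    Returns the per-cell posterior probability of weed presence."""
    odds = np.full(detections[0].shape, prior / (1 - prior))
    for grid in detections:
        lr = np.where(grid, p_hit / p_false, (1 - p_hit) / (1 - p_false))
        odds *= lr                      # independent-survey Bayes update
    return odds / (1 + odds)

surveys = [np.random.rand(50, 50) > 0.7 for _ in range(3)]  # stand-in data
weed_map = fuse_surveys(surveys)
spray_map = weed_map > 0.5              # candidate patch-spraying mask
```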

Relevance:

100.00%

Abstract:

The use of n-tuple or weightless neural networks as pattern recognition devices is well known (Aleksander and Stonham, 1979). They have some significant advantages over the more common, biologically plausible networks such as multi-layer perceptrons: n-tuple networks have been used for a variety of tasks, the most popular being real-time pattern recognition, and they can be implemented easily in hardware as they use standard random access memories. In operation, a series of images of an object are shown to the network, each being processed suitably and effectively stored in a memory called a discriminator. Then, when another image is shown to the system, it is processed in a similar manner and the system reports whether it recognises the image; that is, whether the image is sufficiently similar to one already taught. If the system is to be able to recognise and discriminate between m objects, then it must contain m discriminators. This can require a great deal of memory. This paper describes various ways in which the memory requirements can be reduced, including a novel method for multiple-discriminator n-tuple networks used for pattern recognition. Using this method, the memory normally required to handle m objects can be used to recognise and discriminate between 2^m − 2 objects.
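To make the discriminator idea concrete, here is a minimal sketch of a single n-tuple discriminator over binary images flattened to bit vectors; the tuple size, input size and random input mapping are illustrative choices rather than the paper's parameters.

```python
import random

# Minimal n-tuple (weightless) discriminator: the input bits are partitioned
# into random n-bit tuples, each addressing its own RAM (here a set of seen
# addresses). Training writes addresses; recognition counts RAM hits.

class Discriminator:
    def __init__(self, input_bits, n=4, seed=0):
        rng = random.Random(seed)
        addrs = list(range(input_bits))
        rng.shuffle(addrs)
        self.tuples = [addrs[i:i + n] for i in range(0, input_bits, n)]
        self.rams = [set() for _ in self.tuples]   # one RAM per tuple

    def _addresses(self, bits):
        for t, ram in zip(self.tuples, self.rams):
            yield tuple(bits[i] for i in t), ram

    def train(self, bits):
        for addr, ram in self._addresses(bits):
            ram.add(addr)                          # write a 1 at this address

    def score(self, bits):
        hits = sum(addr in ram for addr, ram in self._addresses(bits))
        return hits / len(self.rams)               # fraction of RAMs that fire

d = Discriminator(input_bits=16)
d.train([1, 0] * 8)
print(d.score([1, 0] * 8), d.score([0, 1] * 8))    # 1.0 vs 0.0
```

A recogniser for m objects would hold one such discriminator per class and report the class whose discriminator scores highest.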

Relevance:

100.00%

Abstract:

The Stochastic Diffusion Search algorithm, an integral part of Stochastic Search Networks, is investigated. Stochastic Diffusion Search is an alternative solution for invariant pattern recognition and focus of attention. It has been shown that the algorithm can be modelled as an ergodic, finite-state Markov chain under some non-restrictive assumptions. Sub-linear time complexity for some parameter settings has been formulated and proved. Some properties of the algorithm are then characterised, and numerical examples illustrating some of its features are presented.
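For readers unfamiliar with the algorithm, the following is a minimal sketch of Stochastic Diffusion Search applied to locating a pattern in a longer string, with the usual test and diffusion phases; the agent count and iteration budget are arbitrary illustrative choices.

```python
import random

# Sketch of Stochastic Diffusion Search: agents hold candidate positions
# (hypotheses), partially test them, and diffuse promising hypotheses.

def sds(pattern, text, agents=50, iters=100, rng=random.Random(0)):
    span = len(text) - len(pattern) + 1
    positions = [rng.randrange(span) for _ in range(agents)]
    active = [False] * agents
    for _ in range(iters):
        # Test phase: each agent checks one random character of its hypothesis.
        for a in range(agents):
            i = rng.randrange(len(pattern))
            active[a] = text[positions[a] + i] == pattern[i]
        # Diffusion phase: an inactive agent copies a randomly chosen agent's
        # hypothesis if that agent is active, otherwise re-samples uniformly.
        for a in range(agents):
            if not active[a]:
                b = rng.randrange(agents)
                positions[a] = positions[b] if active[b] else rng.randrange(span)
    # The largest cluster of agents marks the best-supported position.
    return max(set(positions), key=positions.count)

print(sds("pattern", "noisy text with a pattern hidden inside"))
```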

Relevance:

100.00%

Abstract:

Background: Since their inception, Twitter and related microblogging systems have provided a rich source of information for researchers and have attracted interest in their affordances and use. Since 2009, PubMed has included 123 journal articles on medicine and Twitter, but no overview exists of how the field uses Twitter in research.

Objective: This paper aims to identify published work relating to Twitter indexed by PubMed, and then to classify it. This classification provides a framework in which future researchers can position their work, and an understanding of the current reach of research using Twitter in medical disciplines. Limiting the study to papers indexed by PubMed ensures the work provides a reproducible benchmark.

Methods: Papers indexed by PubMed on Twitter and related topics were identified and reviewed. The papers were then qualitatively classified based on each paper's title and abstract to determine their focus. The work that was Twitter-focussed was studied in detail to determine what data, if any, it was based on, and from this a categorisation of the data set sizes used in the studies was developed. Using open-coded content analysis, additional important categories were also identified, relating to the primary methodology, domain and aspect.

Results: As of 2012, PubMed comprises more than 21 million citations from the biomedical literature, and from these a corpus of 134 potentially Twitter-related papers was identified, eleven of which were subsequently found not to be relevant. There were no papers prior to 2009 relating to microblogging, a term first used in 2006. Of the remaining 123 papers that mentioned Twitter, thirty were focussed on Twitter (the others referring to it tangentially). The early Twitter-focussed papers introduced the topic and highlighted its potential without carrying out any form of data analysis. The majority of published papers used analytic techniques to sort through thousands, if not millions, of individual tweets, often depending on automated tools to do so. Our analysis demonstrates that researchers are starting to use knowledge discovery methods and data mining techniques to understand vast quantities of tweets: the study of Twitter is becoming quantitative research.

Conclusions: This work is, to the best of our knowledge, the first overview study of medical research based on Twitter and related microblogging. We have used five dimensions to categorise published medical research on Twitter. This classification provides a framework within which researchers studying the development and use of Twitter within medical research, and those undertaking comparative studies of research relating to Twitter in the area of medicine and beyond, can position and ground their work.

Relevance:

100.00%

Abstract:

Texture is one of the most important visual attributes used in image analysis. It is used in many content-based image retrieval systems, where it allows the identification of a larger number of images from distinct origins. This paper presents a novel approach to image analysis and retrieval based on complexity analysis. The approach consists of a texture segmentation step, performed by complexity analysis through the box-counting fractal dimension, followed by estimation of the complexity of each computed region by the multiscale fractal dimension. Experiments have been performed on an MRI database in both pattern recognition and image retrieval contexts. The results show the accuracy of the method and also indicate how the performance changes as the texture segmentation process is altered.
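As a hedged sketch of the box-counting step, the following estimates the box-counting fractal dimension of a binary texture by counting occupied boxes at several scales and fitting the log-log slope; the scale list and the random stand-in texture are illustrative choices, not the paper's setup.

```python
import numpy as np

# Box-counting fractal dimension of a binary image: count boxes containing
# texture at several box sizes and fit the slope of log(count) vs log(1/size).

def box_counting_dimension(img, sizes=(2, 4, 8, 16, 32)):
    counts = []
    for s in sizes:
        h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
        blocks = img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())   # occupied boxes at scale s
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

texture = np.random.rand(128, 128) > 0.5               # stand-in binary texture
print(box_counting_dimension(texture))                 # ~2.0 for a dense random field
```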

Relevance:

100.00%

Abstract:

Texture is an important visual attribute used to describe the pixel organization in an image. Although it is easily identified by humans, its analysis demands a high level of sophistication and computational complexity. This paper presents a novel approach to texture analysis, based on analyzing the complexity of the surface generated from a texture, in order to describe and characterize it. The proposed method produces a texture signature which is able to efficiently characterize different texture classes. The paper also illustrates the performance of the novel method in an experiment using texture images of leaves. Leaf identification is a difficult and complex task due to the nature of plants, which present huge pattern variation. The high classification rate obtained shows the potential of the method, which improves on traditional texture techniques such as Gabor filters and Fourier analysis.
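For context on the traditional techniques the abstract mentions, here is a minimal sketch of a Gabor-filter texture signature of the kind the proposed method is compared against, built from the mean and variance of filter responses; the frequencies, orientation count and stand-in image are assumptions.

```python
import numpy as np
from skimage.filters import gabor

# Baseline Gabor texture signature: filter the image at several frequencies
# and orientations, keeping (mean, variance) of each response magnitude.

def gabor_signature(image, frequencies=(0.1, 0.2, 0.4), n_orient=4):
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(image, frequency=f, theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)
            feats += [mag.mean(), mag.var()]       # one (mean, var) pair per filter
    return np.array(feats)

leaf = np.random.rand(64, 64)                      # stand-in for a leaf texture
print(gabor_signature(leaf).shape)                 # (24,) feature vector
```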

Relevance:

100.00%

Abstract:

Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Because Wikipedia is free and allows anyone to edit its articles, article quality may suffer: people do not have equal levels of knowledge, and different people hold different opinions about a topic, so contributions by different authors can differ considerably. It is therefore important to classify articles so that good-quality articles can be separated from poor-quality ones, and the latter flagged for removal from the database. The aim of this study is to classify Wikipedia articles into two classes, class 0 (poor quality) and class 1 (good quality), using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS models are built using the Fuzzy Logic Toolbox [1] available in Matlab: the first is based on rules obtained from the J48 classifier in WEKA, while the second is built using expert knowledge. The data used for this research contains 226 article records taken from the German version of Wikipedia. The dataset consists of 19 inputs and one output, and was preprocessed to remove redundant attributes. The input variables relate to the editors, contributors, article length and article lifecycle. Finally, the different methods implemented in this research are compared to analyse the performance of each classification method.
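The rule-extraction step starts from a WEKA J48 tree; as a hedged stand-in outside MATLAB, the following sketch trains the closest scikit-learn analogue (a depth-limited decision tree, J48 being a C4.5 implementation) on synthetic article features. The feature names, data and split are invented for illustration and are not the study's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the J48 step: a decision tree over article
# features, from which crisp rules could later be read off for an ANFIS.

rng = np.random.default_rng(0)
n = 226                                           # matches the study's record count
X = np.column_stack([
    rng.integers(1, 200, n),                      # number of editors (hypothetical)
    rng.integers(100, 20000, n),                  # article length (hypothetical)
    rng.integers(0, 3000, n),                     # days since creation (hypothetical)
])
y = (X[:, 0] > 50).astype(int)                    # toy quality label: 0 poor, 1 good

clf = DecisionTreeClassifier(max_depth=3).fit(X[:180], y[:180])
print(clf.score(X[180:], y[180:]))                # held-out accuracy
```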

Relevance:

100.00%

Abstract:

The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance at a time when more and more data is produced and knowledge is one of the most precious assets. Having explored the existing underlying theory, the results of ongoing research in academia, and industry practices in the domain of KD and DM, we found that this is a domain that still lacks systematization. We also found that such systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably because these are more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in requirements engineering for KD projects by systematizing the requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind, and based on the state of the art, we propose SysPRE - Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, with the main focus on the Requirements Engineering activities. This description is then used as a base for applying the Design and Engineering Methodology for Organizations (DEMO) to specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base for making enterprises aware of their own KD process and of the requirements engineering process in their KD projects, and also for improving such processes in practice, namely in terms of success rate.

Relevance:

100.00%

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00%

Abstract:

This paper proposes a methodology for edge detection in digital images using the Canny detector, but associated with a priori edge-structure focusing by nonlinear anisotropic diffusion via a partial differential equation (PDE). This strategy aims at minimizing the effect of the well-known duality of the Canny detector, under which it is not possible to simultaneously enhance insensitivity to image noise and the localization precision of detected edges. Anisotropic diffusion via the PDE is used to focus the edge structure a priori, owing to its notable characteristic of selectively smoothing the image: homogeneous regions are strongly smoothed while the physical edges, i.e., those actually related to objects present in the image, are largely preserved. The solution to the mentioned duality consists in applying the Canny detector at a fine Gaussian scale, but only along the edge regions focused by the anisotropic diffusion process. The results show that the method is appropriate for applications involving automatic feature extraction, since it allows high-precision localization of thinned edges, which are usually related to objects present in the image. © Nauka/Interperiodica 2006.
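To illustrate the pipeline's general idea rather than the paper's exact formulation, the following sketch applies classic Perona-Malik anisotropic diffusion before a fine-scale Canny pass; the iteration count, conductance parameter, thresholds and input filename are illustrative assumptions.

```python
import numpy as np
import cv2

# Perona-Malik anisotropic diffusion: homogeneous regions flatten while
# strong edges survive, after which Canny runs on the smoothed result.

def perona_malik(img, iters=20, kappa=30.0, gamma=0.2):
    u = img.astype(np.float64)
    for _ in range(iters):
        # Finite-difference gradients towards the four neighbours.
        n = np.roll(u, -1, 0) - u; s = np.roll(u, 1, 0) - u
        e = np.roll(u, -1, 1) - u; w = np.roll(u, 1, 1) - u
        # Edge-stopping conductance: small across strong edges.
        cn, cs = np.exp(-(n / kappa) ** 2), np.exp(-(s / kappa) ** 2)
        ce, cw = np.exp(-(e / kappa) ** 2), np.exp(-(w / kappa) ** 2)
        u += gamma * (cn * n + cs * s + ce * e + cw * w)
    return u

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
smoothed = perona_malik(img).clip(0, 255).astype(np.uint8)
edges = cv2.Canny(smoothed, 50, 150)                  # fine-scale Canny on the result
```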

Relevance:

100.00%

Abstract:

Pós-graduação em Desenvolvimento Humano e Tecnologias - IBRC

Relevance:

100.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

100.00%

Abstract:

The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used.

In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by "suggesting" to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted.

We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of the triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
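The abstract does not spell out the measure, but in its commonly cited form the complexity-invariant distance scales the Euclidean distance by a correction factor built from each series' complexity estimate, taken as the length of its first-difference curve; the sketch below follows that form, with the toy signals as illustrative data.

```python
import numpy as np

# Complexity-invariant distance (CID) in its commonly cited form:
# Euclidean distance times max(CE(x), CE(y)) / min(CE(x), CE(y)),
# where CE is the length of the series' first-difference curve.

def complexity(x):
    return np.sqrt(np.sum(np.diff(x) ** 2))

def cid(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    ed = np.linalg.norm(x - y)
    cx, cy = complexity(x), complexity(y)
    # Correction factor >= 1: grows when the two series differ in complexity.
    return ed * max(cx, cy) / max(min(cx, cy), 1e-12)

t = np.linspace(0, 2 * np.pi, 100)
smooth = np.sin(t)
wiggly = np.sin(t) + 0.4 * np.sin(20 * t)
print(cid(smooth, smooth + 0.1), cid(smooth, wiggly))  # CID penalises the latter
```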