913 resultados para 004 - Informatik (Data processing Computer science)
Resumo:
Geospatial clustering must be designed in such a way that it takes into account the special features of geoinformation and the peculiar nature of geographical environments in order to successfully derive geospatially interesting global concentrations and localized excesses. This paper examines families of geospaital clustering recently proposed in the data mining community and identifies several features and issues especially important to geospatial clustering in data-rich environments.
Resumo:
Unauthorized accesses to digital contents are serious threats to international security and informatics. We propose an offline oblivious data distribution framework that preserves the sender's security and the receiver's privacy using tamper-proof smart cards. This framework provides persistent content protections from digital piracy and promises private content consumption.
Resumo:
In this work, we take advantage of association rule mining to support two types of medical systems: the Content-based Image Retrieval (CBIR) systems and the Computer-Aided Diagnosis (CAD) systems. For content-based retrieval, association rules are employed to reduce the dimensionality of the feature vectors that represent the images and to improve the precision of the similarity queries. We refer to the association rule-based method to improve CBIR systems proposed here as Feature selection through Association Rules (FAR). To improve CAD systems, we propose the Image Diagnosis Enhancement through Association rules (IDEA) method. Association rules are employed to suggest a second opinion to the radiologist or a preliminary diagnosis of a new image. A second opinion automatically obtained can either accelerate the process of diagnosing or to strengthen a hypothesis, increasing the probability of a prescribed treatment be successful. Two new algorithms are proposed to support the IDEA method: to pre-process low-level features and to propose a preliminary diagnosis based on association rules. We performed several experiments to validate the proposed methods. The results indicate that association rules can be successfully applied to improve CBIR and CAD systems, empowering the arsenal of techniques to support medical image analysis in medical systems. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
In this paper, we propose a method based on association rule-mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates suggestions of diagnoses employing mining of association rules. The suggestions of diagnosis are used to accelerate the image analysis performed by specialists as well as to provide them an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable to perform feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.
Resumo:
With the proliferation of relational database programs for PC's and other platforms, many business end-users are creating, maintaining, and querying their own databases. More importantly, business end-users use the output of these queries as the basis for operational, tactical, and strategic decisions. Inaccurate data reduce the expected quality of these decisions. Implementing various input validation controls, including higher levels of normalisation, can reduce the number of data anomalies entering the databases. Even in well-maintained databases, however, data anomalies will still accumulate. To improve the quality of data, databases can be queried periodically to locate and correct anomalies. This paper reports the results of two experiments that investigated the effects of different data structures on business end-users' abilities to detect data anomalies in a relational database. The results demonstrate that both unnormalised and higher levels of normalisation lower the effectiveness and efficiency of queries relative to the first normal form. First normal form databases appear to provide the most effective and efficient data structure for business end-users formulating queries to detect data anomalies.
Resumo:
Map algebra is a data model and simple functional notation to study the distribution and patterns of spatial phenomena. It uses a uniform representation of space as discrete grids, which are organized into layers. This paper discusses extensions to map algebra to handle neighborhood operations with a new data type called a template. Templates provide general windowing operations on grids to enable spatial models for cellular automata, mathematical morphology, and local spatial statistics. A programming language for map algebra that incorporates templates and special processing constructs is described. The programming language is called MapScript. Example program scripts are presented to perform diverse and interesting neighborhood analysis for descriptive, model-based and processed-based analysis.
Resumo:
Most Internet search engines are keyword-based. They are not efficient for the queries where geographical location is important, such as finding hotels within an area or close to a place of interest. A natural interface for spatial searching is a map, which can be used not only to display locations of search results but also to assist forming search conditions. A map-based search engine requires a well-designed visual interface that is intuitive to use yet flexible and expressive enough to support various types of spatial queries as well as aspatial queries. Similar to hyperlinks for text and images in an HTML page, spatial objects in a map should support hyperlinks. Such an interface needs to be scalable with the size of the geographical regions and the number of websites it covers. In spite of handling typically a very large amount of spatial data, a map-based search interface should meet the expectation of fast response time for interactive applications. In this paper we discuss general requirements and the design for a new map-based web search interface, focusing on integration with the WWW and visual spatial query interface. A number of current and future research issues are discussed, and a prototype for the University of Queensland is presented. (C) 2001 Published by Elsevier Science Ltd.
Resumo:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
Resumo:
Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
Resumo:
Observations of an insect's movement lead to theory on the insect's flight behaviour and the role of movement in the species' population dynamics. This theory leads to predictions of the way the population changes in time under different conditions. If a hypothesis on movement predicts a specific change in the population, then the hypothesis can be tested against observations of population change. Routine pest monitoring of agricultural crops provides a convenient source of data for studying movement into a region and among fields within a region. Examples of the use of statistical and computational methods for testing hypotheses with such data are presented. The types of questions that can be addressed with these methods and the limitations of pest monitoring data when used for this purpose are discussed. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Resumo:
Chapter in Book Proceedings with Peer Review First Iberian Conference, IbPRIA 2003, Puerto de Andratx, Mallorca, Spain, JUne 4-6, 2003. Proceedings
Resumo:
Chapter in Book Proceedings with Peer Review First Iberian Conference, IbPRIA 2003, Puerto de Andratx, Mallorca, Spain, JUne 4-6, 2003. Proceedings
Resumo:
The study of biosignals has had a transforming role in multiple aspects of our society, which go well beyond the health sciences domains to which they were traditionally associated with. While biomedical engineering is a classical discipline where the topic is amply covered, today biosignals are a matter of interest for students, researchers and hobbyists in areas including computer science, informatics, electrical engineering, among others. Regardless of the context, the use of biosignals in experimental activities and practical projects is heavily bounded by the cost, and limited access to adequate support materials. In this paper we present an accessible, albeit versatile toolkit, composed of low-cost hardware and software, which was created to reinforce the engagement of different people in the field of biosignals. The hardware consists of a modular wireless biosignal acquisition system that can be used to support classroom activities, interface with other devices, or perform rapid prototyping of end-user applications. The software comprehends a set of programming APIs, a biosignal processing toolbox, and a framework for real time data acquisition and postprocessing. (C) 2014 Elsevier Ireland Ltd. All rights reserved.