796 resultados para Data-Mining Techniques
Resumo:
Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.
Resumo:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism with-in our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xi-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
Resumo:
In this talk, I will describe various computational modelling and data mining solutions that form the basis of how the office of Deputy Head of Department (Resources) works to serve you. These include lessons I learn about, and from, optimisation issues in resource allocation, uncertainty analysis on league tables, modelling the process of winning external grants, and lessons we learn from student satisfaction surveys, some of which I have attempted to inject into our planning processes.
Resumo:
Abstract Ordnance Survey, our national mapping organisation, collects vast amounts of high-resolution aerial imagery covering the entirety of the country. Currently, photogrammetrists and surveyors use this to manually capture real-world objects and characteristics for a relatively small number of features. Arguably, the vast archive of imagery that we have obtained portraying the whole of Great Britain is highly underutilised and could be ‘mined’ for much more information. Over the last year the ImageLearn project has investigated the potential of "representation learning" to automatically extract relevant features from aerial imagery. Representation learning is a form of data-mining in which the feature-extractors are learned using machine-learning techniques, rather than being manually defined. At the beginning of the project we conjectured that representations learned could help with processes such as object detection and identification, change detection and social landscape regionalisation of Britain. This seminar will give an overview of the project and highlight some of our research results.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
As a way to gain greater insights into the operation of online communities, this dissertation applies automated text mining techniques to text-based communication to identify, describe and evaluate underlying social networks among online community members. The main thrust of the study is to automate the discovery of social ties that form between community members, using only the digital footprints left behind in their online forum postings. Currently, one of the most common but time consuming methods for discovering social ties between people is to ask questions about their perceived social ties. However, such a survey is difficult to collect due to the high investment in time associated with data collection and the sensitive nature of the types of questions that may be asked. To overcome these limitations, the dissertation presents a new, content-based method for automated discovery of social networks from threaded discussions, referred to as ‘name network’. As a case study, the proposed automated method is evaluated in the context of online learning communities. The results suggest that the proposed ‘name network’ method for collecting social network data is a viable alternative to costly and time-consuming collection of users’ data using surveys. The study also demonstrates how social networks produced by the ‘name network’ method can be used to study online classes and to look for evidence of collaborative learning in online learning communities. For example, educators can use name networks as a real time diagnostic tool to identify students who might need additional help or students who may provide such help to others. Future research will evaluate the usefulness of the ‘name network’ method in other types of online communities.
Resumo:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
Resumo:
The large number of opinions generated by online users made the former “word of mouth” find its way to virtual world. In addition to be numerous, many of the useful reviews are mixed with a large number of fraudulent, incomplete or duplicate reviews. However, how to find the features that influence on the number of votes received by an opinion and find useful reviews? The literature on opinion mining has several studies and techniques that are able to analyze of properties found in the text of reviews. This paper presents the application of a methodology for evaluation of usefulness of opinions with the aim of identifying which characteristics have more influence on the amount of votes: basic utility (e.g. ratings about the product and/or service, date of publication), textual (e.g.size of words, paragraphs) and semantics (e.g., the meaning of the words of the text). The evaluation was performed in a database extracted from TripAdvisor with opinionsabout hotels written in Portuguese. Results show that users give more attention to recent opinions with higher scores for value and location of the hotel and with lowest scores for sleep quality and service and cleanliness. Texts with positive opinions, small words, few adjectives and adverbs increase the chances of receiving more votes.
Resumo:
Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.
Resumo:
C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
Resumo:
La intención del proyecto es mostrar las diferentes características que ofrece Oracle en el campo de la minería de datos, con la finalidad de saber si puede ser una plataforma apta para la investigación y la educación en la universidad. En la primera parte del proyecto se estudia la aplicación “Oracle Data Miner” y como, mediante un flujo de trabajo visual e intuitivo, pueden aplicarse las distintas técnicas de minería (clasificación, regresión, clustering y asociación). Para mostrar la ejecución de estas técnicas se han usado dataset procedentes de la universidad de Irvine. Con ello se ha conseguido observar el comportamiento de los distintos algoritmos en situaciones reales. Para cada técnica se expone como evaluar su fiabilidad y como interpretar los resultados que se obtienen a partir de su aplicación. También se muestra la aplicación de las técnicas mediante el uso del lenguaje PL/SQL. Gracias a ello podemos integrar la minería de datos en nuestras aplicaciones de manera sencilla. En la segunda parte del proyecto, se ha elaborado un prototipo de una aplicación que utiliza la minería de datos, en concreto la clasificación para obtener el diagnóstico y la probabilidad de que un tumor de mama sea maligno o benigno, a partir de los resultados de una citología.
Resumo:
Dissertação (mestrado)—Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2016.
Resumo:
Dissertação (mestrado)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015.
Resumo:
© 2014 Cises This work is distributed with License Creative Commons Attribution-Non commercial-No derivatives 4.0 International (CC BY-BC-ND 4.0)
Resumo:
To analyze the characteristics and predict the dynamic behaviors of complex systems over time, comprehensive research to enable the development of systems that can intelligently adapt to the evolving conditions and infer new knowledge with algorithms that are not predesigned is crucially needed. This dissertation research studies the integration of the techniques and methodologies resulted from the fields of pattern recognition, intelligent agents, artificial immune systems, and distributed computing platforms, to create technologies that can more accurately describe and control the dynamics of real-world complex systems. The need for such technologies is emerging in manufacturing, transportation, hazard mitigation, weather and climate prediction, homeland security, and emergency response. Motivated by the ability of mobile agents to dynamically incorporate additional computational and control algorithms into executing applications, mobile agent technology is employed in this research for the adaptive sensing and monitoring in a wireless sensor network. Mobile agents are software components that can travel from one computing platform to another in a network and carry programs and data states that are needed for performing the assigned tasks. To support the generation, migration, communication, and management of mobile monitoring agents, an embeddable mobile agent system (Mobile-C) is integrated with sensor nodes. Mobile monitoring agents visit distributed sensor nodes, read real-time sensor data, and perform anomaly detection using the equipped pattern recognition algorithms. The optimal control of agents is achieved by mimicking the adaptive immune response and the application of multi-objective optimization algorithms. The mobile agent approach provides potential to reduce the communication load and energy consumption in monitoring networks. The major research work of this dissertation project includes: (1) studying effective feature extraction methods for time series measurement data; (2) investigating the impact of the feature extraction methods and dissimilarity measures on the performance of pattern recognition; (3) researching the effects of environmental factors on the performance of pattern recognition; (4) integrating an embeddable mobile agent system with wireless sensor nodes; (5) optimizing agent generation and distribution using artificial immune system concept and multi-objective algorithms; (6) applying mobile agent technology and pattern recognition algorithms for adaptive structural health monitoring and driving cycle pattern recognition; (7) developing a web-based monitoring network to enable the visualization and analysis of real-time sensor data remotely. Techniques and algorithms developed in this dissertation project will contribute to research advances in networked distributed systems operating under changing environments.