995 resultados para Unstructured data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work describes the parallelization of High Resolution flow solver on unstructured meshes, HIFUN-3D, an unstructured data based finite volume solver for 3-D Euler equations. For mesh partitioning, we use METIS, a software based on multilevel graph partitioning. The unstructured graph used for partitioning is associated with weights both on its vertices and edges. The data residing on every processor is split into four layers. Such a novel procedure of handling data helps in maintaining the effectiveness of the serial code. The communication of data across the processors is achieved by explicit message passing using the standard blocking mode feature of Message Passing Interface (MPI). The parallel code is tested on PACE++128 available in CFD Center

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A grid adaptation strategy for unstructured data based codes, employing a combination of hexahedral and prismatic elements, generalizable to tetrahedral and pyramidal elements has been developed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Master data management (MDM) integrates data from multiple
structured data sources and builds a consolidated 360-
degree view of business entities such as customers and products.
Today’s MDM systems are not prepared to integrate
information from unstructured data sources, such as news
reports, emails, call-center transcripts, and chat logs. However,
those unstructured data sources may contain valuable
information about the same entities known to MDM from
the structured data sources. Integrating information from
unstructured data into MDM is challenging as textual references
to existing MDM entities are often incomplete and
imprecise and the additional entity information extracted
from text should not impact the trustworthiness of MDM
data.
In this paper, we present an architecture for making MDM
text-aware and showcase its implementation as IBM InfoSphere
MDM Extension for Unstructured Text Correlation,
an add-on to IBM InfoSphere Master Data Management
Standard Edition. We highlight how MDM benefits from
additional evidence found in documents when doing entity
resolution and relationship discovery. We experimentally
demonstrate the feasibility of integrating information from
unstructured data sources into MDM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent liberalization of the German energy market has forced the energy industry to develop and install new information systems to support agents on the energy trading floors in their analytical tasks. Besides classical approaches of building a data warehouse giving insight into the time series to understand market and pricing mechanisms, it is crucial to provide a variety of external data from the web. Weather information as well as political news or market rumors are relevant to give the appropriate interpretation to the variables of a volatile energy market. Starting from a multidimensional data model and a collection of buy and sell transactions a data warehouse is built that gives analytical support to the agents. Following the idea of web farming we harvest the web, match the external information sources after a filtering and evaluation process to the data warehouse objects, and present this qualified information on a user interface where market values are correlated with those external sources over the time axis.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the last several years there has been an increase in the amount of qualitative research using in-depth interviews and comprehensive content analyses in sport psychology. However, no explicit method has been provided to deal with the large amount of unstructured data. This article provides common guidelines for organizing and interpreting unstructured data. Two main operations are suggested and discussed: first, coding meaningful text segments, or creating tags, and second, regrouping similar text segments,or creating categories. Furthermore, software programs for the microcomputer are presented as away to facilitate the organization and interpretation of qualitative data

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the last several years there has been an increase in the amount of qualitative research using in-depth interviews and comprehensive content analyses in sport psychology. However, no explicit method has been provided to deal with the large amount of unstructured data. This article provides common guidelines for organizing and interpreting unstructured data. Two main operations are suggested and discussed: first, coding meaningful text segments, or creating tags, and second, regrouping similar text segments,or creating categories. Furthermore, software programs for the microcomputer are presented as away to facilitate the organization and interpretation of qualitative data

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Data mining is concerned with analysing large volumes of (often unstructured) data to automatically discover interesting regularities or relationships which in turn lead to better understanding of the underlying processes. The field of temporal data mining is concerned with such analysis in the case of ordered data streams with temporal interdependencies. Over the last decade many interesting techniques of temporal data mining were proposed and shown to be useful in many applications. Since temporal data mining brings together techniques from different fields such as statistics, machine learning and databases, the literature is scattered among many different sources. In this article, we present an overview of techniques of temporal data mining.We mainly concentrate on algorithms for pattern discovery in sequential data streams.We also describe some recent results regarding statistical analysis of pattern discovery methods.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The modern CFD process consists of mesh generation, flow solving and post-processing integrated into an automated workflow. During the last several years we have developed and published research aimed at producing a meshing and geometry editing system, implemented in an end-to-end parallel, scalable manner and capable of automatic handling of large scale, real world applications. The particular focus of this paper is the associated unstructured mesh RANS flow solver and the porting of it to GPU architectures. After briefly describing the solver itself, the special issues associated with porting codes using unstructured data structures are discussed - followed by some application examples. Copyright © 2011 by W.N. Dawes.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Technology has been the catalyst that has facilitated an explosion of organisational data in terms of its velocity, variety, and volume, resulting in a greater depth and breadth of potentially valuable information, previously unutilised. The variety of data accessible to organisations extends beyond traditional structured data to now encompass previously unobtainable and difficult to analyse unstructured data. In addition to exploiting data, organisations are now facing an even greater challenge of assessing data quality and identifying the impacts of lack of quality. The aim of this research is to contribute to data quality literature, focusing on improving a current understanding of business-related Data Quality (DQ) issues facing organisations. This review builds on existing Information Systems literature, and proposes further research in this area. Our findings confirm that the current literature lags in recognising new types of data and imminent DQ impacts facing organisations in today’s dynamic environment of the so-called “Big Data”. Insights clearly identify the need for further research on DQ, in particular in relation to unstructured data. It also raises questions regarding new DQ impacts and implications for organisations, in their quest to leverage the variety of available data types to provide richer insights.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments round the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found in the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related works found in this area. The results achieved in this study are based on the architectural definition and major components and requirements in order to compose the proposed framework. With this, it is possible to take advantage of the large volume of information generated from e-Government initiatives and use it to benefit society.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Currently there are an overwhelming number of scientific publications in Life Sciences, especially in Genetics and Biotechnology. This huge amount of information is structured in corporate Data Warehouses (DW) or in Biological Databases (e.g. UniProt, RCSB Protein Data Bank, CEREALAB or GenBank), whose main drawback is its cost of updating that makes it obsolete easily. However, these Databases are the main tool for enterprises when they want to update their internal information, for example when a plant breeder enterprise needs to enrich its genetic information (internal structured Database) with recently discovered genes related to specific phenotypic traits (external unstructured data) in order to choose the desired parentals for breeding programs. In this paper, we propose to complement the internal information with external data from the Web using Question Answering (QA) techniques. We go a step further by providing a complete framework for integrating unstructured and structured information by combining traditional Databases and DW architectures with QA systems. The great advantage of our framework is that decision makers can compare instantaneously internal data with external data from competitors, thereby allowing taking quick strategic decisions based on richer data.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In current organizations, valuable enterprise knowledge is often buried under rapidly expanding huge amount of unstructured information in the form of web pages, blogs, and other forms of human text communications. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments in an expert evaluation, a quantitative benchmarking, and an application of CORDER in a social networking tool called BuddyFinder.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

66 p.