889 results for heterogeneous data sources


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to modeling the data jointly is to learn the shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities, and the level of sharing needs to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model uses the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems: transfer learning in text and image retrieval.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nonnegative matrix factorization based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable in a wider data mining context, wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
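The idea of shared and individual structure can be illustrated with a small sketch. The abstract does not give the chapter's algorithm, so the code below is not the authors' method. It assumes two sources that share a feature space (rows are terms or tags, columns are items), approximates each as `X_i ≈ [W | U_i] @ H_i` with a shared basis `W` and a per-source basis `U_i`, and uses standard Lee-Seung style multiplicative updates for squared error. The function name `joint_nmf` and all parameter names are hypothetical.

```python
import numpy as np

def joint_nmf(X1, X2, k_shared, k_ind, n_iter=200, eps=1e-9, seed=0):
    """Illustrative joint NMF: X_i ~ [W | U_i] @ H_i, all factors nonnegative.

    W is a basis shared by both sources; U_i is specific to source i.
    This is an assumed, simplified model, not the chapter's algorithm.
    """
    rng = np.random.default_rng(seed)
    Xs = [np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)]
    m = Xs[0].shape[0]  # both sources must have the same number of rows
    W = rng.random((m, k_shared))
    U = [rng.random((m, k_ind)) for _ in Xs]
    H = [rng.random((k_shared + k_ind, X.shape[1])) for X in Xs]
    losses = []
    for _ in range(n_iter):
        # Update each source's coefficient matrix with the bases held fixed.
        for i, X in enumerate(Xs):
            B = np.hstack([W, U[i]])
            H[i] *= (B.T @ X) / (B.T @ B @ H[i] + eps)
        # Update each individual basis using only its own coefficient rows.
        for i, X in enumerate(Xs):
            B = np.hstack([W, U[i]])
            Hu = H[i][k_shared:]
            U[i] *= (X @ Hu.T) / (B @ H[i] @ Hu.T + eps)
        # Update the shared basis by pooling contributions from both sources.
        num = np.zeros_like(W)
        den = np.zeros_like(W)
        for i, X in enumerate(Xs):
            B = np.hstack([W, U[i]])
            Hw = H[i][:k_shared]
            num += X @ Hw.T
            den += B @ H[i] @ Hw.T
        W *= num / (den + eps)
        # Track the total squared reconstruction error.
        losses.append(sum(
            np.linalg.norm(X - np.hstack([W, U[i]]) @ H[i]) ** 2
            for i, X in enumerate(Xs)
        ))
    return W, U, H, losses
```

Called as `W, U, H, losses = joint_nmf(X1, X2, k_shared=3, k_ind=2)`, the columns of `W` capture structure common to both sources, while `U[0]` and `U[1]` capture what is specific to each. Note that the sketch fixes `k_shared` and `k_ind` by hand.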

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a poverty profile for Brazil, based on three different sources of household data for 1996. We use PPV consumption data to estimate poverty and indigence lines. “Contagem” data is used to allow for an unprecedented refinement of the country’s poverty map. Poverty measures and shares are also presented for a wide range of population subgroups, based on the PNAD 1996, with new adjustments for imputed rents and spatial differences in cost of living. Robustness of the profile is verified with respect to different poverty lines, spatial price deflators, and equivalence scales. Overall poverty incidence ranges from 23% with respect to an indigence line to 45% with respect to a more generous poverty line. More importantly, however, poverty is found to vary significantly across regions and city sizes, with rural areas, small and medium towns and the metropolitan peripheries of the North and Northeast regions being poorest.
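The abstract does not say which poverty measures the paper uses. As a hedged illustration of how an incidence figure depends on the chosen line, the sketch below computes the widely used Foster-Greer-Thorbecke (FGT) family, where `alpha=0` gives the headcount ratio and `alpha=1` the poverty gap. The function name `fgt`, its parameters, and the toy incomes are hypothetical and are not taken from the PPV, Contagem, or PNAD data.

```python
import numpy as np

def fgt(income, line, alpha=0, weights=None):
    """Foster-Greer-Thorbecke poverty measure for a given poverty line.

    alpha=0: headcount ratio; alpha=1: poverty gap; alpha=2: squared gap.
    `weights` are optional survey weights (assumed; default is equal weights).
    """
    y = np.asarray(income, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    poor = y < line
    # Normalised shortfall from the line, zero for the non-poor.
    gap = np.clip((line - y) / line, 0.0, None)
    contrib = np.where(poor, gap ** alpha, 0.0)
    return float(np.sum(w * contrib) / np.sum(w))

# Toy incomes (hypothetical), evaluated against a lower and a higher line.
incomes = [50, 100, 150, 200]
print(fgt(incomes, line=60))            # headcount under a stricter line
print(fgt(incomes, line=120))           # headcount under a more generous line
print(fgt(incomes, line=120, alpha=1))  # poverty gap under the same line
```

Raising the line can only increase the headcount, which is why a profile is usually checked against several lines, as the abstract describes. Adjustments such as spatial price deflators or equivalence scales would be applied to `income` before calling the function.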

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Empirical Software Engineering (ESE) replication researchers need to store and manipulate experimental data for several purposes, in particular analysis and reporting. Current research needs also call for sharing and preservation of experimental data. In previous work, we analyzed Replication Data Management (RDM) needs. A novel concept, called Experimental Ecosystem, was proposed to solve current deficiencies in RDM approaches. The empirical ecosystem provides replication researchers with a common framework that transparently integrates local heterogeneous data sources. A typical situation where the Empirical Ecosystem is applicable is when several members of a research group, or several collaborating research groups, need to share and access each other's experimental results. However, to apply the Empirical Ecosystem concept and deliver all its promised benefits, it is necessary to analyze the software architectures and tools that can properly support it.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

101 selected references to books and journal articles. Also includes some foreign-language titles. Alphabetical arrangement by primary authors. Each entry gives bibliographical information and annotation. Author, subject indexes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Mode of access: Internet.