109 results for "multiple data sources"

in Deakin Research Online - Australia


Relevance: 100.00%

Abstract:

This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated on social media retrieval tasks within single media and across multiple media. The proposed solution is applicable in a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.

Abstract:

Joint analysis of multiple data sources is becoming increasingly popular in transfer learning, multi-task learning and cross-domain data mining. One promising approach to modeling the data jointly is to learn shared and individual factor subspaces. However, the performance of this approach depends on the subspace dimensionalities and the level of sharing, which need to be specified a priori. To this end, we propose a nonparametric joint factor analysis framework for modeling multiple related data sources. Our model uses the hierarchical beta process as a nonparametric prior to automatically infer the number of shared and individual factors. For posterior inference, we provide a Gibbs sampling scheme using auxiliary variables. The effectiveness of the proposed framework is validated through its application to two real-world problems: transfer learning in text and image retrieval.
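
The idea of letting the data decide how many factors are active can be illustrated with the Indian buffet process (IBP), the marginal of a beta-Bernoulli process. The following is a minimal sketch under stated assumptions, not the hierarchical beta process model described above: it draws a binary factor-usage matrix for a single source, and the function name `sample_ibp` and the concentration parameter `alpha` are hypothetical choices for illustration. The point it shows is that the number of columns (factors) is not fixed in advance.

```python
import numpy as np

def sample_ibp(n_rows, alpha, seed=0):
    """Draw a binary row-by-factor matrix Z from a single-source IBP prior (illustrative only)."""
    rng = np.random.default_rng(seed)
    counts = []  # counts[k]: number of earlier rows that use factor k
    rows = []
    for i in range(1, n_rows + 1):
        # Reuse existing factor k with probability counts[k] / i.
        row = [rng.random() < m / i for m in counts]
        for k, used in enumerate(row):
            if used:
                counts[k] += 1
        # Open Poisson(alpha / i) brand-new factors for this row.
        new = rng.poisson(alpha / i)
        counts.extend([1] * new)
        row.extend([True] * new)
        rows.append(row)
    # Pad the ragged rows into a dense boolean matrix.
    Z = np.zeros((n_rows, len(counts)), dtype=bool)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(50, alpha=3.0, seed=1)
print(Z.shape)  # the number of columns is decided by the draw, not set a priori
```

Under this prior the expected number of factors is `alpha` times the harmonic number of `n_rows`, so it grows only logarithmically as more rows arrive.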

Abstract:

Nonnegative matrix factorization-based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework that can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle the arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable in a wider data mining context wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
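
A minimal sketch of shared-plus-individual factorization, assuming every source is a nonnegative tag-by-item matrix over one common tag vocabulary, and using standard Lee-Seung-style multiplicative updates for a squared-error objective. This illustrates the general idea only and is not the authors' algorithm; the function names and the fixed subspace sizes `k_shared` and `k_indiv` are hypothetical.

```python
import numpy as np

def joint_nmf(Xs, k_shared, k_indiv, n_iter=200, eps=1e-9, seed=0):
    """Factorize each nonnegative X_j (tags x items_j) as [W_shared, W_indiv[j]] @ H[j]."""
    rng = np.random.default_rng(seed)
    n_tags = Xs[0].shape[0]  # all sources share one tag vocabulary (rows)
    W_shared = rng.random((n_tags, k_shared))
    W_indiv = [rng.random((n_tags, k_indiv)) for _ in Xs]
    H = [rng.random((k_shared + k_indiv, X.shape[1])) for X in Xs]
    for _ in range(n_iter):
        for j, X in enumerate(Xs):
            # Coefficients of source j: ordinary NMF multiplicative update.
            W = np.hstack([W_shared, W_indiv[j]])
            H[j] *= (W.T @ X) / (W.T @ W @ H[j] + eps)
            # Individual basis of source j, driven only by its own coefficient rows.
            H_ind = H[j][k_shared:]
            W_indiv[j] *= (X @ H_ind.T) / (W @ H[j] @ H_ind.T + eps)
        # Shared basis pools numerator and denominator terms over all sources.
        num = np.zeros_like(W_shared)
        den = np.zeros_like(W_shared)
        for j, X in enumerate(Xs):
            W = np.hstack([W_shared, W_indiv[j]])
            H_sh = H[j][:k_shared]
            num += X @ H_sh.T
            den += W @ H[j] @ H_sh.T
        W_shared *= num / (den + eps)
    return W_shared, W_indiv, H

def reconstruction_error(Xs, W_shared, W_indiv, H):
    """Total squared Frobenius error summed over all sources."""
    return sum(np.linalg.norm(X - np.hstack([W_shared, W_indiv[j]]) @ H[j]) ** 2
               for j, X in enumerate(Xs))
```

Columns of `W_shared` play the role of tag topics common to all sources, while `W_indiv[j]` absorbs structure specific to source `j`; a retrieval system could then compare items from different sources through their shared coefficient rows `H[j][:k_shared]`.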

Abstract:

Nonnegative matrix factorization-based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this paper, we propose a novel joint matrix factorization framework that can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle the arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable in a wider data mining context wherever one needs to exploit mutual and individual knowledge present across multiple data sources.

Abstract:

The problem of extracting infrequent patterns from streams and building associations between these patterns is becoming increasingly relevant, as many events of interest, such as attacks in network data or unusual stories in news data, occur rarely. The complexity of the problem is compounded when a system must handle data from multiple streams. To address these problems, we present a framework that combines time-based association mining with a pyramidal structure that allows a rolling analysis of the stream and maintains a synopsis of the data without requiring increasing memory resources. We apply the algorithms and show the usefulness of the techniques. © 2007 Crown Copyright.
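
One well-known form of such a pyramidal structure is the pyramidal time frame used in stream clustering, in which snapshots of a stream synopsis are kept at geometrically coarser time granularities. The following is a hedged sketch of that general scheme, not the paper's implementation: the class name, the base `alpha`, the per-order capacity `alpha**l + 1` and the cap `max_order` are all assumptions. The cap is what keeps memory constant rather than growing logarithmically with stream length.

```python
from collections import deque

class PyramidalTimeFrame:
    """Stores (time, synopsis) snapshots at geometrically coarser granularities."""

    def __init__(self, alpha=2, l=1, max_order=10):
        self.alpha = alpha
        self.max_order = max_order
        # Each order keeps only its alpha**l + 1 most recent snapshots.
        self.frames = [deque(maxlen=alpha ** l + 1) for _ in range(max_order + 1)]

    def _order(self, t):
        # Order i: t is divisible by alpha**i but not by alpha**(i+1); capped at max_order.
        i = 0
        while i < self.max_order and t % (self.alpha ** (i + 1)) == 0:
            i += 1
        return i

    def add(self, t, synopsis):
        """Record the synopsis at positive integer clock tick t."""
        if t < 1:
            raise ValueError("t must be a positive integer")
        self.frames[self._order(t)].append((t, synopsis))

    def snapshots(self):
        """All stored snapshots, oldest first."""
        return sorted((s for frame in self.frames for s in frame), key=lambda s: s[0])

    def closest_before(self, t):
        """Most recent stored snapshot taken at or before time t, or None."""
        earlier = [s for s in self.snapshots() if s[0] <= t]
        return earlier[-1] if earlier else None
```

A rolling analysis over horizon `h` would compare the current synopsis with `closest_before(now - h)`: recent horizons are answered at fine granularity, distant ones at coarse granularity, and the total number of stored snapshots never exceeds `(max_order + 1) * (alpha**l + 1)`.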

Abstract:

We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are those common to the anomalies of the individual channels, and they often portend significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by the detection of cross-channel anomalies from the amalgamation of single-channel anomalies. Our mathematical analysis shows that our method is likely to reduce the false alarm rate. We demonstrate our method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
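
A simplified two-stage sketch, assuming each channel is a (time x feature) matrix aligned on a common time axis. Stage one scores each time step by its residual outside the top-`k` principal subspace of its own channel, which is one common spectral detector. Stage two reports time steps flagged in at least `min_channels` channels. The voting rule in stage two is a stand-in chosen for brevity, not the paper's spectral amalgamation step, and all function names and thresholds are hypothetical.

```python
import numpy as np

def channel_anomalies(X, k=2, quantile=0.95):
    """Stage 1: flag time steps whose residual outside the top-k principal subspace is large."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T  # (features x k) orthonormal basis of the principal subspace
    residual = np.linalg.norm(Xc - Xc @ V @ V.T, axis=1)
    return residual > np.quantile(residual, quantile)

def cross_channel_anomalies(channels, min_channels=None, **kwargs):
    """Stage 2: time indices flagged in at least min_channels channels (default: all)."""
    masks = np.array([channel_anomalies(X, **kwargs) for X in channels])
    if min_channels is None:
        min_channels = len(channels)
    return np.flatnonzero(masks.sum(axis=0) >= min_channels)
```

Requiring agreement across channels is what suppresses isolated single-channel alarms: an event that is anomalous in only one channel is flagged by stage one but discarded by stage two.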

Abstract:

The exponential increase in data and computing power, together with the availability of readily accessible analytical software, has allowed organisations around the world to leverage the benefits of integrating multiple heterogeneous data files for enterprise-level planning and decision making. Benefits of effective data integration for the health and medical research community include more trustworthy research, higher service quality, improved personnel efficiency, reduction of redundant tasks, facilitation of auditing, and more timely, relevant and specific information. The costs of poor-quality processes include an elevated risk of erroneous outcomes and an erosion of confidence in the data and in the organisations using them. To date, there is no documented set of standards for best-practice integration of heterogeneous data files for research purposes. Therefore, the aim of this paper is to describe a clear protocol for data file integration (Data Integration Protocol In Ten-steps; DIPIT) that is transferable to any field of research.

Abstract:

Multimedia content understanding research requires a rigorous approach to deal with the complexity of the data. At the crux of this problem is how to deal with multilevel data whose structure exists at multiple scales and across data sources. A common example is modeling tags jointly with images to improve retrieval, classification and tag recommendation. Associated contextual observations, such as metadata, are rich and can be exploited for content analysis. A major challenge is the need for a principled approach to systematically incorporate associated media with the primary data source of interest. Taking a factor modeling approach, we propose a framework that can discover low-dimensional structures for a primary data source together with other associated information. We cast this task as a subspace learning problem under the framework of Bayesian nonparametrics, so that the subspace dimensionality and the number of clusters are learnt automatically from data instead of being set a priori. Using beta processes as the building block, we construct random measures in a hierarchical structure to generate multiple data sources and capture their shared statistical structure at the same time. The model parameters are inferred efficiently using a novel combination of Gibbs and slice sampling. We demonstrate the applicability of the proposed model in three applications: image retrieval, automatic tag recommendation and image classification. Experiments using two real-world datasets show that our approach outperforms various related state-of-the-art methods.

Abstract:

High-throughput experimental techniques provide a wide variety of heterogeneous proteomic data sources. To exploit the information spread across multiple sources for protein function prediction, these data sources are transformed into kernels and then integrated into a composite kernel. Several methods first optimize the weights on these kernels to produce a composite kernel, and then train a classifier on the composite kernel. As such, these approaches result in an optimal composite kernel, but not necessarily in an optimal classifier. On the other hand, some approaches optimize the loss of binary classifiers and learn weights for the different kernels iteratively. For multi-class or multi-label data, these methods have to optimize the kernel weights separately for each label, which is computationally expensive and ignores the correlation among labels. In this paper, we propose a method called Predicting Protein Function using Multiple Kernels (ProMK). ProMK iteratively alternates between learning optimal kernel weights and reducing the empirical loss of a multi-label classifier, for all labels simultaneously. ProMK can integrate kernels selectively and downgrade the weights of noisy kernels. We investigate the performance of ProMK on several publicly available protein function prediction benchmarks and synthetic datasets. We show that the proposed approach performs better than previously proposed protein function prediction approaches that integrate multiple data sources, and better than multi-label multiple kernel learning methods. The code for the proposed method is available at https://sites.google.com/site/guoxian85/promk.
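
To make the composite-kernel idea concrete, here is a sketch of the kind of two-stage baseline the abstract contrasts with ProMK: kernel weights are chosen first, here by centered kernel-target alignment against the label kernel `Y @ Y.T`, and a classifier would then be trained on the resulting weighted sum. This is not ProMK, which optimizes the weights and the classifier loss jointly; alignment is swapped in as one simple weighting heuristic, and the function names are hypothetical. It does show how a noisy kernel ends up with a small weight.

```python
import numpy as np

def center(K):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment: cosine between centered kernels in Frobenius inner product."""
    A, B = center(K1), center(K2)
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

def composite_kernel(kernels, Y):
    """Weight each base kernel by its clipped alignment with the multi-label kernel Y @ Y.T."""
    T = Y @ Y.T  # Y: (proteins x labels) binary indicator matrix
    a = np.array([max(alignment(K, T), 0.0) for K in kernels])
    # Fall back to uniform weights if no kernel is positively aligned.
    w = a / a.sum() if a.sum() > 0 else np.full(len(kernels), 1.0 / len(kernels))
    K = sum(wi * Ki for wi, Ki in zip(w, kernels))
    return K, w
```

Because the weights here are fixed before any classifier is trained, the resulting kernel can be well aligned with the labels without the downstream classifier being optimal, which is the limitation the abstract attributes to such two-stage methods.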

Abstract:

Aims. The aim of this paper is to report a trial to investigate the feasibility of the nurse practitioner role in local health service delivery and to provide information about the educational and legislative requirements for nurse practitioner practice.

Background. Nurse practitioners have been shown to offer a beneficial service and fill a gap in health care provision. However, the lack of publications describing, critiquing, or defending the way that existing nurse practitioner roles have been developed may lead to a lack of clarity in comparing the nurse practitioner scope of practice internationally. In Australia, credible exploratory research is needed to realize the potential of nurse practitioners to bridge the divide of inequitable distribution of health services. A trial of nurse practitioner services in the Australian Capital Territory provided an excellent opportunity to investigate these scope and continuity issues.

Methods. This was an observational analytic study using multiple data sources. Four models of nurse practitioner service were chosen from a competitive field of applications, which were evaluated for efficacy, feasibility and sustainability against specified selection criteria. Each model in the trial included a clinical support team, with the nurse practitioner candidate 'working-into-the-role' and collecting demographic, clinical practice, patient outcome, and health service and consumer survey data over a 10-month period.

Findings. The trial identified the broad potential of the nurse practitioner role, its breadth and limitations, and its impact on selected health services in the Australian Capital Territory. Data from the individual models were compared to highlight generic elements and formed the basis for developing the scope of practice for the Australian Capital Territory nurse practitioner models.

Conclusions. This study has validated a research-based, iterative process for initial development of nurse practitioner scope of practice for any Australian specialization. Importantly, the study concluded with the scope of practice as a finding, rather than commencing with it a priori. Although general areas of health care need and under-servicing were identified at the outset, the process tested both the expansion and parameters of the roles.

Abstract:

Objective:
The objective of this study was to conduct research to inform the development of standards for nurse practitioner education in Australia and New Zealand and to contribute to the international debate on nurse practitioner practice.
Setting:
The research was conducted in all states of Australia where the nurse practitioner is authorised and in New Zealand.
Subjects:
The research was informed by multiple data sources including nurse practitioner program curricula documents from all relevant universities in Australia and New Zealand, interviews with academic convenors of these programs and interviews with nurse practitioners.
Primary argument:
Findings from this research include support for a master's level of education as preparation for the nurse practitioner. These programs need to have a strong clinical learning component and in-depth education in the sciences of specialty practice. Additionally, an important aspect of education for the nurse practitioner is the centrality of student-directed and flexible learning models. This approach is well supported by the literature on capability.
Conclusions:
There is agreement in the literature about the lack of consistent standards in nurse practitioner practice, education and nomenclature. The findings from this research contribute to the international debate in this area and bring research-informed standards to nurse practitioner education in Australia and New Zealand.

Abstract:

Asynchronous online discussions have the potential to improve learning in universities. This thesis reports an investigation into the ways in which undergraduates learned in online discussions when these were included within their face-to-face courses. Taking a student perspective, four case studies describe and explain the approaches to learning used by business undergraduates in online discussions, and examine the influence of the computer-mediated conferencing (CMC) medium and curriculum design on student learning. The investigation took a qualitative approach in which case studies were developed from multiple data sources. In each case, a description of the setting of the online discussions introduced the learning environment. Further details of student learning behaviours in the online discussions were provided by an analysis of the systems data and a content analysis of the online discussion transcripts. In-depth interpretation of interview data added student perspectives on the impact of CMC characteristics, the curriculum or learning design, and the relationship between the online discussions and face-to-face classes. A comparative cross-case analysis of the findings of the four cases identified and discussed general themes and broad principles arising from the cases. The campus-based students acknowledged that online discussions helped them to learn, and their message postings evidenced deep approaches to learning. The students recognised the value for learning of the text-based nature of the CMC environment, but peer interaction was more difficult to achieve. Asynchronicity created time flexibility and time for reflection, but it also presented time management problems for many undergraduates. Assessment was the most influential aspect of the curriculum design. The cases also identified the importance of dialogical activity and found that the absence of the teacher from the online discussions was not problematic.
The research identified new perspectives on the relationship between online discussions and face-to-face classes. Students regarded these two media as complementary rather than oppositional and affirmed the importance of pedagogic connections between them. A teaching and learning framework for online discussions was developed from these perspectives. The significance of this study lies in improved knowledge of student learning processes in online discussions in blended learning environments. The cases indicated the potential value of the CMC environment for constructivist philosophies and affirmed the significant role of curriculum design with new technologies. Findings relating to the complementary nature of online and face-to-face discussions provided a platform for building a teaching and learning framework for blended environments, which can be used to inform and improve pedagogical design, teacher expertise and student learning outcomes in asynchronous online discussions.