26 results for Short-text clustering


Relevance:

30.00%

Abstract:

For many clustering algorithms, such as k-means, EM, and CLOPE, the user usually has to set some parameters. Often, these parameters directly or indirectly control the number of clusters returned. Given differing data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in a data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: "How can we effectively estimate the natural number of clusters in a given text collection?" We propose spectral analysis, which analyzes the eigenvalues (not the eigenvectors) of the collection, as the answer to this question. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.
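The abstract does not say which matrix's eigenvalues are analyzed. As a rough illustration only, the sketch below uses the common eigengap heuristic: build a Gaussian affinity matrix over document vectors, take the eigenvalues of its symmetric normalized Laplacian, and pick k at the largest gap between consecutive eigenvalues. The function name, the Gaussian kernel, `sigma`, and `max_k` are assumptions, not the authors' method.

```python
import numpy as np


def estimate_num_clusters(X, sigma=1.0, max_k=10):
    """Estimate a cluster count from an eigenvalue spectrum (eigengap heuristic).

    X: (n_docs, n_features) array, e.g. TF-IDF rows (assumed input format).
    sigma: width of the assumed Gaussian affinity kernel.
    """
    # Pairwise squared Euclidean distances between documents.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinity matrix with a zero diagonal (no self-loops).
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Only eigenvalues are needed; eigvalsh returns them in ascending order.
    eigvals = np.linalg.eigvalsh(L)[: max_k + 1]
    # Near-zero eigenvalues correspond to well-separated groups, so the
    # largest jump between consecutive eigenvalues suggests k.
    gaps = np.diff(eigvals)
    return int(np.argmax(gaps)) + 1
```

With well-separated groups, the normalized Laplacian has roughly one near-zero eigenvalue per group, so the index of the first large jump is a natural candidate for the number of clusters. In practice `sigma` strongly affects the spectrum and would need tuning for real text vectors.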

Relevance:

30.00%

Abstract:

It is argued, making reference to an Orwell text sample, that linguistic theory illuminates "positioning" (as term and concept), that positioning's flexibility contributes to critical textual analysis and to attempts to understand complex human processes, but how far language may constrain as well as facilitate such understanding must remain open.

Relevance:

30.00%

Abstract:

This paper presents an image-to-text translation platform consisting of image segmentation, region feature extraction, region blob clustering, and translation components. A multi-label learning method is suggested for realizing the translation component. Empirical studies show that, when it employs a dual-random ensemble multi-label classification algorithm, the translation component achieves better predictive performance than its counterparts.

Relevance:

30.00%

Abstract:

Short-Term Load Forecasting (STLF) is very important for power grid operation. STLF involves forecasting load demand over a short time frame, ranging from half-hourly up to weekly predictions. Accurate forecasting benefits the utility in terms of grid reliability and stability by ensuring that adequate supply is available to meet the load demand. It also affects the financial performance of the utility company: an accurate forecast yields better savings while maintaining the security of the grid. This paper outlines STLF using a novel hybrid online-learning neural network known as Gaussian Regression (GR). This hybrid network combines two existing online-learning neural networks: Gaussian Adaptive Resonance Theory (GA) and the Generalized Regression Neural Network (GRNN). Both GA and GRNN implement online learning, but each has limitations. GA was originally designed for unsupervised clustering, compressing the training samples into several categories. A supervised version of GA, Gaussian ARTMAP (GAM), is available; however, GAM is still not capable of solving regression problems. GRNN, on the other hand, is designed for real-valued estimation (regression) problems, but its learning process memorizes all training samples, which incurs a high computational cost. The hybrid GR can be considered an enhanced version of GRNN that adds compression while retaining online learning. Simulation results show that GR achieves prediction accuracy comparable to that of the original GRNN and Support Vector Regression while requiring fewer prototypes.
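To show why GRNN's cost grows with the data, here is a minimal sketch of a standard GRNN, which predicts a Gaussian-kernel-weighted average of stored training targets. It is not the hybrid GR model: the Gaussian ART compression that GR adds is not shown, and the class name `GRNN`, its methods, and the `sigma` value are illustrative assumptions.

```python
import math


class GRNN:
    """Minimal Generalized Regression Neural Network sketch.

    Every training sample is stored as a pattern unit, so memory and
    prediction cost grow linearly with the number of samples seen.
    """

    def __init__(self, sigma=1.0):
        self.sigma = sigma  # kernel smoothing parameter (illustrative)
        self.patterns = []  # one (x, y) pair per training sample

    def learn(self, x, y):
        # Online learning in a single pass: simply memorize the sample.
        self.patterns.append((list(x), float(y)))

    def predict(self, x):
        # Kernel-weighted average of all stored targets.
        num = 0.0
        den = 0.0
        for xi, yi in self.patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
            w = math.exp(-sq_dist / (2.0 * self.sigma ** 2))
            num += w * yi
            den += w
        # With no stored patterns (or total underflow), fall back to 0.0.
        return num / den if den > 0.0 else 0.0
```

Each call to `learn` adds a pattern and each `predict` loops over all of them, which is the memorization cost the abstract attributes to GRNN and which GR's compression into prototypes is meant to reduce.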

Relevance:

30.00%

Abstract:

Failure mode and effect analysis (FMEA) is a popular safety and reliability analysis tool for examining potential failures of products, processes, designs, or services in a wide range of industries. While FMEA is popular, the limitations of the traditional Risk Priority Number (RPN) model in FMEA have been highlighted in the literature. Even though many alternatives to the traditional RPN model have been proposed, few studies have investigated the use of clustering techniques in FMEA. The main aim of this paper was to examine the use of a new Euclidean distance-based similarity measure and an incremental-learning clustering model, i.e., the fuzzy adaptive resonance theory neural network, for similarity analysis and clustering of failure modes in FMEA, thereby allowing the failure modes to be analyzed, visualized, and clustered. In this paper, the concept of a risk interval encompassing a group of failure modes is investigated. In addition, a new approach to analyzing the risk ordering of different failure groups is introduced. The proposed methods are evaluated using a case study related to the edible bird nest industry in Sarawak, Malaysia. In short, the contributions of this paper are threefold: (1) a new Euclidean distance-based similarity measure, (2) a new risk interval measure for a group of failure modes, and (3) a new analysis of the risk ordering of different failure groups. © 2014 The Natural Computing Applications Forum.
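The traditional RPN is the product of the severity (S), occurrence (O), and detection (D) ratings. The sketch below shows how two failure modes with very different profiles can share the same RPN, while a distance in (S, O, D) space tells them apart. The abstract does not give the paper's new similarity measure; `euclidean_similarity`, its normalization to [0, 1], and the 1 to 10 rating scale are assumptions for illustration.

```python
import math

# Assumed rating scale for severity, occurrence, and detection.
SCALE_MIN, SCALE_MAX = 1, 10


def rpn(mode):
    """Traditional Risk Priority Number: RPN = S * O * D."""
    s, o, d = mode
    return s * o * d


def euclidean_similarity(a, b):
    """Illustrative distance-based similarity in [0, 1] (hypothetical form).

    1.0 means identical ratings; 0.0 means the two rating vectors are
    as far apart as the scale allows.
    """
    dist = math.dist(a, b)
    # Largest possible distance between two rating vectors on this scale.
    max_dist = math.sqrt(len(a)) * (SCALE_MAX - SCALE_MIN)
    return 1.0 - dist / max_dist
```

For example, failure modes rated (9, 2, 5) and (2, 9, 5) both have an RPN of 90, yet one is severe but rare and the other mild but frequent; their distance-based similarity is well below 1, which is the kind of distinction a clustering approach can exploit.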

Relevance:

30.00%

Abstract:

We present a Bayesian nonparametric framework for multilevel clustering that uses group-level context information to simultaneously discover low-dimensional structures of the group contents and partition groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model has properties that link the nested Dirichlet process (nDP) and the Dirichlet process mixture model (DPM) in an interesting way: integrating out all contents yields the DPM over contexts, whereas integrating out group-specific contexts yields the nDP mixture over content variables. We provide a Pólya urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
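The abstract mentions a Pólya urn view of the model. For readers unfamiliar with that view, the sketch below samples cluster assignments from a plain Dirichlet process using the Chinese restaurant process, the Pólya urn scheme over cluster labels. It does not implement the nested, multilevel model described above; the function name, `alpha`, and `seed` are illustrative.

```python
import random


def chinese_restaurant_process(n, alpha, seed=0):
    """Sample cluster assignments for n items from a CRP with concentration alpha.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha), or opens a new cluster with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items already in cluster k
    assignments = []  # cluster label for each item, in arrival order
    for _ in range(n):
        # Weights sum to i + alpha because sum(counts) == i.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # "rich get richer": join an existing cluster
        assignments.append(k)
    return assignments, counts
```

Larger `alpha` values make new clusters more likely, so the number of clusters is inferred from the data rather than fixed in advance, which is the property that makes Dirichlet-process models attractive for clustering.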

Relevance:

30.00%

Abstract:

BACKGROUND: The increasing prevalence of diabetes and the costly long-term complications associated with poor glycemic control are issues facing health services worldwide. Diabetes self-management, with the support of health care providers, is critical for successful outcomes; however, frequent clinical contact is costly. Text messages via short message service (SMS) have the advantage of instant transmission at low cost and, given the ubiquity of mobile phones, may be the ideal platform for the delivery of diabetes self-management support. A tailored text message-based diabetes support intervention called Self-Management Support for Blood Glucose (SMS4BG) was developed. The intervention incorporates prompts around diabetes education, management, and lifestyle factors (healthy eating, exercise, and stress management), as well as blood glucose monitoring reminders, and is tailored to patient preferences and clinical characteristics. OBJECTIVE: To determine the usability and acceptability of SMS4BG among adults with poorly controlled diabetes. METHODS: Adults (aged 17 to 69 years) with type 1 (n=12) or type 2 diabetes (n=30), a hemoglobin A1c (HbA1c) over 70 mmol/mol (8.6%), and who owned a mobile phone (n=42) were recruited to take part in a 3-month pilot study of SMS4BG. At registration, participants selected the modules they would like to receive and, where appropriate, the frequency and timing of blood glucose monitoring reminders. Patient satisfaction and perceptions of the usability of the program were obtained via semistructured phone interviews conducted at completion of the pilot study. HbA1c was obtained from patient records at baseline and at completion of the pilot study. RESULTS: Participants received on average 109 messages during the 3-month program, and 2 participants withdrew early from the study. Follow-up interviews were completed with 93% of participants, all of whom reported SMS4BG to be useful and appropriate to their age and culture. 
Participants reported a range of perceived positive impacts of SMS4BG on their diabetes and health behaviors. HbA1c results indicated a positive impact of the program on glycemic control with a significant decrease in HbA1c from baseline to follow-up. CONCLUSIONS: A tailored text message-based intervention is both acceptable and useful in supporting self-management in people with poorly controlled diabetes. A randomized controlled trial of longer duration is needed to assess the efficacy and sustainability of SMS4BG.

Relevance:

30.00%

Abstract:

BACKGROUND: Mobile technology has the potential to deliver behavior change interventions (mHealth) to reduce coronary heart disease (CHD) at modest cost. Previous studies have focused on single behaviors; however, cardiac rehabilitation (CR), a component of CHD self-management, needs to address multiple risk factors. OBJECTIVE: The aim was to investigate the effectiveness of an mHealth-delivered comprehensive CR program (Text4Heart) to improve adherence to recommended lifestyle behaviors (smoking cessation, physical activity, healthy diet, and nonharmful alcohol use) in addition to usual care (traditional CR). METHODS: A 2-arm, parallel, randomized controlled trial was conducted in New Zealand adults diagnosed with CHD. Participants were recruited in-hospital and were encouraged to attend center-based CR (usual care control). In addition, the intervention group received a personalized 24-week mHealth program, framed in social cognitive theory, sent by fully automated daily short message service (SMS) text messages and a supporting website. The primary outcome was adherence to healthy lifestyle behaviors measured using a self-reported composite health behavior score (≥3) at 3 and 6 months. Secondary outcomes included clinical outcomes, medication adherence score, self-efficacy, illness perceptions, and anxiety and/or depression at 6 months. Baseline and 6-month follow-up assessments (unblinded) were conducted in person. RESULTS: Eligible patients (N=123) recruited from 2 large metropolitan hospitals were randomized to the intervention (n=61) or the control (n=62) group. Participants were predominantly male (100/123, 81.3%) and New Zealand European (73/123, 59.3%), with a mean age of 59.5 (SD 11.1) years. A significant treatment effect in favor of the intervention was observed for the primary outcome at 3 months (AOR 2.55, 95% CI 1.12-5.84; P=.03), but not at 6 months (AOR 1.93, 95% CI 0.83-4.53; P=.13). 
The intervention group reported a significantly higher medication adherence score (mean difference: 0.58, 95% CI 0.19-0.97; P=.004). Most intervention participants reported reading all their text messages (52/61, 85%). The number of visits to the website per person ranged from zero to 100 (median 3) over the 6-month intervention period. CONCLUSIONS: An mHealth CR intervention plus usual care showed a positive effect on adherence to multiple lifestyle behavior changes at 3 months in New Zealand adults with CHD compared to usual care alone. The effect was not sustained to the end of the 6-month intervention. A larger study is needed to determine the size of the effect in the longer term and whether the change in behavior reduces adverse cardiovascular events. TRIAL REGISTRATION: ACTRN 12613000901707; https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=364758&isReview=true (Archived by WebCite at http://www.webcitation.org/6c4qhcHKt).

Relevance:

30.00%

Abstract:

Multilevel clustering problems, in which content and contextual information are clustered jointly, are ubiquitous in modern datasets. Existing work on this problem is limited to small datasets because it relies on the Gibbs sampler. We address the problem of scaling up multilevel clustering in a Bayesian nonparametric setting, extending the MC2 model proposed in (Nguyen et al., 2014). We ground our approach in structured mean-field and stochastic variational inference (SVI) and develop a tree-structured SVI algorithm that exploits the interplay between content and context modeling. Our new algorithm avoids the need to repeatedly pass through the corpus, as the Gibbs sampler must. More crucially, our method is immediately amenable to parallelization, facilitating a scalable distributed implementation on the Apache Spark platform. We conduct extensive experiments in a variety of domains, including text, images, and real-world user application activities. Direct comparison with the Gibbs sampler demonstrates that our method is an order of magnitude faster without loss of model quality. Our Spark-based implementation gains another order-of-magnitude speedup and can scale to large real-world datasets containing millions of documents and groups.
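Stochastic variational inference replaces full passes over the corpus with noisy minibatch updates of the global variational parameters. The sketch below shows the generic SVI update (intermediate parameters computed from a rescaled minibatch, followed by a Robbins-Monro step of size rho_t = (tau0 + t)^(-kappa)) on a toy conjugate Beta-Bernoulli model. It is not the tree-structured algorithm from the abstract; the model, the function name, and all defaults are assumptions chosen so the result can be checked against the exact posterior.

```python
import random


def svi_beta_bernoulli(data, batch_size=10, steps=2000, tau0=1.0, kappa=0.7,
                       prior=(1.0, 1.0), seed=0):
    """Generic SVI on a toy Beta-Bernoulli model (illustrative only).

    The global variational parameters are (a, b) of a Beta distribution.
    Each step samples a minibatch, forms intermediate parameters as if the
    minibatch were the whole dataset (scaled by N / batch_size), and moves
    the global parameters toward them with step size rho_t.
    """
    rng = random.Random(seed)
    n = len(data)
    a, b = prior
    for t in range(steps):
        batch = rng.sample(data, batch_size)
        ones = sum(batch)
        scale = n / batch_size
        # Intermediate ("noisy natural gradient") estimate from the minibatch.
        a_hat = prior[0] + scale * ones
        b_hat = prior[1] + scale * (batch_size - ones)
        # Robbins-Monro step size; kappa in (0.5, 1] ensures convergence.
        rho = (tau0 + t) ** (-kappa)
        a = (1.0 - rho) * a + rho * a_hat
        b = (1.0 - rho) * b + rho * b_hat
    return a, b
```

Each update touches only `batch_size` items, so the cost per step does not depend on corpus size; this is the property that lets SVI-based methods avoid repeated full passes over the data.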

Relevance:

30.00%

Abstract:

Introduction: Text message interventions have been shown to be effective in the prevention and management of several non-communicable disease risk factors. However, the extent to which their effects might vary across participants and settings is uncertain. We aim to conduct a systematic review and individual participant data (IPD) meta-analysis of randomised clinical trials examining text message interventions aimed at preventing cardiovascular diseases (CVD) through modification of cardiovascular risk factors (CVRFs). Methods and analysis: The systematic review and IPD meta-analysis will be conducted according to the Preferred Reporting Items for Systematic review and Meta-Analysis of IPD (PRISMA-IPD) guidelines. Electronic databases of published studies (MEDLINE, EMBASE, PsycINFO and Cochrane Library) and international trial registries will be searched to identify relevant randomised clinical trials. Authors of studies meeting the inclusion criteria will be invited to join the IPD meta-analysis group and contribute study data to the common database. The primary outcome will be the difference between intervention and control groups in blood pressure at 6-month follow-up. Key secondary outcomes include effects on lipid parameters, body mass index, smoking levels and self-reported quality of life. If sufficient data are available, we will also analyse blood pressure and other secondary outcomes at 12 months. IPD meta-analysis will be performed using a one-step approach, modelling data simultaneously while accounting for the clustering of participants within studies. This study will use existing data to assess the effectiveness of text message-based interventions on CVRFs and the consistency of any effects across participant subgroups and healthcare settings. Ethics and dissemination: Ethical approval was obtained for the individual studies by the trial investigators from relevant local ethics committees. 
This study will include anonymised data for secondary analysis and investigators will be asked to check that this is consistent with their existing approvals. Results will be disseminated via scientific forums including peer-reviewed publications and presentations at international conferences.