936 resultados para Classification approach


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A rule-based approach for classifying previously identified medical concepts in the clinical free text into an assertion category is presented. There are six different categories of assertions for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, a health based clinical terminology called SNOMED CT and other publicly available dictionaries were used to classify assertions, which did not fit the NegEx/Context model. The data for this task includes discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports, each with pairs of ground truth concept and assertion files for system development, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83, 0.83 and 0.83 for recall, precision and F1-measure, respectively. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It is a big challenge to acquire correct user profiles for personalized text classification since users may be unsure in providing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback in describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases due to the term independence assumption and uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It basically applies data mining to discover knowledge from relevant and non-relevant text and constraints specific knowledge by reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics support that the proposed technique achieves encouraging performance in comparing with the state-of-the-art relevance feedback models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The development of text classification techniques has been largely promoted in the past decade due to the increasing availability and widespread use of digital documents. Usually, the performance of text classification relies on the quality of categories and the accuracy of classifiers learned from samples. When training samples are unavailable or categories are unqualified, text classification performance would be degraded. In this paper, we propose an unsupervised multi-label text classification method to classify documents using a large set of categories stored in a world ontology. The approach has been promisingly evaluated by compared with typical text classification methods, using a real-world document collection and based on the ground truth encoded by human experts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Load in distribution networks is normally measured at the 11kV supply points; little or no information is known about the type of customers and their contributions to the load. This paper proposes statistical methods to decompose an unknown distribution feeder load to its customer load sector/subsector profiles. The approach used in this paper should assist electricity suppliers in economic load management, strategic planning and future network reinforcements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The benefits of applying tree-based methods to the purpose of modelling financial assets as opposed to linear factor analysis are increasingly being understood by market practitioners. Tree-based models such as CART (classification and regression trees) are particularly well suited to analysing stock market data which is noisy and often contains non-linear relationships and high-order interactions. CART was originally developed in the 1980s by medical researchers disheartened by the stringent assumptions applied by traditional regression analysis (Brieman et al. [1984]). In the intervening years, CART has been successfully applied to many areas of finance such as the classification of financial distress of firms (see Frydman, Altman and Kao [1985]), asset allocation (see Sorensen, Mezrich and Miller [1996]), equity style timing (see Kao and Shumaker [1999]) and stock selection (see Sorensen, Miller and Ooi [2000])...

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article outlines the key recommendations of the Australian Law Reform Commission’s review of the National Classification Scheme, as outlined in its report Classification – Content Regulation and Convergent Media (ALRC, 2012). It identifies key contextual factors that underpin the need for reform of media classification laws and policies, including the fragmentation of regulatory responsibilities and the convergence of media platforms, content and services, as well as discussing the ALRC’s approach to law reform.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: To derive preference-based measures from various condition-specific descriptive health-related quality of life (HRQOL) measures. A general 2-stage method is evolved: 1) an item from each domain of the HRQOL measure is selected to form a health state classification system (HSCS); 2) a sample of health states is valued and an algorithm derived for estimating the utility of all possible health states. The aim of this analysis was to determine whether confirmatory or exploratory factor analysis (CFA, EFA) should be used to derive a cancer-specific utility measure from the EORTC QLQ-C30. Methods: Data were collected with the QLQ-C30v3 from 356 patients receiving palliative radiotherapy for recurrent or metastatic cancer (various primary sites). The dimensional structure of the QLQ-C30 was tested with EFA and CFA, the latter based on a conceptual model (the established domain structure of the QLQ-C30: physical, role, emotional, social and cognitive functioning, plus several symptoms) and clinical considerations (views of both patients and clinicians about issues relevant to HRQOL in cancer). The dimensions determined by each method were then subjected to item response theory, including Rasch analysis. Results: CFA results generally supported the proposed conceptual model, with residual correlations requiring only minor adjustments (namely, introduction of two cross-loadings) to improve model fit (increment χ2(2) = 77.78, p < .001). Although EFA revealed a structure similar to the CFA, some items had loadings that were difficult to interpret. Further assessment of dimensionality with Rasch analysis aligned the EFA dimensions more closely with the CFA dimensions. Three items exhibited floor effects (>75% observation at lowest score), 6 exhibited misfit to the Rasch model (fit residual > 2.5), none exhibited disordered item response thresholds, 4 exhibited DIF by gender or cancer site. Upon inspection of the remaining items, three were considered relatively less clinically important than the remaining nine. Conclusions: CFA appears more appropriate than EFA, given the well-established structure of the QLQ-C30 and its clinical relevance. Further, the confirmatory approach produced more interpretable results than the exploratory approach. Other aspects of the general method remain largely the same. The revised method will be applied to a large number of data sets as part of the international and interdisciplinary project to develop a multi-attribute utility instrument for cancer (MAUCa).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Students in the middle years encounter an increasing range of unfamiliar visuals. Visual literacy, the ability to encode and decode visuals and to think visually, is an expectation of all middle years curriculum areas and an expectation of NAPLAN literacy and numeracy tests. This article presents a multidisciplinary approach to teaching visual literacy that links the content of all learning areas and encourages students to transfer skills from familiar to unfamiliar contexts. It proposes a classification of visuals in six parts: one-dimensional; two-dimensional; map; shape; connection; and picture, based on the properties, rather than the purpose, of the visual. By placing a visual in one of these six categories, students learn to transfer the skills used to decode familiar visuals to unfamiliar cases in the same category. The article also discusses a range of other teaching strategies that can be used to complement this multi-disciplinary approach.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper addresses development of an ingenious decision support system (iDSS) based on the methodology of survey instruments and identification of significant variables to be used in iDSS using statistical analysis. A survey was undertaken with pregnant women and factorial experimental design was chosen to acquire sample size. Variables with good reliability in any one of the statistical techniques such as Chi-square, Cronbach’s α and Classification Tree were incorporated in the iDSS. The ingenious decision support system was implemented with Visual Basic as front end and Microsoft SQL server management as backend. Outcome of the ingenious decision support system include advice on Symptoms, Diet and Exercise to pregnant women.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

International comparison is complicated by the use of different terms, classification methods, policy frameworks and system structures, not to mention different languages and terminology. Multi-case studies can assist in the understanding of the influence wielded by cultural, social, economic, historical and political forces upon educational decisions, policy construction and changes over time. But case studies alone are not enough. In this paper, we argue for an ecological or scaled approach that travels through macro, meso and micro levels to build nested case-studies to allow for more comprehensive analysis of the external and internal factors that shape policy-making and education systems. Such an approach allows for deeper understanding of the relationship between globalizing trends and policy developments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Visuals are a central feature of STEM in all levels of education and many areas of employment. The wide variety of visuals that students are expected to master in STEM prevents an approach that aims to teach students about every type of visual that they may encounter. This paper proposes a pedagogy that can be applied across year levels and learning areas, allowing a school-wide, cross-curricular, approach to teaching about visual, that enhances learning in STEM and all other learning areas. Visuals are classified into six categories based on their properties, unlike traditional methods that classify visuals according to purpose. As visuals in the same category share common properties, students are able to transfer their knowledge from the familiar to unfamiliar in each category. The paper details the classification and proposes some strategies that can be can be incorporated into existing methods of teaching students about visuals in all learning areas. The approach may also assist students to see the connections between the different learning areas within and outside STEM.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Topic modeling has been widely utilized in the fields of information retrieval, text mining, text classification etc. Most existing statistical topic modeling methods such as LDA and pLSA generate a term based representation to represent a topic by selecting single words from multinomial word distribution over this topic. There are two main shortcomings: firstly, popular or common words occur very often across different topics that bring ambiguity to understand topics; secondly, single words lack coherent semantic meaning to accurately represent topics. In order to overcome these problems, in this paper, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantic rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

It has been reported that poor nutritional status, in the form of weight loss and resulting body mass index (BMI) changes, is an issue in people with Parkinson's disease (PWP). The symptoms resulting from Parkinson's disease (PD) and the side effects of PD medication have been implicated in the aetiology of nutritional decline. However, the evidence on which these claims are based is, on one hand, contradictory, and on the other, restricted primarily to otherwise healthy PWP. Despite the claims that PWP suffer from poor nutritional status, evidence is lacking to inform nutrition-related care for the management of malnutrition in PWP. The aims of this thesis were to better quantify the extent of poor nutritional status in PWP, determine the important factors differentiating the well-nourished from the malnourished and evaluate the effectiveness of an individualised nutrition intervention on nutritional status. Phase DBS: Nutritional status in people with Parkinson's disease scheduled for deep-brain stimulation surgery The pre-operative rate of malnutrition in a convenience sample of people with Parkinson's disease (PWP) scheduled for deep-brain stimulation (DBS) surgery was determined. Poorly controlled PD symptoms may result in a higher risk of malnutrition in this sub-group of PWP. Fifteen patients (11 male, median age 68.0 (42.0 – 78.0) years, median PD duration 6.75 (0.5 – 24.0) years) participated and data were collected during hospital admission for the DBS surgery. The scored PG-SGA was used to assess nutritional status, anthropometric measures (weight, height, mid-arm circumference, waist circumference, body mass index (BMI)) were taken, and body composition was measured using bioelectrical impedance spectroscopy (BIS). Six (40%) of the participants were malnourished (SGA-B) while 53% reported significant weight loss following diagnosis. BMI was significantly different between SGA-A and SGA-B (25.6 vs 23.0kg/m 2, p<.05). There were no differences in any other variables, including PG-SGA score and the presence of non-motor symptoms. The conclusion was that malnutrition in this group is higher than that in other studies reporting malnutrition in PWP, and it is under-recognised. As poorer surgical outcomes are associated with poorer pre-operative nutritional status in other surgeries, it might be beneficial to identify patients at nutritional risk prior to surgery so that appropriate nutrition interventions can be implemented. Phase I: Nutritional status in community-dwelling adults with Parkinson's disease The rate of malnutrition in community-dwelling adults (>18 years) with Parkinson's disease was determined. One hundred twenty-five PWP (74 male, median age 70.0 (35.0 – 92.0) years, median PD duration 6.0 (0.0 – 31.0) years) participated. The scored PG-SGA was used to assess nutritional status, anthropometric measures (weight, height, mid-arm circumference (MAC), calf circumference, waist circumference, body mass index (BMI)) were taken. Nineteen (15%) of the participants were malnourished (SGA-B). All anthropometric indices were significantly different between SGA-A and SGA-B (BMI 25.9 vs 20.0kg/m2; MAC 29.1 – 25.5cm; waist circumference 95.5 vs 82.5cm; calf circumference 36.5 vs 32.5cm; all p<.05). The PG-SGA score was also significantly lower in the malnourished (2 vs 8, p<.05). The nutrition impact symptoms which differentiated between well-nourished and malnourished were no appetite, constipation, diarrhoea, problems swallowing and feel full quickly. This study concluded that malnutrition in community-dwelling PWP is higher than that documented in community-dwelling elderly (2 – 11%), yet is likely to be under-recognised. Nutrition impact symptoms play a role in reduced intake. Appropriate screening and referral processes should be established for early detection of those at risk. Phase I: Nutrition assessment tools in people with Parkinson's disease There are a number of validated and reliable nutrition screening and assessment tools available for use. None of these tools have been evaluated in PWP. In the sample described above, the use of the World Health Organisation (WHO) cut-off (≤18.5kg/m2), age-specific BMI cut-offs (≤18.5kg/m2 for under 65 years, ≤23.5kg/m2 for 65 years and older) and the revised Mini-Nutritional Assessment short form (MNA-SF) were evaluated as nutrition screening tools. The PG-SGA (including the SGA classification) and the MNA full form were evaluated as nutrition assessment tools using the SGA classification as the gold standard. For screening, the MNA-SF performed the best with sensitivity (Sn) of 94.7% and specificity (Sp) of 78.3%. For assessment, the PG-SGA with a cut-off score of 4 (Sn 100%, Sp 69.8%) performed better than the MNA (Sn 84.2%, Sp 87.7%). As the MNA has been recommended more for use as a nutrition screening tool, the MNA-SF might be more appropriate and take less time to complete. The PG-SGA might be useful to inform and monitor nutrition interventions. Phase I: Predictors of poor nutritional status in people with Parkinson's disease A number of assessments were conducted as part of the Phase I research, including those for the severity of PD motor symptoms, cognitive function, depression, anxiety, non-motor symptoms, constipation, freezing of gait and the ability to carry out activities of daily living. A higher score in all of these assessments indicates greater impairment. In addition, information about medical conditions, medications, age, age at PD diagnosis and living situation was collected. These were compared between those classified as SGA-A and as SGA-B. Regression analysis was used to identify which factors were predictive of malnutrition (SGA-B). Differences between the groups included disease severity (4% more severe SGA-A vs 21% SGA-B, p<.05), activities of daily living score (13 SGA-A vs 18 SGA-B, p<.05), depressive symptom score (8 SGA-A vs 14 SGA-B, p<.05) and gastrointestinal symptoms (4 SGA-A vs 6 SGA-B, p<.05). Significant predictors of malnutrition according to SGA were age at diagnosis (OR 1.09, 95% CI 1.01 – 1.18), amount of dopaminergic medication per kg body weight (mg/kg) (OR 1.17, 95% CI 1.04 – 1.31), more severe motor symptoms (OR 1.10, 95% CI 1.02 – 1.19), less anxiety (OR 0.90, 95% CI 0.82 – 0.98) and more depressive symptoms (OR 1.23, 95% CI 1.07 – 1.41). Significant predictors of a higher PG-SGA score included living alone (β=0.14, 95% CI 0.01 – 0.26), more depressive symptoms (β=0.02, 95% CI 0.01 – 0.02) and more severe motor symptoms (OR 0.01, 95% CI 0.01 – 0.02). More severe disease is associated with malnutrition, and this may be compounded by lack of social support. Phase II: Nutrition intervention Nineteen of the people identified in Phase I as requiring nutrition support were included in Phase II, in which a nutrition intervention was conducted. Nine participants were in the standard care group (SC), which received an information sheet only, and the other 10 participants were in the intervention group (INT), which received individualised nutrition information and weekly follow-up. INT gained 2.2% of starting body weight over the 12 week intervention period resulting in significant increases in weight, BMI, mid-arm circumference and waist circumference. The SC group gained 1% of starting weight over the 12 weeks which did not result in any significant changes in anthropometric indices. Energy and protein intake (18.3kJ/kg vs 3.8kJ/kg and 0.3g/kg vs 0.15g/kg) increased in both groups. The increase in protein intake was only significant in the SC group. The changes in intake, when compared between the groups, were no different. There were no significant changes in any motor or non-motor symptoms or in "off" times or dyskinesias in either group. Aspects of quality of life improved over the 12 weeks as well, especially emotional well-being. This thesis makes a significant contribution to the evidence base for the presence of malnutrition in Parkinson's disease as well as for the identification of those who would potentially benefit from nutrition screening and assessment. The nutrition intervention demonstrated that a traditional high protein, high energy approach to the management of malnutrition resulted in improved nutritional status and anthropometric indices with no effect on the presence of Parkinson's disease symptoms and a positive effect on quality of life.