229 results for Features extraction
Abstract:
Guaranteeing the quality of relevance features discovered in text documents for describing user preferences is a major challenge because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based methods should perform better than term-based ones in describing user preferences; yet, how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough in this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.
Abstract:
This paper addresses two common problems that users of various products and interfaces encounter: over-featured interfaces and product documentation. Over-featured interfaces are seen as a problem as they can confuse and over-complicate everyday interactions. Researchers also often claim that users do not read product documentation, although they are often exhorted to ‘RTFM’ (read the field manual). We conducted two sets of studies with users which looked at the issues of both manuals and excess features with common domestic and personal products. The quantitative set was a series of questionnaires administered to 170 people over 7 years. The qualitative set consisted of two 6-month longitudinal studies based on diaries and interviews with a total of 15 participants. We found that manuals are not read by the majority of people, and most do not use all the features of the products that they own and use regularly. Men are more likely to do both than women, and younger people are less likely to use manuals than middle-aged and older ones. More educated people are also less likely to read manuals. Over-featuring and being forced to consult manuals also appear to cause negative emotional experiences. Implications of these findings are discussed.
Abstract:
Ovarian cancer is the most common cause of gynaecological cancer death, with an overall 5-year relative survival of 43%. Impaired physical wellbeing and overall quality of life (QoL) represent major concerns for women during and following ovarian cancer treatment, predict survival and are amenable to change through interventions. Exercise, now considered an important part of overall management of a number of cancers, improves short-term outcomes (e.g., function, fatigue, QoL) during chemotherapy...
Abstract:
A staged crime scene involves deliberate alteration of evidence by the offender to simulate events that did not occur for the purpose of misleading authorities (Geberth, 2006; Turvey, 2000). This study examined 115 staged homicides from the USA to determine common elements; victim and perpetrator characteristics; and specific features of different types of staged scenes. General characteristics include: multiple victims and offenders; a previous relationship between parties involved; and victims discovered in their own home, often by the offender. Staged scenes were separated by type with staged burglaries, suicides, accidents, and car accidents examined in more detail. Each type of scene displays differently with separate indicators and common features. Features of staged burglaries were: no points of entry/exit staged; non-valuables taken; scene ransacking; offender self-injury; and offenders bringing weapons to the scene. Features of staged suicides included: weapon arrangement and simulating self-injury to the victim; rearranging the body; and removing valuables. Examples of elements of staged accidents were arranging the implement/weapon and repositioning the deceased; while staged car accidents involved: transporting the body to the vehicle and arranging both; mutilation after death; attempts to secure an alibi; and clean up at the primary crime scene. The results suggest few staging behaviors are used, despite the credibility they may have offered the façade. This is the first peer-reviewed, published study to examine the specific features of these scenes, and is the largest sample studied to date.
Abstract:
Using local spatio-temporal features with a bag-of-visual-words model is a popular approach to human action recognition. Bag-of-features methods suffer from several challenges, such as extracting appropriate appearance and motion features from videos, converting extracted features into a form appropriate for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task, and hence yield poor results compared to the baseline bag-of-words framework. On the other hand, supervised LDA techniques learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while in css-LDA separate class-specific topics are learned instead of a common set of topics across the entire dataset. In our representation, topics are learned first and then each video is represented as a topic proportion vector, which is comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of the above two representation techniques through experiments carried out on two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.
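The baseline mentioned in this abstract, a histogram of visual words, can be illustrated with a minimal sketch. It assumes a codebook has already been learned (for example by k-means, which is not shown here); the function names, the two-dimensional descriptors, and the codebook values are hypothetical and are not taken from the paper.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_features_histogram(descriptors, codebook):
    """Count how many descriptors fall on each codeword, then normalize to proportions."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical codebook (assumed to come from k-means) and local descriptors of one video.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.2], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9]]
histogram = bag_of_features_histogram(descriptors, codebook)
```

The resulting vector plays the same role as the topic proportion vector described in the abstract: a fixed-length representation of one video that a classifier such as an SVM can consume.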
Abstract:
The commercialization of aerial image processing is highly dependent on platforms such as UAVs (Unmanned Aerial Vehicles). However, the lack of an automated UAV forced landing site detection system has been identified as one of the main impediments to allowing UAV flight over populated areas in civilian airspace. This article proposes a UAV forced landing site detection system that is based on machine learning approaches including the Gaussian Mixture Model and the Support Vector Machine. A range of learning parameters are analysed, including the number of Gaussian mixtures, support vector kernels including linear, radial basis function (RBF) and polynomial (poly) kernels, and the order of the RBF and polynomial kernels. Moreover, a modified footprint operator is employed during feature extraction to better describe the geometric characteristics of the local area surrounding a pixel. The performance of the presented system is compared to a baseline UAV forced landing site detection system which uses edge features and an Artificial Neural Network (ANN) region type classifier. Experiments conducted on aerial image datasets captured over typical urban environments reveal that improved landing site detection can be achieved with an SVM classifier with an RBF kernel using a combination of colour and texture features. Compared to the baseline system, the proposed system provides significant improvement in terms of the chance of detecting a safe landing area, and its performance is more stable than the baseline in the presence of changes to the UAV altitude.
Abstract:
As negative employee attitudes towards alcohol and other drug (AOD) policies may have serious consequences for organizations, the present study examined demographic and attitudinal dimensions leading to employees’ perceptions of AOD policy effectiveness. Survey responses were obtained from 147 employees in an Australian agricultural organization. Three dimensions of attitudes towards AOD policies were examined: knowledge of policy features, attitudes towards testing, and preventative measures such as job design and organizational involvement in community health. Demographic differences were identified, with males and blue-collar employees reporting significantly more negative attitudes towards the AOD policy. Attitude dimensions were stronger predictors of perceptions of policy effectiveness than demographics, and the strongest predictor was preventative measures. This suggests that organizations should do more than design adequate and fair AOD policies, and take a more holistic approach to AOD impairment by engaging in workplace design to reduce AOD use and promote a consistent health message to employees and the community.
Abstract:
Erythropoietin (EPO), a glycoprotein hormone of ∼34 kDa, is an important hematopoietic growth factor that is mainly produced in the kidney and controls the number of red blood cells circulating in the blood stream. Sensitive and rapid recombinant human EPO (rHuEPO) detection tools that improve on the current laborious EPO detection techniques are in high demand in both the clinical and sports industries. A sensitive aptamer-functionalized biosensor (aptasensor) has been developed by controlled growth of gold nanostructures (AuNS) over a gold substrate (pAu/AuNS). The aptasensor selectively binds to rHuEPO and, therefore, was used to extract and detect the drug from horse plasma by surface enhanced Raman spectroscopy (SERS). Due to the nanogap separation between the nanostructures and the high population and distribution of hot spots on the pAu/AuNS substrate surface, strong signal enhancement was acquired. By using a wide area illumination (WAI) setting for the Raman detection, a low RSD of 4.92% over 150 SERS measurements was achieved. The significant reproducibility of the new biosensor addresses the serious problem of SERS signal inconsistency that hampers the use of the technique in the field. The WAI setting is compatible with handheld Raman devices. Therefore, the new aptasensor can be used for the selective extraction of rHuEPO from biological fluids, which can subsequently be screened with a handheld Raman spectrometer for SERS-based in-field protein detection.
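For readers unfamiliar with the reproducibility figure quoted in this abstract, the relative standard deviation (RSD) is the standard deviation expressed as a percentage of the mean. The sketch below is only an illustration: the intensity values are invented, and the use of the sample (n − 1) standard deviation is an assumption, since the abstract does not say which form the authors used.

```python
import statistics


def relative_standard_deviation(values):
    """RSD in percent: 100 * sample standard deviation / mean (sample form is an assumption)."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical signal intensities from repeated measurements (not data from the paper).
intensities = [100.0, 105.0, 95.0, 102.0, 98.0]
rsd = relative_standard_deviation(intensities)
```

A lower RSD means the repeated measurements cluster more tightly around their mean, which is the sense in which the abstract describes the signal as reproducible.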
Abstract:
Objective: This paper presents an automatic active learning-based system for the extraction of medical concepts from clinical free-text reports. Specifically, (1) the contribution of active learning in reducing the annotation effort, and (2) the robustness of an incremental active learning framework across different selection criteria and datasets are determined. Materials and methods: The comparative performance of an active learning framework and a fully supervised approach was investigated to study how active learning reduces the annotation effort while achieving the same effectiveness as a supervised approach. Conditional Random Fields were used as the supervised method, and least confidence and information density as the two selection criteria for the active learning framework. The effect of incremental learning vs. standard learning on the robustness of the models within the active learning framework with different selection criteria was also investigated. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results: The annotation effort saved by active learning to achieve the same effectiveness as supervised learning is up to 77%, 57%, and 46% of the total number of sequences, tokens, and concepts, respectively. Compared to the random sampling baseline, the saving is at least doubled. Discussion: Incremental active learning guarantees robustness across all selection criteria and datasets. The reduction of annotation effort is always above the random sampling and longest sequence baselines. Conclusion: Incremental active learning is a promising approach for building effective and robust medical concept extraction models, while significantly reducing the burden of manual annotation.
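The least confidence criterion named in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it scores each unlabeled item by one minus its highest predicted class probability, whereas the paper works with Conditional Random Fields over sequences. The function names, item identifiers, and probabilities are hypothetical.

```python
def least_confidence(probabilities):
    """Uncertainty score: 1 minus the probability of the most likely label."""
    return 1.0 - max(probabilities)


def select_for_annotation(pool, batch_size):
    """Return the ids of the batch_size items the model is least confident about."""
    ranked = sorted(pool, key=lambda item: least_confidence(item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:batch_size]]


# Hypothetical unlabeled pool: (item id, predicted class probabilities).
pool = [
    ("s1", [0.90, 0.05, 0.05]),
    ("s2", [0.40, 0.35, 0.25]),
    ("s3", [0.60, 0.30, 0.10]),
]
selected = select_for_annotation(pool, 2)
```

In an active learning loop, the selected items would be sent to an annotator, added to the training set, and the model retrained before the next round of selection.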
Abstract:
Contemporary models of spoken word production assume conceptual feature sharing determines the speed with which objects are named in categorically-related contexts. However, statistical models of concept representation have also identified a role for feature distinctiveness, i.e., features that identify a single concept and serve to distinguish it quickly from other similar concepts. In three experiments we investigated whether distinctive features might explain reports of counter-intuitive semantic facilitation effects in the picture word interference (PWI) paradigm. In Experiment 1, categorically-related distractors matched in terms of semantic similarity ratings (e.g., zebra and pony) and manipulated with respect to feature distinctiveness (e.g., a zebra has stripes unlike other equine species) elicited interference effects of comparable magnitude. Experiments 2 and 3 investigated the role of feature distinctiveness with respect to reports of facilitated naming with part-whole distractor-target relations (e.g., a hump is a distinguishing part of a CAMEL, whereas knee is not, vs. an unrelated part such as plug). Related part distractors did not influence target picture naming latencies significantly when the part denoted by the related distractor was not visible in the target picture (whether distinctive or not; Experiment 2). When the part denoted by the related distractor was visible in the target picture, non-distinctive part distractors slowed target naming significantly at SOA of -150 ms (Experiment 3). Thus, our results show that semantic interference does occur for part-whole distractor-target relations in PWI, but only when distractors denote features shared with the target and other category exemplars. We discuss the implications of these results for some recently developed, novel accounts of lexical access in spoken word production.
Abstract:
An automated method for extracting brain volumes from three commonly acquired three-dimensional (3D) MR images (proton density, T1-weighted, and T2-weighted) of the human head is described. The procedure is divided into four levels: preprocessing, segmentation, scalp removal, and postprocessing. A user-provided reference point is the sole operator-dependent input required. The method's parameters were first optimized and then fixed and applied to 30 repeat data sets from 15 normal older adult subjects to investigate its reproducibility. Percent differences between total brain volumes (TBVs) for the subjects' repeated data sets ranged from 0.5% to 2.2%. We conclude that the method is both robust and reproducible and has the potential for wide application.