7 results for Real-world semantics
Abstract:
With Tweet volumes reaching 500 million a day, sampling is inevitable for any application using Twitter data. Recognizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced according to sample size. Big Data applications working with sampled data need a sample that is large enough to be representative of the universal dataset. Previous work on representativeness has focused on ensuring that the global occurrence rates of key terms can be reliably estimated from the sample. Present technology allows sample size estimation in accordance with probabilistic bounds on occurrence rates for the case of uniform random sampling. In this paper, we consider the problem of further improving sample size estimates by leveraging stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, establishing the superiority of our method over uniform random sampling. Our work provides the technical know-how for data providers to expand their portfolio to include stratified sampled datasets, while applications benefit from being able to monitor more topics/events at the same data and computing cost.
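The abstract does not state the bounds it uses, so the following is only a minimal sketch under stated assumptions: it uses the standard Hoeffding inequality for the uniform case and the textbook variance formulas for proportional stratified allocation (sampling with replacement, no finite-population correction). The function names and the example strata are hypothetical and are not taken from the paper.

```python
import math

def hoeffding_sample_size(eps, delta):
    """Smallest n such that a uniform-sample rate estimate lies within eps of
    the true occurrence rate with probability at least 1 - delta (Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def uniform_variance(weights, rates, n):
    """Variance of the estimated rate under uniform random sampling."""
    p = sum(w * r for w, r in zip(weights, rates))  # global occurrence rate
    return p * (1.0 - p) / n

def stratified_variance(weights, rates, n):
    """Variance under proportional allocation (about n * w_h samples in stratum h)."""
    return sum(w * r * (1.0 - r) for w, r in zip(weights, rates)) / n

# Hypothetical strata: half of the stream mentions a term 10% of the time,
# the other half mentions it 50% of the time.
weights, rates = [0.5, 0.5], [0.1, 0.5]
print(hoeffding_sample_size(0.01, 0.05))
print(uniform_variance(weights, rates, 1000))
print(stratified_variance(weights, rates, 1000))
```

When occurrence rates differ across strata, the stratified variance is smaller for the same n, which is the intuition behind needing a smaller sample for the same accuracy.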
Abstract:
Background There is increasing interest in how culture may affect the quality of healthcare services, and previous research has shown that ‘treatment culture’—of which there are three categories (resident centred, ambiguous and traditional)—in a nursing home may influence prescribing of psychoactive medications. Objective The objective of this study was to explore and understand treatment culture in prescribing of psychoactive medications for older people with dementia in nursing homes. Method Six nursing homes—two from each treatment culture category—participated in this study. Qualitative data were collected through semi-structured interviews with nursing home staff and general practitioners (GPs), which sought to determine participants’ views on prescribing and administration of psychoactive medication, and their understanding of treatment culture and its potential influence on prescribing of psychoactive drugs. Following verbatim transcription, the data were analysed and themes were identified, facilitated by NVivo and discussion within the research team. Results Interviews took place with five managers, seven nurses, 13 care assistants and two GPs. Four themes emerged: the characteristics of the setting, the characteristics of the individual, relationships and decision making. The characteristics of the setting were exemplified by views of the setting, daily routines and staff training. The characteristics of the individual were demonstrated by views on the personhood of residents and staff attitudes. Relationships varied between staff within and outside the home. These relationships appeared to influence decision making about prescribing of medications. The data analysis found that each home exhibited traits that were indicative of its respective assigned treatment culture. Conclusion Nursing home treatment culture appeared to be influenced by four main themes. 
Modification of these factors may lead to a shift in culture towards a more flexible, resident-centred culture and a reduction in prescribing and use of psychoactive medication.
Abstract:
This paper addresses the problem of colorectal tumour segmentation in complex real-world imagery. For efficient segmentation, a multi-scale strategy is developed for extracting the potentially cancerous region of interest (ROI) based on colour histograms while searching for the best texture resolution. To achieve better segmentation accuracy, we apply a novel bag-of-visual-words method based on rotation-invariant raw statistical features and random-projection-based l2-norm sparse representation to classify tumour areas in histopathology images. Experimental results on 20 real-world digital slides demonstrate that the proposed algorithm achieves better recognition accuracy than several state-of-the-art segmentation techniques.
Abstract:
Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering of QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure in QA datasets to improve clustering quality. Our method, MixKMeans, composes question-space and answer-space similarities so that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that semantic similarity between question-answer pairs (QAs) can be localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
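The abstract does not give the composition formula, so the sketch below is only one way to let the better-matching space dominate: a power mean of the two cosine similarities, which equals the arithmetic mean at p = 1 and approaches the maximum as p grows. The function names, the exponent p, and the choice of a power mean are assumptions for illustration, not the paper's definition of MixKMeans.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def compose(sim_q, sim_a, p=4.0):
    """Power mean of two similarities; negative values are clipped to zero.
    Larger p gives more weight to the larger of the two similarities."""
    sq, sa = max(sim_q, 0.0), max(sim_a, 0.0)
    return ((sq ** p + sa ** p) / 2.0) ** (1.0 / p)

def qa_similarity(qa, centroid, p=4.0):
    """Similarity between a (question_vec, answer_vec) pair and a centroid pair."""
    return compose(cosine(qa[0], centroid[0]), cosine(qa[1], centroid[1]), p)

# A pair that matches the centroid strongly on the question side only.
print(compose(0.9, 0.1, p=1.0))  # arithmetic mean
print(compose(0.9, 0.1, p=8.0))  # pulled toward the larger similarity
```

In a k-means style loop, a composed similarity of this kind would replace the usual single-space similarity when assigning each QA pair to its nearest cluster.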
Abstract:
Cybercriminals ramp up their efforts with sophisticated techniques while defenders gradually update their typical security measures. Attackers often have a long-term interest in their targets. However, factors such as scale, architecture and nonproductive traffic make it difficult to detect them using typical intrusion detection techniques. Cyber early warning systems (CEWS) aim to raise alerts about such attempts in their nascent stages using preliminary indicators. The design and implementation of such systems involve numerous research challenges, such as a generic set of indicators, intelligence gathering, uncertainty reasoning and information fusion. This paper discusses these challenges and presents the reader with compelling motivation. A carefully conducted empirical analysis using a real-world attack scenario and a real network traffic capture is also presented.
Abstract:
In acoustic instruments, the controller and the sound-producing system are often one and the same object. If virtual-acoustic instruments are to be designed not only to simulate the vibrational behaviour of a real-world counterpart but also to inherit much of its interface dynamics, it makes sense for the physical form of the controller to be similar to that of the emulated instrument. The specific physical model configuration discussed here reconnects a (silent) string controller with a modal synthesis string resonator across the real and virtual domains by direct routing of excitation signals and model parameters. The excitation signals are estimated in their original force-like form via careful calibration of the sensor, making use of adaptive filtering techniques to design an appropriate inverse filter. In addition, the excitation position is estimated from sensors mounted under the legs of the bridges on either end of the prototype string controller. The proposed methodology is explained and exemplified with preliminary results obtained from a number of off-line experiments.
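As a rough illustration of how excitation position feeds a modal string resonator, the sketch below sums exponentially damped sinusoids, one per mode, with each mode weighted by the ideal-string mode shape at the excitation point. It assumes an ideal string with harmonic modes and an impulse excitation; the decay constant and all function names are hypothetical and do not come from the paper.

```python
import math

def mode_weights(n_modes, position):
    """Ideal-string mode shapes sin(k * pi * x) at relative position x in [0, 1]."""
    return [math.sin(k * math.pi * position) for k in range(1, n_modes + 1)]

def modal_string(f0, n_modes, position, duration, sample_rate=44100, decay=3.0):
    """Impulse response of a bank of damped modes at harmonics of f0.
    Higher modes decay faster (decay rate proportional to mode number)."""
    weights = mode_weights(n_modes, position)
    n_samples = int(duration * sample_rate)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        sample = 0.0
        for k, w in enumerate(weights, start=1):
            sample += w * math.exp(-decay * k * t) * math.sin(2.0 * math.pi * k * f0 * t)
        out.append(sample)
    return out

# Plucking at the midpoint leaves the even modes (almost exactly) silent.
print(mode_weights(4, 0.5))
signal = modal_string(110.0, 4, 0.3, 0.5, sample_rate=8000)
print(len(signal))
```

In a real-time setting, the impulse would be replaced by the estimated force-like excitation signal, and the position estimate would update the mode weights.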
Abstract:
Person re-identification involves recognizing a person across non-overlapping camera views, with different pose, illumination, and camera characteristics. We propose to tackle this problem by training a deep convolutional network to represent a person’s appearance as a low-dimensional feature vector that is invariant to common appearance variations encountered in the re-identification problem. Specifically, a Siamese-network architecture is used to train a feature extraction network using pairs of similar and dissimilar images. We show that the use of a novel multi-task learning objective is crucial for regularizing the network parameters in order to prevent over-fitting due to the small size of the training dataset. We complement the verification task, which is at the heart of re-identification, by training the network to jointly perform verification and identification and to recognise attributes related to the clothing and pose of the person in each image. Additionally, we show that our proposed approach performs well even in the challenging cross-dataset scenario, which may better reflect expected real-world performance.
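The abstract does not specify the verification loss, so the sketch below assumes the widely used contrastive loss on pairs of feature vectors as one plausible stand-in: similar pairs are pulled together, and dissimilar pairs are pushed apart until they are at least a margin away. The function names and the margin value are hypothetical, and the identification and attribute terms of the multi-task objective are not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings.
    same=True  : penalise any distance between the two embeddings.
    same=False : penalise only distances smaller than the margin."""
    d = euclidean(a, b)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Two embeddings of the same person should be close; different people far apart.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))
```

In training, a loss of this kind would be averaged over a batch of image pairs whose embeddings come from the two weight-sharing branches of the Siamese network.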