22 results for Information Filtering, Pattern Mining, Relevance Feature Discovery, Text Mining

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Gait disturbances are a common feature of Parkinson’s disease, one of the most severe being freezing of gait. Sensory cueing is a common method used to facilitate stepping in people with Parkinson’s. Recent work has shown that, compared to walking to a metronome, Parkinson’s patients without freezing of gait (nFOG) showed reduced gait variability when imitating recorded sounds of footsteps made on gravel. However, it is not known if these benefits are realised through the continuity of the acoustic information or the action-relevance. Furthermore, no study has examined if these benefits extend to PD with freezing of gait. We prepared four different auditory cues (varying in action-relevance and acoustic continuity) and asked 19 Parkinson’s patients (10 nFOG, 9 with freezing of gait (FOG)) to step in place to each cue. Results showed a superiority of action-relevant cues (regardless of cue-continuity) for inducing reductions in Step coefficient of variation (CV). Acoustic continuity was associated with a significant reduction in Swing CV. Neither cue-continuity nor action-relevance was independently sufficient to increase the time spent stepping before freezing. However, combining both attributes in the same cue did yield significant improvements. This study demonstrates the potential of using action-sounds as sensory cues for Parkinson’s patients with freezing of gait. We suggest that the improvements shown might be considered audio-motor ‘priming’ (i.e., listening to the sounds of footsteps will engage sensorimotor circuitry relevant to the production of that same action, thus effectively bypassing the defective basal ganglia).
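The Step CV and Swing CV measures reported above are coefficients of variation. As a minimal sketch, assuming the usual definition (sample standard deviation divided by the mean, expressed as a percentage), the measure could be computed as follows. The function name and the step times are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(durations):
    """Return the CV in percent: 100 * sample SD / mean."""
    if len(durations) < 2:
        raise ValueError("need at least two values")
    m = mean(durations)
    if m == 0:
        raise ValueError("mean is zero, CV is undefined")
    return 100.0 * stdev(durations) / m


# Hypothetical step times in seconds (illustrative only).
steady = [0.50, 0.50, 0.50, 0.50]
variable = [0.40, 0.60, 0.45, 0.55]

cv_steady = coefficient_of_variation(steady)
cv_variable = coefficient_of_variation(variable)
```

A lower CV means more regular stepping, which is the direction of improvement the abstract describes.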

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We have determined the mitochondrial genotype of liver fluke present in Bison (Bison bonasus) from the herd maintained in the Bialowieza National Park in order to determine the origin of the infection. Our results demonstrated that the infrapopulations present in the bison were genetically diverse and were likely to have been derived from the population present in local cattle. From a consideration of the genetic structure of the liver fluke infrapopulations we conclude that the provision of hay at feeding stations may be implicated in the transmission of this parasite to the bison. This information may be of relevance to the successful management of the herd. © 2012 Elsevier B.V.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Malware detection is a growing problem particularly on the Android mobile platform due to its increasing popularity and accessibility to numerous third party app markets. This has also been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples that were performed using up to 10-gram opcode features showed that an f-measure of 98% is achievable using this approach.
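To make the idea of n-opcode features concrete, here is a minimal sketch of counting opcode n-grams in a sequence. It assumes the opcodes have already been extracted from an app as a list of strings. The function name and the sample sequence are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count every run of n consecutive opcodes in the sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sliding window of width n; sequences shorter than n yield no n-grams.
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Hypothetical opcode sequence (illustrative only).
sample = ["const", "invoke", "move", "const", "invoke", "return"]
bigrams = opcode_ngrams(sample, 2)
```

The resulting counts can serve as a feature vector for a classifier. No feature has to be defined by hand, which is the sense in which the abstract speaks of automated feature discovery.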

Relevance:

60.00% 60.00%

Publisher:

Abstract:

N-gram analysis is an approach that investigates the structure of a program using bytes, characters, or text strings. A key issue with N-gram analysis is feature selection amidst the explosion of features that occurs when N is increased. The experiments within this paper represent programs as operational code (opcode) density histograms gained through dynamic analysis. A support vector machine is used to create a reference model, against which two methods of feature reduction are evaluated: 'area of intersect' and 'subspace analysis using eigenvectors.' The findings show that the relationships between features are complex and that simple statistical filtering does not provide a viable approach. However, eigenvector subspace analysis produces a suitable filter.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

BACKGROUND: To date, there are no clinically reliable predictive markers of response to the current treatment regimens for advanced colorectal cancer. The aim of the current study was to compare and assess the power of transcriptional profiling using a generic microarray and a disease-specific transcriptome-based microarray. We also examined the biological and clinical relevance of the disease-specific transcriptome.

METHODS: DNA microarray profiling was carried out on isogenic sensitive and 5-FU-resistant HCT116 colorectal cancer cell lines using the Affymetrix HG-U133 Plus2.0 array and the Almac Diagnostics Colorectal cancer disease specific Research tool. In addition, DNA microarray profiling was also carried out on pre-treatment metastatic colorectal cancer biopsies using the colorectal cancer disease specific Research tool. The two microarray platforms were compared based on detection of probesets and biological information.

RESULTS: The results demonstrated that the disease-specific transcriptome-based microarray was able to out-perform the generic genomic-based microarray on a number of levels including detection of transcripts and pathway analysis. In addition, the disease-specific microarray contains a high percentage of antisense transcripts and further analysis demonstrated that a number of these exist in sense:antisense pairs. Comparison between cell line models and metastatic CRC patient biopsies further demonstrated that a number of the identified sense:antisense pairs were also detected in CRC patient biopsies, suggesting potential clinical relevance.

CONCLUSIONS: Analysis from our in vitro and clinical experiments has demonstrated that many transcripts exist in sense:antisense pairs including IGF2BP2, which may have a direct regulatory function in the context of colorectal cancer. While the functional relevance of the antisense transcripts has been established by many studies, their functional role is currently unclear; however, the numbers that have been detected by the disease-specific microarray would suggest that they may be important regulatory transcripts. This study has demonstrated the power of a disease-specific transcriptome-based approach and highlighted the potential novel biologically and clinically relevant information that is gained when using such a methodology.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

One of the major challenges in systems biology is to understand the complex responses of a biological system to external perturbations or internal signalling depending on its biological conditions. Genome-wide transcriptomic profiling of cellular systems under various chemical perturbations allows the manifestation of certain features of the chemicals through their transcriptomic expression profiles. The insights obtained may help to establish the connections between human diseases, associated genes and therapeutic drugs. The main objective of this study was to systematically analyse cellular gene expression data under various drug treatments to elucidate drug-feature specific transcriptomic signatures. We first extracted drug-related information (drug features) from the collected textual description of DrugBank entries using text-mining techniques. A novel statistical method employing orthogonal least square learning was proposed to obtain drug-feature-specific signatures by integrating gene expression with DrugBank data. To obtain robust signatures from noisy input datasets, a stringent ensemble approach was applied with the combination of three techniques: resampling, leave-one-out cross validation, and aggregation. The validation experiments showed that the proposed method has the capacity of extracting biologically meaningful drug-feature-specific gene expression signatures. It was also shown that most of the signature genes are connected with common hub genes by regulatory network analysis. The common hub genes were further shown to be related to general drug metabolism by Gene Ontology analysis. Each set of genes has relatively few interactions with other sets, indicating the modular nature of each signature and its drug-feature-specificity. Based on Gene Ontology analysis, we also found that each set of drug feature (DF)-specific genes was indeed enriched in biological processes related to the drug feature.
The results of these experiments demonstrated the potential of the method for predicting certain features of new drugs using their transcriptomic profiles, providing a useful methodological framework and a valuable resource for drug development and characterization.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

We address the problem of mining interesting phrases from subsets of a text corpus where the subset is specified using a set of features such as keywords that form a query. Previous algorithms for the problem have proposed solutions that involve sifting through a phrase dictionary based index or a document-based index where the solution is linear in either the phrase dictionary size or the size of the document subset. We propose the usage of an independence assumption between query keywords given the top correlated phrases, wherein the pre-processing could be reduced to discovering phrases from among the top phrases for each feature in the query. We then outline an indexing mechanism where per-keyword phrase lists are stored either on disk or in memory, so that popular aggregation algorithms such as No Random Access and Sort-merge Join may be adapted to do the scoring in real time to identify the top interesting phrases. Though such an approach is expected to be approximate, we empirically illustrate that very high accuracies (of over 90%) are achieved against the results of exact algorithms. Due to the simplified list-aggregation, we are also able to provide response times that are orders of magnitude better than state-of-the-art algorithms. Interestingly, our disk-based approach outperforms the in-memory baselines by up to a hundred times and sometimes more, confirming the superiority of the proposed method.
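The list-aggregation idea can be illustrated with a minimal sketch: each query keyword has a precomputed list of (phrase, score) pairs, and the lists are combined to rank phrases for the whole query. This sketch uses a plain dictionary aggregation, not No Random Access or Sort-merge Join. Summing per-keyword scores is an assumed combination rule, and the index contents and function name are hypothetical.

```python
import heapq
from collections import defaultdict


def top_phrases(index, query, k):
    """Rank phrases for a query by summing their per-keyword scores."""
    totals = defaultdict(float)
    for keyword in query:
        # Keywords missing from the index contribute nothing.
        for phrase, score in index.get(keyword, []):
            totals[phrase] += score
    # Ties are broken by phrase text so the result is deterministic.
    return heapq.nlargest(
        k, totals.items(), key=lambda item: (item[1], item[0])
    )


# Hypothetical per-keyword phrase lists (illustrative only).
index = {
    "malware": [("android malware", 0.9), ("feature selection", 0.4)],
    "android": [("android malware", 0.8), ("mobile platform", 0.5)],
}
result = top_phrases(index, ["malware", "android"], 2)
```

Because only the short per-keyword lists are read, the work depends on the list lengths rather than on the size of the phrase dictionary or the document subset.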

Relevance:

40.00% 40.00%

Publisher:

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The importance and use of text extraction from camera based coloured scene images is rapidly increasing with time. Text within a camera grabbed image can contain a huge amount of metadata about that scene. Such metadata can be useful for identification, indexing and retrieval purposes. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text is a new challenge for all camera based images. Common problems for text extraction from camera based images are the lack of prior knowledge of any kind of text features such as colour, font, size and orientation as well as the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used for any type of camera grabbed frame, be it a single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When the new text extraction technique was tested on a variety of camera based images it was found to outperform existing techniques. The proposed technique also overcomes any problems that can arise due to an unconstrained complex background. The novelty of the work arises from the fact that this is the first time that colour and spatial information are used simultaneously for the purpose of text extraction.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

In the last decade, data mining has emerged as one of the most dynamic and lively areas in information technology. Although many algorithms and techniques for data mining have been proposed, they either focus on domain independent techniques or on very specific domain problems. A general requirement in bridging the gap between academia and business is to cater to general domain-related issues surrounding real-life applications, such as constraints, organizational factors, domain expert knowledge, domain adaptation, and operational knowledge. Unfortunately, these either have not been addressed, or have not been sufficiently addressed, in current data mining research and development. Domain-Driven Data Mining (D3M) aims to develop general principles, methodologies, and techniques for modeling and merging comprehensive domain-related factors and synthesized ubiquitous intelligence surrounding problem domains with the data mining process, and discovering knowledge to support business decision-making. This paper aims to report original, cutting-edge, and state-of-the-art progress in D3M. It covers theoretical and applied contributions aiming to: 1) propose next-generation data mining frameworks and processes for actionable knowledge discovery, 2) investigate effective (automated, human and machine-centered and/or human-machine-cooperated) principles and approaches for acquiring, representing, modelling, and engaging ubiquitous intelligence in real-world data mining, and 3) develop workable and operational systems balancing technical significance and applications concerns, and converting and delivering actionable knowledge into operational applications rules to seamlessly engage application processes and systems.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

This study investigates face recognition with partial occlusion, illumination variation and their combination, assuming no prior information about the mismatch, and limited training data for each person. The authors extend their previous posterior union model (PUM) to give a new method capable of dealing with all these problems. PUM is an approach for selecting the optimal local image features for recognition to improve robustness to partial occlusion. The extension is in two stages. First, the authors extend PUM from a probability-based formulation to a similarity-based formulation, so that it operates with as little as a single training sample to offer robustness to partial occlusion. Second, they extend this new formulation to make it robust to illumination variation, and to combined illumination variation and partial occlusion, by a novel combination of multicondition relighting and optimal feature selection. To evaluate the new methods, a number of databases with various simulated and realistic occlusion/illumination mismatches have been used. The results have demonstrated the improved robustness of the new methods.