3 results for Molecular quantum similarity measures

in Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevance:

100.00%

Abstract:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approach, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.
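The idea of a dissimilarity-based representation can be illustrated with a minimal sketch. The abstract does not say which information-theoretic measure the authors use, so the example below assumes the normalized compression distance (NCD) computed with `zlib`; the function names `ncd`, `dissimilarity_representation`, and `nearest_neighbor_label` are hypothetical and chosen only for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Assumption: NCD stands in for the paper's unspecified
    information-theoretic dissimilarity measure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def dissimilarity_representation(texts, prototypes):
    """Map each text to its vector of dissimilarities to the prototypes."""
    return [[ncd(t, p) for p in prototypes] for t in texts]


def nearest_neighbor_label(vector, train_vectors, train_labels):
    """1-nearest-neighbor rule (squared Euclidean distance) in that space."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(vector, train_vectors[i]))

    best = min(range(len(train_vectors)), key=sq_dist)
    return train_labels[best]
```

No tokenization, stemming, or hand-built features appear anywhere in the sketch: raw strings go straight into the compressor, and any classical classifier can then operate on the resulting vectors.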

Relevance:

100.00%

Abstract:

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
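A relevance/redundancy filter of this general kind can be sketched as follows. The abstract does not name the criteria, so this example assumes variance as an unsupervised relevance score and absolute Pearson correlation as the redundancy measure; `select_features` and its `max_similarity` parameter are hypothetical names for illustration, not the authors' method.

```python
from math import sqrt


def variance(col):
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def abs_correlation(a, b):
    """Absolute Pearson correlation; 0.0 if either column is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return abs(cov) / norm if norm > 0 else 0.0


def select_features(X, max_features, max_similarity=0.9):
    """Rank features by relevance, then skip redundant ones.

    Assumptions: relevance is variance, and a candidate is redundant
    when its correlation with the most recently kept feature reaches
    max_similarity.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda j: variance(columns[j]), reverse=True)
    selected = []
    for j in ranked:
        if len(selected) == max_features:
            break
        last = selected[-1] if selected else None
        if last is None or abs_correlation(columns[j], columns[last]) < max_similarity:
            selected.append(j)
    return selected
```

Comparing each candidate only with the last kept feature needs one correlation per feature, so the cost grows linearly with the number of features after the initial sort. That is the kind of low-complexity trade-off the abstract alludes to, at the price of missing redundancy with features kept earlier.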

Relevance:

30.00%

Abstract:

Clinical and environmental samples from Portugal were screened for the presence of Aspergillus, and the distributions of the species complexes were determined in order to understand how they differ by source. Fifty-seven Aspergillus isolates from clinical samples were collected from 10 health institutions. Six species complexes were detected by internal transcribed spacer sequencing; Fumigati, Flavi, and Nigri were found most frequently (50.9%, 21.0%, and 15.8%, respectively). β-tubulin and calmodulin sequencing resulted in seven cryptic species (A. awamorii, A. brasiliensis, A. fructus, A. lentulus, A. sydowii, A. tubingensis, Emericella echinulata) being identified among the 57 isolates. Thirty-nine isolates of Aspergillus were recovered from beach sand and poultry farms, 31 from swine farms, and 80 from hospital environments, for a total of 189 isolates. Eleven species complexes were found in these 189 isolates, and those belonging to the Versicolores species complex were found most frequently (23.8%). There was a significant association between the different environmental sources and the distribution of the species complexes; the hospital environment had greater variability of species complexes than other environmental locations. A high prevalence of cryptic species within the Circumdati complex was detected in several environments; from the isolates analyzed, at least four cryptic species were identified, most of them growing at 37 °C. Because Aspergillus species complexes have different susceptibilities to antifungals, it is important to know the species-complex epidemiology for each setting and to identify cryptic species among the collected clinical isolates. This may allow preventive and corrective measures to be taken, which may result in decreased exposure to those organisms and a better prognosis.