18 results for Multimedia Data Mining
Abstract:
Well-known data mining algorithms rely on inputs in the form of pairwise similarities between objects. For large datasets it is computationally impossible to perform all pairwise comparisons. We therefore propose a novel approach that uses approximate Principal Component Analysis to efficiently identify groups of similar objects. The effectiveness of the approach is demonstrated in the context of binary classification using the supervised normalized cut as a classifier. For large datasets from the UCI repository, the approach significantly improves run times with minimal loss in accuracy.
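The idea of avoiding all pairwise comparisons can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the subsample-based PCA, the grid bucketing, the function name `candidate_pairs`, and all parameter defaults below are assumptions chosen for illustration. Objects are projected onto a few principal directions estimated from a random subsample, the projected space is cut into grid cells, and only objects sharing a cell are treated as candidates for a similarity computation.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def candidate_pairs(X, n_components=2, sample_size=100, grid_resolution=4, seed=0):
    """Return index pairs (i, j) whose low-dimensional projections share a grid cell.

    Hypothetical helper: PCA is approximated by fitting it on a random
    subsample of rows rather than on the full dataset.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Approximate PCA: estimate mean and principal directions from a subsample.
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    sample = X[idx]
    mean = sample.mean(axis=0)
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    components = vt[:n_components]

    # Project every object onto the leading directions.
    Z = (X - mean) @ components.T

    # Rescale each projected coordinate to [0, 1] and assign a grid cell.
    lo = Z.min(axis=0)
    span = Z.max(axis=0) - lo
    span[span == 0] = 1.0  # guard against constant coordinates
    cells = np.minimum(
        ((Z - lo) / span * grid_resolution).astype(int), grid_resolution - 1
    )

    # Group object indices by cell; only same-cell objects become candidates.
    buckets = defaultdict(list)
    for i, cell in enumerate(map(tuple, cells)):
        buckets[cell].append(i)

    pairs = []
    for members in buckets.values():
        pairs.extend(combinations(members, 2))
    return pairs
```

Similarities would then be computed only for the returned pairs. A coarser grid keeps more pairs (closer to the full comparison), while a finer grid keeps fewer at the risk of separating similar objects that fall on either side of a cell boundary.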
Abstract:
Biodiversity, a multidimensional property of natural systems, is difficult to quantify partly because of the multitude of indices proposed for this purpose. Indices aim to describe general properties of communities that allow us to compare different regions, taxa, and trophic levels. Therefore, they are of fundamental importance for environmental monitoring and conservation, although there is no consensus about which indices are more appropriate and informative. We tested several common diversity indices in a range of simple to complex statistical analyses in order to determine whether some were better suited for certain analyses than others. We used data collected around the focal plant Plantago lanceolata on 60 temperate grassland plots embedded in an agricultural landscape to explore relationships between the common diversity indices of species richness (S), Shannon's diversity (H'), Simpson's diversity (D₁), Simpson's dominance (D₂), Simpson's evenness (E), and Berger-Parker dominance (BP). We calculated each of these indices for herbaceous plants, arbuscular mycorrhizal fungi, aboveground arthropods, belowground insect larvae, and P. lanceolata molecular and chemical diversity. Including these trait-based measures of diversity allowed us to test whether or not they behaved similarly to the better studied species diversity. We used path analysis to determine whether compound indices detected more relationships between diversities of different organisms and traits than more basic indices. In the path models, more paths were significant when using H', even though all models except that with E were equally reliable. This demonstrates that while common diversity indices may appear interchangeable in simple analyses, when considering complex interactions, the choice of index can profoundly alter the interpretation of results.
Data mining in order to identify the index producing the most significant results should be avoided, but simultaneously considering analyses using multiple indices can provide greater insight into the interactions in a system.
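To make the indices named in this abstract concrete, here is a minimal sketch that computes them from a list of species abundances. The abstract does not state the formulas, so the definitions used below are assumptions based on common textbook forms: Shannon's diversity as −Σ pᵢ ln pᵢ, Simpson's diversity as 1 − Σ pᵢ², Simpson's dominance as 1 / Σ pᵢ², Simpson's evenness as that reciprocal divided by richness, and Berger-Parker dominance as the largest relative abundance. The function name and dictionary keys are hypothetical.

```python
import math


def diversity_indices(abundances):
    """Compute common diversity indices from per-species abundance counts.

    Formulas are assumed textbook variants, not taken from the abstract.
    """
    counts = [c for c in abundances if c > 0]  # absent species do not count
    if not counts:
        raise ValueError("at least one species must have positive abundance")

    total = sum(counts)
    p = [c / total for c in counts]            # relative abundances
    sum_p_squared = sum(pi * pi for pi in p)
    richness = len(counts)

    return {
        "richness": richness,                                   # S
        "shannon": -sum(pi * math.log(pi) for pi in p),         # H'
        "simpson_diversity": 1.0 - sum_p_squared,               # 1 - sum(p^2)
        "simpson_dominance": 1.0 / sum_p_squared,               # 1 / sum(p^2)
        "simpson_evenness": (1.0 / sum_p_squared) / richness,   # E
        "berger_parker": max(p),                                # BP
    }
```

For a perfectly even community of four species, every compound index sits at its extreme (evenness of 1, Berger-Parker of 0.25). Skewing the abundances moves the indices by different amounts, which is one reason they can lead to different conclusions in more complex analyses.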
Abstract:
NH···π hydrogen bonds occur frequently between the amino acid side groups in proteins and peptides. Data-mining studies of protein crystals find that ~80% of the T-shaped histidine···aromatic contacts are CH···π, and only ~20% are NH···π interactions. We investigated the infrared (IR) and ultraviolet (UV) spectra of the supersonic-jet-cooled imidazole·benzene (Im·Bz) complex as a model for the NH···π interaction between histidine and phenylalanine. Ground- and excited-state dispersion-corrected density functional calculations and correlated methods (SCS-MP2 and SCS-CC2) predict that Im·Bz has a Cs-symmetric T-shaped minimum-energy structure with an NH···π hydrogen bond to the Bz ring; the NH bond is tilted 12° away from the Bz C₆ axis. IR depletion spectra support the T-shaped geometry: The NH stretch vibrational fundamental is red shifted by −73 cm⁻¹ relative to that of bare imidazole at 3518 cm⁻¹, indicating a moderately strong NH···π interaction. While the S₀(A₁g) → S₁(B₂u) origin of benzene at 38 086 cm⁻¹ is forbidden in the gas phase, Im·Bz exhibits a moderately intense S₀ → S₁ origin, which appears via the D₆h → Cs symmetry lowering of Bz by its interaction with imidazole. The NH···π ground-state hydrogen bond is strong, De = 22.7 kJ/mol (1899 cm⁻¹). The combination of gas-phase UV and IR spectra confirms the theoretical predictions that the optimum Im·Bz geometry is T shaped and NH···π hydrogen bonded. We find no experimental evidence for a CH···π hydrogen-bonded ground-state isomer of Im·Bz. The optimum NH···π geometry of the Im·Bz complex is very different from the majority of the histidine·aromatic contact geometries found in protein database analyses, implying that the CH···π contacts observed in these searches do not arise from favorable binding interactions but merely from protein side-chain folding and crystal-packing constraints. The UV and IR spectra of the imidazole·(benzene)₂ cluster are observed via fragmentation into the Im·Bz⁺ mass channel.
The spectra of Im·Bz and Im·Bz₂ are cleanly separable by IR hole burning. The UV spectrum of Im·Bz₂ exhibits two 0⁰₀ bands corresponding to the S₀ → S₁ excitations of the two inequivalent benzenes, which are symmetrically shifted by −86/+88 cm⁻¹ relative to the 0⁰₀ band of benzene.
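The unit relationships quoted in this abstract can be checked with a short sketch. It converts the dissociation energy of 22.7 kJ/mol to wavenumbers using the exact SI values of the Planck constant, the speed of light, and the Avogadro constant, and derives absolute band positions from the quoted shifts. The function name and variable names are hypothetical, and the derived positions are plain arithmetic on the numbers in the abstract, not values reported by the authors. The conversion gives roughly 1898 cm⁻¹; the small gap to the quoted 1899 cm⁻¹ presumably reflects rounding of the energy to three significant figures.

```python
# Exact SI defining constants (2019 redefinition).
PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_M_S = 299_792_458.0
AVOGADRO_PER_MOL = 6.02214076e23


def kj_per_mol_to_wavenumber(energy_kj_per_mol):
    """Convert a molar energy in kJ/mol to a wavenumber in cm^-1."""
    joules_per_molecule = energy_kj_per_mol * 1000.0 / AVOGADRO_PER_MOL
    per_metre = joules_per_molecule / (PLANCK_J_S * LIGHT_SPEED_M_S)
    return per_metre / 100.0  # m^-1 -> cm^-1


# Dissociation energy quoted in the abstract.
binding_energy_cm = kj_per_mol_to_wavenumber(22.7)

# NH stretch of the complex: bare imidazole value plus the quoted red shift.
nh_stretch_complex_cm = 3518 - 73

# Two origin bands of the benzene dimer cluster, shifted from the benzene origin.
benzene_origin_cm = 38086
cluster_origins_cm = (benzene_origin_cm - 86, benzene_origin_cm + 88)
```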