91 results for language diversity
Abstract:
Parallel sub-word recognition (PSWR) is a new model proposed for language identification (LID) that does not need elaborate phonetic labeling of the speech data in a foreign language. The new approach performs a front-end tokenization in terms of sub-word units, which are designed by automatic segmentation, segment clustering and segment HMM modeling. We develop PSWR-based LID in a framework similar to the parallel phone recognition (PPR) approach in the literature. This includes a front-end tokenizer and a back-end language model for each language to be identified. Considering various combinations of the statistical evaluation scores, we find that PSWR can perform as well as PPR, even with broad acoustic sub-word tokenization, making it an efficient alternative to the PPR system.
Abstract:
Dielectric dispersion and NMRD experiments have revealed that a significant fraction of water molecules in the hydration shell of various proteins do not exhibit any slowing down of dynamics. This is usually attributed to the presence of the hydrophobic residues (HBR) on the surface, although HBRs alone cannot account for the large amplitude of the fast component. Solvation dynamics experiments and computer simulation studies, on the other hand, have repeatedly observed a non-negligible slow component. Here we show, by considering three well-known proteins (lysozyme, myoglobin and adenylate kinase), that the fast component arises partly from the response of those water molecules that are hydrogen bonded with the backbone oxygen (BBO) atoms. These are structurally and energetically less stable than those hydrogen bonded with the side chain oxygen (SCO) atoms. In addition, the electrostatic interaction energy distribution (EIED) of individual water molecules (hydrogen bonded to SCO) with side chain oxygen atoms shows a surprising two-peak character, with the lower energy peak almost coincident with the energy distribution of water hydrogen bonded to BBO atoms. This two-peak contribution appears to be quite general, as we find it for lysozyme, myoglobin and adenylate kinase (ADK). The sharp peak of the EIED at small energy (less than 2 $k_B T$) for the BBO atoms, together with the first peak of the EIED of SCO and the HBRs on the protein surface, explains why a large fraction (~80%) of water in the protein hydration layer remains almost as mobile as bulk water. Significant slowness arises only from the hydrogen bonds that populate the second peak of the EIED at larger energy (about 4 $k_B T$). Thus, if we consider hydrogen bond interaction alone, only 15-20% of water molecules in the protein hydration layer can exhibit slow dynamics, resulting in an average relaxation time of about 5-10 ps. The latter estimate assumes a time constant of 20-100 ps for the slow component. Interestingly, relaxation of water molecules hydrogen bonded to backbone oxygen exhibits an initial component faster than the bulk, suggesting that hydrogen bonding of these water molecules remains frustrated. This explanation of the heterogeneous and non-exponential dynamics of water in the hydration layer is quantitatively consistent with all the available experimental results, and provides unification among diverse features.
Abstract:
Niche differentiation has been proposed as an explanation for rarity in species assemblages. Testing this hypothesis requires quantifying the ecological similarity of species. This similarity can potentially be estimated by using phylogenetic relatedness. In this study, we predicted that if niche differentiation does explain the co-occurrence of rare and common species, then rare species should contribute greatly to the overall community phylogenetic diversity (PD), abundance should have phylogenetic signal, and common and rare species should be phylogenetically dissimilar. We tested these predictions by developing a novel method that integrates species rank abundance distributions with phylogenetic trees and trend analyses to examine the relative contribution of individual species to the overall community PD. We then supplemented this approach with analyses of phylogenetic signal in abundances and measures of phylogenetic similarity within and between rare and common species groups. We applied this analytical approach to 15 long-term temperate and tropical forest dynamics plots from around the world. We show that the niche differentiation hypothesis is supported in six of the nine gap-dominated forests but is rejected in the six disturbance-dominated and three gap-dominated forests. We also show that the three metrics utilized in this study each provide unique but corroborating information regarding the phylogenetic distribution of rarity in communities.
Abstract:
Background: There has been growing interest in integrative taxonomy that uses data from multiple disciplines for species delimitation. Typically, in such studies, monophyly is taken as a proxy for taxonomic distinctiveness and these units are treated as potential species. However, monophyly could arise due to stochastic processes. Thus, here we have employed a recently developed tool based on a coalescent approach to ascertain the taxonomic distinctiveness of various monophyletic units. Subsequently, the species status of these taxonomic units was further tested using corroborative evidence from morphology and ecology. This inter-disciplinary approach was implemented on endemic centipedes of the genus Digitipes (Attems 1930) from the Western Ghats (WG) biodiversity hotspot of India. The species of the genus Digitipes are morphologically conserved, despite their ancient late Cretaceous origin. Principal Findings: Our coalescent analysis based on the mitochondrial dataset indicated the presence of nine putative species. The integrative approach, which includes nuclear, morphological and climate datasets, supported the distinctiveness of eight putative species, of which three represent described species and five are new species. Among the five new species, three are morphologically cryptic, emphasizing the effectiveness of this approach in discovering cryptic diversity in less explored areas of the tropics like the WG. In addition, species pairs showed variable divergence along the molecular, morphological and climate axes. Conclusions: The multidisciplinary approach illustrated here is successful in discovering cryptic diversity, with an indication that invertebrate species richness for the WG might be currently underestimated. Additionally, the variable divergence of each species pair across the disciplines highlights the importance of measuring multiple secondary properties of species while defining species boundaries.
Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences
Abstract:
Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
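The idea of scanning a sequence with n-gram counts and windowed perplexity can be sketched as follows. This is a minimal illustration only: it uses plain Python counters instead of a suffix array, add-one smoothing, and hypothetical function names (`ngram_counts`, `window_perplexity`, `scan`), none of which are taken from the toolkit itself. It assumes `n >= 2` and a four-letter DNA alphabet.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(window, counts, context_counts, n, alphabet_size=4):
    """Perplexity of one window under an add-one-smoothed n-gram model.

    counts holds n-gram counts and context_counts holds (n-1)-gram counts,
    both gathered over the whole sequence (an assumption of this sketch).
    """
    log_prob = 0.0
    num = 0
    for i in range(len(window) - n + 1):
        gram = window[i:i + n]
        # P(last symbol | previous n-1 symbols), add-one smoothed
        p = (counts[gram] + 1) / (context_counts[gram[:-1]] + alphabet_size)
        log_prob += math.log2(p)
        num += 1
    return 2 ** (-log_prob / num) if num else float("nan")


def scan(seq, n=3, width=12, step=6):
    """Return (start, perplexity) for each sliding window over seq."""
    counts = ngram_counts(seq, n)
    context_counts = ngram_counts(seq, n - 1)
    return [
        (start, window_perplexity(seq[start:start + width], counts, context_counts, n))
        for start in range(0, len(seq) - width + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: an irregular stretch followed by an AT repeat.
    toy = "ACGTGCTAGCATCGGATCCA" + "AT" * 12
    for start, ppl in scan(toy):
        print(start, round(ppl, 2))
```

Windows that fall inside the repeat should score a lower perplexity than windows over the irregular stretch, which is the kind of signal a windowed scan can flag as a repetitive region.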
Abstract:
In this letter, we analyze the Diversity-Multiplexing gain Tradeoff (DMT) performance of a training-based reciprocal Single Input Multiple Output (SIMO) system. Assuming Channel State Information (CSI) is available at the Receiver (CSIR), we propose a channel-dependent power-controlled Reverse Channel Training (RCT) scheme that enables the transmitter to directly estimate the power control parameter to be used for the forward-link data transmission. We show that, with an RCT power of $\bar{P}^{\gamma}$, $\gamma > 0$, and a forward data transmission power of $\bar{P}$, our proposed scheme achieves an infinite diversity order for $0 \le g_m < \frac{L_c - L_{B,\tau}}{L_c}\min(\gamma, 1)$ and $r > 2$, where $g_m$ is the multiplexing gain, $L_c$ is the channel coherence time, $L_{B,\tau}$ is the RCT duration and $r$ is the number of receive antennas. We also derive an upper bound on the outage probability and show that it goes to zero asymptotically as $\exp(-\bar{P}^{E})$, where $E \triangleq \gamma - \frac{g_m L_c}{L_c - L_{B,\tau}}$, at high $\bar{P}$. Thus, the proposed scheme achieves a significantly better DMT performance compared to the finite diversity order achieved by channel-agnostic, fixed-power RCT schemes.
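To make the stated condition concrete, the short sketch below evaluates the upper limit on the multiplexing gain, (L_c - L_B,tau)/L_c * min(gamma, 1), and the outage exponent E = gamma - g_m * L_c / (L_c - L_B,tau) for example numbers. The function names and the numerical values are illustrative assumptions, not taken from the letter.

```python
def max_multiplexing_gain(L_c, L_B_tau, gamma):
    """Upper limit on g_m in the infinite-diversity condition of the abstract."""
    return (L_c - L_B_tau) / L_c * min(gamma, 1.0)


def outage_exponent(g_m, L_c, L_B_tau, gamma):
    """E = gamma - g_m * L_c / (L_c - L_B_tau), as defined in the abstract."""
    return gamma - g_m * L_c / (L_c - L_B_tau)


if __name__ == "__main__":
    # Hypothetical numbers: coherence time of 100 symbols, 10 of them spent on RCT.
    L_c, L_B_tau, gamma = 100, 10, 1.0
    print(max_multiplexing_gain(L_c, L_B_tau, gamma))
    print(outage_exponent(0.45, L_c, L_B_tau, gamma))
```

For any g_m strictly below the limit, E is positive, so the bound exp(-P̄^E) decays faster than any power of P̄ as P̄ grows, which is what an infinite diversity order means here.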
Abstract:
We consider a complex additive white Gaussian noise channel with flat fading. We study its diversity order versus transmission rate for some known power allocation schemes. The capacity region is divided into three regions. For one power allocation scheme, the diversity order is exponential throughout the capacity region. For the selective channel inversion (SCI) scheme, the diversity order is exponential in the low- and high-rate regions but polynomial in the mid-rate region. For the fast fading case, we also provide a new upper bound on the block error probability and a power allocation scheme that minimizes it. This scheme has the same diversity order behaviour as SCI but provides a lower BER than the other policies.
Abstract:
A low complexity, essentially-ML decoding technique for the Golden code and the three antenna Perfect code was introduced by Sirianunpiboon, Howard and Calderbank. Though no theoretical analysis of the decoder was given, the simulations showed that this decoding technique has almost maximum-likelihood (ML) performance. Inspired by this technique, in this paper we introduce two new low complexity decoders for Space-Time Block Codes (STBCs): the Adaptive Conditional Zero-Forcing (ACZF) decoder and the ACZF decoder with successive interference cancellation (ACZF-SIC), which include as a special case the decoding technique of Sirianunpiboon et al. We show that both ACZF and ACZF-SIC decoders are capable of achieving full diversity, and we give a set of sufficient conditions for an STBC to give full diversity with these decoders. We then show that the Golden code, the three and four antenna Perfect codes, the three antenna Threaded Algebraic Space-Time code and the four antenna rate 2 code of Srinath and Rajan are all full-diversity ACZF/ACZF-SIC decodable with complexity strictly less than that of their ML decoders. Simulations show that the proposed decoding method performs identically to ML decoding for all five of these codes. These STBCs along with the proposed decoding algorithm have the least decoding complexity and best error performance among all known codes for transmit antennas. We further provide a lower bound on the complexity of full-diversity ACZF/ACZF-SIC decoding. All five codes listed above achieve this lower bound and hence are optimal in terms of minimizing the ACZF/ACZF-SIC decoding complexity. Both ACZF and ACZF-SIC decoders are amenable to sphere decoding implementation.
Abstract:
Users can rarely reveal their information need in full detail to a search engine within 1-2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.
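For readers unfamiliar with the "marginal utility plus greedy selection" family that the abstract contrasts itself with, the sketch below shows a basic max-marginal relevance (MMR) selection. It illustrates one of the baselines only, not the resistive-graph method proposed in the paper; the function name, the trade-off parameter `lam`, and the toy scores are assumptions made for illustration.

```python
def mmr(query_sim, pairwise_sim, k, lam=0.7):
    """Greedy max-marginal relevance selection.

    query_sim[i]      : relevance of item i to the query
    pairwise_sim[i][j]: similarity between items i and j
    lam               : weight on relevance versus redundancy (assumed parameter)
    Returns the indices of up to k selected items, in selection order.
    """
    selected = []
    candidates = set(range(len(query_sim)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
    relevance = [0.9, 0.85, 0.5]
    similarity = [
        [1.0, 0.95, 0.1],
        [0.95, 1.0, 0.1],
        [0.1, 0.1, 1.0],
    ]
    print(mmr(relevance, similarity, k=2, lam=0.5))
    print(mmr(relevance, similarity, k=2, lam=1.0))
```

With `lam=1.0` the selection reduces to ranking by relevance alone, so the near-duplicate is returned; lowering `lam` penalizes redundancy and lets the dissimilar item take the second slot.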
Abstract:
Recently, Guo and Xia introduced low complexity decoders called Partial Interference Cancellation (PIC) and PIC with Successive Interference Cancellation (PIC-SIC), which include the Zero Forcing (ZF) and ZF-SIC receivers as special cases, for point-to-point MIMO channels. In this paper, we show that PIC and PIC-SIC decoders are capable of achieving the full cooperative diversity available in wireless relay networks. We give sufficient conditions for a Distributed Space-Time Block Code (DSTBC) to achieve full diversity with PIC and PIC-SIC decoders and construct a new class of DSTBCs with low complexity full-diversity PIC-SIC decoding using complex orthogonal designs. The new class of codes includes a number of known full-diversity PIC/PIC-SIC decodable Space-Time Block Codes (STBCs) constructed for point-to-point channels as special cases. The proposed DSTBCs achieve higher rates (in complex symbols per channel use) than the multigroup ML decodable DSTBCs available in the literature. Simulation results show that the proposed codes have better bit error rate performance than the best known low complexity, full-diversity DSTBCs.
Abstract:
In this paper, we consider a slow-fading $n_t \times n_r$ multiple-input multiple-output (MIMO) channel subjected to block fading. Reliability (in terms of achieved diversity order) and rate (in number of symbols transmitted per channel use) are of interest in such channels. We propose a new precoding scheme which achieves both full diversity ($n_t \times n_r$-th order diversity) and full rate ($n_t$ symbols per channel use) using partial channel state information at the transmitter (CSIT). The proposed scheme achieves full diversity and improved coding gain through an optimization over the choice of constellation sets. The optimization maximizes $d_{\min}^2$ for our precoding scheme subject to an energy constraint. The scheme requires feedback of $n_t - 1$ angle parameter values, compared to $2 n_t n_r$ real coefficients in the case of full CSIT. Further, for the case of an $n_t \times 1$ system, we prove that the capacity achieved by the proposed scheme is the same as that achieved with full CSIT. Error rate performance results for $n_t = 3, 4, 8$ show that the proposed scheme performs better than other precoding schemes in the literature; the better performance is due to the choice of the signal sets and the feedback angles in the proposed scheme.
Abstract:
Recent work on molecular phylogenetics of Scolopendridae from the Western Ghats, Peninsular India, has suggested the presence of six cryptic species of the otostigmine Digitipes Attems, 1930, together with three species described in previous taxonomic work by Jangi and Dass (1984). Digitipes is the correct generic attribution for a monophyletic group of Indian species, these being united with three species from tropical Africa (including the type) that share a distomedial process on the ultimate leg femur of males that is otherwise unknown in Otostigminae. Second maxillary characters previously used in the diagnosis of Digitipes are dismissed because Indian species do not possess the putatively diagnostic character states. Two new species from the Western Ghats that correspond to groupings identified based on monophyly, sequence divergence and coalescent analysis using molecular data are diagnosed based on distinct morphological characters. They are D. jangii and D. periyarensis n. spp. Three species named by Jangi and Dass (Digitipes barnabasi, D. coonoorensis and D. indicus) are revised based on new collections; D. indicus is a junior subjective synonym of Arthrorhabdus jonesii Verhoeff, 1938, the combination becoming Digitipes jonesii (Verhoeff, 1938) n. comb. The presence of Arthrorhabdus in India is accordingly refuted. Three putative species delimited by molecular and ecological data remain cryptic from the perspective of diagnostic morphological characters and are presently retained in D. barnabasi, D. jangii and D. jonesii. A molecularly-delimited species that resolved as sister group to a well-supported clade of Indian Digitipes is identified as Otostigmus ruficeps Pocock, 1890, originally described from a single specimen and revised herein. One Indian species originally assigned to Digitipes, D. gravelyi, deviates from confidently-assigned Digitipes with respect to several characters and is reassigned to Otostigmus, as O. gravelyi (Jangi and Dass, 1984) n. comb.
Abstract:
In this paper, we propose modulation diversity techniques for Spatial Modulation (SM) systems using the Complex Interleaved Orthogonal Design (CIOD). Specifically, we show that the standard SM scheme can achieve a transmit diversity order of two by using the CIOD meant for a two-transmit-antenna system, without incurring any additional system complexity or bandwidth requirement. Furthermore, we propose a low-complexity maximum likelihood detector for our CIOD-based SM schemes by exploiting the structure of the CIOD. Our simulation results show that the proposed schemes offer a transmit diversity order of two and give a better symbol error rate performance than the conventional SM scheme.
Abstract:
N-gram language models and lexicon-based word recognition are popular methods in the literature for improving recognition accuracies of online and offline handwritten data. However, very few works deal with the application of these techniques to online Tamil handwritten data. In this paper, we explore methods of developing symbol-level language models and a lexicon from a large Tamil text corpus, and their application to improving symbol and word recognition accuracies. On a test database of around 2000 words, we find that bigram language models improve symbol (3%) and word (8%) recognition accuracies. Lexicon methods offer much greater improvements (30%) in word recognition, but they depend heavily on choosing the right lexicon. For comparison with lexicon and language model based methods, we have also explored re-evaluation techniques, which involve the use of expert classifiers to improve symbol and word recognition accuracies.
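One common way to apply a lexicon to recognizer output is to replace the recognized symbol string with its nearest lexicon entry under edit distance. The sketch below shows that idea only; it is an assumption about how such a correction step might look, not the method used in the paper, and the function names and the romanized toy words are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, sym_a in enumerate(a, 1):
        curr = [i]
        for j, sym_b in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                     # deletion
                curr[j - 1] + 1,                 # insertion
                prev[j - 1] + (sym_a != sym_b),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def correct_with_lexicon(recognized, lexicon):
    """Return the lexicon word closest to the recognized string.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(lexicon, key=lambda word: (edit_distance(recognized, word), word))


if __name__ == "__main__":
    lexicon = ["amma", "appa", "akka"]  # toy romanized entries, for illustration
    print(correct_with_lexicon("amna", lexicon))
```

The sketch also shows why the choice of lexicon matters: if the intended word is missing from the lexicon, the nearest entry is still returned, so a correct recognition can be turned into an error.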