973 resultados para Compressed text search
Resumo:
Protein structure comparison is essential for understanding various aspects of protein structure, function and evolution. It can be used to explore the structural diversity and evolutionary patterns of protein families. In view of the above, a new algorithm is proposed which performs faster protein structure comparison using the peptide backbone torsional angles. It is fast, robust, computationally less expensive and efficient in finding structural similarities between two different protein structures and is also capable of identifying structural repeats within the same protein molecule.
Resumo:
Our everyday visual experience frequently involves searching for objects in clutter. Why are some searches easy and others hard? It is generally believed that the time taken to find a target increases as it becomes similar to its surrounding distractors. Here, I show that while this is qualitatively true, the exact relationship is in fact not linear. In a simple search experiment, when subjects searched for a bar differing in orientation from its distractors, search time was inversely proportional to the angular difference in orientation. Thus, rather than taking search reaction time (RT) to be a measure of target-distractor similarity, we can literally turn search time on its head (i.e. take its reciprocal 1/RT) to obtain a measure of search dissimilarity that varies linearly over a large range of target-distractor differences. I show that this dissimilarity measure has the properties of a distance metric, and report two interesting insights come from this measure: First, for a large number of searches, search asymmetries are relatively rare and when they do occur, differ by a fixed distance. Second, search distances can be used to elucidate object representations that underlie search - for example, these representations are roughly invariant to three-dimensional view. Finally, search distance has a straightforward interpretation in the context of accumulator models of search, where it is proportional to the discriminative signal that is integrated to produce a response. This is consistent with recent studies that have linked this distance to neuronal discriminability in visual cortex. Thus, while search time remains the more direct measure of visual search, its reciprocal also has the potential for interesting and novel insights. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
We consider a visual search problem studied by Sripati and Olson where the objective is to identify an oddball image embedded among multiple distractor images as quickly as possible. We model this visual search task as an active sequential hypothesis testing problem (ASHT problem). Chernoff in 1959 proposed a policy in which the expected delay to decision is asymptotically optimal. The asymptotics is under vanishing error probabilities. We first prove a stronger property on the moments of the delay until a decision, under the same asymptotics. Applying the result to the visual search problem, we then propose a ``neuronal metric'' on the measured neuronal responses that captures the discriminability between images. From empirical study we obtain a remarkable correlation (r = 0.90) between the proposed neuronal metric and speed of discrimination between the images. Although this correlation is lower than with the L-1 metric used by Sripati and Olson, this metric has the advantage of being firmly grounded in formal decision theory.
Resumo:
In pay-per-click sponsored search auctions which are currently extensively used by search engines, the auction for a keyword involves a certain number of advertisers (say k) competing for available slots (say m) to display their advertisements (ads for short). A sponsored search auction for a keyword is typically conducted for a number of rounds (say T). There are click probabilities mu(ij) associated with each agent slot pair (agent i and slot j). The search engine would like to maximize the social welfare of the advertisers, that is, the sum of values of the advertisers for the keyword. However, the search engine does not know the true values advertisers have for a click to their respective advertisements and also does not know the click probabilities. A key problem for the search engine therefore is to learn these click probabilities during the initial rounds of the auction and also to ensure that the auction mechanism is truthful. Mechanisms for addressing such learning and incentives issues have recently been introduced. These mechanisms, due to their connection to the multi-armed bandit problem, are aptly referred to as multi-armed bandit (MAB) mechanisms. When m = 1, exact characterizations for truthful MAB mechanisms are available in the literature. Recent work has focused on the more realistic but non-trivial general case when m > 1 and a few promising results have started appearing. In this article, we consider this general case when m > 1 and prove several interesting results. Our contributions include: (1) When, mu(ij)s are unconstrained, we prove that any truthful mechanism must satisfy strong pointwise monotonicity and show that the regret will be Theta T7) for such mechanisms. (2) When the clicks on the ads follow a certain click precedence property, we show that weak pointwise monotonicity is necessary for MAB mechanisms to be truthful. (3) If the search engine has a certain coarse pre-estimate of mu(ij) values and wishes to update them during the course of the T rounds, we show that weak pointwise monotonicity and type-I separatedness are necessary while weak pointwise monotonicity and type-II separatedness are sufficient conditions for the MAB mechanisms to be truthful. (4) If the click probabilities are separable into agent-specific and slot-specific terms, we provide a characterization of MAB mechanisms that are truthful in expectation.
Resumo:
How do we perform rapid visual categorization?It is widely thought that categorization involves evaluating the similarity of an object to other category items, but the underlying features and similarity relations remain unknown. Here, we hypothesized that categorization performance is based on perceived similarity relations between items within and outside the category. To this end, we measured the categorization performance of human subjects on three diverse visual categories (animals, vehicles, and tools) and across three hierarchical levels (superordinate, basic, and subordinate levels among animals). For the same subjects, we measured their perceived pair-wise similarities between objects using a visual search task. Regardless of category and hierarchical level, we found that the time taken to categorize an object could be predicted using its similarity to members within and outside its category. We were able to account for several classic categorization phenomena, such as (a) the longer times required to reject category membership; (b) the longer times to categorize atypical objects; and (c) differences in performance across tasks and across hierarchical levels. These categorization times were also accounted for by a model that extracts coarse structure from an image. The striking agreement observed between categorization and visual search suggests that these two disparate tasks depend on a shared coarse object representation.
Resumo:
This article considers a class of deploy and search strategies for multi-robot systems and evaluates their performance. The application framework used is deployment of a system of autonomous mobile robots equipped with required sensors in a search space to gather information. The lack of information about the search space is modelled as an uncertainty density distribution. The agents are deployed to maximise single-step search effectiveness. The centroidal Voronoi configuration, which achieves a locally optimal deployment, forms the basis for sequential deploy and search (SDS) and combined deploy and search (CDS) strategies. Completeness results are provided for both search strategies. The deployment strategy is analysed in the presence of constraints on robot speed and limit on sensor range for the convergence of trajectories with corresponding control laws responsible for the motion of robots. SDS and CDS strategies are compared with standard greedy and random search strategies on the basis of time taken to achieve reduction in the uncertainty density below a desired level. The simulation experiments reveal several important issues related to the dependence of the relative performances of the search strategies on parameters such as the number of robots, speed of robots and their sensor range limits.
Resumo:
The present approach uses stopwords and the gaps that oc- cur between successive stopwords –formed by contentwords– as features for sentiment classification.
Resumo:
Low-complexity near-optimal detection of signals in MIMO systems with large number (tens) of antennas is getting increased attention. In this paper, first, we propose a variant of Markov chain Monte Carlo (MCMC) algorithm which i) alleviates the stalling problem encountered in conventional MCMC algorithm at high SNRs, and ii) achieves near-optimal performance for large number of antennas (e.g., 16×16, 32×32, 64×64 MIMO) with 4-QAM. We call this proposed algorithm as randomized MCMC (R-MCMC) algorithm. Second, we propose an other algorithm based on a random selection approach to choose candidate vectors to be tested in a local neighborhood search. This algorithm, which we call as randomized search (RS) algorithm, also achieves near-optimal performance for large number of antennas with 4-QAM. The complexities of the proposed R-MCMC and RS algorithms are quadratic/sub-quadratic in number of transmit antennas, which are attractive for detection in large-MIMO systems. We also propose message passing aided R-MCMC and RS algorithms, which are shown to perform well for higher-order QAM.
Resumo:
In this paper, a comparative study is carried using three nature-inspired algorithms namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Cuckoo Search (CS) on clustering problem. Cuckoo search is used with levy flight. The heavy-tail property of levy flight is exploited here. These algorithms are used on three standard benchmark datasets and one real-time multi-spectral satellite dataset. The results are tabulated and analysed using various techniques. Finally we conclude that under the given set of parameters, cuckoo search works efficiently for majority of the dataset and levy flight plays an important role.
Resumo:
This paper describes a semi-automatic tool for annotation of multi-script text from natural scene images. To our knowledge, this is the maiden tool that deals with multi-script text or arbitrary orientation. The procedure involves manual seed selection followed by a region growing process to segment each word present in the image. The threshold for region growing can be varied by the user so as to ensure pixel-accurate character segmentation. The text present in the image is tagged word-by-word. A virtual keyboard interface has also been designed for entering the ground truth in ten Indic scripts, besides English. The keyboard interface can easily be generated for any script, thereby expanding the scope of the toolkit. Optionally, each segmented word can further be labeled into its constituent characters/symbols. Polygonal masks are used to split or merge the segmented words into valid characters/symbols. The ground truth is represented by a pixel-level segmented image and a '.txt' file that contains information about the number of words in the image, word bounding boxes, script and ground truth Unicode. The toolkit, developed using MATLAB, can be used to generate ground truth and annotation for any generic document image. Thus, it is useful for researchers in the document image processing community for evaluating the performance of document analysis and recognition techniques. The multi-script annotation toolokit (MAST) is available for free download.
Resumo:
This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke and gradient information. Experiments on camera-captured images that contain variable fonts, size, color, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86 respectively on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.
Resumo:
Signal acquisition under a compressed sensing scheme offers the possibility of acquisition and reconstruction of signals sparse on some basis incoherent with measurement kernel with sub-Nyquist number of measurements. In particular when the sole objective of the acquisition is the detection of the frequency of a signal rather than exact reconstruction, then an undersampling framework like CS is able to perform the task. In this paper we explore the possibility of acquisition and detection of frequency of multiple analog signals, heavily corrupted with additive white Gaussian noise. We improvise upon the MOSAICS architecture proposed by us in our previous work to include a wider class of signals having non-integral frequency components. This makes it possible to perform multiplexed compressed sensing for general frequency sparse signals.
Resumo:
Distributed compressed sensing exploits information redundancy, inbuilt in multi-signal ensembles with interas well as intra-signal correlations, to reconstruct undersampled signals. In this paper we revisit this problem, albeit from a different perspective, of taking streaming data, from several correlated sources, as input to a real time system which, without any a priori information, incrementally learns and admits each source into the system.
Resumo:
In document community support vector machines and naïve bayes classifier are known for their simplistic yet excellent performance. Normally the feature subsets used by these two approaches complement each other, however a little has been done to combine them. The essence of this paper is a linear classifier, very similar to these two. We propose a novel way of combining these two approaches, which synthesizes best of them into a hybrid model. We evaluate the proposed approach using 20ng dataset, and compare it with its counterparts. The efficacy of our results strongly corroborate the effectiveness of our approach.
Resumo:
For compressed sensing (CS), we develop a new scheme inspired by data fusion principles. In the proposed fusion based scheme, several CS reconstruction algorithms participate and they are executed in parallel, independently. The final estimate of the underlying sparse signal is derived by fusing the estimates obtained from the participating algorithms. We theoretically analyze this fusion based scheme and derive sufficient conditions for achieving a better reconstruction performance than any participating algorithm. Through simulations, we show that the proposed scheme has two specific advantages: 1) it provides good performance in a low dimensional measurement regime, and 2) it can deal with different statistical natures of the underlying sparse signals. The experimental results on real ECG signals shows that the proposed scheme demands fewer CS measurements for an approximate sparse signal reconstruction.