973 results for Compressed text search
Abstract:
Models are becoming increasingly important in the software development process. As a consequence, the number of models being used is increasing, and so is the need for efficient mechanisms to search them. Various existing search engines could be used for this purpose, but they lack features to properly search models, mainly because they are strongly focused on text-based search. This paper presents Moogle, a model search engine that uses metamodeling information to create richer search indexes and to allow more complex queries to be performed. The paper also presents the results of an evaluation of Moogle, which showed that the metamodel information improves the accuracy of the search.
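The abstract does not describe how Moogle builds its indexes. The sketch below is only a toy illustration of the general idea that metamodel information makes an index "richer" than plain text. Every term is posted under a plain-text key and also under keys typed by metaclass and attribute, so a query can be narrowed to, for example, the `name` of a `Class`. The class `ModelIndex` and its methods are hypothetical and are not Moogle's API.

```python
from collections import defaultdict


class ModelIndex:
    """Toy metamodel-aware inverted index (hypothetical; not Moogle's actual design).

    Every term is posted under three keys: plain text, (metaclass), and
    (metaclass, attribute), so a query can be as loose or as typed as needed.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add_element(self, model_id, metaclass, attributes):
        """Index one model element, e.g. a UML Class with {"name": "Account"}."""
        for attr, value in attributes.items():
            for term in str(value).lower().split():
                self._postings[(None, None, term)].add(model_id)       # plain text search
                self._postings[(metaclass, None, term)].add(model_id)  # typed by metaclass
                self._postings[(metaclass, attr, term)].add(model_id)  # typed by metaclass + attribute

    def search(self, term, metaclass=None, attribute=None):
        """Return sorted model ids; an attribute filter is only meaningful together with a metaclass."""
        key = (metaclass, attribute, term.lower())
        return sorted(self._postings.get(key, set()))
```

Suppose one model declares a class named `Account` and another only has an operation named `Account`. A plain text lookup for "account" returns both models, but restricting the query to the `Class` metaclass returns only the first.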
Abstract:
The nature of the dark matter in the Universe is one of the greatest mysteries in modern astronomy. The neutralino is a nonbaryonic dark matter candidate in minimal supersymmetric extensions of the standard model of particle physics. If the dark matter halo of our galaxy is made up of neutralinos, some would become gravitationally trapped inside massive bodies such as the Earth. Their pairwise annihilation produces neutrinos that can be detected by neutrino experiments looking in the direction of the centre of the Earth. The AMANDA neutrino telescope, currently the largest in the world, consists of an array of light detectors buried deep in the Antarctic glacier at the geographical South Pole. The extremely transparent ice acts as a Cherenkov medium for muons passing through the array, and the muon direction can be reconstructed from the timing information of the detected photons. A search has been performed for nearly vertically upgoing neutrino-induced muons in AMANDA-B10 data taken over the three-year period 1997–99. No excess above the expected atmospheric neutrino background was found. Upper limits at the 90% confidence level have been set on the annihilation rate of neutralinos at the centre of the Earth and on the muon flux induced by neutrinos created by the annihilation products.
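The abstract does not say which statistical construction produced the 90% confidence-level limits. Many neutrino analyses use the Feldman–Cousins unified approach. As an illustration only, the sketch below instead computes the simpler classical (Neyman) upper limit on a Poisson signal mean, given a known expected background. It shows what "an upper limit at 90% CL" means for a counting experiment. The function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def classical_upper_limit(n_obs, background, cl=0.90):
    """Classical upper limit on the signal mean s for a known background (illustrative only).

    Solves P(N <= n_obs | s + background) = 1 - cl for s >= 0 by bisection.
    Returns 0.0 when even s = 0 is excluded, a known weakness of this construction
    that the Feldman-Cousins method was designed to avoid.
    """
    alpha = 1.0 - cl
    if poisson_cdf(n_obs, background) <= alpha:
        return 0.0
    lo, hi = 0.0, 1.0
    # Widen the bracket until the CDF falls below alpha (the CDF decreases as s grows).
    while poisson_cdf(n_obs, background + hi) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, background + mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With zero observed events and no background, the limit is ln 10 ≈ 2.30 signal events. A larger expected background lowers the limit for the same observed count.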
Abstract:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured natural-language text, research on text mining is very active and addresses many practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a large training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where category profiles are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution is the design of a domain-independent model that distinguishes the degree and type of relatedness between arbitrary documents and topics. This relatedness is inferred from the different types of semantic relationships between their representative words, as identified by specific search algorithms. The model is tested on both flat and hierarchical text categorization, where it potentially allows new categories to be added efficiently during classification.
Results show that classification accuracy still requires improvement, but models generated from one domain can be effectively reused in a different one.
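The first contribution (category profiles built on the known domain, then iteratively adapted to the unknown one) can be sketched roughly as follows. The abstract does not give the profile construction or parameters, so this is a minimal illustration under assumed choices:

- row-normalized term-weight vectors over a shared vocabulary
- cosine similarity
- hard reassignment of target documents until the labels stop changing

The function names are hypothetical.

```python
import numpy as np


def _unit_rows(m):
    """Scale each row to unit L2 norm (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def cross_domain_centroid(x_src, y_src, x_tgt, n_classes, max_iter=10):
    """Label target-domain documents with category profiles adapted from a source domain.

    x_src, x_tgt: (n_docs, n_terms) term-weight matrices over a shared vocabulary.
    y_src: integer labels 0..n_classes-1; every class must occur in the source domain.
    """
    x_src = _unit_rows(np.asarray(x_src, dtype=float))
    x_tgt = _unit_rows(np.asarray(x_tgt, dtype=float))
    y_src = np.asarray(y_src)
    # Initial profiles: centroids of the labeled source-domain documents.
    centroids = np.vstack([x_src[y_src == c].mean(axis=0) for c in range(n_classes)])
    labels = None
    for _ in range(max_iter):
        # Assign every target document to its most similar profile (cosine similarity).
        new_labels = (x_tgt @ _unit_rows(centroids).T).argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the profiles have converged
        labels = new_labels
        # Re-estimate each profile from the target documents assigned to it,
        # keeping the previous profile if a category received no documents.
        for c in range(n_classes):
            members = x_tgt[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels
```

After the first assignment, each profile is rebuilt from target documents only. Terms that never occur in the source domain therefore start contributing to similarity, which is what lets the profiles drift toward the unknown domain.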
Abstract:
Breast cancer is the most common cancer among women. Tamoxifen is the preferred drug for treating estrogen receptor-positive breast cancer, yet many of these cancers are intrinsically resistant to tamoxifen or acquire resistance during treatment. Therefore, scientists are searching for breast cancer drugs that have different molecular targets. Previous work revealed that 8-mer and cyclic 9-mer peptides inhibit breast cancer in mouse and rat model systems by interacting with an unknown receptor, while peptides shorter than eight amino acids did not inhibit breast cancer. We have shown that replica exchange molecular dynamics predicts the structure and dynamics of active peptides, leading to the discovery of smaller peptides with full biological activity. These simulations identified smaller peptide analogs that conserve the β-turn formed in the larger peptides. These analogs inhibit estrogen-dependent cell growth in a mouse uterine growth assay, a test that correlates reliably with human breast cancer inhibition. We outline the computational methods that were tried, and those that were used together with the experimental information to complete this research successfully.
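Replica exchange molecular dynamics runs copies of the system at several temperatures. It periodically attempts to swap configurations between neighbouring temperatures, accepting a swap with the Metropolis probability P = min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T).

The sketch below shows only this exchange step; the MD propagation between exchanges is omitted. It assumes energies in kcal/mol, and the function names are hypothetical, not the authors' simulation setup.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); assumes energies in kcal/mol


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance for exchanging the configurations held at temp_i and temp_j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(energies, temps, offset, rng):
    """Attempt swaps between neighbouring temperature slots (k, k+1) for k = offset, offset+2, ...

    energies[k] is the potential energy of the configuration currently at temps[k].
    Returns order, where order[k] is the index of the configuration now held at temps[k].
    The pairs are disjoint, so they can be tested independently.
    """
    order = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        if rng.random() < swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1]):
            order[k], order[k + 1] = order[k + 1], order[k]
    return order


# Alternate offset between 0 and 1 on successive exchange attempts so every neighbour pair is tried.
rng = random.Random(42)
temps = [300.0, 330.0, 363.0, 400.0]
energies = [-120.0, -118.0, -121.0, -110.0]
print(exchange_step(energies, temps, offset=0, rng=rng))
```

A swap is always accepted when the colder replica holds the higher-energy configuration. This is how low-energy structures found at high temperature migrate down to the temperature of interest.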
Abstract:
Gaussian-3 (G3) and MP2/aug-cc-pVnZ methods have been used to calculate geometries and thermochemistry of CS2(H2O)n, where n = 1–4. An extensive molecular dynamics search followed by optimization using these two methods located two dimers, six trimers, six tetramers, and two pentamers. The MP2/aug-cc-pVDZ structure matched the experimental result for the CS2(H2O) dimer best, showing that diffuse functions are necessary to model the interactions found in this complex. For larger CS2(H2O)n clusters, the MP2/aug-cc-pVDZ minima are significantly different from the MP2(full)/6-31G* structures on which G3 is based, revealing that the G3 model chemistry is not suitable for investigating sulfur-containing van der Waals complexes. Based on the MP2/aug-cc-pVTZ free energies, the saturated water concentration in the atmosphere, and the average amount of CS2 in the atmosphere, the concentrations of these clusters are predicted to be on the order of 10⁵ CS2(H2O) clusters cm⁻³ and 10² CS2(H2O)2 clusters cm⁻³ at 298.15 K. The MP2/aug-cc-pVDZ scaled harmonic and anharmonic frequencies of the most abundant dimer cluster at 298 K are presented, along with the MP2/aug-cc-pVDZ scaled harmonic frequencies for the CS2(H2O)n structures predicted to be present in a low-temperature molecular beam experiment.
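Cluster concentrations of this kind follow, in principle, from standard gas-phase equilibrium relations. The sketch below makes the usual assumptions:

- ideal-gas behaviour
- a standard-state pressure p°
- cluster formation CS2 + n H2O ⇌ CS2(H2O)n with standard Gibbs energy change ΔG°

The abstract does not spell out the exact expressions the authors used.

```latex
K_p = \exp\!\left(-\frac{\Delta G^\circ}{RT}\right)
    = \frac{p_{\mathrm{CS_2(H_2O)}_n}/p^\circ}{\left(p_{\mathrm{CS_2}}/p^\circ\right)\left(p_{\mathrm{H_2O}}/p^\circ\right)^{n}}

p_{\mathrm{CS_2(H_2O)}_n} = K_p\,\frac{p_{\mathrm{CS_2}}\,p_{\mathrm{H_2O}}^{\,n}}{(p^\circ)^{n}},
\qquad
N_{\mathrm{CS_2(H_2O)}_n} = \frac{p_{\mathrm{CS_2(H_2O)}_n}}{k_B T}
```

The cluster partial pressure scales with the n-th power of the water partial pressure. Under these assumptions, this is why each added water molecule costs orders of magnitude in abundance unless ΔG° becomes substantially more negative.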
Abstract:
The Gaussian-3 (G3) model chemistry method has been used to calculate the relative ΔG° values for all possible conformers of neutral water clusters, (H2O)n, where n = 3–5. A complete 12-fold conformational search around each hydrogen bond produced 144, 1728, and 20 736 starting structures for the water trimer, tetramer, and pentamer, respectively. These structures were optimized successively with PM3, HF/6-31G*, and then the G3 model chemistry. Only two trimers are present on the G3 potential energy hypersurface. We identified 5 tetramers and 10 pentamers on the potential energy and free-energy hypersurfaces at 298 K. None of these 17 structures were linear; all linear starting models folded into cyclic or three-dimensional structures. The cyclic pentamer is the most stable isomer at 298 K. On the basis of this and previous studies, we expect the cyclic tetramers and pentamers to be the most significant cyclic water clusters in the atmosphere.
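The starting-structure counts are powers of 12: 144 = 12², 1728 = 12³, and 20 736 = 12⁴. This is consistent with rotating each of the n − 1 hydrogen bonds in a chain of n water molecules through 12 positions (30° steps). The sketch below enumerates such a grid. Treating the search as n − 1 independent 30° dihedral steps is an inference from these numbers, not a description taken from the paper, and `dihedral_grid` is a hypothetical name.

```python
from itertools import product


def dihedral_grid(n_waters, steps=12):
    """Enumerate starting conformations as combinations of dihedral angles (degrees).

    Hypothetical setup: a chain of n_waters molecules has n_waters - 1 hydrogen
    bonds, and each bond is rotated through `steps` evenly spaced angles.
    """
    angles = [360.0 * k / steps for k in range(steps)]  # 0, 30, ..., 330 degrees for steps=12
    return list(product(angles, repeat=n_waters - 1))


for n in (3, 4, 5):
    print(n, len(dihedral_grid(n)))  # prints 3 144, then 4 1728, then 5 20736
```

Because the grid grows as 12^(n−1), a cheap method such as PM3 is a natural first filter before the more expensive HF/6-31G* and G3 steps.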
Abstract:
In 2011, researchers at Bucknell University and Illinois Wesleyan University compared the search efficacy of Serials Solutions Summon, EBSCO Discovery Service, Google Scholar, and conventional library databases. Using a mixed-methods approach, they gathered qualitative and quantitative data on students' usage of these tools. Regardless of the search system, students exhibited a marked inability to evaluate sources effectively and relied heavily on default search settings. On the quantitative benchmarks measured by this study, EBSCO Discovery Service outperformed the other search systems in almost every category. This article describes these results and makes recommendations for libraries considering these tools.
Abstract:
Spectrum sensing is currently one of the most challenging design problems in cognitive radio. A robust spectrum sensing technique is essential for implementing practical dynamic spectrum access in noisy environments with uncertain interference. In addition, the sensing time should be minimized while meeting stringent cognitive radio application requirements. To cope with this challenge, cyclic spectrum sensing techniques have been proposed. However, such techniques require very high sampling rates in the wideband regime and are thus costly in hardware implementation and power consumption. In this thesis, compressed sensing is applied to circumvent this problem by exploiting the sparsity of the two-dimensional cyclic spectrum. Compressive sampling is used to reduce the sampling rate, and a recovery method is developed for reconstructing the sparse cyclic spectrum from the compressed samples. The reconstruction method exploits the sparsity structure in the two-dimensional cyclic spectrum domain, unlike conventional compressed sensing techniques for vector-form sparse signals. The entire wideband cyclic spectrum is reconstructed from sub-Nyquist-rate samples for simultaneous detection of multiple signal sources. After the cyclic spectrum is recovered, two methods are proposed to make spectral occupancy decisions from it: a band-by-band multi-cycle detector that works for all modulation schemes, and a fast and simple thresholding method that works for Binary Phase Shift Keying (BPSK) signals only. In addition, a method for recovering the power spectrum of stationary signals is developed as a special case. Simulation results demonstrate that the proposed spectrum sensing algorithms can significantly reduce the sampling rate without sacrificing performance. The robustness of the algorithms to noise uncertainty in the wireless channel is also shown.
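For readers new to compressed sensing, the sketch below shows the conventional vector-form recovery that the thesis contrasts its method with. A sparse vector is measured with far fewer random projections than its length, then recovered greedily with Orthogonal Matching Pursuit (OMP). OMP, the Gaussian measurement matrix, and all sizes are illustrative choices, not the thesis's two-dimensional cyclic-spectrum reconstruction.

```python
import numpy as np


def omp(phi, y, sparsity):
    """Orthogonal Matching Pursuit: estimate a `sparsity`-sparse x from y = phi @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Greedy step: pick the column most correlated with the current residual.
        k = int(np.argmax(np.abs(phi.T @ residual)))
        if k not in support:
            support.append(k)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    x_hat = np.zeros(phi.shape[1])
    x_hat[support] = coef
    return x_hat


# Sub-Nyquist illustration: M = 64 random measurements of an N = 256-sample signal with 4 nonzeros.
rng = np.random.default_rng(0)
n, m = 256, 64
x_true = np.zeros(n)
x_true[[10, 50, 120, 200]] = [1.0, -2.0, 1.5, 3.0]
phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = phi @ x_true
x_hat = omp(phi, y, sparsity=4)
print(np.max(np.abs(x_hat - x_true)))
```

Here 64 measurements stand in for 256 Nyquist-rate samples. The thesis applies the same principle, but to a sparse two-dimensional cyclic spectrum rather than a sparse vector.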
Abstract:
One of the scarcest resources in wireless communication systems is the limited frequency spectrum. Many wireless communication systems are hindered by bandwidth limitations and cannot provide high-speed communication. Ultra-wideband (UWB) communication, however, promises high-speed communication because of its very wide bandwidth of 7.5 GHz (3.1–10.6 GHz). This unprecedented bandwidth promises many advantages for 21st-century wireless communication systems. However, UWB poses many hardware challenges, such as the very high sampling rate required for analog-to-digital conversion, channel estimation, and other implementation challenges. In this thesis, a new method is proposed that uses compressed sensing (CS), a mathematical framework for sub-Nyquist-rate sampling, to reduce the hardware complexity of the system. The method takes advantage of the unique signal structure of the UWB symbol. A new digital implementation method for CS-based UWB is also proposed. Lastly, a comparative study of CS-UWB hardware implementation methods is presented. Simulation results show that the proposed method significantly reduces hardware complexity compared with a conventional compressed-sensing-based UWB receiver.
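The sampling-rate problem can be put in rough numbers. These figures are back-of-the-envelope arithmetic, not results from the thesis:

- Digitizing the full 3.1–10.6 GHz band directly with a uniform lowpass ADC requires the Nyquist rate in the first relation below.
- Even an ideal scheme that exploits the known 7.5 GHz band occupancy cannot average fewer than 2B samples per second (the Landau rate, second relation).

Compressed sensing targets rates below both figures by additionally exploiting sparsity in the UWB signal.

```latex
f_s \geq 2 f_{\max} = 2 \times 10.6\ \text{GHz} = 21.2\ \text{GS/s},
\qquad
\bar{f}_s \geq 2B = 2 \times 7.5\ \text{GHz} = 15\ \text{GS/s}
```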