955 results for Extended random set
Abstract:
With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods used to remove noisy, inconsistent, and redundant patterns. However, PTM treats each extracted pattern as a whole without considering the terms it includes, which could affect the quality of the extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in the patterns. The proposed approach then finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on several popular measures.
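The idea of weighting a pattern by both its own distribution and the distribution of its terms can be illustrated with a small sketch. This is not the paper's formula, which the abstract does not give. It assumes a hypothetical input (a mapping from each pattern, as a tuple of terms, to its support count) and one simple weighting choice: each pattern shares its support equally among its terms, and a pattern's weight is the sum of the accumulated weights of its terms.

```python
from collections import defaultdict


def term_weights(patterns):
    """Spread each pattern's support evenly over its terms (illustrative choice).

    `patterns` is a hypothetical input: {tuple_of_terms: support_count}.
    """
    weights = defaultdict(float)
    for terms, support in patterns.items():
        share = support / len(terms)  # equal share per term in the pattern
        for term in terms:
            weights[term] += share
    return dict(weights)


def pattern_weights(patterns):
    """Re-weight every pattern as the sum of the weights of its terms."""
    tw = term_weights(patterns)
    return {terms: sum(tw[t] for t in terms) for terms in patterns}


# Toy data, invented for illustration only.
patterns = {
    ("oil", "price"): 4,
    ("oil",): 6,
    ("price", "rise", "market"): 3,
}

if __name__ == "__main__":
    for terms, weight in sorted(pattern_weights(patterns).items(), key=lambda kv: -kv[1]):
        print(terms, weight)
```

Under this choice a pattern made of terms that recur across many well-supported patterns ranks higher than one made of rare terms, which is the intuition behind considering the terms inside a pattern rather than treating it as an opaque unit.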
Abstract:
In recent years, the econometrics literature has shown a growing interest in the study of partially identified models, in which the object of economic and statistical interest is a set rather than a point. The characterization of this set and the development of consistent estimators and inference procedures for it with desirable properties are the main goals of partial identification analysis. This review introduces the fundamental tools of the theory of random sets, which brings together elements of topology, convex geometry, and probability theory to develop a coherent mathematical framework to analyze random elements whose realizations are sets. It then elucidates how these tools have been fruitfully applied in econometrics to reach the goals of partial identification analysis.
Abstract:
Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular term-based feature selection methods extract noisy features that are irrelevant to the user's needs. One popular method is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. The method then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms in n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms the state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
Abstract:
An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high-dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in the form of Java-based desktop applications called “similarity mapplets” that allow interactive content browsing and are linked to a “Multifingerprint Browser for ChEMBL” (also accessible directly at www.gdb.unibe.ch) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30 300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
Abstract:
In the systematic study of amine … LiCl [amines = NH3, CH3NH2, (CH3)2NH] complexes, the possibility of an ion-pair structure and the effect of methylation on the stabilization energy are investigated. ΔE is evaluated by the SCF/4-31G method and augmented by the approximate dispersion energy calculated perturbationally. The interaction energy decreases with the increasing number of methyl groups in the amine. The dispersion energy plays a negligible role in the stabilization of the complexes. None of the systems studied are ion pairs; their Li bonds are of a so-called molecular type. Due to the divergence of the multipole expansion, the attempt to correct the 4-31G stabilization energies via the electrostatic energy fails. The relative order of ΔE in the series of complexes is verified instead in the extended basis set calculation. The lithium bonds are compared with their H-bonded analogues.
Abstract:
The Fourier transform Raman and infrared (IR) spectra of Ceramide 3 (CER3) have been recorded in the regions 200-3500 cm⁻¹ and 680-4000 cm⁻¹, respectively. We have calculated the equilibrium geometry, harmonic vibrational wavenumbers, electrostatic potential surfaces, absolute Raman scattering activities and IR absorption intensities using density functional theory with the B3LYP functional and the extended 6-311G basis set. This work was undertaken to study the vibrational spectra of CER3 completely and to identify the various normal modes with better wavenumber accuracy. Good consistency is found between the calculated results and experimental data for the IR and Raman spectra.
Abstract:
We present a novel filtering algorithm for tracking multiple clusters of coordinated objects. Based on a Markov chain Monte Carlo (MCMC) mechanism, the new algorithm propagates a discrete approximation of the underlying filtering density. A dynamic Gaussian mixture model is utilized for representing the time-varying clustering structure. This involves point process formulations of typical behavioral moves such as birth and death of clusters as well as merging and splitting. For handling complex, possibly large scale scenarios, the sampling efficiency of the basic MCMC scheme is enhanced via the use of a Metropolis within Gibbs particle refinement step. As the proposed methodology essentially involves random set representations, a new type of estimator, termed the probability hypothesis density surface (PHDS), is derived for computing point estimates. It is further proved that this estimator is optimal in the sense of the mean relative entropy. Finally, the algorithm's performance is assessed and demonstrated in both synthetic and realistic tracking scenarios. © 2012 Elsevier Ltd. All rights reserved.
Abstract:
The quality of available network connections can often have a large impact on the performance of distributed applications. For example, document transfer applications such as FTP, Gopher and the World Wide Web suffer increased response times as a result of network congestion. For these applications, the document transfer time is directly related to the available bandwidth of the connection. Available bandwidth depends on two things: 1) the underlying capacity of the path from client to server, which is limited by the bottleneck link; and 2) the amount of other traffic competing for links on the path. If measurements of these quantities were available to the application, the current utilization of connections could be calculated. Network utilization could then be used as a basis for selection from a set of alternative connections or servers, thus providing reduced response time. Such a dynamic server selection scheme would be especially important in a mobile computing environment in which the set of available servers is frequently changing. In order to provide these measurements at the application level, we introduce two tools: bprobe, which provides an estimate of the uncongested bandwidth of a path; and cprobe, which gives an estimate of the current congestion along a path. These two measures may be used in combination to provide the application with an estimate of available bandwidth between server and client thereby enabling application-level congestion avoidance. In this paper we discuss the design and implementation of our probe tools, specifically illustrating the techniques used to achieve accuracy and robustness. We present validation studies for both tools which demonstrate their reliability in the face of actual Internet conditions; and we give results of a survey of available bandwidth to a random set of WWW servers as a sample application of our probe technique. 
We conclude with descriptions of other applications of our measurement tools, several of which are currently under development.
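The relationship described above, where available bandwidth depends on the bottleneck capacity of a path and on the competing traffic, can be sketched as a simple server-selection routine. This is an illustration only and does not reproduce how bprobe or cprobe work internally. It assumes hypothetical measurements are already in hand: for each server, a bottleneck capacity and a competing-traffic rate in the same units (here Mbit/s). All function names and figures are invented for the example.

```python
def available_bandwidth(capacity, competing):
    """Capacity left over after competing traffic, floored at zero."""
    return max(capacity - competing, 0.0)


def utilization(capacity, competing):
    """Fraction of the bottleneck capacity consumed by competing traffic."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return min(competing / capacity, 1.0)


def pick_server(measurements):
    """Return the server name with the largest estimated available bandwidth.

    `measurements` is a hypothetical mapping: name -> (capacity, competing).
    """
    return max(measurements, key=lambda name: available_bandwidth(*measurements[name]))


# Invented figures in Mbit/s, for illustration only.
measurements = {
    "mirror-a": (10.0, 9.0),  # fast link, heavily loaded
    "mirror-b": (1.5, 0.2),   # slow link, lightly loaded
    "mirror-c": (5.0, 1.0),   # medium link, lightly loaded
}

if __name__ == "__main__":
    print(pick_server(measurements))
```

The example shows why both quantities matter: the server behind the fastest link is not the best choice once its load is taken into account.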
Abstract:
We consider the problem of performing topological optimizations of distributed hash tables. Such hash tables include Chord and Tapestry and are a popular building block for distributed applications. Optimizing topologies over one-dimensional hash spaces is particularly difficult, as the higher dimensionality of the underlying network makes close fits unlikely. Instead, current schemes are limited to heuristically performing local optimizations by finding the best of a small random set of peers. We propose a new class of topology optimizations based on the existence of clusters of close overlay members within the underlying network. By constructing additional overlays for each cluster, a significant portion of the search procedure can be performed within the local cluster, with a corresponding reduction in the search time. Finally, we discuss the effects of these additional overlays on spatial locality and other load balancing schemes.
Abstract:
Background and purpose: We are developing a technique for highly focused vocal cord irradiation in early glottic carcinoma to optimally treat a target volume confined to a single cord. This technique, in contrast with conventional methods, aims at sparing the healthy vocal cord. As such a technique requires sub-mm daily targeting accuracy to be effective, we investigate the accuracy achievable with on-line kV cone beam CT (CBCT) corrections. Materials and methods: CBCT scans were obtained in 10 early glottic cancer patients in each treatment fraction. The grey value registration available in X-ray volume imaging (XVI) software (Elekta, Synergy) was applied to a volume of interest encompassing the thyroid cartilage. After application of the corrections thus derived, residual displacements with respect to the planning CT scan were measured at clearly identifiable relevant landmarks. The intra- and inter-observer variations were also measured. Results: While before correction the systematic displacements of the vocal cords were as large as 2.4 ± 3.3 mm (cranial-caudal population mean ± SD Σ), daily CBCT registration and correction reduced these values to less than 0.2 ± 0.5 mm in all directions. Random positioning errors (SD σ) were reduced to less than 1 mm. Correcting only for translations and not for rotations did not appreciably affect this accuracy. The residual random displacements partly stem from intra-observer variations (SD = 0.2-0.6 mm). Conclusion: The use of CBCT for daily image guidance in combination with standard mask fixation reduced systematic and random set-up errors of the vocal cords to <1 mm prior to the delivery of each fraction dose. This facilitates the high targeting precision required for single vocal cord irradiation. © 2009 Elsevier Ireland Ltd. All rights reserved.
Abstract:
What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein—ranging from the length of the sequence to a knowledge of its secondary structure—to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods and could these methods be used to construct viable hierarchical classifications?
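What a purely random set of predictions would be expected to score can be worked out directly from the class distribution of a benchmark. The sketch below is not taken from the paper. It assumes a hypothetical list of true fold labels and computes the expected accuracy of three naive strategies: guessing uniformly among the k folds (1/k), guessing each fold in proportion to its frequency (the sum of squared frequencies), and always predicting the most common fold (the largest frequency).

```python
from collections import Counter


def random_baselines(labels):
    """Expected accuracy of three naive guessing strategies.

    `labels` is a hypothetical list of true fold labels for a benchmark set.
    """
    counts = Counter(labels)
    n = len(labels)
    freqs = [c / n for c in counts.values()]
    return {
        "uniform": 1.0 / len(counts),                 # pick any fold with equal probability
        "proportional": sum(p * p for p in freqs),    # pick fold i with probability p_i
        "majority": max(freqs),                       # always pick the most common fold
    }


# Toy label set, invented for illustration only.
labels = ["fold_a", "fold_a", "fold_b", "fold_c"]

if __name__ == "__main__":
    for name, accuracy in random_baselines(labels).items():
        print(f"{name}: {accuracy:.3f}")
```

On skewed benchmarks the majority and proportional baselines can sit well above 1/k, so reporting them alongside a method's accuracy makes the comparison more informative.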
Abstract:
This paper presents a new method to calculate sky view factors (SVFs) from high-resolution urban digital elevation models using a shadow casting algorithm. By utilizing weighted annuli to derive SVF from hemispherical images, the distance light source positions can be predefined and uniformly spread over the whole hemisphere, whereas another method applies a random set of light source positions with a cosine-weighted distribution of sun altitude angles. The two methods have similar results based on a large number of SVF images. However, when comparing variations at pixel level between an image generated using the new method presented in this paper and the image from the random method, anisotropic patterns occur. The absolute mean difference between the two methods is 0.002, ranging up to 0.040. The maximum difference can be as much as 0.122. Since SVF is a geometrically derived parameter, the anisotropic errors created by the random method must be considered significant.
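Deriving SVF from weighted annuli can be sketched with a commonly used annulus weighting for equiangular hemispherical images. The abstract does not state the paper's exact weights, so this is an assumption and the shadow-casting scheme may differ in detail. The hemisphere is split into n concentric annuli of equal angular width, and the hypothetical input `sky_angles` gives, for each annulus, the azimuthal extent of visible sky in radians (0 to 2π).

```python
import math


def svf_from_annuli(sky_angles):
    """Sky view factor from per-annulus visible-sky angles (radians).

    Assumed weighting: annulus i of n contributes sin(pi * (2i - 1) / (2n)),
    and the sum is normalised so that a fully open sky gives 1.
    """
    n = len(sky_angles)
    if n == 0:
        raise ValueError("at least one annulus is required")
    total = sum(
        math.sin(math.pi * (2 * i - 1) / (2 * n)) * alpha
        for i, alpha in enumerate(sky_angles, start=1)
    )
    return math.sin(math.pi / (2 * n)) * total / (2 * math.pi)


if __name__ == "__main__":
    n = 36
    print(svf_from_annuli([2 * math.pi] * n))  # fully open sky
    print(svf_from_annuli([math.pi] * n))      # half of every annulus obstructed
```

Because the weights are fixed by geometry rather than sampled, repeated runs give identical results, which is the contrast the abstract draws with a random set of light source positions.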