990 resultados para Query Performance
Resumo:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in various languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates the techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult since a retrieved document which contains the query term as a sequence of Chinese characters may not be really relevant to the query since the query term (as a sequence Chinese characters) may not be a valid Chinese word in that documents. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with the problems. In the first approach, we propose a hybrid Chinese information retrieval model by incorporating word-based techniques with the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents based on the relevancy to the query calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if it incorporates only Chinese segmentation with the traditional character-based approach. In the second approach, we propose a novel query expansion method which applies text mining techniques in order to find the most relevant words to extend the query. Unlike most existing query expansion methods, which generally select the highly frequent indexing terms from the retrieved documents to expand the query. In our approach, we utilize text mining techniques to find patterns from the retrieved documents that highly correlate with the query term and then use the relevant words in the patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage is to investigate if high accuracy segmentation can make an improvement to Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented and a further experiment has been done to compare its performance with the standard Rocchio approach with the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experiment results show that by incorporating the text mining based query expansion with the hybrid model, significant improvement has been achieved in both precision and recall assessments.
Resumo:
Recent studies on automatic new topic identification in Web search engine user sessions demonstrated that neural networks are successful in automatic new topic identification. However most of this work applied their new topic identification algorithms on data logs from a single search engine. In this study, we investigate whether the application of neural networks for automatic new topic identification are more successful on some search engines than others. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and Excite are used in this study. Findings of this study suggest that query logs with more topic shifts tend to provide more successful results on shift-based performance measures, whereas logs with more topic continuations tend to provide better results on continuation-based performance measures.
Resumo:
The quality of discovered features in relevance feedback (RF) is the key issue for effective search query. Most existing feedback methods do not carefully address the issue of selecting features for noise reduction. As a result, extracted noisy features can easily contribute to undesirable effectiveness. In this paper, we propose a novel feature extraction method for query formulation. This method first extract term association patterns in RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieved encouraging performance compared to state-of-the-art IF models.
Resumo:
We study lazy structure sharing as a tool for optimizing equivalence testing on complex data types, We investigate a number of strategies for implementing lazy structure sharing and provide upper and lower bounds on their performance (how quickly they effect ideal configurations of our data structure). In most cases when the strategies are applied to a restricted case of the problem, the bounds provide nontrivial improvements over the naive linear-time equivalence-testing strategy that employs no optimization. Only one strategy, however, which employs path compression, seems promising for the most general case of the problem.
Resumo:
Workstation clusters equipped with high performance interconnect having programmable network processors facilitate interesting opportunities to enhance the performance of parallel application run on them. In this paper, we propose schemes where certain application level processing in parallel database query execution is performed on the network processor. We evaluate the performance of TPC-H queries executing on a high end cluster where all tuple processing is done on the host processor, using a timed Petri net model, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose 4 schemes where certain tuple processing activity is offloaded to the network processor. The first 2 schemes offload the tuple splitting activity - computation to identify the node on which to process the tuples, resulting in an execution time speedup of 1.09 relative to the base scheme, but with I/O bus becoming the bottleneck resource. In the 3rd scheme in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups up to 1.16, but with high host processor utilization. Our 4th scheme where the network processor also performs apart of join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilizations. Further we observe that the proposed schemes perform equally well even in a scaled architecture i.e., when the number of processors is increased from 2 to 64
Resumo:
To effectively support today’s global economy, database systems need to manage data in multiple languages simultaneously. While current database systems do support the storage and management of multilingual data, they are not capable of querying across different natural languages. To address this lacuna, we have recently proposed two cross-lingual functionalities, LexEQUAL[13] and SemEQUAL[14], for matching multilingual names and concepts, respectively. In this paper, we investigate the native implementation of these multilingual functionalities as first-class operators on relational engines. Specifically, we propose a new multilingual storage datatype, and an associated algebra of the multilingual operators on this datatype. These components have been successfully implemented in the PostgreSQL database system, including integration of the algebra with the query optimizer and inclusion of a metric index in the access layer. Our experiments demonstrate that the performance of the native implementation is up to two orders-of-magnitude faster than the corresponding outsidethe- server implementation. Further, these multilingual additions do not adversely impact the existing functionality and performance. To the best of our knowledge, our prototype represents the first practical implementation of a crosslingual database query engine.
Resumo:
Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation held as a part of ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference(a). The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops(b), which amount at about 7 h of speech in total. We present the database metric systems submitted along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems aiming at establishing the best technique dealing with that difficult task and looking for defining promising directions for this relatively novel task.
Resumo:
Personal communication devices are increasingly equipped with sensors that are able to collect and locally store information from their environs. The mobility of users carrying such devices, and hence the mobility of sensor readings in space and time, opens new horizons for interesting applications. In particular, we envision a system in which the collective sensing, storage and communication resources, and mobility of these devices could be leveraged to query the state of (possibly remote) neighborhoods. Such queries would have spatio-temporal constraints which must be met for the query answers to be useful. Using a simplified mobility model, we analytically quantify the benefits from cooperation (in terms of the system's ability to satisfy spatio-temporal constraints), which we show to go beyond simple space-time tradeoffs. In managing the limited storage resources of such cooperative systems, the goal should be to minimize the number of unsatisfiable spatio-temporal constraints. We show that Data Centric Storage (DCS), or "directed placement", is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose, "amorphous placement", in which sensory samples are cached locally, and shuffling of cached samples is used to diffuse the sensory data throughout the whole network. We evaluate conditions under which directed versus amorphous placement strategies would be more efficient. These results lead us to propose a hybrid placement strategy, in which the spatio-temporal constraints associated with a sensory data type determine the most appropriate placement strategy for that data type. We perform an extensive simulation study to evaluate the performance of directed, amorphous, and hybrid placement protocols when applied to queries that are subject to timing constraints. Our results show that, directed placement is better for queries with moderately tight deadlines, whereas amorphous placement is better for queries with looser deadlines, and that under most operational conditions, the hybrid technique gives the best compromise.
Resumo:
We propose a novel admission control policy for database queries. Our methodology uses system measurements of CPU utilization and query backlogs to determine interference between queries in execution on the same database server. Query interference may arise due to the concurrent access of hardware and software resources and can affect performance in positive and negative ways. Specifically our admission control considers the mix of jobs in service and prioritizes the query classes consuming CPU resources more efficiently. The policy ignores I/O subsystems and is therefore highly appropriate for in-memory databases. We validate our approach in trace-driven simulation and show performance increases of query slowdowns and throughputs compared to first-come first-served and shortest expected processing time first scheduling. Simulation experiments are parameterized from system traces of a SAP HANA in-memory database installation with TPC-H type workloads. © 2012 IEEE.
Resumo:
The development of new technologies that use peer-to-peer networks grows every day, with the object to supply the need of sharing information, resources and services of databases around the world. Among them are the peer-to-peer databases that take advantage of peer-to-peer networks to manage distributed knowledge bases, allowing the sharing of information semantically related but syntactically heterogeneous. However, it is a challenge to ensure the efficient search for information without compromising the autonomy of each node and network flexibility, given the structural characteristics of these networks. On the other hand, some studies propose the use of ontology semantics by assigning standardized categorization of information. The main original contribution of this work is the approach of this problem with a proposal for optimization of queries supported by the Ant Colony algorithm and classification though ontologies. The results show that this strategy enables the semantic support to the searches in peer-to-peer databases, aiming to expand the results without compromising network performance. © 2011 IEEE.
Resumo:
[EN] The exon-1 of the androgen receptor (AR) gene contains two repeat length polymorphisms which modify either the amount of AR protein inside the cell (GGN(n), polyglycine) or its transcriptional activity (CAG(n), polyglutamine). Shorter CAG and/or GGN repeats provide stronger androgen signalling and vice versa. To test the hypothesis that CAG and GGN repeat AR polymorphisms affect muscle mass and various variables of muscular strength phenotype traits, the length of CAG and GGN repeats was determined by PCR and fragment analysis and confirmed by DNA sequencing of selected samples in 282 men (28.6 +/- 7.6 years). Individuals were grouped as CAG short (CAG(S)) if harbouring repeat lengths of 21. GGN was considered short (GGN(S)) or long (GGN(L)) if GGN 23, respectively. No significant differences in lean body mass or fitness were observed between the CAG(S) and CAG(L) groups, or between GGN(S) and GGN(L) groups, but a trend for a correlation was found for the GGN repeat and lean mass of the extremities (r=-0.11, p=0.06). In summary, the lengths of CAG and GGN repeat of the AR gene do not appear to influence lean mass or fitness in young men.
Resumo:
[EN] The aim of this study was to determine the influence of activity performed during the recovery period on the aerobic and anaerobic energy yield, as well as on performance, during high-intensity intermittent exercise (HIT). Ten physical education students participated in the study. First they underwent an incremental exercise test to assess their maximal power output (Wmax) and VO2max. On subsequent days they performed three different HITs. Each HIT consisted of four cycling bouts until exhaustion at 110% Wmax. Recovery periods of 5 min were allowed between bouts. HITs differed in the kind of activity performed during the recovery periods: pedaling at 20% VO2max (HITA), stretching exercises, or lying supine. Performance was 3-4% and aerobic energy yield was 6-8% (both p < 0.05) higher during the HITA than during the other two kinds of HIT. The greater contribution of aerobic metabolism to the energy yield during the high-intensity exercise bouts with active recovery was due to faster VO2 kinetics (p< 0.01) and a higher VO2peak during the exercise bouts preceded by active recovery (p < 0.05). In contrast, the anaerobic energy yield (oxygen deficit and peak blood lactate concentrations) was similar in all HITs. Therefore, this study shows that active recovery facilitates performance by increasing aerobic contribution to the whole energy yield turnover during high-intensity intermittent exercise.
Anaerobic energy provision does not limit Wingate exercise performance in endurance-trained cyclists
Resumo:
[EN] The aim of this study was to evaluate the effects of severe acute hypoxia on exercise performance and metabolism during 30-s Wingate tests. Five endurance- (E) and five sprint- (S) trained track cyclists from the Spanish National Team performed 30-s Wingate tests in normoxia and hypoxia (inspired O(2) fraction = 0.10). Oxygen deficit was estimated from submaximal cycling economy tests by use of a nonlinear model. E cyclists showed higher maximal O(2) uptake than S (72 +/- 1 and 62 +/- 2 ml x kg(-1) x min(-1), P < 0.05). S cyclists achieved higher peak and mean power output, and 33% larger oxygen deficit than E (P < 0.05). During the Wingate test in normoxia, S relied more on anaerobic energy sources than E (P < 0.05); however, S showed a larger fatigue index in both conditions (P < 0.05). Compared with normoxia, hypoxia lowered O(2) uptake by 16% in E and S (P < 0.05). Peak power output, fatigue index, and exercise femoral vein blood lactate concentration were not altered by hypoxia in any group. Endurance cyclists, unlike S, maintained their mean power output in hypoxia by increasing their anaerobic energy production, as shown by 7% greater oxygen deficit and 11% higher postexercise lactate concentration. In conclusion, performance during 30-s Wingate tests in severe acute hypoxia is maintained or barely reduced owing to the enhancement of the anaerobic energy release. The effect of severe acute hypoxia on supramaximal exercise performance depends on training background.
Resumo:
Multiresolution Triangular Mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experiment results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.