980 results for Information processing


Relevance:

70.00%

Publisher:

Abstract:

Quantile computation has many applications, including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank [φN] may be approximately obtained within the rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
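The idea of grouping queries so that one answer serves a whole cluster can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes exact ranks are available (ignoring the summary's own error), treats a query with quantile fraction phi and tolerance eps as accepting any item whose normalized rank lies in [phi - eps, phi + eps], and uses the classic greedy interval-stabbing rule to cover all queries with as few shared answers as possible. The function name `cluster_queries` is hypothetical.

```python
def cluster_queries(queries):
    """Group (phi, eps) queries so that each group shares one answer rank.

    Illustrative assumption: a query (phi, eps) is satisfied by any item whose
    normalized rank falls in [phi - eps, phi + eps]. Greedy interval stabbing
    (sort by right endpoint) yields the minimum number of groups under that
    assumption. Returns a list of (shared_rank, [queries]) pairs.
    """
    intervals = sorted(
        ((max(0.0, phi - eps), min(1.0, phi + eps), (phi, eps))
         for phi, eps in queries),
        key=lambda t: t[1],  # process intervals by increasing right endpoint
    )
    clusters = []
    for lo, hi, query in intervals:
        if clusters and lo <= clusters[-1][0]:
            # The current shared rank already lies inside this query's interval.
            clusters[-1][1].append(query)
        else:
            # Open a new cluster anchored at this interval's right endpoint.
            clusters.append((hi, [query]))
    return clusters


if __name__ == "__main__":
    demo = [(0.5, 0.1), (0.55, 0.1), (0.9, 0.05), (0.1, 0.05)]
    for rank, members in cluster_queries(demo):
        print(rank, members)
```

In this toy model, the two queries around the median end up in one cluster, so a single lookup answers both within their stated tolerances.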

Relevance:

70.00%

Publisher:

Abstract:

In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: the first component is a 2D vector that reflects a distance range (minimum and maximum values) of the f features with respect to a reference point (the center of the space) in a metric space, and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: the first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.

Relevance:

70.00%

Publisher:

Abstract:

A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area with higher resolution). A direct query, on the other hand, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolution. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real-world datasets demonstrate that the query processing time of the new multiresolution approaches is comparable to, and often better than, that of multi-representation data structures for both types of queries.

Relevance:

70.00%

Publisher:

Abstract:

Spatial data are particularly useful in mobile environments. However, due to the low bandwidth of most wireless networks, developing large spatial database applications becomes a challenging process. In this paper, we provide the first attempt to combine two important techniques, multiresolution spatial data structure and semantic caching, towards efficient spatial query processing in mobile environments. Based on the study of the characteristics of multiresolution spatial data (MSD) and multiresolution spatial query, we propose a new semantic caching model called Multiresolution Semantic Caching (MSC) for caching MSD in mobile environments. MSC enriches the traditional three-category query processing in semantic cache to five categories, thus improving the performance in three ways: 1) a reduction in the amount and complexity of the remainder queries; 2) avoidance of redundant transmission of spatial data already residing in a cache; 3) provision of satisfactory answers before 100% of the query results have been transmitted to the client side. Our extensive experiments on a very large and complex real spatial database show that MSC significantly outperforms the traditional semantic caching models.

Relevance:

70.00%

Publisher:

Abstract:

A k-NN query finds the k nearest neighbors of a given point from a point database. When it is sufficient to measure object distance using the Euclidean distance, the key to efficient k-NN query processing is to fetch and check the distances of a minimum number of points from the database. For many applications, such as vehicle movement along road networks or rover and animal movement along terrain surfaces, the distance is only meaningful when it is along a valid movement path. For this type of k-NN query, the focus of efficient query processing is to minimize the cost of computing distances using the environment data (such as the road network data and the terrain data), which can be several orders of magnitude larger than the point data. Efficient processing of k-NN queries based on the Euclidean distance or the road network distance has been investigated extensively in the past. In this paper, we investigate the problem of surface k-NN query processing, where the distance is calculated from the shortest path along a terrain surface. This problem is very challenging, as the terrain data can be very large and the computational cost of finding shortest paths is very high. We propose an efficient solution based on multiresolution terrain models. Our approach eliminates the need for the costly process of finding shortest paths by ranking objects using estimated lower and upper bounds of distance on multiresolution terrain models.
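Ranking by distance bounds instead of exact shortest paths can be sketched with a generic pruning rule. The code below is an illustration under stated assumptions, not the method of the paper: it assumes each object already has a lower and an upper bound on its surface distance (how a multiresolution terrain model would produce them is not shown), and it discards any object whose lower bound exceeds the k-th smallest upper bound, since at least k objects are guaranteed to be closer. The name `prune_candidates` and the sample values are hypothetical.

```python
def prune_candidates(bounds, k):
    """Return the objects that may still belong to the k nearest neighbors.

    bounds maps an object id to (lower_bound, upper_bound) on its distance
    from the query point. At least k objects have a true distance no larger
    than the k-th smallest upper bound, so any object whose lower bound is
    above that threshold cannot be among the k nearest.
    """
    if not 1 <= k <= len(bounds):
        raise ValueError("k must be between 1 and the number of objects")
    threshold = sorted(upper for _, upper in bounds.values())[k - 1]
    return {obj for obj, (lower, _) in bounds.items() if lower <= threshold}


if __name__ == "__main__":
    # Hypothetical bounds, e.g. as estimated on a coarse terrain model.
    coarse = {"a": (1.0, 2.0), "b": (1.5, 3.0), "c": (4.0, 5.0), "d": (2.5, 6.0)}
    print(sorted(prune_candidates(coarse, k=2)))
```

Objects that survive pruning would then be re-examined with tighter bounds, for instance from a finer resolution level, until the k nearest neighbors can be separated from the rest.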

Relevance:

70.00%

Publisher:

Abstract:

Domain-specific information retrieval is in growing demand. Not only domain experts, but also average non-expert users are interested in searching for domain-specific (e.g., medical and health) information in online resources. However, a typical problem for average users is that the search results are always a mixture of documents with different levels of readability. Non-expert users may want to see documents with higher readability at the top of the list. Consequently, the search results need to be re-ranked in descending order of readability. It is often not practical for domain experts to manually label the readability of documents in large databases, so computational models of readability need to be investigated. However, traditional readability formulas are designed for general-purpose text and are insufficient for the technical materials involved in domain-specific information retrieval. More advanced algorithms, such as textual coherence models, are computationally expensive for re-ranking a large number of retrieved documents. In this paper, we propose an effective and computationally tractable concept-based model of text readability. In addition to the textual genre of a document, our model also takes into account domain-specific knowledge, i.e., how the domain-specific concepts contained in the document affect the document’s readability. Three major readability formulas are proposed and applied to health and medical information retrieval. Experimental results show that our proposed readability formulas lead to remarkable improvements in terms of correlation with users’ readability ratings over four traditional readability measures.
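Re-ranking results in descending order of readability can be sketched as follows. The abstract does not name the four traditional measures it compares against, so the example uses the well-known Flesch Reading Ease score purely as a stand-in for a general-purpose formula; the concept-based formulas proposed in the paper are not reproduced here. The function names, the document fields, and the sample counts are assumptions made for illustration.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def rerank_by_readability(docs):
    """Sort documents so that the most readable ones come first.

    Each document is a dict with precomputed 'words', 'sentences' and
    'syllables' counts (hypothetical field names for this sketch).
    """
    return sorted(
        docs,
        key=lambda d: flesch_reading_ease(d["words"], d["sentences"], d["syllables"]),
        reverse=True,
    )


if __name__ == "__main__":
    results = [
        {"id": "A", "words": 100, "sentences": 5, "syllables": 150},
        {"id": "B", "words": 100, "sentences": 10, "syllables": 130},
        {"id": "C", "words": 100, "sentences": 4, "syllables": 180},
    ]
    print([d["id"] for d in rerank_by_readability(results)])
```

A formula of this kind depends only on surface counts, which is why it is cheap to apply to a long result list but blind to how difficult the domain concepts in a document are.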

Relevance:

70.00%

Publisher:

Abstract:

We propose a novel all-optical signal processor for use at a return-to-zero receiver utilising loop mirror intensity filtering and nonlinear pulse broadening in normal dispersion fibre. The device offers reamplification and cleaning up of the optical signals, and phase margin improvement. The efficiency of the technique is demonstrated by application to 40 Gbit/s data transmission.

Relevance:

70.00%

Publisher:

Abstract:

Modern managers are under tremendous pressure in attempting to fulfil a profoundly complex managerial task, that of handling information resources. Information management, an intricate process requiring a high measure of human cognition and discernment, involves matching a manager's lack of information processing capacity against his information needs, with voluminous information at his disposal. The nature of the task will undoubtedly become more complex in the case of a large organisation. Management of large-scale organisations is therefore an exceedingly challenging prospect for any manager to be faced with. A system that supports executive information needs will help reduce managerial and informational mismatches. In the context of the Malaysian public sector, the task of overall management lies with the Prime Minister and the Cabinet. The Prime Minister's Office is presently supporting the Prime Minister's information and managerial needs, although not without various shortcomings. The rigid formalised structure predominant in the Malaysian public sector, so opposed to dynamic treatment of the problematic issues faced by that sector, further escalates the managerial and organisational problem of coping with a state of complexity. The principal features of the research are twofold: the development of a methodology for diagnosing the 'problem organisation' and the design of an office system. The methodological development is done in the context of the Malaysian public sector, and aims at understanding the complexity of its communication and control situation. The outcome is a viable model of the public sector. 'Design', on the other hand, is developing a syntax or language for office systems which provides an alternative to current views on office systems. The design is done with reference to, rather than for, the Prime Minister's Office. The desirable outcome will be an office model called Office Communication and Information System (OCIS).

Relevance:

70.00%

Publisher:

Abstract:

Improving bit error rates in optical communication systems is a difficult and important problem. The error correction must take place at high speed and be extremely accurate. We show the feasibility of using hardware implementable machine learning techniques. This may enable some error correction at the speed required.

Relevance:

70.00%

Publisher:

Abstract:

Huge advertising budgets are invested by firms to reach and convince potential consumers to buy their products. To optimize these investments, it is fundamental not only to ensure that appropriate consumers will be reached, but also that they will be in appropriate reception conditions. Marketing research has focused on the way consumers react to advertising, as well as on some individual and contextual factors that could mediate or moderate the ad impact on consumers (e.g. motivation and ability to process information or attitudes toward advertising). Nevertheless, a factor that potentially influences consumers’ advertising reactions has not yet been studied in marketing research: fatigue. Yet fatigue can affect key variables of advertising processing, such as the availability of cognitive resources (Lieury 2004). Fatigue is felt when the body signals the need to stop an activity (or inactivity) and rest, allowing the individual to compensate for fatigue effects. Dittner et al. (2004) define it as “the state of weariness following a period of exertion, mental or physical, characterized by a decreased capacity for work and reduced efficiency to respond to stimuli.” It signals that resources will be lacking if we continue with the ongoing activity. According to Schmidtke (1969), fatigue leads to difficulties in information reception, perception, coordination, attention, concentration and thinking. In addition, for Markle (1984), fatigue reduces memory and communication ability, whereas it increases reaction time and the number of errors. Thus, fatigue may have large effects on advertising processing. We suggest that fatigue determines the level of available resources. Some research on consumer responses to advertising claims that complexity is a fundamental element to take into consideration. Complexity determines the cognitive effort the consumer must provide to understand the message (Putrevu et al. 2004).
Thus, we suggest that complexity determines the level of required resources. To study this complex question of the need for and provision of cognitive resources, we draw upon Resource Matching Theory. Anand and Sternthal (1989, 1990) were the first to state the Resource Matching principle, saying that an ad is most persuasive when the resources required to process it match the resources the viewer is willing and able to provide. They show that when the required resources exceed those available, the message is not entirely processed by the consumer. When the available resources exceed those required, the viewer elaborates critical or unrelated thoughts. According to Resource Matching Theory, the level of resources demanded by an ad can be high or low, and is mostly determined by the ad’s layout (Peracchio and Meyers-Levy, 1997). We manipulate the level of required resources using three levels of ad complexity (low – high – extremely high). On the other hand, the resource availability of an ad viewer is determined by many contextual and individual variables. We manipulate the level of available resources using two levels of fatigue (low – high). Tired viewers want to limit the processing effort to minimal resource requirements by relying on heuristics and forming an overall impression at first glance. It will be easier for them to decode the message when ads are very simple. On the contrary, the most effective ads for viewers who are not tired are complex enough to draw their attention and fully use their resources. They will use more analytical strategies, looking at the details of the ad. However, if ads are too complex, they will be too difficult to understand. The viewer will be discouraged from processing the information and will overlook the ad. The objective of our research is to study fatigue as a moderating variable of advertising information processing.
We run two experimental studies to assess the effect of fatigue on visual strategies, comprehension, persuasion and memorization. In study 1, thirty-five undergraduate students enrolled in a marketing research course participated in the experiment. The experimental design is 2 (tiredness level: between subjects) x 3 (ad complexity level: within subjects). Participants were randomly assigned a schedule time (morning: 8-10 am or evening: 10-12 pm) to perform the experiment. We chose to test subjects at various moments of the day to obtain maximum variance in their fatigue level. We use the Morningness / Eveningness tendency of participants (Horne & Ostberg, 1976) as a control variable. We assess fatigue level using subjective measures (a questionnaire with fatigue scales) and objective measures (reaction time and number of errors). Regarding complexity levels, we designed our own ads in order to keep aspects other than complexity equal. To check our complexity manipulation, we ran a pretest using the Resource Demands scale (Keller and Bloch 1997) and complexity ratings following Morrison and Dainoff (1972). We found three significantly different levels. After completing the fatigue scales, participants are asked to view the ads on a screen while their eye movements are recorded by an eye-tracker. Eye-tracking allows us to identify patterns of visual attention (Pieters and Warlop 1999). We are then able to infer respondents’ specific visual strategies according to their level of fatigue. Comprehension is assessed with a comprehension test. We collect measures of attitude change for persuasion and measures of recall and recognition at various points in time for memorization. Once the effect of fatigue has been determined across the student population, it is interesting to account for individual differences in fatigue severity and perception.
Therefore, we run study 2, which is similar to the previous one except for the design: time of day is now within-subjects and complexity becomes between-subjects.