414 resultados para runtime bloat
Resumo:
多向主元分析(MPCA)是利用多变量统计方法从纷杂的海量数据信息中提取出能够准确表征数据信息的几个主元,并通过投影法来降低数据的维数,主要应用于间歇生产过程中.在实际的间歇生产过程中,由于各种原因导致各批次异步造成它们运行时间的不一致,而无法直接建立有效的统计模型,正交函数近似(OFA)是一种基于正交基的投影变换技术,通过对原始数据进行OFA处理后,可以用投影系数来描述原始数据所具有的特征,并且可以达到轨迹同步化和压缩数据量的目的.对OFA法进行了部分改进,并结合MPCA法对典型的间歇过程——青霉素发酵过程进行了仿真研究.结果表明,改进的OFA计算速度有了极大的提高,且改进的OFA-MPCA法能完好地对各批次进行同步、建模并得出准确的监视结果.
Resumo:
本文依据分布计算环境(DCE)中所提供的目录服务并结合开放分布处理(ODP)所提供的交易服务(TradingServices),提出了实现分布计算环境中对象互操作的联邦交易模型。给出了该交易模型在DCE中的实现结构模型。
Resumo:
This report describes Processor Coupling, a mechanism for controlling multiple ALUs on a single integrated circuit to exploit both instruction-level and inter-thread parallelism. A compiler statically schedules individual threads to discover available intra-thread instruction-level parallelism. The runtime scheduling mechanism interleaves threads, exploiting inter-thread parallelism to maintain high ALU utilization. ALUs are assigned to threads on a cycle byscycle basis, and several threads can be active concurrently. Simulation results show that Processor Coupling performs well both on single threaded and multi-threaded applications. The experiments address the effects of memory latencies, function unit latencies, and communication bandwidth between function units.
Resumo:
X. Wang, J. Yang, X. Teng, W. Xia, and R. Jensen. Feature Selection based on Rough Sets and Particle Swarm Optimization. Pattern Recognition Letters, vol. 28, no. 4, pp. 459-471, 2007.
Resumo:
R. Daly and Q. Shen. A Framework for the Scoring of Operators on the Search Space of Equivalence Classes of Bayesian Network Structures. Proceedings of the 2005 UK Workshop on Computational Intelligence, pages 67-74.
Resumo:
Q. Shen and R. Jensen, 'Approximation-based feature selection and application for algae population estimation,' Applied Intelligence, vol. 28, no. 2, pp. 167-181, 2008. Sponsorship: EPSRC RONO: EP/E058388/1
Resumo:
As distributed information services like the World Wide Web become increasingly popular on the Internet, problems of scale are clearly evident. A promising technique that addresses many of these problems is service (or document) replication. However, when a service is replicated, clients then need the additional ability to find a "good" provider of that service. In this paper we report on techniques for finding good service providers without a priori knowledge of server location or network topology. We consider the use of two principal metrics for measuring distance in the Internet: hops, and round-trip latency. We show that these two metrics yield very different results in practice. Surprisingly, we show data indicating that the number of hops between two hosts in the Internet is not strongly correlated to round-trip latency. Thus, the distance in hops between two hosts is not necessarily a good predictor of the expected latency of a document transfer. Instead of using known or measured distances in hops, we show that the extra cost at runtime incurred by dynamic latency measurement is well justified based on the resulting improved performance. In addition we show that selection based on dynamic latency measurement performs much better in practice that any static selection scheme. Finally, the difference between the distribution of hops and latencies is fundamental enough to suggest differences in algorithms for server replication. We show that conclusions drawn about service replication based on the distribution of hops need to be revised when the distribution of latencies is considered instead.
Resumo:
Many real world image analysis problems, such as face recognition and hand pose estimation, involve recognizing a large number of classes of objects or shapes. Large margin methods, such as AdaBoost and Support Vector Machines (SVMs), often provide competitive accuracy rates, but at the cost of evaluating a large number of binary classifiers, thus making it difficult to apply such methods when thousands or millions of classes need to be recognized. This thesis proposes a filter-and-refine framework, whereby, given a test pattern, a small number of candidate classes can be identified efficiently at the filter step, and computationally expensive large margin classifiers are used to evaluate these candidates at the refine step. Two different filtering methods are proposed, ClassMap and OVA-VS (One-vs.-All classification using Vector Search). ClassMap is an embedding-based method, works for both boosted classifiers and SVMs, and tends to map the patterns and their associated classes close to each other in a vector space. OVA-VS maps OVA classifiers and test patterns to vectors based on the weights and outputs of weak classifiers of the boosting scheme. At runtime, finding the strongest-responding OVA classifier becomes a classical vector search problem, where well-known methods can be used to gain efficiency. In our experiments, the proposed methods achieve significant speed-ups, in some cases up to two orders of magnitude, compared to exhaustive evaluation of all OVA classifiers. This was achieved in hand pose recognition and face recognition systems where the number of classes ranges from 535 to 48,600.
Resumo:
This thesis elaborates on the problem of preprocessing a large graph so that single-pair shortest-path queries can be answered quickly at runtime. Computing shortest paths is a well studied problem, but exact algorithms do not scale well to real-world huge graphs in applications that require very short response time. The focus is on approximate methods for distance estimation, in particular in landmarks-based distance indexing. This approach involves choosing some nodes as landmarks and computing (offline), for each node in the graph its embedding, i.e., the vector of its distances from all the landmarks. At runtime, when the distance between a pair of nodes is queried, it can be quickly estimated by combining the embeddings of the two nodes. Choosing optimal landmarks is shown to be hard and thus heuristic solutions are employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the techniques presented in this thesis is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach which considers selecting landmarks at random. Finally, they are applied in two important problems arising naturally in large-scale graphs, namely social search and community detection.
Resumo:
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications. In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks. We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random. Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
Resumo:
This paper is centered around the design of a thread- and memory-safe language, primarily for the compilation of application-specific services for extensible operating systems. We describe various issues that have influenced the design of our language, called Cuckoo, that guarantees safety of programs with potentially asynchronous flows of control. Comparisons are drawn between Cuckoo and related software safety techniques, including Cyclone and software-based fault isolation (SFI), and performance results suggest our prototype compiler is capable of generating safe code that executes with low runtime overheads, even without potential code optimizations. Compared to Cyclone, Cuckoo is able to safely guard accesses to memory when programs are multithreaded. Similarly, Cuckoo is capable of enforcing memory safety in situations that are potentially troublesome for techniques such as SFI.
Resumo:
© 2015 IEEE.Although definition of single-program benchmarks is relatively straight-forward-a benchmark is a program plus a specific input-definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different interactions depending on how they align with each other. While prior work has focused on sampling multiprogram benchmarks, little attention has been paid to defining the benchmarks in their entirety. In this work, we propose a four-tuple that formally defines multi-program benchmarks in a well-defined way. We then examine how four different classes of benchmarks created by varying the elements of this tuple align with real-world use-cases. We evaluate the impact of these variations on real hardware, and see drastic variations in results between different benchmarks constructed from the same programs. Notable differences include significant speedups versus slowdowns (e.g., +57% vs -5% or +26% vs -18%), and large differences in magnitude even when the results are in the same direction (e.g., 67% versus 11%).
Resumo:
Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation -- a nonlinear, structured-grid partial differential equation boundary value problem -- using the same algorithm on the same hardware. All of the paradigms -- parallel languages represented by the Portland Group's HPF, (semi-)automated serial-to-parallel source-to-source translation represented by CAP-Tools from the University of Greenwich, and parallel libraries represented by Argonne's PETSc -- are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under any paradigm includes specification of the data partitioning, corresponding to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system, and guiding the source-to-source code transformation by responding to various queries about quantities knowable only at runtime. Results representative of "the state of the practice" for a scaled sequence of structured grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the CRAYY T3E.
Resumo:
This paper describes a highly flexible component architecture, primarily designed for automotive control systems, that supports distributed dynamically- configurable context-aware behaviour. The architecture enforces a separation of design-time and run-time concerns, enabling almost all decisions concerning runtime composition and adaptation to be deferred beyond deployment. Dynamic context management contributes to flexibility. The architecture is extensible, and can embed potentially many different self-management decision technologies simultaneously. The mechanism that implements the run-time configuration has been designed to be very robust, automatically and silently handling problems arising from the evaluation of self- management logic and ensuring that in the worst case the dynamic aspects of the system collapse down to static behavior in totally predictable ways.
Resumo:
Ocean color measured from satellites provides daily, global estimates of marine inherent optical properties (IOPs). Semi-analytical algorithms (SAAs) provide one mechanism for inverting the color of the water observed by the satellite into IOPs. While numerous SAAs exist, most are similarly constructed and few are appropriately parameterized for all water masses for all seasons. To initiate community-wide discussion of these limitations, NASA organized two workshops that deconstructed SAAs to identify similarities and uniqueness and to progress toward consensus on a unified SAA. This effort resulted in the development of the generalized IOP (GIOP) model software that allows for the construction of different SAAs at runtime by selection from an assortment of model parameterizations. As such, GIOP permits isolation and evaluation of specific modeling assumptions, construction of SAAs, development of regionally tuned SAAs, and execution of ensemble inversion modeling. Working groups associated with the workshops proposed a preliminary default configuration for GIOP (GIOP-DC), with alternative model parameterizations and features defined for subsequent evaluation. In this paper, we: (1) describe the theoretical basis of GIOP; (2) present GIOP-DC and verify its comparable performance to other popular SAAs using both in situ and synthetic data sets; and, (3) quantify the sensitivities of their output to their parameterization. We use the latter to develop a hierarchical sensitivity of SAAs to various model parameterizations, to identify components of SAAs that merit focus in future research, and to provide material for discussion on algorithm uncertainties and future emsemble applications.