19 results for inference algorithms
in Boston University Digital Common
Abstract:
Handshape is a key articulatory parameter in sign language, and thus handshape recognition from signing video is essential for sign recognition and retrieval. Handshape transitions within monomorphemic lexical signs (the largest class of signs in signed languages) are governed by phonological rules. For example, such transitions normally involve either closing or opening of the hand (i.e., they exclusively use either folding or unfolding of the palm and one or more fingers). Furthermore, akin to allophonic variations in spoken languages, both inter- and intra-signer variations in the production of specific handshapes are observed. We propose a Bayesian network formulation to exploit handshape co-occurrence constraints, also utilizing information about allophonic variations to aid in handshape recognition. We propose a fast non-rigid image alignment method to gain improved robustness to handshape appearance variations during computation of observation likelihoods in the Bayesian network. We evaluate our handshape recognition approach on a large dataset of monomorphemic lexical signs. We demonstrate that leveraging linguistic constraints on handshapes results in improved handshape recognition accuracy. As part of the overall project, we are collecting and preparing for dissemination a large corpus (three thousand signs from three native signers) of American Sign Language (ASL) video. The videos have been annotated using SignStream® [Neidle et al.] with labels for linguistic information such as glosses, morphological properties and variations, and start/end handshapes associated with each ASL sign.
Abstract:
The Google AdSense Program is a successful internet advertisement program where Google places contextual adverts on third-party websites and shares the resulting revenue with each publisher. Advertisers have budgets and bid on ad slots while publishers set reserve prices for the ad slots on their websites. Following previous modelling efforts, we model the program as a two-sided market with advertisers on one side and publishers on the other. We show a reduction from the Generalised Assignment Problem (GAP) to the problem of computing the revenue-maximising allocation and pricing of publisher slots under a first-price auction. GAP is APX-hard but a (1-1/e)-approximation is known. We compute truthful and revenue-maximising prices and allocation of ad slots to advertisers under a second-price auction. The auctioneer's revenue is within a factor of (1-1/e) of the second-price optimum.
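As a minimal illustration of the second-price rule with a publisher reserve price mentioned above, the following Python sketch prices a single ad slot. It is not the allocation algorithm of the paper: it ignores advertiser budgets and multi-slot assignment, and the function name `second_price_with_reserve` is a hypothetical one chosen for this example.

```python
from typing import Optional


def second_price_with_reserve(
    bids: dict[str, float], reserve: float
) -> Optional[tuple[str, float]]:
    """Return (winner, price) for one ad slot, or None if no bid meets the reserve.

    Assumption for this sketch: one slot, no budgets, ties broken by name.
    """
    # Keep only bids that meet the publisher's reserve price, highest first.
    eligible = sorted(
        ((bid, name) for name, bid in bids.items() if bid >= reserve),
        reverse=True,
    )
    if not eligible:
        return None
    _, winner = eligible[0]
    # The winner pays the second-highest eligible bid, or the reserve if alone.
    price = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, price


print(second_price_with_reserve({"a": 5.0, "b": 3.0, "c": 1.0}, reserve=2.0))
# ('a', 3.0)
```

Because the winner's payment does not depend on the winner's own bid, bidding one's true value is a dominant strategy in this single-slot setting, which is the usual sense of "truthful".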
Abstract:
For communication-intensive parallel applications, the maximum degree of concurrency achievable is limited by the communication throughput made available by the network. In previous work [HPS94], we showed experimentally that the performance of certain parallel applications running on a workstation network can be improved significantly if a congestion control protocol is used to enhance network performance. In this paper, we characterize and analyze the communication requirements of a large class of supercomputing applications that fall under the category of fixed-point problems, amenable to solution by parallel iterative methods. This results in a set of interface and architectural features sufficient for the efficient implementation of the applications over a large-scale distributed system. In particular, we propose a direct link between the application and network layer, supporting congestion control actions at both ends. This in turn enhances the system's responsiveness to network congestion, improving performance. Measurements are given showing the efficacy of our scheme to support large-scale parallel computations.
Abstract:
This paper investigates the power of genetic algorithms at solving the MAX-CLIQUE problem. We measure the performance of a standard genetic algorithm on an elementary set of problem instances consisting of embedded cliques in random graphs. We indicate the need for improvement, and introduce a new genetic algorithm, the multi-phase annealed GA, which exhibits superior performance on the same problem set. As we scale up the problem size and test on "hard" benchmark instances, we notice a degraded performance in the algorithm caused by premature convergence to local minima. To alleviate this problem, a sequence of modifications is implemented, ranging from changes in input representation to systematic local search. The most recent version, called union GA, incorporates the features of union cross-over, greedy replacement, and diversity enhancement. It shows a marked speed-up in the number of iterations required to find a given solution, as well as some improvement in the clique size found. We discuss issues related to the SIMD implementation of the genetic algorithms on a Thinking Machines CM-5, which was necessitated by the intrinsically high time complexity (O(n^3)) of the serial algorithm for computing one iteration. Our preliminary conclusions are: (1) a genetic algorithm needs to be heavily customized to work "well" for the clique problem; (2) a GA is computationally very expensive, and its use is only recommended if it is known to find larger cliques than other algorithms; (3) although our customization effort is bringing forth continued improvements, there is no clear evidence, at this time, that a GA will have better success in circumventing local minima.
Abstract:
The performance of a randomized version of the subgraph-exclusion algorithm (called Ramsey) for CLIQUE by Boppana and Halldorsson is studied on very large graphs. We compare the performance of this algorithm with the performance of two common heuristic algorithms, the greedy heuristic and a version of simulated annealing. These algorithms are tested on graphs with up to 10,000 vertices on a workstation and graphs as large as 70,000 vertices on a Connection Machine. Our implementations establish the ability to run clique approximation algorithms on very large graphs. We test our implementations on a variety of different graphs. Our conclusions indicate that on randomly generated graphs minor changes to the distribution can cause dramatic changes in the performance of the heuristic algorithms. The Ramsey algorithm, while not as good as the others for the most common distributions, seems more robust and provides a more even overall performance. In general, and especially on deterministically generated graphs, a combination of simulated annealing with either the Ramsey algorithm or the greedy heuristic seems to perform best. This combined algorithm works particularly well on large Keller and Hamming graphs and has a competitive overall performance on the DIMACS benchmark graphs.
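For readers unfamiliar with the greedy heuristic used as a baseline above, here is a minimal Python sketch of one common variant: repeatedly add the candidate vertex with the most neighbours among the remaining candidates. The abstract does not say which greedy rule the authors used, so the selection rule, the tie-breaking, and the names `greedy_clique` and `is_clique` are assumptions of this example.

```python
def greedy_clique(adj: dict[int, set[int]]) -> set[int]:
    """Grow a clique greedily; adj maps each vertex to its neighbour set.

    Assumes an undirected graph without self-loops.
    """
    clique: set[int] = set()
    candidates = set(adj)  # vertices adjacent to everything in the clique so far
    while candidates:
        # Prefer the vertex with most neighbours among candidates (ties: lowest id).
        v = max(candidates, key=lambda u: (len(adj[u] & candidates), -u))
        clique.add(v)
        # Only common neighbours of the whole clique stay eligible.
        candidates &= adj[v]
    return clique


def is_clique(adj: dict[int, set[int]], nodes: set[int]) -> bool:
    """Check that every pair of distinct nodes is joined by an edge."""
    return all(u in adj[v] for u in nodes for v in nodes if u != v)


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(greedy_clique(graph))  # {0, 1, 2}
```

The result is always a maximal clique (it cannot be extended), but not necessarily a maximum one, which is why the abstract compares it against other heuristics.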
Abstract:
Wireless sensor networks have recently emerged as enablers of important applications such as environmental, chemical and nuclear sensing systems. Such applications have sophisticated spatial-temporal semantics that set them apart from traditional wireless networks. For example, the computation of temperature averaged over the sensor field must take into account local densities. This is crucial since otherwise the estimated average temperature can be biased by over-sampling areas where a lot more sensors exist. Thus, we envision that a fundamental service that a wireless sensor network should provide is that of estimating local densities. In this paper, we propose a lightweight probabilistic density inference protocol, which we call DIP, that allows each sensor node to implicitly estimate its neighborhood size without the explicit exchange of node identifiers as in existing density discovery schemes. The theoretical basis of DIP is a probabilistic analysis which gives the relationship between the number of sensor nodes contending in the neighborhood of a node and the level of contention measured by that node. Extensive simulations confirm the premise of DIP: it can provide statistically reliable and accurate estimates of local density at a very low energy cost and constant running time. We demonstrate how applications could be built on top of our DIP-based service by computing density-unbiased statistics from estimated local densities.
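The over-sampling bias described above can be made concrete with a small Python sketch that weights each reading by the inverse of its node's local density. This is only one plausible way to form a density-unbiased average; the abstract does not give the authors' estimator, and the function name, the sample values, and the assumption that each density counts the node itself (so it is at least 1) are all introduced here for illustration.

```python
def density_unbiased_mean(readings: list[float], densities: list[float]) -> float:
    """Average readings with weight 1/density so crowded areas are not over-counted.

    Assumption: densities[i] is the estimated neighbourhood size of node i,
    including the node itself, hence always >= 1.
    """
    weights = [1.0 / d for d in densities]
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)


# Four sensors clustered in a warm spot (30 degrees), one sensor in a cool spot (20).
readings = [30.0, 30.0, 30.0, 30.0, 20.0]
densities = [4.0, 4.0, 4.0, 4.0, 1.0]

print(sum(readings) / len(readings))               # naive mean: 28.0
print(density_unbiased_mean(readings, densities))  # weighted mean: 25.0
```

With equal weights the warm cluster dominates; with inverse-density weights each of the two areas contributes equally.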
Abstract:
The development and deployment of distributed network-aware applications and services over the Internet require the ability to compile and maintain a model of the underlying network resources with respect to (one or more) characteristic properties of interest. To be manageable, such models must be compact, and must enable a representation of properties along temporal, spatial, and measurement resolution dimensions. In this paper, we propose a general framework for the construction of such metric-induced models using end-to-end measurements. We instantiate our approach using one such property, packet loss rates, and present an analytical framework for the characterization of Internet loss topologies. From the perspective of a server, the loss topology is a logical tree rooted at the server with clients at its leaves, in which edges represent lossy paths between a pair of internal network nodes. We show how end-to-end unicast packet probing techniques could be used to (1) infer a loss topology and (2) identify the loss rates of links in an existing loss topology. Correct, efficient inference of loss topology information enables new techniques for aggregate congestion control, QoS admission control, connection scheduling and mirror site selection. We report on simulation, implementation, and Internet deployment results that show the effectiveness of our approach and its robustness in terms of its accuracy and convergence over a wide range of network conditions.
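To make the tree model above concrete, the Python sketch below computes the end-to-end loss rate a client would observe, given per-edge loss rates on a logical tree rooted at the server. It shows only the forward direction (from link rates to path rates), not the authors' inference procedure, and it assumes losses on different edges are independent, which the abstract does not state. The node names and the `parent`/`link_loss` layout are hypothetical.

```python
import math


def path_loss_rate(parent: dict[str, str], link_loss: dict[str, float], leaf: str) -> float:
    """End-to-end loss rate from the root down to `leaf`.

    parent[n] is the node above n; link_loss[n] is the loss rate of the edge
    between n and parent[n]. Assumes independent losses across edges.
    """
    success = 1.0
    node = leaf
    while node in parent:  # the root has no parent entry
        success *= 1.0 - link_loss[node]
        node = parent[node]
    return 1.0 - success


# server -> r1 -> {c1, c2}: both clients share the lossy edge into r1.
parent = {"r1": "server", "c1": "r1", "c2": "r1"}
link_loss = {"r1": 0.1, "c1": 0.0, "c2": 0.5}

assert math.isclose(path_loss_rate(parent, link_loss, "c1"), 0.1)
assert math.isclose(path_loss_rate(parent, link_loss, "c2"), 0.55)
```

Clients that share an edge see correlated losses on that edge, which is the kind of structure that end-to-end probing can exploit to recover the tree.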
Abstract:
The increased diversity of Internet application requirements has spurred recent interest in transport protocols with flexible transmission controls. In window-based congestion control schemes, increase rules determine how to probe available bandwidth, whereas decrease rules determine how to back off when losses due to congestion are detected. The parameterization of these control rules is done so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and loss rate. In this paper, we define a new spectrum of window-based congestion control algorithms that are TCP-friendly as well as TCP-compatible under RED. Contrary to previous memory-less controls, our algorithms utilize history information in their control rules. Our proposed algorithms have two salient features: (1) They enable a wider region of TCP-friendliness, and thus more flexibility in trading off among smoothness, aggressiveness, and responsiveness; and (2) they ensure a faster convergence to fairness under a wide range of system conditions. We demonstrate analytically and through extensive ns simulations the steady-state and transient behaviors of several instances of this new spectrum of algorithms. In particular, SIMD is one instance in which the congestion window is increased super-linearly with time since the detection of the last loss. Compared to recently proposed TCP-friendly AIMD and binomial algorithms, we demonstrate the superiority of SIMD in: (1) adapting to sudden increases in available bandwidth, while maintaining competitive smoothness and responsiveness; and (2) rapidly converging to fairness and efficiency.
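The increase and decrease rules described above can be sketched for the memory-less AIMD baseline that the abstract compares against. The Python code below is not SIMD or any algorithm from the paper. The function names are hypothetical, and the relation a = 3b/(2-b) comes from the standard deterministic sawtooth model of AIMD throughput, which is an assumption here since the abstract does not say which TCP-friendliness relation the authors use.

```python
def aimd_trace(loss_rounds: set[int], rounds: int,
               a: float = 1.0, b: float = 0.5, w0: float = 1.0) -> list[float]:
    """Congestion window after each round trip under AIMD(a, b).

    Increase rule: w += a per loss-free round.
    Decrease rule: w *= (1 - b) on a round with loss (floored at 1 packet).
    """
    w = w0
    trace = []
    for t in range(rounds):
        if t in loss_rounds:
            w = max(1.0, w * (1.0 - b))
        else:
            w += a
        trace.append(w)
    return trace


def tcp_friendly_increase(b: float) -> float:
    """Additive increase a that matches TCP's throughput for a given decrease b.

    Derived from the deterministic sawtooth model; a sketch, not the paper's rule.
    """
    return 3.0 * b / (2.0 - b)


print(aimd_trace({3}, rounds=5))   # [2.0, 3.0, 4.0, 2.0, 3.0]
print(tcp_friendly_increase(0.5))  # 1.0, i.e. standard TCP's AIMD(1, 1/2)
```

A smaller b gives a smoother window but, through the relation above, forces a smaller a and therefore slower probing for new bandwidth. That is the trade-off among smoothness, aggressiveness, and responsiveness that the abstract refers to.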
Abstract:
End-to-End differentiation between wireless and congestion loss can equip TCP control so it operates effectively in a hybrid wired/wireless environment. Our approach integrates two techniques: packet loss pairs (PLP) and Hidden Markov Modeling (HMM). A packet loss pair is formed by two back-to-back packets, where one packet is lost while the second packet is successfully received. The purpose is for the second packet to carry the state of the network path, namely the round trip time (RTT), at the time the other packet is lost. Under realistic conditions, PLP provides strong differentiation between congestion and wireless type of loss based on distinguishable RTT distributions. An HMM is then trained so observed RTTs can be mapped to model states that represent either congestion loss or wireless loss. Extensive simulations confirm the accuracy of our HMM-based technique in classifying the cause of a packet loss. We also show the superiority of our technique over the Vegas predictor, which was recently found to perform best and which exemplifies other existing loss labeling techniques.
Abstract:
The increasing diversity of Internet application requirements has spurred recent interest in transport protocols with flexible transmission controls. In window-based congestion control schemes, increase rules determine how to probe available bandwidth, whereas decrease rules determine how to back off when losses due to congestion are detected. The control rules are parameterized so as to ensure that the resulting protocol is TCP-friendly in terms of the relationship between throughput and loss rate. This paper presents a comprehensive study of a new spectrum of window-based congestion controls, which are TCP-friendly as well as TCP-compatible under RED. Our controls utilize history information in their control rules. By doing so, they improve the transient behavior, compared to recently proposed slowly-responsive congestion controls such as general AIMD and binomial controls. Our controls can achieve better tradeoffs among smoothness, aggressiveness, and responsiveness, and they can achieve faster convergence. We demonstrate analytically and through extensive ns simulations the steady-state and transient behavior of several instances of this new spectrum.
Abstract:
The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.
Abstract:
The goal of this work is to learn a parsimonious and informative representation for high-dimensional time series. Conceptually, this comprises two distinct yet tightly coupled tasks: learning a low-dimensional manifold and modeling the dynamical process. These two tasks have a complementary relationship as the temporal constraints provide valuable neighborhood information for dimensionality reduction and conversely, the low-dimensional space allows dynamics to be learnt efficiently. Solving these two tasks simultaneously allows important information to be exchanged mutually. If nonlinear models are required to capture the rich complexity of time series, then the learning problem becomes harder as the nonlinearities in both tasks are coupled. The proposed solution approximates the nonlinear manifold and dynamics using piecewise linear models. The interactions among the linear models are captured in a graphical model. The model structure setup and parameter learning are done using a variational Bayesian approach, which enables automatic Bayesian model structure selection, hence solving the problem of over-fitting. By exploiting the model structure, efficient inference and learning algorithms are obtained without oversimplifying the model of the underlying dynamical process. Evaluation of the proposed framework with competing approaches is conducted in three sets of experiments: dimensionality reduction and reconstruction using synthetic time series, video synthesis using a dynamic texture database, and human motion synthesis, classification and tracking on a benchmark data set. In all experiments, the proposed approach provides superior performance.
Abstract:
Principality of typings is the property that for each typable term, there is a typing from which all other typings are obtained via some set of operations. Type inference is the problem of finding a typing for a given term, if possible. We define an intersection type system which has principal typings and types exactly the strongly normalizable λ-terms. More interestingly, every finite-rank restriction of this system (using Leivant's first notion of rank) has principal typings and also has decidable type inference. This is in contrast to System F where the finite rank restriction for every finite rank at 3 and above has neither principal typings nor decidable type inference. This is also in contrast to earlier presentations of intersection types where the status of these properties is not known for the finite-rank restrictions at 3 and above. Furthermore, the notion of principal typings for our system involves only one operation, substitution, rather than several operations (not all substitution-based) as in earlier presentations of principality for intersection types (of unrestricted rank). A unification-based type inference algorithm is presented using a new form of unification, β-unification.
Abstract:
Web caching aims to reduce network traffic, server load, and user-perceived retrieval delays by replicating "popular" content on proxy caches that are strategically placed within the network. While key to effective cache utilization, popularity information (e.g. relative access frequencies of objects requested through a proxy) is seldom incorporated directly in cache replacement algorithms. Rather, other properties of the request stream (e.g. temporal locality and content size), which are easier to capture in an on-line fashion, are used to indirectly infer popularity information, and hence drive cache replacement policies. Recent studies suggest that the correlation between these secondary properties and popularity is weakening due in part to the prevalence of efficient client and proxy caches (which tend to mask these correlations). This trend points to the need for proxy cache replacement algorithms that directly capture and use popularity information. In this paper, we (1) present an on-line algorithm that effectively captures and maintains an accurate popularity profile of Web objects requested through a caching proxy, (2) propose a novel cache replacement policy that uses such information to generalize the well-known GreedyDual-Size algorithm, and (3) show the superiority of our proposed algorithm by comparing it to a host of recently-proposed and widely-used algorithms using extensive trace-driven simulations and a variety of performance metrics.
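For context, the baseline GreedyDual-Size policy named above can be sketched in a few lines of Python. This shows only the well-known baseline, in which each object gets priority H = L + cost/size and L is raised to the priority of each evicted object. The popularity-aware generalization proposed in the paper is not reproduced here, and the class and method names are hypothetical.

```python
class GreedyDualSize:
    """Baseline GreedyDual-Size cache (a sketch, not the paper's algorithm)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.entries: dict[str, tuple[float, int]] = {}  # key -> (H, size)

    def access(self, key: str, size: int, cost: float = 1.0) -> bool:
        """Request an object; return True on a cache hit."""
        hit = key in self.entries
        if hit:
            size = self.entries[key][1]
        else:
            if size > self.capacity:
                return False  # object can never fit; do not cache it
            while self.used + size > self.capacity:
                # Evict the object with the lowest priority H and raise L to it.
                victim = min(self.entries, key=lambda k: self.entries[k][0])
                self.inflation = self.entries[victim][0]
                self.used -= self.entries[victim][1]
                del self.entries[victim]
            self.used += size
        # On both hit and insert, (re)set the priority to H = L + cost / size.
        self.entries[key] = (self.inflation + cost / size, size)
        return hit


cache = GreedyDualSize(capacity=10)
cache.access("big", size=8)
cache.access("small", size=2)
cache.access("new", size=4)           # evicts "big", the lowest-priority object
print(sorted(cache.entries))          # ['new', 'small']
```

With uniform cost, large objects receive low priority and are evicted first. Recency enters only through the rising value of L, which is why request frequency is captured indirectly at best in this baseline.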
Abstract:
We consider type systems that combine universal types, recursive types, and object types. We study type inference in these systems under a rank restriction, following Leivant's notion of rank. To motivate our work, we present several examples showing how our systems can be used to type programs encountered in practice. We show that type inference in the rank-k system is decidable for k ≤ 2 and undecidable for k ≥ 3. (Similar results based on different techniques are known to hold for System F, without recursive types and object types.) Our undecidability result is obtained by a reduction from a particular adaptation (which we call "regular") of the semi-unification problem, whose undecidability is, interestingly, obtained by methods totally different from those used in the case of standard (or finite) semi-unification.