15 results for Natural language processing (Computer science)

in Boston University Digital Common


Relevance:

100.00%

Publisher:

Abstract:

In a recent paper (Changes in Web Client Access Patterns: Characteristics and Caching Implications, by Barford, Bestavros, Bradley, and Crovella) we performed a variety of analyses upon user traces collected in the Boston University Computer Science department in 1995 and 1998. A sanitized version of the 1995 trace has been publicly available for some time; the 1998 trace has now been sanitized, and is available from:
http://www.cs.bu.edu/techreports/1999-011-usertrace-98.gz
ftp://ftp.cs.bu.edu/techreports/1999-011-usertrace-98.gz
This memo discusses the format of this public version of the log, and includes additional discussion of how the data was collected, how the log was sanitized, what this log is and is not useful for, and areas of potential future research interest.

Relevance:

100.00%

Publisher:

Abstract:

In research areas involving mathematical rigor, there are numerous benefits to adopting a formal representation of models and arguments: reusability, automatic evaluation of examples, and verification of consistency and correctness. However, accessibility has not been a priority in the design of formal verification tools that can provide these benefits. In earlier work [30] we attempted to address this broad problem by proposing several specific design criteria organized around the notion of a natural context: the sphere of awareness a working human user maintains of the relevant constructs, arguments, experiences, and background materials necessary to accomplish the task at hand. In this report we evaluate our proposed design criteria by using, in the context of novel research, a formal reasoning system that is designed according to these criteria. In particular, we consider how the design and capabilities of the formal reasoning system that we employ influence, aid, or hinder our ability to accomplish a formal reasoning task: the assembly of a machine-verifiable proof pertaining to the NetSketch formalism. NetSketch is a tool for the specification of constrained-flow applications and the certification of desirable safety properties imposed thereon. NetSketch is conceived to assist system integrators in two types of activities: modeling and design. It provides capabilities for compositional analysis based on a strongly-typed domain-specific language (DSL) for describing and reasoning about constrained-flow networks and invariants that need to be enforced thereupon. In a companion paper [13] we overview NetSketch, highlight its salient features, and illustrate how it could be used in actual applications. In this paper, we define, using a machine-readable syntax, major parts of the formal system underlying the operation of NetSketch, along with its semantics and a corresponding notion of validity. We then provide a proof of soundness for the formalism that can be partially verified using a lightweight formal reasoning system that simulates natural contexts. A traditional presentation of these definitions and arguments can be found in the full report on the NetSketch formalism [12].

Relevance:

100.00%

Publisher:

Abstract:

We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins can hold far fewer tuples than the sliding windows contain. A stream buffer management policy is therefore needed. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
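
The abstract does not give GDJ's internals, but the GreedyDual family of policies it builds on has a well-known shape: each cached item carries a priority combining an estimated benefit with a global inflation value, and eviction takes the lowest priority. The sketch below is a minimal Python rendering of that general scheme, with the per-tuple `benefit` estimate (which GDJ would derive from temporal correlations) left to the caller; names and details are illustrative, not the paper's.

```python
import heapq

class GreedyDualBuffer:
    """GreedyDual-style replacement for a join buffer (an illustrative
    sketch, not the paper's exact GDJ). Each buffered tuple has priority
    H = L + benefit, where L is a global inflation value raised to the
    priority of the most recently evicted tuple, so tuples that stop
    earning benefit eventually age out."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.L = 0.0       # inflation value
        self.heap = []     # (priority, seq, key); stale entries skipped lazily
        self.entries = {}  # key -> (seq, data)
        self.seq = 0

    def _evict(self):
        while self.heap:
            prio, seq, key = heapq.heappop(self.heap)
            entry = self.entries.get(key)
            if entry is not None and entry[0] == seq:  # live, not stale
                self.L = prio  # inflate: future tuples outrank aged ones
                del self.entries[key]
                return

    def insert(self, key, data, benefit):
        """Buffer a tuple; `benefit` is the caller's estimate of future join
        matches (GDJ would derive it from temporal locality)."""
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self.seq += 1
        self.entries[key] = (self.seq, data)
        heapq.heappush(self.heap, (self.L + benefit, self.seq, key))

    def touch(self, key, benefit):
        """Refresh a tuple's priority when it participates in a match."""
        if key in self.entries:
            self.insert(key, self.entries[key][1], benefit)
```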

Relevance:

100.00%

Publisher:

Abstract:

The CIL compiler for core Standard ML compiles whole programs using a novel typed intermediate language (TIL) with intersection and union types and flow labels on both terms and types. The CIL term representation duplicates portions of the program where intersection types are introduced and union types are eliminated. This duplication makes it easier to represent type information and to introduce customized data representations. However, duplication incurs compile-time space costs that are potentially much greater than those incurred in TILs employing type-level abstraction or quantification. In this paper, we present empirical data on the compile-time space costs of using CIL as an intermediate language. The data shows that these costs can be made tractable by using sufficiently fine-grained flow analyses together with standard hash-consing techniques. The data also suggests that non-duplicating formulations of intersection (and union) types would not achieve significantly better space complexity.
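
As a concrete illustration of the standard hash-consing the abstract credits with making these costs tractable, here is a minimal sketch (not CIL's actual representation): structurally equal type terms are interned in a table, so duplicated program fragments share a single physical type node instead of copies.

```python
class Type:
    """Interned (hash-consed) type term: structurally equal types share one
    physical node, so duplicated fragments of the program share type info."""
    _table = {}

    def __new__(cls, head, *args):
        # Children are themselves interned, so tuple equality is cheap.
        key = (head, args)
        node = cls._table.get(key)
        if node is None:
            node = super().__new__(cls)
            node.head, node.args = head, args
            cls._table[key] = node
        return node

int_t = Type("int")
arrow = Type("->", int_t, int_t)
# The second construction returns the same node rather than a duplicate:
assert arrow is Type("->", Type("int"), Type("int"))
```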

Relevance:

100.00%

Publisher:

Abstract:

An automated system for detection of head movements is described. The goal is to label relevant head gestures in video of American Sign Language (ASL) communication. In the system, a 3D head tracker recovers head rotation and translation parameters from monocular video. Relevant head gestures are then detected by analyzing the length and frequency of the motion signal's peaks and valleys. Each parameter is analyzed independently, because a number of relevant head movements in ASL are associated with major changes around a single rotational axis. No explicit training of the system is necessary. Currently, the system can detect "head shakes." In an experimental evaluation, classification performance is compared against ground-truth labels obtained from ASL linguists. Initial results are promising, as the system matches the linguists' labels in a significant number of cases.
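
A minimal sketch of the peak-and-valley analysis described above, applied to a single rotation parameter (yaw, for a head shake); the amplitude and period thresholds are placeholders, since the paper's actual values and signal processing are not given in the abstract:

```python
def detect_head_shake(yaw, min_amplitude=3.0, min_alternations=3, max_period=15):
    """Flag a head shake in a 1-D signal of per-frame yaw angles (degrees).

    Local peaks and valleys are located first; a head shake is reported when
    enough successive extrema alternate with sufficient amplitude and within
    a bounded period. Thresholds are illustrative placeholders.
    """
    extrema = []  # (frame_index, value) of local peaks and valleys
    for i in range(1, len(yaw) - 1):
        if yaw[i - 1] < yaw[i] > yaw[i + 1] or yaw[i - 1] > yaw[i] < yaw[i + 1]:
            extrema.append((i, yaw[i]))

    run = 0
    for (i0, v0), (i1, v1) in zip(extrema, extrema[1:]):
        if abs(v1 - v0) >= min_amplitude and (i1 - i0) <= max_period:
            run += 1
            if run >= min_alternations:
                return True
        else:
            run = 0
    return False
```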

Relevance:

100.00%

Publisher:

Abstract:

snBench is a platform on which novice users compose and deploy distributed Sense and Respond programs for simultaneous execution on a shared, distributed infrastructure. We must therefore be able to (1) verify the safety and correctness of newly submitted tasks and (2) derive the resource requirements of these tasks so that correct allocation can occur. To achieve these goals we have established a multi-dimensional sized type system for our functional-style Domain Specific Language (DSL) called Sensor Task Execution Plan (STEP). In this type system, data types are annotated with a vector of size attributes (e.g., upper and lower size bounds). Tracking multiple size aspects proves essential in a system in which Images are manipulated as a first-class data type, as image manipulation functions may have specific minimum and/or maximum resolution restrictions on the input they can correctly process. Through static analysis of STEP instances we not only verify basic type safety and establish upper computational resource bounds (i.e., time and space), but also derive and solve data and resource sizing constraints (e.g., Image resolution, camera capabilities) from the implicit constraints embedded in program instances. In fact, the static methods presented here have benefits beyond their application to Image data, and may be extended to other data types that require tracking multiple dimensions (e.g., image "quality", video frame rate or aspect ratio, audio sampling rate). In this paper we present the syntax and semantics of our functional language, our type system that builds costs and resource/data constraints, and (through both formalism and specific details of our implementation) provide concrete examples of how the constraints and sizing information are used in practice.
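
A toy rendering of the multi-dimensional sized-type idea for the Image case: each type carries lower and upper bounds on width and height, and a static check on a dataflow edge verifies that anything the upstream node can emit satisfies the downstream node's resolution restrictions. The class and function names are hypothetical, not STEP syntax:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SizedImage:
    """An Image type annotated with a vector of size attributes:
    lower and upper bounds on width and height."""
    w_min: int
    w_max: int
    h_min: int
    h_max: int

def check_edge(out_bounds: SizedImage, in_bounds: SizedImage) -> None:
    """Static check for one dataflow edge: every image the upstream node may
    emit must satisfy the downstream node's min/max resolution restrictions."""
    ok = (in_bounds.w_min <= out_bounds.w_min and out_bounds.w_max <= in_bounds.w_max
          and in_bounds.h_min <= out_bounds.h_min and out_bounds.h_max <= in_bounds.h_max)
    if not ok:
        raise TypeError("unsatisfiable resolution constraint on this edge")

camera = SizedImage(640, 640, 480, 480)        # a fixed-resolution source
face_detect = SizedImage(320, 1024, 240, 768)  # accepted input resolutions
check_edge(camera, face_detect)                # passes: 640x480 is in bounds
```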

Relevance:

100.00%

Publisher:

Abstract:

Locating hands in sign language video is challenging due to a number of factors. Hand appearance varies widely across signers due to anthropometric variations and varying levels of signer proficiency. Video can be captured under varying illumination, camera resolutions, and levels of scene clutter, e.g., high-res video captured in a studio vs. low-res video gathered by a webcam in a user's home. Moreover, the signers' clothing varies, e.g., skin-toned clothing vs. contrasting clothing, short-sleeved vs. long-sleeved shirts, etc. In this work, the hand detection problem is addressed in an appearance matching framework. The Histogram of Oriented Gradient (HOG) based matching score function is reformulated to allow non-rigid alignment between pairs of images, to account for hand shape variation. The resulting alignment score is used within a Support Vector Machine hand/not-hand classifier for hand detection. The new matching score function yields improved performance (in ROC area and hand detection rate) over the Vocabulary Guided Pyramid Match Kernel (VGPMK) and the traditional, rigid HOG distance on American Sign Language video gestured by expert signers. The proposed match score function is computationally less expensive (for training and testing), has fewer parameters, and is less sensitive to parameter settings than VGPMK. The proposed detector works well on test sequences from an inexpert signer in a non-studio setting with cluttered background.
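
The following sketch illustrates the core reformulation under simplifying assumptions (grids of precomputed HOG cell histograms, an L2 cell distance, and a shift radius of one cell): instead of comparing each cell only against the same-position cell, as the rigid distance does, each cell may match its best counterpart within a small neighborhood. It is a toy stand-in for the paper's alignment score, not its implementation:

```python
import math

def cell_distance(a, b):
    """L2 distance between two HOG cell orientation histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nonrigid_hog_score(grid_a, grid_b, radius=1):
    """Score two HOG grids (2-D lists of histograms) by letting each cell of
    grid_a match its best counterpart within `radius` cells in grid_b,
    rather than only the same-position cell as in the rigid HOG distance."""
    rows, cols = len(grid_a), len(grid_a[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            best = float("inf")
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        best = min(best, cell_distance(grid_a[i][j], grid_b[ni][nj]))
            total += best
    return -total  # higher is better; usable as input to an SVM classifier
```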

Relevance:

100.00%

Publisher:

Abstract:

This paper centers on the design of a thread- and memory-safe language, primarily for the compilation of application-specific services for extensible operating systems. We describe various issues that have influenced the design of our language, called Cuckoo, which guarantees safety of programs with potentially asynchronous flows of control. Comparisons are drawn between Cuckoo and related software safety techniques, including Cyclone and software-based fault isolation (SFI), and performance results suggest our prototype compiler is capable of generating safe code that executes with low runtime overheads, even without potential code optimizations. Compared to Cyclone, Cuckoo is able to safely guard accesses to memory when programs are multithreaded. Similarly, Cuckoo is capable of enforcing memory safety in situations that are potentially troublesome for techniques such as SFI.

Relevance:

100.00%

Publisher:

Abstract:

This paper formally defines the operational semantics of TRAFFIC, a specification language for flow composition applications proposed in BUCS-TR-2005-014, and presents a type system based on desired safety assurances. We provide proofs on reduction (weak confluence, strong normalization, and uniqueness of normal forms), on the soundness and completeness of the type system with respect to reduction, and on equivalence classes of flow specifications. Finally, we provide a pseudo-code listing of a syntax-directed type-checking algorithm implementing the rules of the type system, capable of inferring the type of a closed flow specification.
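
To make "syntax-directed" concrete: such a checker dispatches on the top-level syntactic form and computes a type from the subterms' types, so checking is a single recursive pass. The sketch below does this for a hypothetical toy flow language (TRAFFIC's actual grammar, types, and rules are in the cited reports), typing each flow by an (inputs, outputs) arity pair:

```python
def type_of(expr, env=None):
    """Syntax-directed type checker for a toy flow language (illustrative
    only). Expressions: ("flow", in_ty, out_ty), ("seq", e1, e2),
    ("par", e1, e2), ("let", name, e1, e2), ("var", name).
    Types are (inputs, outputs) arity pairs."""
    env = env or {}
    tag = expr[0]
    if tag == "flow":
        return (expr[1], expr[2])
    if tag == "var":
        return env[expr[1]]
    if tag == "let":
        _, name, bound, body = expr
        return type_of(body, {**env, name: type_of(bound, env)})
    if tag == "seq":  # sequential composition: outputs must feed inputs
        (i1, o1), (i2, o2) = type_of(expr[1], env), type_of(expr[2], env)
        if o1 != i2:
            raise TypeError(f"cannot connect {o1} outputs to {i2} inputs")
        return (i1, o2)
    if tag == "par":  # parallel composition: arities add
        (i1, o1), (i2, o2) = type_of(expr[1], env), type_of(expr[2], env)
        return (i1 + i2, o1 + o2)
    raise TypeError(f"unknown form {tag!r}")

# e.g., a two-stage pipeline with one input and one output overall:
assert type_of(("seq", ("flow", 1, 2), ("flow", 2, 1))) == (1, 1)
```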

Relevance:

100.00%

Publisher:

Abstract:

Facial features play an important role in expressing grammatical information in signed languages, including American Sign Language (ASL). Gestures such as raising or furrowing the eyebrows are key indicators of constructions such as yes-no questions. Periodic head movements (nods and shakes) are also an essential part of the expression of syntactic information, such as negation (associated with a side-to-side headshake). Therefore, identification of these facial gestures is essential to sign language recognition. One challenge in detecting such grammatical indicators is occlusion: if the signer's hand blocks his/her eyebrows during production of a sign, it becomes difficult to track them, and the tracker must recover. We have developed a system to detect such grammatical markers in ASL that recovers promptly from occlusion. Our system detects and tracks evolving templates of facial features, which are based on an anthropometric face model, and interprets the geometric relationships of these templates to identify grammatical markers. It was tested on a variety of ASL sentences signed by various Deaf native signers and detected facial gestures used to express grammatical information, such as raised and furrowed eyebrows as well as headshakes.
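
A minimal sketch of the track-and-recover loop implied above, with caller-supplied `match` and `redetect` functions standing in for the paper's template matcher and anthropometric face-model detector (all names and the threshold are hypothetical):

```python
def track_with_occlusion_recovery(frames, template, match, redetect, thresh=0.6):
    """Toy version of tracking with prompt occlusion recovery (the paper's
    tracker, face model, and thresholds are more elaborate). While the match
    score is high, the template is allowed to evolve; when it drops, e.g.
    because a hand covers the eyebrows, adaptation stops and the feature is
    re-detected from the face-model prior until tracking resumes."""
    for frame in frames:
        score, region = match(frame, template)
        if score >= thresh:
            template = region            # update the evolving template
            yield ("tracked", region)
        else:
            candidate = redetect(frame)  # fall back to the face-model prior
            if candidate is not None:
                template = candidate
            yield ("occluded", None)
```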

Relevance:

100.00%

Publisher:

Abstract:

Weak references are references that do not prevent the object they point to from being garbage collected. Most realistic languages, including Java, SML/NJ, and OCaml to name a few, have some facility for programming with weak references. Weak references are used in implementing idioms like memoizing functions and hash-consing in order to avoid potential memory leaks. However, the semantics of weak references in many languages are not clearly specified. Without a formal semantics for weak references it becomes impossible to prove the correctness of implementations making use of this feature. Previous work by Hallett and Kfoury extends λgc, a language for modeling garbage collection, to λweak, a similar language with weak references. Using this previously formalized semantics for weak references, we consider two issues related to the well-behavedness of programs. First, we provide a new, simpler proof of the well-behavedness of the syntactically restricted fragment of λweak defined previously. Second, we give a natural semantic criterion for well-behavedness that is much broader than the syntactic restriction and is useful as a principle for programming with weak references. Furthermore, we extend the result, proved previously for λgc, which allows one to use type inference to collect some reachable objects that are never used. We prove that this result holds for our language, and we extend it to allow the collection of weakly-referenced reachable garbage without incurring the computational overhead sometimes associated with collecting weak bindings (e.g., the need to recompute a memoized function). Lastly, we extend the semantic framework to model the key/value weak references found in Haskell, and we prove that the Haskell semantics is equivalent to a simpler semantics due to the lack of side effects in our language.
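
The leak-avoidance idiom mentioned above is easy to see in a language with a weak-reference facility. The sketch below uses Python's standard weakref module to build a hash-consing table whose entries do not keep their values alive; the Sym class and all names are illustrative:

```python
import weakref

class Sym:
    """An interned value; user-defined classes support weak references."""
    def __init__(self, name):
        self.name = name

class Interner:
    """Hash-consing table held through weak references: interned values are
    shared while alive, yet remain collectable once clients drop them."""
    def __init__(self):
        self._table = weakref.WeakValueDictionary()  # name -> Sym

    def intern(self, name):
        sym = self._table.get(name)
        if sym is None:
            sym = Sym(name)
            self._table[name] = sym
        return sym

pool = Interner()
a = pool.intern("x")
assert pool.intern("x") is a  # shared while a strong reference exists
del a  # with no strong references left, the table entry may be collected
```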

Relevance:

100.00%

Publisher:

Abstract:

A weak reference is a reference to an object that is not followed by the pointer tracer when garbage collection is called. That is, a weak reference cannot prevent the object it references from being garbage collected. Weak references remain a troublesome programming feature largely because there is not an accepted, precise semantics that describes their behavior (in fact, we are not aware of any formalization of their semantics). The trouble is that weak references allow reachable objects to be garbage collected, therefore allowing garbage collection to influence the result of a program. Despite this difficulty, weak references continue to be used in practice for reasons related to efficient storage management, and are included in many popular programming languages (Standard ML, Haskell, OCaml, and Java). We give a formal semantics for a calculus called λweak that includes weak references and is derived from Morrisett, Felleisen, and Harper’s λgc. λgc formalizes the notion of garbage collection by means of a rewrite rule. Such a formalization is required to precisely characterize the semantics of weak references. However, the inclusion of a garbage-collection rewrite rule in a language with weak references introduces non-deterministic evaluation, even if the parameter-passing mechanism is deterministic (call-by-value in our case). This raises the question of confluence for our rewrite system. We discuss natural restrictions under which our rewrite system is confluent, thus guaranteeing uniqueness of program result. We define conditions that allow other garbage collection algorithms to co-exist with our semantics of weak references. We also introduce a polymorphic type system to prove the absence of erroneous program behavior (i.e., the absence of “stuck evaluation”) and a corresponding type inference algorithm. We prove the type system sound and the inference algorithm sound and complete.
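
The non-determinism the abstract formalizes can be observed in any language with weak references: a program that branches on whether a weak reference has been cleared can produce different results depending on when the collector runs. A small Python illustration (CPython's reference counting happens to clear the reference immediately here; other runtimes may delay, which is exactly the scheduling freedom a garbage-collection rewrite rule models):

```python
import gc
import weakref

class Node:
    pass

obj = Node()
w = weakref.ref(obj)  # weak: does not keep obj alive
assert w() is obj     # obj is still strongly reachable

del obj               # drop the last strong reference
gc.collect()          # when collection actually runs is up to the runtime

# A program branching on this test is at the collector's mercy:
assert w() is None    # after collection, the weak reference is cleared
```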

Relevance:

100.00%

Publisher:

Abstract:

When analyzing the behavior of complex networked systems, it is often the case that some components within the network are only known to the extent that they belong to one of a set of possible "implementations", e.g., versions of a specific protocol, a class of schedulers, etc. In this report we augment the specification language considered in BUCS-TR-2004-021, BUCS-TR-2005-014, BUCS-TR-2005-015, and BUCS-TR-2005-033 with a non-deterministic multiple-choice let-binding, which lets us consider compositions of networking subsystems under looser component specifications.
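
A toy reading of such a multiple-choice let-binding: bind each underspecified component to every candidate implementation in turn, and require the property of interest to hold under all combinations. The names and the checked property below are hypothetical:

```python
from itertools import product

def analyze_all_choices(bindings, check):
    """Evaluate a property under every resolution of a multiple-choice
    let-binding: `bindings` maps each component name to its candidate
    implementations, and `check` is the property that must hold."""
    names = list(bindings)
    results = []
    for combo in product(*(bindings[n] for n in names)):
        env = dict(zip(names, combo))
        results.append((env, check(env)))
    return results

# A component known only up to a set of implementations (names made up):
bindings = {"scheduler": ["fifo", "edf"], "protocol": ["tcp-reno", "tcp-vegas"]}
report = analyze_all_choices(bindings,
                             lambda env: env["scheduler"] in ("fifo", "edf"))
assert all(ok for _, ok in report)  # the property holds under every choice
```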

Relevance:

100.00%

Publisher:

Abstract:

The purpose of this project is the creation of a graphical "programming" interface for a sensor network tasking language called STEP. The graphical interface allows the user to specify a program execution graphically from an extensible palette of functionalities and save the results as a properly formatted STEP file. Moreover, the software is able to load a file in STEP format and convert it into the corresponding graphical representation. During both phases a type-checker runs in the background to ensure that both the graphical representation and the STEP file are syntactically correct. This project has been motivated by the Sensorium project at Boston University. In this technical report we present the basic features of the software and the process followed during its design and implementation. Finally, we describe the approach used to test and validate our software.

Relevance:

100.00%

Publisher:

Abstract:

Personal communication devices are increasingly equipped with sensors that are able to collect and locally store information from their environs. The mobility of users carrying such devices, and hence the mobility of sensor readings in space and time, opens new horizons for interesting applications. In particular, we envision a system in which the collective sensing, storage and communication resources, and mobility of these devices could be leveraged to query the state of (possibly remote) neighborhoods. Such queries would have spatio-temporal constraints which must be met for the query answers to be useful. Using a simplified mobility model, we analytically quantify the benefits from cooperation (in terms of the system's ability to satisfy spatio-temporal constraints), which we show to go beyond simple space-time tradeoffs. In managing the limited storage resources of such cooperative systems, the goal should be to minimize the number of unsatisfiable spatio-temporal constraints. We show that Data Centric Storage (DCS), or "directed placement", is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose "amorphous placement", in which sensory samples are cached locally, and shuffling of cached samples is used to diffuse the sensory data throughout the whole network. We evaluate the conditions under which directed versus amorphous placement strategies would be more efficient. These results lead us to propose a hybrid placement strategy, in which the spatio-temporal constraints associated with a sensory data type determine the most appropriate placement strategy for that data type. We perform an extensive simulation study to evaluate the performance of directed, amorphous, and hybrid placement protocols when applied to queries that are subject to timing constraints. Our results show that directed placement is better for queries with moderately tight deadlines, whereas amorphous placement is better for queries with looser deadlines, and that under most operational conditions the hybrid technique gives the best compromise.
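
A minimal simulation-style sketch of the shuffling that drives amorphous placement: in each round, every node exchanges a random fraction of its locally cached samples with a random neighbor, gradually diffusing data through the network. Parameters and structure are illustrative; the paper's protocol and its handling of spatio-temporal constraints are more involved:

```python
import random

def shuffle_step(caches, neighbors, swap_fraction=0.1, rng=random):
    """One round of amorphous-placement-style shuffling (toy sketch).
    `caches` maps node -> list of cached sensor samples;
    `neighbors` maps node -> list of adjacent nodes."""
    for node in caches:
        if not neighbors[node] or not caches[node]:
            continue
        peer = rng.choice(neighbors[node])
        k = max(1, int(len(caches[node]) * swap_fraction))
        # Remove k random samples from each side...
        outgoing = [caches[node].pop(rng.randrange(len(caches[node])))
                    for _ in range(min(k, len(caches[node])))]
        incoming = [caches[peer].pop(rng.randrange(len(caches[peer])))
                    for _ in range(min(k, len(caches[peer])))]
        # ...and swap them, diffusing data between neighborhoods.
        caches[node].extend(incoming)
        caches[peer].extend(outgoing)
```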