958 resultados para Source code (Computer science)
Resumo:
If we classify variables in a program into various security levels, then a secure information flow analysis aims to verify statically that information in a program can flow only in ways consistent with the specified security levels. One well-studied approach is to formulate the rules of the secure information flow analysis as a type system. A major trend of recent research focuses on how to accommodate various sophisticated modern language features. However, this approach often leads to overly complicated and restrictive type systems, making them unfit for practical use. Also, problems essential to practical use, such as type inference and error reporting, have received little attention. This dissertation identified and solved major theoretical and practical hurdles to the application of secure information flow. ^ We adopted a minimalist approach to designing our language to ensure a simple lenient type system. We started out with a small simple imperative language and only added features that we deemed most important for practical use. One language feature we addressed is arrays. Due to the various leaking channels associated with array operations, arrays have received complicated and restrictive typing rules in other secure languages. We presented a novel approach for lenient array operations, which lead to simple and lenient typing of arrays. ^ Type inference is necessary because usually a user is only concerned with the security types for input/output variables of a program and would like to have all types for auxiliary variables inferred automatically. We presented a type inference algorithm B and proved its soundness and completeness. Moreover, algorithm B stays close to the program and the type system and therefore facilitates informative error reporting that is generated in a cascading fashion. Algorithm B and error reporting have been implemented and tested. ^ Lastly, we presented a novel framework for developing applications that ensure user information privacy. In this framework, core computations are defined as code modules that involve input/output data from multiple parties. Incrementally, secure flow policies are refined based on feedback from the type checking/inference. Core computations only interact with code modules from involved parties through well-defined interfaces. All code modules are digitally signed to ensure their authenticity and integrity. ^
A framework for transforming, analyzing, and realizing software designs in unified modeling language
Resumo:
Unified Modeling Language (UML) is the most comprehensive and widely accepted object-oriented modeling language due to its multi-paradigm modeling capabilities and easy to use graphical notations, with strong international organizational support and industrial production quality tool support. However, there is a lack of precise definition of the semantics of individual UML notations as well as the relationships among multiple UML models, which often introduces incomplete and inconsistent problems for software designs in UML, especially for complex systems. Furthermore, there is a lack of methodologies to ensure a correct implementation from a given UML design. The purpose of this investigation is to verify and validate software designs in UML, and to provide dependability assurance for the realization of a UML design.^ In my research, an approach is proposed to transform UML diagrams into a semantic domain, which is a formal component-based framework. The framework I proposed consists of components and interactions through message passing, which are modeled by two-layer algebraic high-level nets and transformation rules respectively. In the transformation approach, class diagrams, state machine diagrams and activity diagrams are transformed into component models, and transformation rules are extracted from interaction diagrams. By applying transformation rules to component models, a (sub)system model of one or more scenarios can be constructed. Various techniques such as model checking, Petri net analysis techniques can be adopted to check if UML designs are complete or consistent. A new component called property parser was developed and merged into the tool SAM Parser, which realize (sub)system models automatically. The property parser generates and weaves runtime monitoring code into system implementations automatically for dependability assurance. The framework in the investigation is creative and flexible since it not only can be explored to verify and validate UML designs, but also provides an approach to build models for various scenarios. As a result of my research, several kinds of previous ignored behavioral inconsistencies can be detected.^
Resumo:
The need to provide computers with the ability to distinguish the affective state of their users is a major requirement for the practical implementation of affective computing concepts. This dissertation proposes the application of signal processing methods on physiological signals to extract from them features that can be processed by learning pattern recognition systems to provide cues about a person's affective state. In particular, combining physiological information sensed from a user's left hand in a non-invasive way with the pupil diameter information from an eye-tracking system may provide a computer with an awareness of its user's affective responses in the course of human-computer interactions. In this study an integrated hardware-software setup was developed to achieve automatic assessment of the affective status of a computer user. A computer-based "Paced Stroop Test" was designed as a stimulus to elicit emotional stress in the subject during the experiment. Four signals: the Galvanic Skin Response (GSR), the Blood Volume Pulse (BVP), the Skin Temperature (ST) and the Pupil Diameter (PD), were monitored and analyzed to differentiate affective states in the user. Several signal processing techniques were applied on the collected signals to extract their most relevant features. These features were analyzed with learning classification systems, to accomplish the affective state identification. Three learning algorithms: Naïve Bayes, Decision Tree and Support Vector Machine were applied to this identification process and their levels of classification accuracy were compared. The results achieved indicate that the physiological signals monitored do, in fact, have a strong correlation with the changes in the emotional states of the experimental subjects. These results also revealed that the inclusion of pupil diameter information significantly improved the performance of the emotion recognition system. ^
Resumo:
The purpose of this study was to design a preventive scheme using directional antennas to improve the performance of mobile ad hoc networks. In this dissertation, a novel Directionality based Preventive Link Maintenance (DPLM) Scheme is proposed to characterize the performance gain [JaY06a, JaY06b, JCY06] by extending the life of link. In order to maintain the link and take preventive action, signal strength of data packets is measured. Moreover, location information or angle of arrival information is collected during communication and saved in the table. When measured signal strength is below orientation threshold , an orientation warning is generated towards the previous hop node. Once orientation warning is received by previous hop (adjacent) node, it verifies the correctness of orientation warning with few hello pings and initiates high quality directional link (a link above the threshold) and immediately switches to it, avoiding a link break altogether. The location information is utilized to create a directional link by orienting neighboring nodes antennas towards each other. We call this operation an orientation handoff, which is similar to soft-handoff in cellular networks. ^ Signal strength is the indicating factor, which represents the health of the link and helps to predict the link failure. In other words, link breakage happens due to node movement and subsequently reducing signal strength of receiving packets. DPLM scheme helps ad hoc networks to avoid or postpone costly operation of route rediscovery in on-demand routing protocols by taking above-mentioned preventive action. ^ This dissertation advocates close but simple collaboration between the routing, medium access control and physical layers. In order to extend the link, the Dynamic Source Routing (DSR) and IEEE 802.11 MAC protocols were modified to use the ability of directional antennas to transmit over longer distance. A directional antenna module is implemented in OPNET simulator with two separate modes of operations: omnidirectional and directional. The antenna module has been incorporated in wireless node model and simulations are performed to characterize the performance improvement of mobile ad hoc networks. Extensive simulations have shown that without affecting the behavior of the routing protocol noticeably, aggregate throughput, packet delivery ratio, end-to-end delay (latency), routing overhead, number of data packets dropped, and number of path breaks are improved considerably. We have done the analysis of the results in different scenarios to evaluate that the use of directional antennas with proposed DPLM scheme has been found promising to improve the performance of mobile ad hoc networks. ^
Resumo:
Software development is an extremely complex process, during which human errors are introduced and result in faulty software systems. It is highly desirable and important that these errors can be prevented and detected as early as possible. Software architecture design is a high-level system description, which embodies many system features and properties that are eventually implemented in the final operational system. Therefore, methods for modeling and analyzing software architecture descriptions can help prevent and reveal human errors and thus improve software quality. Furthermore, if an analyzed software architecture description can be used to derive a partial software implementation, especially when the derivation can be automated, significant benefits can be gained with regard to both the system quality and productivity. This dissertation proposes a framework for an integrated analysis on both of the design and implementation. To ensure the desirable properties of the architecture model, we apply formal verification by using the model checking technique. To ensure the desirable properties of the implementation, we develop a methodology and the associated tool to translate an architecture specification into an implementation written in the combination of Arch-Java/Java/AspectJ programming languages. The translation is semi-automatic so that many manual programming errors can be prevented. Furthermore, the translation inserting monitoring code into the implementation such that runtime verification can be performed, this provides additional assurance for the quality of the implementation. Moreover, validations for the translations from architecture model to program are provided. Finally, several case studies are experimented and presented.
Resumo:
Distributed applications are exposed as reusable components that are dynamically discovered and integrated to create new applications. These new applications, in the form of aggregate services, are vulnerable to failure due to the autonomous and distributed nature of their integrated components. This vulnerability creates the need for adaptability in aggregate services. The need for adaptation is accentuated for complex long-running applications as is found in scientific Grid computing, where distributed computing nodes may participate to solve computation and data-intensive problems. Such applications integrate services for coordinated problem solving in areas such as Bioinformatics. For such applications, when a constituent service fails, the application fails, even though there are other nodes that can substitute for the failed service. This concern is not addressed in the specification of high-level composition languages such as that of the Business Process Execution Language (BPEL). We propose an approach to transparently autonomizing existing BPEL processes in order to make them modifiable at runtime and more resilient to the failures in their execution environment. By transparent introduction of adaptive behavior, adaptation preserves the original business logic of the aggregate service and does not tangle the code for adaptive behavior with that of the aggregate service. The major contributions of this dissertation are: first, we assessed the effectiveness of BPEL language support in developing adaptive mechanisms. As a result, we identified the strengths and limitations of BPEL and came up with strategies to address those limitations. Second, we developed a technique to enhance existing BPEL processes transparently in order to support dynamic adaptation. We proposed a framework which uses transparent shaping and generative programming to make BPEL processes adaptive. Third, we developed a technique to dynamically discover and bind to substitute services. Our technique was evaluated and the result showed that dynamic utilization of components improves the flexibility of adaptive BPEL processes. Fourth, we developed an extensible policy-based technique to specify how to handle exceptional behavior. We developed a generic component that introduces adaptive behavior for multiple BPEL processes. Fifth, we identify ways to apply our work to facilitate adaptability in composite Grid services.
Resumo:
To carry out their specific roles in the cell, genes and gene products often work together in groups, forming many relationships among themselves and with other molecules. Such relationships include physical protein-protein interaction relationships, regulatory relationships, metabolic relationships, genetic relationships, and much more. With advances in science and technology, some high throughput technologies have been developed to simultaneously detect tens of thousands of pairwise protein-protein interactions and protein-DNA interactions. However, the data generated by high throughput methods are prone to noise. Furthermore, the technology itself has its limitations, and cannot detect all kinds of relationships between genes and their products. Thus there is a pressing need to investigate all kinds of relationships and their roles in a living system using bioinformatic approaches, and is a central challenge in Computational Biology and Systems Biology. This dissertation focuses on exploring relationships between genes and gene products using bioinformatic approaches. Specifically, we consider problems related to regulatory relationships, protein-protein interactions, and semantic relationships between genes. A regulatory element is an important pattern or "signal", often located in the promoter of a gene, which is used in the process of turning a gene "on" or "off". Predicting regulatory elements is a key step in exploring the regulatory relationships between genes and gene products. In this dissertation, we consider the problem of improving the prediction of regulatory elements by using comparative genomics data. With regard to protein-protein interactions, we have developed bioinformatics techniques to estimate support for the data on these interactions. While protein-protein interactions and regulatory relationships can be detected by high throughput biological techniques, there is another type of relationship called semantic relationship that cannot be detected by a single technique, but can be inferred using multiple sources of biological data. The contributions of this thesis involved the development and application of a set of bioinformatic approaches that address the challenges mentioned above. These included (i) an EM-based algorithm that improves the prediction of regulatory elements using comparative genomics data, (ii) an approach for estimating the support of protein-protein interaction data, with application to functional annotation of genes, (iii) a novel method for inferring functional network of genes, and (iv) techniques for clustering genes using multi-source data.
Resumo:
Developing analytical models that can accurately describe behaviors of Internet-scale networks is difficult. This is due, in part, to the heterogeneous structure, immense size and rapidly changing properties of today's networks. The lack of analytical models makes large-scale network simulation an indispensable tool for studying immense networks. However, large-scale network simulation has not been commonly used to study networks of Internet-scale. This can be attributed to three factors: 1) current large-scale network simulators are geared towards simulation research and not network research, 2) the memory required to execute an Internet-scale model is exorbitant, and 3) large-scale network models are difficult to validate. This dissertation tackles each of these problems. ^ First, this work presents a method for automatically enabling real-time interaction, monitoring, and control of large-scale network models. Network researchers need tools that allow them to focus on creating realistic models and conducting experiments. However, this should not increase the complexity of developing a large-scale network simulator. This work presents a systematic approach to separating the concerns of running large-scale network models on parallel computers and the user facing concerns of configuring and interacting with large-scale network models. ^ Second, this work deals with reducing memory consumption of network models. As network models become larger, so does the amount of memory needed to simulate them. This work presents a comprehensive approach to exploiting structural duplications in network models to dramatically reduce the memory required to execute large-scale network experiments. ^ Lastly, this work addresses the issue of validating large-scale simulations by integrating real protocols and applications into the simulation. With an emulation extension, a network simulator operating in real-time can run together with real-world distributed applications and services. As such, real-time network simulation not only alleviates the burden of developing separate models for applications in simulation, but as real systems are included in the network model, it also increases the confidence level of network simulation. This work presents a scalable and flexible framework to integrate real-world applications with real-time simulation.^
Resumo:
The lack of analytical models that can accurately describe large-scale networked systems makes empirical experimentation indispensable for understanding complex behaviors. Research on network testbeds for testing network protocols and distributed services, including physical, emulated, and federated testbeds, has made steady progress. Although the success of these testbeds is undeniable, they fail to provide: 1) scalability, for handling large-scale networks with hundreds or thousands of hosts and routers organized in different scenarios, 2) flexibility, for testing new protocols or applications in diverse settings, and 3) inter-operability, for combining simulated and real network entities in experiments. This dissertation tackles these issues in three different dimensions. First, we present SVEET, a system that enables inter-operability between real and simulated hosts. In order to increase the scalability of networks under study, SVEET enables time-dilated synchronization between real hosts and the discrete-event simulator. Realistic TCP congestion control algorithms are implemented in the simulator to allow seamless interactions between real and simulated hosts. SVEET is validated via extensive experiments and its capabilities are assessed through case studies involving real applications. Second, we present PrimoGENI, a system that allows a distributed discrete-event simulator, running in real-time, to interact with real network entities in a federated environment. PrimoGENI greatly enhances the flexibility of network experiments, through which a great variety of network conditions can be reproduced to examine what-if questions. Furthermore, PrimoGENI performs resource management functions, on behalf of the user, for instantiating network experiments on shared infrastructures. Finally, to further increase the scalability of network testbeds to handle large-scale high-capacity networks, we present a novel symbiotic simulation approach. We present SymbioSim, a testbed for large-scale network experimentation where a high-performance simulation system closely cooperates with an emulation system in a mutually beneficial way. On the one hand, the simulation system benefits from incorporating the traffic metadata from real applications in the emulation system to reproduce the realistic traffic conditions. On the other hand, the emulation system benefits from receiving the continuous updates from the simulation system to calibrate the traffic between real applications. Specific techniques that support the symbiotic approach include: 1) a model downscaling scheme that can significantly reduce the complexity of the large-scale simulation model, resulting in an efficient emulation system for modulating the high-capacity network traffic between real applications; 2) a queuing network model for the downscaled emulation system to accurately represent the network effects of the simulated traffic; and 3) techniques for reducing the synchronization overhead between the simulation and emulation systems.
Resumo:
As researchers and practitioners move towards a vision of software systems that configure, optimize, protect, and heal themselves, they must also consider the implications of such self-management activities on software reliability. Autonomic computing (AC) describes a new generation of software systems that are characterized by dynamically adaptive self-management features. During dynamic adaptation, autonomic systems modify their own structure and/or behavior in response to environmental changes. Adaptation can result in new system configurations and capabilities, which need to be validated at runtime to prevent costly system failures. However, although the pioneers of AC recognize that validating autonomic systems is critical to the success of the paradigm, the architectural blueprint for AC does not provide a workflow or supporting design models for runtime testing. ^ This dissertation presents a novel approach for seamlessly integrating runtime testing into autonomic software. The approach introduces an implicit self-test feature into autonomic software by tailoring the existing self-management infrastructure to runtime testing. Autonomic self-testing facilitates activities such as test execution, code coverage analysis, timed test performance, and post-test evaluation. In addition, the approach is supported by automated testing tools, and a detailed design methodology. A case study that incorporates self-testing into three autonomic applications is also presented. The findings of the study reveal that autonomic self-testing provides a flexible approach for building safe, reliable autonomic software, while limiting the development and performance overhead through software reuse. ^
Resumo:
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built using Petri nets from user requirements, which is formally verified using model checking. Second, Petri nets models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from some instrumented concurrent program execution, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the world wide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real world systems including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.
Resumo:
Software engineering researchers are challenged to provide increasingly more powerful levels of abstractions to address the rising complexity inherent in software solutions. One new development paradigm that places models as abstraction at the forefront of the development process is Model-Driven Software Development (MDSD). MDSD considers models as first class artifacts, extending the capability for engineers to use concepts from the problem domain of discourse to specify apropos solutions. A key component in MDSD is domain-specific modeling languages (DSMLs) which are languages with focused expressiveness, targeting a specific taxonomy of problems. The de facto approach used is to first transform DSML models to an intermediate artifact in a HLL e.g., Java or C++, then execute that resulting code.^ Our research group has developed a class of DSMLs, referred to as interpreted DSMLs (i-DSMLs), where models are directly interpreted by a specialized execution engine with semantics based on model changes at runtime. This execution engine uses a layered architecture and is referred to as a domain-specific virtual machine (DSVM). As the domain-specific model being executed descends the layers of the DSVM the semantic gap between the user-defined model and the services being provided by the underlying infrastructure is closed. The focus of this research is the synthesis engine, the layer in the DSVM which transforms i-DSML models into executable scripts for the next lower layer to process.^ The appeal of an i-DSML is constrained as it possesses unique semantics contained within the DSVM. Existing DSVMs for i-DSMLs exhibit tight coupling between the implicit model of execution and the semantics of the domain, making it difficult to develop DSVMs for new i-DSMLs without a significant investment in resources.^ At the onset of this research only one i-DSML had been created for the user- centric communication domain using the aforementioned approach. This i-DSML is the Communication Modeling Language (CML) and its DSVM is the Communication Virtual machine (CVM). A major problem with the CVM's synthesis engine is that the domain-specific knowledge (DSK) and the model of execution (MoE) are tightly interwoven consequently subsequent DSVMs would need to be developed from inception with no reuse of expertise.^ This dissertation investigates how to decouple the DSK from the MoE and subsequently producing a generic model of execution (GMoE) from the remaining application logic. This GMoE can be reused to instantiate synthesis engines for DSVMs in other domains. The generalized approach to developing the model synthesis component of i-DSML interpreters utilizes a reusable framework loosely coupled to DSK as swappable framework extensions.^ This approach involves first creating an i-DSML and its DSVM for a second do- main, demand-side smartgrid, or microgrid energy management, and designing the synthesis engine so that the DSK and MoE are easily decoupled. To validate the utility of the approach, the SEs are instantiated using the GMoE and DSKs of the two aforementioned domains and an empirical study to support our claim of reduced developmental effort is performed.^
Resumo:
The advent of the Auger Engineering Radio Array (AERA) necessitates the development of a powerful framework for the analysis of radio measurements of cosmic ray air showers. As AERA performs "radio-hybrid" measurements of air shower radio emission in coincidence with the surface particle detectors and fluorescence telescopes of the Pierre Auger Observatory, the radio analysis functionality had to be incorporated in the existing hybrid analysis solutions for fluorescence and surface detector data. This goal has been achieved in a natural way by extending the existing Auger Offline software framework with radio functionality. In this article, we lay out the design, highlights and features of the radio extension implemented in the Auger Offline framework. Its functionality has achieved a high degree of sophistication and offers advanced features such as vectorial reconstruction of the electric field, advanced signal processing algorithms, a transparent and efficient handling of FFTs, a very detailed simulation of detector effects, and the read-in of multiple data formats including data from various radio simulation codes. The source code of this radio functionality can be made available to interested parties on request. (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Nucleic Acid hairpins have been a subject of study for the last four decades. They are composed of single strand that is
hybridized to itself, and the central section forming an unhybridized loop. In nature, they stabilize single stranded RNA, serve as nucleation
sites for RNA folding, protein recognition signals, mRNA localization and regulation of mRNA degradation. On the other hand,
DNA hairpins in biological contexts have been studied with respect to forming cruciform structures that can regulate gene expression.
The use of DNA hairpins as fuel for synthetic molecular devices, including locomotion, was proposed and experimental demonstrated in 2003. They
were interesting because they bring to the table an on-demand energy/information supply mechanism.
The energy/information is hidden (from hybridization) in the hairpin’s loop, until required.
The energy/information is harnessed by opening the stem region, and exposing the single stranded loop section.
The loop region is now free for possible hybridization and help move the system into a thermodynamically favourable state.
The hidden energy and information coupled with
programmability provides another functionality, of selectively choosing what reactions to hide and
what reactions to allow to proceed, that helps develop a topological sequence of events.
Hairpins have been utilized as a source of fuel for many different DNA devices. In this thesis, we program four different
molecular devices using DNA hairpins, and experimentally validate them in the
laboratory. 1) The first device: A
novel enzyme-free autocatalytic self-replicating system composed entirely of DNA that operates isothermally. 2) The second
device: Time-Responsive Circuits using DNA have two properties: a) asynchronous: the final output is always correct
regardless of differences in the arrival time of different inputs.
b) renewable circuits which can be used multiple times without major degradation of the gate motifs
(so if the inputs change over time, the DNA-based circuit can re-compute the output correctly based on the new inputs).
3) The third device: Activatable tiles are a theoretical extension to the Tile assembly model that enhances
its robustness by protecting the sticky sides of tiles until a tile is partially incorporated into a growing assembly.
4) The fourth device: Controlled Amplification of DNA catalytic system: a device such that the amplification
of the system does not run uncontrollably until the system runs out of fuel, but instead achieves a finite
amount of gain.
Nucleic acid circuits with the ability
to perform complex logic operations have many potential practical applications, for example the ability to achieve point of care diagnostics.
We discuss the designs of our DNA Hairpin molecular devices, the results we have obtained, and the challenges we have overcome
to make these truly functional.
Resumo:
Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with large numbers of parameters, e.g., deep neural network, can well describe a very complicated data distribution. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine, so as to reduce the generalization error, therefore mitigate overfiting. Two different overfiting-preventing approaches are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are demonstrated, showing an obvious advantage of the proposed approaches when the training set is small.
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more the local neighborhoods, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighbourhoods. Deviation (of each datum) from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical {\em changepoint detection} technique, which only works for one dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.