987 resultados para data movement problem


Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we propose an approach which attempts to solve the problem of surveillance event detection, assuming that we know the definition of the events. To facilitate the discussion, we first define two concepts. The event of interest refers to the event that the user requests the system to detect; and the background activities are any other events in the video corpus. This is an unsolved problem due to many factors as listed below: 1) Occlusions and clustering: The surveillance scenes which are of significant interest at locations such as airports, railway stations, shopping centers are often crowded, where occlusions and clustering of people are frequently encountered. This significantly affects the feature extraction step, and for instance, trajectories generated by object tracking algorithms are usually not robust under such a situation. 2) The requirement for real time detection: The system should process the video fast enough in both of the feature extraction and the detection step to facilitate real time operation. 3) Massive size of the training data set: Suppose there is an event that lasts for 1 minute in a video with a frame rate of 25fps, the number of frames for this events is 60X25 = 1500. If we want to have a training data set with many positive instances of the event, the video is likely to be very large in size (i.e. hundreds of thousands of frames or more). How to handle such a large data set is a problem frequently encountered in this application. 4) Difficulty in separating the event of interest from background activities: The events of interest often co-exist with a set of background activities. Temporal groundtruth typically very ambiguous, as it does not distinguish the event of interest from a wide range of co-existing background activities. However, it is not practical to annotate the locations of the events in large amounts of video data. This problem becomes more serious in the detection of multi-agent interactions, since the location of these events can often not be constrained to within a bounding box. 5) Challenges in determining the temporal boundaries of the events: An event can occur at any arbitrary time with an arbitrary duration. The temporal segmentation of events is difficult and ambiguous, and also affected by other factors such as occlusions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Recent road safety statistics show that the decades-long fatalities decreasing trend is stopping and stagnating. Statistics further show that crashes are mostly driven by human error, compared to other factors such as environmental conditions and mechanical defects. Within human error, the dominant error source is perceptive errors, which represent about 50% of the total. The next two sources are interpretation and evaluation, which accounts together with perception for more than 75% of human error related crashes. Those statistics show that allowing drivers to perceive and understand their environment better, or supplement them when they are clearly at fault, is a solution to a good assessment of road risk, and, as a consequence, further decreasing fatalities. To answer this problem, currently deployed driving assistance systems combine more and more information from diverse sources (sensors) to enhance the driver's perception of their environment. However, because of inherent limitations in range and field of view, these systems' perception of their environment remains largely limited to a small interest zone around a single vehicle. Such limitations can be overcomed by increasing the interest zone through a cooperative process. Cooperative Systems (CS), a specific subset of Intelligent Transportation Systems (ITS), aim at compensating for local systems' limitations by associating embedded information technology and intervehicular communication technology (IVC). With CS, information sources are not limited to a single vehicle anymore. From this distribution arises the concept of extended or augmented perception. Augmented perception allows extending an actor's perceptive horizon beyond its "natural" limits not only by fusing information from multiple in-vehicle sensors but also information obtained from remote sensors. The end result of an augmented perception and data fusion chain is known as an augmented map. It is a repository where any relevant information about objects in the environment, and the environment itself, can be stored in a layered architecture. This thesis aims at demonstrating that augmented perception has better performance than noncooperative approaches, and that it can be used to successfully identify road risk. We found it was necessary to evaluate the performance of augmented perception, in order to obtain a better knowledge on their limitations. Indeed, while many promising results have already been obtained, the feasibility of building an augmented map from exchanged local perception information and, then, using this information beneficially for road users, has not been thoroughly assessed yet. The limitations of augmented perception, and underlying technologies, have not be thoroughly assessed yet. Most notably, many questions remain unanswered as to the IVC performance and their ability to deliver appropriate quality of service to support life-saving critical systems. This is especially true as the road environment is a complex, highly variable setting where many sources of imperfections and errors exist, not only limited to IVC. We provide at first a discussion on these limitations and a performance model built to incorporate them, created from empirical data collected on test tracks. Our results are more pessimistic than existing literature, suggesting IVC limitations have been underestimated. Then, we develop a new CS-applications simulation architecture. This architecture is used to obtain new results on the safety benefits of a cooperative safety application (EEBL), and then to support further study on augmented perception. At first, we confirm earlier results in terms of crashes numbers decrease, but raise doubts on benefits in terms of crashes' severity. In the next step, we implement an augmented perception architecture tasked with creating an augmented map. Our approach is aimed at providing a generalist architecture that can use many different types of sensors to create the map, and which is not limited to any specific application. The data association problem is tackled with an MHT approach based on the Belief Theory. Then, augmented and single-vehicle perceptions are compared in a reference driving scenario for risk assessment,taking into account the IVC limitations obtained earlier; we show their impact on the augmented map's performance. Our results show that augmented perception performs better than non-cooperative approaches, allowing to almost tripling the advance warning time before a crash. IVC limitations appear to have no significant effect on the previous performance, although this might be valid only for our specific scenario. Eventually, we propose a new approach using augmented perception to identify road risk through a surrogate: near-miss events. A CS-based approach is designed and validated to detect near-miss events, and then compared to a non-cooperative approach based on vehicles equiped with local sensors only. The cooperative approach shows a significant improvement in the number of events that can be detected, especially at the higher rates of system's deployment.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper addresses the problem of joint identification of infinite-frequency added mass and fluid memory models of marine structures from finite frequency data. This problem is relevant for cases where the code used to compute the hydrodynamic coefficients of the marine structure does not give the infinite-frequency added mass. This case is typical of codes based on 2D-potential theory since most 3D-potential-theory codes solve the boundary value associated with the infinite frequency. The method proposed in this paper presents a simpler alternative approach to other methods previously presented in the literature. The advantage of the proposed method is that the same identification procedure can be used to identify the fluid-memory models with or without having access to the infinite-frequency added mass coefficient. Therefore, it provides an extension that puts the two identification problems into the same framework. The method also exploits the constraints related to relative degree and low-frequency asymptotic values of the hydrodynamic coefficients derived from the physics of the problem, which are used as prior information to refine the obtained models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The development and maintenance of large and complex ontologies are often time-consuming and error-prone. Thus, automated ontology learning and revision have attracted intensive research interest. In data-centric applications where ontologies are designed or automatically learnt from the data, when new data instances are added that contradict to the ontology, it is often desirable to incrementally revise the ontology according to the added data. This problem can be intuitively formulated as the problem of revising a TBox by an ABox. In this paper we introduce a model-theoretic approach to such an ontology revision problem by using a novel alternative semantic characterisation of DL-Lite ontologies. We show some desired properties for our ontology revision. We have also developed an algorithm for reasoning with the ontology revision without computing the revision result. The algorithm is efficient as its computational complexity is in coNP in the worst case and in PTIME when the size of the new data is bounded.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The requirement of distributed computing of all-to-all comparison (ATAC) problems in heterogeneous systems is increasingly important in various domains. Though Hadoop-based solutions are widely used, they are inefficient for the ATAC pattern, which is fundamentally different from the MapReduce pattern for which Hadoop is designed. They exhibit poor data locality and unbalanced allocation of comparison tasks, particularly in heterogeneous systems. The results in massive data movement at runtime and ineffective utilization of computing resources, affecting the overall computing performance significantly. To address these problems, a scalable and efficient data and task distribution strategy is presented in this paper for processing large-scale ATAC problems in heterogeneous systems. It not only saves storage space but also achieves load balancing and good data locality for all comparison tasks. Experiments of bioinformatics examples show that about 89\% of the ideal performance capacity of the multiple machines have be achieved through using the approach presented in this paper.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Today's feature-rich multimedia products require embedded system solution with complex System-on-Chip (SoC) to meet market expectations of high performance at a low cost and lower energy consumption. The memory architecture of the embedded system strongly influences these parameters. Hence the embedded system designer performs a complete memory architecture exploration. This problem is a multi-objective optimization problem and can be tackled as a two-level optimization problem. The outer level explores various memory architecture while the inner level explores placement of data sections (data layout problem) to minimize memory stalls. Further, the designer would be interested in multiple optimal design points to address various market segments. However, tight time-to-market constraints enforces short design cycle time. In this paper we address the multi-level multi-objective memory architecture exploration problem through a combination of Multi-objective Genetic Algorithm (Memory Architecture exploration) and an efficient heuristic data placement algorithm. At the outer level the memory architecture exploration is done by picking memory modules directly from a ASIC memory Library. This helps in performing the memory architecture exploration in a integrated framework, where the memory allocation, memory exploration and data layout works in a tightly coupled way to yield optimal design points with respect to area, power and performance. We experimented our approach for 3 embedded applications and our approach explores several thousand memory architecture for each application, yielding a few hundred optimal design points in a few hours of computation time on a standard desktop.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms by the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen verb forms. Translation results for an English to Spanish task are reported, producing a significant performance improvement.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining'' problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The growth of magnetron sputtered Co/Au and Pd/Co/Au superlattices on Au and Pd buffer layers, deposited onto glass substrates, has been monitored optically and magneto-optically in real time, using rotating analyser ellipsometry and Kerr polarimetry, at a wavelength of 633 nm. The magneto-optical traces, combined with ex situ and in situ hysteresis loops, provide a detailed and informative fingerprint of the optical and magnetic properties of the films as they evolve during growth. For Co/Au, oscillations in the polar magneto-optical effect developed during the deposition of An overlayers on Co and these may be attributed to quantum well states. However, the hysteresis measurements show that the magnetic field required to maintain saturation magnetization throughout the experiment was larger than available in situ, introducing a degree of confusion concerning the interpretation of the data. This problem was overcome by the incorporation of Pd layers into the Co/Au structure, thereby eliminating variation in magnetic orientation during growth of the Au layers as a contributory factor to the observations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The forest has a crucial ecological role and the continuous forest loss can cause colossal effects on the environment. As Armenia is one of the low forest covered countries in the world, this problem is more critical. Continuous forest disturbances mainly caused by illegal logging started from the early 1990s had a huge damage on the forest ecosystem by decreasing the forest productivity and making more areas vulnerable to erosion. Another aspect of the Armenian forest is the lack of continuous monitoring and absence of accurate estimation of the level of cuts in some years. In order to have insight about the forest and the disturbances in the long period of time we used Landsat TM/ETM + images. Google Earth Engine JavaScript API was used, which is an online tool enabling the access and analysis of a great amount of satellite imagery. To overcome the data availability problem caused by the gap in the Landsat series in 1988- 1998, extensive cloud cover in the study area and the missing scan lines, we used pixel based compositing for the temporal window of leaf on vegetation (June-late September). Subsequently, pixel based linear regression analyses were performed. Vegetation indices derived from the 10 biannual composites for the years 1984-2014 were used for trend analysis. In order to derive the disturbances only in forests, forest cover layer was aggregated and the original composites were masked. It has been found, that around 23% of forests were disturbed during the study period.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Study on variable stars is an important topic of modern astrophysics. After the invention of powerful telescopes and high resolving powered CCD’s, the variable star data is accumulating in the order of peta-bytes. The huge amount of data need lot of automated methods as well as human experts. This thesis is devoted to the data analysis on variable star’s astronomical time series data and hence belong to the inter-disciplinary topic, Astrostatistics. For an observer on earth, stars that have a change in apparent brightness over time are called variable stars. The variation in brightness may be regular (periodic), quasi periodic (semi-periodic) or irregular manner (aperiodic) and are caused by various reasons. In some cases, the variation is due to some internal thermo-nuclear processes, which are generally known as intrinsic vari- ables and in some other cases, it is due to some external processes, like eclipse or rotation, which are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospheri- cal stars. Pulsating variables can again classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which rarely occurs and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed through many ways such as photometry, spectrophotometry and spectroscopy. The sequence of photometric observa- xiv tions on variable stars produces time series data, which contains time, magni- tude and error. The plot between variable star’s apparent magnitude and time are known as light curve. If the time series data is folded on a period, the plot between apparent magnitude and phase is known as phased light curve. The unique shape of phased light curve is a characteristic of each type of variable star. One way to identify the type of variable star and to classify them is by visually looking at the phased light curve by an expert. For last several years, automated algorithms are used to classify a group of variable stars, with the help of computers. Research on variable stars can be divided into different stages like observa- tion, data reduction, data analysis, modeling and classification. The modeling on variable stars helps to determine the short-term and long-term behaviour and to construct theoretical models (for eg:- Wilson-Devinney model for eclips- ing binaries) and to derive stellar properties like mass, radius, luminosity, tem- perature, internal and external structure, chemical composition and evolution. The classification requires the determination of the basic parameters like pe- riod, amplitude and phase and also some other derived parameters. Out of these, period is the most important parameter since the wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data, to quantify the variation, understand the nature of time-varying phenomena, to gain physical understanding of the system and to predict future behavior of the system. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and possibility of big gaps. This is due to daily varying daylight and the weather conditions for ground based observations and observations from space may suffer from the impact of cosmic ray particles. Many large scale astronomical surveys such as MACHO, OGLE, EROS, xv ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Ke- pler,ESA, Gaia, LSST, CRTS provide variable star’s time series data, even though their primary intention is not variable star observation. Center for Astrostatistics, Pennsylvania State University is established to help the astro- nomical community with the aid of statistical tools for harvesting and analysing archival data. Most of these surveys releases the data to the public for further analysis. There exist many period search algorithms through astronomical time se- ries analysis, which can be classified into parametric (assume some underlying distribution for data) and non-parametric (do not assume any statistical model like Gaussian etc.,) methods. Many of the parametric methods are based on variations of discrete Fourier transforms like Generalised Lomb-Scargle peri- odogram (GLSP) by Zechmeister(2009), Significant Spectrum (SigSpec) by Reegen(2007) etc. Non-parametric methods include Phase Dispersion Minimi- sation (PDM) by Stellingwerf(1978) and Cubic spline method by Akerlof(1994) etc. Even though most of the methods can be brought under automation, any of the method stated above could not fully recover the true periods. The wrong detection of period can be due to several reasons such as power leakage to other frequencies which is due to finite total interval, finite sampling interval and finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. Also spurious periods appear due to long gaps and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence obtaining the exact period of variable star from it’s time series data is still a difficult problem, in case of huge databases, when subjected to automation. As Matthew Templeton, AAVSO, states “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. 2007, Deb et.al. 2010 states “The processing of xvi huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”. It will be beneficial for the variable star astronomical community, if basic parameters, such as period, amplitude and phase are obtained more accurately, when huge time series databases are subjected to automation. In the present thesis work, the theories of four popular period search methods are studied, the strength and weakness of these methods are evaluated by applying it on two survey databases and finally a modified form of cubic spline method is intro- duced to confirm the exact period of variable star. For the classification of new variable stars discovered and entering them in the “General Catalogue of Vari- able Stars” or other databases like “Variable Star Index“, the characteristics of the variability has to be quantified in term of variable star parameters.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we study two orthogonal extensions of the classical data mining problem of mining association rules, and show how they naturally interact. The first is the extension from a propositional representation to datalog, and the second is the condensed representation of frequent itemsets by means of Formal Concept Analysis (FCA). We combine the notion of frequent datalog queries with iceberg concept lattices (also called closed itemsets) of FCA and introduce two kinds of iceberg query lattices as condensed representations of frequent datalog queries. We demonstrate that iceberg query lattices provide a natural way to visualize relational association rules in a non-redundant way.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Representation error arises from the inability of the forecast model to accurately simulate the climatology of the truth. We present a rigorous framework for understanding this kind of error of representation. This framework shows that the lack of an inverse in the relationship between the true climatology (true attractor) and the forecast climatology (forecast attractor) leads to the error of representation. A new gain matrix for the data assimilation problem is derived that illustrates the proper approaches one may take to perform Bayesian data assimilation when the observations are of states on one attractor but the forecast model resides on another. This new data assimilation algorithm is the optimal scheme for the situation where the distributions on the true attractor and the forecast attractors are separately Gaussian and there exists a linear map between them. The results of this theory are illustrated in a simple Gaussian multivariate model.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nonlinear data assimilation is high on the agenda in all fields of the geosciences as with ever increasing model resolution and inclusion of more physical (biological etc.) processes, and more complex observation operators the data-assimilation problem becomes more and more nonlinear. The suitability of particle filters to solve the nonlinear data assimilation problem in high-dimensional geophysical problems will be discussed. Several existing and new schemes will be presented and it is shown that at least one of them, the Equivalent-Weights Particle Filter, does indeed beat the curse of dimensionality and provides a way forward to solve the problem of nonlinear data assimilation in high-dimensional systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

1. Comparative analyses are used to address the key question of what makes a species more prone to extinction by exploring the links between vulnerability and intrinsic species’ traits and/or extrinsic factors. This approach requires comprehensive species data but information is rarely available for all species of interest. As a result comparative analyses often rely on subsets of relatively few species that are assumed to be representative samples of the overall studied group. 2. Our study challenges this assumption and quantifies the taxonomic, spatial, and data type biases associated with the quantity of data available for 5415 mammalian species using the freely available life-history database PanTHERIA. 3. Moreover, we explore how existing biases influence results of comparative analyses of extinction risk by using subsets of data that attempt to correct for detected biases. In particular, we focus on links between four species’ traits commonly linked to vulnerability (distribution range area, adult body mass, population density and gestation length) and conduct univariate and multivariate analyses to understand how biases affect model predictions. 4. Our results show important biases in data availability with c.22% of mammals completely lacking data. Missing data, which appear to be not missing at random, occur frequently in all traits (14–99% of cases missing). Data availability is explained by intrinsic traits, with larger mammals occupying bigger range areas being the best studied. Importantly, we find that existing biases affect the results of comparative analyses by overestimating the risk of extinction and changing which traits are identified as important predictors. 5. Our results raise concerns over our ability to draw general conclusions regarding what makes a species more prone to extinction. Missing data represent a prevalent problem in comparative analyses, and unfortunately, because data are not missing at random, conventional approaches to fill data gaps, are not valid or present important challenges. These results show the importance of making appropriate inferences from comparative analyses by focusing on the subset of species for which data are available. Ultimately, addressing the data bias problem requires greater investment in data collection and dissemination, as well as the development of methodological approaches to effectively correct existing biases.