932 resultados para computation- and data-intensive applications


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We consider rank regression for clustered data analysis and investigate the induced smoothing method for obtaining the asymptotic covariance matrices of the parameter estimators. We prove that the induced estimating functions are asymptotically unbiased and the resulting estimators are strongly consistent and asymptotically normal. The induced smoothing approach provides an effective way for obtaining asymptotic covariance matrices for between- and within-cluster estimators and for a combined estimator to take account of within-cluster correlations. We also carry out extensive simulation studies to assess the performance of different estimators. The proposed methodology is substantially Much faster in computation and more stable in numerical results than the existing methods. We apply the proposed methodology to a dataset from a randomized clinical trial.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

K-means algorithm is a well known nonhierarchical method for clustering data. The most important limitations of this algorithm are that: (1) it gives final clusters on the basis of the cluster centroids or the seed points chosen initially, and (2) it is appropriate for data sets having fairly isotropic clusters. But this algorithm has the advantage of low computation and storage requirements. On the other hand, hierarchical agglomerative clustering algorithm, which can cluster nonisotropic (chain-like and concentric) clusters, requires high storage and computation requirements. This paper suggests a new method for selecting the initial seed points, so that theK-means algorithm gives the same results for any input data order. This paper also describes a hybrid clustering algorithm, based on the concepts of multilevel theory, which is nonhierarchical at the first level and hierarchical from second level onwards, to cluster data sets having (i) chain-like clusters and (ii) concentric clusters. It is observed that this hybrid clustering algorithm gives the same results as the hierarchical clustering algorithm, with less computation and storage requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem, and developed a high-performance and scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve the problem. The study considered storage usage, data locality and load balancing for performance improvement in solving the problem. The research outcomes can be applied in bioinformatics, biometrics and data mining and other domains in which all-to-all comparisons are a typical computing pattern.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Agricultural pests are responsible for millions of dollars in crop losses and management costs every year. In order to implement optimal site-specific treatments and reduce control costs, new methods to accurately monitor and assess pest damage need to be investigated. In this paper we explore the combination of unmanned aerial vehicles (UAV), remote sensing and machine learning techniques as a promising methodology to address this challenge. The deployment of UAVs as a sensor platform is a rapidly growing field of study for biosecurity and precision agriculture applications. In this experiment, a data collection campaign is performed over a sorghum crop severely damaged by white grubs (Coleoptera: Scarabaeidae). The larvae of these scarab beetles feed on the roots of plants, which in turn impairs root exploration of the soil profile. In the field, crop health status could be classified according to three levels: bare soil where plants were decimated, transition zones of reduced plant density and healthy canopy areas. In this study, we describe the UAV platform deployed to collect high-resolution RGB imagery as well as the image processing pipeline implemented to create an orthoimage. An unsupervised machine learning approach is formulated in order to create a meaningful partition of the image into each of the crop levels. The aim of this approach is to simplify the image analysis step by minimizing user input requirements and avoiding the manual data labelling necessary in supervised learning approaches. The implemented algorithm is based on the K-means clustering algorithm. In order to control high-frequency components present in the feature space, a neighbourhood-oriented parameter is introduced by applying Gaussian convolution kernels prior to K-means clustering. The results show the algorithm delivers consistent decision boundaries that classify the field into three clusters, one for each crop health level as shown in Figure 1. The methodology presented in this paper represents a venue for further esearch towards automated crop damage assessments and biosecurity surveillance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Network data packet capture and replay capabilities are basic requirements for forensic analysis of faults and security-related anomalies, as well as for testing and development. Cyber-physical networks, in which data packets are used to monitor and control physical devices, must operate within strict timing constraints, in order to match the hardware devices' characteristics. Standard network monitoring tools are unsuitable for such systems because they cannot guarantee to capture all data packets, may introduce their own traffic into the network, and cannot reliably reproduce the original timing of data packets. Here we present a high-speed network forensics tool specifically designed for capturing and replaying data traffic in Supervisory Control and Data Acquisition systems. Unlike general-purpose "packet capture" tools it does not affect the observed network's data traffic and guarantees that the original packet ordering is preserved. Most importantly, it allows replay of network traffic precisely matching its original timing. The tool was implemented by developing novel user interface and back-end software for a special-purpose network interface card. Experimental results show a clear improvement in data capture and replay capabilities over standard network monitoring methods and general-purpose forensics solutions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dispersing a data object into a set of data shares is an elemental stage in distributed communication and storage systems. In comparison to data replication, data dispersal with redundancy saves space and bandwidth. Moreover, dispersing a data object to distinct communication links or storage sites limits adversarial access to whole data and tolerates loss of a part of data shares. Existing data dispersal schemes have been proposed mostly based on various mathematical transformations on the data which induce high computation overhead. This paper presents a novel data dispersal scheme where each part of a data object is replicated, without encoding, into a subset of data shares according to combinatorial design theory. Particularly, data parts are mapped to points and data shares are mapped to lines of a projective plane. Data parts are then distributed to data shares using the point and line incidence relations in the plane so that certain subsets of data shares collectively possess all data parts. The presented scheme incorporates combinatorial design theory with inseparability transformation to achieve secure data dispersal at reduced computation, communication and storage costs. Rigorous formal analysis and experimental study demonstrate significant cost-benefits of the presented scheme in comparison to existing methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustics is a rich source of environmental information that can reflect the ecological dynamics. To deal with the escalating acoustic data, a variety of automated classification techniques have been used for acoustic patterns or scene recognition, including urban soundscapes such as streets and restaurants; and natural soundscapes such as raining and thundering. It is common to classify acoustic patterns under the assumption that a single type of soundscapes present in an audio clip. This assumption is reasonable for some carefully selected audios. However, only few experiments have been focused on classifying simultaneous acoustic patterns in long-duration recordings. This paper proposes a binary relevance based multi-label classification approach to recognise simultaneous acoustic patterns in one-minute audio clips. By utilising acoustic indices as global features and multilayer perceptron as a base classifier, we achieve good classification performance on in-the-field data. Compared with single-label classification, multi-label classification approach provides more detailed information about the distributions of various acoustic patterns in long-duration recordings. These results will merit further biodiversity investigations, such as bird species surveys.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We deal with a single conservation law with discontinuous convex-concave type fluxes which arise while considering sign changing flux coefficients. The main difficulty is that a weak solution may not exist as the Rankine-Hugoniot condition at the interface may not be satisfied for certain choice of the initial data. We develop the concept of generalized entropy solutions for such equations by replacing the Rankine-Hugoniot condition by a generalized Rankine-Hugoniot condition. The uniqueness of solutions is shown by proving that the generalized entropy solutions form a contractive semi-group in L-1. Existence follows by showing that a Godunov type finite difference scheme converges to the generalized entropy solution. The scheme is based on solutions of the associated Riemann problem and is neither consistent nor conservative. The analysis developed here enables to treat the cases of fluxes having at most one extrema in the domain of definition completely. Numerical results reporting the performance of the scheme are presented. (C) 2006 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background Traffic offences have been considered an important predictor of crash involvement, and have often been used as a proxy safety variable for crashes. However the association between crashes and offences has never been meta-analysed and the population effect size never established. Research is yet to determine the extent to which this relationship may be spuriously inflated through systematic measurement error, with obvious implications for researchers endeavouring to accurately identify salient factors predictive of crashes. Methodology and Principal Findings Studies yielding a correlation between crashes and traffic offences were collated and a meta-analysis of 144 effects drawn from 99 road safety studies conducted. Potential impact of factors such as age, time period, crash and offence rates, crash severity and data type, sourced from either self-report surveys or archival records, were considered and discussed. After weighting for sample size, an average correlation of r = .18 was observed over the mean time period of 3.2 years. Evidence emerged suggesting the strength of this correlation is decreasing over time. Stronger correlations between crashes and offences were generally found in studies involving younger drivers. Consistent with common method variance effects, a within country analysis found stronger effect sizes in self-reported data even controlling for crash mean. Significance The effectiveness of traffic offences as a proxy for crashes may be limited. Inclusion of elements such as independently validated crash and offence histories or accurate measures of exposure to the road would facilitate a better understanding of the factors that influence crash involvement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The domination and Hamilton circuit problems are of interest both in algorithm design and complexity theory. The domination problem has applications in facility location and the Hamilton circuit problem has applications in routing problems in communications and operations research.The problem of deciding if G has a dominating set of cardinality at most k, and the problem of determining if G has a Hamilton circuit are NP-Complete. Polynomial time algorithms are, however, available for a large number of restricted classes. A motivation for the study of these algorithms is that they not only give insight into the characterization of these classes but also require a variety of algorithmic techniques and data structures. So the search for efficient algorithms, for these problems in many classes still continues.A class of perfect graphs which is practically important and mathematically interesting is the class of permutation graphs. The domination problem is polynomial time solvable on permutation graphs. Algorithms that are already available are of time complexity O(n2) or more, and space complexity O(n2) on these graphs. The Hamilton circuit problem is open for this class.We present a simple O(n) time and O(n) space algorithm for the domination problem on permutation graphs. Unlike the existing algorithms, we use the concept of geometric representation of permutation graphs. Further, exploiting this geometric notion, we develop an O(n2) time and O(n) space algorithm for the Hamilton circuit problem.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The accompanying collective research report is the result of the research project in 1986­90 between The Finnish Academy and the former Soviet Academy of Sciences. The project was organized around common field work in Finland and in the former Soviet Union and theoretical analyses of tree growth determining processes. Based on theoretical analyses, dynamic stand growth models were made and their parameters were determined utilizing the field results. Annual cycle affects the tree growth. Our theoretical approach was based on adaptation to local climate conditions from Lapland to South Russia. The initiation of growth was described as a simple low and high temperature accumulation driven model. Linking the theoretical model with long term temperature data allowed us to analyze what type of temperature response produced favorable outcome in different climates. Initiation of growth consumes the carbohydrate reserves in plants. We measured the dynamics of insoluble and soluble sugars in the very northern and Karelian conditions. Clear cyclical pattern was observed but the differences between locations were surprisingly small. Analysis of field measurements of CO2 exchange showed that irradiance is the dominating factor causing variation in photosynthetic rate in natural conditions during summer. The effect of other factors is so small that they can be omitted without any considerable loss of accuracy. A special experiment carried out in Hyytiälä showed that the needle living space, defined as the ratio between the shoot cylindric volume and needle surface area, correlates with the shoot photosynthesis. The penetration of irradiance into Scots pine canopy is a complicated phenomenon because of the movement of the sun on the sky and the complicated structure of branches and needles. A moderately simple but balanced forest radiation regime submodel was constructed. It consists of the tree crown and forest structure, the gap probability calculation and the consideration of spatial and temporal variation of radiation inside the forest. The common field excursions in different geographical regions resulted in a lot of experimental data of regularities of woody structures. The water transport seems to be a good common factor to analyse these properties of tree structure. There are evident regressions between cross-sectional areas measured at different locations along the water pathway from fine roots to needles. The observed regressions have clear geographical trends. For example, the same cross-sectional area can support three times higher needle mass in South Russia than in Lapland. Geographical trends can also be seen in shoot and needle structure. Analysis of data published by several Russian authors show, that one ton of needles transpire 42 ton of water a year. This annual amount of transpiration seems to be independent of geographical location, year and site conditions. The produced theoretical and experimental material is utilised in the development of stand growth model that describes the growth and development of Scots pine stands in Finland and the former Soviet Union. The core of the model is carbon and nutrient balances. This means that carbon obtained in photosynthesis is consumed for growth and maintenance and nutrients are taken according to the metabolic needs. The annual photosynthetic production by trees in the stand is determined as a function of irradiance and shading during the active period. The utilisation of the annual photosynthetic production to the growth of different components of trees is based on structural regularities. Since the fundamental metabolic processes are the same in all locations the same growth model structure can be applied in the large range of Scots pine. The annual photosynthetic production and structural regularities determining the allocation of resources have geographical features. The common field measurements enable the application of the model to the analysis of growth and development of stands growing on the five locations of experiments. The model enables the analysis of geographical differences in the growth of Scots pine. For example, the annual photosynthetic production of a 100-year-old stand at Voronez is 3.5 times higher than in Lapland. The share consumed to needle growth (30 %) and to growth of branches (5 %) seems to be the same in all locations. In contrast, the share of fine roots is decreasing when moving from north to south. It is 20 % in Lapland, 15 % in Hyytiälä Central Finland and Kentjärvi Karelia and 15 % in Voronez South Russia. The stem masses (115­113 ton/ha) are rather similar in Hyytiälä, Kentjärvi and Voronez, but rather low (50 ton/ha) in Lapland. In Voronez the height of the trees reach 29 m being in Hyytiälä and Kentjärvi 22 m and in Lapland only 14 m. The present approach enables utilization of structural and functional knowledge, gained in places of intensive research, in the analysis of growth and development of any stand. This opens new possibilities for growth research and also for applications in forestry practice.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The design of a dual-DSP microprocessor system and its application for parallel FFT and two-dimensional convolution are explained. The system is based on a master-salve configuration. Two ADSP-2101s are configured as slave processors and a PC/AT serves as the master. The master serves as a control processor to transfer the program code and data to the DSPs. The system architecture and the algorithms for the two applications, viz. FFT and two-dimensional convolutions, are discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Floquet analysis is widely used for small-order systems (say, order M < 100) to find trim results of control inputs and periodic responses, and stability results of damping levels and frequencies, Presently, however, it is practical neither for design applications nor for comprehensive analysis models that lead to large systems (M > 100); the run time on a sequential computer is simply prohibitive, Accordingly, a massively parallel Floquet analysis is developed with emphasis on large systems, and it is implemented on two SIMD or single-instruction, multiple-data computers with 4096 and 8192 processors, The focus of this development is a parallel shooting method with damped Newton iteration to generate trim results; the Floquet transition matrix (FTM) comes out as a byproduct, The eigenvalues and eigenvectors of the FTM are computed by a parallel QR method, and thereby stability results are generated, For illustration, flap and flap-lag stability of isolated rotors are treated by the parallel analysis and by a corresponding sequential analysis with the conventional shooting and QR methods; linear quasisteady airfoil aerodynamics and a finite-state three-dimensional wake model are used, Computational reliability is quantified by the condition numbers of the Jacobian matrices in Newton iteration, the condition numbers of the eigenvalues and the residual errors of the eigenpairs, and reliability figures are comparable in both the parallel and sequential analyses, Compared to the sequential analysis, the parallel analysis reduces the run time of large systems dramatically, and the reduction increases with increasing system order; this finding offers considerable promise for design and comprehensive-analysis applications.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The interest in low bit rate video coding has increased considerably. Despite rapid progress in storage density and digital communication system performance, demand for data-transmission bandwidth and storage capacity continue to exceed the capabilities of available technologies. The growth of data-intensive digital audio, video applications and the increased use of bandwidth-limited media such as video conferencing and full motion video have not only sustained the need for efficient ways to encode analog signals, but made signal compression central to digital communication and data-storage technology. In this paper we explore techniques for compression of image sequences in a manner that optimizes the results for the human receiver. We propose a new motion estimator using two novel block match algorithms which are based on human perception. Simulations with image sequences have shown an improved bit rate while maintaining ''image quality'' when compared to conventional motion estimation techniques using the MAD block match criteria.