914 resultados para high dimensional data, call detail records (CDR), wireless telecommunication industry


Relevância:

100.00% 100.00%

Publicador:

Resumo:

High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms. (c) 2005 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index image's multi-features into a single one-dimensional structure. Both for text and visual feature spaces, the similarity between a point and a local partition's center in individual space is used as the indexing key, where similarity values in different features are distinguished by different scale. Then a single indexing tree can be built on these keys. Based on the property that relevant images have similar similarity values from the center of the same local partition in any feature space, certain number of irrelevant images can be fast pruned based on the triangle inequity on indexing keys. To remove the dimensionality curse existing in high dimensional structure, we propose a new technique called Local Bit Stream (LBS). LBS transforms image's text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on local partition's center. Such BS representations are small in size and fast for comparison since only bit operation are involved. By comparing common bits existing in two BSs, most of irrelevant images can be immediately filtered. To effectively integrate multi-features, we also investigated the following evidence combination techniques-Certainty Factor, Dempster Shafer Theory, Compound Probability, and Linear Combination. Our extensive experiment showed that single one-dimensional index on multi-features improves multi-indices on multi-features greatly. Our LBS method outperforms sequential scan on high dimensional space by an order of magnitude. And Certainty Factor and Dempster Shafer Theory perform best in combining multiple similarities from corresponding multiple features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complete rare earth element (except Eu) and Y concentrations from the estuarine mixing zone (salinity =0.2 to 33) of Elimbah Creek, Queensland, Australia, were measured by quadrupole ICP-MS without preconcentration. High sampling density in the low salinity regime along with high quality data allow accurate tracing of the development of the typical marine rare earth element anomalies as well as Y/Ho fractionation. Over the entire estuary, the rare earth elements are strongly removed relative to a freshwater endmember (60-80% removal). This large overall removal occurs despite a strong remineralisation peak (190% for La, 130% for Y relative to the freshwater endmember) in the mid-salinity zone. Removal and remineralisation are accompanied by fractionation of the original (freshwater) rare earth element pattern, resulting in light rare earth element depletion. Estuarine fractionation generates a large positive La anomaly and a superchondritic Y/Ho ratio. Conversely, we observe no evidence to support the generation of the negative Ce anomaly in the estuary. With the exception of Ce, the typical marine rare earth element features can thus be attributed to estuarine mixing processes. The persistence of these features in hydrogenous sediments for at least 3.71 Ga highlights the importance of estuarine processes for marine chemistry on geological timescales. (c) 2005 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With rapid advances in video processing technologies and ever fast increments in network bandwidth, the popularity of video content publishing and sharing has made similarity search an indispensable operation to retrieve videos of user interests. The video similarity is usually measured by the percentage of similar frames shared by two video sequences, and each frame is typically represented as a high-dimensional feature vector. Unfortunately, high complexity of video content has posed the following major challenges for fast retrieval: (a) effective and compact video representations, (b) efficient similarity measurements, and (c) efficient indexing on the compact representations. In this paper, we propose a number of methods to achieve fast similarity search for very large video database. First, each video sequence is summarized into a small number of clusters, each of which contains similar frames and is represented by a novel compact model called Video Triplet (ViTri). ViTri models a cluster as a tightly bounded hypersphere described by its position, radius, and density. The ViTri similarity is measured by the volume of intersection between two hyperspheres multiplying the minimal density, i.e., the estimated number of similar frames shared by two clusters. The total number of similar frames is then estimated to derive the overall similarity between two video sequences. Hence the time complexity of video similarity measure can be reduced greatly. To further reduce the number of similarity computations on ViTris, we introduce a new one dimensional transformation technique which rotates and shifts the original axis system using PCA in such a way that the original inter-distance between two high-dimensional vectors can be maximally retained after mapping. An efficient B+-tree is then built on the transformed one dimensional values of ViTris' positions. Such a transformation enables B+-tree to achieve its optimal performance by quickly filtering a large portion of non-similar ViTris. Our extensive experiments on real large video datasets prove the effectiveness of our proposals that outperform existing methods significantly.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a novel indexing technique called Multi-scale Similarity Indexing (MSI) to index imagersquos multi-features into a single one-dimensional structure. Both for text and visual feature spaces, the similarity between a point and a local partitionrsquos center in individual space is used as the indexing key, where similarity values in different features are distinguished by different scale. Then a single indexing tree can be built on these keys. Based on the property that relevant images haves similar similarity values from the center of the same local partition in any feature space, certain number of irrelevant images can be fast pruned based on the triangle inequity on indexing keys. To remove the ldquodimensionality curserdquo existing in high dimensional structure, we propose a new technique called Local Bit Stream (LBS). LBS transforms imagersquos text and visual feature representations into simple, uniform and effective bit stream (BS) representations based on local partitionrsquos center. Such BS representations are small in size and fast for comparison since only bit operation are involved. By comparing common bits existing in two BSs, most of irrelevant images can be immediately filtered. Our extensive experiment showed that single one-dimensional index on multi-features improves multi-indices on multi-features greatly. Our LBS method outperforms sequential scan on high dimensional space by an order of magnitude.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Full-field Fourier-domain optical coherence tomography (3F-OCT) is a full-field version of spectraldomain/swept-source optical coherence tomography. A set of two-dimensional Fourier holograms is recorded at discrete wavenumbers spanning the swept-source tuning range. The resultant three-dimensional data cube contains comprehensive information on the three-dimensional morphological layout of the sample that can be reconstructed in software via three-dimensional discrete Fourier-transform. This method of recording of the OCT signal confers signal-to-noise ratio improvement in comparison with "flying-spot" time-domain OCT. The spatial resolution of the 3F-OCT reconstructed image, however, is degraded due to the presence of a phase cross-term, whose origin and effects are addressed in this paper. We present theoretical and experimental study of imaging performance of 3F-OCT, with particular emphasis on elimination of the deleterious effects of the phase cross-term.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper reassesses the role of climate as a factor shaping changes in settlement and landscape in the Swedish Iron Age (500 BC to AD 1050). Two reasons motivate this re-evaluation. First, high-resolution data based on climate proxies from the natural sciences are now increasingly available. Second, the climate-related social sciences have yielded conceptual and theoretical developments regarding vulnerability and adaptability in the present and recent past, creating new ways to analyse the effects of climatic versus societal factors on societies in the more distant past. Recent research in this field is evaluated and the explicitly climate deterministic standpoint of many recent natural science texts is criticized. Learning from recent approaches to climate change in the social sciences is crucial for understanding society–climate relationships in the past. The paper concludes that we are not yet in a position to fully evaluate the role of the new evidence of abrupt climate change in 850 BC, at the beginning of the Iron Age. Regarding the crisis in the mid first millennium AD, however, new climate data indicate that a dust veil in AD 536–537 might have aggravated the economic and societal crisis known from previous research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Using techniques from Statistical Physics, the annealed VC entropy for hyperplanes in high dimensional spaces is calculated as a function of the margin for a spherical Gaussian distribution of inputs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis ¸iteBishop98a in several directions: bf(1) We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. bf(2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. bf(3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis ¸iteBishop98a in several directions: bf(1) We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping (GTM). bf(2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. bf(3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the ancestor visualization plots which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 18-dimensional data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Are the learning procedures of genetic algorithms (GAs) able to generate optimal architectures for artificial neural networks (ANNs) in high frequency data? In this experimental study,GAs are used to identify the best architecture for ANNs. Additional learning is undertaken by the ANNs to forecast daily excess stock returns. No ANN architectures were able to outperform a random walk,despite the finding of non-linearity in the excess returns. This failure is attributed to the absence of suitable ANN structures and further implies that researchers need to be cautious when making inferences from ANN results that use high frequency data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose – The purpose of this paper is to investigate the impact of foreign exchange and interest rate changes on US banks’ stock returns. Design/methodology/approach – The approach employs an EGARCH model to account for the ARCH effects in daily returns. Most prior studies have used standard OLS estimation methods with the result that the presence of ARCH effects would have affected estimation efficiency. For comparative purposes, the standard OLS estimation method is also used to measure sensitivity. Findings – The findings are as follows: under the conditional t-distributional assumption, the EGARCH model generated a much better fit to the data although the goodness-of-fit of the model is not entirely satisfactory; the market index return accounts for most of the variation in stock returns at both the individual bank and portfolio levels; and the degree of sensitivity of the stock returns to interest rate and FX rate changes is not very pronounced despite the use of high frequency data. Earlier results had indicated that daily data provided greater evidence of exposure sensitivity. Practical implications – Assuming that banks do not hedge perfectly, these findings have important financial implications as they suggest that the hedging policies of the banks are not reflected in their stock prices. Alternatively, it is possible that different GARCH-type models might be more appropriate when modelling high frequency returns. Originality/value – The paper contributes to existing knowledge in the area by showing that ARCH effects do impact on measures of sensitivity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: 1. We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. 2. We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. 3. Using tools from differential geometry we derive expressions for local directionalcurvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model.We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set andapply our system to two more complex 12- and 19-dimensional data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis examines options for high capacity all optical networks. Specifically optical time division multiplexed (OTDM) networks based on electro-optic modulators are investigated experimentally, whilst comparisons with alternative approaches are carried out. It is intended that the thesis will form the basis of comparison between optical time division multiplexed networks and the more mature approach of wavelength division multiplexed networks. Following an introduction to optical networking concepts, the required component technologies are discussed. In particular various optical pulse sources are described with the demanding restrictions of optical multiplexing in mind. This is followed by a discussion of the construction of multiplexers and demultiplexers, including favoured techniques for high speed clock recovery. Theoretical treatments of the performance of Mach Zehnder and electroabsorption modulators support the design criteria that are established for the construction of simple optical time division multiplexed systems. Having established appropriate end terminals for an optical network, the thesis examines transmission issues associated with high speed RZ data signals. Propagation of RZ signals over both installed (standard fibre) and newly commissioned fibre routes are considered in turn. In the case of standard fibre systems, the use of dispersion compensation is summarised, and the application of mid span spectral inversion experimentally investigated. For green field sites, soliton like propagation of high speed data signals is demonstrated. In this case the particular restrictions of high speed soliton systems are discussed and experimentally investigated, namely the increasing impact of timing jitter and the downward pressure on repeater spacings due to the constraint of the average soliton model. These issues are each addressed through investigations of active soliton control for OTDM systems and through investigations of novel fibre types respectively. Finally the particularly remarkable networking potential of optical time division multiplexed systems is established, and infinite node cascadability using soliton control is demonstrated. A final comparison of the various technologies for optical multiplexing is presented in the conclusions, where the relative merits of the technologies for optical networking emerges as the key differentiator between technologies.