850 results for High-dimensional data visualization
Abstract:
Laminated sediment records from the oxygen minimum zone in the Arabian Sea offer unique ultrahigh-resolution archives for deciphering climate variability in the Arabian Sea region. Although numerous analytical techniques are available, it has become increasingly popular during the past decade to analyze relative variations of sediment cores' chemical signature by non-destructive X-ray fluorescence (XRF) core scanning. We carefully selected an approximately 5 m long sediment core from the northern Arabian Sea (GeoB12309-5: 24°52.3' N; 62°59.9' E, 956 m water depth) for a detailed, comparative study of high-resolution techniques, namely non-destructive XRF core scanning (0.8 mm resolution) and ICP-MS/OES analysis on carefully selected, discrete samples (1 mm resolution). The aim of our study was to more precisely define suitable chemical elements that can be accurately analyzed and to determine which elemental ratios can be interpreted down to sub-millimeter-scale resolutions. Applying Student's t-test, our results show significantly correlating (1% significance level) elemental patterns for S, Ca, Fe, Zr, Rb, and Sr, as well as for the K/Ca, Fe/Ti, and Ti/Al ratios, all of which are related to distinct lithological changes. After careful consideration of all errors in the ICP analysis, we further quantify, by applying Chi-square tests, the factors by which the XRF Core Scanner software underestimates its error, which is especially relevant for elements with high count rates. As demonstrated by these new, ultra-high-resolution data, core scanning has major advantages (high speed, low cost, few sample preparation steps) and represents an increasingly attractive alternative to the time-consuming, expensive, elaborate, and destructive wet-chemical analyses (e.g., by ICP-MS/OES after acid digestion), while also providing high-quality data at unprecedented resolution.
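To make the comparison step concrete, the sketch below (Python with NumPy/SciPy; the synthetic Ca records and all names are illustrative, not data from the study) shows how the correlation between a depth-aligned XRF count record and ICP concentrations could be tested at the 1% significance level with a Student's t-test on the Pearson coefficient.

```python
import numpy as np
from scipy import stats

def correlation_t_test(xrf_counts, icp_concentrations, alpha=0.01):
    """Test whether two depth-aligned elemental records correlate significantly,
    using a Student's t-test on the Pearson r (a sketch; the study's exact
    resampling/alignment procedure is not reproduced here)."""
    x = np.asarray(xrf_counts, dtype=float)
    y = np.asarray(icp_concentrations, dtype=float)
    r, _ = stats.pearsonr(x, y)
    n = len(x)
    # t statistic for H0: rho = 0, with n - 2 degrees of freedom
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)
    return r, t, p, p < alpha

# Hypothetical, depth-aligned Ca records from the XRF scanner and ICP analysis
rng = np.random.default_rng(0)
ca_icp = rng.normal(10.0, 2.0, 200)               # e.g. wt% Ca from ICP-OES
ca_xrf = 500 * ca_icp + rng.normal(0, 300, 200)   # counts, noisy linear response
print(correlation_t_test(ca_xrf, ca_icp))
```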
Abstract:
Pelagic sediments recording an extreme and short-lived global warming event, the Late Paleocene Thermal Maximum (LPTM), were recovered from Hole 999B (Colombian Basin) and Holes 1001A and 1001B (lower Nicaraguan Rise) in the Caribbean Sea during Ocean Drilling Program Leg 165. The LPTM consists of a 0.3-0.97 m thick calcareous claystone to claystone horizon. High-resolution downhole logging (Formation MicroScanner [FMS]), standard downhole logs (resistivity, velocity, density, natural gamma ray, and geochemical log), and non-destructive chemical and physical property data (multisensor core logger [MSCL] and X-ray fluorescence [XRF] core scanner) were used to identify composite sections from parallel holes and to record sedimentological and environmental changes associated with the LPTM. Downhole logging data indicate an abrupt and distinct difference in physical and chemical properties that extends for tens of meters above and below the LPTM. These observations indicate a rapid environmental change at the LPTM that persists beyond the LPTM anomaly. Comparison of gamma-ray attenuation porosity evaluator (GRAPE) densities from MSCL logging on split cores with FMS resistivity values allows core-to-log correlation with a high degree of accuracy. High-resolution magnetic susceptibility measurements of the cores are compared with elemental concentrations (e.g., Fe, Ca) analyzed by high-resolution XRF scanning. The high-resolution data obtained from several detailed core and downhole logging methods are the key to the construction of composite sections, the correlation of both adjacent holes and distant sites, and core-log integration. These continuous depth series reveal the LPTM as a multiphase event with a nearly instantaneous onset, followed by a much different set of physical and chemical conditions of short duration, succeeded by a longer transition to a new, more permanent set of environmental circumstances. The estimated durations of these 'phases' are consistent with paleontological and isotopic studies of the LPTM.
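As a rough illustration of the core-to-log correlation idea mentioned above, the sketch below (Python/NumPy; the series names, common sampling step, and synthetic data are assumptions for the example) finds the depth shift that maximizes the normalized cross-correlation between a core measurement such as GRAPE density and a downhole log such as FMS resistivity.

```python
import numpy as np

def best_depth_shift(core_series, log_series, max_shift=50):
    """Find the depth offset (in samples) that best aligns a core measurement
    (e.g. GRAPE density) with a downhole log (e.g. FMS resistivity) by
    maximizing the normalized cross-correlation; a simple sketch, assuming
    both series share the same sampling step."""
    a = (core_series - np.mean(core_series)) / np.std(core_series)
    b = (log_series - np.mean(log_series)) / np.std(log_series)
    shifts = list(range(-max_shift, max_shift + 1))
    scores = []
    for s in shifts:
        if s >= 0:
            x, y = a[s:], b[:len(b) - s]
        else:
            x, y = a[:len(a) + s], b[-s:]
        n = min(len(x), len(y))
        scores.append(np.dot(x[:n], y[:n]) / n)
    best = int(np.argmax(scores))
    return shifts[best], scores[best]

# Synthetic example: the 'core' series is the 'log' series offset by 12 samples
rng = np.random.default_rng(3)
log = np.convolve(rng.normal(size=520), np.ones(5) / 5, mode="valid")[:500]
core = np.roll(log, 12) + rng.normal(scale=0.05, size=500)
print(best_depth_shift(core, log))   # expected shift: 12
```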
Abstract:
The recent development of in-situ monitoring devices, such as UV spectrometers, makes the study of short-term stream chemistry variation relevant, especially the study of diurnal cycles, which are not yet fully understood. Our study is based on high-frequency data from an agricultural catchment (Studienlandschaft Schwingbachtal, Germany). We propose a novel approach, i.e. the combination of cluster analysis and Linear Discriminant Analysis, to mine nitrate behavior patterns from these data. As a result, we observe a seasonality of nitrate diurnal cycles that differs from the most common cycle seasonality described in the literature, i.e. pre-dawn peaks in spring. Our cycles appear in summer, and the maximum and minimum shift to a later time in late summer/autumn. This is observed for both water- and energy-limited years, potentially stressing the role of evapotranspiration. This concluding hypothesis on the role of evapotranspiration in stream nitrate concentration, which was obtained through data mining, broadens the perspective on the diurnal cycling of stream nitrate concentrations.
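A minimal sketch of the proposed combination, assuming scikit-learn and a daily reshaping of the high-frequency nitrate series (all variable names and the synthetic data are illustrative, not the Schwingbachtal dataset): cluster the diurnal patterns first, then use Linear Discriminant Analysis to see which explanatory variables separate the clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical input: one row per day, 96 nitrate readings (15-min resolution),
# plus a matrix of daily explanatory variables (e.g. discharge, temperature).
rng = np.random.default_rng(42)
daily_nitrate = rng.normal(3.0, 0.5, size=(365, 96))   # mg/L, synthetic
explanatory = rng.normal(size=(365, 5))                 # synthetic covariates

# Step 1: cluster the daily diurnal patterns (z-scored so shape, not level, matters)
patterns = (daily_nitrate - daily_nitrate.mean(axis=1, keepdims=True)) \
           / daily_nitrate.std(axis=1, keepdims=True)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(patterns)

# Step 2: use LDA to find which explanatory variables best separate the clusters
lda = LinearDiscriminantAnalysis(n_components=2).fit(explanatory, labels)
print("explained discriminant ratio:", lda.explained_variance_ratio_)
```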
Abstract:
Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics applications. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain.
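The EDA loop described above (sample from a probabilistic model, select promising solutions, re-estimate the model) can be sketched minimally with a univariate Bernoulli model, in the spirit of UMDA; the function below is an illustrative toy on OneMax, not any specific algorithm reviewed in the paper.

```python
import numpy as np

def umda_binary(fitness, n_vars=50, pop_size=100, n_select=50, n_gens=100, seed=0):
    """Minimal univariate EDA (UMDA-style) for binary strings: the probabilistic
    model is one Bernoulli parameter per variable, re-estimated from the
    selected (promising) solutions in each generation."""
    rng = np.random.default_rng(seed)
    p = np.full(n_vars, 0.5)                                     # initial model: uniform
    best_x, best_f = None, -np.inf
    for _ in range(n_gens):
        pop = (rng.random((pop_size, n_vars)) < p).astype(int)   # sample the model
        f = np.array([fitness(x) for x in pop])
        elite = pop[np.argsort(f)[-n_select:]]                   # truncation selection
        p = elite.mean(axis=0).clip(0.05, 0.95)                  # learn the new model
        if f.max() > best_f:
            best_f, best_x = f.max(), pop[f.argmax()]
    return best_x, best_f

# Toy example: OneMax (maximize the number of ones)
print(umda_binary(fitness=lambda x: x.sum()))
```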
Abstract:
Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs), which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. ℓ1-regularization is a type of this technique with the appealing variable-selection property, which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of the EDA scales logarithmically with the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization, and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired by multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships. An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on ℓ1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small to medium dimensionality, using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
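As an illustration of regularized Gaussian model learning in a continuous EDA, the sketch below uses scikit-learn's graphical lasso (an ℓ1-penalized precision estimator) inside a simple EDA loop with a logarithmically scaled population size; the parameter values, the standardization step, and the toy sphere objective are assumptions for the example and do not reproduce the thesis's algorithms.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def l1_regularized_gaussian_eda(fitness, n_vars=40, n_gens=30, alpha=0.2, seed=0):
    """Sketch of a continuous EDA whose Gaussian search model is estimated with
    l1-regularization (graphical lasso) from the selected solutions, using a
    population that scales logarithmically with the number of variables."""
    rng = np.random.default_rng(seed)
    pop_size = int(15 * np.log(n_vars)) + 10        # logarithmic scaling (assumption)
    mean, cov = np.zeros(n_vars), np.eye(n_vars)
    for _ in range(n_gens):
        pop = rng.multivariate_normal(mean, cov, size=pop_size)
        f = np.array([fitness(x) for x in pop])
        elite = pop[np.argsort(f)[: pop_size // 2]]  # truncation selection (minimization)
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-12
        # Fit a sparse (l1-penalized) precision matrix on standardized data,
        # then rescale the resulting covariance back to the original units.
        gl = GraphicalLasso(alpha=alpha, max_iter=200).fit((elite - mean) / std)
        cov = gl.covariance_ * np.outer(std, std)
    return mean, fitness(mean)

# Toy example: minimize the sphere function in 40 dimensions
print(l1_regularized_gaussian_eda(lambda x: float(np.sum(x ** 2))))
```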
Abstract:
In the near future, wireless sensor networks (WSNs) will experience broad, large-scale deployment (millions of nodes at the national level) with multiple information sources per node and very specific requirements for signal processing. In parallel, the broad deployment of WSNs facilitates the definition and execution of ambitious studies with large input data sets and high computational complexity. These computational demands, very often heterogeneous and driven on demand, can only be satisfied by high-performance Data Centers (DCs). The high economic and environmental impact of the energy consumption in DCs requires aggressive energy optimization policies. The need for such policies has already been identified, but they have not yet been successfully put forward. In this context, this paper presents the following ongoing research lines and obtained results. In the field of WSNs: energy optimization in the processing nodes from different abstraction levels, including reconfigurable application-specific architectures, efficient customization of the memory hierarchy, energy-aware management of the wireless interface, and design automation for signal processing applications. In the field of DCs: energy-optimal workload assignment policies in heterogeneous DCs, resource management policies with energy consciousness, and efficient cooling mechanisms, all of which will cooperate in minimizing the electricity bill of the DCs that process the data provided by the WSNs.
Abstract:
Many existing engineering works model the statistical characteristics of the entities under study as normal distributions. These models are eventually used for decision making, which in practice requires defining the classification region corresponding to the desired confidence level. Surprisingly, however, a great number of computer vision works using multidimensional normal models leave unspecified or fail to establish correct confidence regions, due to misconceptions about the properties of Gaussian functions or to wrong analogies with the unidimensional case. The resulting regions incur deviations that can be unacceptable in high-dimensional models. Here we provide a comprehensive derivation of the optimal confidence regions for multivariate normal distributions of arbitrary dimensionality. To this end, we first derive the condition for region optimality of general continuous multidimensional distributions, and then apply it to the widespread case of the normal probability density function. The obtained results are used to analyze the confidence error incurred by previous works related to vision research, showing that deviations caused by wrong regions may become unacceptable as dimensionality increases. To support the theoretical analysis, a quantitative example in the context of moving object detection by means of background modeling is given.
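For a d-dimensional normal distribution, the optimal (highest-density) confidence region at level 1-α is the Mahalanobis ellipsoid {x : (x - μ)ᵀ Σ⁻¹ (x - μ) ≤ χ²_{d,1-α}}. The sketch below (Python/SciPy; function and variable names are illustrative) checks membership in that region and prints how far a wrong analogy with the unidimensional case, reusing the 1.96σ radius, falls below the intended 95% coverage as dimensionality grows.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(x, mean, cov, confidence=0.95):
    """Check whether points lie inside the optimal confidence region of a
    multivariate normal: the Mahalanobis ellipsoid whose squared radius is
    the chi-square quantile with d degrees of freedom."""
    x = np.atleast_2d(x)
    d = len(mean)
    diff = x - mean
    m2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)  # squared Mahalanobis
    return m2 <= chi2.ppf(confidence, df=d)

# Dimensionality effect: applying the 1D 95% radius (1.96 sigma) to the
# Mahalanobis distance covers far less than 95% of the mass as d grows.
for d in (1, 2, 5, 10):
    naive_coverage = chi2.cdf(1.96**2, df=d)   # mass inside Mahalanobis radius 1.96
    print(f"d={d:2d}: naive 1.96-sigma ellipsoid covers {naive_coverage:.3f}")
```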
Abstract:
This paper describes a novel approach to phonotactic LID where, instead of using soft-counts based on phoneme lattices, we use posteriorgrams to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adapt the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set with better results than a system based on soft-counts (Cavg on 30s: 3.15% vs 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30s: 2.4% for the acoustic system alone vs 1.25% for the fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. In comparison with the original soft-counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need for pruning techniques when creating the lattices.
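The counting step described above can be sketched as follows (Python/NumPy): expected bigram counts are accumulated from consecutive frames of a phoneme posteriorgram, giving the high-dimensional count vector that is subsequently projected to a low-dimensional i-vector. The n-gram order, any normalization, and the subspace training itself are not reproduced here; all names and the toy data are illustrative.

```python
import numpy as np

def expected_bigram_counts(posteriorgram):
    """Expected (soft) bigram counts from a frame-level phoneme posteriorgram.

    posteriorgram: array of shape (T, P), where row t holds the posterior
    probabilities of the P phoneme units at frame t. The expected count of
    bigram (i, j) is the sum over consecutive frames of p_t(i) * p_{t+1}(j).
    This is a sketch of the general idea, not the paper's exact counting scheme."""
    post = np.asarray(posteriorgram, dtype=float)
    counts = post[:-1].T @ post[1:]      # (P, P) matrix of expected bigram counts
    return counts.ravel()                # flatten to a P*P-dimensional count vector

# Toy posteriorgram: 100 frames over 30 phoneme units
rng = np.random.default_rng(1)
post = rng.random((100, 30))
post /= post.sum(axis=1, keepdims=True)    # normalize each frame to a distribution
print(expected_bigram_counts(post).shape)  # (900,), i.e. a high-dimensional count vector
```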