958 results for multivariate binary data
Abstract:
Recent empirical studies have shown that multi-angle spectral data can be useful for predicting canopy height, but the physical reason for this correlation was not understood. We follow the concept of canopy spectral invariants, specifically escape probability, to gain insight into the observed correlation. Airborne Multi-Angle Imaging Spectrometer (AirMISR) and airborne Laser Vegetation Imaging Sensor (LVIS) data acquired during a NASA Terrestrial Ecology Program aircraft campaign underlie our analysis. Two multivariate linear regression models were developed to estimate LVIS height measures from 28 AirMISR multi-angle spectral reflectances and from the spectrally invariant escape probability at 7 AirMISR view angles. Both models achieved nearly the same accuracy, suggesting that canopy spectral invariant theory can explain the observed correlation. We hypothesize that the escape probability is sensitive to the aspect ratio (crown diameter to crown height). The multi-angle spectral data alone therefore may not provide enough information to retrieve canopy height globally.
Abstract:
The purpose of this lecture is to review recent developments in data analysis, initialization and data assimilation. The development of 3-dimensional multivariate schemes has been very timely because of their suitability for handling the many different types of observations made during FGGE. Great progress has been made in the initialization of global models with the aid of the non-linear normal mode technique. However, despite this progress, several fundamental problems remain unsatisfactorily solved. Of particular importance are the initialization of the divergent wind fields in the Tropics and the search for proper ways to initialize weather systems driven by non-adiabatic processes. The unsatisfactory ways in which such processes are currently initialized lead to excessively long spin-up times.
Abstract:
We discuss the modeling of dielectric responses of electromagnetically excited networks which are composed of a mixture of capacitors and resistors. Such networks can be employed as lumped-parameter circuits to model the response of composite materials containing conductive and insulating grains. The dynamics of the excited network systems are studied using a state space model derived from a randomized incidence matrix. Time and frequency domain responses from synthetic data sets generated from state space models are analyzed for the purpose of estimating the fraction of capacitors in the network. Good results were obtained by using either the time-domain response to a pulse excitation or impedance data at selected frequencies. A chemometric framework based on a Successive Projections Algorithm (SPA) enables the construction of multiple linear regression (MLR) models which can efficiently determine the ratio of conductive to insulating components in composite material samples. The proposed method avoids restrictions commonly associated with Archie’s law, the application of percolation theory or Kohlrausch-Williams-Watts models and is applicable to experimental results generated by either time domain transient spectrometers or continuous-wave instruments. Furthermore, it is quite generic and applicable to tomography, acoustics and other spectroscopies such as nuclear magnetic resonance and electron paramagnetic resonance, and should therefore be of general interest across the dielectrics community.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
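The straightforward parallel formulation described in this abstract can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: each list in `partitions` stands for the data held by one processing node, and `global_reduce` stands in for the global reduction (for example an all-reduce) that every iteration requires. All function names are assumptions made for this sketch.

```python
def local_stats(points, centroids):
    """Per-node step: assign each local point to its nearest centroid and
    return the partial coordinate sums and point counts per cluster."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def global_reduce(partials, k, dim):
    """Stand-in for the global reduction: add up the partial sums and
    counts contributed by every node."""
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    return total_sums, total_counts


def kmeans_step(partitions, centroids):
    """One k-means iteration over data split across nodes."""
    k, dim = len(centroids), len(centroids[0])
    partials = [local_stats(part, centroids) for part in partitions]
    sums, counts = global_reduce(partials, k, dim)
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


if __name__ == "__main__":
    partitions = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
    print(kmeans_step(partitions, [(0, 1), (9, 9)]))
```

Every node must take part in `global_reduce` at each iteration, which is the step the abstract identifies as the scalability bottleneck.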
Abstract:
The bewildering complexity of cortical microcircuits at the single cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach based on simultaneous multi-site recordings using micro-electrode-array chips for investigation of the microcircuitry of rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses, and typical spatial distributions of LFP responses to stimuli in supragranular, granular, and infragranular layers, where the last form a particularly distinct class. Population spikes appear to travel at about 33 cm/s from granular to infragranular layers. Responses within barrel-related columns have different profiles from those in neighbouring columns to the left or, interchangeably, to the right. Variations between slices occur, but can be minimized by strictly obeying controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series reflecting the location of sources and sinks independent of the stimulus layer. Although the precise correspondences between single cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach in combination with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions to genetically or pharmacologically altered situations based on real cortical microcircuitry.
Abstract:
An analysis method for diffusion tensor (DT) magnetic resonance imaging data is described, which, contrary to the standard method (multivariate fitting), does not require a specific functional model for diffusion-weighted (DW) signals. The method uses principal component analysis (PCA) under the assumption of a single fibre per pixel. PCA and the standard method were compared using simulations and human brain data. The two methods were equivalent in determining fibre orientation. PCA-derived fractional anisotropy and DT relative anisotropy had similar signal-to-noise ratio (SNR) and dependence on fibre shape. PCA-derived mean diffusivity had similar SNR to the respective DT scalar, and it depended on fibre anisotropy. Appropriate scaling of the PCA measures resulted in very good agreement between PCA and DT maps. In conclusion, the assumption of a specific functional model for DW signals is not necessary for characterization of anisotropic diffusion in a single fibre.
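For reference, the two DT scalars named in this abstract have standard definitions in terms of the three diffusion tensor eigenvalues. The sketch below computes them from given eigenvalues; it does not reproduce the PCA-based method of the abstract, and the function names are illustrative.

```python
import math


def mean_diffusivity(l1, l2, l3):
    """Mean diffusivity: the average of the three tensor eigenvalues."""
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy: 0 for isotropic diffusion, 1 when diffusion
    is confined to a single direction."""
    md = mean_diffusivity(l1, l2, l3)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0.0:
        return 0.0  # no diffusion at all; anisotropy is taken as zero here
    return math.sqrt(1.5 * num / den)


if __name__ == "__main__":
    # Illustrative eigenvalues only, not data from the study.
    print(mean_diffusivity(3.0, 2.0, 1.0))
    print(fractional_anisotropy(1.0, 1.0, 1.0))
```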
Abstract:
We examine how the accuracy of real-time forecasts from models that include autoregressive terms can be improved by estimating the models on ‘lightly revised’ data instead of using data from the latest-available vintage. The benefits of estimating autoregressive models on lightly revised data are related to the nature of the data revision process and the underlying process for the true values. Empirically, we find improvements in root mean square forecasting error of 2–4% when forecasting output growth and inflation with univariate models, and of 8% with multivariate models. We show that multiple-vintage models, which explicitly model data revisions, require large estimation samples to deliver competitive forecasts. Copyright © 2012 John Wiley & Sons, Ltd.
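The improvements quoted in this abstract are relative reductions in root mean square forecasting error. A minimal sketch of that arithmetic, assuming two forecast series are compared against the same realized values; the function names and the numbers in the usage example are made up for illustration and are not taken from the paper.

```python
import math


def rmsfe(forecasts, actuals):
    """Root mean square forecasting error over paired forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def pct_improvement(baseline_rmsfe, alternative_rmsfe):
    """Percentage reduction in RMSFE of the alternative relative to the baseline."""
    return 100.0 * (baseline_rmsfe - alternative_rmsfe) / baseline_rmsfe


if __name__ == "__main__":
    actuals = [2.0, 2.5, 1.5, 3.0]            # illustrative values only
    latest_vintage = [2.4, 2.0, 2.0, 2.5]     # hypothetical baseline forecasts
    lightly_revised = [2.3, 2.1, 1.9, 2.6]    # hypothetical alternative forecasts
    base = rmsfe(latest_vintage, actuals)
    alt = rmsfe(lightly_revised, actuals)
    print(pct_improvement(base, alt))
```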
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Abstract:
Recent studies showed that features extracted from brain MRIs can discriminate well between Alzheimer’s disease and Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods to find the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies were then used to solve a multi-class problem with the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the predictive power of features has not yet been investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worsen prediction accuracy on the test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper improves the accuracy of binary classification.
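The one-versus-one strategy mentioned in this abstract combines one binary classifier per pair of classes by majority vote. A minimal sketch under stated assumptions: the class labels, the `pairwise` dictionary and the toy threshold classifiers are hypothetical placeholders, not the classifiers or features used in the study.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """Predict a class for x by majority vote over all pairwise classifiers.

    pairwise maps each pair (a, b), in the order produced by
    combinations(classes, 2), to a function returning either a or b.
    Ties are broken in favour of the class listed first in classes.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])


if __name__ == "__main__":
    classes = ["A", "B", "C"]
    # Toy binary classifiers on a single scalar feature.
    pairwise = {
        ("A", "B"): lambda x: "A" if x < 1 else "B",
        ("A", "C"): lambda x: "A" if x < 2 else "C",
        ("B", "C"): lambda x: "B" if x < 3 else "C",
    }
    for x in (0.5, 1.5, 5.0):
        print(x, ovo_predict(x, classes, pairwise))
```

With k classes this needs k(k-1)/2 binary classifiers, so a three-class problem uses three of them.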
Abstract:
We consider the forecasting of macroeconomic variables that are subject to revisions, using Bayesian vintage-based vector autoregressions. The prior incorporates the belief that, after the first few data releases, subsequent ones are likely to consist of revisions that are largely unpredictable. The Bayesian approach allows the joint modelling of the data revisions of more than one variable, while keeping the concomitant increase in parameter estimation uncertainty manageable. Our model provides markedly more accurate forecasts of post-revision values of inflation than do other models in the literature.
Abstract:
Theory predicts the emergence of generalists in variable environments and antagonistic pleiotropy to favour specialists in constant environments, but empirical data seldom support such generalist–specialist trade-offs. We selected for generalists and specialists in the dung fly Sepsis punctum (Diptera: Sepsidae) under conditions that we predicted would reveal antagonistic pleiotropy and multivariate trade-offs underlying thermal reaction norms for juvenile development. We performed replicated laboratory evolution using four treatments: adaptation at a hot (31 °C) or a cold (15 °C) temperature, or under regimes fluctuating between these temperatures, either within or between generations. After 20 generations, we assessed parental effects and genetic responses of thermal reaction norms for three correlated life-history traits: size at maturity, juvenile growth rate and juvenile survival. We find evidence for antagonistic pleiotropy for performance at hot and cold temperatures, and a temperature-mediated trade-off between juvenile survival and size at maturity, suggesting that trade-offs associated with environmental tolerance can arise via intensified evolutionary compromises between genetically correlated traits. However, despite this antagonistic pleiotropy, we found no support for the evolution of increased thermal tolerance breadth at the expense of reduced maximal performance, suggesting low genetic variance in the generalist–specialist dimension.
Abstract:
Liquid-liquid equilibrium experimental data for refined sunflower seed oil, artificially acidified with commercial oleic acid or commercial linoleic acid and a solvent (ethanol + water), were determined at 298.2 K. This set of experimental data and the experimental data from Cuevas et al.,(1) which were obtained from (283.2 to 333.2) K, for degummed sunflower seed oil-containing systems were correlated using NRTL and UNIQUAC models with temperature-dependent binary parameters. The deviation between experimental and calculated compositions presented average values of (1.13 and 1.41) % for NRTL and UNIQUAC equations, respectively, indicating that the models were able to correctly describe the behavior of compounds under different temperature and solvent hydration.
Abstract:
This paper presents a GIS-based multicriteria flood risk assessment and mapping approach applied to coastal drainage basins where hydrological data are not available. It covers risk from different types of possible processes: coastal inundation (storm surge), river, estuarine and flash floods, in either urban or natural areas, and fords. Based on the causes of these processes, several environmental indicators were selected to build up the risk assessment. Geoindicators include geological-geomorphologic properties of Quaternary sedimentary units, water table, drainage basin morphometry, coastal dynamics, beach morphodynamics and microclimatic characteristics. Bioindicators involve coastal plain and low slope native vegetation categories and two alteration states. Anthropogenic indicators encompass land use category properties such as type, occupation density, urban structure type and occupation consolidation degree. The selected indicators were stored within an expert Geoenvironmental Information System developed for the State of Sao Paulo Coastal Zone (SIIGAL), whose attributes were mathematically classified through deterministic approaches in order to estimate natural susceptibilities (Sn), human-induced susceptibilities (Sa), return period of rain events (Ri), potential damages (Dp) and the risk classification (R), according to the equation R = (Sn · Sa · Ri) · Dp. Thematic maps were automatically processed within the SIIGAL, in which automata cells ("geoenvironmental management units") aggregating geological-geomorphologic and land use/native vegetation categories were the units of classification. The method has been applied to 32 small drainage basins in the Northern Littoral of the State of Sao Paulo (Brazil), and proved very useful for coastal zone public policy, civil defense programs and flood management.
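The risk equation in this abstract is a plain product of the four factors. A minimal sketch, assuming each factor is already available as a numeric score for a management unit; the scoring scales and class thresholds used in SIIGAL are not given in the abstract, so none are assumed here, and the function name and example scores are illustrative.

```python
def risk_score(sn, sa, ri, dp):
    """Risk R = (Sn * Sa * Ri) * Dp for one management unit.

    sn: natural susceptibility score
    sa: human-induced susceptibility score
    ri: score for the return period of rain events
    dp: potential damage score
    """
    hazard = sn * sa * ri  # the bracketed term of the equation
    return hazard * dp


if __name__ == "__main__":
    # Hypothetical scores for two units; not values from the study.
    units = {"unit_1": (2, 3, 1, 4), "unit_2": (1, 1, 2, 3)}
    for name, scores in units.items():
        print(name, risk_score(*scores))
```

Because the factors multiply, a score of zero in any one of them gives a zero risk for the unit regardless of the others.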
Abstract:
A joint transcriptomic and proteomic approach employing two-dimensional electrophoresis, liquid chromatography and mass spectrometry was carried out to identify peptides and proteins expressed by the venom gland of the snake Bothrops insularis, an endemic species of Queimada Grande Island, Brazil. Four protein families were mainly represented in processed spots, namely metalloproteinase, serine proteinase, phospholipase A(2) and lectin. Other represented families were growth factors, the developmental protein G10, a disintegrin and putative novel bradykinin-potentiating peptides. The enzymes were present in several isoforms. Most of the experimental data agreed with predicted values for isoelectric point and M(r) of proteins found in the transcriptome of the venom gland. The results also support the existence of post-translational modifications and of proteolytic processing of precursor molecules which could lead to diverse multifunctional proteins. This study provides a preliminary reference map for proteins and peptides present in Bothrops insularis whole venom, establishing the basis for comparative studies of other venom proteomes which could help the search for new drugs and the improvement of venom therapeutics. Altogether, our data point to the influence of transcriptional and post-translational events on the final venom composition and stress the need for a multivariate approach to snake venomics studies. (c) 2009 Elsevier B.V. All rights reserved.
Abstract:
In this paper, we introduce a Bayesian analysis for bioequivalence data assuming multivariate pharmacokinetic measures. With the introduction of correlation parameters between the pharmacokinetic measures or between the random effects in the bioequivalence models, we observe a marked improvement in the bioequivalence results. These results are of great practical interest since they can yield higher accuracy and reliability for the bioequivalence tests usually required by regulatory agencies. An example illustrates the proposed methodology by comparing the usual univariate bioequivalence methods with multivariate bioequivalence. We also consider some commonly used Bayesian model discrimination methods to choose the best model for bioequivalence studies.