1000 results for phenotipic data
Abstract:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
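The effect of the weights can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the row and column weights are the row and column masses of the table (as in correspondence analysis); the function name and the example table are hypothetical. The function computes the total weighted log-ratio variance of a strictly positive table. Splitting one column into two proportional parts leaves this total unchanged, which is what distributional equivalence requires.

```python
import math

def weighted_logratio_variance(table):
    """Total weighted log-ratio variance of a strictly positive two-way table.

    Assumption: rows and columns are weighted by their masses
    (marginal totals divided by the grand total).
    """
    n_rows, n_cols = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    r = [sum(row) / total for row in table]  # row masses
    c = [sum(table[i][j] for i in range(n_rows)) / total
         for j in range(n_cols)]             # column masses
    logs = [[math.log(x) for x in row] for row in table]

    # Weighted double-centring of the log-transformed table.
    row_mean = [sum(c[j] * logs[i][j] for j in range(n_cols))
                for i in range(n_rows)]
    col_mean = [sum(r[i] * logs[i][j] for i in range(n_rows))
                for j in range(n_cols)]
    grand = sum(r[i] * row_mean[i] for i in range(n_rows))

    return sum(
        r[i] * c[j] * (logs[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )

# Hypothetical table, and the same table with its first column split 30/70.
table = [[10.0, 20.0, 30.0], [20.0, 10.0, 40.0], [5.0, 25.0, 15.0]]
split = [[0.3 * row[0], 0.7 * row[0]] + row[1:] for row in table]

v_full = weighted_logratio_variance(table)
v_split = weighted_logratio_variance(split)
```

Here `v_full` and `v_split` agree up to rounding error, because the two proportional columns receive identical centred values and their masses add up to the mass of the original column.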
Abstract:
This paper presents a method for the measurement of changes in health inequality and income-related health inequality over time in a population. For pure health inequality (as measured by the Gini coefficient) and income-related health inequality (as measured by the concentration index), we show how measures derived from longitudinal data can be related to cross-section Gini and concentration indices that have been typically reported in the literature to date, along with measures of health mobility inspired by the literature on income mobility. We also show how these measures of mobility can be usefully decomposed into the contributions of different covariates. We apply these methods to investigate the degree of income-related mobility in the GHQ measure of psychological well-being in the first nine waves of the British Household Panel Survey (BHPS). This reveals that dynamics increase the absolute value of the concentration index of GHQ on income by 10%.
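For orientation, the two cross-section indices named in the abstract can be sketched as follows. This is a minimal illustration using the standard fractional-rank formula, not the longitudinal method of the paper; the function names and data are hypothetical, and ties in the ranking variable are not treated specially.

```python
def concentration_index(health, income):
    """Concentration index of `health`, with individuals ranked by `income`.

    Uses C = 2 / (n * mean) * sum(h_i * R_i) - 1, where R_i = (rank - 0.5) / n
    is the fractional rank (rank 1 = lowest income). Ties are not averaged.
    """
    n = len(health)
    mean_health = sum(health) / n
    order = sorted(range(n), key=lambda i: income[i])
    weighted = sum(health[i] * (pos + 0.5) / n for pos, i in enumerate(order))
    return 2.0 * weighted / (n * mean_health) - 1.0

def gini(health):
    """Gini coefficient: the concentration index with health ranked by itself."""
    return concentration_index(health, health)

# Hypothetical data: health falls as income rises, so the index is negative.
c_neg = concentration_index([4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0])
g_max = gini([0.0, 0.0, 0.0, 1.0])  # all health held by one of four people
```

The Gini coefficient is the special case in which the variable is ranked by itself, which is why one function can serve both measures.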
Abstract:
1. Aim - Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data.
2. Location - Europe, North America, South America
3. Methods - The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with predefined distributions and amounts of niche overlap to evaluate several ordination and species distribution modeling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species.
4. Results - We show that niche overlap can be accurately detected with the framework when variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographic space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting variables on which niche overlap is calculated provides contrasting results.
5. Main conclusions - The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate to study niche differences between species, subspecies or intraspecific lineages that differ in their geographical distributions.
Alternatively, it can be used to measure the degree to which the environmental niche of a species or intraspecific lineage has changed over time.
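The core computation described under Methods can be sketched in one environmental dimension. This is a simplified illustration under stated assumptions, not the framework itself: it uses a Gaussian kernel with a fixed, hypothetical bandwidth, and it uses Schoener's D as the overlap metric, which is a common choice but is not named in the abstract. The correction for the availability of environmental conditions is omitted.

```python
import math

def kernel_density(occurrences, grid, bandwidth):
    """Gaussian-kernel-smoothed occurrence density on a grid, scaled to sum to 1."""
    dens = [
        sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in occurrences)
        for g in grid
    ]
    total = sum(dens)
    return [d / total for d in dens]

def schoener_d(p, q):
    """Schoener's D: 1 for identical densities, 0 for non-overlapping ones."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical occurrences of two species along one environmental gradient.
grid = [i * 0.5 for i in range(201)]  # gradient values 0.0 .. 100.0
species_a = [20.0, 22.0, 25.0, 27.0]
species_b = [24.0, 26.0, 29.0, 31.0]

z_a = kernel_density(species_a, grid, bandwidth=2.0)
z_b = kernel_density(species_b, grid, bandwidth=2.0)
overlap = schoener_d(z_a, z_b)
```

Because the densities are evaluated on a fixed grid after smoothing, the overlap value does not depend on how many occurrence records each species has, only on where they fall along the gradient.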
Abstract:
Time-lapse geophysical monitoring and inversion are valuable tools in hydrogeology for monitoring changes in the subsurface due to natural and forced (tracer) dynamics. However, the resulting models may suffer from insufficient resolution, which leads to underestimated variability and poor mass recovery. Structural joint inversion using cross-gradient constraints can provide higher-resolution models compared with individual inversions and we present the first application to time-lapse data. The results from a synthetic and a field vadose zone water tracer injection experiment show that joint 3-D time-lapse inversion of crosshole electrical resistance tomography (ERT) and ground penetrating radar (GPR) traveltime data significantly improves the imaged characteristics of the point injected plume, such as lateral spreading and center of mass, as well as the overall consistency between models. The joint inversion method appears to work well for cases when one hydrological state variable (in this case moisture content) controls the time-lapse response of both geophysical methods. Citation: Doetsch, J., N. Linde, and A. Binley (2010), Structural joint inversion of time-lapse crosshole ERT and GPR traveltime data, Geophys. Res. Lett., 37, L24404, doi: 10.1029/2010GL045482.
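The cross-gradient constraint mentioned above penalizes the cross product of the gradients of the two models, which vanishes wherever the models change in the same (or opposite) direction. The following is a minimal 2-D sketch with forward differences, not the 3-D scheme of the paper; the function name, grid layout and sign convention are assumptions made for illustration.

```python
def cross_gradient(m1, m2, dx=1.0, dz=1.0):
    """Cross-gradient of two 2-D models stored as lists of rows (z) of cells (x).

    Returns t = d(m1)/dx * d(m2)/dz - d(m1)/dz * d(m2)/dx on the
    (nz - 1) x (nx - 1) grid where forward differences are defined.
    """
    nz, nx = len(m1), len(m1[0])
    t = [[0.0] * (nx - 1) for _ in range(nz - 1)]
    for k in range(nz - 1):
        for i in range(nx - 1):
            d1x = (m1[k][i + 1] - m1[k][i]) / dx
            d1z = (m1[k + 1][i] - m1[k][i]) / dz
            d2x = (m2[k][i + 1] - m2[k][i]) / dx
            d2z = (m2[k + 1][i] - m2[k][i]) / dz
            t[k][i] = d1x * d2z - d1z * d2x
    return t

# Hypothetical models: m2 is a linear function of m1, so the two are
# structurally identical and the cross-gradient is zero everywhere.
m1 = [[float(i + 2 * k) for i in range(4)] for k in range(3)]
m2 = [[3.0 * v + 1.0 for v in row] for row in m1]
t_similar = cross_gradient(m1, m2)

# Models varying in perpendicular directions give a non-zero cross-gradient.
mx = [[float(i) for i in range(4)] for k in range(3)]
mz = [[float(k) for i in range(4)] for k in range(3)]
t_different = cross_gradient(mx, mz)
```

In a joint inversion, a term of this kind is driven toward zero so that the two models share structural boundaries without requiring any fixed petrophysical relationship between their values.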
Abstract:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
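Fuzzy coding and defuzzification can be illustrated with a minimal sketch. It assumes triangular membership functions defined by a set of hinge points, which is one common form of fuzzy coding; the abstract does not specify the membership functions, and the function names and hinge values are hypothetical. With triangular memberships, defuzzifying the coded values of a single observation recovers the original value exactly, whereas values estimated from a low-dimensional solution would only approximate it.

```python
def fuzzy_code(x, hinges):
    """Code a continuous value into len(hinges) fuzzy categories.

    Assumes triangular membership functions with sorted hinge points;
    the memberships are non-negative and sum to 1.
    """
    k = len(hinges)
    memberships = [0.0] * k
    if x <= hinges[0]:
        memberships[0] = 1.0
        return memberships
    if x >= hinges[-1]:
        memberships[-1] = 1.0
        return memberships
    for j in range(k - 1):
        lo, hi = hinges[j], hinges[j + 1]
        if lo <= x <= hi:
            w = (x - lo) / (hi - lo)
            memberships[j] = 1.0 - w
            memberships[j + 1] = w
            break
    return memberships

def defuzzify(memberships, hinges):
    """Estimate the original value as the membership-weighted mean of the hinges."""
    return sum(m * h for m, h in zip(memberships, hinges))

# Hypothetical hinges (e.g. minimum, median, maximum of a variable).
hinges = [0.0, 2.0, 6.0]
coded = fuzzy_code(2.5, hinges)      # [0.0, 0.875, 0.125]
recovered = defuzzify(coded, hinges)  # 2.5
```

A crisp coding would instead assign the value 2.5 entirely to one category, discarding its position within the interval.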
Abstract:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods.
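The idea of a linking parameter can be illustrated with a power transformation. The sketch below assumes the Box-Cox form, which tends to the logarithm as the power tends to zero; the abstract does not state which parametrization is used, and the function name and values are hypothetical. Each value of the power corresponds to one "frame" of the transition from untransformed to log-transformed data.

```python
import math

def power_transform(x, alpha):
    """Box-Cox power transformation; equals log(x) in the limit alpha -> 0."""
    if alpha == 0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha

# One "frame" per value of the linking parameter, from the shifted raw
# value (alpha = 1) down to the logarithm (alpha = 0).
alphas = [1.0, 0.5, 0.1, 0.01, 0.0]
frames = [power_transform(2.0, a) for a in alphas]
```

The values in `frames` decrease smoothly from 1.0 toward `math.log(2.0)`, so an analysis applied to each frame changes gradually rather than abruptly.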
Abstract:
This paper presents a comparative analysis of linear and mixed models for short-term forecasting of a real data series with a high percentage of missing data. The data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay. The series is interpolated with a linear predictor which minimizes the forecast mean square error. The linear models are seasonal ARIMA models and the mixed models have a linear component and a nonlinear seasonal component. The nonlinear component is estimated by a nonparametric regression of data versus time. Short-term forecasts, no more than two days ahead, are of interest because they can be used by the port authorities to notify the fleet. Several models are fitted and compared by their forecasting behavior.
Abstract:
This paper examines factors explaining subcontracting decisions in the construction industry. Rather than the more common cross-sectional analyses, we use panel data to evaluate the influence of all relevant variables. We design and use a new index of the closeness to small numbers situations to estimate the extent of hold-up problems. Results show that as specificity grows, firms tend to subcontract less. The opposite happens when output heterogeneity and the use of intangible assets and capabilities increase. Neither temporary shortage of capacity nor geographical dispersion of activities seems to affect the extent of subcontracting. Finally, proxies for uncertainty do not show any clear effect.
Abstract:
Accurate characterization of the spatial distribution of hydrological properties in heterogeneous aquifers at a range of scales is a key prerequisite for reliable modeling of subsurface contaminant transport, and is essential for designing effective and cost-efficient groundwater management and remediation strategies. To this end, high-resolution geophysical methods have shown significant potential to bridge a critical gap in subsurface resolution and coverage between traditional hydrological measurement techniques such as borehole log/core analyses and tracer or pumping tests. An important and still largely unresolved issue, however, is how to best quantitatively integrate geophysical data into a characterization study in order to estimate the spatial distribution of one or more pertinent hydrological parameters, thus improving hydrological predictions. Recognizing the importance of this issue, the aim of the research presented in this thesis was to first develop a strategy for the assimilation of several types of hydrogeophysical data having varying degrees of resolution, subsurface coverage, and sensitivity to the hydrologic parameter of interest. In this regard a novel simulated annealing (SA)-based conditional simulation approach was developed and then tested in its ability to generate realizations of porosity given crosshole ground-penetrating radar (GPR) and neutron porosity log data. This was done successfully for both synthetic and field data sets. A subsequent issue that needed to be addressed involved assessing the potential benefits and implications of the resulting porosity realizations in terms of groundwater flow and contaminant transport. This was investigated synthetically assuming first that the relationship between porosity and hydraulic conductivity was well-defined. Then, the relationship was itself investigated in the context of a calibration procedure using hypothetical tracer test data.
Essentially, the relationship best predicting the observed tracer test measurements was determined given the geophysically derived porosity structure. Both of these investigations showed that the SA-based approach, in general, allows much more reliable hydrological predictions than other more elementary techniques considered. Further, the developed calibration procedure was seen to be very effective, even at the scale of tomographic resolution, for predictions of transport. This also held true at locations within the aquifer where only geophysical data were available. This is significant because the acquisition of hydrological tracer test measurements is clearly more complicated and expensive than the acquisition of geophysical measurements. Although the above methodologies were tested using porosity logs and GPR data, the findings are expected to remain valid for a large number of pertinent combinations of geophysical and borehole log data of comparable resolution and sensitivity to the hydrological target parameter. Moreover, the obtained results allow us to have confidence for future developments in integration methodologies for geophysical and hydrological data to improve the 3-D estimation of hydrological properties.
Abstract:
This article reviews the methodology of studies on drug utilization, with particular emphasis on primary care. Population-based studies of drug inappropriateness can be done with microdata from electronic health records and e-prescriptions. Multilevel models estimate the influence of factors affecting the appropriateness of drug prescription at different hierarchical levels: patient, doctor, health care organization and regulatory environment. Work by the GIUMAP suggests that patient characteristics are the most important factor in the appropriateness of prescriptions, with significant effects at the general practitioner level.