999 results for Data amalgamation
Abstract:
Soil aggregation is an index of soil structure measured by mean weight diameter (MWD) or scaling factors often interpreted as fragmentation fractal dimensions (D_f). However, the MWD provides a biased estimate of soil aggregation due to spurious correlations among aggregate-size fractions and scale dependency. The scale-invariant D_f relies on weak assumptions to allow particle counts, is sensitive to the selection of the fractal domain, and may frequently exceed a value of 3, implying that D_f is also a biased estimate of aggregation. Aggregation indices based on mass may be computed without bias using compositional analysis techniques. Our objective was to elaborate compositional indices of soil aggregation and to compare them to MWD and D_f using a published dataset describing the effect of 7 cropping systems on aggregation. Six aggregate-size fractions were arranged into a sequence of D − 1 balances of building blocks that portray the process of soil aggregation. Isometric log-ratios (ilrs) are scale-invariant and orthogonal log contrasts or balances that possess the Euclidean geometry necessary to compute a distance between any two aggregation states, known as the Aitchison distance (A(x,y)). Close correlations (r > 0.98) were observed between MWD, D_f, and the ilr when contrasting large and small aggregate sizes. Several unbiased embedded ilrs can characterize the heterogeneous nature of soil aggregates and be related to soil properties or functions. Soil bulk density and penetrometer resistance were closely related to A(x,y) with reference to bare fallow. The A(x,y) is easy to implement as an unbiased index of soil aggregation using standard sieving methods and may allow comparisons between studies. (C) 2012 Elsevier B.V. All rights reserved.
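As a rough illustration of the balances and the Aitchison distance mentioned in this abstract, the sketch below computes ilr coordinates for six size fractions and the distance between two aggregation states. The sequential binary partition `SBP` and the masses in the usage note are hypothetical choices made for illustration; the paper's actual partition is not given in the abstract.

```python
import math

def closure(parts):
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def gmean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(x, plus, minus):
    """ilr balance contrasting the parts indexed by `plus` against `minus`."""
    r, s = len(plus), len(minus)
    coef = math.sqrt(r * s / (r + s))
    num = gmean([x[i] for i in plus])
    den = gmean([x[i] for i in minus])
    return coef * math.log(num / den)

# Hypothetical sequential binary partition of six fractions
# (index 0 = largest aggregates, index 5 = smallest). Assumed, not from the paper.
SBP = [
    ([0, 1, 2], [3, 4, 5]),  # large vs small aggregates
    ([0], [1, 2]),
    ([1], [2]),
    ([3], [4, 5]),
    ([4], [5]),
]

def ilr(parts):
    """Return the D - 1 balances defined by SBP."""
    x = closure(parts)
    return [balance(x, plus, minus) for plus, minus in SBP]

def aitchison_distance(x, y):
    """Euclidean distance between two compositions in ilr coordinates."""
    return math.dist(ilr(x), ilr(y))
```

With invented masses such as `aitchison_distance([30, 20, 15, 15, 10, 10], [5, 10, 15, 20, 25, 25])`, the result does not change if either sample is multiplied by a constant, which is the scale invariance the abstract refers to.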
Abstract:
Simpson's paradox, also known as amalgamation or aggregation paradox, appears when dealing with proportions. Proportions are by construction parts of a whole, which can be interpreted as compositions assuming they only carry relative information. The Aitchison inner product space structure of the simplex, the sample space of compositions, explains the appearance of the paradox, given that amalgamation is a nonlinear operation within that structure. Here we propose to use balances, which are specific elements of this structure, to analyse situations where the paradox might appear. With the proposed approach we obtain that the centre of the tables analysed is a natural way to compare them, which avoids by construction the possibility of a paradox. Key words: Aitchison geometry, geometric mean, orthogonal projection
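The paradox described here can be seen in a small numerical sketch. The counts below are illustrative and not taken from the paper. The `centre_balance` function is only a loose reading of the idea of comparing tables through their centre (the mean of the per-stratum balances, which corresponds to a geometric mean of the odds), not the authors' exact procedure.

```python
import math

# (successes, failures) per stratum for two treatments; illustrative counts only.
tables = {
    "A": [(81, 6), (192, 71)],
    "B": [(234, 36), (55, 25)],
}

def rate(success, failure):
    """Success proportion within one stratum."""
    return success / (success + failure)

def pooled_rate(strata):
    """Success proportion after amalgamating (summing) the strata."""
    s = sum(a for a, _ in strata)
    f = sum(b for _, b in strata)
    return s / (s + f)

def balance(success, failure):
    """Balance of the two-part composition (success, failure)."""
    return math.log(success / failure) / math.sqrt(2)

def centre_balance(strata):
    """Mean balance across strata, i.e. the balance of their compositional centre."""
    return sum(balance(a, b) for a, b in strata) / len(strata)

for name, strata in tables.items():
    per_stratum = [round(rate(a, b), 3) for a, b in strata]
    print(name, per_stratum, round(pooled_rate(strata), 3),
          round(centre_balance(strata), 3))
```

In this example A has the higher success proportion in each stratum, yet B looks better once the strata are summed. The centre-based comparison agrees with the per-stratum ordering.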
Abstract:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By an essential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies, and ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. From consideration of such examples it seems sensible to build up a model in two stages, the first determining where the zeros will occur and the second how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data.
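The two data objects named in this abstract, an incidence matrix and a conditional compositional matrix, can be sketched as follows. The function name `split_essential_zeros` and the use of `None` to mark absent parts are assumptions made for illustration. The sketch only prepares the data and does not fit either of the proposed models.

```python
def split_essential_zeros(rows):
    """Split raw non-negative data into the two stages described above.

    Returns (incidence, conditional):
      incidence[i][j]   is 1 if part j is present in row i, else 0;
      conditional[i][j] is the share of part j among the non-zero parts of
                        row i, or None where the part is an essential zero.
    Assumes every row has at least one positive part.
    """
    incidence = [[1 if v > 0 else 0 for v in row] for row in rows]
    conditional = []
    for row in rows:
        total = sum(v for v in row if v > 0)
        conditional.append([v / total if v > 0 else None for v in row])
    return incidence, conditional

# Hypothetical household budgets (housing, food, tobacco, travel).
budgets = [
    [500.0, 300.0, 0.0, 200.0],
    [400.0, 400.0, 100.0, 100.0],
]
incidence, conditional = split_essential_zeros(budgets)
```

Here the first household spends nothing on tobacco, so its incidence row is `[1, 1, 0, 1]` and its conditional composition is defined over the remaining three parts only.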
Abstract:
The studied sector of the central Ribeira Fold Belt (SE Brazil) comprises metatexites, diatexites, charnockites and blastomylonites. This study integrates petrological and thermochronological data in order to constrain the thermotectonic and geodynamic evolution of this Neoproterozoic-Ordovician mobile belt during Western Gondwana amalgamation. New data indicate that after an earlier collision stage at ~610 Ma (zircon, U-Pb age), peak metamorphism and lower crust partial melting, coeval with the main regional high-grade D1 thrust deformation, occurred at 572-562 Ma (zircon, U-Pb ages). The overall average cooling rate was low (<5 °C/Ma) from 750 to 250 °C (at ~455 Ma; biotite-WR Rb-Sr age), but disparate cooling paths indicate differential uplift between distinct lithotypes: (a) metatexites and blastomylonites show an overall stable 3-5 °C/Ma cooling rate; (b) charnockites and associated rocks remained at T > 650 °C during sub-horizontal D2 shearing until ~510-470 Ma (garnet-WR Sm-Nd ages) (1-2 °C/Ma), being then rapidly exhumed/cooled (8-30 °C/Ma) during post-orogenic D3 deformation with late granite emplacement at ~490 Ma (zircon, U-Pb age). Cooling rates based on garnet-biotite Fe-Mg diffusion are broadly consistent with the geochronological cooling rates: (a) metatexites were cooled faster at high temperatures (6 °C/Ma) and slowly at low temperatures (0.1 °C/Ma), with cooling rates decreasing over time; (b) charnockites show low cooling rates (2 °C/Ma) near metamorphic peak conditions and high cooling rates (120 °C/Ma) at lower temperatures, with cooling rates increasing during retrogression. The charnockite thermal evolution and the extensive production of granitoid melts in the area imply that high geothermal gradients were sustained for a long period of time (50-90 Ma). This thermal anomaly most likely reflects upwelling of asthenospheric mantle and magma underplating coupled with long-term generation of high HPE (heat producing elements) granitoids. These factors must have sustained elevated crustal geotherms for ~100 Ma, promoting widespread charnockite generation at middle to lower crustal levels. (C) 2010 Elsevier B.V. All rights reserved.
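The overall average cooling rate quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that the 750 °C starting point corresponds to the 572-562 Ma peak-metamorphism ages; the abstract does not state this pairing explicitly. The function name is hypothetical.

```python
def mean_cooling_rate(temp_start_c, temp_end_c, age_start_ma, age_end_ma):
    """Average cooling rate in degrees C per Ma between two (temperature, age) points."""
    return (temp_start_c - temp_end_c) / (age_start_ma - age_end_ma)

# Assumed pairing: 750 degrees C at the peak ages, 250 degrees C at ~455 Ma.
for peak_age_ma in (572, 562):
    cooling = mean_cooling_rate(750, 250, peak_age_ma, 455)
    print(f"peak at {peak_age_ma} Ma: {cooling:.1f} degrees C/Ma")
```

Under that assumption both bounds give roughly 4 to 5 °C/Ma, consistent with the stated overall rate of less than 5 °C/Ma.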
Abstract:
The Ibituruna quartz-syenite was emplaced as a sill in the Ribeira-Aracuai Neoproterozoic belt (Southeastern Brazil) during the last stages of the Gondwana supercontinent amalgamation. We have measured the Anisotropy of Magnetic Susceptibility (AMS) in samples from the Ibituruna sill to unravel its magnetic fabric, which is regarded as a proxy for its magmatic fabric. A large magnetic anisotropy, dominantly due to magnetite, and a consistent magnetic fabric have been determined over the entire Ibituruna massif. The magmatic foliation and lineation are strikingly parallel to the solid-state mylonitic foliation and lineation measured in the country-rock. Altogether, these observations suggest that the Ibituruna sill was emplaced during the high temperature (~750 °C) regional deformation and was deformed before full solidification coherently with its country-rock. Unexpectedly, geochronological data suggest a rather different conclusion. LA-ICP-MS and SHRIMP ages of zircons from the Ibituruna quartz-syenite are in the range 530-535 Ma, and LA-ICP-MS ages of zircons and monazites from synkinematic leucocratic veins in the country-rocks suggest a crystallization at ~570-580 Ma, i.e., an HT deformation >35 My older than the emplacement of the Ibituruna quartz-syenite. Conclusions from the structural and the geochronological studies are therefore conflicting. A possible explanation arises from 40Ar-39Ar thermochronology. We have dated amphiboles from the quartz-syenite, and amphiboles and biotites from the country-rock. Together with the ages of monazites and zircons in the country-rock, 40Ar-39Ar mineral ages suggest a very low cooling rate: <3 °C/My between 570 and ~500 Ma and ~5 °C/My between 500 and 460 Ma. Assuming a protracted regional deformation consistent over tens of My, under such stable thermal conditions the fabric and microstructure of deformed rocks may remain almost unchanged even if they underwent and recorded strain pulses separated by long periods of time. It may be a characteristic of slow-cooling "hot orogens" that rocks deformed at significantly different periods during the orogeny, but under roughly unchanged temperature conditions, display almost indiscernible microstructure and fabric. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
The interpretation of data on genetic variation with regard to the relative roles of different evolutionary factors that produce and maintain genetic variation depends critically on our assumptions concerning effective population size and the level of migration between neighboring populations. In humans, recent population growth and movements of specific ethnic groups across wide geographic areas mean that any theory based on assumptions of constant population size and absence of substructure is generally untenable. We examine the effects of population subdivision on the pattern of protein genetic variation in a total sample drawn from an artificial agglomerate of 12 tribal populations of Central and South America, analyzing the pooled sample as though it were a single population. Several striking findings emerge. (1) Mean heterozygosity is not sensitive to agglomeration, but the number of different alleles (allele count) is inflated, relative to neutral mutation/drift/equilibrium expectation. (2) The inflation is most serious for rare alleles, especially those which originally occurred as tribally restricted "private" polymorphisms. (3) The degree of inflation is an increasing function of both the number of populations encompassed by the sample and of the genetic divergence among them. (4) Treating an agglomerated population as though it were a panmictic unit of long standing can lead to serious biases in estimates of mutation rates, selection pressures, and effective population sizes. Current DNA studies indicate the presence of numerous genetic variants in human populations. The findings and conclusions of this paper are all fully applicable to the study of genetic variation at the DNA level as well.
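Findings (1) and (2) of this abstract can be illustrated with a toy calculation. The allele frequencies below are invented, each hypothetical population carries one rare private allele, and pooling assumes equal sample sizes. This is a sketch of the bookkeeping only, not a reproduction of the paper's analysis.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p^2), for a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def pool(populations):
    """Average allele frequencies across populations (equal sample sizes assumed)."""
    alleles = set().union(*populations)
    n = len(populations)
    return {a: sum(pop.get(a, 0.0) for pop in populations) / n for a in alleles}

# Three hypothetical tribes sharing a common allele, each with a private rare allele.
tribes = [
    {"common": 0.95, "private_1": 0.05},
    {"common": 0.95, "private_2": 0.05},
    {"common": 0.95, "private_3": 0.05},
]

pooled = pool(tribes)
mean_h = sum(heterozygosity(t) for t in tribes) / len(tribes)
print("alleles per tribe:", [len(t) for t in tribes], "pooled:", len(pooled))
print("mean tribal H:", round(mean_h, 4), "pooled H:", round(heterozygosity(pooled), 4))
```

In this toy case the pooled sample shows four alleles where each tribe shows two, while heterozygosity barely moves, which is the pattern the abstract describes for rare private variants.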
Abstract:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates an XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Abstract:
This article investigates patterns of performance in grip strength, gait speed and self-rated health, and the relationships between them, considering the variables of gender, age and family income. The study was conducted in a probabilistic sample of community-dwelling elderly people aged 65 and over, participants in a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-rated health was assessed using a 5-point scale. Males and the younger elderly individuals scored significantly higher on grip strength and gait speed than females and the oldest individuals did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income emerged as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Abstract:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. The aim was to correlate cephalometric data with the apnea-hypopnea index. We performed a retrospective, cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
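The damped exponential correlation structure mentioned in this abstract is commonly written as corr(y_j, y_k) = phi1 ** (|t_j - t_k| ** phi2) for j != k, with 0 <= phi1 < 1 and phi2 >= 0. The sketch below builds that matrix for irregular observation times. The parameter names and the example times are assumptions, and the abstract does not confirm that the paper uses exactly this parameterisation.

```python
def dec_matrix(times, phi1, phi2):
    """Damped exponential correlation matrix for irregularly spaced times.

    Off-diagonal entry (j, k) is phi1 ** (|t_j - t_k| ** phi2); the diagonal is 1.
    phi2 = 1 gives a continuous-time AR(1) pattern; phi2 = 0 gives a constant
    (compound-symmetry) correlation phi1 between distinct times.
    """
    n = len(times)
    return [
        [1.0 if j == k else phi1 ** (abs(times[j] - times[k]) ** phi2)
         for k in range(n)]
        for j in range(n)
    ]

# Hypothetical visit times (in months) for one subject.
visit_times = [0.0, 1.0, 3.0]
corr = dec_matrix(visit_times, phi1=0.5, phi2=1.0)
```

The diagonal is set explicitly so that the phi2 = 0 case does not evaluate `0 ** 0` and return phi1 there.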
Abstract:
The objective was to assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificates (LBCs) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected, and the data were collected again on LBC copies from mothers' and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated by calculating the percentage of blank fields, and the agreement between the original LBCs and the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8% to 100%. For most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality's Sinasc data are reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of the maternal and child population.
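For categorical fields, the Kappa agreement between an original certificate and its re-collected copy can be sketched as below. This is the standard unweighted Cohen's kappa; the abstract does not say which Kappa variant was used, and the example values are invented.

```python
from collections import Counter

def cohen_kappa(original, copy):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(original)
    observed = sum(a == b for a, b in zip(original, copy)) / n
    counts_a, counts_b = Counter(original), Counter(copy)
    # Chance agreement from the marginal frequencies of each source.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical field "presence of birth defects" (1 = yes, 0 = no) on four records.
original_lbc = [1, 1, 0, 0]
copy_lbc = [1, 0, 0, 0]
kappa = cohen_kappa(original_lbc, copy_lbc)
```

In this toy example three of four records agree (75%), chance agreement is 50%, and kappa is 0.5. The function is undefined when chance agreement equals 1, for example when both sources contain a single category.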
Abstract:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.
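The idea of augmenting a density on (0, 1) with point masses at zero and one can be sketched as a three-part mixture. The beta density below merely stands in for the paper's general proportion density, which the abstract does not specify, and the clustering and Bayesian estimation are left out. All parameter names are assumptions.

```python
import math

def beta_pdf(y, a, b):
    """Beta(a, b) density at y in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def augmented_density(y, p0, p1, a, b):
    """Zero-one augmented density on [0, 1].

    p0 and p1 are the probabilities of exactly 0 and exactly 1; the remaining
    mass 1 - p0 - p1 is spread over (0, 1) by the stand-in beta density.
    """
    if y == 0:
        return p0
    if y == 1:
        return p1
    return (1 - p0 - p1) * beta_pdf(y, a, b)

# Hypothetical parameters: 20% disease-free units, 10% fully diseased units.
value = augmented_density(0.5, p0=0.2, p1=0.1, a=2.0, b=2.0)
```

Values returned at 0 and 1 are probabilities, while values in between are densities, so the two should not be compared on the same scale.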