943 resultados para MULTIVARIATE DISTRIBUTIONS
Resumo:
We present a new algorithm to compute the voxel-wise genetic contribution to brain fiber microstructure using diffusion tensor imaging (DTI) in a dataset of 25 monozygotic (MZ) twins and 25 dizygotic (DZ) twin pairs (100 subjects total). First, the structural and DT scans were linearly co-registered. Structural MR scans were nonlinearly mapped via a 3D fluid transformation to a geometrically centered mean template, and the deformation fields were applied to the DTI volumes. After tensor re-orientation to realign them to the anatomy, we computed several scalar and multivariate DT-derived measures including the geodesic anisotropy (GA), the tensor eigenvalues and the full diffusion tensors. A covariance-weighted distance was measured between twins in the Log-Euclidean framework [2], and used as input to a maximum-likelihood based algorithm to compute the contributions from genetics (A), common environmental factors (C) and unique environmental ones (E) to fiber architecture. Quanititative genetic studies can take advantage of the full information in the diffusion tensor, using covariance weighted distances and statistics on the tensor manifold.
Resumo:
Twin studies are a major research direction in imaging genetics, a new field, which combines algorithms from quantitative genetics and neuroimaging to assess genetic effects on the brain. In twin imaging studies, it is common to estimate the intraclass correlation (ICC), which measures the resemblance between twin pairs for a given phenotype. In this paper, we extend the commonly used Pearson correlation to a more appropriate definition, which uses restricted maximum likelihood methods (REML). We computed proportion of phenotypic variance due to additive (A) genetic factors, common (C) and unique (E) environmental factors using a new definition of the variance components in the diffusion tensor-valued signals. We applied our analysis to a dataset of Diffusion Tensor Images (DTI) from 25 identical and 25 fraternal twin pairs. Differences between the REML and Pearson estimators were plotted for different sample sizes, showing that the REML approach avoids severe biases when samples are smaller. Measures of genetic effects were computed for scalar and multivariate diffusion tensor derived measures including the geodesic anisotropy (tGA) and the full diffusion tensors (DT), revealing voxel-wise genetic contributions to brain fiber microstructure.
Resumo:
Several genetic variants are thought to influence white matter (WM) integrity, measured with diffusion tensor imaging (DTI). Voxel based methods can test genetic associations, but heavy multiple comparisons corrections are required to adjust for searching the whole brain and for all genetic variants analyzed. Thus, genetic associations are hard to detect even in large studies. Using a recently developed multi-SNP analysis, we examined the joint predictive power of a group of 18 cholesterol-related single nucleotide polymorphisms (SNPs) on WM integrity, measured by fractional anisotropy. To boost power, we limited the analysis to brain voxels that showed significant associations with total serum cholesterol levels. From this space, we identified two genes with effects that replicated in individual voxel-wise analyses of the whole brain. Multivariate analyses of genetic variants on a reduced anatomical search space may help to identify SNPs with strongest effects on the brain from a broad panel of genes.
Resumo:
This review is focused on the impact of chemometrics for resolving data sets collected from investigations of the interactions of small molecules with biopolymers. These samples have been analyzed with various instrumental techniques, such as fluorescence, ultraviolet–visible spectroscopy, and voltammetry. The impact of two powerful and demonstrably useful multivariate methods for resolution of complex data—multivariate curve resolution–alternating least squares (MCR–ALS) and parallel factor analysis (PARAFAC)—is highlighted through analysis of applications involving the interactions of small molecules with the biopolymers, serum albumin, and deoxyribonucleic acid. The outcomes illustrated that significant information extracted by the chemometric methods was unattainable by simple, univariate data analysis. In addition, although the techniques used to collect data were confined to ultraviolet–visible spectroscopy, fluorescence spectroscopy, circular dichroism, and voltammetry, data profiles produced by other techniques may also be processed. Topics considered including binding sites and modes, cooperative and competitive small molecule binding, kinetics, and thermodynamics of ligand binding, and the folding and unfolding of biopolymers. Applications of the MCR–ALS and PARAFAC methods reviewed were primarily published between 2008 and 2013.
Resumo:
Background Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. Methods We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Results Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Conclusions Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
Resumo:
As more raw sugar factories become involved in the manufacture of by-products and cogeneration, bagasse is becoming an increasingly valuable commodity. However, in most factories, most of the bagasse produced is used to generate steam in relatively old and inefficient boilers. Efficient bagasse fired boilers are a high capital cost item and the cost of supplying the steam required to run a sugar factory by other means is prohibitive. For many factories a more realistic way to reduce bagasse consumption is to increase the efficiency of existing boilers. The Farleigh No. 3 boiler is a relatively old low efficiency boiler. Like many in the industry, the performance of this boiler has been adversely affected by uneven gas and air flow distributions and air heater leaks. The combustion performance and efficiency of this boiler have been significantly improved by making the gas and air flow distributions through the boiler more uniform and repairing the air heater. The estimated bagasse savings easily justify the cost of the boiler improvements.
Resumo:
Cum ./LSTA_A_8828879_O_XML_IMAGES/LSTA_A_8828879_O_ILM0001.gif rule [Singh (1975)] has been suggested in the literature for finding approximately optimum strata boundaries for proportional allocation, when the stratification is done on the study variable. This paper shows that for the class of density functions arising from the Wang and Aggarwal (1984) representation of the Lorenz Curve (or DBV curves in case of inventory theory), the cum ./LSTA_A_8828879_O_XML_IMAGES/LSTA_A_8828879_O_ILM0002.gif rule in place of giving approximately optimum strata boundaries, yields exactly optimum boundaries. It is also shown that the conjecture of Mahalanobis (1952) “. . .an optimum or nearly optimum solutions will be obtained when the expected contribution of each stratum to the total aggregate value of Y is made equal for all strata” yields exactly optimum strata boundaries for the case considered in the paper.
Resumo:
Many active pharmaceutical ingredients (APIs) have both anhydrate and hydrate forms. Due to the different physicochemical properties of solid forms, the changes in solid-state may result in therapeutic, pharmaceutical, legal and commercial problems. In order to obtain good solid dosage form quality and performance, there is a constant need to understand and control these phase transitions during manufacturing and storage. Thus it is important to detect and also quantify the possible transitions between the different forms. In recent years, vibrational spectroscopy has become an increasingly popular tool to characterise the solid-state forms and their phase transitions. It offers several advantages over other characterisation techniques including an ability to obtain molecular level information, minimal sample preparation, and the possibility of monitoring changes non-destructively in-line. Dehydration is the phase transition of hydrates which is frequently encountered during the dosage form production and storage. The aim of the present thesis was to investigate the dehydration behaviour of diverse pharmaceutical hydrates by near infrared (NIR), Raman and terahertz pulsed spectroscopic (TPS) monitoring together with multivariate data analysis. The goal was to reveal new perspectives for investigation of the dehydration at the molecular level. Solid-state transformations were monitored during dehydration of diverse hydrates on hot-stage. The results obtained from qualitative experiments were used to develop a method and perform the quantification of the solid-state forms during process induced dehydration in a fluidised bed dryer. Both in situ and in-line process monitoring and quantification was performed. This thesis demonstrated the utility of vibrational spectroscopy techniques and multivariate modelling to monitor and investigate dehydration behaviour in situ and during fluidised bed drying. All three spectroscopic methods proved complementary in the study of dehydration. NIR spectroscopy models could quantify the solid-state forms in the binary system, but were unable to quantify all the forms in the quaternary system. Raman spectroscopy models on the other hand could quantify all four solid-state forms that appeared upon isothermal dehydration. The speed of spectroscopic methods makes them applicable for monitoring dehydration and the quantification of multiple forms was performed during phase transition. Thus the solid-state structure information at the molecular level was directly obtained. TPS detected the intermolecular phonon modes and Raman spectroscopy detected mostly the changes in intramolecular vibrations. Both techniques revealed information about the crystal structure changes. NIR spectroscopy, on the other hand was more sensitive to water content and hydrogen bonding environment of water molecules. This study provides a basis for real time process monitoring using vibrational spectroscopy during pharmaceutical manufacturing.
Resumo:
This study reports a diachronic corpus investigation of common-number pronouns used to convey unknown or otherwise unspecified reference. The study charts agreement patterns in these pronouns in various diachronic and synchronic corpora. The objective is to provide base-line data on variant frequencies and distributions in the history of English, as there are no previous systematic corpus-based observations on this topic. This study seeks to answer the questions of how pronoun use is linked with the overall typological development in English and how their diachronic evolution is embedded in the linguistic and social structures in which they are used. The theoretical framework draws on corpus linguistics and historical sociolinguistics, grammaticalisation, diachronic typology, and multivariate analysis of modelling sociolinguistic variation. The method employs quantitative corpus analyses from two main electronic corpora, one from Modern English and the other from Present-day English. The Modern English material is the Corpus of Early English Correspondence, and the time frame covered is 1500-1800. The written component of the British National Corpus is used in the Present-day English investigations. In addition, the study draws supplementary data from other electronic corpora. The material is used to compare the frequencies and distributions of common-number pronouns between these two time periods. The study limits the common-number uses to two subsystems, one anaphoric to grammatically singular antecedents and one cataphoric, in which the pronoun is followed by a relative clause. Various statistical tools are used to process the data, ranging from cross-tabulations to multivariate VARBRUL analyses in which the effects of sociolinguistic and systemic parameters are assessed to model their impact on the dependent variable. This study shows how one pronoun type has extended its uses in both subsystems, an increase linked with grammaticalisation and the changes in other pronouns in English through the centuries. The variationist sociolinguistic analysis charts how grammaticalisation in the subsystems is embedded in the linguistic and social structures in which the pronouns are used. The study suggests a scale of two statistical generalisations of various sociolinguistic factors which contribute to grammaticalisation and its embedding at various stages of the process.
Resumo:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on the average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
Resumo:
Data from surveys of recreational anglers fishing on three estuaries in eastern Australia reveal highly skewed distributions of catches with many zeros. Such data may be analysed using a two component approach involving a binary (zero/non-zero catch) response and the non-zero catches. A truncated regression model was effective in analysing the non-zero catches. Covariates were incorporated in the modelling, and their critical assessment has led to improved measures of fishing effort for this recreational fishery.
Resumo:
Distributions of lesser mealworm, Alphitobius diaperinus (Panzer) (Coleoptera: Tenebrionidae), in litter of a compacted earth floor broiler house in southeastern Queensland, Australia, were studied over two flocks. Larvae were the predominant stage recorded. Significantly low densities occurred in open locations and under drinker cups where chickens had complete access, whereas high densities were found under feed pans and along house edges where chicken access was restricted. For each flock, lesser mealworm numbers increased at all locations over the first 14 d, especially under feed pans and along house edges, peaking at 26 d and then declining over the final 28 d. A life stage profile per flock was devised that consisted of the following: beetles emerge from the earth floor at the beginning of each flock, and females lay eggs, producing larvae that peak in numbers at 3 wk; after a further 3 to 4 wk, larvae leave litter to pupate in the earth floor, and beetles then emerge by the end of the flock time. Removing old litter from the brooder section at the end of a flock did not greatly reduce mealworm numbers over the subsequent flock, but it seemed to prevent numbers increasing, while an increase in numbers in the grow-out section was recorded after reusing litter. Areas under feed pans and along house edges accounted for 5% of the total house area, but approximately half the estimated total number of lesser mealworms in the broiler house occurred in these locations. The results of this study will be used to determine optimal deployment of site-specific treatments for lesser mealworm control.
Resumo:
Species distribution modelling (SDM) typically analyses species’ presence together with some form of absence information. Ideally absences comprise observations or are inferred from comprehensive sampling. When such information is not available, then pseudo-absences are often generated from the background locations within the study region of interest containing the presences, or else absence is implied through the comparison of presences to the whole study region, e.g. as is the case in Maximum Entropy (MaxEnt) or Poisson point process modelling. However, the choice of which absence information to include can be both challenging and highly influential on SDM predictions (e.g. Oksanen and Minchin, 2002). In practice, the use of pseudo- or implied absences often leads to an imbalance where absences far outnumber presences. This leaves analysis highly susceptible to ‘naughty-noughts’: absences that occur beyond the envelope of the species, which can exert strong influence on the model and its predictions (Austin and Meyers, 1996). Also known as ‘excess zeros’, naughty noughts can be estimated via an overall proportion in simple hurdle or mixture models (Martin et al., 2005). However, absences, especially those that occur beyond the species envelope, can often be more diverse than presences. Here we consider an extension to excess zero models. The two-staged approach first exploits the compartmentalisation provided by classification trees (CTs) (as in O’Leary, 2008) to identify multiple sources of naughty noughts and simultaneously delineate several species envelopes. Then SDMs can be fit separately within each envelope, and for this stage, we examine both CTs (as in Falk et al., 2014) and the popular MaxEnt (Elith et al., 2006). We introduce a wider range of model performance measures to improve treatment of naughty noughts in SDM. We retain an overall measure of model performance, the area under the curve (AUC) of the Receiver-Operating Curve (ROC), but focus on its constituent measures of false negative rate (FNR) and false positive rate (FPR), and how these relate to the threshold in the predicted probability of presence that delimits predicted presence from absence. We also propose error rates more relevant to users of predictions: false omission rate (FOR), the chance that a predicted absence corresponds to (and hence wastes) an observed presence, and the false discovery rate (FDR), reflecting those predicted (or potential) presences that correspond to absence. A high FDR may be desirable since it could help target future search efforts, whereas zero or low FOR is desirable since it indicates none of the (often valuable) presences have been ignored in the SDM. For illustration, we chose Bradypus variegatus, a species that has previously been published as an exemplar species for MaxEnt, proposed by Phillips et al. (2006). We used CTs to increasingly refine the species envelope, starting with the whole study region (E0), eliminating more and more potential naughty noughts (E1–E3). When combined with an SDM fit within the species envelope, the best CT SDM had similar AUC and FPR to the best MaxEnt SDM, but otherwise performed better. The FNR and FOR were greatly reduced, suggesting that CTs handle absences better. Interestingly, MaxEnt predictions showed low discriminatory performance, with the most common predicted probability of presence being in the same range (0.00-0.20) for both true absences and presences. In summary, this example shows that SDMs can be improved by introducing an initial hurdle to identify naughty noughts and partition the envelope before applying SDMs. This improvement was barely detectable via AUC and FPR yet visible in FOR, FNR, and the comparison of predicted probability of presence distribution for pres/absence.
Resumo:
The use of near infrared (NIR) hyperspectral imaging and hyperspectral image analysis for distinguishing between hard, intermediate and soft maize kernels from inbred lines was evaluated. NIR hyperspectral images of two sets (12 and 24 kernels) of whole maize kernels were acquired using a Spectral Dimensions MatrixNIR camera with a spectral range of 960-1662 nm and a sisuChema SWIR (short wave infrared) hyperspectral pushbroom imaging system with a spectral range of 1000-2498 nm. Exploratory principal component analysis (PCA) was used on absorbance images to remove background, bad pixels and shading. On the cleaned images. PCA could be used effectively to find histological classes including glassy (hard) and floury (soft) endosperm. PCA illustrated a distinct difference between glassy and floury endosperm along principal component (PC) three on the MatrixNIR and PC two on the sisuChema with two distinguishable clusters. Subsequently partial least squares discriminant analysis (PLS-DA) was applied to build a classification model. The PLS-DA model from the MatrixNIR image (12 kernels) resulted in root mean square error of prediction (RMSEP) value of 0.18. This was repeated on the MatrixNIR image of the 24 kernels which resulted in RMSEP of 0.18. The sisuChema image yielded RMSEP value of 0.29. The reproducible results obtained with the different data sets indicate that the method proposed in this paper has a real potential for future classification uses.
Resumo:
Aim: Birds play a major role in the dispersal of seeds of many fleshy-fruited invasive plants. The fruits that birds choose to consume are influenced by fruit traits. However, little is known of how the traits of invasive plant fruits contribute to invasiveness or to their use by frugivores. We aim to gain a greater understanding of these relationships to improve invasive plant management. Location: South-east Queensland, Australia. Methods: We measure a variety of fruit morphology, pulp nutrient and phenology traits of a suite of bird-dispersed alien plants. Frugivore richness of these aliens was derived from the literature. Using regressions and multivariate methods, we investigate relationships between fruit traits, frugivore richness and invasiveness. Results: Plant invasiveness was negatively correlated to fruit size, and all highly invasive species had quite similar fruit morphology [smaller fruits, seeds of intermediate size and few (<10) seeds per fruit]. Lower pulp water was the only pulp nutrient trait associated with invasiveness. There were strong positive relationships between the diversity of bird frugivores and plant invasiveness, and in the diversity of bird frugivores in the study region and another part of the plants' alien range. Main conclusions: Our results suggest that weed risk assessments (WRA) and predictions of invasive success for bird-dispersed plants can be improved. Scoring criteria for WRA regarding fruit size would need to be system-specific, depending on the fruit-processing capabilities of local frugivores. Frugivore richness could be quantified in the plant's natural range, its invasive range elsewhere, or predictions made based on functionally similar fruits.