762 results for Weighted Overlay Analysis
in the Queensland University of Technology ePrints Archive
Abstract:
Objective: Ankylosing spondylitis (AS) is a debilitating chronic inflammatory condition with a high degree of familiality (λs=82) and heritability (>90%) that primarily affects the spinal and sacroiliac joints. Whole genome scans for linkage to AS phenotypes have been conducted, although results have been inconsistent between studies and all have had modest sample sizes. One potential solution to these issues is to combine data from multiple studies in a retrospective meta-analysis. Methods: The International Genetics of Ankylosing Spondylitis Consortium combined data from three whole genome linkage scans for AS (n=3744 subjects) to determine chromosomal markers that show evidence of linkage with disease. Linkage markers typed in different centres were integrated into a consensus map to facilitate effective data pooling. We performed a weighted meta-analysis to combine the linkage results, and compared them with the three individual scans and a combined pooled scan. Results: In addition to the expected region surrounding the HLA-B27 gene on chromosome 6, we determined that several marker regions showed significant evidence of linkage with disease status. Regions on chromosomes 10q and 16q achieved 'suggestive' evidence of linkage, and regions on chromosomes 1q, 3q, 5q, 6q, 9q, 17q and 19q showed at least nominal linkage in two or more scans and in the weighted meta-analysis. Regions previously associated with AS on chromosomes 2q (the IL-1 gene cluster) and 22q (CYP2D6) exhibited nominal linkage in the meta-analysis, providing further statistical support for their involvement in susceptibility to AS. Conclusion: These findings provide a useful guide for future studies aiming to identify the genes involved in this highly heritable condition. Published on behalf of the British Society for Rheumatology.
Abstract:
Physical and chemical properties of biodiesel are influenced by structural features of the fatty acids, such as chain length, degree of unsaturation and branching of the carbon chain. This study investigated whether microalgal fatty acid profiles are suitable for biodiesel characterization and species selection through Preference Ranking Organisation Method for Enrichment Evaluation (PROMETHEE) and Graphical Analysis for Interactive Assistance (GAIA) analysis. Fatty acid methyl ester (FAME) profiles were used to calculate the likely key chemical and physical properties of the biodiesel [cetane number (CN), iodine value (IV), cold filter plugging point, density, kinematic viscosity, higher heating value] of nine microalgal species (this study) and twelve species from the literature, selected for their suitability for cultivation in subtropical climates. An equal-parameter weighted PROMETHEE-GAIA analysis ranked Nannochloropsis oculata, Extubocellulus sp. and Biddulphia sp. highest; these were the only species meeting the EN14214 and ASTM D6751-02 biodiesel standards, except for the double bond limit in the EN14214. Chlorella vulgaris outranked N. oculata when the twelve microalgae from the literature were included. Culture growth phase (stationary) and, to a lesser extent, nutrient provision affected the CN and IV values of N. oculata due to lower eicosapentaenoic acid (EPA) contents. Application of a polyunsaturated fatty acid (PUFA) weighting to saturation led to a lower ranking of species exceeding the EN14214 double bond thresholds. In summary, CN, IV, C18:3 and double bond limits were the strongest drivers in the equal-parameter weighted PROMETHEE analysis.
Abstract:
Reliable pollutant build-up prediction plays a critical role in the accuracy of urban stormwater quality modelling outcomes. However, water quality data collection is resource demanding compared to streamflow data monitoring, where a greater quantity of data is generally available. Consequently, available water quality data sets span only relatively short time scales, unlike water quantity data. Therefore, the ability to take due consideration of the variability associated with pollutant processes and natural phenomena is constrained. This in turn gives rise to uncertainty in the modelling outcomes, as research has shown that pollutant loadings on catchment surfaces and rainfall within an area can vary considerably over space and time scales. Therefore, the assessment of model uncertainty is an essential element of informed decision making in urban stormwater management. This paper presents the application of a range of regression approaches, namely ordinary least squares regression, weighted least squares regression and Bayesian weighted least squares regression, for the estimation of uncertainty associated with pollutant build-up prediction using limited data sets. The study outcomes confirmed that the use of ordinary least squares regression with fixed model inputs and limited observational data may not provide realistic estimates. The stochastic nature of the dependent and independent variables needs to be taken into consideration in pollutant build-up prediction. It was found that the use of the Bayesian approach along with the Monte Carlo simulation technique provides a powerful tool, which attempts to make the best use of the available knowledge in the prediction and thereby presents a practical solution to counteract the limitations which are otherwise imposed on water quality modelling.
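The contrast between ordinary and weighted least squares that the abstract draws can be illustrated with a minimal sketch. This is a generic closed-form weighted fit of a straight line, not the paper's Bayesian implementation; all names are illustrative.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; observation i carries
    weight w[i] (e.g. the inverse of its error variance). With all
    weights equal this reduces to ordinary least squares."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b   # intercept, slope
```

On data that lie exactly on a line, any positive weighting recovers the same coefficients; the weights matter only when observations disagree, which is precisely the limited-data situation the paper addresses.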
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on the binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
We consider rank-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate the covariance of the estimators is also given, which can bypass estimation of the density function. Simulation studies are carried out to compare different estimators for a number of scenarios on the correlation structure, presence/absence of outliers and different correlation values. The proposed methods appear to perform well; in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of the correlation structure and outliers. A real example is provided for illustration.
Abstract:
Different international plant protection organisations advocate different schemes for conducting pest risk assessments. Most of these schemes use structured questionnaires in which experts are asked to score several items using an ordinal scale. The scores are then combined using a range of procedures, such as simple arithmetic means, weighted averages, multiplication of scores, and cumulative sums. The most useful schemes will correctly identify harmful pests and correctly clear those that are not. As the quality of a pest risk assessment can depend on the characteristics of the scoring system used by the risk assessors (i.e., on the number of points of the scale and on the method used for combining the component scores), it is important to assess and compare the performance of different scoring systems. In this article, we propose a new method for assessing scoring systems. Its principle is to simulate virtual data using a stochastic model and then to estimate sensitivity and specificity values from these data for different scoring systems. The value of our approach is illustrated in a case study where several scoring systems were compared. Data for this analysis were generated using a probabilistic model describing the pest introduction process. The generated data were then used to simulate the outcome of scoring systems and to assess the accuracy of the decisions about positive and negative introduction. The results showed that ordinal scales with at most 5 or 6 points were sufficient and that the multiplication-based scoring systems performed better than their sum-based counterparts. The proposed method could be used in the future to assess a great diversity of scoring systems.
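The simulation idea can be sketched generically: draw pests with a latent introduction risk, generate noisy ordinal item scores, combine them by sum or by multiplication, and compare the flagging decision against the true status. Every distribution, noise level and threshold below is an illustrative assumption, not the probabilistic model used in the article.

```python
import random

def assess_scoring_system(combine="sum", n_items=4, scale=5,
                          n_pests=20000, noise=0.15, seed=1):
    """Estimate sensitivity and specificity of an ordinal scoring system on
    simulated pests. Latent risk p ~ U(0,1); pests with p > 0.5 are 'truly
    harmful'. Each item score is a noisy rounding of p onto 1..scale."""
    rng = random.Random(seed)
    tp = fp = fn = tn = 0
    mid = (scale + 1) / 2                       # midpoint of the ordinal scale
    cutoff = n_items * mid if combine == "sum" else mid ** n_items
    for _ in range(n_pests):
        p = rng.random()
        harmful = p > 0.5
        scores = []
        for _ in range(n_items):
            noisy = min(max(p + rng.gauss(0.0, noise), 0.0), 1.0)
            scores.append(1 + round(noisy * (scale - 1)))
        if combine == "sum":
            total = sum(scores)
        else:                                   # multiplication of scores
            total = 1
            for s in scores:
                total *= s
        flagged = total > cutoff
        if flagged and harmful:
            tp += 1
        elif flagged:
            fp += 1
        elif harmful:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)       # sensitivity, specificity
```

Running the sketch for `combine="sum"` and `combine="mult"` under the same latent model gives directly comparable sensitivity/specificity pairs, which is the comparison logic the article's method formalises.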
Abstract:
Complex networks have been studied extensively due to their relevance to many real-world systems such as the world-wide web, the internet, biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties, and dynamics. Three well-known properties of complex networks are scale-free degree distribution, small-world effect and self-similarity. The search for additional meaningful properties and the relationships among these properties is an active area of current research. This thesis investigates a newer aspect of complex networks, namely their multifractality, which is an extension of the concept of self-similarity. The first part of the thesis aims to confirm that the study of properties of complex networks can be expanded to a wider field including more complex weighted networks. Those real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use the protein-protein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity from the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network because the shortest distance between nodes is larger in the skeleton; hence, for a fixed box size, more boxes will be needed to cover the skeleton. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana.
By using the random sequential box-covering algorithm, we calculate the fractal dimensions for both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in generated weighted PPI networks. This implication will be useful for our treatment of the networks in the third part of the thesis. The second part of the thesis aims to explore the multifractal behaviour of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierpinski gasket are homogeneous, since these fractals consist of a geometrical figure which repeats on an ever-reduced scale. Fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets, implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box-covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks and random networks, and a kind of real networks, namely PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while the multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interaction networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interaction networks.
This multifractal analysis then provides a potentially useful tool for gene clustering and identification. The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent works indicate that complex network theory can be a powerful tool to analyse time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks from time series: we define nodes by vectors of a certain length in the time series, and define the weight of the edge between any two nodes as the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence larger Hurst exponent, tend to have smaller fractal dimension, hence smoother sample paths. We then construct networks via the technique of the horizontal visibility graph (HVG), which has been widely used recently. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions as well as those for binomial cascades and five bacterial genomes. The results confirm the monoscaling of fractional Brownian motion and the multifractality of the rest.
As an additional application, we discuss the resilience of networks constructed from time series via two different approaches: the visibility graph (VG) and the horizontal visibility graph. Our finding is that the degree distribution of VG networks of fractional Brownian motions is scale-free (i.e., follows a power law), meaning that one needs to destroy a large percentage of nodes before the network collapses into isolated parts, while for HVG networks of fractional Brownian motions the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
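The horizontal visibility graph used above has a compact definition: two points of the series are linked if and only if every value strictly between them is lower than both endpoints. A minimal sketch of that construction (illustrative, not the thesis code):

```python
def horizontal_visibility_graph(series):
    """Return the edge set of the horizontal visibility graph of a time
    series: points i < j are connected iff every value strictly between
    them is lower than both series[i] and series[j]."""
    n = len(series)
    edges = set()
    for i in range(n - 1):
        edges.add((i, i + 1))            # consecutive points always see each other
        top = series[i + 1]              # running max of the values between i and j
        for j in range(i + 2, n):
            if top < series[i] and top < series[j]:
                edges.add((i, j))
            top = max(top, series[j])
    return edges
```

For the series [1, 3, 2, 4], besides the consecutive links, only points 1 and 3 see each other (the intervening value 2 is below both 3 and 4), which matches the definition above.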
Abstract:
This paper introduces the Weighted Linear Discriminant Analysis (WLDA) technique, based upon the weighted pairwise Fisher criterion, for the purposes of improving i-vector speaker verification in the presence of high intersession variability. By taking advantage of the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space, the WLDA technique is shown to provide an improvement in speaker verification performance over traditional Linear Discriminant Analysis (LDA) approaches. A similar approach is also taken to extend the recently developed Source Normalised LDA (SNLDA) into Weighted SNLDA (WSNLDA) which, similarly, shows an improvement in speaker verification performance in both matched and mismatched enrolment/verification conditions. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that both WLDA and WSNLDA are viable as replacement techniques to improve the performance of LDA and SNLDA-based i-vector speaker verification.
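The weighted pairwise Fisher criterion replaces LDA's plain between-class scatter with a weighted sum over class pairs, so that pairs of speakers that are already well separated contribute less. The sketch below uses the approximate-pairwise-accuracy weight w(d) = erf(d/2√2)/(2d²) of the Mahalanobis distance d between class means; that specific weighting function, and all names here, are assumptions for illustration and may differ from the paper's choice.

```python
import math
import numpy as np

def wlda(X, y, n_dims=1):
    """Weighted LDA projection: the between-class scatter is a weighted sum
    over class pairs, weighted by w(d) = erf(d/(2*sqrt(2))) / (2*d^2) where
    d is the Mahalanobis distance between the pair of class means."""
    X, y = np.asarray(X, float), np.asarray(y)
    classes = sorted(set(y.tolist()))
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: float((y == c).mean()) for c in classes}
    # pooled within-class scatter
    Sw = sum(priors[c] * np.cov(X[y == c].T, bias=True) for c in classes)
    Sw_inv = np.linalg.inv(Sw)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            diff = means[ci] - means[cj]
            d = math.sqrt(diff @ Sw_inv @ diff)          # Mahalanobis distance
            w = math.erf(d / (2 * math.sqrt(2))) / (2 * d * d)
            Sb += w * priors[ci] * priors[cj] * np.outer(diff, diff)
    # directions maximising weighted between- over within-class scatter
    evals, evecs = np.linalg.eig(Sw_inv @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_dims]]
```

With plain LDA the weight on every pair is constant; here distant pairs are shrunk, which is the mechanism the paper exploits for the distances between pairs of speakers in the development i-vector space.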
Abstract:
This paper investigates the use of the dimensionality-reduction techniques weighted linear discriminant analysis (WLDA) and weighted median Fisher discriminant analysis (WMFD) before probabilistic linear discriminant analysis (PLDA) modeling for the purpose of improving speaker verification performance in the presence of high inter-session variability. Recently it was shown that WLDA techniques can provide an improvement over traditional linear discriminant analysis (LDA) for channel compensation in i-vector based speaker verification systems. We show in this paper that the speaker discriminative information that is available in the distances between pairs of speakers clustered in the development i-vector space can also be exploited in heavy-tailed PLDA modeling by using the weighted discriminant approaches prior to PLDA modeling. Based upon the results presented within this paper using the NIST 2008 Speaker Recognition Evaluation dataset, we believe that WLDA and WMFD projections before PLDA modeling can provide an improved approach when compared to uncompensated PLDA modeling for i-vector based speaker verification systems.
Abstract:
Objectives: To investigate the efficacy of progestin treatment to achieve pathological complete response (pCR) in patients with complex atypical endometrial hyperplasia (CAH) or early endometrial adenocarcinoma (EC). Methods: A systematic search identified 3245 potentially relevant citations. Studies containing fewer than ten eligible CAH or EC patients in either the oral or intrauterine treatment arm were excluded. Only information from patients receiving six or more months of treatment and not receiving other treatments was included. Weighted proportions of patients achieving pCR were calculated using R software. Results: Twelve studies met the selection criteria. Eleven studies reported treatment of patients with oral progestin (219 patients, 117 with CAH, 102 with grade 1 Stage I EC) and one reported treatment of patients with intrauterine progestin (11 patients with grade 1 Stage I EC). Overall, 74% (95% confidence interval [CI] 65-81%) of patients with CAH and 72% (95% CI 62-80%) of patients with grade 1 Stage I EC achieved a pCR to oral progestin. Disease progression while on oral treatment was reported for 6/219 (2.7%) patients, and relapse after initial complete response for 32/159 (20.1%) patients. The weighted mean pCR rate of patients with grade 1 Stage I EC treated with intrauterine progestin, from one prospective pilot study and an unpublished retrospective case series from the Queensland Centre of Gynaecologic Oncology (QCGC), was 68% (95% CI 45-86%). Conclusions: There is a lack of high quality evidence for the efficacy of progestin in CAH or EC. The available evidence however suggests that treatment with oral or intrauterine progestin is similarly effective. The risk of progression during treatment is small but longer follow-up is required. Evidence from prospective controlled clinical trials is warranted to establish how the efficacy of progestin for the treatment of CAH and EC can be improved further.
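Pooling response proportions across studies is typically an inverse-variance weighted average. The sketch below is a simple fixed-effect version with a normal-approximation confidence interval, written in Python rather than the R code the authors used; it illustrates the weighting idea only and is not their analysis.

```python
import math

def pooled_pcr(events, totals, z=1.96):
    """Inverse-variance weighted pooled proportion with a normal-approximation
    CI. events[i] responders out of totals[i] patients in study i."""
    weights, props = [], []
    for x, n in zip(events, totals):
        p = x / n
        v = p * (1 - p) / n
        if v == 0:                       # guard against p = 0 or p = 1
            v = 1.0 / (n * n)
        props.append(p)
        weights.append(1.0 / v)
    sw = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, props)) / sw
    se = math.sqrt(1.0 / sw)
    return pooled, (pooled - z * se, pooled + z * se)
```

Larger studies get proportionally larger weights, so the pooled estimate sits closer to the bigger study's proportion, which is the behaviour a "weighted proportion" is meant to have.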
Abstract:
Objective: To calculate pooled risk estimates of the association between pigmentary characteristics and basal cell carcinoma (BCC) of the skin. Methods: We searched three electronic databases and reviewed the reference lists of the retrieved articles until July 2012 to identify eligible epidemiologic studies. Eligible studies were those published between 1965 and July 2012 that permitted quantitative assessment of the association between histologically-confirmed BCC and any of the following characteristics: hair colour, eye colour, skin colour, skin phototype, tanning and burning ability, and presence of freckling or melanocytic nevi. We included 29 studies from 2236 initially identified. We calculated summary odds ratios (ORs) using weighted averages of the log OR, using random effects models. Results: We found the strongest associations with red hair (OR 2.02; 95% CI: 1.68, 2.44), fair skin colour (OR 2.11; 95% CI: 1.56, 2.86), and having skin that burns and never tans (OR 2.03; 95% CI: 1.73, 2.38). All other factors had weaker but positive associations with BCC, with the exception of freckling of the face in adulthood, which showed no association. Conclusions: Although most studies report risk estimates that are in the same direction, there is significant heterogeneity in the size of the estimates. The associations were quite modest and remarkably similar, with ORs between about 1.5 and 2.5 for the highest risk level for each factor. Given the public health impact of BCC, this meta-analysis will make a valuable contribution to our understanding of BCC.
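"Weighted averages of the log OR, using random effects models" is conventionally DerSimonian-Laird pooling. A minimal sketch under that assumption, with study inputs given as ORs and their 95% CIs (from which log-scale standard errors are recovered); the inputs in the usage test are arbitrary, not the paper's data.

```python
import math

def pool_log_or(ors, cis, z=1.96):
    """Random-effects (DerSimonian-Laird) pooled odds ratio. ors[i] is the
    study OR, cis[i] = (lo, hi) its 95% CI; SEs are recovered from the CI
    width on the log scale."""
    y = [math.log(o) for o in ors]
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for lo, hi in cis]
    w = [1.0 / s**2 for s in se]                    # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # heterogeneity
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)         # between-study variance
    wstar = [1.0 / (s**2 + tau2) for s in se]       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wstar, y)) / sum(wstar)
    se_mu = math.sqrt(1.0 / sum(wstar))
    return math.exp(mu), (math.exp(mu - z * se_mu), math.exp(mu + z * se_mu))
```

The between-study variance tau² widens every study's effective variance, so heterogeneous studies are pooled less aggressively than under a fixed-effect model.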
Abstract:
The method of generalized estimating equations (GEE) is a popular tool for analysing longitudinal (panel) data. Often, the covariates collected are time-dependent in nature, for example, age, relapse status, monthly income. When using GEE to analyse longitudinal data with time-dependent covariates, crucial assumptions about the covariates are necessary for valid inferences to be drawn. When those assumptions do not hold or cannot be verified, Pepe and Anderson (1994, Communications in Statistics, Simulations and Computation 23, 939–951) advocated using an independence working correlation assumption in the GEE model as a robust approach. However, using GEE with the independence correlation assumption may lead to significant efficiency loss (Fitzmaurice, 1995, Biometrics 51, 309–317). In this article, we propose a method that extracts additional information from the estimating equations that are excluded by the independence assumption. The method always includes the estimating equations under the independence assumption and the contribution from the remaining estimating equations is weighted according to the likelihood of each equation being a consistent estimating equation and the information it carries. We apply the method to a longitudinal study of the health of a group of Filipino children.
Abstract:
Data in germplasm collections contain a mixture of data types; binary, multistate and quantitative. Given the multivariate nature of these data, the pattern analysis methods of classification and ordination have been identified as suitable techniques for statistically evaluating the available diversity. The proximity (or resemblance) measure, which is in part the basis of the complementary nature of classification and ordination techniques, is often specific to particular data types. The use of a combined resemblance matrix has an advantage over data type specific proximity measures. This measure accommodates the different data types without manipulating them to be of a specific type. Descriptors are partitioned into their data types and an appropriate proximity measure is used on each. The separate proximity matrices, after range standardisation, are added as a weighted average and the combined resemblance matrix is then used for classification and ordination. Germplasm evaluation data for 831 accessions of groundnut (Arachis hypogaea L.) from the Australian Tropical Field Crops Genetic Resource Centre, Biloela, Queensland were examined. Data for four binary, five ordered multistate and seven quantitative descriptors have been documented. The interpretative value of different weightings - equal and unequal weighting of data types to obtain a combined resemblance matrix - was investigated by using principal co-ordinate analysis (ordination) and hierarchical cluster analysis. Equal weighting of data types was found to be more valuable for these data as the results provided a greater insight into the patterns of variability available in the Australian groundnut germplasm collection. The complementary nature of pattern analysis techniques enables plant breeders to identify relevant accessions in relation to the descriptors which distinguish amongst them. 
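The combined resemblance matrix can be sketched generically: compute a data-type-appropriate distance for each partition of the descriptors, range-standardise, then take a weighted average. The particular distance choices below (simple matching for binary, range-standardised Manhattan for quantitative, as in Gower's measure) are common defaults and an assumption, not necessarily those of the study.

```python
def combined_resemblance(binary, quantitative, w_bin=0.5, w_quant=0.5):
    """Combined resemblance (distance) matrix for n accessions described by
    binary and quantitative descriptors: simple-matching distance on the
    binary block, range-standardised Manhattan distance on the quantitative
    block, combined as a weighted average."""
    n = len(binary)
    # range of each quantitative descriptor, for standardisation
    ranges = [max(col) - min(col) or 1.0 for col in zip(*quantitative)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_bin = sum(a != b for a, b in zip(binary[i], binary[j])) / len(binary[i])
            d_q = sum(abs(a - b) / r
                      for a, b, r in zip(quantitative[i], quantitative[j], ranges)) / len(ranges)
            D[i][j] = D[j][i] = w_bin * d_bin + w_quant * d_q
    return D
```

With `w_bin = w_quant` this is the equal weighting of data types that the study found most informative; unequal weights shift the ordination toward one data type.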
This additional information may provide plant breeders with a more defined entry point into the germplasm collection for identifying sources of variability for their plant improvement program, thus improving the utilisation of germplasm resources.
Abstract:
Corporate social responsibility is imperative for manufacturing companies to achieve sustainable development. Under a strong environmental information disclosure system, polluting companies are disadvantaged in terms of market competitiveness, because they lack an environmentally friendly image. The objective of this study is to analyze productive inefficiency change in relation to toxic chemical substance emissions for the United States and Japan and their corresponding policies. We apply the weighted Russell directional distance model to measure companies' productive inefficiency, which represents their production technology. The data encompass 330 US manufacturing firms observed from 1999 to 2007, and 466 Japanese manufacturing firms observed from 2001 to 2008. The article focuses on nine high-pollution industries (rubber and plastics; chemicals and allied products; paper and pulp; steel and non-ferrous metal; fabricated metal; industrial machinery; electrical products; transportation equipment; precision instruments) categorized into two industry groups: basic materials industries, and processing and assembly industries. The results show that productive inefficiency decreased in all industrial sectors in the United States and Japan from 2001 to 2007. In particular, that of the electrical products industry decreased rapidly after 2002 in both countries, possibly because of the enforcement of strict environmental regulations for electrical products exported to European markets.
Abstract:
Meta-analysis is a method to obtain a weighted average of results from various studies. In addition to pooling effect sizes, meta-analysis can also be used to estimate disease frequencies, such as incidence and prevalence. In this article we present methods for the meta-analysis of prevalence. We discuss the logit and double arcsine transformations to stabilise the variance. We note the special situation of multiple category prevalence, and propose solutions to the problems that arise. We describe the implementation of these methods in the MetaXL software, and present a simulation study and the example of multiple sclerosis from the Global Burden of Disease 2010 project. We conclude that the double arcsine transformation is preferred over the logit, and that the MetaXL implementation of multiple category prevalence is an improvement in the methodology of the meta-analysis of prevalence.
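The double arcsine (Freeman-Tukey) transformation stabilises the variance of a proportion before pooling. A minimal sketch of the pooling step: transform each study, take an inverse-variance weighted mean, and back-transform. The back-transformation here uses the simple sin²(t/2) approximation rather than Miller's exact inversion, so this is an illustration of the idea, not the MetaXL implementation.

```python
import math

def pooled_prevalence_ft(events, totals):
    """Pool prevalence estimates on the Freeman-Tukey double arcsine scale.
    t_i = asin(sqrt(x/(n+1))) + asin(sqrt((x+1)/(n+1))), Var(t_i) = 1/(n+0.5);
    the pooled t is back-transformed with the approximate sin^2(t/2)."""
    ts, ws = [], []
    for x, n in zip(events, totals):
        t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
        ts.append(t)
        ws.append(n + 0.5)               # inverse of the variance 1/(n + 0.5)
    tbar = sum(w * t for w, t in zip(ws, ts)) / sum(ws)
    return math.sin(tbar / 2) ** 2
```

Unlike the logit, the transform is defined at observed prevalences of exactly 0 or 1, which is one reason the article prefers the double arcsine.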