938 resultados para Correlation and Regression Analysis
Resumo:
Issued May 1980.
Resumo:
Chiefly tables.
Resumo:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
Resumo:
Hydrophobicity as measured by Log P is an important molecular property related to toxicity and carcinogenicity. With increasing public health concerns for the effects of Disinfection By-Products (DBPs), there are considerable benefits in developing Quantitative Structure and Activity Relationship (QSAR) models capable of accurately predicting Log P. In this research, Log P values of 173 DBP compounds in 6 functional classes were used to develop QSAR models, by applying 3 molecular descriptors, namely, Energy of the Lowest Unoccupied Molecular Orbital (ELUMO), Number of Chlorine (NCl) and Number of Carbon (NC) by Multiple Linear Regression (MLR) analysis. The QSAR models developed were validated based on the Organization for Economic Co-operation and Development (OECD) principles. The model Applicability Domain (AD) and mechanistic interpretation were explored. Considering the very complex nature of DBPs, the established QSAR models performed very well with respect to goodness-of-fit, robustness and predictability. The predicted values of Log P of DBPs by the QSAR models were found to be significant with a correlation coefficient R2 from 81% to 98%. The Leverage Approach by Williams Plot was applied to detect and remove outliers, consequently increasing R 2 by approximately 2% to 13% for different DBP classes. The developed QSAR models were statistically validated for their predictive power by the Leave-One-Out (LOO) and Leave-Many-Out (LMO) cross validation methods. Finally, Monte Carlo simulation was used to assess the variations and inherent uncertainties in the QSAR models of Log P and determine the most influential parameters in connection with Log P prediction. The developed QSAR models in this dissertation will have a broad applicability domain because the research data set covered six out of eight common DBP classes, including halogenated alkane, halogenated alkene, halogenated aromatic, halogenated aldehyde, halogenated ketone, and halogenated carboxylic acid, which have been brought to the attention of regulatory agencies in recent years. Furthermore, the QSAR models are suitable to be used for prediction of similar DBP compounds within the same applicability domain. The selection and integration of various methodologies developed in this research may also benefit future research in similar fields.
Resumo:
The goal of this project is to learn the necessary steps to create a finite element model, which can accurately predict the dynamic response of a Kohler Engines Heavy Duty Air Cleaner (HDAC). This air cleaner is composed of three glass reinforced plastic components and two air filters. Several uncertainties arose in the finite element (FE) model due to the HDAC’s component material properties and assembly conditions. To help understand and mitigate these uncertainties, analytical and experimental modal models were created concurrently to perform a model correlation and calibration. Over the course of the project simple and practical methods were found for future FE model creation. Similarly, an experimental method for the optimal acquisition of experimental modal data was arrived upon. After the model correlation and calibration was performed a validation experiment was used to confirm the FE models predictive capabilities.
Resumo:
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
Resumo:
In this thesis, the relationship between air pollution and human health has been investigated utilising Geographic Information System (GIS) as an analysis tool. The research focused on how vehicular air pollution affects human health. The main objective of this study was to analyse the spatial variability of pollutants, taking Brisbane City in Australia as a case study, by the identification of the areas of high concentration of air pollutants and their relationship with the numbers of death caused by air pollutants. A correlation test was performed to establish the relationship between air pollution, number of deaths from respiratory disease, and total distance travelled by road vehicles in Brisbane. GIS was utilized to investigate the spatial distribution of the air pollutants. The main finding of this research is the comparison between spatial and non-spatial analysis approaches, which indicated that correlation analysis and simple buffer analysis of GIS using the average levels of air pollutants from a single monitoring station or by group of few monitoring stations is a relatively simple method for assessing the health effects of air pollution. There was a significant positive correlation between variable under consideration, and the research shows a decreasing trend of concentration of nitrogen dioxide at the Eagle Farm and Springwood sites and an increasing trend at CBD site. Statistical analysis shows that there exists a positive relationship between the level of emission and number of deaths, though the impact is not uniform as certain sections of the population are more vulnerable to exposure. Further statistical tests found that the elderly people of over 75 years age and children between 0-15 years of age are the more vulnerable people exposed to air pollution. A non-spatial approach alone may be insufficient for an appropriate evaluation of the impact of air pollutant variables and their inter-relationships. It is important to evaluate the spatial features of air pollutants before modeling the air pollution-health relationships.
Resumo:
Barmah Forest virus (BFV) disease is one of the most widespread mosquito-borne diseases in Australia. The number of outbreaks and the incidence rate of BFV in Australia have attracted growing concerns about the spatio-temporal complexity and underlying risk factors of BFV disease. A large number of notifications has been recorded continuously in Queensland since 1992. Yet, little is known about the spatial and temporal characteristics of the disease. I aim to use notification data to better understand the effects of climatic, demographic, socio-economic and ecological risk factors on the spatial epidemiology of BFV disease transmission, develop predictive risk models and forecast future disease risks under climate change scenarios. Computerised data files of daily notifications of BFV disease and climatic variables in Queensland during 1992-2008 were obtained from Queensland Health and Australian Bureau of Meteorology, respectively. Projections on climate data for years 2025, 2050 and 2100 were obtained from Council of Scientific Industrial Research Organisation. Data on socio-economic, demographic and ecological factors were also obtained from relevant government departments as follows: 1) socio-economic and demographic data from Australian Bureau of Statistics; 2) wetlands data from Department of Environment and Resource Management and 3) tidal readings from Queensland Department of Transport and Main roads. Disease notifications were geocoded and spatial and temporal patterns of disease were investigated using geostatistics. Visualisation of BFV disease incidence rates through mapping reveals the presence of substantial spatio-temporal variation at statistical local areas (SLA) over time. Results reveal high incidence rates of BFV disease along coastal areas compared to the whole area of Queensland. A Mantel-Haenszel Chi-square analysis for trend reveals a statistically significant relationship between BFV disease incidence rates and age groups (ƒÓ2 = 7587, p<0.01). Semi-variogram analysis and smoothed maps created from interpolation techniques indicate that the pattern of spatial autocorrelation was not homogeneous across the state. A cluster analysis was used to detect the hot spots/clusters of BFV disease at a SLA level. Most likely spatial and space-time clusters are detected at the same locations across coastal Queensland (p<0.05). The study demonstrates heterogeneity of disease risk at a SLA level and reveals the spatial and temporal clustering of BFV disease in Queensland. Discriminant analysis was employed to establish a link between wetland classes, climate zones and BFV disease. This is because the importance of wetlands in the transmission of BFV disease remains unclear. The multivariable discriminant modelling analyses demonstrate that wetland types of saline 1, riverine and saline tidal influence were the most significant risk factors for BFV disease in all climate and buffer zones, while lacustrine, palustrine, estuarine and saline 2 and saline 3 wetlands were less important. The model accuracies were 76%, 98% and 100% for BFV risk in subtropical, tropical and temperate climate zones, respectively. This study demonstrates that BFV disease risk varied with wetland class and climate zone. The study suggests that wetlands may act as potential breeding habitats for BFV vectors. Multivariable spatial regression models were applied to assess the impact of spatial climatic, socio-economic and tidal factors on the BFV disease in Queensland. Spatial regression models were developed to account for spatial effects. Spatial regression models generated superior estimates over a traditional regression model. In the spatial regression models, BFV disease incidence shows an inverse relationship with minimum temperature, low tide and distance to coast, and positive relationship with rainfall in coastal areas whereas in whole Queensland the disease shows an inverse relationship with minimum temperature and high tide and positive relationship with rainfall. This study determines the most significant spatial risk factors for BFV disease across Queensland. Empirical models were developed to forecast the future risk of BFV disease outbreaks in coastal Queensland using existing climatic, socio-economic and tidal conditions under climate change scenarios. Logistic regression models were developed using BFV disease outbreak data for the existing period (2000-2008). The most parsimonious model had high sensitivity, specificity and accuracy and this model was used to estimate and forecast BFV disease outbreaks for years 2025, 2050 and 2100 under climate change scenarios for Australia. Important contributions arising from this research are that: (i) it is innovative to identify high-risk coastal areas by creating buffers based on grid-centroid and the use of fine-grained spatial units, i.e., mesh blocks; (ii) a spatial regression method was used to account for spatial dependence and heterogeneity of data in the study area; (iii) it determined a range of potential spatial risk factors for BFV disease; and (iv) it predicted the future risk of BFV disease outbreaks under climate change scenarios in Queensland, Australia. In conclusion, the thesis demonstrates that the distribution of BFV disease exhibits a distinct spatial and temporal variation. Such variation is influenced by a range of spatial risk factors including climatic, demographic, socio-economic, ecological and tidal variables. The thesis demonstrates that spatial regression method can be applied to better understand the transmission dynamics of BFV disease and its risk factors. The research findings show that disease notification data can be integrated with multi-factorial risk factor data to develop build-up models and forecast future potential disease risks under climate change scenarios. This thesis may have implications in BFV disease control and prevention programs in Queensland.
Resumo:
Polybrominated diphenyl ethers (PBDEs) are considered to be a cost effective and efficient way to reduce flammability therefore reducing harm caused by fires. PBDEs are incorporated into a variety of manufactured products and are found worldwide in biological and environmental samples (e.g. Hites et al. 2004). Unlike other persistent organic pollutants there is limited data on PBDE concentrations by age and/or other population specific factors. Some studies have shown no variation in adult serum PBDE concentrations with age (e.g. Mazdai et al., 2003, Meironyte Guvenius et al., 2003) while Petreas et al. (2003) and Schecter et al. (2005) found results to be suggestive of an age trend in adult data but no statistically significant correlation was found. In addition to the data on adult concentrations there is limited data which investigates the levels of PBDEs in infants and young children. Fangström et al. (2005) showed that in seven year olds there was no difference in PBDE concentration when compared to adult concentrations. While Thomsen et al. (2002, 2005) found the concentration of PBDEs in pooled samples of blood serum from a 0-4 years age group to be higher than other age groups (4 to > 60 years). In addition, a family of four was studied in the U.S. and the concentrations were found to be greatest in the 18-month-old infant followed by the 5 year old child, then the mother and father (Fischer et al., 2006). The objectives of this study were to assess age, gender and regional trends of PBDE concentrations in a representative sample of the Australian population.
Resumo:
This study investigates the impact floods on property values using the hedonic property price approach and other relevant econometric techniques. The main objectives of this research are to investigate (1) the impact of the release of flood-risk information and the actual floods on property values (2) the temporal behaviour of negative impacts (3) the property submarket behaviour (4) the behaviour of flood affected vs flood non-affected areas and (5) the property market efficiency. The thesis expanded on the existing literature on natural disasters by applying a range of econometric techniques. Findings of this research are useful for policy decision-making which is aimed at minimizing the negative impacts of natural hazards on property markets. The thesis findings also provide a better framework for decision-making in the property insurance market. The methodological improvements that are made in the thesis will be invaluable for analysing the impacts of natural hazards elsewhere.
Resumo:
Many studies have shown that we can gain additional information on time series by investigating their accompanying complex networks. In this work, we investigate the fundamental topological and fractal properties of recurrence networks constructed from fractional Brownian motions (FBMs). First, our results indicate that the constructed recurrence networks have exponential degree distributions; the average degree exponent 〈λ〉 increases first and then decreases with the increase of Hurst index H of the associated FBMs; the relationship between H and 〈λ〉 can be represented by a cubic polynomial function. We next focus on the motif rank distribution of recurrence networks, so that we can better understand networks at the local structure level. We find the interesting superfamily phenomenon, i.e., the recurrence networks with the same motif rank pattern being grouped into two superfamilies. Last, we numerically analyze the fractal and multifractal properties of recurrence networks. We find that the average fractal dimension 〈dB〉 of recurrence networks decreases with the Hurst index H of the associated FBMs, and their dependence approximately satisfies the linear formula 〈dB〉≈2-H, which means that the fractal dimension of the associated recurrence network is close to that of the graph of the FBM. Moreover, our numerical results of multifractal analysis show that the multifractality exists in these recurrence networks, and the multifractality of these networks becomes stronger at first and then weaker when the Hurst index of the associated time series becomes larger from 0.4 to 0.95. In particular, the recurrence network with the Hurst index H=0.5 possesses the strongest multifractality. In addition, the dependence relationships of the average information dimension 〈D(1)〉 and the average correlation dimension 〈D(2)〉 on the Hurst index H can also be fitted well with linear functions. Our results strongly suggest that the recurrence network inherits the basic characteristic and the fractal nature of the associated FBM series.