77 results for GENE PREDICTION
Abstract:
Numerical weather prediction (NWP) models provide the basis for weather forecasting by simulating the evolution of the atmospheric state. A good forecast requires that the initial state of the atmosphere is known accurately, and that the NWP model is a realistic representation of the atmosphere. Data assimilation methods are used to produce initial conditions for NWP models. The NWP model background field, typically a short-range forecast, is updated with observations in a statistically optimal way. The objective of this thesis has been to develop methods that allow data assimilation of Doppler radar radial wind observations. The work has been carried out in the High Resolution Limited Area Model (HIRLAM) 3-dimensional variational data assimilation framework. Observation modelling is a key element in exploiting indirect observations of the model variables. In the radar radial wind observation modelling, the vertical model wind profile is interpolated to the observation location, and the projection of the model wind vector on the radar pulse path is calculated. The vertical broadening of the radar pulse volume and the bending of the radar pulse path due to atmospheric conditions are taken into account. Radar radial wind observations are modelled within observation errors, which consist of instrumental, modelling, and representativeness errors. Systematic and random modelling errors can be minimized by accurate observation modelling. The impact of the random part of the instrumental and representativeness errors can be decreased by calculating spatial averages from the raw observations. Model experiments indicate that the spatial averaging clearly improves the fit of the radial wind observations to the model in terms of observation minus model background (OmB) standard deviation. Monitoring the quality of the observations is an important aspect, especially when a new observation type is introduced into a data assimilation system.
Calculating the bias for radial wind observations in a conventional way can result in zero even when there are systematic differences in the wind speed and/or direction. A bias estimation method designed for this observation type is introduced in the thesis. Doppler radar radial wind observation modelling, together with the bias estimation method, also enables the use of radial wind observations for NWP model validation. The one-month model experiments performed with HIRLAM model versions differing only in a surface stress parameterization detail indicate that the use of radar wind observations in NWP model validation is very beneficial.
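The point about a conventional bias averaging to zero can be illustrated with a minimal sketch. It assumes a horizontally uniform wind, azimuth measured clockwise from north, and a flat beam; it ignores the pulse-volume broadening and beam bending that the thesis does account for. The function name `radial_wind` and all numbers are hypothetical.

```python
import math

def radial_wind(u, v, azimuth_deg, elevation_deg=0.0):
    """Project a horizontal wind (u eastward, v northward) on the radar beam.

    Simplified geometry: vertical air motion, beam broadening and beam
    bending are all neglected (assumption for illustration only).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (u * math.sin(az) + v * math.cos(az)) * math.cos(el)

# Hypothetical case: the observed wind is 2 m/s stronger than the model
# background, with the same direction (westerly flow, v = 0).
azimuths = range(0, 360, 10)
observed = [radial_wind(12.0, 0.0, a) for a in azimuths]
background = [radial_wind(10.0, 0.0, a) for a in azimuths]

# Observation minus background (OmB) for every azimuth of a full scan.
omb = [o - b for o, b in zip(observed, background)]
mean_omb = sum(omb) / len(omb)   # conventional bias estimate
max_omb = max(omb)               # largest departure along the scan
```

Over a full azimuth scan the positive and negative departures cancel, so the plain mean is close to zero although the wind speed differs systematically by 2 m/s. This is only meant to motivate why a dedicated bias estimator is needed; it is not the estimator introduced in the thesis.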
Abstract:
Data assimilation provides an initial atmospheric state, called the analysis, for Numerical Weather Prediction (NWP). This analysis consists of pressure, temperature, wind, and humidity on a three-dimensional NWP model grid. Data assimilation blends meteorological observations with the NWP model in a statistically optimal way. The objective of this thesis is to describe methodological development carried out in order to allow data assimilation of ground-based measurements of the Global Positioning System (GPS) into the High Resolution Limited Area Model (HIRLAM) NWP system. Geodetic processing produces observations of tropospheric delay. These observations can be processed either for vertical columns at each GPS receiver station, or for the individual propagation paths of the microwave signals. These alternative processing methods result in Zenith Total Delay (ZTD) and Slant Delay (SD) observations, respectively. ZTD and SD observations are of use in the analysis of atmospheric humidity. A method is introduced for estimation of the horizontal error covariance of ZTD observations. The method makes use of observation minus model background (OmB) sequences of ZTD and conventional observations. It is demonstrated that the ZTD observation error covariance is relatively large in station separations shorter than 200 km, but non-zero covariances also appear at considerably larger station separations. The relatively low density of radiosonde observing stations limits the ability of the proposed estimation method to resolve the shortest length-scales of error covariance. SD observations are shown to contain a statistically significant signal on the asymmetry of the atmospheric humidity field. However, the asymmetric component of SD is found to be nearly always smaller than the standard deviation of the SD observation error. SD observation modelling is described in detail, and other issues relating to SD data assimilation are also discussed. 
These include the determination of error statistics, the tuning of observation quality control, and accounting for local observation error correlation. The experiments show that the data assimilation system is able to retrieve the asymmetric information content of hypothetical SD observations at a single receiver station. Moreover, the impact of real SD observations on humidity analysis is comparable to that of other observing systems.
Abstract:
Modern-day weather forecasting is highly dependent on Numerical Weather Prediction (NWP) models as the main data source. The evolving state of the atmosphere with time can be numerically predicted by solving a set of hydrodynamic equations, if the initial state is known. However, such a modelling approach always contains approximations that by and large depend on the purpose of use and resolution of the models. Present-day NWP systems operate with horizontal model resolutions in the range from about 40 km to 10 km. Recently, the aim has been to reach scales of 1-4 km operationally. This requires fewer approximations in the model equations, more complex treatment of physical processes and, furthermore, more computing power. This thesis concentrates on the physical parameterization methods used in high-resolution NWP models. The main emphasis is on the validation of the grid-size-dependent convection parameterization in the High Resolution Limited Area Model (HIRLAM) and on a comprehensive intercomparison of radiative-flux parameterizations. In addition, the problems related to wind prediction near the coastline are addressed with high-resolution meso-scale models. The grid-size-dependent convection parameterization is clearly beneficial for NWP models operating with a dense grid. Results show that the current convection scheme in HIRLAM is still applicable down to a 5.6 km grid size. However, with further improved model resolution, the tendency of the model to overestimate strong precipitation intensities increases in all the experiment runs. For the clear-sky longwave radiation parameterization, schemes used in NWP models provide much better results than simple empirical schemes. On the other hand, for the shortwave part of the spectrum, the empirical schemes are more competitive in producing fairly accurate surface fluxes.
Overall, even the complex radiation parameterization schemes used in NWP models seem to be slightly too transparent for both long- and shortwave radiation in clear-sky conditions. For cloudy conditions, simple cloud correction functions are tested. In the case of longwave radiation, the empirical cloud correction methods provide rather accurate results, whereas for shortwave radiation the benefit is only marginal. Idealised high-resolution two-dimensional meso-scale model experiments suggest that the reason for the observed formation of the afternoon low level jet (LLJ) over the Gulf of Finland is an inertial oscillation mechanism, when the large-scale flow is from the south-east or west directions. The LLJ is further enhanced by the sea-breeze circulation. A three-dimensional HIRLAM experiment, with a 7.7 km grid size, is able to generate an LLJ flow structure similar to that suggested by the 2D experiments and observations. It is also pointed out that improved model resolution does not necessarily lead to better wind forecasts in the statistical sense. In nested systems, the quality of the large-scale host model is very important, especially if the inner meso-scale model domain is small.
Abstract:
Numerical models, used for atmospheric research, weather prediction and climate simulation, describe the state of the atmosphere over the heterogeneous surface of the Earth. Several fundamental properties of atmospheric models depend on orography, i.e. on the average elevation of land over a model area. The higher the model resolution, the more the details of orography directly influence the simulated atmospheric processes. This sets new requirements for the accuracy of the model formulations with respect to the spatially varying orography. Orography is always averaged, representing the surface elevation within the horizontal resolution of the model. In order to remove the smallest scales and steepest slopes, the continuous spectrum of orography is normally filtered (truncated) even more, typically beyond a few gridlengths of the model. This means that in numerical weather prediction (NWP) models there will always be subgrid-scale orography effects, which cannot be explicitly resolved by numerical integration of the basic equations, but require parametrization. At the subgrid scale, different physical processes contribute at different scales. The parametrized processes interact with the resolved-scale processes and with each other. This study contributes to the building of a consistent, scale-dependent system of orography-related parametrizations for the High Resolution Limited Area Model (HIRLAM). The system comprises schemes for handling the effects of mesoscale (MSO) and small-scale (SSO) orography on the simulated flow and a scheme of orographic effects on the surface-level radiation fluxes. Representation of orography, scale-dependencies of the simulated processes and interactions between the parametrized and resolved processes are discussed. From the high-resolution digital elevation data, orographic parameters are derived for both momentum and radiation flux parametrizations. Tools for diagnostics and validation are developed and presented.
The parametrization schemes applied, developed and validated in this study, are currently being implemented into the reference version of HIRLAM.
Abstract:
Neuroblastoma has successfully served as a model system for the identification of neuroectoderm-derived oncogenes. However, in spite of various efforts, only a few clinically useful prognostic markers have been found. Here, we present a framework, which integrates DNA, RNA and tissue data to identify and prioritize genetic events that represent clinically relevant new therapeutic targets and prognostic biomarkers for neuroblastoma.
Abstract:
Part I: Parkinson’s disease is a slowly progressive neurodegenerative disorder in which particularly the dopaminergic neurons of the substantia nigra pars compacta degenerate and die. Current conventional treatment is based on restraining symptoms but it has no effect on the progression of the disease. Gene therapy research has focused on the possibility of restoring the lost brain function by at least two means: substitution of critical enzymes needed for the synthesis of dopamine and slowing down the progression of the disease by supporting the functions of the remaining nigral dopaminergic neurons with neurotrophic factors. The striatal levels of enzymes such as tyrosine hydroxylase, dopadecarboxylase and GTP-CH1 are decreased as the disease progresses. By replacing one or all of the enzymes, dopamine levels in the striatum may be restored to normal and behavioral impairments caused by the disease may be ameliorated, especially in the later stages of the disease. The neurotrophic factors glial cell derived neurotrophic factor (GDNF) and neurturin have been shown to protect and restore functions of dopaminergic cell somas and terminals as well as improve behavior in animal lesion models. This therapy may be best suited to the early stages of the disease, when there are more dopaminergic neurons for neurotrophic factors to reach. Viral vector-mediated gene transfer provides a tool to deliver proteins with complex structures into specific brain locations and provides long-term protein over-expression. Part II: The aim of our study was to investigate the effects of two orally dosed COMT inhibitors, entacapone (10 and 30 mg/kg) and tolcapone (10 and 30 mg/kg), with a subsequent administration of a peripheral dopadecarboxylase inhibitor carbidopa (30 mg/kg) and L-dopa (30 mg/kg) on dopamine and its metabolite levels in the dorsal striatum and nucleus accumbens of freely moving rats using dual-probe in vivo microdialysis.
Earlier similarly designed studies have only been conducted in the dorsal striatum. We also confirmed the result of earlier ex vivo studies regarding the effects of intraperitoneally dosed tolcapone (30 mg/kg) and entacapone (30 mg/kg) on striatal and hepatic COMT activity. The results obtained from the dorsal striatum were generally in line with earlier studies, where tolcapone tended to increase dopamine and DOPAC levels and decrease HVA levels. Entacapone tended to keep striatal dopamine and HVA levels elevated longer than in controls and also tended to elevate the levels of DOPAC. Surprisingly in the nucleus accumbens, dopamine levels after either dose of entacapone or tolcapone were not elevated. Accumbal DOPAC levels, especially in the tolcapone 30 mg/kg group, were elevated nearly to the same extent as measured in the dorsal striatum. Entacapone 10 mg/kg elevated accumbal HVA levels more than the dose of 30 mg/kg and the effect was more pronounced in the nucleus accumbens than in the dorsal striatum. This suggests that entacapone 30 mg/kg has minor central effects. Also our ex vivo study results obtained from the dorsal striatum suggest that entacapone 30 mg/kg has minor and transient central effects, even though central HVA levels were not suppressed below those of the control group in either brain area in the microdialysis study. Both entacapone and tolcapone suppressed hepatic COMT activity more than striatal COMT activity. Tolcapone was more effective than entacapone in the dorsal striatum. The differences between dopamine and its metabolite levels in the dorsal striatum and nucleus accumbens may be due to different properties of the two brain areas.
Abstract:
Bone mass accrual and maintenance are regulated by a complex interplay between genetic and environmental factors. Recent studies have revealed an important role for the low-density lipoprotein receptor-related protein 5 (LRP5) in this process. The aim of this thesis study was to identify novel variants in the LRP5 gene and to further elucidate the association of LRP5 and its variants with various bone health related clinical characteristics. The results of our studies show that loss-of-function mutations in LRP5 cause severe osteoporosis not only in homozygous subjects but also in the carriers of these mutations, who have significantly reduced bone mineral density (BMD) and increased susceptibility to fractures. In addition, we demonstrated for the first time that a common polymorphic LRP5 variant (p.A1330V) was associated with reduced peak bone mass, an important determinant of BMD and osteoporosis in later life. The results from these two studies are concordant with results seen in other studies on LRP5 mutations and in association studies linking genetic variation in LRP5 with BMD and osteoporosis. Several rare LRP5 variants were identified in children with recurrent fractures. Sequencing and multiplex ligation-dependent probe amplification (MLPA) analyses revealed no disease-causing mutations or whole-exon deletions. Our findings from clinical assessments and family-based genotype-phenotype studies suggested that the rare LRP5 variants identified are not the definite cause of fractures in these children. Clinical assessments of our study subjects with LRP5 mutations revealed an unexpectedly high prevalence of impaired glucose tolerance and dyslipidaemia. Moreover, in subsequent studies we discovered that common polymorphic LRP5 variants are associated with unfavorable metabolic characteristics. Changes in lipid profile were already apparent in pre-pubertal children.
These results, together with the findings from other studies, suggest an important role for LRP5 also in glucose and lipid metabolism. Our results underscore the important role of LRP5 not only in bone mass accrual and maintenance of skeletal health but also in glucose and lipid metabolism. The role of LRP5 in bone metabolism has long been studied, but further studies with larger study cohorts are still needed to evaluate the specific role of LRP5 variants as metabolic risk factors.
Abstract:
One of the most fundamental and widely accepted ideas in finance is that investors are compensated through higher returns for taking on non-diversifiable risk. Hence the quantification, modeling and prediction of risk have been, and still are, among the most prolific research areas in financial economics. It was recognized early on that there are predictable patterns in the variance of speculative prices. Later research has shown that there may also be systematic variation in the skewness and kurtosis of financial returns. Lacking in the literature so far is an out-of-sample forecast evaluation of the potential benefits of these new, more complicated models with time-varying higher moments. Such an evaluation is the topic of this dissertation. Essay 1 investigates the forecast performance of the GARCH(1,1) model when estimated with 9 different error distributions on Standard and Poor’s 500 Index Future returns. By utilizing the theory of realized variance to construct an appropriate ex post measure of variance from intra-day data, it is shown that allowing for a leptokurtic error distribution leads to significant improvements in variance forecasts compared to using the normal distribution. This result holds for daily, weekly as well as monthly forecast horizons. It is also found that allowing for skewness and time variation in the higher moments of the distribution does not further improve forecasts. In Essay 2, by using 20 years of daily Standard and Poor’s 500 index returns, it is found that density forecasts are much improved by allowing for constant excess kurtosis but not improved by allowing for skewness. Allowing the kurtosis and skewness to be time varying does not further improve the density forecasts but, on the contrary, makes them slightly worse. In Essay 3 a new model incorporating conditional variance, skewness and kurtosis based on the Normal Inverse Gaussian (NIG) distribution is proposed.
The new model and two previously used NIG models are evaluated by their Value at Risk (VaR) forecasts on a long series of daily Standard and Poor’s 500 returns. The results show that only the new model produces satisfactory VaR forecasts for both 1% and 5% VaR. Taken together, the results of the thesis show that kurtosis appears not to exhibit predictable time variation, whereas some predictability is found in the skewness. However, the dynamic properties of the skewness are not completely captured by any of the models.
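For readers unfamiliar with the models named above, the following is a minimal sketch of the standard GARCH(1,1) variance recursion and a Value at Risk figure derived from it. It assumes normally distributed errors purely for simplicity (the abstract reports that leptokurtic errors forecast better), and the parameter values, return series, and function names are hypothetical rather than taken from the dissertation.

```python
import math
from statistics import NormalDist

def garch_variance_path(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    The recursion is started from the unconditional variance, which
    requires alpha + beta < 1.
    """
    variance = omega / (1.0 - alpha - beta)
    path = [variance]
    for r in returns:
        variance = omega + alpha * r * r + beta * variance
        path.append(variance)
    return path

def normal_value_at_risk(variance, level=0.01):
    """One-step VaR (as a positive loss) under a zero-mean normal error."""
    return -NormalDist().inv_cdf(level) * math.sqrt(variance)

# Hypothetical parameters and two hypothetical daily returns (in percent).
path = garch_variance_path([1.0, -2.0], omega=0.1, alpha=0.1, beta=0.8)
var_1pct = normal_value_at_risk(path[-1], level=0.01)
```

The last element of `path` is the one-step-ahead variance forecast. Swapping the normal quantile for that of a fat-tailed distribution is the kind of change whose forecast value the essays evaluate.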
Abstract:
The upstream proinflammatory interleukin-1 (IL-1) cytokines, together with a naturally occurring IL-1 receptor antagonist (IL-1Ra), play a significant role in several diseases and physiologic conditions. The IL-1 proteins affect glucose homeostasis at multiple levels, contributing to vascular injuries and metabolic dysregulations that precede diabetes. An association between IL-1 gene variations and IL-1Ra levels has been suggested, and genetic studies have reported associations with metabolic dysregulation and altered inflammatory responses. The principal aims of this study were to: 1) examine the associations of IL-1 gene variation and IL-1Ra expression in the development and persistence of thyroid antibodies in subacute thyroiditis; 2) investigate the associations of common variants in the IL-1 gene family with plasma glucose and insulin concentrations, glucose homeostasis measures and prevalent diabetes in a representative population sample; 3) investigate genetic and non-genetic determinants of IL-1Ra phenotypes in a cross-sectional setting in three independent study populations; 4) investigate in a prospective setting (a) whether variants of the IL-1 gene family are predictors for clinically incident diabetes in two population-based observational cohort studies; and (b) whether the IL-1Ra levels predict the progression of metabolic syndrome to overt diabetes during the median follow-up of 10.8 and 7.1 years. Results from patients with subacute thyroiditis showed that the systemic IL-1Ra levels are elevated during a specific proinflammatory response and that they correlated with C-reactive protein (CRP) levels. Genetic variation in the IL-1 family seemed to have an association with the appearance of thyroid peroxidase antibodies and persisting local autoimmune responses during the follow-up.
Analysis of patients suffering from diabetes and metabolic traits suggested that genetic IL-1 variation and IL-1Ra play a role in glucose homeostasis and in the development of type 2 diabetes. The coding IL-1 beta SNP rs1143634 was associated with traits related to insulin resistance in cross-sectional analyses. Two haplotype variants of the IL-1 beta gene were associated with prevalent diabetes or incident diabetes in a prospective setting and both of these haplotypes were tagged by rs1143634. Three variants of the IL-1Ra gene and one of the IL-1 beta gene were consistently identified as significant, independent determinants of the IL-1Ra phenotype in two or three populations. The proportion of the phenotypic variation explained by the genetic factors was modest however, while obesity and other metabolic traits explained a larger part. Body mass index was the strongest predictor of systemic IL-1Ra concentration overall. Furthermore, the age-adjusted IL-1Ra concentrations were elevated in individuals with metabolic syndrome or diabetes when compared to those free of metabolic dysregulation. In prospective analyses the systemic IL-1Ra levels were found as independent predictors for the development of diabetes in people with metabolic syndrome even after adjustment for multiple other factors, including plasma glucose and CRP levels. The predictive power of IL-1Ra was better than that of CRP. The prospective results also provided some evidence for a role of common IL-1 alpha promoter SNP rs1800587 in the development of type 2 diabetes among men and suggested that the role may be gender specific. Likewise, common variations in the IL-1 beta coding region may have a gender specific association with diabetes development. Further research on the potential benefits of IL-1Ra measurements in identifying individuals at high risk for diabetes, who then could be targeted for specific treatment interventions, is warranted. 
It has been reported in the recent literature that IL-1Ra secreted from adipose tissue has beneficial effects on glucose homeostasis. Furthermore, treatment with recombinant human IL-1Ra has been shown to have a substantial therapeutic potential. The genetic results from the prospective analyses performed in this study remain inconclusive, but together with the cross-sectional analyses they suggest gender-specific effects of the IL-1 variants on the risk of diabetes. Larger studies with more extensive genotyping and resequencing may help to pinpoint the exact variants responsible and to further elucidate the biological mechanisms for the observed associations. This would improve our understanding of the pathways linking inflammation and obesity with glucose and insulin metabolism.
Abstract:
The Thesis presents a state-space model for a basketball league and a Kalman filter algorithm for the estimation of the state of the league. In the state-space model, each of the basketball teams is associated with a rating that represents its strength compared to the other teams. The ratings are assumed to evolve in time following a stochastic process with independent Gaussian increments. The estimation of the team ratings is based on the observed game scores, which are assumed to depend linearly on the true strengths of the teams and independent Gaussian noise. The team ratings are estimated using a recursive Kalman filter algorithm that produces least squares optimal estimates for the team strengths and predictions for the scores of the future games. Additionally, if the Gaussianity assumption holds, the predictions given by the Kalman filter maximize the likelihood of the observed scores. The team ratings allow probabilistic inference about the ranking of the teams and their relative strengths as well as about the teams’ winning probabilities in future games. The predictions about the winners of the games are correct 65-70% of the time. The team ratings explain 16% of the random variation observed in the game scores. Furthermore, the winning probabilities given by the model are consistent with the observed scores. The state-space model includes four independent parameters that involve the variances of noise terms and the home court advantage observed in the scores. The Thesis presents the estimation of these parameters using the maximum likelihood method as well as using other techniques. The Thesis also gives various example analyses related to the American professional basketball league, i.e., the National Basketball Association (NBA), and the regular seasons played in the years 2005 through 2010. Additionally, the season 2009-2010 is discussed in full detail, including the playoffs.
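The filtering step described above can be sketched for a single game. This is an illustrative reduction, not the Thesis's exact model: it assumes the observation is the home-minus-away score margin, that ratings follow a random walk with variance `q` per step, and that the observation noise has variance `r`. The names `kalman_game_update`, `home_adv`, `q` and `r` are hypothetical.

```python
def kalman_game_update(x, P, home, away, margin, home_adv, q, r):
    """One Kalman predict/update step for team ratings after one game.

    x: list of ratings, P: covariance matrix as a list of lists.
    Observation model (assumed): margin = x[home] - x[away] + home_adv + noise.
    """
    n = len(x)
    # Predict: ratings drift as a random walk, so only the covariance grows.
    P = [[P[a][b] + (q if a == b else 0.0) for b in range(n)] for a in range(n)]
    # Innovation: observed margin minus predicted margin.
    innovation = margin - (x[home] - x[away] + home_adv)
    # H has +1 at the home team and -1 at the away team, so H P is a row difference.
    hp = [P[home][b] - P[away][b] for b in range(n)]
    s = hp[home] - hp[away] + r                      # innovation variance
    k = [(P[a][home] - P[a][away]) / s for a in range(n)]  # Kalman gain
    x_new = [x[a] + k[a] * innovation for a in range(n)]
    P_new = [[P[a][b] - k[a] * hp[b] for b in range(n)] for a in range(n)]
    return x_new, P_new

# Two hypothetical teams with equal prior ratings; the home team wins by 13.
ratings, cov = kalman_game_update(
    x=[0.0, 0.0], P=[[100.0, 0.0], [0.0, 100.0]],
    home=0, away=1, margin=13.0, home_adv=3.0, q=0.0, r=100.0,
)
```

In this example the 10-point surprise (13 minus the 3-point home advantage) is shared symmetrically: the home rating rises and the away rating falls by the same amount, and both variances shrink.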
Abstract:
Metabolomics is a rapidly growing research field that studies the response of biological systems to environmental factors, disease states and genetic modifications. It aims at measuring the complete set of endogenous metabolites, i.e. the metabolome, in a biological sample such as plasma or cells. Because metabolites are the intermediates and end products of biochemical reactions, metabolite compositions and metabolite levels in biological samples can provide a wealth of information on on-going processes in a living system. Due to the complexity of the metabolome, metabolomic analysis poses a challenge to analytical chemistry. Adequate sample preparation is critical to accurate and reproducible analysis, and the analytical techniques must have high resolution and sensitivity to allow detection of as many metabolites as possible. Furthermore, as the information contained in the metabolome is immense, the data set collected from metabolomic studies is very large. In order to extract the relevant information from such large data sets, efficient data processing and multivariate data analysis methods are needed. In the research presented in this thesis, metabolomics was used to study mechanisms of polymeric gene delivery to retinal pigment epithelial (RPE) cells. The aim of the study was to detect differences in metabolomic fingerprints between transfected cells and non-transfected controls, and thereafter to identify metabolites responsible for the discrimination. The plasmid pCMV-β was introduced into RPE cells using the vector polyethyleneimine (PEI). The samples were analyzed using high performance liquid chromatography (HPLC) and ultra performance liquid chromatography (UPLC) coupled to a triple quadrupole (QqQ) mass spectrometer (MS). The software MZmine was used for raw data processing and principal component analysis (PCA) was used in statistical data analysis.
The results revealed differences in metabolomic fingerprints between transfected cells and non-transfected controls. However, reliable fingerprinting data could not be obtained because of low analysis repeatability. Therefore, no attempts were made to identify metabolites responsible for discrimination between sample groups. Repeatability and accuracy of analyses can be influenced by protocol optimization. However, in this study, optimization of analytical methods was hindered by the very small number of samples available for analysis. In conclusion, this study demonstrates that obtaining reliable fingerprinting data is technically demanding, and the protocols need to be thoroughly optimized in order to approach the goals of gaining information on mechanisms of gene delivery.
Abstract:
All protein-encoding genes in eukaryotes are transcribed into messenger RNA (mRNA) by RNA Polymerase II (RNAP II), whose activity therefore needs to be tightly controlled. An important and only partially understood level of regulation is the multiple phosphorylations of the RNAP II large subunit C-terminal domain (CTD). Sequential phosphorylations regulate transcription initiation and elongation, and recruit factors involved in co-transcriptional processing of mRNA. Based largely on studies in yeast models and in vitro, the kinase activity responsible for the phosphorylation of the serine-5 (Ser5) residues of RNAP II CTD has been attributed to the Mat1/Cdk7/CycH trimer as part of Transcription Factor IIH (TFIIH). However, due to the lack of good mammalian genetic models, studies of the roles of both RNAP II Ser5 phosphorylation and the TFIIH kinase in transcription have provided ambiguous results, and the in vivo kinase of Ser5 has remained elusive. The primary objective of this study was to elucidate the role of mammalian TFIIH, and specifically the Mat1 subunit, in CTD phosphorylation and general RNAP II-mediated transcription. The approach utilized the Cre-LoxP system to conditionally delete murine Mat1 in cardiomyocytes and hepatocytes in vivo and in cell culture models. The results identify the TFIIH kinase as the major mammalian Ser5 kinase and demonstrate its requirement for general transcription, as shown by nascent mRNA labeling. A role for Mat1 in regulating general mRNA turnover was also identified, providing a possible rationale for earlier negative findings. A secondary objective was to identify potential gene- and tissue-specific roles of Mat1 and the TFIIH kinase through the use of tissue-specific Mat1 deletion. Mat1 was found to be required for the transcriptional function of PGC-1 in cardiomyocytes.
Transcriptional activation of lipogenic SREBP1 target genes following Mat1 deletion in hepatocytes revealed a repressive role for Mat1, apparently mediated via the co-repressor DMAP1 and the DNA methyltransferase Dnmt1. Finally, Mat1 and Cdk7 were also identified as negative regulators of adipocyte differentiation through the inhibitory phosphorylation of Peroxisome proliferator-activated receptor (PPAR) γ. Together, these results demonstrate gene- and tissue-specific roles for the Mat1 subunit of TFIIH and open up new therapeutic possibilities in the treatment of diseases such as type II diabetes, hepatosteatosis and obesity.
Abstract:
Gene mapping is a systematic search for genes that affect observable characteristics of an organism. In this thesis we offer computational tools to improve the efficiency of (disease) gene-mapping efforts. In the first part of the thesis we propose an efficient simulation procedure for generating realistic genetic data from isolated populations. Simulated data is useful for evaluating hypothesised gene-mapping study designs and computational analysis tools. As an example of such evaluation, we demonstrate how a population-based study design can be a powerful alternative to traditional family-based designs in association-based gene-mapping projects. In the second part of the thesis we consider the prioritisation of a (typically large) set of putative disease-associated genes acquired from an initial gene-mapping analysis. Prioritisation is necessary to be able to focus on the most promising candidates. We show how to harness the current biomedical knowledge for the prioritisation task by integrating various publicly available biological databases into a weighted biological graph. We then demonstrate how to find and evaluate connections between entities, such as genes and diseases, from this unified schema by graph mining techniques. Finally, in the last part of the thesis, we define the concept of a reliable subgraph and the corresponding subgraph extraction problem. Reliable subgraphs concisely describe strong and independent connections between two given vertices in a random graph, and hence they are especially useful for visualising such connections. We propose novel algorithms for extracting reliable subgraphs from large random graphs. The efficiency and scalability of the proposed graph mining methods are backed by extensive experiments on real data. While our application focus is in genetics, the concepts and algorithms can be applied to other domains as well.
We demonstrate this generality by considering coauthor graphs in addition to biological graphs in the experiments.
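As a rough illustration of how the strength of a connection between two vertices in a random graph might be evaluated, the sketch below estimates the probability that two vertices are connected by Monte Carlo sampling. It assumes that each edge exists independently with a given probability, which is one common reading of a "random graph" and not necessarily the thesis's exact definition; the function name, vertex labels and probabilities are hypothetical, and this is not one of the thesis's subgraph extraction algorithms.

```python
import random

def connection_probability(edges, source, target, trials=2000, seed=0):
    """Monte Carlo estimate of the chance that source and target are connected.

    edges: iterable of (u, v, p), where p is the assumed independent
    probability that the undirected edge u-v exists.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one realisation of the random graph.
        adjacency = {}
        for u, v, p in edges:
            if rng.random() < p:
                adjacency.setdefault(u, []).append(v)
                adjacency.setdefault(v, []).append(u)
        # Depth-first search from source in the sampled graph.
        stack, seen = [source], {source}
        while stack:
            node = stack.pop()
            if node == target:
                hits += 1
                break
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return hits / trials

# Hypothetical two-step chain from a gene to a disease.
chain = [("gene", "protein", 0.5), ("protein", "disease", 0.5)]
estimate = connection_probability(chain, "gene", "disease")
certain = connection_probability(
    [("gene", "protein", 1.0), ("protein", "disease", 1.0)], "gene", "disease"
)
```

For the two-edge chain the exact value is 0.5 × 0.5 = 0.25, which the estimate should approach as `trials` grows. A subgraph that preserves most of this connection probability with few edges is, loosely, what makes it useful for visualisation.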