Abstract:
Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) account for another large part of the genetic variance between individuals. They are also typically abundant, and measuring them is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations, whereas the deletion-detection algorithm uses the EM algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidate previously unknown inversions and deletions, and correctly detecting known rearrangements of these kinds.
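The deletion-detection approach above rests on EM estimation of haplotype frequencies. As a hedged illustration only (the classic two-locus textbook EM, not the thesis' actual algorithm; the function name and data are invented), the sketch below resolves the phase ambiguity of double heterozygotes:

```python
def em_haplotype_freqs(n_hap, n_double_het, iters=100):
    """EM estimate of two-locus haplotype frequencies from unphased data.

    n_hap: counts of unambiguously resolved haplotypes, keyed by
           (allele at locus 1, allele at locus 2) for all four haplotypes;
    n_double_het: number of double heterozygotes, whose phase is unknown
           (AB/ab "cis" vs Ab/aB "trans")."""
    haps = [('A', 'B'), ('A', 'b'), ('a', 'B'), ('a', 'b')]
    total = sum(n_hap.values()) + 2 * n_double_het  # haplotypes, not people
    # initialise by spreading the ambiguous individuals evenly
    f = {h: (n_hap[h] + 0.5 * n_double_het) / total for h in haps}
    for _ in range(iters):
        # E-step: expected fraction of double hets that are in cis phase
        p_cis = f[('A', 'B')] * f[('a', 'b')]
        p_trans = f[('A', 'b')] * f[('a', 'B')]
        w = p_cis / (p_cis + p_trans)
        # M-step: recompute frequencies from expected haplotype counts
        e = dict(n_hap)
        e[('A', 'B')] += w * n_double_het
        e[('a', 'b')] += w * n_double_het
        e[('A', 'b')] += (1 - w) * n_double_het
        e[('a', 'B')] += (1 - w) * n_double_het
        f = {h: e[h] / total for h in haps}
    return f
```

The same E/M alternation generalises to a window of SNPs with a putative deletion haplotype added to the candidate set.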
Abstract:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures for problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem: searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine, i.e. they should also hold in future data. This is an important distinction from traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all, or represent only spurious connections that occur by chance. Therefore, the principal objective is to search for rules with statistical significance measures. Another important objective is to search only for non-redundant rules, which express the real causes of the dependence without any incidental extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measure, such as Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over existing solutions. In practice, this means that the user does not have to worry about whether the dependencies hold in future data or whether the data still contains better, but undiscovered, dependencies.
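To make the significance criterion concrete, here is a hedged sketch (not the authors' implementation; the function name is invented) of scoring a candidate rule X -> A with Fisher's exact test, using only the four counts of the 2x2 contingency table:

```python
from math import comb

def fisher_p_positive(n, n_x, n_a, n_xa):
    """One-sided Fisher's exact test p-value for a positive dependency
    between X and A: the hypergeometric probability of observing at
    least n_xa joint occurrences of X and A by chance, given n rows,
    n_x rows containing X and n_a rows containing A."""
    p = 0.0
    upper = min(n_x, n_a)
    for k in range(n_xa, upper + 1):
        # P(exactly k rows have both X and A under independence)
        p += comb(n_a, k) * comb(n - n_a, n_x - k) / comb(n, n_x)
    return p
```

A smaller p-value means a more significant rule; the search described above prunes branches whose best attainable p-value already exceeds the current threshold.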
Abstract:
Wireless access is expected to play a crucial role in the future of the Internet. The demands of the wireless environment are not always compatible with the assumptions that were made in the era of wired links. At the same time, new services that take advantage of advances in many areas of technology are being invented. These services include delivery of mass media such as television and radio, Internet phone calls, and video conferencing. The network must be able to deliver these services with acceptable performance and quality to the end user. This thesis presents an experimental study measuring the performance of bulk data TCP transfers, streaming audio flows, and HTTP transfers that compete for the limited bandwidth of a GPRS/UMTS-like wireless link. The wireless link characteristics are modeled with a wireless network emulator. We analyze how different competing workload types behave with regular TCP, and how active queue management, Differentiated Services (DiffServ), and a combination of TCP enhancements affect the performance and the quality of service. We test four link types, including an error-free link and links with different Automatic Repeat reQuest (ARQ) persistency. The analysis consists of comparing the resulting performance in different configurations based on defined metrics. We observed that DiffServ and Random Early Detection (RED) with Explicit Congestion Notification (ECN) are useful, and in some conditions necessary, for quality of service and fairness, because long queuing delays and congestion-related packet losses cause problems without DiffServ and RED. However, we observed situations where there is still room for significant improvement if the link level is aware of the quality of service. Only a very error-prone link diminishes the benefits to nil. The combination of TCP enhancements improves performance. These include an initial window of four, Control Block Interdependence (CBI), and Forward RTO recovery (F-RTO).
The initial window of four helps a later-starting TCP flow to start faster, but generates congestion under some conditions. CBI prevents slow-start overshoot and balances slow start in the presence of error drops, and F-RTO successfully reduces unnecessary retransmissions.
Abstract:
The oxidase-peroxidase from Datura innoxia which catalyses the oxidation of formylphenylacetic acid ethyl ester to benzoylformic acid ethyl ester and formic acid was also found to catalyse the oxidation of NADH in the presence of Mn2+ and formylphenylacetic acid ethyl ester. NADH was not oxidized in the absence of formylphenylacetic acid ethyl ester, although formylphenylacetonitrile or phenylacetaldehyde could replace it in the reaction. The reaction appeared to be complex and for every mol of NADH oxidized 3-4 g-atoms of oxygen were utilized, with a concomitant formation of approx. 0.8 mol of H2O2, the latter being identified by the starch-iodide test and decomposition by catalase. Benzoylformic acid ethyl ester was also formed in the reaction, but in a nonlinear fashion, indicating a lag phase. In the absence of Mn2+, NADH oxidation was not only very low, but itself inhibited the formation of benzoylformic acid ethyl ester from formylphenylacetic acid ethyl ester. A reaction mechanism for the oxidation of NADH in the presence of formylphenylacetic acid ethyl ester is proposed.
Abstract:
The dissertation consists of three essays on misplanning wealth and health accumulation. Conventional economics assumes that an individual's intertemporal preferences are exponential (exponential preferences, EP). Recent findings in behavioural economics have shown that, in fact, people discount the near future relatively more heavily than the distant future. This implies hyperbolic intertemporal preferences (HP). Essays I and II concentrate especially on the effects of delayed completion of tasks, a feature of behaviour that HP enables. Essay III uses current Finnish data to analyse the evolution of quality-adjusted life years (QALYs) and inconsistencies in measuring them. Essay I studies the effects of the existence of a lucrative retirement savings program (SP) on the retirement savings of different individual types having HP. If the individual does not know that he will have HP also in the future, i.e. he is naïve, then under certain conditions he delays enrolment in SP until he abandons it altogether. A very interesting finding is that the naïve individual then retires poorer in the presence than in the absence of SP. Under the same conditions, the individual who knows that he will have HP also in the future, i.e. the sophisticated individual, gains from the existence of SP and retires with greater retirement savings in the presence than in the absence of SP. Finally, the capability to learn from past behaviour and about one's intertemporal preferences improves the possibility of gaining from the program's existence, but an adequate time to learn must then be guaranteed. Essay II studies delayed doctor's visits, their effects on the costs of a public health care system, and the government's attempts to control patient behaviour and fund the system. The controlling devices are a consultation fee and a deductible on it. The deductible applies only to a patient whose diagnosis reveals a disease that would not be cured without the doctor's visit.
The naïves delay their visits the longest, while EP patients are the quickest to visit. To control the naïves, the government should implement a low fee and a high deductible, while for the sophisticates the opposite is true. Finally, if all the types exist in an economy, then using the incorrect conventional assumption that all individuals have EP leads to a worse situation and requires higher tax rates than assuming, incorrectly but unconventionally, that only naïves exist. Essay III studies the development of QALYs in Finland in 1995/96-2004. The essay concentrates on developing a consistent measure, i.e. one independent of discounting, for measuring age- and gender-specific QALY changes and their incidences. For the given time interval, the use of a relative change out of an attainable change seems to be almost insensitive to discounting and reveals that the greatest gains accrue to the older age groups.
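The delay mechanism in Essays I and II can be illustrated with the standard quasi-hyperbolic (beta-delta) formulation, a common tractable stand-in for HP; the numbers below are invented for illustration and are not taken from the essays:

```python
def discounted_utility(flows, beta, delta):
    """Present value under quasi-hyperbolic discounting:
    u_0 + beta * sum_{t>=1} delta**t * u_t (beta = 1 recovers EP)."""
    return flows[0] + beta * sum(delta ** t * u
                                 for t, u in enumerate(flows[1:], start=1))

def prefers_to_delay(cost, benefit, beta, delta):
    """Does the agent prefer to push a (cost now, benefit next period)
    task one period into the future? A naive HP agent re-asks this
    every period and keeps answering yes, so the task is postponed
    indefinitely."""
    do_now = -cost + beta * delta * benefit
    do_tomorrow = beta * delta * (-cost) + beta * delta ** 2 * benefit
    return do_tomorrow > do_now
```

With beta = 0.7 the agent keeps postponing a task that an EP agent (beta = 1) would do at once, which is the mechanism behind the naïf's delayed SP enrolment and delayed doctor's visit.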
Abstract:
Glial cell line-derived neurotrophic factor (GDNF) and its family members neurturin (NRTN), artemin (ARTN) and persephin (PSPN) are growth factors, which are involved in the development, differentiation and maintenance of many neuron types. In addition, they function outside of the nervous system, e.g. in the development of kidney, testis and liver. GDNF family ligand (GFL) signalling happens through a tetrameric receptor complex, which includes two glycosylphosphatidylinositol (GPI)-anchored GDNF family receptor (GFRα) molecules and two RET (rearranged during transfection) receptor tyrosine kinases. Each of the ligands binds preferentially one of the four GFRα receptors: GDNF binds to GFRα1, NRTN to GFRα2, ARTN to GFRα3 and PSPN to GFRα4. The signal is then delivered by RET, which cannot bind the GFLs on its own, but can bind the GFL-GFRα complex. Under normal cellular conditions, RET is only phosphorylated on the cell surface after ligand binding. At least the GDNF-GFRα1 complex is believed to recruit RET to lipid rafts, where downstream signalling occurs. In general, GFRαs consist of three cysteine-rich domains, but all GFRα4s except for chicken GFRα4 lack domain 1 (D1). We characterised the biochemical and cell biological properties of mouse PSPN receptor GFRα4 and showed that it has a significantly weaker capacity than GFRα1 to recruit RET to the lipid rafts. In spite of that, it can phosphorylate RET in the presence of PSPN and contribute to neuronal differentiation and survival. Therefore, the recruitment of RET to the lipid rafts does not seem to be crucial for the biological activity of all GFRα receptors. Secondly, we demonstrated that GFRα1 D1 stabilises the GDNF-GFRα1 complex and thus affects the phosphorylation of RET and contributes to the biological activity. This may be important in physiological conditions, where the concentration of the ligand or the soluble GFRα1 receptor is low. 
Our results also suggest a role for D1 in heparin binding and, consequently, in the biodistribution of released GFRα1 or in the formation of the GFL-GFRα-RET complex. We also presented the crystallographic structure of GDNF in complex with GFRα1 domains 2 and 3. The structure differs from the previously published ARTN-GFRα3 structure in three significant ways. The biochemical data verify the structure and reveal residues participating in the interactions between GFRα1 and GDNF, and preliminarily also between GFRα1 and RET and heparin. Finally, we showed that the precursor of the oncogenic MEN 2B (multiple endocrine neoplasia type 2B) form of RET is phosphorylated already during its synthesis in the endoplasmic reticulum (ER). We also demonstrated that it associates with Src homology 2 domain-containing protein (SHC) and growth factor receptor-bound protein 2 (GRB2) in the ER, and has the capacity to activate several downstream signalling molecules.
Abstract:
Jacalin [Artocarpus integrifolia (jack fruit) agglutinin] is made up of two types of chains, heavy and light, with M(r) values of 16,200 +/- 1,200 and 2,090 +/- 300 respectively (on the basis of gel-permeation chromatography under denaturing conditions). Its complete amino acid sequence was determined by manual degradation using a 4-dimethylaminoazobenzene 4'-isothiocyanate double-coupling method. Peptide fragments for sequence analysis were obtained by chemical cleavage of the heavy chain with CNBr, hydroxylamine hydrochloride and iodosobenzoic acid, and by enzymic cleavage with Staphylococcus aureus proteinase. The peptides were purified by a combination of gel-permeation and reverse-phase chromatography. The light chains, being only 20 residues long, could be sequenced without fragmentation. Amino acid analyses and carboxypeptidase-Y-digestion C-terminal analyses of the subunits provided supporting evidence for their sequences. Computer-assisted alignment of the jacalin heavy-chain sequence failed to show similarity to the sequence of any lectin for which the complete sequence is known. Analysis of the sequence showed the presence of an internal repeat spanning residues 7-64 and 76-130. The internal repeat was found to be statistically significant.
Abstract:
The thermodynamics of the binding of calcium and magnesium ions to a calcium-binding protein (CaBP) from Entamoeba histolytica was investigated by isothermal titration calorimetry (ITC) in 20 mM MOPS buffer (pH 7.0) at 20 degrees C. Enthalpy titration curves of calcium show the presence of four Ca2+ binding sites. There exist two low-affinity sites for Ca2+, both of which are exothermic in nature and show positive cooperative interaction between them. Two other high-affinity sites for Ca2+ exist, of which one is endothermic and the other exothermic, again with positive cooperative interaction. The binding constants for Ca2+ at the four sites have been verified by a competitive binding assay, in which CaBP competes with the chromophoric chelator 5,5'-Br2-BAPTA to bind Ca2+, and by a Ca2+ titration employing the intrinsic tyrosine fluorescence of the protein. The enthalpy of titration of magnesium in the absence of calcium is single-site and endothermic in nature. In titrations performed using protein presaturated with magnesium, the amount of heat produced is altered. Further, the interaction between the high-affinity sites changes to negative cooperativity. No exchange of heat was observed throughout the addition of magnesium in the presence of 1 mM calcium. Titrations performed on a cleaved peptide comprising the N-terminus and the central linker show the existence of two Ca2+-specific sites. These results indicate that this CaBP has one high-affinity Ca-Mg site, one high-affinity Ca-specific site, and two low-affinity Ca-specific sites. The thermodynamic parameters of the binding of these metal ions were used to elucidate the energetics at the individual site(s) and the interactions involved therein at various concentrations of the denaturant guanidine hydrochloride (GdnHCl), ranging from 0.05 to 6.5 M. Unfolding of the protein was also monitored by titration calorimetry as a function of the concentration of the denaturant.
These data show that at a GdnHCl concentration of 0.25 M the binding affinity for the Mg2+ ion is lost and there are only two sites that can bind Ca2+, with a substantial loss of cooperativity. At concentrations beyond 2.5 M GdnHCl, at which unfolding of the tertiary structure of this protein is observed by near-UV CD spectroscopy, the binding of Ca2+ ions is lost. We thus show that the domain containing the two low-affinity sites is the first to unfold in the presence of GdnHCl. Control experiments with changes in ionic strength by addition of KCl in the range 0.25-1 M show the existence of four sites with altered ion-binding parameters.
Abstract:
We report cloning of the DNA encoding winged bean basic agglutinin (WBA I). Using oligonucleotide primers corresponding to N- and C-termini of the mature lectin, the complete coding sequence for WBA I could be amplified from genomic DNA. DNA sequence determination by the chain termination method revealed the absence of any intervening sequences in the gene. The DNA deduced amino acid sequence of WBA I displayed some differences with its primary structure established previously by chemical means. Comparison of the sequence of WBA I with that of other legume lectins highlighted several interesting features, including the existence of the largest specificity determining loop which might account for its oligosaccharide-binding specificity and the presence of an additional N-glycosylation site. These data also throw some light on the relationship between the primary structure of the protein and its probable mode of dimerization.
Abstract:
We present some results on multicarrier analysis of magnetotransport data. Both synthetic data and data from narrow-gap Hg0.8Cd0.2Te samples are used to demonstrate the applicability of various algorithms, viz. nonlinear least-squares fitting, Quantitative Mobility Spectrum Analysis (QMSA) and Maximum Entropy Mobility Spectrum Analysis (MEMSA). Comments are made from our experience on these algorithms, and on the inversion procedure from experimental R/sigma versus B data to the mobility spectrum S(mu), with least-squares fitting as an example. Amongst the conclusions drawn are: (i) The experimentally measured resistivities (R-xx, R-xy) should also be used, instead of just the inverted conductivities (sigma(xx), sigma(xy)), to fit data to the semiclassical expressions for better fits, especially at higher B. (ii) A high magnetic field is necessary to extract low-mobility carrier parameters. (iii) Provided the error in the data is not large, better estimates of the parameters of the remaining carrier species can be obtained at any stage by subtracting the highest-mobility carrier's contribution to sigma from the experimental data and fitting with the remaining carriers. (iv) Even in the presence of a high electric field, an approximate multicarrier expression can be used to guess the carrier mobilities and their variations before solving the full Boltzmann equation.
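For reference, the semiclassical multicarrier expressions that such fits are based on can be sketched as follows (a generic textbook form, not the authors' code; the sample values are illustrative):

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs

def sigma_xx(B, carriers):
    """Longitudinal conductivity at magnetic field B for a list of
    carriers, each given as (density n, mobility mu, sign s) with
    s = +1 for holes and -1 for electrons (SI units assumed)."""
    return sum(n * E_CHARGE * mu / (1 + (mu * B) ** 2)
               for n, mu, s in carriers)

def sigma_xy(B, carriers):
    """Transverse (Hall) conductivity for the same carrier list."""
    return sum(s * n * E_CHARGE * mu ** 2 * B / (1 + (mu * B) ** 2)
               for n, mu, s in carriers)
```

A nonlinear least-squares fit adjusts each (n, mu) pair so that these expressions reproduce the measured conductivity tensor over the field range; a low-mobility carrier's term only becomes distinguishable once mu*B approaches 1, which is the substance of conclusion (ii) above.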
Abstract:
The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and the relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is oversimplified, as it considers only one-to-one data flow relations. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph addresses the inability of existing representations to express other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide more precise results when compared with other representations.
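As a hedged sketch of the idea only (the class and method names are invented, not the paper's code), a flow graph can key its edges on *sets* of tasks, so many-to-one and one-to-many flows become first-class citizens rather than bundles of one-to-one edges:

```python
from collections import defaultdict

class FlowGraph:
    """Toy flow graph whose edges connect sets of tasks, so a
    many-to-one flow such as {check, register} -> {decide} is a
    single edge rather than two unrelated one-to-one edges."""

    def __init__(self):
        self.edges = defaultdict(int)  # (sources, targets) -> weight

    def add_flow(self, sources, targets, weight=1):
        self.edges[(frozenset(sources), frozenset(targets))] += weight

    def flows_into(self, task):
        """All source sets whose flow reaches the given task."""
        return {src for (src, dst), w in self.edges.items() if task in dst}
```

In a one-to-one representation the joint dependency of `decide` on both `check` and `register` is lost; here it is retained in a single edge.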
Abstract:
With transplant rejection rendered a minor concern and survival rates after liver transplantation (LT) steadily improving, long-term complications are attracting more attention. Current immunosuppressive therapies, together with other factors, are accompanied by considerable long-term toxicity, which clinically manifests as renal dysfunction, high risk for cardiovascular disease, and cancer. This thesis investigates the incidence, causes, and risk factors for such renal dysfunction, cardiovascular risk, and cancer after LT. Long-term effects of LT are further addressed by surveying the quality of life and employment status of LT recipients. The consecutive patients included had undergone LT at Helsinki University Hospital from 1982 onwards. Data regarding renal function – creatinine and estimated glomerular filtration rate (GFR) – were recorded before and repeatedly after LT in 396 patients. The presence of hypertension, dyslipidemia, diabetes, impaired fasting glucose, and overweight/obesity before and 5 years after LT was determined among 77 patients transplanted for acute liver failure. The entire cohort of LT patients (540 patients), including both children and adults, was linked with the Finnish Cancer Registry, and numbers of cancers observed were compared to site-specific expected numbers based on national cancer incidence rates stratified by age, gender, and calendar time. Health-related quality of life (HRQoL), measured by the 15D instrument, and employment status were surveyed among all adult patients alive in 2007 (401 patients). The response rate was 89%. Posttransplant cardiovascular risk factor prevalence and HRQoL were compared with that in the age- and gender-matched Finnish general population. The cumulative risk for chronic kidney disease increased from 10% at 5 years to 16% at 10 years following LT. GFR up to 10 years after LT could be predicted by the GFR at 1 year. 
In patients transplanted for chronic liver disease, a moderate correlation of pretransplant GFR with later GFR was also evident, whereas in acute liver failure patients after LT, even severe pretransplant renal dysfunction often recovered. By 5 years after LT, 71% of acute liver failure patients were receiving antihypertensive medications, 61% were exhibiting dyslipidemia, 10% were diabetic, 32% were overweight, and 13% obese. Compared with the general population, only hypertension displayed a significantly elevated prevalence among patients – 2.7-fold – whereas patients exhibited 30% less dyslipidemia and 71% less impaired fasting glucose. The cumulative incidence of cancer was 5% at 5 years and 13% at 10. Compared with the general population, patients were subject to a 2.6-fold cancer risk, with non-melanoma skin cancer (standardized incidence ratio, SIR, 38.5) and non-Hodgkin lymphoma (SIR 13.9) being the predominant malignancies. Non-Hodgkin lymphoma was associated with male gender, young age, and the immediate posttransplant period, whereas old age and antibody induction therapy raised skin-cancer risk. HRQoL deviated clinically unimportantly from the values in the general population, but significant deficits among patients were evident in some physical domains. HRQoL did not seem to decrease with longer follow-up. Although 87% of patients reported improved working capacity, data on return to working life showed marked age-dependency: Among patients aged less than 40 at LT, 70 to 80% returned to work, among those aged 40 to 50, 55%, and among those above 50, 15% to 28%. The most common cause for unemployment was early retirement before LT. Those patients employed exhibited better HRQoL than those unemployed. In conclusion, although renal impairment, hypertension, and cancer are evidently common after LT and increase with time, patients’ quality of life remains comparable with that of the general population.
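The cancer comparison above is expressed as standardized incidence ratios. As a hedged illustration of the computation only (the figures below are invented, not the study's data), an SIR divides the observed number of cases by the number expected from stratum-specific population rates:

```python
def standardized_incidence_ratio(observed, strata):
    """SIR = observed cases / expected cases, where the expected count
    sums person_years * population incidence rate over the
    age/gender/calendar-time strata of the cohort."""
    expected = sum(person_years * rate for person_years, rate in strata)
    return observed / expected
```

An SIR of 2.6, as reported for the transplant cohort, means 2.6 times as many cancers were observed as the matched general-population rates would predict.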
Abstract:
Predicting the temporal responses of ecosystems to disturbances associated with industrial activities is critical for their management and conservation. However, prediction of ecosystem responses is challenging due to the complexity and potential non-linearities stemming from interactions between system components and multiple environmental drivers. Prediction is particularly difficult for marine ecosystems due to their often highly variable and complex nature and the large uncertainties surrounding their dynamic responses. Consequently, current management of such systems often relies on expert judgement and/or complex quantitative models that consider only a subset of the relevant ecological processes. Hence there exists an urgent need for the development of whole-of-system predictive models to support decision and policy makers in managing complex marine systems in the context of industry-based disturbances. This paper presents Dynamic Bayesian Networks (DBNs) for predicting the temporal response of a marine ecosystem to anthropogenic disturbances. The DBN provides a visual representation of the problem domain in terms of factors (parts of the ecosystem) and their relationships. These relationships are quantified via Conditional Probability Tables (CPTs), which estimate the variability and uncertainty in the distribution of each factor. The combination of qualitative visual and quantitative elements in a DBN facilitates the integration of a wide array of data, published and expert knowledge, and other models. Such multiple sources are often essential, as a single source of information is rarely sufficient to cover the diverse range of factors relevant to a management task. Here, a DBN model is developed for tropical, annual Halophila and temperate, persistent Amphibolis seagrass meadows to inform dredging management and help meet environmental guidelines. Specifically, the impacts of capital (e.g. new port development) and maintenance (e.g.
maintaining channel depths in established ports) dredging are evaluated with respect to the risk of permanent loss, defined as no recovery within 5 years (Environmental Protection Agency guidelines). The model is developed using expert knowledge, existing literature, statistical models of environmental light, and experimental data. The model is then demonstrated in a case study through the analysis of a variety of dredging, environmental and seagrass ecosystem recovery scenarios. In spatial zones significantly affected by dredging, such as the zone of moderate impact, shoot density has a very high probability of being driven to zero by capital dredging due to the duration of such dredging. Here, fast-growing Halophila species can recover; however, the probability of recovery depends on the presence of seed banks. On the other hand, slow-growing Amphibolis meadows have a high probability of suffering permanent loss. In the maintenance dredging scenario, by contrast, owing to the shorter duration of dredging, Amphibolis is better able to resist the impacts. For both types of seagrass meadow, the probability of loss was strongly dependent on the biological and ecological status of the meadow, as well as on environmental conditions post-dredging. The ability to predict the ecosystem response under cumulative, non-linear interactions across a complex ecosystem highlights the utility of DBNs for decision support and environmental management.
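A minimal sketch of the DBN mechanics described above: a conditional probability table maps (previous state, dredging on/off) to a distribution over the next state, and a belief is pushed through it one time slice at a time. The states and probabilities here are invented for illustration and are not the published model's:

```python
STATES = ('zero', 'low', 'high')  # toy shoot-density states

CPT = {  # P(next state | previous state, dredging active)
    ('high', True):  {'zero': 0.3, 'low': 0.6, 'high': 0.1},
    ('low',  True):  {'zero': 0.6, 'low': 0.3, 'high': 0.1},
    ('zero', True):  {'zero': 1.0, 'low': 0.0, 'high': 0.0},
    ('high', False): {'zero': 0.0, 'low': 0.1, 'high': 0.9},
    ('low',  False): {'zero': 0.1, 'low': 0.4, 'high': 0.5},
    ('zero', False): {'zero': 0.8, 'low': 0.2, 'high': 0.0},  # slow recovery
}

def step(belief, dredging):
    """One DBN time slice: propagate the belief over states through
    the CPT, marginalising out the previous state."""
    nxt = {s: 0.0 for s in STATES}
    for prev, p in belief.items():
        for s, q in CPT[(prev, dredging)].items():
            nxt[s] += p * q
    return nxt
```

Running `step` repeatedly with `dredging=True` shows why duration matters: each slice shifts probability mass toward the absorbing `zero` state, mirroring the capital versus maintenance dredging contrast in the case study.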
Abstract:
Deoxyhypusine synthase, an NAD(+)-dependent enzyme, catalyzes the first step in the post-translational synthesis of an unusual amino acid, hypusine (N-epsilon-(4-amino-2-hydroxybutyl)lysine), in the eukaryotic initiation factor 5A precursor protein. Two putative deoxyhypusine synthase (DHS) sequences have been identified in the Leishmania donovani genome, present on chromosomes 20 and 34: DHSL20 (DHS-like gene from chromosome 20) and DHS34 (DHS from chromosome 34). Although both sequences exhibit an overall conservation of key residues, the DHSL20 protein lacks a critical lysine residue, and the recombinant protein showed no DHS activity in vitro. DHS34, however, contains the critical lysine residue, and the recombinant DHS34 effectively catalyzed deoxyhypusine synthesis. Furthermore, in vivo labeling confirmed that hypusination of eukaryotic initiation factor 5A occurs in intact Leishmania parasites. Interestingly, DHS34 is much longer, at 601 amino acids, than the human DHS enzyme (369 amino acids) and contains several unique insertions. To study the physiological role of DHS34 in Leishmania, gene deletion mutations were attempted via targeted gene replacement. However, chromosomal null mutants of DHS34 could be obtained only in the presence of a DHS34-containing episome. The present data provide evidence that DHS34 is essential for L. donovani and that the structural differences between the human and leishmanial DHS enzymes may be exploited for designing selective inhibitors against the parasite.
Abstract:
The solubilities of three chlorophenols, namely 4-chlorophenol, 2,4-dichlorophenol, and 2,4,6-trichlorophenol, in supercritical carbon dioxide were determined at temperatures from (308 to 318) K in the pressure range of (8.8 to 15.6) MPa. The solubilities were determined both in the absence of cosolvents and in the presence of two cosolvents, methanol and acetone. The solubilities (in the absence of cosolvents) in mole fraction of 4-chlorophenol, 2,4-dichlorophenol, and 2,4,6-trichlorophenol at 308 K were in the ranges of (0.0113 to 0.0215), (0.0312 to 0.0645), and (0.008 to 0.0173), respectively. The solubilities of the chlorophenols followed the order 2,4-dichlorophenol > 4-chlorophenol > phenol > 2,4,6-trichlorophenol > pentachlorophenol. The solubility data were correlated with the Chrastil model and with the Mendez-Santiago and Teja model. The overall deviation between the experimental data and the correlated results was less than 6 % in averaged absolute relative deviation (AARD) for both models.
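For reference, the Chrastil correlation mentioned above relates solubility to solvent density through ln S = k ln(rho) + a/T + b. A minimal sketch of evaluating the model (k, a, and b are fitted constants; the values used below are invented, not the paper's fitted parameters):

```python
from math import exp, log

def chrastil_solubility(rho, T, k, a, b):
    """Chrastil model: solubility S as a function of solvent density
    rho and temperature T, with fitted constants k (association
    number), a, and b. S = rho^k * exp(a/T + b)."""
    return exp(k * log(rho) + a / T + b)
```

In practice k, a, and b are obtained by a linear least-squares fit of ln S against ln(rho) and 1/T over the measured isotherms, and the quality of fit is reported as the AARD quoted above.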