23 results for Unicode Common Locale Data Repository
in University of Queensland eSpace - Australia
Abstract:
Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
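In the univariate case that this paper generalizes, the integral over each bin reduces to a difference of normal CDFs. The following is a minimal sketch of that step only, not the authors' implementation; the function names, the two-component example and the bin edges are illustrative assumptions.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_bin_probs(edges, weights, means, sds):
    """Probability of each bin under a univariate Gaussian mixture,
    renormalized to the observed (truncated) range [edges[0], edges[-1]]."""
    raw = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
                for w, m, s in zip(weights, means, sds))
        raw.append(p)
    total = sum(raw)  # mixture mass inside the truncation limits
    return [p / total for p in raw]

def binned_log_likelihood(counts, edges, weights, means, sds):
    """Multinomial log-likelihood of the bin counts, up to an additive constant."""
    probs = mixture_bin_probs(edges, weights, means, sds)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

# Hypothetical example: two components, five bins, data truncated to [-3, 6].
edges = [-3.0, -1.0, 1.0, 3.0, 5.0, 6.0]
probs = mixture_bin_probs(edges, [0.6, 0.4], [0.0, 3.0], [1.0, 1.0])
```

In the multivariate setting the abstract describes, the CDF difference would be replaced by a multidimensional integral over each bin, which is where the computational cost arises.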
Abstract:
Even when data repositories exhibit near perfect data quality, users may formulate queries that do not correspond to the information requested. Users’ poor information retrieval performance may arise from problems either in understanding the data models that represent real-world systems or in their query skills. This research focuses on users’ understanding of the data structures, i.e., their ability to map the information request to the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users’ performance when undertaking component, record, and aggregate level tasks. For the hypotheses associated with different representations but equivalent semantics, the results indicate that participants using the parsimonious data model performed better on component level tasks, whereas participants using the ontologically clearer data model performed better on record and aggregate level tasks.
Abstract:
A data warehouse is a data repository which collects and maintains a large amount of data from multiple distributed, autonomous and possibly heterogeneous data sources. Often the data is stored in the form of materialized views in order to provide fast access to the integrated data. One of the most important decisions in designing a data warehouse is the selection of views for materialization. The objective is to select an appropriate set of views that minimizes the total query response time with the constraint that the total maintenance time for these materialized views is within a given bound. This view selection problem is totally different from the view selection problem under the disk space constraint. In this paper the view selection problem under the maintenance time constraint is investigated. Two efficient, heuristic algorithms for the problem are proposed. The key to devising the proposed algorithms is to define good heuristic functions and to reduce the problem to some well-solved optimization problems. As a result, an approximate solution of the known optimization problem will give a feasible solution of the original problem. (C) 2001 Elsevier Science B.V. All rights reserved.
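The constrained selection problem in this abstract can be illustrated with a simple greedy heuristic. This is a simplified sketch and not one of the two algorithms the paper proposes: it assumes each candidate view has an independent query-time saving and a positive maintenance time, and the view names and numbers are hypothetical.

```python
def select_views(candidates, max_maintenance_time):
    """Greedy heuristic: take views in order of query-time saving per unit
    of maintenance time, skipping any view that would exceed the budget.

    candidates: list of (name, query_time_saving, maintenance_time) tuples,
    with maintenance_time > 0 (an assumption of this sketch).
    """
    ranked = sorted(candidates, key=lambda v: v[1] / v[2], reverse=True)
    chosen, used = [], 0.0
    for name, saving, maintenance in ranked:
        if used + maintenance <= max_maintenance_time:
            chosen.append(name)
            used += maintenance
    return chosen, used

# Hypothetical candidate views: (name, saving, maintenance time).
views = [("sales_by_region", 40.0, 10.0),
         ("sales_by_day", 30.0, 20.0),
         ("top_customers", 12.0, 2.0)]
selected, total_maintenance = select_views(views, max_maintenance_time=15.0)
```

With these numbers the sketch selects `top_customers` and `sales_by_region`, using 12.0 of the 15.0 time units allowed. In a real warehouse the savings of different views interact, which this sketch ignores.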
Abstract:
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. 
[Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]
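The metachain idea described in this abstract, pooling several short replicated runs to estimate bipartition posterior probabilities, can be sketched as follows. This is an illustrative sketch rather than the authors' code: the tree representation (a set of bipartition labels), the function name and the `burn_in` parameter are assumptions.

```python
from collections import Counter

def bipartition_frequencies(runs, burn_in=0):
    """Pool post-burn-in samples from replicated runs into one metachain
    and return the fraction of pooled trees that contain each bipartition.

    runs: list of runs; each run is a list of sampled trees; each tree is
    represented as a set of hashable bipartition labels.
    """
    pooled = [tree for run in runs for tree in run[burn_in:]]
    counts = Counter(b for tree in pooled for b in tree)
    return {b: n / len(pooled) for b, n in counts.items()}

# Hypothetical samples from two short runs on five taxa (A to E).
run_a = [{"AB|CDE", "ABC|DE"}, {"AB|CDE", "ABD|CE"}, {"AB|CDE", "ABC|DE"}]
run_b = [{"AC|BDE", "ABC|DE"}, {"AB|CDE", "ABC|DE"}, {"AB|CDE", "ABC|DE"}]
freqs = bipartition_frequencies([run_a, run_b], burn_in=1)
```

Each frequency is a simple estimate of the posterior probability of that bipartition; comparing the estimates between individual runs before pooling is one way to check whether the runs agree.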
Abstract:
We present the design rationale and basic workings of a low-cost, easy-to-use power system simulator developed to support investigations into human interface design for a hydropower plant. The power system simulator is based on three important components: models of power system components, a data repository, and human interface elements. Dynamic Data Exchange (DDE) allows simulator components to communicate with each other within the simulator. To construct the modules of the simulator we have combined the advantages of commercial software such as Matlab/Simulink, ActiveX Control, Visual Basic and Excel and integrated them in the simulator. An important advantage of our approach is that further components of the simulator can now be developed independently. An initial assessment of the simulator indicates that it is fit for its intended purpose.
Abstract:
This document records the process of migrating eprints.org data to a Fez repository. Fez is a Web-based digital repository and workflow management system based on Fedora (http://www.fedora.info/). At the time of migration, the University of Queensland Library was using EPrints 2.2.1 [pepper] for its ePrintsUQ repository. Once we began to develop Fez, we did not upgrade to later versions of eprints.org software since we knew we would be migrating data from ePrintsUQ to the Fez-based UQ eSpace. Since this document records our experiences of migration from an earlier version of eprints.org, anyone seeking to migrate eprints.org data into a Fez repository might encounter some small differences. Moving UQ publication data from an eprints.org repository into a Fez repository (hereafter called UQ eSpace, http://espace.uq.edu.au/) was part of a plan to integrate metadata (and, in some cases, full texts) about all UQ research outputs, including theses, images, multimedia and datasets, in a single repository. This tied in with the plan to identify and capture the research output of a single institution, the main task of the eScholarshipUQ testbed for the Australian Partnership for Sustainable Repositories project (http://www.apsr.edu.au/). The migration could not occur at UQ until the functionality in Fez was at least equal to that of the existing ePrintsUQ repository. Accordingly, as Fez development occurred throughout 2006, a list of eprints.org functionality not yet supported in Fez was created so that the required development could be planned and implemented.
Abstract:
This paper reviews the key features of an environment to support domain users in spatial information system (SIS) development. It presents a full design and prototype implementation of a repository system for the storage and management of metadata, focusing on a subset of spatial data integrity constraint classes. The system is designed to support spatial system development and customization by users within the domain in which the system will operate.
Abstract:
Several constitutively active mutant forms of the common β subunit of the human IL-3, IL-5 and GM-CSF receptors (hβc), which enable it to signal in the absence of ligand, have recently been described. Two of these, V449E and I374N, are amino acid substitutions in the transmembrane and extracellular regions of hβc, respectively. A third, FIΔ, contains a 37 amino acid duplication in the extracellular domain. We have shown previously that when expressed in primary murine haemopoietic cells, the extracellular mutants confer factor-independence on cells of the neutrophil and monocyte lineages only, whereas V449E does so on all cell types of the myeloid and erythroid compartments. To study the in vivo effects and leukaemic potential of these mutants, we have expressed all three in mice by bone marrow reconstitution using retrovirally infected donor cells. Expression of the extracellular mutants leads to an early onset, chronic myeloproliferative disorder marked by elevations in the neutrophil, monocyte, erythrocyte and platelet lineages. In contrast, expression of V449E leads to an acute leukaemia-like syndrome of anaemia, thrombocytopaenia and blast cell expansion. These data support the possibility that activating mutations in hβc are involved in haemopoietic disorders in man.
Abstract:
The majority of common diseases such as cancer, allergy, diabetes, or heart disease are characterized by complex genetic traits, in which genetic and environmental components contribute to disease susceptibility. Our knowledge of the genetic factors underlying most of such diseases is limited. A major goal in the post-genomic era is to identify and characterize disease susceptibility genes and to use this knowledge for disease treatment and prevention. More than 500 genes are conserved across the invertebrate and vertebrate genomes. Because of gene conservation, various organisms including yeast, fruitfly, zebrafish, rat, and mouse have been used as genetic models.
Abstract:
In order to understand the determinants of schistosome-related hepato- and spleno-megaly better, 14 002 subjects aged 3-60 years (59% male; mean age = 32 years) were randomly selected from 43 villages, all in Hunan province, China, where schistosomiasis caused by Schistosoma japonicum is endemic. The abdomen of each subject was examined along the mid-sternal (MSL) and mid-clavicular lines for evidence of current hepato- and/or spleno-megaly, and a questionnaire was used to collect information on the medical history of each individual. Current infections with S. japonicum were detected by stool examination. Almost all (99.8%) of the subjects were ethnically Han by descent and most (77%) were engaged in farming. Although schistosomiasis appeared common (42% of the subjects claiming to have had the disease), only 45% of the subjects said they had received anti-schistosomiasis drugs. Overall, 1982 (14%) of the subjects had S. japonicum infections (as revealed by miracidium-hatching tests and/or Kato-Katz smears) when examined and 22% had palpable hepatomegaly (i.e. enlargement of at least 3 cm along the MSL), although only 2.5% had any form of detectable splenomegaly (i.e. a Hackett's grade of at least 1). Multiple logistic regression revealed that male subjects, fishermen, farmers, subjects aged ≥ 25 years, subjects with a history of schistosomiasis, and subjects who had had bloody stools in the previous 2 weeks were all at relatively high risk of hepato- and/or spleno-megaly. In areas moderately endemic for Schistosoma japonicum, occupational exposure and disease history appear to be good predictors of current disease status among older residents. These results reconfirm those reported earlier in the same region.
Abstract:
It is generally accepted that two major gene pools exist in cultivated common bean (Phaseolus vulgaris L.), a Middle American and an Andean one. Some evidence, based on unique phaseolin morphotypes and AFLP analysis, suggests that at least one more gene pool exists in cultivated common bean. To investigate this hypothesis, 1072 accessions from a common bean core collection from the primary centres of origin, held at CIAT, were investigated. Various agronomic and morphological attributes (14 categorical and 11 quantitative) were measured. Multivariate analyses, consisting of homogeneity analysis and clustering for categorical data, clustering and ordination techniques for quantitative data and nonlinear principal component analysis for mixed data, were undertaken. The results of most analyses supported the existence of the two major gene pools. However, the analysis of categorical data of protein types showed an additional minor gene pool. The minor gene pool is designated North Andean and includes phaseolin types CH, S and T; lectin types 312, Pr, B and K; and mostly A5, A6 and A4 types alpha-amylase inhibitor. Analysis of the combined categorical data of protein types and some plant categorical data also suggested that some other germplasm with C type phaseolin are distinguished from the major gene pools.
Abstract:
We compare Bayesian methodology utilizing free-ware BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another free-ware package, Mx. Dichotomous and ordinal (three category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. Practical issues are discussed in using Gibbs sampling as implemented by BUGS to fit subject-specific Bayesian generalized linear models, where the components of variation may be estimated directly. The simulation study (based on 2000 twin pairs) indicated that there is a consistent advantage in using the Bayesian method to detect a correct model under certain specifications of additive genetics and common environmental effects. For binary data, both methods had difficulty in detecting the correct model when the additive genetic effect was low (between 10 and 20%) or of moderate range (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%) even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data for most scenarios, except for the case of low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years, who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis occurring in joints of the hand.
Abstract:
Mycorrhizae play a critical role in nutrient capture from soils. Arbuscular mycorrhizae (AM) and ectomycorrhizae (EM) are the most important mycorrhizae in agricultural and natural ecosystems. AM and EM fungi use inorganic NH4+ and NO3-, and most EM fungi are capable of using organic nitrogen. The heavier stable isotope ¹⁵N is discriminated against during biogeochemical and biochemical processes. Differences in ¹⁵N (atom%) or δ¹⁵N (parts per thousand) provide nitrogen movement information in an experimental system. A range of 20 to 50% of one-way N-transfer has been observed from legumes to nonlegumes. Mycorrhizal fungal mycelia can extend from one plant's roots to another plant's roots to form common mycorrhizal networks (CMNs). Individual species, genera, even families of plants can be interconnected by CMNs. They are capable of facilitating nutrient uptake and flux. Nutrients such as carbon, nitrogen and phosphorus and other elements may then move via either AM or EM networks from plant to plant. Both ¹⁵N labeling and ¹⁵N natural abundance techniques have been employed to trace N movement between plants interconnected by AM or EM networks. Fine mesh (25~45 μm) has been used to separate root systems and allow only hyphal penetration and linkages but no root contact between plants. In many studies, nitrogen from N₂-fixing mycorrhizal plants transferred to non-N₂-fixing mycorrhizal plants (one-way N-transfer). In a few studies, N is also transferred from non-N₂-fixing mycorrhizal plants to N₂-fixing mycorrhizal plants (two-way N-transfer). There is controversy about whether N-transfer is direct through CMNs, or indirect through the soil. The lack of convincing data underlines the need for creative, careful experimental manipulations. Nitrogen is crucial to productivity in most terrestrial ecosystems, and there are potential benefits of management in soil-plant systems to enhance N-transfer. 
Thus, two-way N-transfer warrants further investigation with many species and under field conditions.
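The abstract names two nitrogen-15 measures (atom% and delta, in parts per thousand) without defining them. For reference, the conventional definitions are written out below; the symbols R, "sample" and "standard" are notation introduced here, and the usual reference standard is assumed to be atmospheric N2.

```latex
\[
\text{atom\%}\ {}^{15}\mathrm{N}
  = 100 \times \frac{{}^{15}\mathrm{N}}{{}^{14}\mathrm{N} + {}^{15}\mathrm{N}},
\qquad
\delta^{15}\mathrm{N}\ (\text{\textperthousand})
  = \left( \frac{R_{\text{sample}}}{R_{\text{standard}}} - 1 \right) \times 1000,
\qquad
R = \frac{{}^{15}\mathrm{N}}{{}^{14}\mathrm{N}}
\]
```

A positive delta value means the sample is enriched in the heavier isotope relative to the standard, and a negative value means it is depleted.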
Abstract:
Objective. To determine whether squamous cervical cancers exhibit mutations or deletions in MHC class I genes or transport-associated protein (TAP) genes. Methods. Polymerase chain reaction based protocols were used to examine HLA class I and TAP genes in a panel of cervical tumours, using DNA from corresponding blood samples as controls. SSP-PCR protocols were similarly used for examination of all TAP alleles in tumour and blood samples. Results. In a series of cervical carcinomas, 7 of 27 (26%) exhibited mutations in HLA-A genes, while 12 of 23 (52%) exhibited mutations in TAP genes. HLA gene mutations were detected in 2 of 14 CIN2-3 lesions, and TAP gene mutations in none of 14, a frequency significantly less than observed in the cervical carcinoma samples (P < 0.01). The TAP 2A/2B heterozygous genotype was observed with increased frequency in patients with cervical cancer compared to population controls (P < 0.02). Conclusion. These data suggest that TAP genes may be relevant to evolution of cervical cancer from precursor lesions. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
Progress in bean breeding programs requires the exploitation of genetic variation that is present among races or through introgression across gene pools of Phaseolus vulgaris L. Of the two major common bean gene pools, the Andean gene pool seems to have a narrow genetic base, with about 10% of the accessions in the CIAT core collection presenting evidence of introgression. The objective of this study was to quantify the degree of spontaneous introgression in a sample of common bean landraces from the Andean gene pool. The effects of introgression on morphological, economic and nutritional attributes were also investigated. Homogeneity analysis was performed on molecular marker data from 426 Andean-type accessions from the primary centres of origin of the CIAT common bean core collection and two check varieties. Quantitative attribute diversity for 15 traits was studied based on the groups found from the cluster analysis of marker prevalence indices computed for each accession. The two-group summary consisted of one group of 58 accessions (14%) with low prevalence indices and another group of 370 accessions (86%) with high prevalence indices. The smaller group occupied the outlying area of points displayed from homogeneity analysis, yet their geographic origin was widely distributed over the Andean region. This group was regarded as introgressed, since its accessions displayed traits that are associated with the Middle American gene pool: high resistance to Andean disease isolates but low resistance to Middle American disease isolates, low seed weight and high scores for all nutrient elements. Genotypes generated by spontaneous introgression can be helpful for breeders to overcome the difficulties in transferring traits between gene pools.