978 results for Blog datasets
Abstract:
The Australian Soil Resources Information System (ASRIS) database compiles the best publicly available information from Commonwealth, State, and Territory agencies into a national database of soil profile data, digital soil and land resources maps, and climate, terrain, and lithology datasets. These datasets are described in detail in this paper. Most datasets are thematic grids that cover the intensively used agricultural zones in Australia.
Abstract:
Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term, patholog, to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than focus only on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70-85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts, we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%) or other (14%) disorders. Conclusions: Large-scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
Abstract:
1. Cluster analysis of reference sites with similar biota is the initial step in creating River Invertebrate Prediction and Classification System (RIVPACS) and similar river bioassessment models such as Australian River Assessment System (AUSRIVAS). This paper describes and tests an alternative prediction method, Assessment by Nearest Neighbour Analysis (ANNA), based on the same philosophy as RIVPACS and AUSRIVAS but without the grouping step that some people view as artificial.
2. The steps in creating ANNA models are: (i) weighting the predictor variables using a multivariate approach analogous to principal axis correlations, (ii) calculating the weighted Euclidean distance from a test site to the reference sites based on the environmental predictors, (iii) predicting the faunal composition based on the nearest reference sites and (iv) calculating an observed/expected (O/E) ratio analogous to RIVPACS/AUSRIVAS.
3. The paper compares AUSRIVAS and ANNA models on 17 datasets representing a variety of habitats and seasons. First, it examines each model's regressions for Observed versus Expected number of taxa, including the r², intercept and slope. Second, the two models' assessments of 79 test sites in New Zealand are compared. Third, the models are compared on test and presumed reference sites along a known trace metal gradient. Fourth, ANNA models are evaluated for western Australia, a geographically distinct region of Australia. The comparisons demonstrate that ANNA and AUSRIVAS are generally equivalent in performance, although ANNA turns out to be potentially more robust for the O versus E regressions and is potentially more accurate on the trace metal gradient sites.
4. The ANNA method is recommended for use in bioassessment of rivers, at least for corroborating the results of the well-established AUSRIVAS- and RIVPACS-type models, if not to replace them.
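Steps (ii) to (iv) of the ANNA procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the predictor weights from step (i) are taken as given, the k nearest reference sites are weighted equally, and the 0.5 probability threshold used for the O/E ratio is an assumption borrowed from common RIVPACS-style practice. All function names and the toy data are hypothetical.

```python
import math

def weighted_distance(site_a, site_b, weights):
    """Weighted Euclidean distance between two sites' environmental predictors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, site_a, site_b)))

def predict_taxa_probabilities(test_env, ref_envs, ref_presence, weights, k=2):
    """Estimate each taxon's probability of occurrence at the test site as its
    frequency among the k nearest reference sites (equal weighting is assumed)."""
    ranked = sorted(
        range(len(ref_envs)),
        key=lambda i: weighted_distance(test_env, ref_envs[i], weights),
    )
    nearest = ranked[:k]
    n_taxa = len(ref_presence[0])
    return [sum(ref_presence[i][t] for i in nearest) / len(nearest) for t in range(n_taxa)]

def observed_over_expected(observed, probabilities, threshold=0.5):
    """O/E ratio restricted to taxa whose predicted probability meets the threshold."""
    expected = sum(p for p in probabilities if p >= threshold)
    seen = sum(o for o, p in zip(observed, probabilities) if p >= threshold)
    return seen / expected if expected > 0 else float("nan")

# Toy data: three reference sites, two predictors, three taxa (presence/absence).
ref_envs = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
ref_presence = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
weights = [1.0, 1.0]  # assumed output of step (i)

probs = predict_taxa_probabilities([0.2, 0.0], ref_envs, ref_presence, weights, k=2)
score = observed_over_expected([1, 0, 0], probs)
```

In this toy case the two closest reference sites determine the expected fauna, so no prior clustering of reference sites is needed, which is the point of the nearest-neighbour approach.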
Abstract:
Phylogenetic hypotheses are presented for Pultenaea based on cpDNA (trnL-F and ndhF) and nrDNA (ITS) sequence data. Pultenaea, as it is currently circumscribed, comprises six strongly supported lineages whose relationships with each other and with 18 closely related genera are weakly supported or conflicting among datasets. The lack of resolution among the six Pultenaea clades and their relatives appears to be the result of a rapid radiation, which is evident in molecular data from both the chloroplast and nuclear genomes. The molecular data provide no support for the monophyly of Pultenaea as it currently stands. Given these results, Pultenaea could be split into many smaller genera. We prefer the taxonomically stable alternative of subsuming all 19 genera currently recognised in Pultenaea sensu lato (= the Mirbelia group) into an expanded concept of Pultenaea that would comprise approximately 470 species.
Abstract:
Hepatitis B is a worldwide health problem affecting about 2 billion people, and more than 350 million are chronic carriers of the virus. Nine HBV genotypes (A to I) have been described. The geographical distribution of HBV genotypes is not completely understood due to the limited number of samples from some parts of the world. One such example is Colombia, in which few studies have described the HBV genotypes. In this study, we characterized HBV genotypes in 143 HBsAg-positive volunteer blood donors from Colombia. A fragment of 1306 bp partially comprising HBsAg and the DNA polymerase coding regions (S/POL) was amplified and sequenced. Bayesian phylogenetic analyses were conducted using the Markov Chain Monte Carlo (MCMC) approach to obtain the maximum clade credibility (MCC) tree using BEAST v.1.5.3. Of all samples, 68 were positive and 52 were successfully sequenced. Genotype F was the most prevalent in this population (77%), with subgenotypes F3 (75%) and F1b (2%). Genotype G (7.7%) and subgenotype A2 (15.3%) were also found. Genotype G sequence analysis suggests distinct introductions of this genotype in the country. Furthermore, we estimated the time of the most recent common ancestor (TMRCA) for each HBV/F subgenotype and also for Colombian F3 sequences using two different datasets: (i) 77 sequences comprising 1306 bp of S/POL region and (ii) 283 sequences comprising 681 bp of S/POL region. We also used two other previously estimated evolutionary rates: (i) 2.60 × 10^-4 s/s/y and (ii) 1.5 × 10^-5 s/s/y. Here we report the HBV genotypes circulating in Colombia and estimate the TMRCA for the four different subgenotypes of genotype F. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Mitochondrial DNA (mtDNA) population data for forensic purposes are still scarce for some populations, which may limit the evaluation of forensic evidence, especially when the rarity of a haplotype needs to be determined in a database search. In order to improve the collection of mtDNA lineages from the Iberian and South American subcontinents, we here report the results of a collaborative study involving nine laboratories from the Spanish and Portuguese Speaking Working Group of the International Society for Forensic Genetics (GHEP-ISFG) and EMPOP. The individual laboratories contributed population data that were generated throughout the past 10 years but, in the majority of cases, have not been made available to the scientific community. A total of 1019 haplotypes from Iberia (Basque Country, 2 general Spanish populations, 2 North and 1 Central Portugal populations) and Latin America (3 populations from Sao Paulo) were collected, reviewed and harmonized according to defined EMPOP criteria. The majority of data ambiguities that were found during the reviewing process (41 in total) were transcription errors, confirming that the documentation process is still the most error-prone stage in reporting mtDNA population data, especially when performed manually. This GHEP-EMPOP collaboration has significantly improved the quality of the individual mtDNA datasets and adds mtDNA population data as a valuable resource to the EMPOP database (www.empop.org). (C) 2010 Elsevier Ireland Ltd. All rights reserved.
Abstract:
In this paper, we propose a method based on association rule mining to enhance the diagnosis of medical images (mammograms). It combines low-level features automatically extracted from images and high-level knowledge from specialists to search for patterns. Our method analyzes medical images and automatically generates diagnosis suggestions by mining association rules. The suggestions are used to accelerate the image analysis performed by specialists as well as to provide them with an alternative to work on. The proposed method uses two new algorithms, PreSAGe and HiCARe. The PreSAGe algorithm combines, in a single step, feature selection and discretization, and reduces the mining complexity. Experiments performed on PreSAGe show that this algorithm is highly suitable for feature selection and discretization in medical images. HiCARe is a new associative classifier. The HiCARe algorithm has an important property that makes it unique: it assigns multiple keywords per image to suggest a diagnosis with high values of accuracy. Our method was applied to real datasets, and the results show high sensitivity (up to 95%) and accuracy (up to 92%), allowing us to claim that the use of association rules is a powerful means to assist in the diagnosing task.
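The two standard measures behind association rule mining, support and confidence, can be shown with a short Python sketch. This is a generic illustration only; it does not reproduce PreSAGe or HiCARe, whose internals are not described in the abstract. The feature labels (`f1`, `f2`) and diagnosis keywords in the toy data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / s_antecedent

# Toy data: each image is a set of discretized features plus a diagnosis keyword.
images = [
    {"f1", "benign"},
    {"f1", "f2", "malignant"},
    {"f2", "malignant"},
    {"f1", "benign"},
]

rule_support = support(images, {"f2", "malignant"})
rule_confidence = confidence(images, {"f2"}, {"malignant"})
```

A rule such as `f2 -> malignant` would typically be kept only if both values exceed user-chosen thresholds; an associative classifier then uses the retained rules to suggest keywords for a new image.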
Abstract:
Human leukocyte antigen (HLA) haplotypes are frequently evaluated for population history inferences and association studies. However, the available typing techniques for the main HLA loci usually do not allow the determination of the allele phase and the constitution of a haplotype, which may be obtained by a very time-consuming and expensive family-based segregation study. Without the family-based study, computational inference by probabilistic models is necessary to obtain haplotypes. Several authors have used the expectation-maximization (EM) algorithm to determine HLA haplotypes, but high levels of erroneous inferences are expected because of the genetic distance among the main HLA loci and the presence of several recombination hotspots. In order to evaluate the efficiency of computational inference methods, 763 unrelated individuals stratified into three different datasets had their haplotypes manually defined in a family-based study of HLA-A, -B, -DRB1 and -DQB1 segregation, and these haplotypes were compared with the data obtained by the following three methods: the EM and Excoffier-Laval-Balding (ELB) algorithms using the Arlequin 3.11 software, and the PHASE method. When comparing the methods, we observed that all algorithms showed a poor performance for haplotype reconstruction with distant loci, estimating incorrect haplotypes for 38%-57% of the samples considering all algorithms and datasets. We suggest that computational haplotype inferences involving low-resolution HLA-A, HLA-B, HLA-DRB1 and HLA-DQB1 haplotypes should be considered with caution.
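To make the idea of EM-based phase inference concrete, here is a deliberately simplified Python sketch for two biallelic loci, where only double heterozygotes have unknown phase. It is an assumption-laden toy: HLA loci are highly multi-allelic, and this is not the Arlequin or PHASE implementation. The haplotype labels and counts are hypothetical.

```python
HAPLOTYPES = ("AB", "Ab", "aB", "ab")

def em_two_locus(known_counts, n_double_het, iterations=50):
    """Estimate haplotype frequencies for two biallelic loci.

    known_counts: haplotype counts from individuals whose phase is unambiguous.
    n_double_het: number of double heterozygotes, each either AB/ab or Ab/aB.
    """
    total = sum(known_counts.get(h, 0) for h in HAPLOTYPES) + 2 * n_double_het
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        p_cis = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: add expected haplotype counts and renormalise.
        expected = {h: float(known_counts.get(h, 0)) for h in HAPLOTYPES}
        expected["AB"] += n_double_het * p_cis
        expected["ab"] += n_double_het * p_cis
        expected["Ab"] += n_double_het * (1 - p_cis)
        expected["aB"] += n_double_het * (1 - p_cis)
        freqs = {h: expected[h] / total for h in HAPLOTYPES}
    return freqs

# Toy data: 80 phase-known haplotypes plus 10 phase-ambiguous individuals.
freqs = em_two_locus({"AB": 40, "Ab": 0, "aB": 0, "ab": 40}, n_double_het=10)
```

The estimate leans entirely on population frequencies to resolve ambiguous individuals, which helps explain why error rates rise when loci are far apart and recombination weakens the association between alleles.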
Abstract:
Objectives: To evaluate the presence of false flow three-dimensional (3D) power Doppler signals in 'flow-free' models. Methods: 3D power Doppler datasets were acquired from three different flow-free phantoms (muscle, air and water) with two different transducers, and Virtual Organ Computer-aided AnaLysis was used to generate a sphere that was serially applied through the 3D dataset. The vascularization flow index was used to compare artifactual signals at different depths (from 0 to 6 cm) within the different phantoms and at different gain and pulse repetition frequency (PRF) settings. Results: Artifactual Doppler signals were seen in all phantoms despite these being flow-free. The pattern was very similar and the degree of artifact appeared to be dependent on the gain and distance from the transducer. False signals were more evident in the far field and increased as the gain was increased, with false signals first appearing with a gain of 1 dB in the air and muscle phantoms. False signals were seen at a lower gain with the water phantom (-15 dB) and these were associated with vertical lines of Doppler artifact that were related to PRF, and disappeared when reflections were attenuated. Conclusions: Artifactual Doppler signals are seen in flow-free phantoms and are related to the gain settings and the distance from the transducer. In the in-vivo situation, the lowest gain settings that allow the detection of blood flow and adequate definition of vessel architecture should be used, which invariably means using a setting near or below the middle of the range available. Additionally, observers should be aware of vertical lines when evaluating cystic or liquid-containing structures. Copyright (C) 2010 ISUOG. Published by John Wiley & Sons, Ltd.
Abstract:
Formal Concept Analysis is an unsupervised machine learning technique that has successfully been applied to document organisation by considering documents as objects and keywords as attributes. The basic algorithms of Formal Concept Analysis then allow an intelligent information retrieval system to cluster documents according to keyword views. This paper investigates the scalability of this idea. In particular, we present the results of applying spatial data structures to large datasets in Formal Concept Analysis. Our experiments are motivated by the application of the Formal Concept Analysis idea of a virtual filesystem [11,17,15], in particular the libferris [1] Semantic File System. This paper presents customizations to an RD-Tree Generalized Index Search Tree based index structure to better support the application of Formal Concept Analysis to large data sources.
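The documents-as-objects, keywords-as-attributes view can be made concrete with a small Python sketch that enumerates the formal concepts of a toy context. It uses brute-force enumeration over keyword subsets, which is exponential and only workable for tiny examples; it is not the RD-Tree indexing approach the paper studies. The document and keyword names are hypothetical.

```python
from itertools import combinations

def extent(context, attrs):
    """All documents that carry every keyword in `attrs`."""
    return frozenset(doc for doc, keywords in context.items() if attrs <= keywords)

def intent(context, docs, all_attrs):
    """All keywords shared by every document in `docs`."""
    shared = set(all_attrs)
    for doc in docs:
        shared &= context[doc]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every keyword subset."""
    all_attrs = frozenset().union(*context.values())
    found = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            docs = extent(context, frozenset(combo))
            found.add((docs, intent(context, docs, all_attrs)))
    return found

# Toy context: two documents tagged with keywords.
context = {
    "doc1": {"soil", "maps"},
    "doc2": {"maps"},
}
concepts = formal_concepts(context)
```

Each concept pairs a maximal set of documents with the maximal set of keywords they share, which is what lets a retrieval system present documents as nested keyword views. Computing extents is essentially a subset query, the kind of operation a set-oriented index is meant to accelerate.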
Abstract:
Functional MRI (fMRI) data often have a low signal-to-noise ratio (SNR) and are contaminated by strong interference from other physiological sources. A promising tool for extracting signals, even under low SNR conditions, is blind source separation (BSS), or independent component analysis (ICA). BSS is based on the assumption that the detected signals are a mixture of a number of independent source signals that are linearly combined via an unknown mixing matrix. BSS seeks to determine the mixing matrix to recover the source signals based on principles of statistical independence. In most cases, extraction of all sources is unnecessary; instead, a priori information can be applied to extract only the signal of interest. Herein we propose an algorithm based on a variation of ICA, called Dependent Component Analysis (DCA), where the signal of interest is extracted using a time delay obtained from an autocorrelation analysis. We applied this method to fMRI data, aiming to find the hemodynamic response that follows neuronal activation from an auditory stimulation in human subjects. The method localized a significant signal modulation in cortical regions corresponding to the primary auditory cortex. The results obtained by DCA were also compared to those of the General Linear Model (GLM), which is the most widely used method to analyze fMRI datasets.
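The linear mixing assumption behind BSS can be illustrated with a two-source Python sketch. It shows only the model: the mixing matrix here is known and simply inverted, whereas real BSS or ICA must estimate the unmixing matrix from statistical independence, and the sketch does not implement DCA. The matrix values and signals are hypothetical.

```python
def apply_matrix(matrix, signals):
    """Multiply a 2x2 matrix by two equal-length signals, sample by sample."""
    first, second = signals
    return [
        [row[0] * a + row[1] * b for a, b in zip(first, second)]
        for row in matrix
    ]

def invert_2x2(matrix):
    """Closed-form inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two toy source signals, e.g. a response of interest and an interfering signal.
sources = [[0.0, 1.0, 0.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
mixing = [[1.0, 0.5], [0.3, 1.0]]  # unknown in practice; known here for illustration

observed = apply_matrix(mixing, sources)                 # what the scanner would record
recovered = apply_matrix(invert_2x2(mixing), observed)   # ideal unmixing
```

With the true inverse, `recovered` matches `sources` up to rounding error. BSS methods aim for the same result without access to `mixing`, typically only up to the scaling and ordering of the sources.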
Abstract:
Tick-borne zoonoses (TBZ) are emerging diseases worldwide. A large amount of information (e.g. case reports, results of epidemiological surveillance, etc.) is dispersed through various reference sources (ISI and non-ISI journals, conference proceedings, technical reports, etc.). An integrated database, derived from the ICTTD-3 project (http://www.icttd.nl), was developed in order to gather TBZ records in the (sub-)tropics, collected both by the authors and collaborators worldwide. A dedicated website (http://www.tickbornezoonoses.org) was created to promote collaboration and circulate information. Data collected are made freely available to researchers for analysis by spatial methods, integrating mapped ecological factors for predicting TBZ risk. The authors present the assembly process of the TBZ database: the compilation of an updated list of TBZ relevant for the (sub-)tropics, the database design and its structure, the method of bibliographic search, and the assessment of spatial precision of geo-referenced records. At the time of writing, 725 records extracted from 337 publications related to 59 countries in the (sub-)tropics have been entered in the database. TBZ distribution maps were also produced. Imported cases have also been accounted for. The most important datasets with geo-referenced records were those on Spotted Fever Group rickettsiosis in Latin America and Crimean-Congo Haemorrhagic Fever in Africa. The authors stress the need for international collaboration in data collection to update and improve the database. Supervision of the data entered always remains necessary. Means to foster collaboration are discussed. The paper is also intended to describe the challenges encountered in assembling spatial data from various sources and to help develop similar data collections.
Abstract:
We inferred the phylogeny of 33 species of ticks from the subfamilies Rhipicephalinae and Hyalomminae from analyses of nuclear and mitochondrial DNA and morphology. We used nucleotide sequences from 12S rRNA, cytochrome c oxidase I, internal transcribed spacer 2 of the nuclear rRNA, and 18S rRNA. Nucleotide sequences and morphology were analyzed separately and together in a total-evidence analysis. Analyses of the five partitions together (3303 characters) gave the best-resolved and the best-supported hypothesis so far for the phylogeny of ticks in the Rhipicephalinae and Hyalomminae, despite the fact that some partitions did not have data for some taxa. However, most of the hidden conflict (lower support in the total-evidence analyses compared to that in the individual analyses) was found in those partitions that had taxa without data. The partitions with complete taxonomic sampling had more hidden support (higher support in the total-evidence analyses compared to that in the separate-partition analyses) than hidden conflict. Mapping of geographic origins of ticks onto our phylogeny indicates an African origin for the Rhipicephalinae sensu lato (i.e., including Hyalomma spp.), the Rhipicephalus-Boophilus lineage, the Dermacentor-Anocentor lineage, and the Rhipicephalus-Boophilus-Nosomma-Hyalomma-Rhipicentor lineage. The Nosomma-Hyalomma lineage appears to have evolved in Asia. Our total-evidence phylogeny indicates that (i) the genus Rhipicephalus is paraphyletic with respect to the genus Boophilus, (ii) the genus Dermacentor is paraphyletic with respect to the genus Anocentor, and (iii) some subgenera of the genera Hyalomma and Rhipicephalus are paraphyletic with respect to other subgenera in these genera. Study of the Rhipicephalinae and Hyalomminae over the last 7 years has shown that analyses of individual datasets (e.g., one gene or morphology) seldom resolve many phylogenetic relationships, but analyses of more than one dataset can generate well-resolved phylogenies for these ticks. (C) 2001 Academic Press.
Abstract:
The 16S rRNA gene (16S rDNA) is currently the most widely used gene for estimating the evolutionary history of prokaryotes. To date, there are more than 30 000 16S rDNA sequences available from the core databases, GenBank, EMBL and DDBJ. This great number may cause a dilemma when composing datasets for phylogenetic analysis, since the choice and number of reference organisms are known to affect the resulting tree topology. A group of sequences appearing monophyletic in one dataset may not be so in another. This can be especially problematic when establishing the relationships of distantly related sequences at the division (phylum) level. In this study, a multiple-outgroup approach to resolving division-level phylogenetic relationships is suggested using 16S rDNA data. The approach is illustrated by two case studies concerning the monophyly of two recently proposed bacterial divisions, OP9 and OP10.
Abstract:
1. Schizophrenia is a chronic, disabling brain disease that affects approximately 1% of the world's population. It is characterized by delusions, hallucinations and formal thought disorder, together with a decline in socio-occupational functioning. While the causes of schizophrenia remain unknown, evidence from family, twin and adoption studies clearly demonstrates that it aggregates in families, with this clustering largely attributable to genetic rather than cultural or environmental factors. Identifying the genes involved, however, has proven to be a difficult task because schizophrenia is a complex trait characterized by an imprecise phenotype, the existence of phenocopies and the presence of low disease penetrance.
2. The current working hypothesis for schizophrenia causation is that multiple genes of small to moderate effect confer compounding risk through interactions with each other and with non-genetic risk factors. The same genes may be commonly involved in conferring risk across populations or they may vary in number and strength between different populations. To search for evidence of such genetic loci, both candidate gene and genome-wide linkage studies have been used in clinical cohorts collected from a variety of populations. Collectively, these works provide some evidence for the involvement of a number of specific genes (e.g. the 5-hydroxytryptamine (5-HT) type 2a receptor (5-HT2a) gene and the dopamine D-3 receptor gene) and as yet unidentified factors localized to specific chromosomal regions, including 6p, 6q, 8p, 13q and 22q. These data provide suggestive, but not conclusive, evidence for causative genes.
3. To enable further progress there is a need to: (i) collect fine-grained clinical datasets while searching the schizophrenia phenotype for subgroups or dimensions that may provide a more direct route to causative genes; and (ii) integrate recent refinements in molecular genetic technology, including modern composite marker maps, DNA expression assays and relevant animal models, while using the latest analytical techniques to extract maximum information in order to help distinguish a true result from a false-positive finding.