971 results for Genetic clustering analysis
Abstract:
BACKGROUND: We used four years of paediatric severe acute respiratory illness (SARI) sentinel surveillance in Blantyre, Malawi to identify factors associated with clinical severity and co-viral clustering.
METHODS: From January 2011 to December 2014, 2363 children aged 3 months to 14 years presenting to hospital with SARI were enrolled. Nasopharyngeal aspirates were tested for influenza and other respiratory viruses. We assessed risk factors for clinical severity and conducted clustering analysis to identify viral clusters in children with co-viral detection.
RESULTS: Hospital-attended influenza-positive SARI incidence was 2.0 cases per 10,000 children annually; it was highest in children aged under 1 year (6.3 cases per 10,000) and in HIV-infected children aged 5 to 9 years (6.0 cases per 10,000). A total of 605 (26.8%) SARI cases had warning signs, which were positively associated with HIV infection (adjusted risk ratio [aRR]: 2.4, 95% CI: 1.4, 3.9), RSV infection (aRR: 1.9, 95% CI: 1.3, 3.0) and rainy season (aRR: 2.4, 95% CI: 1.6, 3.8). We identified six co-viral clusters; one cluster was associated with SARI with warning signs.
CONCLUSIONS: Influenza vaccination may benefit young children and HIV-infected children in this setting. Viral clustering may be associated with SARI severity; its assessment should be included in routine SARI surveillance.
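The risk ratios above are adjusted estimates from a model the abstract does not describe. As a minimal sketch of the underlying quantities only, the following computes an incidence per 10,000 and a crude (unadjusted) risk ratio with a 95% confidence interval built on the log scale from a 2×2 table. The function names and the counts in the usage line are hypothetical illustrations, not study data.

```python
import math

def incidence_per_10000(cases, person_years):
    """Incidence expressed as cases per 10,000 person-years."""
    return 10_000 * cases / person_years

def risk_ratio(a, n1, c, n0, z=1.96):
    """Crude risk ratio: a events among n1 exposed vs c events among n0 unexposed.
    The CI uses the usual standard error of log(RR); no confounder adjustment
    (the study's aRRs come from a multivariable model, not this formula)."""
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 30/100 exposed vs 15/100 unexposed with warning signs.
print(risk_ratio(30, 100, 15, 100))
```

Because the crude ratio ignores confounders such as age or season, it can differ noticeably from an adjusted estimate on the same data.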
Abstract:
Rigid adherence to pre-specified thresholds and static graphical representations can lead to incorrect decisions on merging of clusters. As an alternative to existing automated or semi-automated methods, we developed a visual analytics approach for performing hierarchical clustering analysis of short time-series gene expression data. Dynamic sliders control parameters such as the similarity threshold at which clusters are merged and the level of relative intra-cluster distinctiveness, which can be used to identify "weak-edges" within clusters. An expert user can drill down to further explore the dendrogram and detect nested clusters and outliers. This is done by using the sliders and by pointing and clicking on the representation to cut branches of the tree at multiple heights. A prototype of this tool has been developed in collaboration with a small group of biologists for analysing their own datasets. Initial feedback on the tool has been positive.
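The tool itself is not reproduced here. As a minimal sketch of what a merge-threshold slider controls, the following runs naive average-linkage agglomerative clustering on short expression profiles and stops merging once the closest pair of clusters is farther apart than `threshold`; moving the slider corresponds to re-running with a different value. Euclidean distance, average linkage and the function names are assumptions for illustration, not necessarily the tool's choices.

```python
import math
from itertools import combinations

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    # Mean pairwise distance between the members of two clusters.
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster_at_threshold(profiles, threshold):
    """Merge the closest pair of clusters until no pair is within `threshold`.
    Returns clusters as sorted lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), average_linkage(clusters[i], clusters[j], profiles))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # equivalent to cutting the dendrogram at this height
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # combinations gives i < j, so index i is unaffected
    return sorted(clusters)

# Hypothetical 3-time-point profiles for four genes.
profiles = [[0, 1, 2], [0, 1.1, 2.1], [5, 5, 5], [5.2, 5, 4.9]]
print(cluster_at_threshold(profiles, 1.0))
```

A single global threshold cuts every branch at the same height; the multiple-height cuts described above would instead apply different thresholds to different subtrees.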
Abstract:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4,500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of tissue-characteristic expression can be considered an indication that a transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. The accuracy of the expression data was supported by comparing the macroarray data with information on known transcripts and the tissue from which each transcript was derived. Hybridization assays produced consistent tissue expression profiles, which will be useful for dissecting tissue-regulatory networks and predicting the functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform also represents an important tool for the functional investigation of novel genes: to determine expression patterns across developmental stages, to evaluate differences in muscular growth potential between chicken lines, and to identify tissue-specific genes.
Abstract:
The present study determined the distribution pattern of the hermit crab Loxopagurus loxochelis by comparing catch, depth and environmental factors at two separate bays (Caraguatatuba and Ubatuba) in São Paulo State, Brazil. The influence of these parameters on the distribution of males, non-ovigerous females and ovigerous females was also evaluated. Crabs were collected monthly over a period of one year (July 2002 to June 2003) at seven depths, from 5 to 35 m. Abiotic factors were monitored as follows: surface and bottom salinity (psu), surface and bottom temperature (°C), organic matter content (%) and sediment composition (%). In total, 366 hermit crabs were sampled in Caraguatatuba and 126 in Ubatuba. The highest frequency of occurrence was recorded at 20 m during winter (July) in Caraguatatuba and at 25 m during summer (January) in Ubatuba. The highest occurrences were recorded in regions with bottom salinities from 34 to 36 psu, bottom temperatures from 18 to 24 °C, low percentages of organic matter, gravel and mud, and a large proportion of sand in the substrate. There was no significant correlation between the total frequency of organisms and the environmental factors analyzed in either region. This suggests that other variables, such as biotic interactions, can influence the distribution pattern of L. loxochelis in the analyzed region, which is considered the northern limit of this species' distribution.
Abstract:
Background: Since establishing universal free access to antiretroviral therapy in 1996, the Brazilian Health System has increased the number of centers providing HIV/AIDS outpatient care from 33 to 540. There had been no formal monitoring of the quality of these services until a survey of 336 AIDS health centers across 7 Brazilian states was undertaken in 2002. Managers of the services were asked to assess their clinics according to parameters of service inputs and service delivery processes. This report analyzes the survey results and identifies predictors of the overall quality of service delivery. Methods: The survey involved completion of a multiple-choice questionnaire comprising 107 parameters of service inputs and processes of delivering care, with responses assessed according to their likely impact on service quality using a 3-point scale. K-means clustering was used to group these services according to their scored responses. Logistic regression analysis was performed to identify predictors of high service quality. Results: The questionnaire was completed by 95.8% (322) of the managers of the sites surveyed. Most sites scored about 50% of the benchmark expectation. K-means clustering analysis identified four quality levels within which services could be grouped: 76 services (24%) were classed as level 1 (best), 53 (16%) as level 2 (medium), 113 (35%) as level 3 (poor), and 80 (25%) as level 4 (very poor). Parameters of service delivery processes were more important than those relating to service inputs for determining the quality classification. Predictors of quality services included larger care sites, specialization for HIV/AIDS, and location within large municipalities. Conclusion: The survey demonstrated highly variable levels of HIV/AIDS service quality across the sites. Many sites were found to have deficiencies in service delivery processes that could benefit from quality improvement initiatives. 
These findings could have implications for how HIV/AIDS services are planned in Brazil to achieve quality standards, for example where service sites should be located, their size and their staffing requirements. A set of service delivery indicators has been identified that could be used for routine monitoring of HIV/AIDS service delivery in Brazil (and potentially in other similar settings).
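The survey's exact k-means configuration (initialisation, number of restarts, software) is not reported. As a minimal sketch of the technique, the following is Lloyd's algorithm on score vectors, alternating assignment to the nearest centroid with centroid updates until assignments stop changing. Initialising from the first k points is an assumption made for determinism; in practice several random starts are common. The score vectors in the usage line are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 2-parameter quality scores for six services.
scores = [[1, 1], [1.2, 0.9], [8, 8], [8.1, 7.9], [0.9, 1.1], [7.8, 8.2]]
print(kmeans(scores, 2))
```

With k = 4 and the 107 scored parameters as dimensions, the same procedure would yield four groups that can then be ranked from best to worst by their centroids.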
Abstract:
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), are powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still needed. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, which indexes sequenced tags according to their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. To compare data before and after filtering, a hierarchical clustering analysis was performed on both datasets, using samples from the same type of tissue in distinct biological conditions. Our results provide evidence suggesting that more congruous clusters can be found after using the S3T scoring system. Conclusion: These results substantiate the proposed application for generating more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.
Abstract:
In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because either the rule is tested on tissue samples that were used in the first instance to select the genes used in the rule, or the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
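A minimal sketch of the correction described above: gene selection is repeated inside every training fold, so the held-out samples never influence which genes are chosen. The selection score (absolute difference of class means), the nearest-centroid classifier and assigning folds by index modulo `n_folds` are simple stand-ins chosen for brevity and determinism; they are not the rules used with the published data sets. Labels are assumed to be 0/1 with both classes present in every training fold.

```python
def select_genes(X, y, n_genes):
    """Rank genes by |mean(class 1) - mean(class 0)| (a simple stand-in score)."""
    scores = []
    for g in range(len(X[0])):
        v0 = [x[g] for x, lab in zip(X, y) if lab == 0]
        v1 = [x[g] for x, lab in zip(X, y) if lab == 1]
        scores.append((abs(sum(v1) / len(v1) - sum(v0) / len(v0)), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_genes]]

def nearest_centroid(X, y, genes, x_new):
    """Predict the class whose centroid (on the selected genes) is closest."""
    dist = {}
    for lab in (0, 1):
        rows = [x for x, l2 in zip(X, y) if l2 == lab]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist[lab] = sum((x_new[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(dist, key=dist.get)

def external_cv_error(X, y, n_genes, n_folds=10):
    """Cross-validated error with gene selection performed inside each training fold."""
    errors = 0
    for f in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != f]
        held_out = [i for i in range(len(X)) if i % n_folds == f]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = select_genes(X_tr, y_tr, n_genes)  # sees training data only
        errors += sum(nearest_centroid(X_tr, y_tr, genes, X[i]) != y[i] for i in held_out)
    return errors / len(X)
```

The biased variant the abstract warns against would call `select_genes` once on all samples before the loop; with many noise genes and few samples, that version can report near-zero error even when the labels carry no signal.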
Abstract:
Paget's disease of bone is a common condition characterized by bone pain, deformity, pathological fracture, and an increased incidence of osteosarcoma. Genetic factors play a role in the pathogenesis of Paget's disease, but the molecular basis remains largely unknown. Susceptibility loci for Paget's disease of bone have been mapped to chromosome 6p21.3 (PDB1) and 18q21.1-q22 (PDB2) in different pedigrees. We have identified a large pedigree of over 250 individuals with 49 informative individuals affected with Paget's disease of bone, 31 of whom are available for genotypic analysis. The disease is inherited as an autosomal dominant trait in the pedigree, with high penetrance by the sixth decade. Linkage analysis has been performed with markers at PDB1; these data show significant exclusion of linkage, with log10 of the odds ratio (LOD) scores < -2 in this region. Linkage analysis of microsatellite markers from the PDB2 region has excluded linkage with this region, with a 30 cM exclusion region (LOD score < -2.0) centered on D18S42. These data confirm the genetic heterogeneity of Paget's disease of bone. Our hypothesis is that a novel susceptibility gene relevant to the pathogenesis of Paget's disease of bone lies elsewhere in the genome in the affected members of this pedigree and will be identified using a microsatellite genomewide scan followed by positional cloning.
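For orientation on the exclusion criterion above, a two-point LOD score compares the likelihood of the data at recombination fraction θ with the likelihood under free recombination (θ = 0.5); values below -2 are conventionally taken to exclude linkage at that θ. The sketch below assumes fully informative, phase-known meioses, which is a strong simplification: analyses of real pedigrees such as this one use likelihood software that handles unknown phase, penetrance and missing genotypes. The counts in the usage line are hypothetical.

```python
import math

def lod(recombinants, non_recombinants, theta):
    """Two-point LOD for phase-known meioses; requires 0 < theta <= 0.5."""
    n = recombinants + non_recombinants
    log_l_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination
    return log_l_linked - log_l_null

# Hypothetical: 5 recombinants in 10 meioses, evaluated at theta = 0.1.
print(round(lod(5, 5, 0.1), 3))
```

Evaluating the score over a grid of θ values and map positions is what produces an exclusion region, such as the 30 cM interval around D18S42.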
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. Working in this reduced space allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
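The intermediate complexity mentioned above can be made concrete. In the standard mixture-of-factor-analyzers formulation (the symbols B_i for the p × q loading matrix, D_i for a diagonal matrix and g for the number of components are notation assumed here), each component covariance and its free-parameter count are:

```latex
\Sigma_i = B_i B_i^{\top} + D_i, \qquad i = 1, \dots, g

\#\text{free parameters in } \Sigma_i
  = pq + p - \tfrac{1}{2}q(q-1)
  \quad\text{vs.}\quad \tfrac{1}{2}p(p+1) \ \text{(full)}, \qquad 1 \ \text{(isotropic } \sigma_i^2 I_p\text{)}

% Worked example: p = 100, q = 2
pq + p - \tfrac{1}{2}q(q-1) = 200 + 100 - 1 = 299
  \quad\text{vs.}\quad \tfrac{1}{2}\cdot 100 \cdot 101 = 5050
```

The subtracted term reflects that B_i is only identified up to an orthogonal rotation, so choosing a small q keeps the count linear rather than quadratic in p.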
Abstract:
Bethylidae specimens from the Reserve were studied with respect to their ecological and faunistic aspects. The material was collected with Malaise and window traps simultaneously in ten different areas of the Reserve over four years. The total numbers of genera and specimens were analyzed. Diversity and evenness indices were used to characterize the community ecology. Clustering analyses of localities and genera were performed. Nine genera of Bethylidae were found in the Reserve, with Pseudisobrachium Kieffer, 1904 and Apenesia Westwood, 1874 being the most common. The window trap was more efficient than the Malaise trap in terms of genus diversity.
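The abstract does not name the diversity and evenness indices used. As an illustrative sketch, assuming the commonly used Shannon index H' (natural logarithm) and Pielou's evenness J' = H' / ln S computed from specimen counts per genus, the calculation looks like this. The counts in the usage line are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def pielou(counts):
    """Pielou evenness J' = H' / ln(S), where S is the observed number of taxa."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Hypothetical specimen counts for nine genera in one area.
area = [120, 85, 10, 6, 4, 3, 2, 1, 1]
print(round(shannon(area), 3), round(pielou(area), 3))
```

J' equals 1 when all genera are equally abundant, so an area dominated by two genera, as reported here, would score well below 1.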
Abstract:
Dissertation presented as a partial requirement for obtaining the Master's degree in Statistics and Information Management
Abstract:
Systematics is the study of the diversity of organisms and their relationships, comprising classification, nomenclature and identification. Classification, or taxonomy, is the arrangement of organisms into groups (taxa); nomenclature is the assignment of correct international scientific names to organisms; and identification is the placement of unknown strains into the groups derived from classification. A prior classification is therefore required for a stable nomenclature and accurate identification. The new era of bacterial systematics can be traced to the introduction and application of new taxonomic concepts and techniques in the 1950s and 1960s. Important progress was achieved with numerical taxonomy and molecular taxonomy. Molecular taxonomy, made possible by the emergence of molecular biology resources, has provided knowledge relevant to bacterial systematics, particularly where there is strong evolutionary interest or where environmental interference must be eliminated. Studying the composition and arrangement of nucleotides in certain portions of the genetic material means studying the genome, which is much less susceptible to environmental alterations than the proteins it encodes. Molecular taxonomy can examine both DNA and RNA, and the main techniques used in systematics include the construction of restriction maps, DNA-DNA hybridization, DNA-RNA hybridization, DNA sequencing, sequencing of the 16S and 23S rRNA subunits, RAPD, RFLP, PFGE, etc. Techniques such as base sequencing, although extremely sensitive and highly precise, are relatively costly and impractical for the great majority of bacterial taxonomy laboratories. Several specialized techniques have been applied to taxonomic studies of microorganisms. 
In recent years, these have included preliminary electrophoretic analysis of soluble proteins and isoenzymes, followed by determination of deoxyribonucleic acid base composition and assessment of base sequence homology by means of DNA-RNA hybridization experiments, among others. As expected, these various techniques have generally indicated a lack of taxonomic information in microbial systematics. Numerous techniques and methodologies make bacterial identification and classification possible, some of which are described here, allowing different degrees of subspecific and interspecific similarity to be established through phenetic-genetic polymorphism analysis. However, the need to use more than one technique to better establish degrees of similarity among microorganisms has been pointed out: data obtained from a single technique in isolation may not provide significant information from the viewpoint of bacterial systematics.
Abstract:
ABSTRACT - Background/Objectives: Cancer is the second leading cause of death in Portugal and has a profound psychosocial impact, not only because of its high incidence and mortality but also because of the enormous costs involved in its prevention, treatment and rehabilitation. According to previous studies, there are geographic disparities in cancer incidence. It is therefore essential to characterize and analyse the different distributions in time and space in order to control the disease and promote health, while also contributing to a better understanding of the disease's aetiology. This project comprises 3 main objectives: to characterize the spatio-temporal distribution of lung cancer and stomach cancer, separately and together, in the southern region of mainland Portugal (covered by ROR-Sul) over the period 2000 to 2008, seeking to identify potential risk areas for the development of these tumours. Methodology: First, a descriptive study of the incidence rates of these tumours by age, sex, year and district was carried out. Then, to identify areas of high incidence, a spatio-temporal clustering analysis of incidence rates was performed at the municipality (concelho) level in the study region for 2000-2008. Results: The descriptive analysis showed that both tumours are more frequent in men than in women and are likewise more frequent in people over 75 years of age. The spatio-temporal clustering analysis revealed a heterogeneous geographic pattern in the incidence of both tumours, yielding 3 clusters for stomach cancer and 2 clusters for lung cancer (p < 0.001). The stomach cancer clusters lie mostly in the Alentejo region and the lung cancer clusters in the Greater Lisbon region.
Conclusions: The results of the clustering analysis showed a heterogeneous distribution of the incidence of the two cancers in the region and period studied. The high-risk areas identified differ between the two tumours. The region with the highest risk of stomach cancer is the Alentejo, and the highest risk of lung cancer is in the district of Lisbon.
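The abstract does not name the spatio-temporal clustering method. Scan statistics such as Kulldorff's, as implemented in SaTScan, are a common choice for this kind of analysis; as a hedged sketch assuming that approach, the Poisson log-likelihood ratio for one candidate cluster (a set of municipalities over a time window) depends only on the observed cases c inside it, the expected cases inside it under the null hypothesis, and the total cases C in the region. The candidate with the largest ratio is the most likely cluster; its p-value would come from Monte Carlo replication, which is not shown. The numbers in the usage line are hypothetical.

```python
import math

def poisson_llr(c, expected, total):
    """Log-likelihood ratio for a high-rate candidate cluster (Poisson model).
    c: observed cases inside; expected: expected cases inside under the null,
    scaled so that expected cases over the whole region equal `total`."""
    if c <= expected:
        return 0.0  # only elevated-risk candidates are scored here
    outside = total - c
    llr = c * math.log(c / expected)
    if outside > 0:
        llr += outside * math.log(outside / (total - expected))
    return llr

# Hypothetical candidate: 30 observed vs 10 expected cases, 100 cases region-wide.
print(round(poisson_llr(30, 10.0, 100), 3))
```

Expected counts would typically be derived from age- and sex-specific rates, consistent with the descriptive analysis by age and sex above.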
Abstract:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins combined with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified in the resins of donor plants, but a major challenge for standardizing the chemical composition and biological effects of propolis remains a better understanding of the influence of seasonality on the chemical constituents of that raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unravelling the effect of harvest season on the chemical profile of propolis is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. Accordingly, UV-Visible (UV-Vis) scanning spectrophotometry was applied to hydroalcoholic extracts (HE) of seventy-three propolis samples collected over the seasons in 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn) in Southern Brazil. Machine learning and chemometric techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, as determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles combined with chemometric analysis allowed a typical pattern to be identified in propolis samples collected in the summer. 
Importantly, the discrimination based on PCA could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that, besides their biological activities, these secondary metabolites also play a relevant role in the discrimination and classification of this complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-nearest neighbors (kNN), and decision trees, proved complementary to PCA and HCA, providing relevant information for sample discrimination.
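The authors' R scripts are not reproduced here. As a minimal sketch of what PCA does with such spectra, the following mean-centres each wavelength (column), forms the sample covariance matrix, and extracts the first principal component by power iteration; each sample's score on that component is what a PCA score plot displays. Pure Python, the function name and the fixed non-uniform start vector are assumptions made to keep the example self-contained; real spectral datasets would use an optimised linear-algebra routine.

```python
import math

def first_principal_component(X, max_iter=500, tol=1e-12):
    """Return (loading vector, scores) for the first PC of X (rows = samples)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # mean-centre columns
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(max_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector has no component along the data; try another")
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores

# Hypothetical absorbances at two wavelengths for four samples.
loading, scores = first_principal_component([[0, 0], [1, -1], [2, -2], [3, -3]])
print(loading, scores)
```

Restricting the columns to wavelengths in the 280-400 nm window, as described above, simply means passing only those columns as X.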
Abstract:
The relationships between environmental factors and temporal and spatial variations of benthic communities on three rocky shores in the state of Espírito Santo, Southeast Brazil, were studied. Sampling was conducted every three months, from August 2006 to May 2007, using intersection points. Chthamalus bisinuatus (Pilsbry, 1916) (Crustacea) and Brachidontes spp. (Mollusca) were the most abundant taxa, occupying the upper level of the intertidal zone of the rocky shore. Species richness was higher at the lower levels. The invasive species Isognomon bicolor (C. B. Adams, 1845) (Mollusca) occurred at low densities in the studied areas. The clustering analysis dendrogram indicated a separation of communities into exposed and sheltered areas. According to the analyses of variance, the communities differed significantly among the studied areas and seasons. The extent of wave exposure and shore slope influenced species variability. The Setibão site showed the highest diversity and richness, most likely due to greater wave exposure. The communities showed greater variation at the lower levels, where environmental conditions were less severe than at the other levels.
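The abstract does not state which dissimilarity measure underlies the dendrogram. Bray-Curtis dissimilarity on cover or abundance data is a common choice in benthic community studies; a minimal sketch, assuming per-taxon cover values for each site (the values in the usage line are hypothetical), is shown below. The resulting matrix would then be passed to an agglomerative method such as average linkage to draw the dendrogram.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = no shared taxa."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def dissimilarity_matrix(sites):
    """Pairwise Bray-Curtis dissimilarities between site composition vectors."""
    return [[bray_curtis(a, b) for b in sites] for a in sites]

# Hypothetical % cover of three taxa at three sites.
sites = [[40, 30, 5], [35, 25, 10], [5, 10, 60]]
for row in dissimilarity_matrix(sites):
    print([round(d, 3) for d in row])
```

Unlike Euclidean distance, Bray-Curtis ignores joint absences, so two sites are not judged similar merely because both lack the same rare taxa.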