Biblioteca Digital

833 results for complete linkage clustering

Novel Cone Transducin Alpha Subunit Mutation In Tunisian Patients And Genotype-phenotype Correlation In Complete Achromatopsia

Relevance: 30.00%

Abstract:

Purpose: Complete achromatopsia is a rare autosomal recessive disease due to CNGA3, CNGB3, GNAT2 and PDE6C mutations. We studied a large consanguineous Tunisian family including twelve individuals. Methods: Ophthalmic evaluation included a full clinical examination, color vision testing, optical coherence tomography (OCT) and electroretinography (ERG). Linkage analysis using microsatellite markers flanking the CNGA3, CNGB3, GNAT2 and PDE6C genes was performed. Mutations were screened by direct sequencing. Results: In all affected subjects, acuity ranged from 20/50 to 20/200. Fundus examination was normal except for two patients who had peripheral congenital hypertrophy of 4 mm and 5 mm diameter, respectively. Likewise, exploration of the retinal layers by OCT revealed no change in the thickness of the central retina. Color vision testing with the Farnsworth 100 Hue test showed a profound color impairment along all three axes of color vision. The haplotype analysis of GNAT2 markers revealed that all affected offspring were homozygous by descent for the four polymorphic markers. The maximum lod score value, 4.33, confirmed the evidence for linkage to the GNAT2 gene. A novel homozygous nonsense mutation, R313X, was identified segregating with an identical GNAT2 haplotype in all affected subjects. This mutation could interrupt interaction with photoactivated rhodopsin, resulting in a failure of visual transduction. In fact, ERG showed clearly abolished photopic b-wave and flicker responses with no residual cone function, justifying the severe GNAT2 achromatopsia phenotype. Conclusions: This is the first report of the clinical and genetic investigation of complete achromatopsia in North Africa and of the largest family with recessive achromatopsia involving GNAT2, thus providing a unique opportunity for genotype-phenotype correlation for this extremely rare condition.
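
The lod score quoted in the abstract above is a base-10 log likelihood ratio comparing linkage at a recombination fraction theta against free recombination (theta = 0.5). The sketch below shows the simplest textbook case, assuming phase-known and fully informative meioses; the function name and the example counts are hypothetical, and the study itself most likely relied on pedigree-analysis software rather than this closed form.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10(L(theta) / L(0.5)) for phase-known, fully informative meioses.

    Simplifying assumption: every meiosis can be scored unambiguously as
    recombinant or non-recombinant.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    k, n = recombinants, meioses
    likelihood_linked = theta ** k * (1.0 - theta) ** (n - k)
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        # An observed recombinant is impossible under theta = 0.
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


if __name__ == "__main__":
    # Hypothetical data: 10 informative meioses, no recombinants.
    for theta in (0.0, 0.05, 0.1, 0.2):
        print(f"theta={theta:.2f}  LOD={lod_score(0, 10, theta):.2f}")
```

By convention a lod score of 3 or more is usually taken as significant evidence of linkage, which is why a maximum of 4.33 is reported as confirming linkage.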

Analysis of the ADRA2A gene (alpha-2A adrenergic receptor) in patients with attention deficit hyperactivity disorder

Abstract:

Attention deficit hyperactivity disorder (ADHD) is defined clinically as a behavioral disturbance characterized by inattention, hyperactivity, and impulsivity. These features are classified into three subtypes: inattentive, hyperactive-impulsive, and combined. Clinically, a broad spectrum is described that includes academic difficulties, learning disorders, cognitive deficits, conduct disorders, antisocial personality, poor interpersonal relationships, and increased anxiety, all of which may persist into adulthood. Global prevalence has been estimated at between 1% and 22%, varying widely with age, place of origin, and social characteristics. In Colombia, studies carried out in Bogotá and Antioquia have established prevalences of 5% and 15%, respectively. The specific cause has not been fully elucidated; however, heritability has been estimated at close to 80% in some populations, which demonstrates the fundamental role of genetics in the etiology of the disorder. The genetic factors involved are related to neurochemical changes in the dopaminergic, serotonergic, and noradrenergic systems, particularly in the frontal-subcortical systems and the prefrontal cerebral cortex, in its ventral, medial, and dorsolateral regions and the anterior portion of the cingulate. Based on data from previous studies suggesting polygenic multifactorial inheritance, continuous efforts have been made to search for candidate genes using different strategies. In particular, alpha-2 adrenergic receptors are found in the cerebral cortex, where they are involved in association and memory functions, and they are the site of action of drugs commonly used to treat this disorder, which is the main evidence linking this receptor to the development of ADHD.
To date, more than 80 polymorphisms have been described in the ADRA2A gene, some of which have been associated with the disorder. However, the results are controversial and vary with the diagnostic methodology used, the population studied, patient history, and comorbidities. This work aims to establish whether variations in the coding sequence of the ADRA2A gene could be related to the ADHD phenotype.

Soft topographic map for clustering and classification of bacteria

Abstract:

In this work a new method for clustering and building a topographic representation of a bacterial taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacterial taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.

A genetic linkage map of microsatellite, gene-specific and morphological markers in diploid Fragaria

Abstract:

Diploid Fragaria provide a potential model for genomic studies in the Rosaceae. To develop a genetic linkage map of diploid Fragaria, we scored 78 markers (68 microsatellites, one sequence-characterised amplified region, six gene-specific markers and three morphological traits) in an interspecific F2 population of 94 plants generated from a cross of F. vesca f. semperflorens × F. nubicola. Co-segregation analysis arranged 76 markers into seven discrete linkage groups covering 448 cM, with linkage group sizes ranging from 100.3 cM to 22.9 cM. Marker coverage was generally good; however, some clustering of markers was observed on six of the seven linkage groups. Segregation distortion was observed at a high proportion of loci (54%), which could reflect the interspecific nature of the progeny and, in some cases, the self-incompatibility of F. nubicola. Such distortion may also account for some of the marker clustering observed in the map. One of the morphological markers, pale-green leaf (pg), has not previously been mapped in Fragaria and was located to the mid-point of linkage group VI. The transferable nature of the markers used in this study means that the map will be ideal for use as a framework for additional marker incorporation aimed at enhancing and resolving map coverage of the diploid Fragaria genome. The map also provides a sound basis for linkage map transfer to the cultivated octoploid strawberry.
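
Segregation distortion at a locus is commonly assessed with a chi-square goodness-of-fit test against the Mendelian expectation. The sketch below assumes a codominant marker in an F2 population, where the expected genotype ratio is 1:2:1; the function name and the genotype counts are hypothetical, and the abstract does not state which test or threshold the authors actually used.

```python
import math


def f2_segregation_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of observed F2 genotype counts against a 1:2:1 ratio.

    Returns (chi2 statistic, p-value). With three genotype classes there are
    2 degrees of freedom, for which the chi-square survival function is
    exactly exp(-x / 2), so no statistics library is needed.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one plant must be scored")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts for one marker scored in 94 F2 plants.
    chi2, p = f2_segregation_chi2(35, 45, 14)
    verdict = "distorted" if p < 0.05 else "consistent with 1:2:1"
    print(f"chi2={chi2:.2f}, p={p:.4f} -> {verdict}")
```

Dominant markers would need a 3:1 expectation with one degree of freedom instead, so this sketch covers only the codominant case.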

Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array

Abstract:

Background: A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high-throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results: Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or misassignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions: We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the ‘Golden Delicious’ reference sequence will assist in the continued improvement of the genome sequence assembly for that variety.
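
The summary figures in this abstract can be re-derived from the counts it quotes. The short check below is only an arithmetic sketch; the constant and helper names are made up for illustration. It suggests that the 13.7% figure corresponds to 311 out of the 2,272 mapped SNP markers, and that the 0.5 cM/marker density corresponds to the map length divided by all 2,579 mapped loci.

```python
# Figures quoted in the abstract above.
MAP_LENGTH_CM = 1282.2
N_SNP, N_SSR, N_S_LOCUS = 2272, 306, 1
ARRAY_MALUS_SNPS = 7867


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_markers = N_SNP + N_SSR + N_S_LOCUS    # all mapped loci
density = MAP_LENGTH_CM / total_markers      # average spacing, cM per marker

if __name__ == "__main__":
    print(f"mapped loci: {total_markers}")
    print(f"density: {density:.2f} cM/marker")
    print(f"conflicting SNP positions: {pct(311, N_SNP)} % of mapped SNPs")
    print(f"heterozygous in one parent: {pct(1823, ARRAY_MALUS_SNPS)} %")
    print(f"heterozygous in both parents: {pct(1007, ARRAY_MALUS_SNPS)} %")
```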

Heterogeneous tensor decomposition for clustering via manifold optimization

Abstract:

Tensor clustering is an important tool that exploits intrinsically rich structures in real-world multiarray or tensor datasets. In dealing with such datasets, standard practice is often to use subspace clustering based on vectorizing the multiarray data. However, vectorization of tensorial data does not exploit the complete structure information. In this paper, we propose a subspace clustering algorithm that does not adopt any vectorization process. Our approach is based on a novel heterogeneous Tucker decomposition model that takes cluster membership information into account. We propose a new clustering algorithm that alternates between different modes of the proposed heterogeneous tensor model. All but the last mode have closed-form updates. Updating the last mode reduces to optimizing over the multinomial manifold, for which we investigate second-order Riemannian geometry and propose a trust-region algorithm. Numerical experiments show that our proposed algorithm competes effectively with state-of-the-art clustering algorithms that are based on tensor factorization.

Contributions to Clustering Processes Based on Non-Euclidean Metrics

Abstract:

In this work we present a new clustering method that groups the points of a data set into classes. The method is based on an algorithm that links auxiliary clusters obtained with traditional vector quantization techniques. Several approaches developed during the work are described, based on measures of distance or dissimilarity (divergence) between the auxiliary clusters. The new method requires only two pieces of a priori information: the number of auxiliary clusters Na and a threshold distance dt, which is used to decide whether or not to link auxiliary clusters. The number of classes can be found automatically by the method, based on the chosen threshold distance dt, or it can be supplied as additional information to help in choosing the correct threshold. Several analyses are carried out and the results are compared with traditional clustering methods. Different dissimilarity metrics are analyzed, and a new one is proposed based on the concept of negentropy. Besides grouping the points of a set into classes, a method is proposed for statistically modeling the classes, with the aim of obtaining an expression for the probability that a point belongs to each class. Experiments with several values of Na and dt are run on test sets, and the results are analyzed to study the robustness of the method and to consider heuristics for choosing the correct threshold. The work also explores aspects of information theory applied to the calculation of the divergences, specifically the different measures of information and divergence based on the Rényi entropy. The results obtained with the different metrics are compared and discussed. The work also includes an appendix presenting real applications of the proposed method.
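
The linking step described above can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it assumes the Na auxiliary clusters have already been found by some vector quantizer and are summarized by their centroids, and it uses plain Euclidean distance between centroids as a stand-in for the divergence measures studied in the work. The function name `link_auxiliary_clusters` is hypothetical.

```python
import math
from itertools import combinations


def link_auxiliary_clusters(centroids, dt):
    """Merge auxiliary clusters whose centroids lie within distance dt.

    Linking is transitive: if A links to B and B links to C, all three end up
    in the same class. Returns one class label per auxiliary cluster; the
    number of distinct labels is the number of classes implied by dt.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(centroids)), 2):
        # Euclidean distance is an assumption standing in for a divergence.
        if math.dist(centroids[i], centroids[j]) <= dt:
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(centroids))]


if __name__ == "__main__":
    # Hypothetical centroids of Na = 5 auxiliary clusters in the plane.
    aux = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
    for dt in (0.5, 1.5, 20.0):
        labels = link_auxiliary_clusters(aux, dt)
        print(f"dt={dt}: labels={labels}, classes={len(set(labels))}")
```

Sweeping dt as in the usage loop shows how the threshold controls the number of classes, which is the trade-off the abstract's robustness experiments examine.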

A linkage map for the B-genome of Arachis (Fabaceae) and its synteny to the A-genome

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

A clustering method for robust and reliable large scale functional and structural protein sequence annotation

Abstract:

In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the major problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.

Linkage mapping of ovine microphthalmia to chromosome 23, the sheep orthologue of human chromosome 18

Abstract:

PURPOSE: To characterize the phenotype and map the locus responsible for autosomal recessive inherited ovine microphthalmia (OMO) in sheep. METHODS: Microphthalmia-affected lambs and their available relatives were collected in a field, and experimental matings were performed to obtain affected and normal lambs for detailed necropsy and histologic examinations. The matings resulted in 18 sheep families with 48 cases of microphthalmia. A comparative candidate gene approach was used to map the disease locus within the sheep genome. Initially, 27 loci responsible for the microphthalmia-anophthalmia phenotypes in humans or mice were selected to test for comparative linkage. Fifty flanking markers that were predicted from comparative genomic analysis to be closely linked to these genes were tested for linkage to the disease locus. After observation of statistical evidence for linkage, a confirmatory fine mapping strategy was applied by further genotyping of 43 microsatellites. RESULTS: The clinical and pathologic examinations showed slightly variable expressivity of isolated bilateral microphthalmia. The anterior eye chamber was small or absent, and a white mass admixed with cystic spaces extended from the papilla to the anterior eye chamber, while no recognizable vitreous body or lens was found within the affected eyes. Significant linkage to a single candidate region was identified at sheep chromosome 23. Fine mapping and haplotype analysis assigned the candidate region to a critical interval of 12.4 cM. This ovine chromosome segment encompasses an ancestral chromosomal breakpoint corresponding to two orthologous segments of human chromosome 18, on the short and long arms. For the examined animals, we excluded the complete coding region and adjacent intronic regions of ovine TGIF1 as harboring disease-causing mutations. CONCLUSIONS: This is the first genetic localization for hereditary ovine isolated microphthalmia. 
It seems unlikely that a mutation in the TGIF1 gene is responsible for this disorder. The studied sheep represent a valuable large animal model for similar human ocular phenotypes.

Molecular epidemiology and genetic linkage of macrolide and aminoglycoside resistance in Staphylococcus intermedius of canine origin

Abstract:

A collection of 77 Staphylococcus intermedius isolates from dogs and cats in Switzerland was examined for resistance to erythromycin. Resistance profiles for 14 additional antibiotics were compared between erythromycin-resistant and susceptible isolates. A resistance prevalence of 27% for erythromycin was observed in the population under study. Complete correlation between resistance to erythromycin, and to spiramycin, streptomycin, and neomycin was observed. The erythromycin-resistant isolates all had a reduced susceptibility to clindamycin when compared to the erythromycin-susceptible isolates. Both constitutive and inducible resistance phenotypes were observed for clindamycin. Ribotyping showed that macrolide-aminoglycoside resistance was randomly distributed among unrelated strains. This suggests that this particular resistance profile is not related to a single bacterial clone but to the horizontal transfer of resistance gene clusters in S. intermedius populations. The erythromycin-resistant isolates were all carrying erm(B), but not erm(A), erm(C), or msr(A). The erm(B) gene was physically linked to Tn5405-like elements known as resistance determinants for streptomycin, streptothricin, neomycin and kanamycin. Analysis of the region flanking erm(B) showed the presence of two different groups of erm(B)-Tn5405-like elements in the S. intermedius population examined and of elements found in Gram-positive species other than staphylococci. This strongly suggests that erm(B) or the whole erm(B)-Tn5405-like elements in S. intermedius originate from other bacterial species, possibly from enterococci.

Incidence of AIDS-Defining and Other Cancers in HIV-Positive Children in South Africa: Record Linkage Study.

Abstract:

BACKGROUND: Little is known about the risk of cancer in HIV-positive children in sub-Saharan Africa. We examined incidence and risk factors of AIDS-defining and other cancers in pediatric antiretroviral therapy (ART) programs in South Africa. METHODS: We linked the records of five ART programs in Johannesburg and Cape Town to those of pediatric oncology units, based on name and surname, date of birth, folder and civil identification numbers. We calculated incidence rates and obtained hazard ratios (HR) with 95% confidence intervals (CI) from Cox regression models including ART, sex, age, and degree of immunodeficiency. Missing CD4 counts and CD4% were multiply imputed. Immunodeficiency was defined according to World Health Organization 2005 criteria. RESULTS: Data of 11,707 HIV-positive children were included in the analysis. During 29,348 person-years of follow-up, 24 cancers were diagnosed, for an incidence rate of 82 per 100,000 person-years (95% CI 55-122). The most frequent cancers were Kaposi sarcoma (34 per 100,000 person-years) and non-Hodgkin lymphoma (31 per 100,000 person-years). The incidence of non-AIDS-defining malignancies was 17 per 100,000. The risk of developing cancer was lower on ART (HR 0.29, 95% CI 0.09-0.86), and increased with age at enrolment (>10 versus <3 years: HR 7.3, 95% CI 2.2-24.6) and immunodeficiency at enrolment (advanced/severe versus no/mild: HR 3.5, 95% CI 1.1-12.0). The HR for the effect of ART from complete case analysis was similar but ceased to be statistically significant (p=0.078). CONCLUSIONS: Early HIV diagnosis and linkage to care, with start of ART before advanced immunodeficiency develops, may substantially reduce the burden of cancer in HIV-positive children in South Africa and elsewhere.
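
The headline incidence rate can be reproduced from the event count and follow-up time given above. The sketch below uses a normal approximation on the log scale for the confidence interval (standard error of the log rate taken as 1/sqrt(events)); this is an assumption about the method, since the abstract does not say how the interval was computed, although the approximation does match the reported 55-122 after rounding. The function name is hypothetical.

```python
import math


def incidence_rate_per_100k(events: int, person_years: float, z: float = 1.96):
    """Incidence rate per 100,000 person-years with an approximate 95% CI.

    The interval uses a normal approximation on the log scale, with
    SE(log rate) = 1 / sqrt(events), which assumes Poisson-distributed counts.
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = events / person_years * 100_000
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


if __name__ == "__main__":
    # Counts taken from the abstract: 24 cancers in 29,348 person-years.
    rate, lower, upper = incidence_rate_per_100k(24, 29_348)
    print(f"{rate:.0f} per 100,000 person-years (95% CI {lower:.0f}-{upper:.0f})")
```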

Theoretical and experimental studies of linkage disequilibrium in human populations

Abstract:

Linkage disequilibrium (LD) is defined as the nonrandom association of alleles at two or more loci in a population and may be a useful tool in a diverse array of applications including disease gene mapping, elucidating the demographic history of populations, and testing hypotheses of human evolution. However, the successful application of LD-based approaches to pertinent genetic questions is hampered by a lack of understanding about the forces that mediate the genome-wide distribution of LD within and between human populations. Delineating the genomic patterns of LD is a complex task that will require interdisciplinary research that transcends traditional scientific boundaries. The research presented in this dissertation is predicated upon the need for interdisciplinary studies and both theoretical and experimental projects were pursued. In the theoretical studies, I have investigated the effect of genotyping errors and SNP identification strategies on estimates of LD. The primary importance of these two chapters is that they provide important insights and guidance for the design of future empirical LD studies. Furthermore, I analyzed the allele frequency distribution of 26,530 single nucleotide polymorphisms (SNPs) in three populations and generated the first-generation natural selection map of the human genome, which will be an important resource for explaining and understanding genomic patterns of LD. Finally, in the experimental study, I describe a novel and simple, low-cost, and high-throughput SNP genotyping method. The theoretical analyses and experimental tools developed in this dissertation will facilitate a more complete understanding of patterns of LD in human populations.
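
For two biallelic loci, the nonrandom association described above is usually quantified with the coefficient D = p_AB - p_A * p_B and its normalized forms D' and r². The sketch below computes all three from haplotype and allele frequencies; the function name and the example frequencies are hypothetical, and the dissertation's own estimators may differ (for instance, when only unphased genotypes are available).

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    # D' scales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical frequencies: complete association vs. independence.
    print(ld_measures(0.50, 0.5, 0.5))  # alleles A and B always occur together
    print(ld_measures(0.25, 0.5, 0.5))  # alleles associate at random
```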

Semi-supervised subspace clustering and applications to neuroscience

Abstract:

Machine learning techniques are used for extracting valuable knowledge from data. Nowadays, these techniques are becoming even more important due to the evolution in data acquisition and storage, which is leading to data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied by advances in machine learning techniques to solve new challenges that might arise, in both academic and real applications. There are several machine learning techniques, depending on both data characteristics and purpose. Unsupervised classification, or clustering, is one of the best-known techniques when data lack supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data) and its aim is to make predictions about the labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled whereas the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because of ignorance about, or the cost of, the labeling process, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., discovering clusters using available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance or not of data features. Data are characterized by features, but it is possible that not all of them are relevant, or equally relevant, for the learning process. 
A recent trend in clustering, related to data relevance and called subspace clustering, holds that different clusters might be described by different feature subsets. This differs from traditional solutions to the data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As noted above, clustering validation is a difficult task due to the absence of data labels. Although there are many indices that can be used to assess the quality of clustering solutions, these validations depend on clustering algorithms and data characteristics. Hence, in the first goal, three known clustering algorithms are used to cluster data with outliers and noise, to critically study how some of the best-known validation indices behave. The main goal of this work, however, is to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated by using either known indices or expert opinions. Two different algorithms are proposed from different points of view to discover clusters characterized by different subspaces. In the first algorithm, available data labels are used to search for subspaces first, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. Subspaces are then used to find clusters using traditional clustering techniques. The second algorithm uses available data labels to search for subspaces and clusters at the same time in an iterative process. This algorithm assigns each instance to each cluster based on a membership probability (soft clustering) and is based on integrating known labels and the search for subspaces into a model-based clustering approach. 
The different proposals are tested using different real and synthetic databases, and comparisons to other methods are also included when appropriate. Finally, as an example of a real and current application, different machine learning techniques, including one of the proposals of this work (the most sophisticated one), are applied to a task within one of the most challenging biological problems today: modeling the human brain. Specifically, expert neuroscientists do not agree on a neuron classification for the cerebral cortex, which makes not only any modeling attempt but also day-to-day work impossible without a common way to name neurons. Therefore, machine learning techniques may help to reach an accepted solution to this problem, which could be an important milestone for future research in neuroscience.

A complete genome screen for genes predisposing to severe bipolar disorder in two Costa Rican pedigrees

Abstract:

Bipolar mood disorder (BP) is a debilitating syndrome characterized by episodes of mania and depression. We designed a multistage study to detect all major loci predisposing to severe BP (termed BP-I) in two pedigrees drawn from the Central Valley of Costa Rica, where the population is largely descended from a few founders in the 16th–18th centuries. We considered only individuals with BP-I as affected and screened the genome for linkage with 473 microsatellite markers. We used a model for linkage analysis that incorporated a high phenocopy rate and a conservative estimate of penetrance. Our goal in this study was not to establish definitive linkage but rather to detect all regions possibly harboring major genes for BP-I in these pedigrees. To facilitate this aim, we evaluated the degree to which markers that were informative in our data set provided coverage of each genome region; we estimate that at least 94% of the genome has been covered, at a predesignated threshold determined through prior linkage simulation analyses. We report here the results of our genome screen for BP-I loci and indicate several regions that merit further study, including segments in 18q, 18p, and 11p, in which suggestive lod scores were observed for two or more contiguous markers. Isolated lod scores that exceeded our thresholds in one or both families also occurred on chromosomes 1, 2, 3, 4, 5, 7, 13, 15, 16, and 17. Interesting regions highlighted in this genome screen will be followed up using linkage disequilibrium (LD) methods.
