962 results for Clustering a large document collection
Abstract:
OBJECTIVE: This study assessed clustering of multiple risk behaviors (i.e., low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption) with level of cigarette consumption. METHODS: Data from the 2002 Swiss Health Survey, a population-based cross-sectional telephone survey assessing health and self-reported risk behaviors, were used. A total of 18,005 subjects (8052 men and 9953 women) aged 25 years or older participated. RESULTS: Smokers more frequently had low leisure-time physical activity, low fruit/vegetable intake, and high alcohol consumption than non- and ex-smokers. Frequency of each risk behavior increased steadily with cigarette consumption. Clustering of risk behaviors increased with cigarette consumption in both men and women. For men, the odds ratios of multiple (≥2) risk behaviors other than smoking, adjusted for age, nationality, and educational level, were 1.14 (95% confidence interval: 0.97, 1.33) for ex-smokers, 1.24 (0.93, 1.64) for light smokers (1-9 cigarettes/day), 1.72 (1.36, 2.17) for moderate smokers (10-19 cigarettes/day), and 3.07 (2.59, 3.64) for heavy smokers (≥20 cigarettes/day) versus non-smokers. Similar odds ratios were found for women for corresponding groups, i.e., 1.01 (0.86, 1.19), 1.26 (1.00, 1.58), 1.62 (1.33, 1.98), and 2.75 (2.30, 3.29). CONCLUSIONS: Counseling and intervention with smokers should take into account the strong clustering of risk behaviors with level of cigarette consumption.
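The odds ratios and confidence intervals reported above come from models adjusted for age, nationality, and educational level. As a simplified illustration of the underlying arithmetic only, the following sketch computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The function name and the counts are hypothetical and are not taken from the Swiss Health Survey.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1, as for the moderate and heavy smoker estimates in the abstract, is the usual signal that the association is unlikely to be explained by sampling variation alone at the 5% level.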
Abstract:
BACKGROUND: The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. These GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference. RESULTS: GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs). These latter 1,533 features constitute the CATMAv4 addition. CONCLUSION: To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database http://www.catma.org.
Abstract:
Allergic conjunctivitis (AC) is an inflammatory disease of the conjunctiva caused mainly by an IgE-mediated mechanism. It is the most common type of ocular allergy. Despite being the most benign form of conjunctivitis, AC has a considerable effect on patient quality of life, reduces work productivity, and increases health care costs. No consensus has been reached on its classification, diagnosis, or treatment. Consequently, the literature provides little information on its natural history, epidemiological data are scarce, and it is often difficult to ascertain its true morbidity. The main objective of the Consensus Document on Allergic Conjunctivitis (Documento dE Consenso sobre Conjuntivitis Alérgica [DECA]), which was drafted by an expert panel from the Spanish Society of Allergology and Spanish Society of Ophthalmology, was to reach agreement on basic criteria that could prove useful for both specialists and primary care physicians and facilitate the diagnosis, classification, and treatment of AC. This document is the first of its kind to describe and analyze aspects of AC that could make it possible to control symptoms.
Abstract:
BACKGROUND: Jeune asphyxiating thoracic dystrophy (JATD) is a rare, often lethal, recessively inherited chondrodysplasia characterised by shortened ribs and long bones, sometimes accompanied by polydactyly, and renal, liver and retinal disease. Mutations in intraflagellar transport (IFT) genes cause JATD, including the IFT dynein-2 motor subunit gene DYNC2H1. Genetic heterogeneity and the large DYNC2H1 gene size have hindered JATD genetic diagnosis. AIMS AND METHODS: To determine its contribution to JATD, we screened DYNC2H1 in 71 JATD patients, combining SNP mapping, Sanger sequencing and exome sequencing. RESULTS AND CONCLUSIONS: We detected 34 DYNC2H1 mutations in 29/71 (41%) patients from 19/57 families (33%), showing it to be a major cause of JATD, especially in Northern European patients. This included 13 early protein termination mutations (nonsense/frameshift, deletion, splice site), but no patients carried these in combination, suggesting the human phenotype is at least partly hypomorphic. In addition, 21 missense mutations were distributed across DYNC2H1, and these showed some clustering to functional domains, especially the ATP motor domain. DYNC2H1 patients largely lacked significant extra-skeletal involvement, demonstrating an important genotype-phenotype correlation in JATD. Significant variability exists in the course and severity of the thoracic phenotype, both between affected siblings with identical DYNC2H1 alleles and among individuals with different alleles, which suggests the DYNC2H1 phenotype might be subject to modifier alleles, non-genetic or epigenetic factors. Assessment of fibroblasts from patients showed accumulation of anterograde IFT proteins in the ciliary tips, confirming defects similar to patients with other retrograde IFT machinery mutations, which may be of undervalued potential for diagnostic purposes.
Abstract:
The classification of art painting images is a computer vision application that is growing considerably. The goal of this technology is to classify an art painting image automatically in terms of artistic style, technique used, or author. For this purpose, the image is analyzed by extracting visual features. Many articles on these problems have been published, but in general the proposed solutions focus on a very specific field. In particular, algorithms are tested using images at different resolutions, acquired under different illumination conditions, which complicates performance comparison between methods. In this context, it would be very useful to construct a public art image database in order to compare all existing algorithms under the same conditions. This paper presents a large art image database, with labels for the following characteristics: title, author, style and technique. Furthermore, a tool that manages this database has been developed, and it can be used to extract different visual features for any selected image. These data can be exported to a file in CSV format, allowing researchers to analyze them with other tools. During data collection, the tool stores the time elapsed in each calculation. Thus, the tool also makes it possible to compare the computation-time efficiency of different mathematical procedures for extracting image data.
Abstract:
To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For high-dimensional embeddings, power transformations of the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.
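A minimal sketch of one ingredient described above: K-means on chi-squared distances between POS n-gram profiles. It relies on the fact that the chi-squared distance between two row profiles equals the Euclidean distance once each profile coordinate is divided by the square root of its column mass. The toy counts, the three-tag set, the function names, and the deterministic initialization are assumptions for illustration. The fuzzy clustering and high-dimensional embedding steps of the study are not reproduced, and `power_transform` only shows the form of a power transformation applied to a squared distance.

```python
from math import sqrt


def chi2_features(counts):
    """Rescale rows of a count table so that squared Euclidean distance
    between rows equals the chi-squared distance between row profiles.
    Assumes every row and every column has a nonzero total."""
    total = sum(sum(row) for row in counts)
    n_cols = len(counts[0])
    col_mass = [sum(row[j] for row in counts) / total for j in range(n_cols)]
    feats = []
    for row in counts:
        row_sum = sum(row)
        feats.append([(x / row_sum) / sqrt(m) for x, m in zip(row, col_mass)])
    return feats


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def power_transform(d2, q):
    """Power transformation of a squared distance (q = 1 leaves it unchanged)."""
    return d2 ** q


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for each point
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical unigram counts per clause: columns are (NOUN, VERB, ADJ)
clauses = [[8, 1, 1], [1, 8, 1], [7, 2, 1], [2, 7, 1]]
labels = kmeans(chi2_features(clauses), k=2)
print(labels)
```

With bi- and trigrams the count table gains many columns that are mostly zero, which is one way to picture the feature space sparsity the abstract mentions.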
Abstract:
Some of the elements that characterize the globalization of food and agriculture are the industrialization and intensification of agriculture and the liberalization of agricultural markets, which favour elongation of the food chain and homogenization of food habits (nutrition transition), among other impacts. As a result, the probability of food contamination has increased with the distance and the number of "hands" that may contact the food (critical points); the nutritional quality of food has been reduced because of increased transport and longer periods of time from collection to consumption; and the number of food-related diseases due to changes in eating patterns has increased. In this context, there exist different agencies and regulations intended to ensure food safety at different levels, e.g. at the international level, Codex Alimentarius develops standards and regulations for the marketing of food in a global market. Although governments determine the legal framework, the food industry manages the safety of its products and thus develops its own standards for their marketing, such as the Good Agricultural Practices (GAP) programs. The participation of the private sector in the creation of regulatory standards strengthens the free trade of food products, favouring mostly large agribusiness companies. These standards are in most cases unattainable for small producers, and food safety regulations are favouring the removal of the peasantry and increasing concentration and control in the food system by industrial actors. Women in particular, who traditionally have been in charge of the artisanal transformation process, can be more affected by these norms than men. In this project I am analysing the impact of food safety norms on small farms, based on the case of artisanal production by women in Spain.
Abstract:
Many terrestrial and marine systems are experiencing accelerating decline due to the effects of global change. This situation has raised concern about the consequences of biodiversity losses for ecosystem function, ecosystem service provision, and human well-being. Coastal marine habitats are a main focus of attention because they harbour high biological diversity, are among the most productive systems in the world and present high levels of anthropogenic interaction. The accelerating degradation of many terrestrial and marine systems highlights the urgent need to evaluate the consequences of biodiversity loss. Because marine biodiversity is a dynamic entity and this study was interested in global change impacts, it focused on benthic biodiversity trends over large spatial and long temporal scales. The main aim of this project was to investigate the current extent of biodiversity of the highly diverse benthic coralligenous community in the Mediterranean Sea, detect its changes, and predict its future changes over broad spatial and long temporal scales. These marine communities are characterized by structural species with low growth rates and long life spans; therefore they are considered particularly sensitive to disturbances. For this purpose, this project analyzed permanent photographic plots over time at four locations in the NW Mediterranean Sea. The spatial scale of this study provided information on the level of species similarity between these locations, thus offering a solid background on the amount of large-scale variability in coralligenous communities, whereas the temporal scale was fundamental to determining the natural variability in order to discriminate between changes due to natural factors and those related to the impact of disturbances (e.g. mass mortality events related to positive thermal anomalies, extreme catastrophic events). This study directly addressed the challenging task of analyzing quantitative biodiversity data from these highly diverse marine benthic communities. Overall, the scientific knowledge gained with this research project will improve our understanding of the function of marine ecosystems and their trajectories related to global change.
Abstract:
BACKGROUND: Persistence is a key factor for long-term blood pressure control, which is of high prognostic importance for patients at increased cardiovascular risk. Here we present the results of a post-marketing survey including 4769 hypertensive patients treated with irbesartan in 886 general practices in Switzerland. The goal of this survey was to evaluate the tolerance and the blood pressure lowering effect of irbesartan as well as the factors affecting persistence in a large unselected population. METHODS: Prospective observational survey conducted in general practices in all regions of Switzerland. Previously untreated and uncontrolled pre-treated patients were started on a daily dose of 150 mg irbesartan and followed for up to 6 months. RESULTS: After an observation time slightly exceeding 4 months, the average reduction in systolic and diastolic blood pressure was 20 mmHg (95% confidence interval (CI) -19.6 to -20.7 mmHg) and 12 mmHg (95% CI -11.4 to -12.1 mmHg), respectively. At this time, 26% of patients had a blood pressure < 140/90 mmHg and 60% had a diastolic blood pressure < 90 mmHg. The drug was well tolerated, with an incidence of adverse events (dizziness, headaches, ...) of 8.0%. In this survey more than 80% of patients were still on irbesartan at 4 months. The most important factors predictive of persistence were the tolerability profile and the ability to achieve a blood pressure target ≤ 140/90 mmHg before visit 2. Patients who switched from a fixed combination treatment tended to discontinue irbesartan more often, whereas those who abandoned the previous treatment because of cough (a class side effect of ACE inhibitors) were more persistent with irbesartan. CONCLUSION: The results of this survey confirm that irbesartan is effective, well tolerated and well accepted by patients, as indicated by the good persistence. This post-marketing survey also emphasizes the importance of the tolerability profile and of achieving early control of blood pressure as positive predictors of persistence.
Abstract:
Abstract Membrane fragmentation is a process common to many organelles in a cell. Mitochondria, the nucleus, the endoplasmic reticulum, phagosomes, peroxisomes, the Golgi apparatus and lysosomes (vacuoles in yeast) fragment into multiple copies in response to environmental stimuli, such as stresses, or under normal conditions during the cell cycle, in order to be transferred to the daughter cells. Membrane fragmentation is also observed during endocytosis, when endocytic vesicles form, and throughout intracellular trafficking, when a transport vesicle is generated. The fragmentation process is therefore of general importance. The discovery in 1991 of a dynamin-like GTPase as a protein involved in fragmentation of the plasma membrane during endocytosis opened this field of research. Since then, dynamins have been found on most organelles, which suggests a membrane fragmentation process common to the whole cell. However, the full set of proteins involved and the mechanism of fragmentation remain to be elucidated. My thesis project was to establish an in vitro vacuole fragmentation assay useful for understanding the mechanism of this process. This system is a sound choice for several reasons: first, vacuoles fragment naturally during the cell cycle; second, their size makes their morphology easy to visualize by simple light microscopy; finally, they can be isolated in useful quantities with a high degree of purity. In vivo, vacuoles can easily be fragmented by osmotic stress. Such an assay makes it possible to identify proteins involved in the mechanism, as in the screen I performed on the entire deletion collection of non-essential yeast genes. However, an in vitro assay is then indispensable for manipulating the discovered proteins in order to elucidate the mechanism. With my in vitro assay, I confirmed the involvement of SNARE proteins in fragmentation and helped to explain how the number and size of vacuoles are regulated by the TORC1 complex under stress.
Summary for the general public The cells of every organism are composed of different compartments called organelles. Each has a well-defined function that allows the cell to live and grow. They are surrounded by a membrane, which acts as a selectively permeable barrier to preserve the integrity of each one. Under normal growth conditions, cells proliferate. During the cell division that leads to the formation of a new cell, each organelle must divide in order to supply the daughter cell with the full set of organelles. The division of each organelle requires fragmentation of the membrane that surrounds it. Dynamin-like GTPase proteins have been found on almost all organelles of a cell. They are involved in membrane fragmentation processes. Hence the idea of a common mechanism emerged. However, this reaction is too complex to involve a single protein. Other factors remain to be discovered and the mechanism remains to be understood. The first step can be carried out by in vivo studies, that is, with whole cells; the second step requires isolating the proteins involved and varying the different parameters, which means working in vitro, apart from cells. My work consisted of establishing an in vitro experimental procedure to study membrane fragmentation. I work with yeast vacuoles to study membrane reactions. Vacuoles are the largest organelles present in yeast. They are mainly involved in digestion. Like any organelle, they fragment during cell division. The experimental procedure comprises a first step, isolation of the vacuoles, and a second, their incubation with compounds essential to the reaction. In parallel, I identified, through in vivo work, new proteins involved in the membrane fragmentation process. This was done by screening a collection of mutants by microscopy. Among these mutants, I looked for those that showed a defect in vacuole fragmentation. These two experimental approaches, in vitro and in vivo, allowed me to discover new proteins involved in this reaction and to reveal a mechanism used by the cell to regulate vacuole fragmentation.
Summary Fragmentation of membranes is common to many organelles in a cell. Mitochondria, nucleus, endoplasmic reticulum, phagosomes, peroxisomes, Golgi and lysosomes (vacuoles in yeast) fragment into multiple copies in response to environmental stimuli, such as stresses, or in a normal situation during the cell cycle in order to be transferred into the daughter cell. Fragmentation of membrane occurs during endocytosis, at the last step in endocytic vesicle formation, and also in intracellular trafficking, when traffic vesicles bud. This field of research was opened in 1991 when a dynamin-like GTPase was found to be involved in fragmentation of the plasma membrane during endocytosis. Since dynamin-like GTPases have been found on most organelles, similarities in their mechanisms of fragmentation might exist. However, many proteins involved in the mechanism of fragmentation remain unknown. My thesis project was to establish an in vitro assay for membrane fragmentation in order to create a tool to study the mechanism of this process. I chose vacuoles as a model organelle for several reasons: first of all, vacuoles fragment under physiological conditions during the cell cycle; secondly, their size makes their morphology easily visible under the light microscope; and finally, vacuoles can be isolated in good amounts with relatively high degrees of purity. In vivo, vacuole fragmentation can be induced with an osmotic shock. Such a simple assay facilitates the identification of new proteins involved in the process. I used this tool to screen the entire knockout collection of non-essential genes in Saccharomyces cerevisiae for mutants defective in vacuole fragmentation. The in vitro system will be useful to characterize the mutants and to study the mechanism of fragmentation in detail. I used my in vitro assay to confirm the involvement of vacuolar SNARE proteins in fragmentation of the organelle and to uncover that the number and size of vacuoles in the cell are regulated by the TORC1 complex via selective stimulation of fragmentation activity.
Abstract:
We have carried out an initial analysis of the dynamics of the recent evolution of splice-site sequences in a large collection of human, rodent (mouse and rat), and chicken introns. Our results indicate that the sequences of splice sites are largely homogeneous within Tetrapoda. We have also found that orthologous splice signals between human and rodents and within rodents are more conserved than unrelated splice sites, but the additional conservation can be explained mostly by background intron conservation. In contrast, additional conservation over background is detectable in orthologous mammalian and chicken splice sites. Our results also indicate that the U2 and U12 intron classes seem to have evolved independently since the split of mammals and birds; we have not been able to find a convincing case of interconversion between these two classes in our collections of orthologous introns. Similarly, we have not found a single case of switching between AT-AC and GT-AG subtypes within U12 introns, suggesting that this event has been a rare occurrence in recent evolutionary times. Switching between GT-AG and the noncanonical GC-AG U2 subtypes, on the contrary, does not appear to be unusual; in particular, T to C mutations appear to be relatively well tolerated in GT-AG introns with very strong donor sites.
Abstract:
One of the first useful products from the human genome will be a set of predicted genes. Besides its intrinsic scientific interest, the accuracy and completeness of this data set is of considerable importance for human health and medicine. Though progress has been made on computational gene identification in terms of both methods and accuracy evaluation measures, most of the sequence sets in which the programs are tested are short genomic sequences, and there is concern that these accuracy measures may not extrapolate well to larger, more challenging data sets. Given the absence of experimentally verified large genomic data sets, we constructed a semiartificial test set comprising a number of short single-gene genomic sequences with randomly generated intergenic regions. This test set, which should still present an easier problem than real human genomic sequence, mimics the approximately 200kb long BACs being sequenced. In our experiments with these longer genomic sequences, the accuracy of GENSCAN, one of the most accurate ab initio gene prediction programs, dropped significantly, although its sensitivity remained high. Conversely, the accuracy of similarity-based programs, such as GENEWISE, PROCRUSTES, and BLASTX was not affected significantly by the presence of random intergenic sequence, but depended on the strength of the similarity to the protein homolog. As expected, the accuracy dropped if the models were built using more distant homologs, and we were able to quantitatively estimate this decline. However, the specificities of these techniques are still rather good even when the similarity is weak, which is a desirable characteristic for driving expensive follow-up experiments. 
Our experiments suggest that though gene prediction will improve with every new protein that is discovered and through improvements in the current set of tools, we still have a long way to go before we can decipher the precise exonic structure of every gene in the human genome using purely computational methodology.
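The contrast drawn above between sensitivity and specificity can be made concrete with a small sketch at the nucleotide level. It assumes the convention common in gene-finding evaluations, where sensitivity is the fraction of true coding nucleotides that are predicted as coding and specificity is the fraction of predicted coding nucleotides that are truly coding. The function names and exon coordinates are hypothetical and do not come from the study's test set.

```python
def coding_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    return {pos for start, end in exons for pos in range(start, end)}


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level."""
    truth = coding_positions(true_exons)
    pred = coding_positions(predicted_exons)
    tp = len(truth & pred)   # coding and predicted coding
    fn = len(truth - pred)   # coding but missed
    fp = len(pred - truth)   # predicted coding but not coding
    sensitivity = tp / (tp + fn) if truth else 0.0
    specificity = tp / (tp + fp) if pred else 0.0
    return sensitivity, specificity


# Hypothetical example: both real exons are found, but a spurious exon
# is also predicted inside an intergenic region.
true_exons = [(100, 200), (300, 400)]
predicted_exons = [(100, 200), (300, 400), (1000, 1200)]
sn, sp = nucleotide_accuracy(true_exons, predicted_exons)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

In this toy case every true coding nucleotide is recovered, yet half of the predicted coding nucleotides are false, which is the kind of pattern the abstract describes for an ab initio predictor run on sequence padded with intergenic regions.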
Abstract:
This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.
Abstract:
The information provided by the alignment-independent GRid Independent Descriptors (GRIND) can be condensed by the application of principal component analysis, obtaining a small number of principal properties (GRIND-PP), which are more suitable for describing molecular similarity. The objective of the present study is to optimize diverse parameters involved in deriving the GRIND-PP and to validate their suitability for applications requiring a biologically relevant description of molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) under standard conditions. The quality of the results obtained was remarkable and comparable with other LBVS methods, and their detailed statistical analysis made it possible to identify the method settings that most influence the quality of the results, together with their optimum values. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. Their applicability to large compound databases was also explored by comparing the equivalence of the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of the GRIND-PP for practical applications and provide useful hints about how they should be computed to obtain optimum results.
Abstract:
This paper proposes a novel approach for the analysis of illicit tablets based on their visual characteristics. In particular, the paper concentrates on the problem of ecstasy pill seizure profiling and monitoring. The presented method extracts the visual information from pill images and builds a representation of it, i.e. it builds a pill profile based on the pill's visual appearance. Different visual features are used to build different image similarity measures, which are the basis for a pill monitoring strategy based on both discriminative and clustering models. The discriminative model makes it possible to infer whether two pills come from the same seizure, while the clustering models group pills that share similar visual characteristics. The resulting clustering structure allows visual identification of the relationships between different seizures. The proposed approach was evaluated using a data set of 621 ecstasy pill pictures. The results demonstrate that this is a feasible and cost-effective method for performing pill profiling and monitoring.