150 results for Number representation format


Abstract:

The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in the exploratory data analysis and the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed when dealing with the special characteristics of this data. These include complex structures, a large quantity of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes. The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist the knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes. I present two original algorithms, one for clustering: the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and the second for exploratory visual data analysis: the Tree-structured Self-organizing Maps Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, allow the integration of space and time seamlessly and simultaneously in order to extract knowledge embedded in a temporal context. The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON include: (1) It does not require an a priori setup of the number of clusters. (2) The algorithm executes several self-organizing processes in parallel. Hence, when dealing with large datasets the processes can be distributed, reducing the computational cost. 
(3) Only three parameters are necessary to set up the algorithm. In the case of the Tree-structured SOM Component Planes, the novelty of this algorithm lies in its ability to create a structure that allows the visual exploratory data analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g. local correlations, maximal and minimal values and outliers). Both FGHSON and the Tree-structured SOM Component Planes were applied in several agroecological problems, proving to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets. In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third is our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge no previous work has applied and compared the performance of these techniques on spatio-temporal geospatial data, as is done in this thesis. I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by using the FGHSON clustering algorithm. The developed methodologies are used in two case studies. 
In the first, the objective was to find similar agroecozones through time and in the second one it was to find similar environmental patterns shifted in time. Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance, in sugar cane and blackberry production. Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool which integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.

Abstract:

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

Abstract:

The induction of fungal metabolites by fungal co-cultures grown on solid media was explored using multi-well co-cultures in 2 cm diameter Petri dishes. Fungi were grown in 12-well plates to easily and rapidly obtain the large number of replicates necessary for employing metabolomic approaches. Fungal culture using such a format accelerated the production of metabolites by several weeks compared with using the large-format 9 cm Petri dishes. This strategy was applied to a co-culture of a Fusarium and an Aspergillus strain. The metabolite composition of the cultures was assessed using ultra-high pressure liquid chromatography coupled to electrospray ionisation and time-of-flight mass spectrometry, followed by automated data mining. The de novo production of metabolites was dramatically increased by nutriment reduction. A time-series study of the induction of the fungal metabolites of interest over nine days revealed that they exhibited various induction patterns. The concentrations of most of the de novo induced metabolites increased over time. However, interesting patterns were observed, such as with the presence of some compounds only at certain time points. This result indicates the complexity and dynamic nature of fungal metabolism. The large-scale production of the compounds of interest was verified by co-culture in 15 cm Petri dishes; most of the induced metabolites of interest (16/18) were found to be produced as effectively as on a small scale, although not in the same time frames. Large-scale production is a practical solution for the future production, identification and biological evaluation of these metabolites.

Abstract:

In severe and variable conditions, specialized resource selection strategies should be less frequent because extinction risks increase for species that depend on a single and unstable resource. Psithyrus (Bombus subgenus Psithyrus) are bumblebee parasites that usurp Bombus nests and display inter-specific variation in the number of hosts they parasitize. Using a phylogenetic comparative framework, we show that Psithyrus species at higher elevations display a higher number of host species compared with species restricted to lower elevations. Species inhabiting high elevations also cover a larger temperature range, suggesting that species able to occur in colder conditions may benefit from recruitment from populations occurring in warmer conditions. Our results provide evidence for an 'altitudinal niche breadth hypothesis' in parasitic species, showing a decrease in the parasites' specialization along the elevational gradient, and also suggesting that Rapoport's rule might apply to Psithyrus.

Abstract:

For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
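To make the matrix representation step of MRP concrete, here is a minimal sketch of the standard binary coding as it is commonly described: each clade of each input tree becomes one character, scored 1 for taxa inside the clade, 0 for other taxa of that tree, and ? for taxa missing from that tree. The nested-tuple tree format, the function names and the two toy trees are assumptions made for illustration; the parsimony search that would follow is not shown, and this is not the code used in the study.

```python
def leaves(tree):
    """Return the set of taxon names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every internal node below the root."""
    if isinstance(tree, str):
        return
    for child in tree:
        if not isinstance(child, str):
            yield leaves(child)
            yield from clades(child)


def mrp_matrix(trees):
    """Encode the input trees as one character string per taxon."""
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: "" for taxon in taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon] += "?"  # taxon absent from this input tree
                elif taxon in clade:
                    rows[taxon] += "1"  # taxon belongs to the clade
                else:
                    rows[taxon] += "0"  # taxon in the tree, outside the clade
    return rows


# Two hypothetical input trees with partially overlapping taxa.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = (("B", "C"), "E")

matrix = mrp_matrix([tree_1, tree_2])
for taxon, characters in matrix.items():
    print(taxon, characters)
```

The resulting matrix would then be handed to a parsimony program; the question marks are how the coding tolerates input trees with different taxon sets.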

Abstract:

Due to diverging levels of political influence of various income groups, political institutions likely reflect the policy preferences of certain groups of citizens better than others, independently of their numerical weight. This runs counter to the egalitarian principle of 'one citizen, one vote'. The present article documents a general trend of underrepresentation of the preferences of relatively poor citizens both by parties and by governments across Western democracies, although important cross-national differences exist.

Abstract:

Résumé : Le glioblastome (GBM, WHO grade IV) est la tumeur cérébrale primaire la plus fréquente et la plus maligne, son pronostic reste très réservé et sa réponse aux différents traitements limitée. Récemment, une étude clinique randomisée (EORTC 26981/NCIC CE.3) a démontré que le traitement combiné de temozolomide et radiothérapie (RT/TMZ) est le meilleur dans les cas de GBM nouvellement diagnostiqués [1]. Cependant, seul un sous-groupe de patients bénéficie du traitement RT/TMZ et même parmi eux, leur survie reste très limitée. Pour tenter de mieux comprendre les réponses au traitement RT/TMZ, la biologie du GBM, identifier d'autres facteurs de résistance et découvrir de nouvelles cibles aux traitements, nous avons conduit une analyse moléculaire étendue à 73 patients inclus dans cette étude clinique. Nous avons complété les résultats moléculaires déjà obtenus par un profil génomique du nombre de copies par Array Comparative Genomic Hybridization. Afin d'atteindre nos objectifs, nous avons analysé en parallèle les données cliniques des patients et leurs profils moléculaires. Nos résultats confirment des analyses connues dans le domaine des aberrations du nombre de copies (CNA) et de profils du glioblastome. Nous avons observé une bonne corrélation entre le CNA génomique et l'expression de l'ARN messager dans le glioblastome et identifié un nouveau modèle de CNA du chromosome 7 pouvant présenter un intérêt clinique. Nous avons aussi observé par l'analyse du CNA que moins de 10% des glioblastomes conservent leurs mécanismes de suppression de tumeurs p53 et Rb1. Nous avons aussi observé que l'amplification du CDK4 peut constituer un facteur supplémentaire de résistance au traitement RT/TMZ, cette observation nécessite confirmation sur un plus grand nombre d'analyses. Nous avons montré que dans notre analyse des profils moléculaires et cliniques, il n'est pas possible de différencier le GBM à composante oligodendrogliale (GBM-O) du glioblastome. 
En superposant les profils moléculaires et les modèles expérimentaux in vitro, nous avons identifié WIF-1 comme un gène suppresseur de tumeur probable et une activation du signal WNT dans la pathologie du glioblastome. Ces observations pourraient servir à une meilleure compréhension de cette maladie dans le futur.
Abstract: Glioblastoma (GBM, WHO grade IV) is the most malignant and most frequent primary brain tumor, with a very poor prognosis and response to therapy. A recent randomized clinical trial (EORTC 26981/NCIC CE.3) established RT/TMZ as the first effective chemo-radiation therapy in newly diagnosed GBM [1]. However, only a genetic subgroup of patients benefits from RT/TMZ, and even in this subgroup overall survival remains very dismal. To explain the observed response to RT/TMZ, better understand GBM biology, identify other resistance factors and discover new druggable targets, a comprehensive molecular analysis was performed in 73 patients of this GBM trial cohort. We complemented the available molecular data with genomic copy number profiling by Array Comparative Genomic Hybridization. We proceeded to align the molecular profiles and the clinical data to meet our project objectives. Our data confirm known GBM copy number aberrations (CNA) and profiles. We observed a good correlation of genomic CN and mRNA expression in GBM, and identified a new, interesting CNA pattern for chromosome 7 with potential clinical value. We also observed that, by copy number aberration data alone, fewer than 10% of GBM have intact p53 and Rb1 tumor suppressor pathways. We equally observed that CDK4 amplification might constitute an additional RT/TMZ resistance factor, an observation that will need confirmation in a larger data set. We show that the molecular and clinical profiles in our data set do not support the identification of GBM-O as a new entity in GBM. 
By combining the molecular profiles and in vitro model experiments, we identify WIF1 as a potential GBM tumor suppressor gene (TSG) and activated WNT signaling as a pathologic event in GBM, both worth incorporating in attempts to better understand and impact outcome in this disease.

Abstract:

Adenovirus serotype 5 (Ad5) vectors and specific neutralizing antibodies (NAbs) generate immune complexes (ICs) which are potent inducers of dendritic cell (DC) maturation. Here we show that ICs generated with rare Ad vector serotypes, such as Ad26 and Ad35, which are lead candidates in HIV vaccine development, are poor inducers of DC maturation and that their potency in inducing DC maturation strongly correlated with the number of Toll-like receptor 9 (TLR9)-agonist motifs present in the Ad vector's genome. In addition, we showed that antihexon but not antifiber antibodies are responsible for the induction of Ad IC-mediated DC maturation.

Abstract:

The loss of presynaptic markers is thought to represent a strong pathologic correlate of cognitive decline in Alzheimer's disease (AD). Spinophilin is a postsynaptic marker mainly located in the heads of dendritic spines. We assessed total numbers of spinophilin-immunoreactive puncta in the CA1 and CA3 fields of the hippocampus and area 9 in 18 elderly individuals with various degrees of cognitive decline. The decrease in spinophilin immunoreactivity was significantly related to both Braak neurofibrillary tangle (NFT) staging and clinical severity but not to Aβ deposition staging. The total numbers of spinophilin-immunoreactive puncta in the CA1 field and area 9 were significantly related to MMSE scores and predicted 23.5% and 61.9% of their variability, respectively. The relationship between the total number of spinophilin-immunoreactive puncta in the CA1 field and MMSE scores did not persist when adjusting for Braak NFT staging. In contrast, the total number of spinophilin-immunoreactive puncta in area 9 was still significantly related to the cognitive outcome, explaining an extra 9.6% of the variability in MMSE scores and 25.6% of the variability in Clinical Dementia Rating scores. Our data suggest that neocortical dendritic spine loss is an independent parameter to consider in AD clinicopathologic correlations.

Abstract:

BACKGROUND: Genotypes obtained with commercial SNP arrays have been extensively used in many large case-control or population-based cohorts for SNP-based genome-wide association studies for a multitude of traits. Yet, these genotypes capture only a small fraction of the variance of the studied traits. Genomic structural variants (GSV) such as Copy Number Variation (CNV) may account for part of the missing heritability, but their comprehensive detection requires either next-generation arrays or sequencing. Sophisticated algorithms that infer CNVs by combining the intensities from SNP-probes for the two alleles can already be used to extract a partial view of such GSV from existing data sets. RESULTS: Here we present several advances to facilitate the latter approach. First, we introduce a novel CNV detection method based on a Gaussian Mixture Model. Second, we propose a new algorithm, PCA merge, for combining copy-number profiles from many individuals into consensus regions. We applied both our new methods as well as existing ones to data from 5612 individuals from the CoLaus study who were genotyped on Affymetrix 500K arrays. We developed a number of procedures in order to evaluate the performance of the different methods. This includes comparison with previously published CNVs as well as using a replication sample of 239 individuals, genotyped with Illumina 550K arrays. We also established a new evaluation procedure that employs the fact that related individuals are expected to share their CNVs more frequently than randomly selected individuals. The ability to detect both rare and common CNVs provides a valuable resource that will facilitate association studies exploring potential phenotypic associations with CNVs. 
CONCLUSION: Our new methodologies for CNV detection and their evaluation will help in extracting additional information from the large amount of SNP-genotyping data on various cohorts and use this to explore structural variants and their impact on complex traits.
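As a rough illustration of what a Gaussian Mixture Model does with probe intensities, the sketch below fits a one-dimensional, two-component mixture by expectation-maximization and assigns each value to its most likely component. It is a generic textbook EM loop, not the detection method of the paper; the intensity values, the starting means, the initial variance and all function names are made up for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, init_means, init_var=0.01, n_iter=20):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


def assign(x, weights, means, variances):
    """Return the index of the most likely component for value x."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return dens.index(max(dens))


# Hypothetical log-ratio intensities: a low cluster (possible deletion)
# and a cluster near zero (normal copy number).
intensities = [-0.65, -0.58, -0.62, -0.55, -0.60,
               0.02, -0.03, 0.05, 0.00, -0.01, 0.04, -0.04]

weights, means, variances = fit_gmm_1d(intensities, init_means=[-0.5, 0.0])
print("means:", [round(m, 3) for m in means])
print("calls:", [assign(x, weights, means, variances) for x in intensities])
```

A real caller would need more than this: several copy-number states, two-allele intensities and per-probe noise, none of which the sketch attempts.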

Abstract:

Detailed knowledge of the anatomy and connectivity pattern of cortico-basal ganglia circuits is essential to an understanding of abnormal cortical function and pathophysiology associated with a wide range of neurological and neuropsychiatric diseases. We aim to study the spatial extent and topography of human basal ganglia connectivity in vivo. Additionally, we explore at an anatomical level the hypothesis of coexistent segregated and integrative cortico-basal ganglia loops. We use probabilistic tractography on magnetic resonance diffusion weighted imaging data to segment basal ganglia and thalamus in 30 healthy subjects based on their cortical and subcortical projections. We introduce a novel method to define voxel-based connectivity profiles that allow representation of projections from a source to more than one target region. Using this method, we localize specific relay nuclei within predefined functional circuits. We find strong correlation between tractography-based basal ganglia parcellation and anatomical data from previously reported invasive tracing studies in nonhuman primates. Additionally, we show in vivo the anatomical basis of segregated loops and the extent of their overlap in prefrontal, premotor, and motor networks. Our findings in healthy humans support the notion that probabilistic diffusion tractography can be used to parcellate subcortical gray matter structures on the basis of their connectivity patterns. The coexistence of clearly segregated and also overlapping connections from cortical sites to basal ganglia subregions is a neuroanatomical correlate of both parallel and integrative networks within them. We believe that this method can be used to examine pathophysiological concepts in a number of basal ganglia-related disorders.

Abstract:

Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. The lack of standardized ways of exploring different data layouts means the problem must be solved from scratch each time. The possibility of accessing data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling large numbers of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; an implementation written in C++ with boost [1], SQLite [2] and Qt [3] that doesn't require installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files, which ensure efficient plan execution; effortless support for millions of columns; easy use of mixed text/number data thanks to per-value typing; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. 
I test it against existing database solutions using a battery of benchmarks and discuss the results.
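As a rough illustration of querying CSV data with SQL, the sketch below uses Python's standard `csv` and `sqlite3` modules. It is not the system described in the abstract: unlike the "no copy" approach, it copies the rows into an in-memory SQLite table, and the names `coerce`, `load_csv` and the sample data are hypothetical. It does mirror two of the listed ideas, namely that the schema is derived from the header row rather than specified by the user, and that values are typed one by one.

```python
import csv
import io
import sqlite3


def coerce(value):
    """Type a single CSV field: try int, then float, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_csv(conn, table, text):
    """Create `table` from CSV text, taking column names from the header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Quote identifiers so arbitrary header names are accepted. No column
    # types are declared, so SQLite stores each value with its own type.
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        ([coerce(v) for v in row] for row in reader),
    )


# Hypothetical data with a column that mixes numbers and text.
SAMPLE = "gene,sample_a,sample_b\nBRCA1,1.5,n/a\nTP53,2,4.25\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "expr", SAMPLE)
rows = conn.execute(
    "SELECT gene, sample_a, typeof(sample_b) FROM expr ORDER BY gene"
).fetchall()
print(rows)
```

Stock SQLite builds limit a table to 2000 columns by default, so a plain import like this would not reach the very wide layouts the abstract targets; that limitation is part of what motivates leaving the data in the CSV files.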

Abstract:

Macroscopic features such as volume, surface estimate, thickness and caudorostral length of the human primary visual cortex (Brodmann's area 17) of 46 human brains between midgestation and 93 years were studied by means of camera lucida drawings from serial frontal sections. Individual values were best fitted by a logistic function from midgestation to adulthood and by a regression line between adulthood and old age. Allometric functions were calculated to study developmental relationships between all the features. The three-dimensional shape of area 17 was also reconstructed from the serial sections in 15 cases and correlated with the sequence of morphological events. The sulcal pattern of area 17 begins to develop around 21 weeks of gestation but remains rather simple until birth, while it becomes more convoluted, particularly in the caudal part, during the postnatal period. Until birth, a large increase in cortical thickness (about 83% of its mean adult value) and caudorostral length (69%) produces a moderate increase in cortical volume (31%) and surface estimate (40%) of area 17. After birth, the cortical volume and surface undergo their maximum growth rate, in spite of a rather small increase in cortical thickness and caudorostral length. This is due to the development of the pattern of gyrification within and around the calcarine fissure. All macroscopic features have reached the mean adult value by the end of the first postnatal year. With aging, the only features to undergo significant regression are the cortical surface estimate and the caudorostral length. The total number of neurons in area 17 shows great interindividual variability at all ages. No decrease in the postnatal period or in aging could be demonstrated.
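For reference, the two curve families named in this abstract are usually written as follows. These are only the generic textbook forms; the abstract does not give the parameterization that was actually fitted, and the symbols K, r, t_0, b and \alpha are assumptions introduced here.

```latex
% Logistic growth of a feature y with age t
% (K: plateau value, r: growth rate, t_0: age at the inflection point)
y(t) = \frac{K}{1 + e^{-r\,(t - t_0)}}

% Allometric relation between two features x and y,
% which becomes a straight line on log-log axes
y = b\,x^{\alpha} \quad\Longleftrightarrow\quad \log y = \log b + \alpha \log x
```

Under this reading, an exponent \alpha below 1 would mean that y grows more slowly than x, which is the kind of developmental relationship an allometric fit is meant to expose.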

Abstract:

We showed earlier how to predict the writhe of any rational knot or link in its ideal geometric configuration, or equivalently the average of the 3D writhe over statistical ensembles of random configurations of a given knot or link (Cerf and Stasiak 2000 Proc. Natl Acad. Sci. USA 97 3795). There is no general relation between the minimal crossing number of a knot and the writhe of its ideal geometric configuration. However, within individual families of knots linear relations between minimal crossing number and writhe were observed (Katritch et al 1996 Nature 384 142). Here we present a method that allows us to express the writhe as a linear function of the minimal crossing number within Conway families of knots and links in their ideal configuration. The slope of the lines and the shift between any two lines with the same