978 results for Blog datasets
Abstract:
Master's internship report in Informatics Teaching
Abstract:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, and Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as with other omics data, the analysis of metabolomics datasets poses multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, of the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple-to-use environment. The package, already available on CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
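As a rough illustration of the kind of pipeline the abstract describes, the sketch below strings together loading, pre-processing, univariate/multivariate analysis and machine learning in R. The function names are hypothetical stand-ins, not specmine's confirmed API; consult the CRAN documentation for the real calls.

```r
## Hypothetical specmine-style workflow. The function names are
## illustrative stand-ins, NOT the package's confirmed API; see the CRAN
## documentation for the real calls.
library(specmine)                          # assumes installation from CRAN

ds <- read_dataset_csv("peaks.csv",        # hypothetical CSV reader:
                       "metadata.csv")     # data matrix plus sample metadata

ds <- missingvalues_imputation(ds, method = "mean")   # pre-processing
ds <- normalize(ds, method = "median")                # (hypothetical names)

tt  <- tstats_dataset(ds)                  # univariate: per-metabolite t-tests
pca <- pca_analysis_dataset(ds)            # multivariate: PCA

res <- train_models_performance(ds,        # machine learning with
         models = "pls")                   # cross-validated performance
```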
Abstract:
A blog created by students of the I. Gogebashvili school for a competition dedicated to the 150th anniversary of the birth of Saint Ekvtime. The site presents photos taken at the National Scientific Library while the materials were being collected. The material is available at http://eqvtime150.blogspot.com/p/blog-page_3864.html
Abstract:
The aim of this note is to complement some of the results appearing in the Dolado et al. (2003) article "Publishing Performance in Economics: Spanish Rankings". In particular, we focus on three issues: the robustness of the results regardless of the time span considered, the evaluation of a researcher's contribution to the advancement of knowledge, and to what extent the choice of a particular database from which to download publication records can affect the results. Differences are significant when we expand the time period considered. There are also small but significant differences if we combine datasets to derive the rankings.
Abstract:
The severely poor are very poor because their consumption is far below the absolute poverty line, and the chronically poor are very poor because their consumption remains below the absolute poverty line for long periods. A combination of chronic and severe poverty (CSP) must represent the very worst instance of poverty. Yet the exercise in this paper of asking simple questions about CSP reveals large research gaps. Quantified statements on CSP can be made at the country level for just 14 countries, and at the household level for just six. These data suggest a positive correlation between severe poverty and chronic poverty, at both the country and the household level. Understanding the CSP relationship (whether it is strong, where it arises, what causes it) may improve our explanation of observed cross-country variation in the elasticity between macroeconomic growth and poverty reduction, and of why, within countries, some households take better advantage of opportunities afforded by macroeconomic growth. Some limited data suggest similarity in the socioeconomic characteristics of the severely poor and the chronically poor in terms of location, household size, gender, education and economic sector of work. Of concern is that micro-longitudinal datasets drop large proportions of their base-year samples, and how this attrition affects our understanding of CSP is not well evaluated. On causal mechanisms, the evidence suggests that CSP may be caused by parental CSP (i.e. an intergenerational CSP cycle) and, in households not previously poor, by a morbidity cycle.
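As a toy illustration of the household-level exercise described above (not the paper's data or definitions), the R sketch below classifies simulated panel households as severely and/or chronically poor and cross-tabulates the two conditions; the half-the-poverty-line severity cutoff and the 75%-of-waves chronicity rule are assumptions.

```r
## Toy example, not the paper's data: classify simulated panel households
## as severely and/or chronically poor and check the correlation.
set.seed(1)
n <- 1000; waves <- 4
z <- 100                                   # absolute poverty line
cons <- matrix(rlnorm(n * waves, log(120), 0.6), n, waves)

severe  <- rowMeans(cons) < 0.5 * z        # far below the line (assumed cutoff)
chronic <- rowMeans(cons < z) >= 0.75      # below the line in most waves

table(severe, chronic)                     # joint incidence of CSP
cor(as.numeric(severe), as.numeric(chronic))
```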
Abstract:
This work complements some of the results appearing in the article "Publishing Performance in Economics: Spanish Rankings" by Dolado et al. (2003). Specifically, we focus on the robustness of the results regardless of the time span considered, the effect of the choice of a particular database on the final results, and the effects of changes in the unit of institutional measure (departments versus institutions as a whole). Differences are significant when we expand the time period considered. There are also significant but small differences if we combine datasets to derive the rankings. Finally, department rankings offer a more precise picture of the situation of Spanish academics, although the results do not differ substantially from those obtained when institutions are considered as a whole.
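For a concrete picture of what combining datasets to derive the rankings involves, here is a minimal R sketch (hypothetical data and column names) that merges scores from two databases, checks their rank agreement, and ranks on the combined score.

```r
## Hypothetical data: publication scores for 20 institutions from two
## bibliographic databases.
set.seed(2)
db_a <- data.frame(inst = paste0("U", 1:20), score = runif(20))
db_b <- data.frame(inst = paste0("U", 1:20), score = runif(20))

merged <- merge(db_a, db_b, by = "inst", suffixes = c(".a", ".b"))

## Rank agreement between the two sources: values near 1 mean the choice
## of database barely affects the ordering
cor(merged$score.a, merged$score.b, method = "spearman")

## Ranking on the combined score, as in the combined-dataset exercise
merged$combined <- rowMeans(merged[, c("score.a", "score.b")])
merged$inst[order(-merged$combined)]
```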
Abstract:
The anatomical structures and mechanisms linking genes to neuropsychiatric disorders have not been deciphered. Reciprocal copy number variants at the 16p11.2 BP4-BP5 locus offer a unique opportunity to study intermediate phenotypes in carriers at high risk for autism spectrum disorder (ASD) or schizophrenia (SZ). We investigated the variation in brain anatomy in 16p11.2 deletion and duplication carriers. Beyond gene dosage effects on global brain metrics, we show that the number of genomic copies correlated negatively with gray matter volume and white matter tissue properties in cortico-subcortical regions implicated in reward, language and social cognition. Despite the near absence of ASD or SZ diagnoses in our 16p11.2 cohort, the pattern of brain anatomy changes in carriers spatially overlaps with the well-established structural abnormalities in ASD and SZ. Using measures of peripheral mRNA levels, we confirm our genomic copy number findings. This combined molecular, neuroimaging and clinical approach, applied to larger datasets, will help interpret the relative contributions of genes to neuropsychiatric conditions by measuring their effect on local brain anatomy. Molecular Psychiatry advance online publication, 25 November 2014; doi:10.1038/mp.2014.145.
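A minimal sketch of the kind of gene-dosage test implied here (simulated data, hypothetical covariates; not the study's actual pipeline): regress a regional gray matter volume on the 16p11.2 copy number and expect a negative coefficient.

```r
## Simulated sketch of a gene-dosage regression; all variables hypothetical.
set.seed(7)
d <- data.frame(
  copies = sample(1:3, 120, replace = TRUE),          # 1 del, 2 ctrl, 3 dup
  age    = runif(120, 8, 50),
  sex    = factor(sample(c("F", "M"), 120, replace = TRUE)),
  icv    = rnorm(120, 1500, 100)                      # intracranial volume
)
d$gmv <- 700 - 15 * d$copies + 0.1 * d$icv + rnorm(120, 0, 10)

fit <- lm(gmv ~ copies + age + sex + icv, data = d)
summary(fit)$coefficients["copies", ]   # negative estimate = dose response
```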
Abstract:
Introduction: Non-invasive brain imaging techniques often contrast experimental conditions across a cohort of participants, obfuscating distinctions in individual performance and brain mechanisms that are better characterised by inter-trial variability. To overcome such limitations, we developed topographic analysis methods for single-trial EEG data [1]. So far, such analysis has typically been based on time-frequency analysis of single-electrode data or single independent components. The method's efficacy is demonstrated for event-related responses to environmental sounds, hitherto studied at the average event-related potential (ERP) level. Methods: Nine healthy subjects participated in the experiment. Auditory meaningful sounds of common objects were used for a target detection task [2]. In each block, subjects were asked to discriminate target sounds, which were living or man-made auditory objects. Continuous 64-channel EEG was acquired during the task. Two datasets were considered for each subject, comprising single trials of the two conditions, living and man-made. The analysis comprised two steps. In the first step, a mixture-of-Gaussians analysis [3] provided representative topographies for each subject. In the second step, the conditional probabilities for each Gaussian provided statistical inference on the structure of these topographies across trials, time, and experimental conditions. A similar analysis was conducted at the group level. Results: The results show that the occurrence of each map is structured in time and consistent across trials at both the single-subject and the group level. Conducting separate analyses of ERPs at the single-subject and group levels, we could quantify the consistency of the identified topographies and their time course of activation within and across participants as well as experimental conditions. A general agreement was found with previous analyses at the average ERP level. Conclusions: This novel approach to single-trial analysis promises to have an impact on several domains. In clinical research, it makes it possible to statistically evaluate single-subject data, an essential tool for analysing patients with specific deficits and impairments and their deviation from normative standards. In cognitive neuroscience, it provides a novel tool for understanding the interdependencies of behaviour and brain activity at both the single-subject and the group level. In basic neurophysiology, it provides a new representation of ERPs and promises to cast light on the mechanisms of their generation and inter-individual variability.
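The two-step procedure can be sketched with the mclust R package (one possible implementation, not necessarily the authors'): fit a mixture of Gaussians to single-trial scalp topographies, then read off each sample's conditional probability of belonging to each map.

```r
## One possible implementation of the two-step analysis with mclust;
## simulated data: 500 single-trial samples x 64 electrodes.
library(mclust)

set.seed(3)
topo <- matrix(rnorm(500 * 64), nrow = 500, ncol = 64)

## Step 1: representative topographies as Gaussian components
gmm <- Mclust(topo, G = 4)

## Step 2: conditional probabilities of each map per sample; their
## structure across trials, time and conditions carries the inference
head(gmm$z)                 # P(map k | sample)
table(gmm$classification)   # dominant map per sample
```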
Abstract:
BACKGROUND: The zebrafish is a clinically relevant model of heart regeneration. Unlike mammals, it has a remarkable capacity for heart repair after injury, and it promises novel translational applications. Amputation and cryoinjury models are key research tools for understanding injury response and regeneration in vivo. An understanding of the transcriptional responses following injury is needed to identify key players in heart tissue repair, as well as potential targets for boosting this property in humans. RESULTS: We investigated amputation and cryoinjury in vivo models of heart damage in the zebrafish through unbiased, integrative analyses of independent molecular datasets. To detect genes with potential biological roles, we derived computational prediction models using microarray data from heart amputation experiments. We focused on a top-ranked set of genes highly activated in the early post-injury stage, whose activity was further verified in independent microarray datasets. Next, we performed independent validations of expression responses with qPCR in a cryoinjury model. Across the in vivo models, the top candidates showed highly concordant responses at 1 and 3 days post-injury, which highlights the predictive power of our analysis strategies and the possible biological relevance of these genes. The top candidates are significantly involved in cell fate specification and differentiation, and include heart failure markers such as periostin, as well as potential new targets for heart regeneration. For example, ptgis and ca2 were overexpressed, while usp2a, a regulator of the p53 pathway, was down-regulated in our in vivo models. Interestingly, high activity of ptgis and ca2 has previously been observed in failing hearts from rats and humans. CONCLUSIONS: We identified genes with potentially critical roles in the response to cardiac damage in the zebrafish. Their transcriptional activities are reproducible in different in vivo models of cardiac injury.
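One standard way to produce such a ranked set of injury-responsive genes is a limma differential-expression fit, sketched below on simulated data (the paper's actual prediction models were more elaborate; the expression matrix and group labels are hypothetical).

```r
## Sketch on simulated data; the paper's prediction models were more
## elaborate. Matrix and labels are hypothetical.
library(limma)

set.seed(4)
expr <- matrix(rnorm(1000 * 6), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000), NULL))
group  <- factor(rep(c("sham", "injured"), each = 3),
                 levels = c("sham", "injured"))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(expr, design))
topTable(fit, coef = 2, number = 20)   # top-ranked early-response candidates
```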
Abstract:
The project aims to achieve two objectives. First, we are analysing the labour market implications of the assumption that firms cannot pay similarly qualified employees differently according to when they joined the firm. For example, if the general situation for workers improves, a firm that seeks to hire new workers may feel it has to pay more to new hires. However, if the firm must pay the same wage to new hires and incumbents due to equal treatment, it would either have to raise the wage of the incumbents, or offer new workers a lower wage than the firm would do otherwise. This is very different from the standard assumption in economic analysis that firms are free to treat newly hired workers independently of existing hires. Second, we will use detailed data on individual wages to try to gauge whether (and to what extent) equity is a feature of actual labour markets. To investigate this, we are using two matched employer-employee panel datasets, one from Portugal and the other from Brazil. These unique datasets provide objective records on millions of workers and their firms over a long period of time, so that we can identify which firms employ which workers at each time. The datasets also include a large number of firm and worker variables.
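A minimal sketch of one plausible test (hypothetical variables, simulated data; not the project's actual specification): under equal treatment, new hires within a firm-year should show no systematic wage gap relative to incumbents once firm and year effects are absorbed.

```r
## Hypothetical variables, simulated data; one plausible specification only.
library(fixest)

set.seed(11)
wages <- data.frame(
  firm     = factor(sample(1:50, 5000, replace = TRUE)),
  year     = factor(sample(2000:2009, 5000, replace = TRUE)),
  new_hire = rbinom(5000, 1, 0.2)
)
wages$log_wage <- 2 + 0.01 * as.numeric(wages$firm) + rnorm(5000, 0, 0.3)

## Within firm and year, is there a systematic entrant/incumbent wage gap?
est <- feols(log_wage ~ new_hire | firm + year, data = wages)
summary(est)   # a near-zero new_hire coefficient is consistent with
               # equal treatment of new hires and incumbents
```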
Abstract:
MOTIVATION: Microarray results accumulated in public repositories are widely reused in meta-analytical studies and secondary databases. The quality of the data obtained with this technology varies from experiment to experiment, and an efficient method for quality assessment is necessary to ensure their reliability. RESULTS: The lack of a good benchmark has hampered the evaluation of existing methods for quality control. In this study, we propose a new independent quality metric that is based on the evolutionary conservation of expression profiles. We show, using 11 large organ-specific datasets, that IQRray, a new quality metric developed by us, exhibits the highest correlation with this reference metric among the 14 metrics tested. IQRray outperforms the other methods in identifying poor-quality arrays in datasets composed of arrays from many independent experiments. In contrast, the performance of methods designed for detecting outliers within a single experiment, such as Normalized Unscaled Standard Error and Relative Log Expression, was low, because these methods are unable to detect datasets containing only low-quality arrays and because their scores cannot be directly compared between experiments. AVAILABILITY AND IMPLEMENTATION: The R implementation of IQRray is available at: ftp://lausanne.isb-sib.ch/pub/databases/Bgee/general/IQRray.R. CONTACT: Marta.Rosikiewicz@unil.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
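A plausible usage session, given that the abstract distributes IQRray as a plain R script (the IQRray() call signature below is an assumption, not documented API):

```r
## Hypothetical session: IQRray ships as a plain R script (see the ftp URL
## above), so a plausible use sources it and scores a batch of CEL files.
## The IQRray() signature is an assumption, not documented API.
library(affy)

abatch <- ReadAffy()        # reads all CEL files in the working directory
source("IQRray.R")          # script downloaded from the ftp URL above
scores <- IQRray(abatch)    # hypothetical call: one quality score per array

## Unlike NUSE or RLE, such scores are meant to be comparable across
## experiments, so arrays can be ranked globally
sort(scores)
```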
Abstract:
PECUBE is a three-dimensional thermal-kinematic code capable of solving the heat production-diffusion-advection equation under a temporally varying surface boundary condition. It was initially developed to assess the effects of time-varying surface topography (relief) on low-temperature thermochronological datasets. Thermochronometric ages are predicted by tracking the time-temperature histories of rock particles ending up at the surface and combining these with various age-prediction models. In the decade since its inception, the PECUBE code has been under continuous development as its use widened to address different tectonic-geomorphic problems. This paper describes several major recent improvements to the code, including its integration with an inverse-modeling package based on the Neighborhood Algorithm, the incorporation of fault-controlled kinematics, several different ways to address topographic and drainage change through time, the ability to predict subsurface (tunnel or borehole) data, the prediction of detrital thermochronology data and a method to compare these with observations, and the coupling with landscape-evolution (or surface-process) models. Each new development is described together with one or several applications, so that the reader and potential user can clearly assess and make use of the capabilities of PECUBE. We end by describing some developments that are currently underway or should take place in the foreseeable future.
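For reference, the governing equation named in the first sentence has the standard form below (written here in general vector notation as a sketch; PECUBE solves it in three dimensions under a time-varying topographic boundary condition):

```latex
% Heat production-diffusion-advection equation: T temperature, t time,
% \mathbf{v} rock advection velocity, k thermal conductivity, \rho c
% volumetric heat capacity, H radiogenic heat production per unit mass.
\rho c \left( \frac{\partial T}{\partial t} + \mathbf{v} \cdot \nabla T \right)
  = \nabla \cdot \left( k \, \nabla T \right) + \rho H
```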
Abstract:
The aim of this study is to perform a thorough comparison of quantitative susceptibility mapping (QSM) techniques and their dependence on the assumptions made. The compared methodologies were: two iterative single-orientation methodologies minimizing the l2 or l1-TV norm of the prior knowledge of the edges of the object, one over-determined multiple-orientation method (COSMOS), and a newly proposed modulated closed-form solution (MCF). The performance of these methods was compared using a numerical phantom and in vivo high-resolution (0.65 mm isotropic) brain data acquired at 7T using a new coil combination method. For all QSM methods, the relevant regularization and prior-knowledge parameters were systematically varied in order to evaluate the optimal reconstruction in the presence and absence of a ground truth. Additionally, the QSM contrast was compared to conventional gradient recalled echo (GRE) magnitude and R2* maps obtained from the same dataset. The QSM reconstruction results of the single-orientation methods show comparable performance. The MCF method has the highest correlation (corr = 0.95, r² = 0.97) with the state-of-the-art method (COSMOS), with the additional advantage of extremely fast computation. The l-curve method gave the most visually satisfactory balance between the reduction of streaking artifacts and over-regularization, with the latter being overemphasized when using the COSMOS susceptibility maps as ground truth. R2* and susceptibility maps, when calculated from the same datasets, have a comparable ability to distinguish deep gray matter structures, although they are based on distinct features of the data.
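For orientation, the single-orientation methods compared here solve a regularized dipole inversion of the following common form (a sketch of the standard formulation, not the paper's exact notation); COSMOS instead combines several head orientations so that the dipole kernel D never vanishes and the inversion becomes over-determined.

```latex
% Unit dipole kernel in k-space and the regularized single-orientation
% inversion: F is the Fourier transform, \phi the measured local field,
% M an edge mask from prior knowledge, p = 2 (l2) or p = 1 (TV-type).
D(\mathbf{k}) = \frac{1}{3} - \frac{k_z^2}{|\mathbf{k}|^2}, \qquad
\hat{\chi} = \arg\min_{\chi} \,
  \left\| F^{-1} D F \chi - \phi \right\|_2^2
  + \lambda \left\| M \, \nabla \chi \right\|_p
```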
Quantitative comparison of reconstruction methods for intra-voxel fiber recovery from diffusion MRI.
Abstract:
Validation is arguably the bottleneck in the diffusion magnetic resonance imaging (MRI) community. This paper evaluates and compares 20 algorithms for recovering the local intra-voxel fiber structure from diffusion MRI data, based on the results of the "HARDI reconstruction challenge" organized in the context of the ISBI 2012 conference. The evaluated methods encompass a mixture of classical techniques well known in the literature, such as diffusion tensor, Q-ball and diffusion spectrum imaging, algorithms inspired by the recent theory of compressed sensing, and brand-new approaches proposed for the first time at this contest. To quantitatively compare the methods under controlled conditions, two datasets with known ground truth were synthetically generated, and two main criteria were used to evaluate the quality of the reconstructions in every voxel: correct assessment of the number of fiber populations and angular accuracy in their orientation. This comparative study investigates the behavior of every algorithm under varying experimental conditions and highlights the strengths and weaknesses of each approach. This information can be useful not only for enhancing current algorithms and developing the next generation of reconstruction methods, but also for assisting physicians in choosing the most adequate technique for their studies.
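The angular-accuracy criterion is conventionally computed as the angle between estimated and ground-truth fiber directions (a standard formulation, assumed here rather than quoted from the paper):

```latex
% Angular error between unit direction vectors; the absolute value folds
% antipodal orientations together, since fibers have no sign.
\theta_{\mathrm{err}} = \arccos\!\left( \left| \hat{\mathbf{d}}_{\mathrm{est}}
  \cdot \hat{\mathbf{d}}_{\mathrm{true}} \right| \right)
```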
Abstract:
SUMMARY: Eukaryotic DNA interacts with nuclear proteins through non-covalent ionic interactions. Proteins can recognize specific nucleotide sequences based on steric interactions with the DNA, and these specific protein-DNA interactions are the basis for many nuclear processes, e.g. gene transcription, chromosomal replication, and recombination. A new technology termed ChIP-Seq has recently been developed for the analysis of protein-DNA interactions on a whole-genome scale; it is based on immunoprecipitation of chromatin followed by high-throughput DNA sequencing. ChIP-Seq is a novel technique with great potential to replace older techniques for the mapping of protein-DNA interactions. In this thesis, we bring some new insights into ChIP-Seq data analysis. First, we point out some common and so far unrecognized artifacts of the method. The sequence tag distribution in the genome is not uniform, and we have found extreme hot-spots of tag accumulation over specific loci in the human and mouse genomes. These artifactual sequence-tag accumulations will create false peaks in every ChIP-Seq dataset, and we propose different filtering methods to reduce the number of false positives. Next, we propose random sampling as a powerful analytical tool in ChIP-Seq data analysis that can be used to infer biological knowledge from massive ChIP-Seq datasets. We created an unbiased random sampling algorithm and used this methodology to reveal some of the important biological properties of Nuclear Factor I (NFI) DNA-binding proteins. Finally, by analyzing the ChIP-Seq data in detail, we revealed that Nuclear Factor I transcription factors mainly act as activators of transcription, and that they are associated with specific chromatin modifications that are markers of open chromatin. We speculate that NFI factors only interact with the DNA wrapped around the nucleosome. We also found multiple loci that indicate possible chromatin barrier activity of NFI proteins, which could suggest the use of NFI binding sequences as chromatin insulators in biotechnology applications.
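The random-sampling idea can be sketched in a few lines of R (an illustration of unbiased tag subsampling, not the thesis' own algorithm; object names are hypothetical):

```r
## Illustration of unbiased tag subsampling; hypothetical objects.
set.seed(5)
tags <- data.frame(chrom = sample(paste0("chr", 1:19), 1e5, replace = TRUE),
                   pos   = sample.int(2e6, 1e5, replace = TRUE))

subsample_tags <- function(x, n) x[sample.int(nrow(x), n), ]

## Ten equal-depth draws put libraries of different depths on equal footing
draws <- lapply(1:10, function(i) subsample_tags(tags, 2e4))
table(draws[[1]]$chrom)     # re-count tags per region within one draw
```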