956 results for Data Sets
Abstract:
In most geochemical analyses, log-ratio techniques are required to analyse compositional data sets. When a chemical element is present at a low concentration, it is usually identified as a value below the detection limit and added to the data set either as zero or simply by attaching a less-than label. In any case, the occurrence of such concentrations prevents us from applying the log-ratio approach. We review here the theoretical bases of the most recent proposals for dealing with these types of observation, give some advice on their practical application and illustrate their performance through some examples using geochemical data.
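To make the problem in the abstract above concrete, here is a minimal sketch of why zeros block the log-ratio approach and how one simple workaround behaves. It assumes a multiplicative replacement in which each below-detection value is imputed as a fixed fraction (0.65 here, a common convention) of its detection limit. The abstract does not say which proposals it reviews, so the strategy, the function names and the sample values are all illustrative assumptions.

```python
import math

def multiplicative_replacement(parts, detection_limits, fraction=0.65):
    """Replace zeros (below-detection values) in a composition that sums to 1.

    Assumption: each zero is imputed as `fraction` times its detection limit,
    and the observed parts are shrunk by a common factor so that the total
    still sums to 1 and the ratios between observed parts are unchanged.
    """
    deltas = [fraction * dl if x == 0 else 0.0
              for x, dl in zip(parts, detection_limits)]
    total_delta = sum(deltas)
    return [d if x == 0 else x * (1.0 - total_delta)
            for x, d in zip(parts, deltas)]

def clr(parts):
    """Centered log-ratio transform; undefined if any part is zero."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical four-part composition; the last part is below detection.
composition = [0.70, 0.29, 0.01, 0.0]
limits = [0.001, 0.001, 0.001, 0.001]

replaced = multiplicative_replacement(composition, limits)
clr_values = clr(replaced)  # clr(composition) would fail on log(0)
```

Because the observed parts are all scaled by the same factor, their pairwise ratios survive the replacement, which is the property that matters for any later log-ratio analysis.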
Abstract:
Comparative analyses of survival senescence by using life tables have identified generalizations including the observation that mammals senesce faster than similar-sized birds. These generalizations have been challenged because of limitations of life-table approaches and the growing appreciation that senescence is more than an increasing probability of death. Without using life tables, we examine senescence rates in annual individual fitness using 20 individual-based data sets of terrestrial vertebrates with contrasting life histories and body size. We find that senescence is widespread in the wild and equally likely to occur in survival and reproduction. Additionally, mammals senesce faster than birds because they have a faster life history for a given body size. By allowing us to disentangle the effects of two major fitness components our methods allow an assessment of the robustness of the prevalent life-table approach. Focusing on one aspect of life history - survival or recruitment - can provide reliable information on overall senescence.
Abstract:
The final year project came to us as an opportunity to get involved in a topic that proved attractive while majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics. Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom of recent decades and the consequent exponential increase in the amount of data collected and stored day by day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand these days and, in our understanding, the work they do, although sometimes controversial in terms of ethics, is a clear source of added value for both private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument suited to the analysis of large data sets and directly related to computer science: Partial Correlation Networks. The structure of the project follows from our objectives. First, the characteristics of the instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed to show the power and efficiency of the model. Finally, the model is put into practice by analyzing a relatively large set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series.
In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that allows valuable results to be found in Big Data. The findings throughout this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of data. Moreover, partial correlation networks are shown to be a valid tool for representing cross-sectional interconnections between elements in large data sets. The scope of this project is, however, limited, as there are some sections in which deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, and a deeper analysis of the results in the real data application are examples of aspects in which this project could be extended. To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding relationships that connect the elements present in a large data set. Partial correlation networks allow the owner of such a set to observe and analyze existing linkages that might otherwise have been overlooked.
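The quantity that a partial correlation network draws as edges can be illustrated with a small sketch. It does not implement the SPACE estimator of Peng et al. (2009), which uses joint sparse regression. It only applies the standard identity that the partial correlation between variables i and j equals minus the (i, j) entry of the inverse covariance matrix divided by the square root of the product of the two diagonal entries. The three-variable chain and its 0.8 coefficients are hypothetical.

```python
import math

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), omega = inverse covariance."""
    prec = invert(cov)
    n = len(prec)
    return [[1.0 if i == j
             else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Population covariance of a hypothetical chain X1 -> X2 -> X3 with
# X2 = 0.8*X1 + e2 and X3 = 0.8*X2 + e3, all noise terms with unit variance.
cov = [[1.0,  0.8,   0.64],
       [0.8,  1.64,  1.312],
       [0.64, 1.312, 2.0496]]

pcorr = partial_correlations(cov)
marginal_13 = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
```

Here X1 and X3 are clearly correlated marginally (`marginal_13` is about 0.45), yet their partial correlation is zero once X2 is accounted for, so a partial correlation network would draw no edge between them. Sparse estimators exist because inverting a sample covariance matrix directly becomes unreliable when the number of variables is large relative to the number of observations.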
Abstract:
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2907 cases with AN from 14 countries (15 sites) and 14 860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery data sets. Seventy-six (72 independent) single nucleotide polymorphisms were taken forward for in silico (two data sets) or de novo (13 data sets) replication genotyping in 2677 independent AN cases and 8629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication data sets comprised 5551 AN cases and 21 080 controls. AN subtype analyses (1606 AN restricting; 1445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01 × 10^-7) in SOX2OT and rs17030795 (P=5.84 × 10^-6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76 × 10^-6) between CUL3 and FAM124B and rs1886797 (P=8.05 × 10^-6) near SPATA13. Comparing discovery with replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4 × 10^-6), strongly suggesting that true findings exist but our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
Abstract:
The thesis addresses the issue of parenthood and gender equality in Switzerland through the emergence of parental leave policies. This is an original and relevant research topic, as Switzerland is one of the few industrialized countries that have not yet implemented a parental or paternity leave. I first describe the emergence of parental leave policies in the last ten to fifteen years in the political, media, and labor-market spheres. Secondly, adopting a gender and discursive theoretical approach, I analyze whether and to what extent this emergence challenged gendered representations and practices of parenthood. The multilevel and mixed-methods research design implies analyzing various data sets such as parliamentary interventions (N=23) and newspaper articles (N=579) on parental leave policies. A case study of a public administration which implemented a one-month paid paternity leave draws on register data of leave recipients (N=95) and in-depth interviews with fathers and managers (n=30). Results show that parental leave policies, especially in recent years, have been increasingly problematized in the three social spheres considered, as a result of political and institutional events. While there is a struggle over the definition of the legitimate leave type to implement (parental or paternity leave) in the political sphere, paternity leave has precedence in the media and labor-market spheres. Overall, this emergence contributes to making fatherhood visible in the public sphere, challenging, albeit in a limited way, gendered representations and practices of parenthood. Along with representations of involved fatherhood and change in gender relations, different roles and responsibilities are attributed to mothers and fathers, the latter being often defined as secondary, temporary and optional parents.
Finally, I identify a common trend, namely the increasing importance of the economic aspects of parental leave policies with the consequence of sidelining their gender-equality potential. The dissertation contributes to the literature which analyzes the interconnections between the macro-, the meso- and the micro-levels of society in the constitution of gender relations and parenthood. It also provides useful tools for the analysis of the politics of parental leave policies in Switzerland and their effects for gender equality. - This thesis deals with parenthood and gender equality in Switzerland through the emergence of parental leave. The research topic is original and relevant because Switzerland is to date one of the only industrialized countries not to have adopted a right to parental or paternity leave. The research describes the emergence of parental leave over the last 10 to 15 years in the political, media and labor-market spheres in Switzerland. Combining a gender perspective with discourse analysis, it examines to what extent this emergence calls into question gendered representations and practices of parenthood. Mixed research methods are used to analyze parliamentary interventions (N=23) and press articles (N=579) on parental leave. The case study of a public enterprise that adopted a one-month paid paternity leave draws on register data (N=95) and semi-structured interviews with fathers and managers (n=30). The results indicate that, in the three spheres considered, parental leave has received growing attention in recent years, in connection with political and institutional events. While in the political sphere there is no consensus on the type of leave considered legitimate (parental or paternity leave), in the media and labor-market spheres paternity leave seems to prevail.
Overall, the emergence of parental leave contributes to making fatherhood more visible in the public space, calling into question, although in a limited way, gendered representations of parenthood. On the one hand, the image of involved fathers and of more egalitarian gender relations within the family is disseminated. On the other hand, mothers and fathers continue to be associated with different roles, fathers being defined as secondary and temporary parents. Finally, the analysis reveals a general trend, namely the growing importance given to the economic aspects of parental leave, with the consequence of sidelining its potential for gender equality. This thesis contributes to research on the links between the macro-, meso- and micro-social levels in the constitution of gender relations and parenthood. It also offers tools for analyzing parental leave policies in Switzerland and their implications for gender equality.
Abstract:
A crucial step for understanding how lexical knowledge is represented is to describe the relative similarity of lexical items, and how it influences language processing. Previous studies of the effects of form similarity on word production have reported conflicting results, notably within and across languages. The aim of the present study was to clarify this empirical issue to provide specific constraints for theoretical models of language production. We investigated the role of phonological neighborhood density in a large-scale picture naming experiment using fine-grained statistical models. The results showed that increasing phonological neighborhood density has a detrimental effect on naming latencies, and re-analyses of independently obtained data sets provide supplementary evidence for this effect. Finally, we reviewed a large body of evidence concerning phonological neighborhood density effects in word production, and discussed the occurrence of facilitatory and inhibitory effects in accuracy measures. The overall pattern shows that phonological neighborhood generates two opposite forces, one facilitatory and one inhibitory. In cases where speech production is disrupted (e.g. certain aphasic symptoms), the facilitatory component may emerge, but inhibitory processes dominate in efficient naming by healthy speakers. These findings are difficult to accommodate in terms of monitoring processes, but can be explained within interactive activation accounts combining phonological facilitation and lexical competition.
Abstract:
Despite the important benefits for firms of commercial initiatives on the Internet, e-commerce is still an emerging distribution channel, even in developed countries. Thus, more needs to be known about the mechanisms affecting its development. A large number of works have studied firms' e-commerce adoption from technological, intraorganizational, institutional, or other specific perspectives, but there is a need for adequately tested integrative frameworks. Hence, this work proposes and tests a model of firms' business-to-consumer (called B2C) e-commerce adoption that is founded on a holistic vision of the phenomenon. With this integrative approach, the authors analyze the joint influence of environmental, technological, and organizational factors; moreover, they evaluate this effect over time. Using various representative Spanish data sets covering the period 1996-2005, the findings demonstrate the suitability of the holistic framework. Likewise, some lessons are learned from the analysis of the key building blocks. In particular, the current study provides evidence for the debate about the effect of competitive pressure, since the findings show that competitive pressure disincentivizes e-commerce adoption in the long term. The results also show that the development or enrichment of the consumers' consumption patterns, the technological readiness of the market forces, the firm's global scope, and its competences in innovation continuously favor e-commerce adoption.
Abstract:
The ability to obtain gene expression profiles from human disease specimens provides an opportunity to identify relevant gene pathways, but is limited by the absence of data sets spanning a broad range of conditions. Here, we analyzed publicly available microarray data from 16 diverse skin conditions in order to gain insight into disease pathogenesis. Unsupervised hierarchical clustering separated samples by disease as well as common cellular and molecular pathways. Disease-specific signatures were leveraged to build a multi-disease classifier, which predicted the diagnosis of publicly and prospectively collected expression profiles with 93% accuracy. In one sample, the molecular classifier differed from the initial clinical diagnosis and correctly predicted the eventual diagnosis as the clinical presentation evolved. Finally, integration of IFN-regulated gene programs with the skin database revealed a significant inverse correlation between IFN-β and IFN-γ programs across all conditions. Our study provides an integrative approach to the study of gene signatures from multiple skin conditions, elucidating mechanisms of disease pathogenesis. In addition, these studies provide a framework for developing tools for personalized medicine toward the precise prediction, prevention, and treatment of disease on an individual level.
Abstract:
To better understand the relationship between tumor-host interactions and the efficacy of chemotherapy, we have developed an analytical approach to quantify several biological processes observed in gene expression data sets. We tested the approach on tumor biopsies from individuals with estrogen receptor-negative breast cancer treated with chemotherapy. We report that increased stromal gene expression predicts resistance to preoperative chemotherapy with 5-fluorouracil, epirubicin and cyclophosphamide (FEC) in subjects in the EORTC 10994/BIG 00-01 trial. The predictive value of the stromal signature was successfully validated in two independent cohorts of subjects who received chemotherapy but not in an untreated control group, indicating that the signature is predictive rather than prognostic. The genes in the signature are expressed in reactive stroma, according to reanalysis of data from microdissected breast tumor samples. These findings identify a previously undescribed resistance mechanism to FEC treatment and suggest that antistromal agents may offer new ways to overcome resistance to chemotherapy.
Abstract:
BACKGROUND: Modern sequencing technologies have massively increased the amount of data available for comparative genomics. Whole-transcriptome shotgun sequencing (RNA-seq) provides a powerful basis for comparative studies. In particular, this approach holds great promise for emerging model species in fields such as evolutionary developmental biology (evo-devo). RESULTS: We have sequenced early embryonic transcriptomes of two non-drosophilid dipteran species: the moth midge Clogmia albipunctata, and the scuttle fly Megaselia abdita. Our analysis includes a third, published, transcriptome for the hoverfly Episyrphus balteatus. These emerging models for comparative developmental studies close an important phylogenetic gap between Drosophila melanogaster and other insect model systems. In this paper, we provide a comparative analysis of early embryonic transcriptomes across species, and use our data for a phylogenomic re-evaluation of dipteran phylogenetic relationships. CONCLUSIONS: We show how comparative transcriptomics can be used to create useful resources for evo-devo, and to investigate phylogenetic relationships. Our results demonstrate that de novo assembly of short (Illumina) reads yields high-quality, high-coverage transcriptomic data sets. We use these data to investigate deep dipteran phylogenetic relationships. Our results, based on a concatenation of 160 orthologous genes, provide support for the traditional view of Clogmia being the sister group of Brachycera (Megaselia, Episyrphus, Drosophila), rather than that of Culicomorpha (which includes mosquitoes and blackflies).
Abstract:
The ratio of resting metabolic rate (RMR) to fat-free mass (FFM) is often used to compare individuals of different body sizes. Because RMR has not been well described over the full range of FFM, a literature review was conducted among groups with a wide range of FFM. It included 31 data sets comprising a total of 1111 subjects: 118 infants and preschoolers, 323 adolescents, and 670 adults; FFM ranged from 2.8 to 106 kg. The relationship of RMR to FFM was found to be nonlinear and average slopes of the regression equations of the three groups differed significantly (P < 0.0001). For only the youngest group did the intercept approach zero. The lower slopes of RMR on FFM, at higher measures of FFM, corresponded to relatively greater proportions of less metabolically active muscle mass and to lesser proportions of more metabolically active nonmuscle organ mass. Because the contribution of FFM to RMR is not constant, an arithmetic error is introduced when the ratio of RMR to FFM is used. Hence, alternative methods should be used to compare individuals with markedly different FFM.
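The arithmetic error mentioned in the abstract above can be seen in a short sketch. It assumes a straight-line relationship with a nonzero intercept, using made-up coefficients that are not taken from the review (which in fact reports a nonlinear relationship). The point is only that two individuals lying on exactly the same line get different RMR/FFM ratios.

```python
# Hypothetical regression line RMR = INTERCEPT + SLOPE * FFM.
# Both coefficients are illustrative assumptions, not values from the review.
INTERCEPT = 500.0  # kcal/day
SLOPE = 20.0       # kcal/day per kg of fat-free mass

def predicted_rmr(ffm_kg):
    """RMR predicted by the hypothetical line for a given fat-free mass."""
    return INTERCEPT + SLOPE * ffm_kg

def rmr_per_kg_ffm(ffm_kg):
    """The ratio RMR/FFM, which equals SLOPE + INTERCEPT / FFM."""
    return predicted_rmr(ffm_kg) / ffm_kg

small = rmr_per_kg_ffm(40.0)  # 1300 / 40 = 32.5 kcal/day per kg
large = rmr_per_kg_ffm(80.0)  # 2100 / 80 = 26.25 kcal/day per kg
```

The ratio falls as FFM rises purely because the intercept term is divided by a larger number, so a lower ratio in a heavier individual need not reflect any metabolic difference.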
Abstract:
PURPOSE Our purpose was development and assessment of a BRAF-mutant gene expression signature for colon cancer (CC) and the study of its prognostic implications. MATERIALS AND METHODS A set of 668 stage II and III CC samples from the PETACC-3 (Pan-European Trials in Alimentary Tract Cancers) clinical trial were used to assess differential gene expression between c.1799T>A (p.V600E) BRAF mutant and non-BRAF, non-KRAS mutant cancers (double wild type) and to construct a gene expression-based classifier for detecting BRAF mutant samples with high sensitivity. The classifier was validated in independent data sets, and survival rates were compared between classifier positive and negative tumors. RESULTS A 64 gene-based classifier was developed with 96% sensitivity and 86% specificity for detecting BRAF mutant tumors in PETACC-3 and independent samples. A subpopulation of BRAF wild-type patients (30% of KRAS mutants, 13% of double wild type) showed a gene expression pattern and had poor overall survival and survival after relapse, similar to those observed in BRAF-mutant patients. Thus they form a distinct prognostic subgroup within their mutation class. CONCLUSION A characteristic pattern of gene expression is associated with and accurately predicts BRAF mutation status and, in addition, identifies a population of BRAF mutated-like KRAS mutants and double wild-type patients with similarly poor prognosis. This suggests a common biology between these tumors and provides a novel classification tool for cancers, adding prognostic and biologic information that is not captured by the mutation status alone. These results may guide therapeutic strategies for this patient segment and may help in population stratification for clinical trials.
Abstract:
PURPOSE: To improve the risk stratification of patients with rhabdomyosarcoma (RMS) through the use of clinical and molecular biologic data. PATIENTS AND METHODS: Two independent data sets of gene-expression profiling for 124 and 101 patients with RMS were used to derive prognostic gene signatures by using a meta-analysis. These and a previously published metagene signature were evaluated by using cross validation analyses. A combined clinical and molecular risk-stratification scheme that incorporated the PAX3/FOXO1 fusion gene status was derived from 287 patients with RMS and evaluated. RESULTS: We showed that our prognostic gene-expression signature and the one previously published performed well with reproducible and significant effects. However, their effect was reduced when cross validated or tested in independent data and did not add new prognostic information over the fusion gene status, which is simpler to assay. Among nonmetastatic patients, patients who were PAX3/FOXO1 positive had a significantly poorer outcome compared with both alveolar-negative and PAX7/FOXO1-positive patients. Furthermore, a new clinicomolecular risk score that incorporated fusion gene status (negative and PAX3/FOXO1 and PAX7/FOXO1 positive), Intergroup Rhabdomyosarcoma Study TNM stage, and age showed a significant increase in performance over the current risk-stratification scheme. CONCLUSION: Gene signatures can improve current stratification of patients with RMS but will require complex assays to be developed and extensive validation before clinical application. A significant majority of their prognostic value was encapsulated by the fusion gene status. A continuous risk score derived from the combination of clinical parameters with the presence or absence of PAX3/FOXO1 represents a robust approach to improving current risk-adapted therapy for RMS.
Abstract:
Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of TC that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.
Abstract:
BACKGROUND/OBJECTIVES: Preoperative nutrition has been shown to reduce morbidity after major gastrointestinal (GI) surgery in selected patients at risk. In a randomized trial performed recently (NCT00512213), almost half of the patients, however, did not consume the recommended dose of nutritional intervention. The present study aimed to identify the risk factors for noncompliance. SUBJECTS/METHODS: Demographic (n=5) and nutritional (n=21) parameters for this retrospective analysis were obtained from a prospectively maintained database. The outcome of interest was compliance with the allocated intervention (ingestion of ⩾11/15 preoperative oral nutritional supplement units). Uni- and multivariate analyses of potential risk factors for noncompliance were performed. RESULTS: The final analysis included 141 patients with complete data sets for the purpose of the study. Fifty-nine patients (42%) were considered noncompliant. Univariate analysis identified low C-reactive protein levels (P=0.015), decreased recent food intake (P=0.032) and, as a trend, low hemoglobin (P=0.065) and low pre-albumin (P=0.056) levels as risk factors for decreased compliance. However, none of them was retained as an independent risk factor after multivariate analysis. Interestingly, 17 potential explanatory parameters, such as upper GI cancer, weight loss, reduced appetite or co-morbidities, did not show any significant correlation with reduced intake of nutritional supplements. CONCLUSIONS: Reduced compliance with preoperative nutritional interventions remains a major issue because the expected benefit depends on the actual intake. Seemingly obvious reasons could not be retained as valid explanations. Compliance thus seems to be primarily a question of will and information; the importance of nutritional supplementation needs to be emphasized through specific patient education.