Digital Library

922 results for functional data analysis

Study of the distribution of teneurin-related proteins in the central nervous system of non-human primates (Sapajus spp) and rats (Rattus norvegicus)

Relevance:

90.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Trends in aortic aneurysm- and dissection-related mortality in the state of Sao Paulo, Brazil, 1985-2009: multiple-cause-of-death analysis

Relevance:

90.00%

Publisher:

Abstract:

Background: Aortic aneurysm and dissection are important causes of death in older people. Ruptured aneurysms show catastrophic fatality rates approaching 80%. Few population-based mortality studies have been published worldwide and none in Brazil. The objective of the present study was to use multiple-cause-of-death methodology to analyze mortality trends related to aortic aneurysm and dissection in the state of Sao Paulo between 1985 and 2009. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which aortic aneurysm and dissection were listed as a cause of death. The variables sex, age, season of the year, and underlying, associated or total mentions of causes of death were studied using standardized mortality rates, proportions and historical trends. Statistical analyses were performed using chi-square goodness-of-fit tests, Kruskal-Wallis H tests, and analysis of variance. The joinpoint regression model was used to evaluate changes in age-standardized rate trends. A p value less than 0.05 was regarded as significant. Results: Over a 25-year period, there were 42,615 deaths related to aortic aneurysm and dissection, of which 36,088 (84.7%) were identified as the underlying cause and 6,527 (15.3%) as an associated cause of death. Dissection and ruptured aneurysms were considered the underlying cause of death in 93% of the deaths. For the entire period, a significantly increasing trend in age-standardized death rates was observed in men and women, while certain non-significant decreases occurred from 1996/2004 until 2009. Abdominal aortic aneurysms and aortic dissections prevailed among men, and aortic dissections and aortic aneurysms of unspecified site among women. In 1985 and 2009 the men-to-women death rate ratios were 2.86 and 2.19, respectively, corresponding to a 23.4% decrease in the ratio.
For aortic dissection, ruptured and non-ruptured aneurysms, the overall mean ages at death were, respectively, 63.2, 68.4 and 71.6 years. When these conditions were the underlying cause, the main associated causes of death were hemorrhages (in 43.8%/40.5%/13.9%), hypertensive diseases (in 49.2%/22.43%/24.5%) and atherosclerosis (in 14.8%/25.5%/15.3%); when they were associated causes, the principal overall underlying causes of death were diseases of the circulatory (55.7%) and respiratory (13.8%) systems and neoplasms (7.8%). A significant seasonal variation, with the highest frequency in winter, occurred in deaths with aortic dissection, ruptured aneurysms or non-ruptured aneurysms identified as the underlying cause. Conclusions: This study introduces multiple-cause-of-death methodology to enhance epidemiologic knowledge of aortic aneurysm and dissection in Sao Paulo, Brazil. The results shed light on the importance of mortality statistics and on the need for epidemiologic studies to understand unique trends in our own population.
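
The proportions and the rate-ratio decrease quoted in this abstract can be re-derived from the reported figures. The following is a minimal Python sketch of that arithmetic only; the variable names are illustrative, and it assumes (as the numbers suggest) that the 23.4% figure is the relative decrease of the men-to-women rate ratio between 1985 and 2009.

```python
# Figures copied from the abstract; names are illustrative only.
total_deaths = 42_615   # all deaths mentioning aortic aneurysm or dissection
underlying = 36_088     # listed as the underlying cause
associated = 6_527      # listed as an associated cause


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


share_underlying = pct(underlying, total_deaths)
share_associated = pct(associated, total_deaths)

# Men-to-women death rate ratios reported for 1985 and 2009.
ratio_1985 = 2.86
ratio_2009 = 2.19

# Assumption: the quoted decrease is relative to the 1985 ratio.
relative_decrease = round(100 * (ratio_1985 - ratio_2009) / ratio_1985, 1)

print(share_underlying, share_associated, relative_decrease)
```

Under that assumption the three printed values agree with the 84.7%, 15.3% and 23.4% reported above.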

Transcriptome Analysis of Renal Ischemia/Reperfusion Injury and Its Modulation by Ischemic Pre-Conditioning or Hemin Treatment

Relevance:

90.00%

Publisher:

Abstract:

Ischemia/reperfusion injury (IRI) is a leading cause of acute renal failure. Defining the molecular mechanisms involved in renal IRI, and in the counter-protection promoted by ischemic pre-conditioning (IPC) or Hemin treatment, is an important milestone for this research area. Using an oligonucleotide microarray protocol, we examined the differential renal transcriptome profiles of mice subjected to IRI, IPC and Hemin treatment. After identifying the differentially expressed genes for each comparison, we carried out functional enrichment analysis to reveal transcripts putatively involved in potentially relevant biological processes and signaling pathways. The most relevant processes found in these comparisons were stress, apoptosis, cell differentiation, angiogenesis, focal adhesion, ECM-receptor interaction, ion transport, mitosis and cell cycle, inflammatory response, olfactory transduction and regulation of actin cytoskeleton. In addition, the most important overrepresented pathways were MAPK, ErbB, JAK/STAT, Toll-like and Nod-like receptors, Angiotensin II, arachidonic acid metabolism, Wnt and the coagulation cascade. New insights were also gained into the mechanisms of protection against renal IRI promoted by IPC and Hemin treatment. Venn diagram analysis allowed us to uncover differentially expressed genes that were shared between, or exclusive to, these two protective maneuvers, underscoring potential common and exclusive biological functions regulated in each case. In summary, IPC exclusively regulated the expression of genes related to stress, protein modification and apoptosis, highlighting the role of IPC in controlling an exacerbated stress response.
Treatment with the Hmox1 inducer Hemin, in turn, exclusively regulated the expression of genes associated with cell differentiation, metabolic pathways, cell cycle, mitosis, development, regulation of actin cytoskeleton and arachidonic acid metabolism, suggesting a pleiotropic effect for Hemin. These findings improve the biological understanding of how the kidney behaves after IRI. They also illustrate some possible underlying molecular mechanisms involved in kidney protection observed with IPC or Hemin treatment maneuvers.

A log-linear regression model for the beta-Birnbaum-Saunders distribution with censored data

Relevance:

90.00%

Publisher:

Abstract:

The beta-Birnbaum-Saunders (Cordeiro and Lemonte, 2011) and Birnbaum-Saunders (Birnbaum and Saunders, 1969a) distributions have been used quite effectively to model lifetime data and failure times of materials subject to fatigue. We define the log-beta-Birnbaum-Saunders distribution as the distribution of the logarithm of a beta-Birnbaum-Saunders random variable. Explicit expressions for its generating function and moments are derived. We propose a new log-beta-Birnbaum-Saunders regression model that can be applied to censored data and used more effectively in survival analysis. We obtain the maximum likelihood estimates of the model parameters for censored data and investigate influence diagnostics. The new location-scale regression model is modified to allow for the possibility that long-term survivors may be present in the data. Its usefulness is illustrated by means of two real data sets.

BURDEN AND EMOTIONAL DISTRESS IN CAREGIVERS OF OLDER ADULTS

Relevance:

90.00%

Publisher:

Abstract:

The aim of this study was to describe the burden and emotional distress of caregivers of older adults. This cross-sectional epidemiological study was conducted in 2009 with 124 caregivers living in the community in Ribeirão Preto-SP, using the following instruments: the Zarit Burden Scale and the Self-Reporting Questionnaire (SRQ-20) for the caregiver. Data were analyzed in SPSS 15.0 using descriptive, univariate (frequency tables) and bivariate (contingency tables for qualitative variables) analyses. The caregivers, 85.6% of whom were women and whose mean age was 56.5 years, spent on average 12.4 hours per day providing care, and 57.6% of them had mild to moderate burden. The older adult's functional dependence, the caregiver's sex and the time in hours spent on care were predictors of burden (p<0.05). Burden was also found to be a risk factor for emotional distress (p<0.05). It falls to nurses to use assessment protocols, based on the risk factors, to prevent burden.

A Bayesian Approach for Decision Making on the Identification of Genes with Different Expression Levels: An Application to Escherichia coli Bacterium Data

Relevance:

90.00%

Publisher:

Abstract:

A common interest in gene expression data analysis is to identify, from a large pool of candidate genes, those that show significant changes in expression levels between a treatment and a control biological condition. This is usually done with a test statistic and a cutoff value that together separate differentially from nondifferentially expressed genes. In this paper, we propose a Bayesian approach that identifies differentially expressed genes by sequentially calculating credibility intervals from predictive densities. These densities are constructed using the sampled mean treatment effect of all genes under study, excluding the treatment effect of genes previously identified with statistical evidence of a difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using the small sample sizes that are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially in cases with mean differences and with increases in treatment variance relative to control variance. We also apply the methodologies to a well-known publicly available data set on the Escherichia coli bacterium.

Gaussian deconvolution: a useful method for a form-free modeling of scattering data from mono- and multilayered planar systems

Relevance:

90.00%

Publisher:

Abstract:

A new method for analysis of scattering data from lamellar bilayer systems is presented. The method employs a form-free description of the cross-section structure of the bilayer and the fit is performed directly to the scattering data, introducing also a structure factor when required. The cross-section structure (electron density profile in the case of X-ray scattering) is described by a set of Gaussian functions and the technique is termed Gaussian deconvolution. The coefficients of the Gaussians are optimized using a constrained least-squares routine that induces smoothness of the electron density profile. The optimization is coupled with the point-of-inflection method for determining the optimal weight of the smoothness. With the new approach, it is possible to optimize simultaneously the form factor, structure factor and several other parameters in the model. The applicability of this method is demonstrated by using it in a study of a multilamellar system composed of lecithin bilayers, where the form factor and structure factor are obtained simultaneously, and the obtained results provided new insight into this very well known system.

A Bayesian destructive weighted Poisson cure rate model and an application to a cutaneous melanoma data

Relevance:

90.00%

Publisher:

Abstract:

In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic-model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that results in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov Chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Also, some discussions on the model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP. Brazil, 2009 (accepted in Lifetime Data Analysis)] are presented.

Simcluster: clustering enumeration gene expression data on the simplex space

Relevance:

90.00%

Publisher:

Abstract:

Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, exhibiting properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
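
To make the simplex-space point concrete, here is a small Python sketch of one standard tool from compositional data analysis: the centred log-ratio (clr) transform and the distance it induces (often called the Aitchison distance). This is an illustration of the general framework only; the abstract does not say which distance Simcluster uses, and the function names and tag counts below are hypothetical.

```python
import math


def closure(counts):
    """Rescale non-negative counts so that they sum to 1 (a point on the simplex)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log.

    All parts must be strictly positive; zero counts would need a
    pseudo-count or another zero-handling rule first (not shown here).
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a, b):
    """Euclidean distance between the clr-transformed compositions."""
    ca, cb = clr(closure(a)), clr(closure(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Hypothetical tag counts for three libraries over the same three transcripts.
lib_a = [10, 20, 70]
lib_b = [100, 200, 700]   # same proportions as lib_a, ten times deeper
lib_c = [30, 30, 40]

print(aitchison_distance(lib_a, lib_b))  # libraries with equal proportions
print(aitchison_distance(lib_a, lib_c))  # libraries with different proportions
```

The design point is that the distance depends only on relative abundances: libraries sequenced at different depths but with the same proportions are treated as identical, which a plain Euclidean distance on raw counts would not do.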

MicroRNA expression profile in head and neck cancer: HOX-cluster embedded microRNA-196a and microRNA-10b dysregulation implicated in cell proliferation

Relevance:

90.00%

Publisher:

Abstract:

Background: Current evidence implicates aberrant microRNA expression patterns in human malignancies; measurement of microRNA expression may have diagnostic and prognostic applications. Roles for microRNAs in head and neck squamous cell carcinomas (HNSCC) are largely unknown. HNSCC, a smoking-related cancer, is one of the most common malignancies worldwide, but reliable diagnostic and prognostic markers have not been discovered so far. Some studies have evaluated the potential use of microRNAs as biomarkers with clinical application in HNSCC. Methods: The microRNA expression profile of oral squamous cell carcinoma samples was determined by means of DNA microarrays. We also performed gain-of-function assays for two differentially expressed microRNAs using two squamous cell carcinoma cell lines and normal oral keratinocytes. The effect of the over-expression of these molecules was evaluated by means of global gene expression profiling and cell proliferation assessment. Results: Altered microRNA expression was detected for a total of 72 microRNAs. Among these we found well-studied molecules, such as the miR-17-92 cluster, comprising potent oncogenic microRNAs, and miR-34, recently found to interact with p53. HOX-cluster embedded miR-196a/b and miR-10b were up- and down-regulated, respectively, in tumor samples. Since validated HOX gene targets for these microRNAs are not consistently deregulated in HNSCC, we performed gain-of-function experiments in an attempt to outline their possible role. Our results suggest that both molecules interfere with cell proliferation through distinct processes, possibly targeting a small set of genes involved in cell cycle progression. Conclusions: Functional data on miRNAs in HNSCC are still scarce. Our data corroborate the current literature and bring new insights into the role of microRNAs in HNSCC.
We also show that miR-196a and miR-10b, not previously associated with HNSCC, may play an oncogenic role in this disease through the deregulation of cell proliferation. The study of microRNA alterations in HNSCC is an essential step to the mechanistic understanding of tumor formation and could lead to the discovery of clinically relevant biomarkers.

Multivariate spectroscopic methods for the analysis of solutions

Relevance:

90.00%

Publisher:

Abstract:

In this thesis some multivariate spectroscopic methods for the analysis of solutions are proposed. Spectroscopy and multivariate data analysis form a powerful combination for obtaining both quantitative and qualitative information and it is shown how spectroscopic techniques in combination with chemometric data evaluation can be used to obtain rapid, simple and efficient analytical methods. These spectroscopic methods consisting of spectroscopic analysis, a high level of automation and chemometric data evaluation can lead to analytical methods with a high analytical capacity, and for these methods, the term high-capacity analysis (HCA) is suggested. It is further shown how chemometric evaluation of the multivariate data in chromatographic analyses decreases the need for baseline separation. The thesis is based on six papers and the chemometric tools used are experimental design, principal component analysis (PCA), soft independent modelling of class analogy (SIMCA), partial least squares regression (PLS) and parallel factor analysis (PARAFAC). The analytical techniques utilised are scanning ultraviolet-visible (UV-Vis) spectroscopy, diode array detection (DAD) used in non-column chromatographic diode array UV spectroscopy, high-performance liquid chromatography with diode array detection (HPLC-DAD) and fluorescence spectroscopy. The methods proposed are exemplified in the analysis of pharmaceutical solutions and serum proteins. In Paper I a method is proposed for the determination of the content and identity of the active compound in pharmaceutical solutions by means of UV-Vis spectroscopy, orthogonal signal correction and multivariate calibration with PLS and SIMCA classification. Paper II proposes a new method for the rapid determination of pharmaceutical solutions by the use of non-column chromatographic diode array UV spectroscopy, i.e. a conventional HPLC-DAD system without any chromatographic column connected. 
In Paper III an investigation is made of the ability of a control sample of known content and identity to diagnose and correct errors in multivariate predictions, something that, together with the use of multivariate residuals, can make it possible to use the same calibration model over time. In Paper IV a method is proposed for the simultaneous determination of serum proteins with fluorescence spectroscopy and multivariate calibration. Paper V proposes a method for the determination of chromatographic peak purity by means of PCA of HPLC-DAD data. In Paper VI PARAFAC is applied to decompose the DAD data of some partially separated peaks into the pure chromatographic, spectral and concentration profiles.

Computational analyses on the structure-function relation in ion channels

Relevance:

90.00%

Publisher:

Abstract:

Ion channels are protein molecules embedded in the lipid bilayer of cell membranes. They act as powerful sensing elements, converting chemical-physical stimuli into ion fluxes. At a glance, ion channels are water-filled pores that can open and close in response to different stimuli (gating) and, once open, select the permeating ion species (selectivity). They play a crucial role in several physiological functions, such as nerve transmission, muscular contraction, and secretion. Ion channels can also be used in technological applications for different purposes (sensing of organic molecules, DNA sequencing). As a result, there is remarkable interest in understanding the molecular determinants of channel functioning. Nowadays, both the functional and the structural characteristics of ion channels can be determined experimentally. The purpose of this thesis was to investigate the structure-function relation in ion channels by computational techniques. Most of the analyses focused on the mechanisms of ion conduction and on the numerical methodologies for computing the channel conductance. The standard techniques for atomistic simulation of complex molecular systems (Molecular Dynamics) cannot be routinely used to calculate ion fluxes in membrane channels because of the high computational resources needed. The main step forward of the PhD research activity was the development of a computational algorithm for the calculation of ion fluxes in protein channels. The algorithm, based on electrodiffusion theory, is computationally inexpensive and was used for an extensive analysis of the molecular determinants of the channel conductance. The first recording of ion fluxes through a single protein channel dates back to 1976, and since then measuring the single-channel conductance has become a standard experimental procedure. Chapter 1 introduces ion channels and the experimental techniques used to measure the channel currents.
The abundance of functional data (channel currents) is not matched by an equal abundance of structural data. The bacterial potassium channel KcsA was the first selective ion channel to be experimentally solved (1998), and after KcsA the structures of four different potassium channels were revealed. These experimental data inspired a new era in ion channel modeling. Once the atomic structures of channels are known, it is possible to define mathematical models based on physical descriptions of the molecular systems. These physically based models can provide an atomic description of ion channel functioning and predict the effect of structural changes. Chapter 2 introduces the computational methods used throughout the thesis to model ion channel functioning at the atomic level. In Chapter 3 and Chapter 4 ion conduction through potassium channels is analyzed by an approach based on the Poisson-Nernst-Planck electrodiffusion theory. In electrodiffusion theory, ion conduction is modeled by the drift-diffusion equations, so that the ion distributions are described by continuum functions. The numerical solver of the Poisson-Nernst-Planck equations was tested on the KcsA potassium channel (Chapter 3) and then used to analyze how the atomic structure of the intracellular vestibule of potassium channels affects the conductance (Chapter 4). As a major result, a correlation between the channel conductance and the potassium concentration in the intracellular vestibule emerged. The atomic structure of the channel modulates the potassium concentration in the vestibule, and thus its conductance. This mechanism explains the phenotype of the BK potassium channels, a sub-family of potassium channels with high single-channel conductance.
The functional role of the intracellular vestibule is also the subject of Chapter 5, where the affinity of the potassium channels hEag1 (involved in tumour-cell proliferation) and hErg (important in the cardiac cycle) for several pharmaceutical drugs was compared. Both experimental measurements and molecular modeling were used to identify differences in the blocking mechanism of the two channels, which could be exploited in the synthesis of selective blockers. The experimental data pointed out the different roles of residue mutations in the blockage of hEag1 and hErg, and the molecular modeling provided a possible explanation based on different binding sites in the intracellular vestibule. Modeling ion channels at the molecular level relates the functioning of a channel to its atomic structure (Chapters 3-5), and can also be useful for predicting the structure of ion channels (Chapters 6-7). In Chapter 6 the structure of the KcsA potassium channel depleted of potassium ions is analyzed by molecular dynamics simulations. Recently, a surprisingly high osmotic permeability of the KcsA channel was experimentally measured. All the available crystallographic structures of KcsA refer to a channel occupied by potassium ions. To conduct water molecules, potassium ions must be expelled from KcsA. The structure of the potassium-depleted KcsA channel and the mechanism of water permeation are still unknown, and have been investigated by numerical simulations. Molecular dynamics of KcsA identified a possible atomic structure of the potassium-depleted KcsA channel and a mechanism for water permeation. Depletion of potassium ions is an extreme situation for potassium channels, unlikely in physiological conditions. However, the simulation of such an extreme condition could help to identify the structural conformations, and hence the functional states, accessible to potassium ion channels.
The last chapter of the thesis deals with the atomic structure of the α-Hemolysin channel. α-Hemolysin is the major determinant of Staphylococcus aureus toxicity, and is also the prototype channel for possible use in technological applications. The atomic structure of α-Hemolysin was revealed by X-ray crystallography, but several lines of experimental evidence suggest the presence of an alternative atomic structure. This alternative structure was predicted by combining experimental measurements of single-channel currents with numerical simulations. This thesis is organized in two parts: the first part provides an overview of ion channels and of the numerical methods adopted throughout the thesis, while the second part describes the research projects tackled in the course of the PhD programme. The aim of the research activity was to relate the functional characteristics of ion channels to their atomic structure. In presenting the different research projects, the role of numerical simulations in analyzing the structure-function relation in ion channels is highlighted.
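
For orientation, the steady-state Poisson-Nernst-Planck system referred to in this abstract is commonly written as below. This is the textbook form, given here as an aid to the reader; the abstract does not state the exact formulation, boundary conditions or parameter choices used in the thesis, and the symbols are introduced here for illustration: $c_i$, $D_i$ and $z_i$ are the concentration, diffusion coefficient and valence of ion species $i$, $\phi$ is the electrostatic potential, $\varepsilon$ the permittivity, and $\rho_f$ the fixed charge density of the protein.

```latex
\begin{align}
  \mathbf{J}_i &= -D_i \left( \nabla c_i + \frac{z_i e}{k_B T}\, c_i \nabla \phi \right)
      && \text{(drift-diffusion flux of species } i\text{)} \\
  \nabla \cdot \mathbf{J}_i &= 0
      && \text{(steady-state continuity)} \\
  -\nabla \cdot \left( \varepsilon \nabla \phi \right) &= \rho_f + \sum_i z_i e\, c_i
      && \text{(Poisson equation)}
\end{align}
```

The first term of the flux is diffusion down the concentration gradient and the second is drift in the electric field; the Poisson equation closes the system because the mobile ions themselves contribute to the potential.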

New approaches to open problems in gene expression microarray data

Relevance:

90.00%

Publisher:

Abstract:

In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the group of individuals. Even though methods for analysing the data are now well developed and close to a standard organization (through the efforts of dedicated international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to stumble on a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches for handling open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array, through quality controls, to the preprocessing steps used in the data analysis in the rest of the dissertation.
Chapter 2 provides a critical review of standard analysis methods, stressing most of the problems that [...] Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared with that of SAM and LIMMA [3] on two data sets simulated from beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4].
In fact, when microarray data are viewed in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumor grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability that each G2 sample is a member of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of breast cancer samples of grade 1. In the literature this result had been conjectured, but no measure of significance had been given before.
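
The resampling scheme described for MultiSAM in this abstract (repeatedly compare the less populated class with an equally sized random subset of the more populated class, then count how often each probe is flagged) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: a plain Welch t statistic with a fixed threshold stands in for the SAM analysis, and the function names, threshold, iteration count and toy data are all assumptions made here.

```python
import random
import statistics


def welch_t(x, y):
    """Welch t statistic; a simple stand-in for the SAM statistic (assumption)."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / se if se > 0 else 0.0


def recurrence_scores(lpc, mpc, n_iter=1000, threshold=3.0, seed=0):
    """Count, per probe, how often it is flagged across balanced resamplings.

    lpc, mpc: lists of samples; each sample is a list of probe values.
    Returns one integer score per probe, between 0 and n_iter.
    """
    rng = random.Random(seed)
    n_probes = len(lpc[0])
    scores = [0] * n_probes
    for _ in range(n_iter):
        # Draw a subset of the larger class with the same size as the smaller one.
        subset = rng.sample(mpc, len(lpc))
        for p in range(n_probes):
            t = welch_t([s[p] for s in lpc], [s[p] for s in subset])
            if abs(t) > threshold:
                scores[p] += 1
    return scores


def toy_sample(i, n_probes, shift_probe0=0):
    """Deterministic toy expression values in -2..2, with an optional shift on probe 0."""
    values = [((7 * i + 3 * p) % 5) - 2 for p in range(n_probes)]
    values[0] += shift_probe0
    return values


N_PROBES = 4
N_ITER = 200
lpc = [toy_sample(i, N_PROBES, shift_probe0=10) for i in range(6)]   # small class
mpc = [toy_sample(i, N_PROBES) for i in range(40)]                   # large class
scores = recurrence_scores(lpc, mpc, n_iter=N_ITER)
print(scores)
```

In the toy data only probe 0 differs between the classes, and by a wide margin, so it is flagged in every iteration; ranking probes by such a score and cutting at a chosen level plays the role of the "score >300" filter mentioned in the abstract.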
