62 results for "statistical data analysis"
in the Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. Its average 5-year survival rate is one of the lowest among aggressive cancers and has shown no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present with metastatic disease at the time of diagnosis, which significantly reduces the survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and a non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in the three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes showing differential regulation in our SAGE data, one downregulated (KRT31) and two upregulated (BST2, MFAP2), together with one gene showing a non-significant differential expression pattern (GNA15), were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when HNSCC samples were compared with tumor-free surgical margins. As expected, GNA15 showed a non-significant differential expression pattern when tumor samples were compared with normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development.
The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.
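The abstract does not state which statistical test was used to call a tag differentially expressed. As a minimal sketch only, assuming a two-sided two-proportion z-test (one common choice for comparing a tag's relative abundance between two SAGE libraries), the comparison for a single tag could look like this. The function name and the counts in the usage lines are hypothetical.

```python
import math


def sage_tag_z_test(count_a, total_a, count_b, total_b):
    """Two-sided two-proportion z-test for one tag in two SAGE libraries.

    count_a, count_b: occurrences of the tag in library A and library B
    total_a, total_b: total tags sequenced in each library
    Returns (z, p_value); z > 0 means the tag is relatively more abundant in A.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # pooled proportion under the null hypothesis of equal abundance
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal tail: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# hypothetical counts: tag seen 40 times in a 20,000-tag tumor library
# and 10 times in an 18,000-tag normal library
z, p = sage_tag_z_test(40, 20_000, 10, 18_000)
```

Because thousands of tags are tested at once, a multiple-testing correction would normally be applied to the per-tag p-values before reporting a list of differentially expressed tags.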
Abstract:
Background: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams in which no member has complete knowledge of the others' fields. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even in clinical practice. Although communication plays a central role in interdisciplinary collaboration and miscommunication can harm research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. Methods/Principal Findings: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (an epidemiologist, a clinical epidemiologist, and a data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology to extract emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology to look for potential interventions to improve this process. Four major emerging themes were found. Definitions in lay language were frequently used to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately helping to explain its main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, helping specialists understand the current context in light of the final goal.
Conclusion/Significance: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Abstract:
The identification, modeling, and analysis of interactions between nodes of neural systems in the human brain have become the focus of many studies in neuroscience. The complex structure of neural networks and its correlation with brain function have played a role in all areas of neuroscience, including the understanding of cognitive and emotional processing. Indeed, understanding how information is stored, retrieved, processed, and transmitted is one of the ultimate challenges in brain research. In this context, connectivity analysis is a major tool in functional neuroimaging for exploring and characterizing the information flow between specialized brain regions. In most functional magnetic resonance imaging (fMRI) studies, connectivity analysis is carried out by first selecting regions of interest (ROIs) and then calculating an average BOLD time series across the voxels in each cluster. Some studies have shown that the average may not be a good choice and have suggested, as an alternative, using principal component analysis (PCA) to extract the principal eigen-time series from the ROI(s). In this paper, we introduce a novel approach called cluster Granger analysis (CGA) to study connectivity between ROIs. The main aim of this method is to employ multiple eigen-time series in each ROI to avoid temporal information loss during the identification of Granger causality. Such information loss is inherent in averaging (e.g., to yield a single "representative" time series per ROI), and it may in turn lead to a lack of power in detecting connections. The proposed approach is based on multivariate statistical analysis and integrates PCA and partial canonical correlation in a framework of Granger causality for clusters (sets) of time series. We also describe an algorithm for statistical significance testing based on bootstrapping.
By using Monte Carlo simulations, we show that the proposed approach outperforms conventional Granger causality analysis (i.e., using representative time series extracted by signal averaging or first principal components estimation from ROIs). The usefulness of the CGA approach in real fMRI data is illustrated in an experiment using human faces expressing emotions. With this data set, the proposed approach suggested the presence of significantly more connections between the ROIs than were detected using a single representative time series in each ROI. (c) 2010 Elsevier Inc. All rights reserved.
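The argument above rests on averaging being able to discard temporal information that principal components retain. The toy example below illustrates only that point. It assumes a two-voxel ROI whose voxels respond to the same signal with opposite signs, and it computes just the leading eigen-time series by power iteration. It does not implement the partial canonical correlation, Granger causality or bootstrap steps of CGA, and all function names are hypothetical.

```python
import math


def average_series(voxels):
    """Mean BOLD time series across the voxels of one ROI (the conventional choice)."""
    n_t = len(voxels[0])
    return [sum(v[t] for v in voxels) / len(voxels) for t in range(n_t)]


def leading_eigen_series(voxels, iters=100):
    """First principal (eigen-) time series of an ROI, found by power iteration."""
    n_t = len(voxels[0])
    # remove each voxel's temporal mean
    centered = [[x - sum(v) / n_t for x in v] for v in voxels]
    m = len(centered)
    # voxel-by-voxel sample covariance matrix
    cov = [[sum(centered[i][t] * centered[j][t] for t in range(n_t)) / (n_t - 1)
            for j in range(m)] for i in range(m)]
    # asymmetric start vector, so it is unlikely to be orthogonal to the leading eigenvector
    w = [1.0 + 0.1 * i for i in range(m)]
    for _ in range(iters):
        w_new = [sum(cov[i][j] * w[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w_new))
        if norm == 0:
            break
        w = [x / norm for x in w_new]
    # project the centered voxel series onto the leading eigenvector (sign is arbitrary)
    return [sum(w[i] * centered[i][t] for i in range(m)) for t in range(n_t)]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# two voxels in one ROI responding to the same signal with opposite signs
signal = [math.sin(0.3 * t) for t in range(60)]
roi = [signal, [-s for s in signal]]
avg = average_series(roi)          # every sample is 0.0: the signal cancels out
eig = leading_eigen_series(roi)    # keeps the signal, up to sign and scale
```

This case is extreme by construction. In real ROIs, the benefit of keeping several eigen-time series, as CGA does, depends on how heterogeneous the voxel responses are.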
Abstract:
Objective: The aim of this article is to propose an integrated framework for extracting and describing patterns of disorders from medical images using a combination of linear discriminant analysis and active contour models. Methods: A multivariate statistical methodology was first used to identify the most discriminating hyperplane separating the two groups of images (healthy controls and patients with schizophrenia) in the input data. The differences found by the multivariate statistical method were then made explicit by subtracting the discriminant models of controls and patients, weighted by the pooled variance of the two groups. A variational level-set technique was used to segment clusters of these differences, and each anatomical change was labelled using the Talairach atlas. Results: In this work, all the data were analysed simultaneously rather than assuming a priori regions of interest. As a consequence, by using active contour models, we were able to obtain regions of interest that emerged from the data. The results were evaluated using, as a gold standard, well-known facts about the neuroanatomical changes related to schizophrenia. Most of the items in the gold standard were covered by our result set. Conclusions: We argue that this investigation provides a suitable framework for characterising the high complexity of magnetic resonance images in schizophrenia, as the results obtained indicate a high sensitivity rate with respect to the gold standard. (C) 2010 Elsevier B.V. All rights reserved.
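To make the step "discriminant difference weighted by the pooled variance" more concrete, here is a minimal sketch that substitutes diagonal linear discriminant analysis. This is LDA under the simplifying assumption that features (voxels) are uncorrelated, so each feature's weight is the difference of group means divided by its pooled within-group variance. The substitution is an assumption for illustration, not the authors' full multivariate model. The level-set segmentation step is omitted, and the function names are hypothetical.

```python
def pooled_variance(a, b):
    """Pooled within-group sample variance of one feature for two groups."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)


def diagonal_lda_weights(controls, patients):
    """Per-feature weights (mean_patients - mean_controls) / pooled variance.

    controls, patients: lists of feature vectors (e.g. flattened voxel intensities).
    Diagonal LDA assumption: features are treated as uncorrelated, so the pooled
    covariance matrix is diagonal and can be inverted feature by feature.
    """
    weights = []
    for j in range(len(controls[0])):
        a = [img[j] for img in controls]
        b = [img[j] for img in patients]
        diff = sum(b) / len(b) - sum(a) / len(a)
        var = pooled_variance(a, b)
        # features with no within-group variability get zero weight in this sketch
        weights.append(diff / var if var > 0 else 0.0)
    return weights
```

Features with large positive or negative weights are those where patients and controls differ strongly relative to within-group spread. According to the abstract, clusters of such differences are what the variational level-set technique then segments.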
Abstract:
Royal palm tree peroxidase (RPTP) is a very stable enzyme with regard to acidity, temperature, H₂O₂, and organic solvents. RPTP is therefore a promising candidate for developing H₂O₂-sensitive biosensors for diverse applications in industry and analytical chemistry. RPTP belongs to the family of class III secretory plant peroxidases, which includes horseradish peroxidase isozyme C and the soybean and peanut peroxidases. Here we report the X-ray structure of native RPTP isolated from the royal palm tree (Roystonea regia), refined to a resolution of 1.85 Å. RPTP has the same overall fold as the plant peroxidase superfamily and contains one heme group and two calcium-binding sites in similar locations. The three-dimensional structure of RPTP was solved in a hydroperoxide complex state, and it revealed a bound 2-(N-morpholino)ethanesulfonic acid (MES) molecule positioned at a putative secondary substrate-binding site. Nine N-glycosylation sites are clearly defined in the RPTP electron-density maps, revealing for the first time the conformations of the glycan chains of this highly glycosylated enzyme. Furthermore, a statistical coupling analysis (SCA) of the plant peroxidase superfamily was performed. This sequence-based method identified a set of evolutionarily conserved sites that map to regions surrounding the heme prosthetic group. The SCA matrix also predicted a set of energetically coupled residues involved in maintaining the structural fold of plant peroxidases. The combination of crystallographic data and SCA provides information about the key structural elements that could help explain the unique stability of RPTP. (C) 2009 Elsevier Inc. All rights reserved.
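As a rough illustration of the quantities behind statistical coupling analysis, the sketch below computes two things: a relative-entropy conservation score for the most frequent residue in each alignment column, and a plain covariance matrix between columns. It is a simplified stand-in, not SCA as used in these studies. It assumes a uniform background frequency of 0.05 for every amino acid and omits the conservation weighting and sequence weighting used in full SCA implementations. The alignment and function names are hypothetical.

```python
import math
from collections import Counter

BACKGROUND_Q = 0.05  # assumption: uniform background frequency (1/20) for every amino acid


def _rel_entropy_term(f, q):
    """f * ln(f / q), with the convention 0 * ln(0) = 0."""
    return 0.0 if f == 0 else f * math.log(f / q)


def positional_conservation(alignment):
    """Per column: (most frequent residue, relative-entropy conservation D_i).

    alignment: equal-length aligned sequences (strings); gaps count as symbols.
    """
    n_seq = len(alignment)
    result = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        f = count / n_seq
        d = _rel_entropy_term(f, BACKGROUND_Q) + _rel_entropy_term(1 - f, 1 - BACKGROUND_Q)
        result.append((residue, d))
    return result


def coupling_matrix(alignment):
    """Unweighted covariance f_ij - f_i * f_j between columns, using the indicator
    'sequence carries the column's most frequent residue'."""
    top = [res for res, _ in positional_conservation(alignment)]
    x = [[1.0 if seq[i] == top[i] else 0.0 for i in range(len(top))] for seq in alignment]
    n_seq, n_pos = len(x), len(top)
    f = [sum(row[i] for row in x) / n_seq for i in range(n_pos)]
    return [[sum(row[i] * row[j] for row in x) / n_seq - f[i] * f[j]
             for j in range(n_pos)] for i in range(n_pos)]


# hypothetical four-sequence alignment: columns 1 and 2 always change together
aln = ["ACDE", "ACDF", "AKGE", "AKGF"]
conservation = positional_conservation(aln)
coupling = coupling_matrix(aln)
```

A fully conserved column (column 0 here) scores highest on conservation but cannot co-vary with anything. Pairs of variable columns that change together (columns 1 and 2) produce the positive off-diagonal entries that coupling analyses look for.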
Abstract:
In this paper, we present an algorithm for cluster analysis that integrates aspects of cluster ensembles and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm with a special crossover operator, and it uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters without requiring expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. In the context of gene expression data sets, we compare the results obtained with our algorithm to those achieved with multi-objective clustering with automatic K-determination (MOCK), the algorithm most closely related to ours. (C) 2009 Elsevier B.V. All rights reserved.
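At the core of any Pareto-based multi-objective approach is a filter that keeps only non-dominated solutions. Below is a minimal sketch of that filter, assuming every objective is to be minimized. The partition labels and objective values are hypothetical. The two objectives are meant to resemble the deviation and connectivity measures used by MOCK, not the specific validation indices chosen in this paper.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (every objective is minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Labels of the non-dominated partitions.

    candidates: dict mapping a partition label to its tuple of objective values.
    """
    return [name for name, objs in candidates.items()
            if not any(dominates(other, objs)
                       for other_name, other in candidates.items() if other_name != name)]


# hypothetical partitions scored on (overall deviation, connectivity), both minimized
partitions = {"k2": (10.0, 1.0), "k3": (7.0, 2.0), "k4": (8.0, 3.0), "k5": (5.0, 4.0)}
front = pareto_front(partitions)
```

Here "k4" is dropped because "k3" is better on both objectives. The remaining partitions form the kind of concise trade-off set the abstract describes, and no single partition is declared best.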
Abstract:
The TCABR data analysis and acquisition system has been upgraded to support a joint research programme using remote participation technologies. The architecture of the new system uses the Java language as its programming environment. Since the application parameters and hardware in a joint experiment are complex, with a large variability of components, requirements and specification solutions need to be flexible and modular, independent of operating system and computer architecture. To describe and organize the information on all the components and the connections among them, the systems are developed using Extensible Markup Language (XML) technology. Communication between clients and servers uses remote procedure calls (RPC) encoded in XML (XML-RPC technology). The integration of the Java language, XML and XML-RPC technologies makes it easy to develop a standard data and communication access layer between users and laboratories using common software libraries and a Web application. The libraries allow data retrieval using the same methods for all user laboratories in the joint collaboration, and the Web application provides access through a simple graphical user interface (GUI). The TCABR tokamak team, in collaboration with the IPFN (Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade Técnica de Lisboa), is implementing these remote participation technologies. The first version was tested at the Joint Experiment on TCABR (TCABRJE), a Host Laboratory Experiment organized in cooperation with the IAEA (International Atomic Energy Agency) within the framework of the IAEA Coordinated Research Project (CRP) on "Joint Research Using Small Tokamaks". (C) 2010 Elsevier B.V. All rights reserved.
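For readers unfamiliar with XML-RPC, the round trip between a data client and a laboratory server can be sketched in a few lines. The TCABR system itself is written in Java. This sketch uses Python's standard `xmlrpc` modules only for brevity. The method name `get_channel`, its arguments and the returned samples are hypothetical placeholders, not the TCABR interface.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_channel(shot, channel):
    """Hypothetical data-access method: samples of one diagnostic channel for one shot."""
    # placeholder values; a real server would read them from the acquisition system
    return {"shot": shot, "channel": channel, "samples": [0.0, 0.5, 1.0]}


def start_server():
    """Start an XML-RPC server on a free local port (port 0) in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_channel, "get_channel")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(shot, channel):
    """Client side: call get_channel remotely, then stop the demo server."""
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.get_channel(shot, channel)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch(1234, "plasma_current"))
```

Requests and responses are plain XML over HTTP, so clients written in different languages can call the same method. This is the property that lets one access layer serve every laboratory in a joint experiment.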
Abstract:
A statistical data analysis methodology was developed to evaluate the field emission properties of many samples of copper oxide nanostructured field emitters. This analysis was largely carried out in terms of Seppen-Katamura (SK) charts, field strength and emission current. Physical and mathematical models were derived to describe the effect of small electric field perturbations in the Fowler-Nordheim (F-N) equation and then to explain the trend of the data represented in the SK charts. The field enhancement factor and the emission area parameters proved to be very sensitive to variations in the electric field for most of the samples. We found that the anode-cathode distance is critical in the field emission characterization of samples having a non-rigid nanostructure. (C) 2007 Elsevier B.V. All rights reserved.
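In the simplest form of the Fowler-Nordheim relation (neglecting image-charge corrections), the current obeys I = a V² exp(-b/V). A plot of ln(I/V²) against 1/V is therefore a straight line with slope -b and intercept ln a. An SK chart places the (slope, intercept) pair of each I-V measurement as one point on a single diagram. With the local field taken as βV/d, the slope is proportional to the work function raised to the 3/2 power times the anode-cathode distance d, divided by the field enhancement factor β. Changes in d or β therefore move a sample's point on the chart. Below is a minimal least-squares sketch; the values of a and b are synthetic and chosen only for illustration.

```python
import math


def fowler_nordheim_fit(voltages, currents):
    """Least-squares line through the F-N plot ln(I/V^2) versus 1/V.

    Returns (slope, intercept); each I-V sweep gives one point on an SK chart.
    """
    xs = [1.0 / v for v in voltages]
    ys = [math.log(i / v ** 2) for v, i in zip(voltages, currents)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# synthetic sweep generated from I = a V^2 exp(-b/V) with illustrative a and b
a_true, b_true = 1e-6, 2000.0
volts = [500.0 + 100.0 * k for k in range(6)]
amps = [a_true * v ** 2 * math.exp(-b_true / v) for v in volts]
slope, intercept = fowler_nordheim_fit(volts, amps)
```

On this noise-free data, the fit recovers a slope of -b and an intercept of ln a. Measured sweeps scatter around such a line, and the spread of their SK-chart points is what a statistical analysis of many samples would examine.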
Abstract:
This work presents a novel approach to increasing the recognition power of Multiscale Fractal Dimension (MFD) techniques when applied to image classification. The proposal uses Functional Data Analysis (FDA) to improve the precision of the MFD technique, yielding a more representative descriptor vector capable of recognizing and characterizing objects in an image more precisely. FDA is applied to signatures extracted with the Bouligand-Minkowski MFD technique to generate a descriptor vector from them. To evaluate the improvement obtained, an experiment was carried out on two datasets: one of character shapes (the 26 letters of the Latin alphabet) with different levels of controlled noise, and one of fish image contours. The FDA method was compared with the well-known Fourier and wavelet descriptors to verify its performance. The descriptor vectors were classified with Linear Discriminant Analysis (LDA), and the correct classification rates of the descriptor methods were compared. The results show that FDA outperforms the methods from the literature (Fourier and wavelets) in processing the information extracted from the MFD signature. The proposed method can therefore be considered an interesting choice for pattern recognition and image classification using fractal analysis.
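The Bouligand-Minkowski signature mentioned above is obtained by dilating a shape with discs of increasing radius r and recording the dilated area A(r). The multiscale fractal dimension is then D(r) = 2 - d log A(r) / d log r. The brute-force sketch below works on an integer grid. It assumes the shape is given as a list of pixel coordinates and approximates the derivative by finite differences between successive radii. The FDA step described in the abstract, and any distance-transform speed-up, are not shown, and the function names are hypothetical.

```python
import math


def dilated_area(points, r):
    """Number of integer grid points within distance r of any shape pixel (the dilation A(r))."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    pad = int(math.ceil(r))
    count = 0
    for x in range(min(xs) - pad, max(xs) + pad + 1):
        for y in range(min(ys) - pad, max(ys) + pad + 1):
            if any((x - px) ** 2 + (y - py) ** 2 <= r * r for px, py in points):
                count += 1
    return count


def multiscale_fractal_dimension(points, radii):
    """D(r) = 2 - d log A(r) / d log r, by finite differences between successive radii."""
    logs = [(math.log(r), math.log(dilated_area(points, r))) for r in radii]
    return [2 - (la2 - la1) / (lr2 - lr1)
            for (lr1, la1), (lr2, la2) in zip(logs, logs[1:])]
```

As a sanity check, A(r) grows roughly like πr² for a single pixel, so D(r) is close to 0. For a long straight segment at small radii, A(r) grows roughly linearly in r, so D(r) is close to 1. The curve D(r) over many radii is the signature that the paper then processes with FDA.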
Abstract:
Superoxide dismutases (SODs) are a crucial class of enzymes in the defense against intracellular free radical damage. They eliminate superoxide radicals by converting them into hydrogen peroxide and oxygen. Despite their very different life cycles and infection strategies, the human parasites Plasmodium falciparum, Trypanosoma cruzi and Trypanosoma brucei are known to be sensitive to oxidative stress. The parasite FeSODs have thus become attractive targets for novel drug development. Here we report the crystal structures of the FeSODs from the trypanosomes T. brucei at 2.0 Å and T. cruzi at 1.9 Å resolution, and of the FeSOD from P. falciparum at 2.0 Å, a higher resolution than previously reported. The homodimeric enzymes are compared with the related human MnSOD, with particular attention to structural aspects relevant to drug design. Although the structures share a very similar overall fold, differences between the enzymes could be identified at the entrance to the channel leading to the active site. These differences result in a slightly broader and more positively charged cavity in the parasite enzymes. Furthermore, a statistical coupling analysis (SCA) of the whole Fe/MnSOD family reveals different patterns of residue coupling for Mn and Fe SODs, as well as for the dimeric and tetrameric states. In both cases, the statistically coupled residues lie adjacent to the conserved core surrounding the metal center and may be expected to be responsible for its fine tuning, leading to metal ion specificity.
Abstract:
The crystal structures of an aspartic proteinase from Trichoderma reesei (TrAsP) and of its complex with a competitive inhibitor, pepstatin A, were solved and refined to crystallographic R-factors of 17.9% (Rfree = 21.2%) at 1.70 Å resolution and 15.81% (Rfree = 19.2%) at 1.85 Å resolution, respectively. The three-dimensional structure of TrAsP is similar to the structures of other members of the pepsin-like family of aspartic proteinases. Each molecule is folded into a predominantly beta-sheet bilobal structure with N-terminal and C-terminal domains of about the same size. Structural comparison of the native structure and the TrAsP-pepstatin complex reveals that the enzyme undergoes an induced-fit, rigid-body movement upon inhibitor binding, with the N-terminal and C-terminal lobes tightly enclosing the inhibitor. Upon recognition and binding of pepstatin A, amino acid residues of the enzyme active site form a number of short hydrogen bonds to the inhibitor that may play an important role in the mechanism of catalysis and inhibition. The structures of TrAsP were used as a template for a statistical coupling analysis of the aspartic protease family. This approach permitted, for the first time, the identification of a network of structurally linked residues putatively mediating conformational changes relevant to the function of this family of enzymes. Statistical coupling analysis reveals coevolved continuous clusters of amino acid residues that extend from the active site into the hydrophobic cores of each of the two domains and include residues from the flap regions, highlighting the importance of these parts of the protein for its enzymatic activity. (C) 2008 Elsevier Ltd. All rights reserved.
Abstract:
This paper presents groundwater favorability mapping of a fractured terrain in the eastern portion of São Paulo State, Brazil. Remote sensing, airborne geophysical data, photogeologic interpretation, geologic and geomorphologic maps, and geographic information system (GIS) techniques were used. Cross-tabulation between these maps and well yield data made it possible to define groundwater prospective parameters for a fractured-bedrock aquifer. These prospective parameters are the basis for the favorability analysis, which follows a knowledge-driven approach. A multicriteria analysis (weighted linear combination) was carried out to produce a groundwater favorability map, because the prospective parameters differ in importance (weight) and each parameter has different classes. The groundwater favorability map was tested by cross-tabulation with new well yield data and spring occurrences. The wells with the highest productivity, as well as all the spring occurrences, are situated in areas mapped as having excellent or good favorability. This shows good coherence between the prospective parameters and well yield, and it shows the importance of GIS techniques for defining target areas for detailed study and well location. (c) 2008 Elsevier B.V. All rights reserved.
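For each map cell, the weighted linear combination step reduces to a weighted sum of the class scores the cell receives for each prospective parameter. A minimal sketch follows, assuming weights that sum to 1 and scores already standardized to a common scale. The parameter names, weights and scores below are hypothetical, not the ones derived in the study.

```python
def weighted_linear_combination(scores, weights):
    """Favorability of one map cell as sum(weight * class score) over the criteria.

    scores: criterion -> standardized class score of the cell (e.g. on a 0-10 scale)
    weights: criterion -> relative importance; assumed to sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * scores[c] for c in weights)


# hypothetical criteria, weights and class scores for a single cell
weights = {"lineament_density": 0.40, "lithology": 0.35, "relief": 0.25}
cell_scores = {"lineament_density": 8, "lithology": 6, "relief": 4}
favorability = weighted_linear_combination(cell_scores, weights)
```

Applying the same function to every cell produces a continuous favorability surface. This surface can then be divided into classes such as excellent and good and cross-tabulated against well yields.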
Abstract:
OBJECTIVES: to study the information contained in Stillbirth Registers (SBRs) in the Municipality of São Paulo. METHODS: the adequacy with which SBR forms were filled out was assessed on the basis of the SBRs (6,722) made available by FSEADE (Foundation for Statistical Data Analysis System), using a Data Completion Index (DCI), which made it possible to compare the three years studied (2001-2003). Variables relating to the mother and the fetus were included where the DCI was greater than 10%: education, parity, place of residence and type of birth for the mother, and weight, gestational age and underlying cause of death for the fetus. RESULTS: the absolute stillbirth component changed little in the first two of the three years, falling slightly in the third. The variable most frequently recorded was sex (98%), followed by place of residence (82.9%) and parity (70%). The data least often recorded were the mother's age and schooling, 20.0% and 16.7%, respectively. The underlying cause was recorded in 46.7%, fetal weight in 37% and type of birth in 25.3%. CONCLUSIONS: the data show that the difficulty of incorporating this health indicator into the traditional set of indicators is partly due to the inadequacy of the data provided on the SBR form.
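The abstract does not define the Data Completion Index. Assuming it is the percentage of forms on which a given field is filled in, the index and the 10% inclusion rule could be computed as follows. The field names and records are hypothetical.

```python
def data_completion_index(records, field):
    """Percentage of records in which `field` is present and not blank (assumed DCI definition)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)


def fields_above_threshold(records, fields, threshold=10.0):
    """Fields whose DCI exceeds the inclusion threshold (10% in the study)."""
    return [f for f in fields if data_completion_index(records, f) > threshold]
```

Computing the index separately for each year's records yields the year-to-year comparison described in the METHODS section.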
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Abstract:
Functional magnetic resonance imaging (fMRI) based on the BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach that predicts changes in the observed BOLD response based on an expected hemodynamic response function (HRF). When the task is cognitively complex, or in the presence of disease, variations in the shape and/or delay of the response may reduce the reliability of the results. This paper introduces a novel exploratory method for fMRI data that attempts to discriminate neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors. The new method combines correlation analysis with the discrete wavelet transform to identify similarities in the time course of the BOLD signal across a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized pictures of human faces expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time-domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.
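Below is a minimal sketch of scale-by-scale correlation between two subjects' time series. It assumes an orthonormal Haar wavelet and plain Pearson correlation of the detail coefficients at each level. The abstract specifies neither the wavelet family nor how correlations are combined across subjects and voxels, and the function names are hypothetical.

```python
import math


def haar_details(signal, levels):
    """Orthonormal Haar DWT detail coefficients, finest level first.

    An odd trailing sample at any level is dropped.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        if half == 0:
            break
        details.append([(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)])
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return details


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def wavelet_correlation(ts_a, ts_b, levels=3):
    """Correlation between two subjects' BOLD time series at each wavelet scale."""
    return [pearson(da, db)
            for da, db in zip(haar_details(ts_a, levels), haar_details(ts_b, levels))]
```

Because Haar detail coefficients are differences, a constant offset between subjects cancels out at every scale. A slow drift, however, affects only the coarse scales, so a stimulus-locked response can still appear as a high correlation at the scales where it lives.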
Abstract:
The use of inter-laboratory test comparisons to determine the performance of individual laboratories for specific tests (or for calibration) [ISO/IEC Guide 43-1, 1997. Proficiency testing by interlaboratory comparisons - Part 1: Development and operation of proficiency testing schemes] is called Proficiency Testing (PT). In this paper we propose the use of the generalized likelihood ratio test to compare the performance of a group of laboratories for specific tests relative to the assigned value, and we illustrate the procedure using actual data from a PT program in the area of volume. The proposed test extends the test criteria currently in use, making it possible to test the consistency of the group of laboratories. Moreover, the class of elliptical distributions is considered for the measurements obtained. (C) 2008 Elsevier B.V. All rights reserved.
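Assume, for simplicity, that each laboratory's result is normally distributed around its true value with a known standard uncertainty. The paper itself works with the broader class of elliptical distributions. Under this assumption, the generalized likelihood ratio test of "every laboratory agrees with the assigned value" reduces to a sum of squared z-scores. This sum is compared with a chi-square critical value that has one degree of freedom per laboratory. A minimal sketch follows; the data are hypothetical.

```python
def glr_consistency_statistic(results, assigned_value):
    """-2 log(likelihood ratio) for H0: every lab mean equals the assigned value.

    results: list of (measured_value, standard_uncertainty) pairs, one per lab.
    Assumes independent Gaussian results with known uncertainties, so under H0 the
    statistic follows a chi-square distribution with len(results) degrees of freedom.
    """
    stat = sum(((x - assigned_value) / u) ** 2 for x, u in results)
    return stat, len(results)


# hypothetical volume PT round: three labs, assigned value 100.0 mL
statistic, dof = glr_consistency_statistic([(100.5, 0.5), (99.0, 0.5), (100.0, 0.25)], 100.0)
# 5% critical value of chi-square with 3 degrees of freedom is about 7.81
consistent = statistic <= 7.81
```

Unlike per-laboratory z-scores judged one at a time, this single statistic asks whether the group as a whole is consistent with the assigned value. This group-level question is the extension the abstract describes.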