44 results for microarray data classification
Abstract:
Semisupervised learning is a machine learning approach that is able to employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. The particles of the same class cooperate among themselves, while the particles of different classes compete with each other to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature brought by the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting low computational complexity order in comparison to other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model.
Abstract:
Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, some statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the Sparse Vector Autoregressive model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets.
Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
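The abstract above mentions a statistical test that controls the false discovery rate but does not describe it. As a rough illustration of what FDR control over candidate network edges can look like, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-edge p-values. This is a swapped-in, standard technique and not necessarily the test used by the authors; the function name `benjamini_hochberg` and the level `q` are assumptions made for this example.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag which hypotheses are rejected at FDR level q (Benjamini-Hochberg).

    Hypothetical helper for illustration; not taken from the SVAR paper.
    Returns a list of booleans aligned with p_values.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k such that p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In a network-inference setting, each p-value would correspond to one candidate edge (one autoregressive coefficient), and the edges flagged `True` would be kept in the estimated network.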
Abstract:
Background: Vitamin D transcriptional effects were linked to tumor growth control; however, the hormone targets were determined in cell cultures exposed to supraphysiological concentrations of 1,25(OH)2D3 (50-100nM). Our aim was to evaluate the transcriptional effects of 1,25(OH)2D3 in a more physiological model of breast cancer, consisting of fresh tumor slices exposed to 1,25(OH)2D3 at concentrations that can be attained in vivo. Methods: Tumor samples from post-menopausal breast cancer patients were sliced and cultured for 24 hours with or without 1,25(OH)2D3 0.5nM or 100nM. Gene expression was analyzed by microarray (SAM paired analysis, FDR≤0.1) or RT-qPCR (p≤0.05, Friedman/Wilcoxon test). Expression of candidate genes was then evaluated in mammary epithelial/breast cancer lineages and cancer associated fibroblasts (CAFs), exposed or not to 1,25(OH)2D3 0.5nM, using RT-qPCR, western blot or immunocytochemistry. Results: 1,25(OH)2D3 0.5nM or 100nM effects were evaluated in five tumor samples by microarray and seven and 136 genes, respectively, were up-regulated. There was an enrichment of genes containing transcription factor binding sites for the vitamin D receptor (VDR) in samples exposed to 1,25(OH)2D3 near physiological concentration. Genes up-modulated by both 1,25(OH)2D3 concentrations were CYP24A1, DPP4, CA2, EFTUD1, TKTL1, KCNK3. Expression of candidate genes was subsequently evaluated in another 16 samples by RT-qPCR and up-regulation of CYP24A1, DPP4 and CA2 by 1,25(OH)2D3 was confirmed. To evaluate whether the transcriptional targets of 1,25(OH)2D3 0.5nM were restricted to the epithelial or stromal compartments, gene expression was examined in HB4A, C5.4, SKBR3, MDA-MB231, MCF-7 lineages and CAFs, using RT-qPCR. In epithelial cells, there was a clear induction of CYP24A1, CA2, CD14 and IL1RL1. In fibroblasts, in addition to CYP24A1 induction, there was a trend towards up-regulation of CA2, IL1RL1, and DPP4.
A higher protein expression of CD14 in epithelial cells and CA2 and DPP4 in CAFs exposed to 1,25(OH)2D3 0.5nM was detected. Conclusions: In breast cancer specimens, a short period of 1,25(OH)2D3 exposure at near physiological concentration modestly activates the hormone transcriptional pathway. Induction of CYP24A1, CA2, DPP4, IL1RL1 expression appears to reflect 1,25(OH)2D3 effects in epithelial as well as stromal cells; however, induction of CD14 expression is likely restricted to the epithelial compartment.
Abstract:
This article aims to offer an overview of personality rights, from the possibility of their application to legal entities, through the overlap of the study of their object with other branches of law, as well as historical data, classification, and analysis of case law, doctrine, and legislation on the central points involving such rights.
Abstract:
This work proposes a system for classification of industrial steel pieces by means of a magnetic nondestructive device. The proposed classification system has two main stages: an online stage and an off-line stage. In the online stage, the system classifies inputs and saves misclassification information for later analysis. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis, and a Sequential Backward Selection. Also, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
Abstract:
This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
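To make the "exponential of the Minkowski distance" concrete, the sketch below builds a weighted adjacency matrix in which the weight between two objects is exp(-d_p), where d_p is the Minkowski distance of order p. The negative sign (so that closer objects receive larger weights), the absence of a scale parameter, and the function names are assumptions made for this example; the abstract does not give the exact form.

```python
import math


def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def similarity_matrix(points, p=2.0):
    """Weighted adjacency matrix with w[i][j] = exp(-d_p(x_i, x_j)).

    Assumed form for illustration; the diagonal is left at 0 (no self-loops).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.exp(-minkowski_distance(points[i], points[j], p))
            w[i][j] = w[j][i] = s  # the network is undirected
    return w
```

The resulting matrix can then be passed to any community detection routine that accepts edge weights; with p = 2 the distance is Euclidean and with p = 1 it is the Manhattan distance.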
Abstract:
This paper presents a method for transforming the information of an engineering geological map into useful information for non-specialists involved in land-use planning. The method consists of classifying the engineering geological units in terms of land use capability and identifying the legal and geologic restrictions that apply in the study area. Both sets of information are then superimposed over the land use, and a conflict areas map is created. The analysis of these data leads to the identification of existing and forthcoming land use conflicts and enables the proposal of planning measures on a regional and local scale. The map for the regional planning was compiled at a 1:50,000 scale and encompasses the whole municipal land area, where uses are mainly rural. The map for the local planning was compiled at a 1:10,000 scale and encompasses the urban area. Most of the classification and operations on maps used spatial analyst tools available in the Geographical Information System. The regional studies showed that the greater part of Analandia's territory presents appropriate land uses. The local-scale studies indicate that the majority of the densely occupied urban areas are on suitable land. Although the situation is in general positive, municipal policies should address the identified and expected land use conflicts, so that it can be further improved.
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
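Since the abstract compares ASCAD's time complexity with that of fuzzy c-means, a sketch of the standard fuzzy c-means membership update may help as a point of reference. This is plain fuzzy c-means with Euclidean distances, not ASCAD: aspect weights are omitted, and the function name `fcm_memberships` and the default fuzzifier `m = 2` are assumptions made for this example.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """One fuzzy c-means membership update (standard FCM, not ASCAD).

    For each point x and cluster i: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)),
    where d_i is the Euclidean distance from x to center i and m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        zero_flags = [d == 0.0 for d in dists]
        if any(zero_flags):
            # The point coincides with one or more centers: share the
            # membership equally among them and avoid dividing by zero.
            share = 1.0 / sum(zero_flags)
            row = [share if z else 0.0 for z in zero_flags]
        else:
            row = [
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships
```

Each row of the result sums to 1, so a point that lies three times closer to one center than to the other receives memberships of 0.9 and 0.1 when m = 2.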
Abstract:
Biogeography has been difficult to apply as a methodological approach because organismic biology is incomplete at levels where the process of formulating comparisons and analogies is complex. The study of insect biogeography became necessary because insects possess numerous evolutionary traits and play an important role as pollinators. Among insects, the euglossine bees, or orchid bees, attract interest because the study of their biology allows us to explain important steps in the evolution of social behavior and many other adaptive tradeoffs. We analyzed the distribution of morphological characteristics in Colombian orchid bees from an ecological perspective. The aim of this study was to observe the distribution of these attributes on a regional basis. Data corresponding to Colombian euglossine species were ordered with a correspondence analysis and subsequent hierarchical clustering. Later, based on community properties, we compared the resulting hierarchical model with the collection localities to identify a biogeographic classification pattern. From this analysis, we derived a model that classifies the territory of Colombia into 11 biogeographic units or natural clusters. Ecological assumptions in concordance with the derived classification levels suggest that species characteristics associated with flight performance, nectar uptake, and social behavior are the factors that served to produce the current geographical structure.
Abstract:
BACKGROUND: The relationship between predictive proteins and tumors presenting cancer stem cell (CSC) profiles in oral tumors is still poorly understood. This study aims to identify the relationship between topoisomerases I, II alpha, and III alpha and the putative CSC immunophenotype in oral squamous cell carcinoma (OSCC) and determine its influence on prognosis. METHODS: The following data were retrieved from 127 patients: age, gender, primary anatomic site, smoking and alcohol intake, recurrence, metastases, histologic classification, treatment, and survival. An immunohistochemical study for topoisomerases I, II alpha, and III alpha was performed in a tissue microarray containing 127 paraffin blocks of OSCCs. RESULTS: In univariate analysis, topoisomerase expression showed significant differences according to CSC profiles and p53 immunoexpression, but not with survival. Topoisomerases II alpha and III alpha also showed a significant relationship with lymph node metastasis. The multivariate test confirmed these associations. CONCLUSIONS: The finding that all topoisomerases correlate with OSCC CSCs may indicate a role for topoisomerases in head and neck carcinogenesis. Notwithstanding, it is plausible that other members of the topoisomerase family could represent novel therapeutic targets in oral squamous cell carcinoma. J Oral Pathol Med (2012) 41: 762-768
Abstract:
Complex networks have attracted increasing interest from various fields of science. It has been demonstrated that each complex network model presents specific topological structures which characterize its connectivity and dynamics. Complex network classification relies on the use of representative measurements that describe topological structures. Although there are a large number of measurements, most of them are correlated. To overcome this limitation, this paper presents a new measurement for complex network classification based on partially self-avoiding walks. We validate the measurement on a data set composed of 40,000 complex networks of four well-known models. Our results indicate that the proposed measurement improves correct classification of networks compared to the traditional ones. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4737515]
Abstract:
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma mass spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.
Abstract:
Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The sera of a retrospective cohort (n = 41) composed of children with well characterized cow's milk allergy collected from multiple visits were analyzed using a protein microarray system measuring four classes of immunoglobulins. The frequency of the visits, age and gender distribution reflected the real situation faced by clinicians at a pediatric reference center for food allergy in São Paulo, Brazil. The profiling array results have shown that total IgG and IgA share similar specificity whilst IgM and in particular IgE are distantly related. The correlation of specificity of IgE and IgA is variable amongst the patients and this relationship cannot be used to predict atopy or the onset of tolerance to milk. The array profiling technique has corroborated the clinical selection criteria for this cohort, although it clearly suggested that 4 out of the 41 patients might have allergies of other than milk origin. There was also a good correlation between the array data and ImmunoCAP results, casein in particular. By using qualitative and quantitative multivariate analysis routines it was possible to produce validated statistical models to predict with reasonable accuracy the onset of tolerance to milk proteins. If expanded to larger study groups, the array profiling in combination with the multivariate techniques shows potential to improve the prognosis of milk-allergic patients. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
Surveillance Levels (SLs) are categories for medical patients (used in Brazil) that represent different types of medical recommendations. SLs are defined according to risk factors and the medical and developmental history of patients. Each SL is associated with specific educational and clinical measures. The objective of the present paper was to verify computer-aided, automatic assignment of SLs. The present paper proposes a computer-aided approach for automatic recommendation of SLs. The approach is based on the classification of information from patient electronic records. For this purpose, a software architecture composed of three layers was developed. The architecture is formed by a classification layer that includes a linguistic module and machine learning classification modules. The classification layer allows for the use of different classification methods, including the use of preprocessed, normalized language data drawn from the linguistic module. We report the verification and validation of the software architecture in a Brazilian pediatric healthcare institution. The results indicate that selection of attributes can have a great effect on the performance of the system. Nonetheless, our automatic recommendation of surveillance level can still benefit from improvements in processing procedures when the linguistic module is applied prior to classification. Results from our efforts can be applied to different types of medical systems. The results of systems supported by the framework presented in this paper may be used by healthcare and governmental institutions to improve healthcare services in terms of establishing preventive measures and alerting authorities about the possibility of an epidemic.