Digital Library

813 results for microarray data classification

A complex networks approach for data clustering

Relevance:

30.00%

Publisher:

Abstract:

This work proposes a method for data clustering based on complex networks theory. A data set is represented as a network by considering different metrics to establish the connection between each pair of objects. The clusters are obtained by taking into account five community detection algorithms. The network-based clustering approach is applied to two real-world databases and two sets of artificially generated data. The obtained results suggest that the exponential of the Minkowski distance is the most suitable metric to quantify the similarities between pairs of objects. In addition, the community identification method based on greedy optimization provides the best cluster solution. We compare the network-based clustering approach with some traditional clustering algorithms and verify that it provides the lowest classification error rate. (C) 2012 Elsevier B.V. All rights reserved.
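The network construction described in this abstract can be illustrated with a minimal sketch. Several choices here are assumptions and do not come from the paper: the similarity is taken as exp(-d), where d is the Minkowski distance; edges are kept only when the similarity reaches an arbitrary threshold of 0.5; and plain connected components stand in for the community detection algorithms the authors evaluate. The function names are hypothetical.

```python
import math
from itertools import combinations


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def similarity_network(points, p=2.0, threshold=0.5):
    """Link objects i and j when exp(-distance) reaches the assumed threshold."""
    adjacency = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        weight = math.exp(-minkowski(points[i], points[j], p))
        if weight >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def connected_components(adjacency):
    """Stand-in for community detection: group nodes reachable from each other."""
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Toy data: two well-separated groups of 2D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
clusters = connected_components(similarity_network(points))
print(clusters)  # [[0, 1, 2], [3, 4]]
```

On real data, a modularity-based community detection method would replace `connected_components`, since components alone cannot split a network that is connected but has dense subgroups.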

Engineering geological data in support of municipal land use planning - a case study in Analandia, southeast Brazil

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a method for transforming the information of an engineering geological map into useful information for non-specialists involved in land-use planning. The method consists of classifying the engineering geological units in terms of land use capability and identifying the legal and the geologic restrictions that apply in the study area. Both sets of information are then superimposed over the land use, and a map of conflict areas is created. The analysis of these data leads to the identification of existing and forthcoming land use conflicts and enables the proposal of planning measures on a regional and local scale. The map for the regional planning was compiled at a 1:50,000 scale and encompasses the whole municipal land area where uses are mainly rural. The map for the local planning was compiled at a 1:10,000 scale and encompasses the urban area. Most of the classification and operations on maps used spatial analyst tools available in the Geographical Information System. The regional studies showed that the greater part of Analandia's territory presents appropriate land uses. The local-scale studies indicate that the majority of the densely occupied urban areas are on suitable land. Although the situation is in general positive, municipal policies should address the identified and expected land use conflicts, so that it can be further improved.

Automatic aspect discrimination in data clustering

Relevance:

30.00%

Publisher:

Abstract:

The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
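For orientation, the membership update of standard fuzzy c-means, the baseline whose time complexity the abstract refers to, can be sketched as follows. This is not the SCAD or ASCAD algorithm: the aspect-weighting steps are not described in the abstract and are not reproduced here. The function name, the fuzzifier default `m=2.0`, and the toy data are assumptions chosen for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update.

    Returns u[k][i], the membership of point k in cluster i, computed as
    1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)); each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share the
            # membership among those centers and give zero to the rest.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


# Toy data: three points on a line and two fixed cluster centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships(points, centers)
```

With these inputs, the middle point lies at distances 1 and 3 from the two centers, so its memberships come out as roughly 0.9 and 0.1. A full fuzzy c-means run alternates this update with a recomputation of the centers until the memberships stop changing.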

The Ecological Basis for Biogeographic Classification: an Example in Orchid Bees (Apidae: Euglossini)

Relevance:

30.00%

Publisher:

Abstract:

Biogeography has been difficult to apply as a methodological approach because organismic biology is incomplete at levels where the process of formulating comparisons and analogies is complex. The study of insect biogeography became necessary because insects possess numerous evolutionary traits and play an important role as pollinators. Among insects, the euglossine bees, or orchid bees, attract interest because the study of their biology allows us to explain important steps in the evolution of social behavior and many other adaptive tradeoffs. We analyzed the distribution of morphological characteristics in Colombian orchid bees from an ecological perspective. The aim of this study was to observe the distribution of these attributes on a regional basis. Data corresponding to Colombian euglossine species were ordered with a correspondence analysis followed by hierarchical clustering. Then, based on community properties, we compared the resulting hierarchical model with the collection localities to identify a biogeographic classification pattern. From this analysis, we derived a model that classifies the territory of Colombia into 11 biogeographic units or natural clusters. Ecological assumptions in concordance with the derived classification levels suggest that species characteristics associated with flight performance, nectar uptake, and social behavior are the factors that served to produce the current geographical structure.

Topoisomerase expression in oral squamous cell carcinoma: relationship with cancer stem cells profiles and lymph node metastasis

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: The relationship between predictive proteins and tumors presenting cancer stem cell (CSC) profiles in oral tumors is still poorly understood. This study aims to identify the relationship between topoisomerases I, II alpha, and III alpha and the putative CSC immunophenotype in oral squamous cell carcinoma (OSCC) and determine its influence on prognosis. METHODS: The following data were retrieved from 127 patients: age, gender, primary anatomic site, smoking and alcohol intake, recurrence, metastases, histologic classification, treatment, and survival. An immunohistochemical study for topoisomerases I, II alpha, and III alpha was performed in a tissue microarray containing 127 paraffin blocks of OSCCs. RESULTS: In univariate analysis, topoisomerase expression showed significant differences according to CSC profiles and p53 immunoexpression, but not with survival. Topoisomerases II alpha and III alpha also showed a significant relationship with lymph node metastasis. The multivariate test confirmed these associations. CONCLUSIONS: The finding that all topoisomerases correlate with OSCC CSCs may indicate a role for topoisomerases in head and neck carcinogenesis. Notwithstanding, it is plausible that other members of the topoisomerase family could represent novel therapeutic targets in oral squamous cell carcinoma. J Oral Pathol Med (2012) 41: 762-768

Complex network classification using partially self-avoiding deterministic walks

Relevance:

30.00%

Publisher:

Abstract:

Complex networks have attracted increasing interest from various fields of science. It has been demonstrated that each complex network model presents specific topological structures which characterize its connectivity and dynamics. Complex network classification relies on the use of representative measurements that describe topological structures. Although there are a large number of measurements, most of them are correlated. To overcome this limitation, this paper presents a new measurement for complex network classification based on partially self-avoiding walks. We validate the measurement on a data set composed of 40,000 complex networks of four well-known models. Our results indicate that the proposed measurement improves correct classification of networks compared to the traditional ones. (C) 2012 American Institute of Physics. [http://dx.doi.org/10.1063/1.4737515]

Determination of trace elements in bovine semen samples by inductively coupled plasma mass spectrometry and data mining techniques for identification of bovine class

Relevance:

30.00%

Publisher:

Abstract:

The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma mass spectrometry (ICP-MS) using aqueous medium calibration, and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.

Multi-element determination in Brazilian honey samples by inductively coupled plasma mass spectrometry and estimation of geographic origin with data mining techniques

Relevance:

30.00%

Publisher:

Abstract:

Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.

Prediction of tolerance in children with IgE mediated cow's milk allergy by microarray profiling and chemometric approach

Relevance:

30.00%

Publisher:

Abstract:

The sera of a retrospective cohort (n = 41) composed of children with well characterized cow's milk allergy collected from multiple visits were analyzed using a protein microarray system measuring four classes of immunoglobulins. The frequency of the visits, age and gender distribution reflected the real situation faced by clinicians at a pediatric reference center for food allergy in São Paulo, Brazil. The profiling array results have shown that total IgG and IgA share similar specificity whilst IgM and in particular IgE are distantly related. The correlation of specificity of IgE and IgA is variable amongst the patients and this relationship cannot be used to predict atopy or the onset of tolerance to milk. The array profiling technique has corroborated the clinical selection criteria for this cohort, although it clearly suggested that 4 out of the 41 patients might have allergies of origin other than milk. There was also a good correlation between the array data and ImmunoCAP results, casein in particular. By using qualitative and quantitative multivariate analysis routines it was possible to produce validated statistical models to predict with reasonable accuracy the onset of tolerance to milk proteins. If expanded to larger study groups, array profiling in combination with the multivariate techniques shows potential to improve the prognosis of milk-allergic patients. (C) 2012 Elsevier B.V. All rights reserved.

Using Machine Learning Classifiers to Assist Healthcare-Related Decisions: Classification of Electronic Patient Records

Relevance:

30.00%

Publisher:

Abstract:

Surveillance Levels (SLs) are categories for medical patients (used in Brazil) that represent different types of medical recommendations. SLs are defined according to risk factors and the medical and developmental history of patients. Each SL is associated with specific educational and clinical measures. The objective of the present paper was to verify computer-aided, automatic assignment of SLs. The present paper proposes a computer-aided approach for automatic recommendation of SLs. The approach is based on the classification of information from patient electronic records. For this purpose, a software architecture composed of three layers was developed. The architecture is formed by a classification layer that includes a linguistic module and machine learning classification modules. The classification layer allows for the use of different classification methods, including the use of preprocessed, normalized language data drawn from the linguistic module. We report the verification and validation of the software architecture in a Brazilian pediatric healthcare institution. The results indicate that selection of attributes can have a great effect on the performance of the system. Nonetheless, our automatic recommendation of surveillance level can still benefit from improvements in processing procedures when the linguistic module is applied prior to classification. Results from our efforts can be applied to different types of medical systems. The results of systems supported by the framework presented in this paper may be used by healthcare and governmental institutions to improve healthcare services in terms of establishing preventive measures and alerting authorities about the possibility of an epidemic.

Elemental abundances and classification of carbon-enhanced metal-poor stars

Relevance:

30.00%

Publisher:

Abstract:

We present a detailed study of carbon-enhanced metal-poor (CEMP) stars, based on high-resolution spectroscopic observations of a sample of 18 stars. The stellar spectra for this sample were obtained at the 4.2 m William Herschel Telescope in 2001 and 2002, using the Utrecht Echelle Spectrograph, at a resolving power R ~ 52,000 and S/N ~ 40, covering the wavelength range λλ3700-5700 Å. The atmospheric parameters determined for this sample indicate temperatures ranging from 4750 K to 7100 K, log g from 1.5 to 4.3, and metallicities -3.0 <= [Fe/H] <= -1.7. Elemental abundances for C, Na, Mg, Sc, Ti, Cr, Cu, Zn, Sr, Y, Zr, Ba, La, Ce, Nd, Sm, Eu, Gd, Dy are determined. Abundances for an additional 109 stars were taken from the literature and combined with the data of our sample. The literature sample reveals a lack of reliable abundance estimates for species that might be associated with the r-process elements for about 67% of CEMP stars, preventing a complete understanding of this class of stars, since [Ba/Eu] ratios are used to classify them. Although eight stars in our observed sample are also found in the literature sample, Eu abundances or limits are determined for four of these stars for the first time. From the observed correlations between C, Ba, and Eu, we argue that the CEMP-r/s class has the same astronomical origin as CEMP-s stars, highlighting the need for a more complete understanding of Eu production.

A New Subgeneric Classification of Rhipsalis (Cactoideae, Cactaceae)

Relevance:

30.00%

Publisher:

Abstract:

Most Cactaceae have succulent stems and inhabit dry or arid areas, but some are epiphytes of humid regions. Rhipsalis is the largest genus of epiphytic cacti. Species of Rhipsalis are notoriously difficult to identify, and the subgeneric classification of the genus has remained controversial. Between 1837 and 1995, eight different subgeneric classifications were proposed for Rhipsalis. The most comprehensive taxonomic treatment of the genus recognized five subgenera, Phyllarthrorhipsalis, Rhipsalis, Epallagogonium, Calamorhipsalis, and Erythrorhipsalis, characterized mainly by stem morphology. Here, molecular phylogenetic information combined with morphological data is used to re-evaluate the former subgeneric classifications proposed for the genus. Three monophyletic subgenera are recognized, Rhipsalis, Calamorhipsalis and Erythrorhipsalis, which are mainly characterized by floral traits. The changes proposed include expanding the circumscription of Rhipsalis by the inclusion of species previously included in Phyllarthrorhipsalis and Epallagogonium and recognizing a broader Calamorhipsalis, also including species from subgenus Epallagogonium. The circumscription of Erythrorhipsalis remains unchanged. For each subgenus a list of synonyms, a brief description and a list of species included are presented. A key for the identification of subgenera is also provided.

Semi-supervised dimensionality reduction based on partial least squares for visual analysis of high dimensional data

Relevance:

30.00%

Publisher:

Abstract:

Dimensionality reduction is employed for visual data analysis as a way of obtaining reduced spaces for high dimensional data or of mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation on reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high dimensional data, taking into account the user's input. It employs Partial Least Squares (PLS), a statistical tool to perform retrieval of latent spaces focusing on the discriminability of the data. The method employs a training set for building a highly precise model that can then be applied to a much larger data set very effectively. The reduced data set can be exhibited using various existing visualization techniques. The training data is important to code the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge and is capable of working with small and unbalanced training sets.

Identification of protein expression signatures in gastric carcinomas using clustering analysis

Relevance:

30.00%

Publisher:

Abstract:

Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin [MUC] 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The analyses of the data were done using a random forest-clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.

Multivariate analyses of UV-Vis absorption spectral data from cachaca wood extracts: a model to classify aged Brazilian cachacas according to the wood species used

Relevance:

30.00%

Publisher:

Abstract:

Multivariate analyses of UV-Vis spectral data from cachaca wood extracts provide a simple and robust model to classify aged Brazilian cachacas according to the wood species used in the maturation barrels. The model is based on inspection of 93 extracts of oak and different Brazilian wood species by a non-aged cachaca used as an extraction solvent. Application of PCA (Principal Components Analysis) and HCA (Hierarchical Cluster Analysis) leads to identification of 6 clusters of cachaca wood extracts (amburana, amendoim, balsamo, castanheira, jatoba, and oak). LDA (Linear Discriminant Analysis) affords classification of 10 different wood species used in the cachaca extracts (amburana, amendoim, balsamo, cabreuva-parda, canela-sassafras, castanheira, jatoba, jequitiba-rosa, louro-canela, and oak) with an accuracy ranging from 80% (amendoim and castanheira) to 100% (balsamo and jequitiba-rosa). The methodology provides a low-cost alternative to methods based on liquid chromatography and mass spectrometry to classify cachacas aged in barrels that are composed of different wood species.
