795 results for label hierarchical clustering
Abstract:
Multicentric carpotarsal osteolysis (MCTO) is a rare skeletal dysplasia characterized by aggressive osteolysis, particularly affecting the carpal and tarsal bones, and is frequently associated with progressive renal failure. Using exome capture and next-generation sequencing in five unrelated simplex cases of MCTO, we identified previously unreported missense mutations clustering within a 51 base pair region of the single exon of MAFB, validated by Sanger sequencing. A further six unrelated simplex cases with MCTO were also heterozygous for previously unreported mutations within this same region, as were affected members of two families with autosomal-dominant MCTO. MAFB encodes a transcription factor that negatively regulates RANKL-induced osteoclastogenesis and is essential for normal renal development. Identification of this gene paves the way for development of novel therapeutic approaches for this crippling disease and provides insight into normal bone and kidney development.
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspects weighting was proposed in the literature. However, SCAD may fail and halt given certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters required to be set by the user. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost-function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity which is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
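The abstract compares ASCAD's time complexity with that of fuzzy c-means. As a point of reference, here is a minimal sketch of the standard fuzzy c-means membership update for a single point. It is not the SCAD or ASCAD update, which also handles aspect weights. The function name `fcm_memberships` and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means memberships of one point to each cluster centre.

    Hypothetical helper for illustration; `m` > 1 is the fuzzifier.
    """
    # Squared Euclidean distance from the point to every centre.
    d2 = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]

    # A point lying exactly on one or more centres belongs fully to them.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d2))]

    # u_i is proportional to (1 / d_i^2)^(1 / (m - 1)), normalised to sum to 1.
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in d2]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    print(fcm_memberships((0.0, 0.0), [(1.0, 0.0), (3.0, 0.0)]))
```

Each point costs one pass over the cluster centres, so a full update over n points and c clusters takes time proportional to n times c times the number of attributes.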
Abstract:
For many tree species, mating system analyses have indicated potential variations in the selfing rate and paternity correlation among fruits within individuals, among individuals within populations, among populations, and from one flowering event to another. In this study, we used eight microsatellite markers to investigate mating systems at two hierarchical levels (fruits within individuals and individuals within populations) for the insect-pollinated Neotropical tree Tabebuia roseo-alba. We found that T. roseo-alba has a mixed mating system with predominantly outcrossed mating. The outcrossing rates at the population level were similar across two T. roseo-alba populations; however, the rates varied considerably among individuals within populations. The correlated paternity results at different hierarchical levels showed that there is a high probability of shared paternal parentage when comparing seeds within fruits and among fruits within plants, and that full-sibs occur in a much higher proportion within fruits than among fruits. Significant levels of fixation index were found in both populations, and biparental inbreeding is believed to be the main cause of the observed inbreeding. The number of pollen donors contributing to mating was low. Furthermore, open-pollinated seeds varied according to relatedness, including half-sibs, full-sibs, self-sibs and self-half-sibs. In both populations, the effective population size within a family (seed-tree and its offspring) was lower than expected for panmictic populations. Thus, seeds for ex situ conservation genetics, progeny tests and reforestation must be collected from a large number of seed-trees to guarantee an adequate effective population in the sample.
Abstract:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R. J. Patz and B. W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G. O. Roberts, and W. R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, facilitates convergence diagnosis, and can be very useful for fitting more complex skew IRT models.
Abstract:
Metronidazole is a BCS (Biopharmaceutics Classification System) class 1 drug, traditionally considered the drug of choice for treating infections caused by protozoa and anaerobic microorganisms. This study aimed to evaluate bioequivalence between 2 different marketed 250 mg metronidazole immediate-release tablets. A randomized, open-label, 2 x 2 crossover study was performed in healthy Brazilian volunteers under fasting conditions with a 7-day washout period. The formulations were administered as a single oral dose and blood was sampled over 48 h. Metronidazole plasma concentrations were determined by a liquid chromatography mass spectrometry (LC-MS/MS) method. The plasma concentration vs. time profile was generated for each volunteer and the pharmacokinetic parameters C-max, T-max, AUC(0-t), AUC(0-infinity), k(e), and t(1/2) were calculated using a noncompartmental model. Bioequivalence between pharmaceutical formulations was determined by calculating 90% confidence intervals (CIs) for the ratios of C-max, AUC(0-t), and AUC(0-infinity) values for test and reference using log-transformed data. Twenty-two healthy volunteers (11 men, 11 women; mean (SD) age, 28 (6.5) years [range, 21-45 years]; mean (SD) weight, 66 (9.3) kg [range, 51-81 kg]; mean (SD) height, 169 (6.5) cm [range, 156-186 cm]) were enrolled in and completed the study. The 90% CIs for C-max (0.92-1.06), AUC(0-t) (0.97-1.02), and AUC(0-infinity) (0.97-1.03) values for the test and reference products fell within the interval of 0.80-1.25 proposed by most regulatory agencies, including the Brazilian agency ANVISA. No clinically significant adverse effects were reported. From the pharmacokinetic analysis, it was concluded that the test 250 mg metronidazole formulation is bioequivalent to the reference product according to the Brazilian agency requirements.
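The acceptance rule described in this abstract can be illustrated with a short sketch: each 90% confidence interval for a test/reference ratio must lie entirely inside 0.80-1.25. The interval values below are the ones reported in the abstract; the function names are assumptions made for illustration, and the sketch does not reproduce the log-transformed analysis that produced the intervals.

```python
# Acceptance interval quoted in the abstract.
LOWER, UPPER = 0.80, 1.25

# 90% confidence intervals reported in the abstract for the test/reference ratios.
reported_cis = {
    "C-max": (0.92, 1.06),
    "AUC(0-t)": (0.97, 1.02),
    "AUC(0-infinity)": (0.97, 1.03),
}


def within_limits(ci, lower=LOWER, upper=UPPER):
    """Return True if the whole confidence interval lies inside [lower, upper]."""
    low, high = ci
    return lower <= low and high <= upper


def is_bioequivalent(cis):
    """Every parameter's interval must fall inside the acceptance range."""
    return all(within_limits(ci) for ci in cis.values())


if __name__ == "__main__":
    for name, ci in reported_cis.items():
        print(name, ci, within_limits(ci))
    print("bioequivalent:", is_bioequivalent(reported_cis))
```

An interval that crosses either limit, for example a hypothetical (0.78, 1.05), would fail the check even if its midpoint is close to 1.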
Abstract:
In protein databases there is a substantial number of proteins whose structure has been determined but which lack function annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clustering formation compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC classes prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.
Abstract:
The automatic disambiguation of word senses (i.e., the identification of which of the meanings is used in a given context for a word that has multiple meanings) is essential for such applications as machine translation and information retrieval, and represents a key step for developing the so-called Semantic Web. Humans disambiguate words in a straightforward fashion, but this does not apply to computers. In this paper we address the problem of Word Sense Disambiguation (WSD) by treating texts as complex networks, and show that word senses can be distinguished upon characterizing the local structure around ambiguous words. Our goal was not to obtain the best possible disambiguation system, but we nevertheless found that in half of the cases our approach outperforms traditional shallow methods. We show that the hierarchical connectivity and clustering of words are usually the most relevant features for WSD. The results reported here shed light on the relationship between semantic and structural parameters of complex networks. They also indicate that when combined with traditional techniques the complex network approach may be useful to enhance the discrimination of senses in large texts. Copyright (C) EPLA, 2012
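To make the "clustering of words" feature concrete, here is a minimal sketch that builds a word network and computes the local clustering coefficient of one word. The network construction (an undirected edge between adjacent words) is an assumption made for illustration; the abstract does not say how the authors built their networks or which preprocessing they applied.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(tokens):
    """Link each pair of adjacent, distinct words with an undirected edge."""
    neighbours = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def clustering_coefficient(neighbours, word):
    """Fraction of pairs of neighbours of `word` that are themselves linked."""
    nbrs = neighbours.get(word, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # the coefficient is undefined here; 0.0 is a convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return 2.0 * links / (k * (k - 1))


if __name__ == "__main__":
    tokens = "the bank of the river bank".split()
    network = build_word_network(tokens)
    print(clustering_coefficient(network, "bank"))
```

In this toy sentence, "bank" has three neighbours ("the", "of", "river"), and two of the three possible pairs among them are linked, so its coefficient is 2/3.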
Abstract:
The mechanisms responsible for containing activity in systems represented by networks are crucial in various phenomena, for example, in diseases such as epilepsy that affect the neuronal networks and for information dissemination in social networks. The first models to account for contained activity included triggering and inhibition processes, but they cannot be applied to social networks where inhibition is clearly absent. A recent model showed that contained activity can be achieved with no need of inhibition processes provided that the network is subdivided into modules (communities). In this paper, we introduce a new concept inspired by Hebbian theory, through which containment of activity is achieved by incorporating a dynamics based on a decaying activity in a random walk mechanism preferential to the node activity. Upon selecting the decay coefficient within a proper range, we observed sustained activity in all the networks tested, namely, random, Barabasi-Albert and geographical networks. The generality of this finding was confirmed by showing that modularity is no longer needed if the integrate-and-fire dynamics incorporates the decay factor. Taken together, these results provide a proof of principle that persistent, restrained network activation might occur in the absence of any particular topological structure. This may be the reason why neuronal activity does not spread out to the entire neuronal network, even when no special topological organization exists.
Abstract:
Objective. We aimed to evaluate whether the differential gene expression profiles of patients with rheumatoid arthritis (RA) could distinguish responders from nonresponders to methotrexate (MTX) and, in the case of MTX nonresponders, responsiveness to MTX plus anti-tumor necrosis factor-alpha (anti-TNF) combined therapy. Methods. We evaluated 25 patients with RA taking MTX 15-20 mg/week as a monotherapy (8 responders and 17 nonresponders). All MTX nonresponders received infliximab and were reassessed after 20 weeks to evaluate their anti-TNF responsiveness using the European League Against Rheumatism response criteria. A differential gene expression analysis from peripheral blood mononuclear cells was performed in terms of hierarchical gene clustering, and an evaluation of differentially expressed genes was performed using the significance analysis of microarrays program. Results. Hierarchical gene expression clustering discriminated MTX responders from nonresponders, and MTX plus anti-TNF responders from nonresponders. The evaluation of only highly modulated genes (fold change > 1.3 or < 0.7) yielded 5 induced (4 antiapoptotic and CCL4) and 4 repressed (4 proapoptotic) genes in MTX nonresponders compared to responders. In MTX plus anti-TNF nonresponders, the CCL4, CD83, and BCL2A1 genes were induced in relation to responders. Conclusion. Study of the gene expression profiles of RA peripheral blood cells permitted differentiation of responders from nonresponders to MTX and anti-TNF. Several candidate genes in MTX nonresponders (CCL4, HTRA2, PRKCD, BCL2A1, CAV1, TNIP1, CASP8AP2, MXD1, and BTG2) and 3 genes in MTX plus anti-TNF nonresponders (CCL4, CD83, and BCL2A1) were identified for further study. (First Release July 1 2012; J Rheumatol 2012;39:1524-32; doi:10.3899/jrheum.120092)
Abstract:
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDWs: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicates for SDWs and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable clustering of spatial objects and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.
Abstract:
This paper addresses the m-machine no-wait flow shop problem where the set-up time of a job is separated from its processing time. The performance measure considered is the total flowtime. A new hybrid metaheuristic Genetic Algorithm-Cluster Search is proposed to solve the scheduling problem. The performance of the proposed method is evaluated and the results are compared with the best method reported in the literature. Experimental tests show superiority of the new method for the test problems set, regarding the solution quality. (c) 2012 Elsevier Ltd. All rights reserved.
Abstract:
Objective: To assess the risk factors for delayed diagnosis of uterine cervical lesions. Materials and Methods: This is a case-control study that recruited 178 women at 2 Brazilian hospitals. The cases (n = 74) were composed of women with a late diagnosis of a lesion in the uterine cervix (invasive carcinoma in any stage). The controls (n = 104) were composed of women with cervical lesions diagnosed early on (low- or high-grade intraepithelial lesions). The analysis was performed by means of a logistic regression model using a hierarchical approach. The socioeconomic and demographic variables were included at level I (distal). Level II (intermediate) included the personal and family antecedents and knowledge about the Papanicolaou test and human papillomavirus. Level III (proximal) encompassed the variables relating to individuals' care for their own health, gynecologic symptoms, and variables relating to access to the health care system. Results: The risk factors for late diagnosis of uterine cervical lesions were age older than 40 years (odds ratio [OR], 10.4; 95% confidence interval [CI], 2.3-48.4), not knowing the difference between the Papanicolaou test and gynecological pelvic examinations (OR, 2.5; 95% CI, 1.3-4.9), not thinking that the Papanicolaou test was important (OR, 4.2; 95% CI, 1.3-13.4), and abnormal vaginal bleeding (OR, 15.0; 95% CI, 6.5-35.0). Previous treatment for sexually transmissible disease was a protective factor (OR, 0.3; 95% CI, 0.1-0.8) for delayed diagnosis. Conclusions: Deficiencies in cervical cancer prevention programs in developing countries are not simply a matter of better provision and coverage of Papanicolaou tests. The misconception about the Papanicolaou test is a serious educational problem, as demonstrated by the present study.
Abstract:
In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods, where the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the set of labels, and the final labels for each example are determined by aggregating the predictions from all binary classifiers. However, this approach fails to consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependency by themselves. An experimental study using decision trees, a kernel method as well as Naive Bayes as base-learning techniques shows the potential of the proposed approach to improve the multi-label classification performance.
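The binary relevance decomposition described in this abstract can be sketched as follows. This shows only the baseline (one independent binary classifier per label, predictions aggregated into a label set), not the dependency-aware approach the paper proposes, whose details the abstract does not give. The 1-nearest-neighbour base learner and all function names are assumptions made for illustration; the study itself used decision trees, a kernel method and Naive Bayes.

```python
def train_binary_relevance(X, Y, labels, fit_binary):
    """Fit one independent binary model per label.

    X: list of feature tuples; Y: list of label sets, one per example.
    """
    models = {}
    for label in labels:
        # Binary target: does this example carry the label?
        targets = [label in y for y in Y]
        models[label] = fit_binary(X, targets)
    return models


def predict_binary_relevance(models, x):
    """Aggregate the binary predictions into the final label set."""
    return {label for label, model in models.items() if model(x)}


def fit_1nn(X, targets):
    """Hypothetical base learner: 1-nearest neighbour, squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return targets[dists.index(min(dists))]
    return predict


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    Y = [set(), {"b"}, {"a"}, {"a", "b"}]
    models = train_binary_relevance(X, Y, ["a", "b"], fit_1nn)
    print(predict_binary_relevance(models, (0.9, 0.1)))
```

Because each label's model is trained without seeing the other labels, any dependency among labels is ignored, which is the limitation the abstract points out.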
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
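To illustrate why count data on the simplex needs its own distance, here is a minimal sketch of the centred log-ratio (clr) transform and the Aitchison distance, which are standard tools in compositional data analysis. The abstract does not state which distance Simcluster uses, so this is an illustration of the general framework and not a description of the tool; the function names are assumptions. All components must be strictly positive, so zero counts would need separate handling.

```python
import math


def closure(counts):
    """Rescale positive counts so that the components sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


if __name__ == "__main__":
    # Two libraries with the same proportions but different sequencing depth.
    print(aitchison_distance([10, 20, 70], [100, 200, 700]))
    # Two libraries with different proportions.
    print(aitchison_distance([10, 20, 70], [30, 30, 40]))
```

The first pair differs only in total count, so its distance is zero up to rounding, whereas a Euclidean distance on the raw counts would be large.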
Abstract:
Background: A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes. Results: In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence. Conclusions: This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.