962 results for Clustering a large document collection
Abstract:
Autosomal recessive spastic paraplegia with thinning of corpus callosum (ARHSP-TCC) is a complex form of HSP initially described in Japan but subsequently reported to have a worldwide distribution, with a particularly high frequency in multiple families from the Mediterranean basin. We recently showed that ARHSP-TCC is commonly associated with mutations in SPG11/KIAA1840 on chromosome 15q. We have now screened a collection of new patients, mainly originating from Italy and Brazil, in order to further ascertain the spectrum of mutations in SPG11, broaden the ethnic origin of SPG11 patients, determine the relative frequency at the level of individual countries (i.e., Italy), and establish whether there are one or more common mutations. In 25 index cases we identified 32 mutations; 22 are novel, including 9 nonsense, 3 small deletions, 4 insertions, 1 in/del, 1 small duplication, 1 missense, 2 splice-site, and, for the first time, a large genomic rearrangement. This brings the total number of SPG11 mutated patients in the SPATAX collection to 111 cases in 44 families and in 17 isolated cases, from 16 countries, all assessed using homogeneous clinical criteria. While expanding the spectrum of mutations in SPG11, this larger series also corroborated the notion that even within apparently homogeneous populations a molecular diagnosis cannot be achieved without full gene sequencing. (C) 2008 Wiley-Liss, Inc.
Abstract:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data through data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its own bias, making it more suitable for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other known algorithms. The proposed algorithm produced the best clustering results, as confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
The relationship between the structure and function of biological networks constitutes a fundamental issue in systems biology. In particular, the structure of protein-protein interaction networks is related to important biological functions. In this work, we investigated how the resilience of these networks is determined by their large-scale features. Four species are taken into account, namely the yeast Saccharomyces cerevisiae, the worm Caenorhabditis elegans, the fly Drosophila melanogaster and Homo sapiens. We adopted two entropy-related measurements (degree entropy and dynamic entropy) in order to quantify the overall degree of robustness of these networks. We verified that while they exhibit similar structural variations under random node removal, they differ significantly when subjected to intentional attacks (hub removal). As a matter of fact, more complex species tended to exhibit more robust networks. More specifically, we quantified how six important measurements of the networks' topology (namely clustering coefficient, average degree of neighbors, average shortest path length, diameter, assortativity coefficient, and slope of the power law degree distribution) correlated with the two entropy measurements. Our results revealed that the fraction of hubs and the average neighbor degree contribute significantly to the resilience of networks. In addition, the topological analysis of the removed hubs indicated that the presence of alternative paths between the proteins connected to hubs tends to reinforce resilience. The analysis helps to explain how resilience arises in networks and can be applied to the development of protein network models.
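As a rough illustration of the degree entropy and hub-removal ideas mentioned in this abstract, the sketch below computes the Shannon entropy of a graph's degree distribution before and after removing its highest-degree node. It assumes the common definition H = -Σ p(k) ln p(k), where p(k) is the fraction of nodes with degree k; the abstract does not state the exact definition the authors used, and the function names and the edge-list representation are hypothetical.

```python
import math
from collections import Counter

def node_degrees(edges):
    """Count the degree of every node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def degree_entropy(edges):
    """Shannon entropy (in nats) of the degree distribution.

    Assumption: nodes are taken from the edge list only, so isolated
    nodes (degree 0) are not counted.
    """
    degree = node_degrees(edges)
    n = len(degree)
    counts = Counter(degree.values())  # degree k -> number of nodes with degree k
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def remove_hub(edges):
    """Simulate one intentional attack: drop the highest-degree node."""
    degree = node_degrees(edges)
    hub = max(degree, key=degree.get)
    return [(u, v) for u, v in edges if hub not in (u, v)]

# Hypothetical toy network: a star (node 0 is the hub) plus one extra edge.
toy = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
print(degree_entropy(toy), degree_entropy(remove_hub(toy)))
```

Comparing the entropy before and after repeated calls to `remove_hub` gives a crude picture of how a network's degree structure changes under attack; it is not a reproduction of the study's dynamic entropy measurement.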
Abstract:
Data mining is a relatively new field of research whose objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work; on the other hand, such a large amount of data cannot be processed by humans in a short time to make diagnoses, prognoses and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications in order to develop a tool that can help make reasonably accurate decisions. In this thesis, the goal is to find a pattern among patients who contracted pneumonia by clustering lab data values that were recorded every day. This pattern can then be generalized to patients who have not been diagnosed with the disease but whose lab values show the same trend as those of pneumonia patients. For my work, 10 tables were extracted from a large database of a hospital in Jena. In the ICU (intensive care unit), COPRA, a patient management system, is used. All the tables and data are stored in a German-language database.
Abstract:
The Solar Heat Integration NEtwork (SHINE) is a European research school in which 13 PhD students in solar thermal technologies are funded by the EU Marie-Curie program. It has five PhD course modules as well as workshops and seminars dedicated to PhD students both within and outside the project. The SHINE research activities focus on large solar heating systems and new applications: on district heating, industrial processes and new storage systems. The scope of this paper is on systems for district heating, for which there are five PhD students, three at universities and two at companies. The PhD students all started during the early part of 2014 and their initial work has concentrated on literature studies and on setting up models and data collection to be used for validation purposes. The PhD students will complete their studies in 2017-18.
Abstract:
This article outlines many different ways of using technology to better link academic librarians and faculty, focusing particularly on how the appropriate use of technology in Acquisitions can improve the image of the library. The article presents a comprehensive overview of how technologies can be used to make Acquisitions not just a book purchasing department, but a department that works proactively to impress constituents, helping to make the library a central and prestigious part of the campus community. While the article's primary focus is on academic libraries, much of the discussion is also applicable to other types of libraries.
Abstract:
One objective of the feeder reconfiguration problem in distribution systems is to minimize the power losses for a specific load. The mathematical model of this problem is a nonlinear mixed-integer problem that is generally hard to solve. This paper proposes an algorithm based on artificial neural network theory. In this context, clustering techniques to determine the best training set for a single neural network with generalization ability are also presented. The proposed methodology was applied to two electrical systems and produced good results. Moreover, the methodology can be employed for large-scale systems in a real-time environment.
Abstract:
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembly. This indicated that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes were tagged.
Abstract:
The present study aimed at evaluating the histo-morphological changes resulting from different fasting periods before the collection of tissue samples in different segments of the small intestine (duodenum, jejunum and ileum) of 7-d-old male chicks of a broiler and a layer strain. A completely randomized experimental design was used in a 2x7 factorial arrangement, with two strains with different growth rates (Ross 308 and HyLine® W36) and seven fasting periods (0, 2, 4, 6, 8, 10 and 12 hours), with six replicates, totaling 84 birds. The comparison of the morphometrics of the duodenum, jejunum and ileum of broiler and layer chicks demonstrated faster digestive tract development in broilers relative to layers. The fasting period caused morphological changes in the liver and small and large intestines in both strains. Therefore, it must be highlighted that in studies involving organ weights and intestinal morphometrics, birds must not be submitted to fasting before tissue collection.
Abstract:
We derive constraints on a simple quintessential inflation model, based on a spontaneously broken Phi(4) theory, imposed by the Wilkinson Microwave Anisotropy Probe three-year data (WMAP3) and by galaxy clustering results from the Sloan Digital Sky Survey (SDSS). We find that the scale of symmetry breaking must be larger than about 3 Planck masses in order for inflation to generate acceptable values of the scalar spectral index and of the tensor-to-scalar ratio. We also show that the resulting quintessence equation of state can evolve rapidly at recent times and hence can potentially be distinguished from a simple cosmological constant in this parameter regime.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
We study a model for dynamical localization of topology using ideas from non-commutative geometry and topology in quantum mechanics. We consider a collection X of N one-dimensional manifolds and the corresponding set of boundary conditions (self-adjoint extensions) of the Dirac operator D. The set of boundary conditions encodes the topology and is parameterized by unitary matrices g. A particular geometry is described by a spectral triple x(g) = (A_X, ℋ_X, D(g)). We define a partition function for the sum over all g. In this model topology fluctuates but the dimension is kept fixed. We use the spectral principle to obtain an action for the set of boundary conditions. Together with invariance principles the procedure fixes the partition function for fluctuating topologies. The model has one free parameter β and it is equivalent to a one-plaquette gauge theory. We argue that topology becomes localized at β = ∞ for any value of N. Moreover, the system undergoes a third-order phase transition at β = 1 for large N. We give a topological interpretation of the phase transition by looking at how it affects the topology. © SISSA/ISAS 2004.
Abstract:
Includes bibliography
Abstract:
Wireless Sensor Networks (WSN) are a special kind of ad-hoc network that is usually deployed in a monitoring field in order to detect some physical phenomenon. Due to the low dependability of individual nodes, small radio coverage and large areas to be monitored, the organization of nodes in small clusters is generally used. Moreover, a large number of WSN nodes are usually deployed in the monitoring area to increase WSN dependability. Therefore, optimal cluster head positioning is a desirable characteristic in a WSN. In this paper, we propose a hybrid clustering algorithm based on community detection in complex networks and the traditional K-means clustering technique: the QK-Means algorithm. Simulation results show that QK-Means detects communities and sub-communities, so that the lost message rate is decreased and WSN coverage is increased. © 2012 IEEE.
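To make the K-means half of this idea concrete, the sketch below runs plain K-means (Lloyd's algorithm) on 2-D node positions and picks, for each cluster, the node closest to the centroid as a candidate cluster head. This is only the traditional K-means component: the community detection step of QK-Means is not described in the abstract and is not reproduced here. The function names, the explicit `init` centroids and the toy coordinates are all assumptions made for illustration.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, init, max_iters=100):
    """Plain K-means starting from caller-supplied initial centroids."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iters):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return centroids, assign(points, centroids)

def cluster_heads(centroids, clusters):
    """Pick the node nearest to each centroid as a candidate cluster head."""
    return [min(members, key=lambda p: math.dist(p, c))
            for c, members in zip(centroids, clusters) if members]

# Hypothetical sensor positions forming two well-separated groups.
nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(nodes, init=[(0, 0), (10, 10)])
print(cluster_heads(centroids, clusters))
```

Passing the initial centroids explicitly keeps the example deterministic; a real deployment would more likely use random or K-means++ initialization and would also weigh factors such as residual energy when choosing heads.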