873 resultados para agglomerative clustering
Resumo:
Peer-reviewed
A new approach to segmentation based on fusing circumscribed contours, region growing and clustering
Resumo:
One of the major problems in machine vision is the segmentation of images of natural scenes. This paper presents a new proposal for the image segmentation problem which has been based on the integration of edge and region information. The main contours of the scene are detected and used to guide the posterior region growing process. The algorithm places a number of seeds at both sides of a contour allowing stating a set of concurrent growing processes. A previous analysis of the seeds permits to adjust the homogeneity criterion to the regions's characteristics. A new homogeneity criterion based on clustering analysis and convex hull construction is proposed
Resumo:
Speaker diarization is the process of sorting speeches according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systemsextend to other domains than meetings, for example, lectures, telephone, television, and radio. Besides, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system in meeting domain. Experimental part of this work indicates that zero-crossing rate can be used effectively in breaking down the audio stream into segments, and adaptive Gaussian Models fit adequately short audio segments. Results show that 35 Gaussian Models and one second as average length of each segment are optimum values to build a diarization system for the tested data. Uniting the segments which are uttered by same speaker is done in a bottom-up clustering by a newapproach of categorizing the mixture weights.
Resumo:
The present study examines the repertory of liturgical chant known as St. Petersburg Court Chant which emerged within the Imperial Court of St. Petersburg, Russia, and appeared in print in a number of revisions during the course of the 19th century, eventually to spread throughout the Russian Empire and even abroad. The study seeks answers to questions on the essence and composition of Court Chant, its history and liturgical background, and most importantly, its musical relationship to other repertories of Eastern Slavic chant. The research questions emerge from previous literary accounts of Court Chant (summarized in the Introduction), which have tended to be inaccurate and generally not based on critical research. The study is divided into eight main chapters. Chapter 1 provides a survey of the history of Eastern Slavic chant and the Imperial Court Chapel of St. Petersburg until 1917, with special emphasis on the history of singing traditional chant in polyphony, the status of the Court Chapel as a government authority, and its endeavours in publishing church music. Chapter 2 deals with the liturgical background of Eastern chant, the chant genres, and main repertories of Eastern Slavic chant. Chapter 3 concentrates on chant sources: it introduces the musical notations utilised, after which a typology of chant books is presented. The discussion continues with a survey of the sources of Court Chant and their content, the specimens selected for closer analysis, the comparative materials from other repertories, and ends with a commentary on some chant sources that have been excluded. The comparative sources include a specimen from around the beginning of the 12th century, a few manuscripts from the 17th century, and printed and manuscript chant books from the early 18th to early 20th century, covering the geographical area that delimits to the western Ukraine, Astrakhan, Nizhny Novgorod, and the Solovetsky Monastery. Chapter 4 presents the approach and methods used in the subsequent analytical comparisons. After a survey of the pitch organization of Eastern Slavic chant, the customary harmonization strategy of traditional chant polyphony is examined, according to which a method for meaningful analysis of the harmony is proposed. The method is based on the observation that the harmonic framework of chant polyphony derives from the standard pitch collection of monodic chant known as the Church Gamut, specific pitches of which form eight harmonic regions that behave like the usual tonalities of major and harmonic minor. Because of the considerable quantity of comparative chant forms, computer-assisted statistical methods are applied to the analysis of chant melodies. The primary chant forms and their respective comparative forms have been pre-processed into reduced chant prototypes and divided into redactions. The analyses are carried out by measuring the formal dissimilarities of the primary chant forms of the Court Chant repertory against each comparative form, and also by measuring the reciprocal dissimilarities of all chant versions in a redaction, the results of which are subjected to agglomerative hierarchical clustering in order to find out how the chant forms relate to each other. The dissimilarities are determined by applying a metric dissimilarity function that is based on the Levenshtein Distance. Chapter 5 provides the melodic and harmonic analyses of generic chants (chants used for multiple texts of different lengths), i.e., chants for stichera samoglasny and troparia, Chapter 6 of pseudo-generic chants (chants that are used for multiple texts but with certain restrictions), i.e., chants for heirmoi, prokeimena, and three other hymns, and Chapter 7 of non-generic chants, covering nine chants that in the Court repertory are not shared by multiple texts. The results are summarized and evaluated in Chapter 8. Accordingly, it can be established that, contrary to previous conceptions, melodically, Court Chant is in effect a full part of the wider Eastern Slavic chant tradition. Even if it is somewhat detached from the chant versions of the Synodal square-note chant books and the local tradition of Moscow, it is particularly close to chant forms of East Ukraine and some vernacular repertories from Russia. Respectively, the harmonization strategies of Court Chant do not show significant individuality in comparison with those of the available polyphonic comparative sources, the main difference being the part-writing, which generally conforms to western common practice standard, whereas the deviations from this tend to be more significant in other analysed repertories of polyphonic chant. Thus, insofar as the subsequent prevalence of Court Chant is not based on its forceful dissemination by authorities (as suggested in previous literature but for which little tangible evidence could be found in Chapter 1), in the present author’s interpretation, Court Chant attained its dominance principally because musically it was considered sufficiently traditional, and as a chant body supported by the government, was conveniently available in print in serviceable harmonizations.
Management zones using fuzzy clustering based on spatial-temporal variability of soil and corn yield
Resumo:
Clustering soil and crop data can be used as a basis for the definition of management zones because the data are grouped into clusters based on the similar interaction of these variables. Therefore, the objective of this study was to identify management zones using fuzzy c-means clustering analysis based on the spatial and temporal variability of soil attributes and corn yield. The study site (18 by 250-m in size) was located in Jaboticabal, São Paulo/Brazil. Corn yield was measured in one hundred 4.5 by 10-m cells along four parallel transects (25 observations per transect) over five growing seasons between 2001 and 2010. Soil chemical and physical attributes were measured. SAS procedure MIXED was used to identify which variable(s) most influenced the spatial variability of corn yield over the five study years. Basis saturation (BS) was the variable that better related to corn yield, thus, semivariograms models were fitted for BS and corn yield and then, data values were krigged. Management Zone Analyst software was used to carry out the fuzzy c-means clustering algorithm. The optimum number of management zones can change over time, as well as the degree of agreement between the BS and corn yield management zone maps. Thus, it is very important take into account the temporal variability of crop yield and soil attributes to delineate management zones accurately.
Resumo:
The purpose of this thesis is to find out whether all the peer to peer lenders are unworthy of credit and also if there are single qualities or combinations of qualities that determine the probability of default of a person or group of people. Distinguishing qualities are searched with self-organizing maps (SOM). Qualities and groups of people found by the self-organizing map are then compared to the average. The comparison is carried out by looking how big proportion of borrowers meeting the criteria is two months or more behind with their payments. Research data used is collected by an Estonian peer to peer lending company during the years of 2011-2014. Data consists of peer to peer borrowers and information gathered from them.
Resumo:
Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixtured urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixtured urban population. Group I included 150 white subjects; group II, 142 mulatto subjects, and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both ACE I/D (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). Different sample sizes are predicted to be determinant of the power to detect a given genotypic association with a particular phenotype when conducting two-locus association studies in admixtured populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.
Resumo:
This master thesis work introduces the fuzzy tolerance/equivalence relation and its application in cluster analysis. The work presents about the construction of fuzzy equivalence relations using increasing generators. Here, we investigate and research on the role of increasing generators for the creation of intersection, union and complement operators. The objective is to develop different varieties of fuzzy tolerance/equivalence relations using different varieties of increasing generators. At last, we perform a comparative study with these developed varieties of fuzzy tolerance/equivalence relations in their application to a clustering method.
Resumo:
Verbal fluency tests are used as a measure of executive functions and language, and can also be used to evaluate semantic memory. We analyzed the influence of education, gender and age on scores in a verbal fluency test using the animal category, and on number of categories, clustering and switching. We examined 257 healthy participants (152 females and 105 males) with a mean age of 49.42 years (SD = 15.75) and having a mean educational level of 5.58 (SD = 4.25) years. We asked them to name as many animals as they could. Analysis of variance was performed to determine the effect of demographic variables. No significant effect of gender was observed for any of the measures. However, age seemed to influence the number of category changes, as expected for a sensitive frontal measure, after being controlled for the effect of education. Educational level had a statistically significant effect on all measures, except for clustering. Subject performance (mean number of animals named) according to schooling was: illiterates, 12.1; 1 to 4 years, 12.3; 5 to 8 years, 14.0; 9 to 11 years, 16.7, and more than 11 years, 17.8. We observed a decrease in performance in these five educational groups over time (more items recalled during the first 15 s, followed by a progressive reduction until the fourth interval). We conclude that education had the greatest effect on the category fluency test in this Brazilian sample. Therefore, we must take care in evaluating performance in lower educational subjects.
Resumo:
The distribution of psychiatric disorders and of chronic medical illnesses was studied in a population-based sample to determine whether these conditions co-occur in the same individual. A representative sample (N = 1464) of adults living in households was assessed by the Composite International Diagnostic Interview, version 1.1, as part of the São Paulo Epidemiological Catchment Area Study. The association of sociodemographic variables and psychological symptoms regarding medical illness multimorbidity (8 lifetime somatic conditions) and psychiatric multimorbidity (15 lifetime psychiatric disorders) was determined by negative binomial regression. A total of 1785 chronic medical conditions and 1163 psychiatric conditions were detected in the population concentrated in 34.1 and 20% of respondents, respectively. Subjects reporting more psychiatric disorders had more medical illnesses. Characteristics such as age range (35-59 years, risk ratio (RR) = 1.3, and more than 60 years, RR = 1.7), being separated (RR = 1.2), being a student (protective effect, RR = 0.7), being of low educational level (RR = 1.2) and being psychologically distressed (RR = 1.1) were determinants of medical conditions. Age (35-59 years, RR = 1.2, and more than 60 years, RR = 0.5), being retired (RR = 2.5), and being psychologically distressed (females, RR = 1.5, and males, RR = 1.4) were determinants of psychiatric disorders. In conclusion, psychological distress and some sociodemographic features such as age, marital status, occupational status, educational level, and gender are associated with psychiatric and medical multimorbidity. The distribution of both types of morbidity suggests the need of integrating mental health into general clinical settings.
Resumo:
The goal of most clustering algorithms is to find the optimal number of clusters (i.e. fewest number of clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of the Cramérs V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the penta peptide Met-enkephalin (MET) and the 34 amino acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a huge impact on the reproducibility. Analysis of MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps due to the fact that toroidal maps do not have an edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classification for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs) which find similar partitionings to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions. The χ^2 values could relate the different SOM partitionings to each other.
Resumo:
Naïvement perçu, le processus d’évolution est une succession d’événements de duplication et de mutations graduelles dans le génome qui mènent à des changements dans les fonctions et les interactions du protéome. La famille des hydrolases de guanosine triphosphate (GTPases) similaire à Ras constitue un bon modèle de travail afin de comprendre ce phénomène fondamental, car cette famille de protéines contient un nombre limité d’éléments qui diffèrent en fonctionnalité et en interactions. Globalement, nous désirons comprendre comment les mutations singulières au niveau des GTPases affectent la morphologie des cellules ainsi que leur degré d’impact sur les populations asynchrones. Mon travail de maîtrise vise à classifier de manière significative différents phénotypes de la levure Saccaromyces cerevisiae via l’analyse de plusieurs critères morphologiques de souches exprimant des GTPases mutées et natives. Notre approche à base de microscopie et d’analyses bioinformatique des images DIC (microscopie d’interférence différentielle de contraste) permet de distinguer les phénotypes propres aux cellules natives et aux mutants. L’emploi de cette méthode a permis une détection automatisée et une caractérisation des phénotypes mutants associés à la sur-expression de GTPases constitutivement actives. Les mutants de GTPases constitutivement actifs Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V ont été analysés avec succès. En effet, l’implémentation de différents algorithmes de partitionnement, permet d’analyser des données qui combinent les mesures morphologiques de population native et mutantes. Nos résultats démontrent que l’algorithme Fuzzy C-Means performe un partitionnement efficace des cellules natives ou mutantes, où les différents types de cellules sont classifiés en fonction de plusieurs facteurs de formes cellulaires obtenus à partir des images DIC. Cette analyse démontre que les mutations Cdc42 Q61L, Rho5 Q91H, Ras1 Q68L et Rsr1 G12V induisent respectivement des phénotypes amorphe, allongé, rond et large qui sont représentés par des vecteurs de facteurs de forme distincts. Ces distinctions sont observées avec différentes proportions (morphologie mutante / morphologie native) dans les populations de mutants. Le développement de nouvelles méthodes automatisées d’analyse morphologique des cellules natives et mutantes s’avère extrêmement utile pour l’étude de la famille des GTPases ainsi que des résidus spécifiques qui dictent leurs fonctions et réseau d’interaction. Nous pouvons maintenant envisager de produire des mutants de GTPases qui inversent leur fonction en ciblant des résidus divergents. La substitution fonctionnelle est ensuite détectée au niveau morphologique grâce à notre nouvelle stratégie quantitative. Ce type d’analyse peut également être transposé à d’autres familles de protéines et contribuer de manière significative au domaine de la biologie évolutive.
Resumo:
S’insérant dans les domaines de la Lecture et de l’Analyse de Textes Assistées par Ordinateur (LATAO), de la Gestion Électronique des Documents (GÉD), de la visualisation de l’information et, en partie, de l’anthropologie, cette recherche exploratoire propose l’expérimentation d’une méthodologie descriptive en fouille de textes afin de cartographier thématiquement un corpus de textes anthropologiques. Plus précisément, nous souhaitons éprouver la méthode de classification hiérarchique ascendante (CHA) pour extraire et analyser les thèmes issus de résumés de mémoires et de thèses octroyés de 1985 à 2009 (1240 résumés), par les départements d’anthropologie de l’Université de Montréal et de l’Université Laval, ainsi que le département d’histoire de l’Université Laval (pour les résumés archéologiques et ethnologiques). En première partie de mémoire, nous présentons notre cadre théorique, c'est-à-dire que nous expliquons ce qu’est la fouille de textes, ses origines, ses applications, les étapes méthodologiques puis, nous complétons avec une revue des principales publications. La deuxième partie est consacrée au cadre méthodologique et ainsi, nous abordons les différentes étapes par lesquelles ce projet fut conduit; la collecte des données, le filtrage linguistique, la classification automatique, pour en nommer que quelques-unes. Finalement, en dernière partie, nous présentons les résultats de notre recherche, en nous attardant plus particulièrement sur deux expérimentations. Nous abordons également la navigation thématique et les approches conceptuelles en thématisation, par exemple, en anthropologie, la dichotomie culture ̸ biologie. Nous terminons avec les limites de ce projet et les pistes d’intérêts pour de futures recherches.
Resumo:
An Overview of known spatial clustering algorithms The space of interest can be the two-dimensional abstraction of the surface of the earth or a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3d-space representing the arrangement of chains of protein molecules. The data consists of geometric information and can be either discrete or continuous. The explicit location and extension of spatial objects define implicit relations of spatial neighborhood (such as topological, distance and direction relations) which are used by spatial data mining algorithms. Therefore, spatial data mining algorithms are required for spatial characterization and spatial trend analysis. Spatial data mining or knowledge discovery in spatial databases differs from regular data mining in analogous with the differences between non-spatial data and spatial data. The attributes of a spatial object stored in a database may be affected by the attributes of the spatial neighbors of that object. In addition, spatial location, and implicit information about the location of an object, may be exactly the information that can be extracted through spatial data mining
Resumo:
In this paper, moving flock patterns are mined from spatio- temporal datasets by incorporating a clustering algorithm. A flock is defined as the set of data that move together for a certain continuous amount of time. Finding out moving flock patterns using clustering algorithms is a potential method to find out frequent patterns of movement in large trajectory datasets. In this approach, SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW) is the clustering algorithm used. The advantage of using SPARROW algorithm is that it can effectively discover clusters of widely varying sizes and shapes from large databases. Variations of the proposed method are addressed and also the experimental results show that the problem of scalability and duplicate pattern formation is addressed. This method also reduces the number of patterns produced