849 results for constrained clustering
Abstract:
Phenomena with a constrained sample space appear frequently in practice. This is the case, for example, with strictly positive data or with compositional data such as percentages or proportions. If the natural measure of difference is not the absolute one, simple algebraic properties show that it is more convenient to work with a geometry different from the usual Euclidean geometry in real space, and with a measure different from the usual Lebesgue measure, leading to alternative models which better fit the phenomenon under study. The general approach is presented and illustrated using the normal distribution, both on the positive real line and on the D-part simplex. The original ideas of McAlister in his introduction to the lognormal distribution in 1879 are recovered and updated.
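The idea of measuring differences in relative rather than absolute terms can be illustrated with a small sketch. It assumes the usual logarithmic coordinates on the positive real line and the centred log-ratio (clr) transform on the simplex; the function names are hypothetical and are not taken from the abstract.

```python
import math

def relative_distance(a, b):
    """Distance between two positive numbers measured in relative terms."""
    return abs(math.log(a) - math.log(b))

def lognormal_fit(xs):
    """Mean and standard deviation of log(x); assumes every x is > 0."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / (len(logs) - 1)
    return mu, math.sqrt(var)

def clr(parts):
    """Centred log-ratio transform of a D-part composition (all parts > 0)."""
    logs = [math.log(p) for p in parts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]
```

Under this distance, going from 1 to 10 is as large a step as going from 10 to 100, and the clr coordinates of a composition do not change when all parts are multiplied by the same constant, which is why percentages and raw amounts give the same result.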
Abstract:
Speaker diarization is the process of sorting speech according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example lectures, telephone, television, and radio. In addition, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system for the meeting domain. The experimental part of this work indicates that the zero-crossing rate can be used effectively to break down the audio stream into segments, and that adaptive Gaussian Models adequately fit short audio segments. Results show that 35 Gaussian Models and an average segment length of one second are the optimum values for building a diarization system for the tested data. Segments uttered by the same speaker are merged by bottom-up clustering, using a new approach that categorizes the mixture weights.
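A minimal sketch of zero-crossing-rate segmentation follows. The abstract does not say how the rate was turned into segment boundaries, so the fixed frame length and the jump threshold used here are assumptions made purely for illustration, and all names are hypothetical.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_zcr(signal, frame_len):
    """Zero-crossing rate of each non-overlapping frame of the signal."""
    last_start = len(signal) - frame_len
    return [zero_crossing_rate(signal[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]

def change_points(zcrs, threshold):
    """Frame indices where the rate jumps by more than the threshold
    relative to the previous frame (assumed boundary rule)."""
    return [i for i in range(1, len(zcrs)) if abs(zcrs[i] - zcrs[i - 1]) > threshold]
```

Each index returned by `change_points` marks a candidate boundary; the resulting segments would then be modelled and clustered by speaker.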
Management zones using fuzzy clustering based on spatial-temporal variability of soil and corn yield
Abstract:
Clustering soil and crop data can be used as a basis for the definition of management zones because the data are grouped into clusters based on the similar interaction of these variables. Therefore, the objective of this study was to identify management zones using fuzzy c-means clustering analysis based on the spatial and temporal variability of soil attributes and corn yield. The study site (18 by 250 m) was located in Jaboticabal, São Paulo, Brazil. Corn yield was measured in one hundred 4.5 by 10-m cells along four parallel transects (25 observations per transect) over five growing seasons between 2001 and 2010. Soil chemical and physical attributes were measured. The SAS procedure MIXED was used to identify which variable(s) most influenced the spatial variability of corn yield over the five study years. Base saturation (BS) was the variable most closely related to corn yield; semivariogram models were therefore fitted for BS and corn yield, and the data values were then kriged. Management Zone Analyst software was used to carry out the fuzzy c-means clustering algorithm. The optimum number of management zones can change over time, as can the degree of agreement between the BS and corn yield management zone maps. Thus, it is very important to take into account the temporal variability of crop yield and soil attributes to delineate management zones accurately.
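The fuzzy c-means algorithm mentioned above can be sketched in a few lines. This is a generic textbook version with the common fuzziness exponent m = 2, not the Management Zone Analyst implementation; the function name, the initial centres and the stopping rule are assumptions.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, iters=100, tol=1e-9):
    """Alternate membership and centre updates until the centres stop moving.

    points and centers are lists of equal-length tuples of floats.
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if any(x == 0.0 for x in d):
                # Point sits exactly on a centre: share membership among those centres.
                hits = [1.0 if x == 0.0 else 0.0 for x in d]
                u.append([h / sum(hits) for h in hits])
            else:
                inv = [x ** (-2.0 / (m - 1.0)) for x in d]
                u.append([v / sum(inv) for v in inv])
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(len(points))) / total
                for k in range(dims)))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Unlike hard clustering, every cell receives a membership in every zone, so cells near a zone boundary can be recognised by memberships close to 0.5.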
Abstract:
This work presents a formulation of contact with friction between elastic bodies. This is a nonlinear problem due to unilateral constraints (no inter-penetration of bodies) and friction. The solution of this problem can be found using optimization concepts, modelling the problem as a constrained minimization problem. The Finite Element Method is used to construct approximation spaces. The minimization problem has the total potential energy of the elastic bodies as the objective function; the non-inter-penetration conditions are represented by inequality constraints, and equality constraints are used to deal with the friction. Due to the presence of two friction conditions (stick and slip), specific equality constraints are present or absent according to the current condition. Since the Coulomb friction condition depends on the normal and tangential contact stresses related to the constraints of the problem, a condition-dependent constrained minimization problem is devised. An Augmented Lagrangian Method for constrained minimization is employed to solve this problem. When applied to a contact problem, this method yields Lagrange multipliers that have the physical meaning of contact forces. This makes it possible to check the friction condition at each iteration. These concepts make it possible to devise a computational scheme that leads to good numerical results.
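The role of the Lagrange multiplier as a contact force can be seen in a one-degree-of-freedom sketch. It assumes a single linear spring of stiffness k loaded by a force f and stopped by a rigid wall at distance gap; friction and the finite element discretisation are left out, and the function name and penalty parameter r are hypothetical.

```python
def solve_contact(k, f, gap, r=10.0, iters=50, tol=1e-12):
    """Minimise 0.5*k*u**2 - f*u subject to u - gap <= 0.

    Uses the augmented Lagrangian with multiplier lam and penalty r.
    Returns (u, lam); lam is the contact force at convergence.
    """
    lam = 0.0
    u = f / k
    for _ in range(iters):
        # Inner minimisation in closed form (the energy is quadratic in u).
        u_free = f / k
        if lam + r * (u_free - gap) <= 0.0:
            u = u_free                       # constraint inactive
        else:
            u = (f - lam + r * gap) / (k + r)  # constraint active
        # Multiplier update, kept non-negative for the inequality constraint.
        new_lam = max(0.0, lam + r * (u - gap))
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return u, lam
```

With k = 2, f = 10 and gap = 3 the unconstrained displacement would be 5, so the wall is hit: the iteration converges to u = 3 with a multiplier of 4, which is exactly the force the wall must supply (f - k*u). In a frictional problem this normal force is what the Coulomb condition would be checked against at each iteration.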
Abstract:
The aim of this study was to analyse mothers’ working time patterns across 22 European countries. The focus was on three questions: how much mothers prefer to work, how much they actually work, and to what degree their preferred and actual working times are (in)consistent with each other. Attention was paid to cross-national differences in mothers’ working time patterns, to the comparison of mothers’ working times with those of childless women and fathers, and to the individual- and country-level factors that explain the variation between them. In the theoretical background, the point of departure was an integrative theoretical approach which assumes that there are various kinds of explanations for the differences in mothers’ working time patterns (structural, cultural and institutional), and that these factors operate at two levels: individual and country. Data were extracted from the European Social Survey (ESS) 2010/2011. The results showed that mothers’ working time patterns, both preferred and actual working times, varied across European countries. Four clusters were formed to illustrate the differences. In the full-time pattern, full-time work was the most important form of work, leaving all other working time forms marginal. The full-time pattern was observed in terms of preferred working times in Bulgaria and Portugal. In polarised pattern countries, full-time work was also important, but it was accompanied by a large share of mothers not working at all. In the case of preferred working times, many Eastern and Southern European countries followed it, whereas in terms of actual working times it included all Eastern and Southern European countries as well as Finland. The combination pattern was characterised by the importance of long part-time hours and full-time work.
It was the preferred working time pattern in the Nordic countries, France, Slovenia, and Spain, but Belgium, Denmark, France, Norway, and Sweden followed it in terms of actual working times. The fourth cluster that described mothers’ working times was called the part-time pattern, and it was illustrated by the prevalence of short and long part-time work. In the case of preferred working times, it was followed in Belgium, Germany, Ireland, the Netherlands and Switzerland. Besides Belgium, the part-time pattern was followed in the same countries in terms of actual working times. The consistency between preferred and actual working times was rather strong in a majority of countries. However, six countries fell under different working time patterns when preferred and actual working times were compared. Comparison of working mothers’, childless women’s, and fathers’ working times showed that differences between these groups were surprisingly small. It was only in part-time pattern countries that working mothers worked significantly shorter hours than working childless women and fathers. Results therefore revealed that when mothers’ working times are under study, an important question regarding the population examined is whether it consists of all mothers or only working mothers. Results moreover supported the use of the integrative theoretical approach when studying mothers’ working time patterns. Results indicate that mothers’ working time patterns in all countries are shaped by various opportunities and constraints, which are comprised of structural, cultural, institutional, and individual-level factors.
Abstract:
The purpose of this thesis is to find out whether all peer-to-peer borrowers are unworthy of credit, and whether there are single qualities or combinations of qualities that determine the probability of default of a person or group of people. Distinguishing qualities are searched for with self-organizing maps (SOM). The qualities and groups of people found by the self-organizing map are then compared to the average. The comparison is carried out by looking at what proportion of the borrowers meeting the criteria are two months or more behind with their payments. The research data were collected by an Estonian peer-to-peer lending company during 2011-2014. The data consist of peer-to-peer borrowers and the information gathered from them.
Abstract:
Previous genetic association studies have overlooked the potential for biased results when analyzing different population structures in ethnically diverse populations. The purpose of the present study was to quantify this bias in two-locus association studies conducted on an admixed urban population. We studied the genetic structure distribution of angiotensin-converting enzyme insertion/deletion (ACE I/D) and angiotensinogen methionine/threonine (M/T) polymorphisms in 382 subjects from three subgroups in a highly admixed urban population. Group I included 150 white subjects; group II, 142 mulatto subjects; and group III, 90 black subjects. We conducted sample size simulation studies using these data in different genetic models of gene action and interaction and used genetic distance calculation algorithms to help determine the population structure for the studied loci. Our results showed a statistically different population structure distribution of both the ACE I/D polymorphism (P = 0.02, OR = 1.56, 95% CI = 1.05-2.33 for the D allele, white versus black subgroup) and the angiotensinogen M/T polymorphism (P = 0.007, OR = 1.71, 95% CI = 1.14-2.58 for the T allele, white versus black subgroup). Different sample sizes are predicted to be determinant of the power to detect a given genotypic association with a particular phenotype when conducting two-locus association studies in admixed populations. In addition, the postulated genetic model is also a major determinant of the power to detect any association in a given sample size. The present simulation study helped to demonstrate the complex interrelation among ethnicity, power of the association, and the postulated genetic model of action of a particular allele in the context of clustering studies. This information is essential for the correct planning and interpretation of future association studies conducted on this population.
Abstract:
This master's thesis introduces the fuzzy tolerance/equivalence relation and its application in cluster analysis. The work presents the construction of fuzzy equivalence relations using increasing generators. We investigate the role of increasing generators in the creation of intersection, union and complement operators. The objective is to develop different varieties of fuzzy tolerance/equivalence relations using different varieties of increasing generators. Finally, we perform a comparative study of these fuzzy tolerance/equivalence relations in their application to a clustering method.
Abstract:
Verbal fluency tests are used as a measure of executive functions and language, and can also be used to evaluate semantic memory. We analyzed the influence of education, gender and age on scores in a verbal fluency test using the animal category, and on number of categories, clustering and switching. We examined 257 healthy participants (152 females and 105 males) with a mean age of 49.42 years (SD = 15.75) and a mean educational level of 5.58 (SD = 4.25) years. We asked them to name as many animals as they could. Analysis of variance was performed to determine the effect of demographic variables. No significant effect of gender was observed for any of the measures. However, age seemed to influence the number of category changes, as expected for a sensitive frontal measure, after controlling for the effect of education. Educational level had a statistically significant effect on all measures, except for clustering. Subject performance (mean number of animals named) according to schooling was: illiterates, 12.1; 1 to 4 years, 12.3; 5 to 8 years, 14.0; 9 to 11 years, 16.7; and more than 11 years, 17.8. We observed a decrease in performance in these five educational groups over time (more items recalled during the first 15 s, followed by a progressive reduction until the fourth interval). We conclude that education had the greatest effect on the category fluency test in this Brazilian sample. Therefore, care must be taken when evaluating the performance of subjects with lower educational levels.
Abstract:
Bioenergy is seen as an important part of the present and future range of domestic energy. Black liquor, bark and forest residues cover more than one fifth of domestic energy use. Production plants can operate imperfectly, and a variety of gas and particle emissions and tar are produced at the same time, which can lead to deposit formation and corrosion. The cause of these problems is often an imbalance in the process: certain compounds become enriched in the process and super-equilibrium states are formed. This doctoral thesis presents a new computational method that can describe the super-equilibrium state, the most important chemical reactions, the heat production of the process and the state variables simultaneously. The method is based on a unique constrained free energy method developed at VTT. This so-called CFE method has previously been used in the paper, metal and chemical industries. The bioenergy applications demonstrated in the thesis are a new field of use for the method. The study showed that the computational method is well suited to high-temperature energy processes. Super-equilibrium states can arise in these processes, and the chemical system can be defined with a few constraints. Typical applications are the combustion of biomass and black liquor, the gasification of biomass and the formation of nitrogen oxides. Different ways of defining super-equilibrium states were also presented in the thesis: empirical constants, empirical rate expressions or reaction mechanisms can be used. The results of the thesis can be used in the future in process design and in the study of new technical solutions for gasification, combustion technology and biofuels. The presented method is a good alternative to traditional mechanistic and phenomenological models and combines the best parts of both.
---------------------------------------------------------------
Bioenergy is an important part of the current and future domestic energy mix. Black liquor, bark and forest residues cover more than one fifth of domestic energy consumption. However, production plants do not always operate perfectly, and their processes generate various gas and particle emissions, tars, and deposits and corrosion that wear down process equipment. The cause of these problems is often a non-equilibrium state in the process: certain compounds become enriched in the process and form super-equilibrium states. In the thesis, a new computational method was developed that can describe these super-equilibrium states, the most important chemical reactions related to them, the heat production of the process and the state variables simultaneously. The method is based on a unique constrained free energy method developed at VTT. This so-called CFE method has previously been applied successfully in, among others, the paper, metal and chemical industries. The bioenergy applications presented in the thesis are a new application area for the method. The work showed that the method is well suited to high-temperature energy technology processes in which the number of factors constraining the chemical system was limited and a super-equilibrium state could therefore form during the process. Typical applications are the combustion of biomass and black liquor, biomass gasification and nitrogen oxide emissions. During the work, different ways of defining the factors that constrain the formation of super-equilibria were also evaluated. The constraints could be set on the basis of industrial measurements, by using empirical models, or on the basis of mechanistic reaction kinetics. In the future, the results of the thesis can be used in process design and in studying new technical solutions in gasification and combustion technologies as well as in biofuel research. The developed method offers a good alternative to traditional mechanistic and phenomenological models, combining the best features of both.
Abstract:
The distribution of psychiatric disorders and of chronic medical illnesses was studied in a population-based sample to determine whether these conditions co-occur in the same individual. A representative sample (N = 1464) of adults living in households was assessed by the Composite International Diagnostic Interview, version 1.1, as part of the São Paulo Epidemiological Catchment Area Study. The association of sociodemographic variables and psychological symptoms with medical illness multimorbidity (8 lifetime somatic conditions) and psychiatric multimorbidity (15 lifetime psychiatric disorders) was determined by negative binomial regression. A total of 1785 chronic medical conditions and 1163 psychiatric conditions were detected in the population, concentrated in 34.1% and 20% of respondents, respectively. Subjects reporting more psychiatric disorders had more medical illnesses. Characteristics such as age range (35-59 years, risk ratio (RR) = 1.3, and more than 60 years, RR = 1.7), being separated (RR = 1.2), being a student (protective effect, RR = 0.7), being of low educational level (RR = 1.2) and being psychologically distressed (RR = 1.1) were determinants of medical conditions. Age (35-59 years, RR = 1.2, and more than 60 years, RR = 0.5), being retired (RR = 2.5), and being psychologically distressed (females, RR = 1.5, and males, RR = 1.4) were determinants of psychiatric disorders. In conclusion, psychological distress and some sociodemographic features such as age, marital status, occupational status, educational level, and gender are associated with psychiatric and medical multimorbidity. The distribution of both types of morbidity suggests the need to integrate mental health into general clinical settings.
Abstract:
The first example of a [5+2] cycloaddition reaction wherein the olefin of the vinylcyclopropyl moiety is constrained in a carbocycle was explored, and possible reasons for the lack of reactivity of the substrate were studied. A simple model substrate was synthesized and subjected to cycloaddition conditions to determine whether the lack of reactivity was related to the complexity of the substrate, or whether the lack of “conjugative character” of the cyclopropyl ring with respect to the olefin is responsible. A more complex bicyclic substrate possessing an angular methyl group at the ring junction was also synthesized and explored, with evidence supporting the current theory of deconjugation of the cyclopropyl moiety.
Abstract:
The goal of most clustering algorithms is to find the optimal number of clusters (i.e. the smallest number of clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of the Cramér's V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the pentapeptide Met-enkephalin (MET) and the 34 amino acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a huge impact on the reproducibility. Analysis of MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps because toroidal maps do not have an edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and that the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classifier for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs) which find similar partitionings to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions.
The χ^2 values could relate the different SOM partitionings to each other.
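How two independently trained maps can be compared with Cramér's V is sketched below. This is the standard definition of the statistic computed from the contingency table of two cluster labellings; the abstract does not give the exact procedure, so treating each map as a flat list of cluster labels is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter

def cramers_v(labels_a, labels_b):
    """Return (chi2, V) for two cluster labellings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    row = Counter(labels_a)                   # cluster sizes in labelling A
    col = Counter(labels_b)                   # cluster sizes in labelling B
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        return chi2, 0.0  # a single cluster carries no association
    return chi2, math.sqrt(chi2 / (n * k))
```

A value of 1 means the two maps produce the same partitioning up to a renaming of clusters, while a value near 0 means the cluster memberships are unrelated. The per-cell terms of the χ^2 sum indicate which cluster on one map corresponds to which cluster on the other.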
Abstract:
We study the problem of provision and cost-sharing of a public good in large economies where exclusion, complete or partial, is possible. We search for incentive-constrained efficient allocation rules that display fairness properties. Population monotonicity says that an increase in population should not be detrimental to anyone. Demand monotonicity states that an increase in the demand for the public good (in the sense of a first-order stochastic dominance shift in the distribution of preferences) should not be detrimental to any agent whose preferences remain unchanged. Under suitable domain restrictions, there exists a unique incentive-constrained efficient and demand-monotonic allocation rule: the so-called serial rule. In the binary public good case, the serial rule is also the only incentive-constrained efficient and population-monotonic rule.