856 results for Data Driven Clustering
Abstract:
General clustering deals with weighted objects and fuzzy memberships. We investigate the group- or object-aggregation-invariance properties possessed by the relevant functionals (effective number of groups or objects, centroids, dispersion, mutual object-group information, etc.). The classical squared Euclidean case can be generalized to non-Euclidean distances, as well as to non-linear transformations of the memberships, yielding the c-means clustering algorithm as well as two presumably new procedures, convex and pairwise convex clustering. Cluster stability and the aggregation-invariance of the optimal memberships associated with the various clustering schemes are examined as well.
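As a concrete reference point for the c-means algorithm mentioned above, here is a minimal sketch of the classical fuzzy c-means iteration (plain squared-Euclidean case with fuzzifier m), not the generalized aggregation-invariant functionals studied in the paper:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Classical fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(c), size=n)            # n x c memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m                                   # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))           # inverse-distance weighting
        U /= U.sum(axis=1, keepdims=True)            # renormalise rows
    return U, centroids
```

With m → 1 the memberships harden toward crisp k-means assignments; larger m yields fuzzier partitions.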
Abstract:
Recent single-cell studies in monkeys (Romo et al., 2004) show that the activity of neurons in the ventral premotor cortex covaries with the animal's decisions in a perceptual comparison task regarding the frequency of vibrotactile events. The firing rate response of these neurons was dependent only on the frequency differences between the two applied vibrations, the sign of that difference being the determining factor for correct task performance. We present a biophysically realistic neurodynamical model that can account for the most relevant characteristics of this decision-making-related neural activity. One of the nontrivial predictions of this model is that Weber's law will underlie the perceptual discrimination behavior. We confirmed this prediction in behavioral tests of vibrotactile discrimination in humans and propose a computational explanation of perceptual discrimination that accounts naturally for the emergence of Weber's law. We conclude that the neurodynamical mechanisms and computational principles underlying the decision-making processes in this perceptual discrimination task are consistent with a fluctuation-driven scenario in a multistable regime.
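Weber's law, the model's key behavioral prediction, can be illustrated with a toy simulation: if perceptual noise scales linearly with stimulus magnitude, discrimination accuracy depends on the ratio of the two frequencies rather than on their raw difference. This sketch only illustrates the law; it is not the biophysical network model of the paper, and the Weber fraction and trial count are arbitrary choices:

```python
import numpy as np

def discrimination_accuracy(f1, f2, weber_fraction=0.1, trials=100_000, seed=0):
    """Simulate a two-stimulus comparison where each percept carries Gaussian
    noise whose standard deviation grows linearly with stimulus magnitude
    (Weber scaling). Accuracy then depends on the ratio f2/f1, not on f2-f1."""
    rng = np.random.default_rng(seed)
    p1 = f1 + rng.normal(0.0, weber_fraction * f1, trials)
    p2 = f2 + rng.normal(0.0, weber_fraction * f2, trials)
    correct = (p2 > p1) if f2 > f1 else (p1 > p2)
    return correct.mean()
```

Pairs with the same frequency ratio (e.g. 20 vs 24 Hz and 30 vs 36 Hz) yield the same accuracy, which is the signature of Weber's law.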
Abstract:
Zonal management in vineyards requires the prior delineation of stable yield zones within the parcel. Among the different methodologies used for zone delineation, cluster analysis of yield data from several years is one of the possibilities cited in the scientific literature. However, reasonable doubts remain concerning the cluster algorithm to be used and the number of zones that should be delineated within a field. In this paper, two different cluster algorithms (k-means and fuzzy c-means) have been compared using grape yield data from three successive years (2002, 2003 and 2004) for a ‘Pinot Noir’ vineyard parcel. The final choice of the most suitable algorithm was linked to obtaining a stable pattern of spatial yield distribution and to allowing for the delineation of compact, average-sized areas. The general recommendation is to use reclassified maps of two clusters or yield classes (low yield zone and high yield zone); consequently, site-specific vineyard management should be based on the prior delineation of just two different zones or sub-parcels. The two tested algorithms are good options for this purpose. However, the fuzzy c-means algorithm allows for a better zoning of the parcel, forming more compact areas with more equilibrated zonal differences over time.
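The recommended two-zone delineation can be sketched with plain 2-cluster k-means over multi-year yield vectors (one row per grid cell, one column per year). This is an illustrative toy, not the authors' processing chain:

```python
import numpy as np

def two_zone_kmeans(yields, n_iter=50, seed=0):
    """Delineate low/high yield zones from multi-year yield vectors with
    plain 2-cluster k-means. Labels are ordered so zone 0 is the low-yield zone."""
    rng = np.random.default_rng(seed)
    centers = yields[rng.choice(len(yields), 2, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(yields[:, None, :] - centers[None], axis=2)
        labels = d.argmin(axis=1)                       # nearest-center assignment
        centers = np.array([yields[labels == k].mean(axis=0) for k in (0, 1)])
    if centers[0].mean() > centers[1].mean():           # order zones low -> high
        labels = 1 - labels
    return labels
```

Clustering the cells on their full multi-year vectors (rather than year by year) is what favours a yield pattern that is stable over time.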
Abstract:
In this paper we provide a formal account of the underapplication of vowel reduction to schwa in Majorcan Catalan loanwords and learned words. On the basis of a comparison of these data with those concerning productive derivation and verbal inflection, which show analogous patterns, we also explore a previously unacknowledged correlation between processes that exhibit a particular behaviour in the loanword phonology with respect to the native phonology of the language, processes that show lexical exceptions, and processes that underapply for morphological reasons. In light of the analysis of these same data, and taking the aforementioned correlation into account, we show that there may exist a natural diachronic relation between two kinds of Optimality Theory constraints which are commonly used but, in principle, mutually exclusive: positional faithfulness and contextual markedness constraints. Overall, phonological productivity proves to be crucial in three respects: first, as a context of the grammar, given that «underapplication» is systematically found in what we call the productive phonology of the dialect (including loanwords, learned words, productive derivation and verbal inflection); second, as a trigger or blocker of processes, in that the productivity or lack thereof of a specific process or constraint in the language explains whether it is challenged in any of the depicted situations; and, third, as a guiding principle which can explain the transition from the historical to the synchronic phonology of a linguistic variety.
Abstract:
Low-copy-number molecules are involved in many functions in cells. The intrinsic fluctuations of these numbers can enable stochastic switching between multiple steady states, inducing phenotypic variability. Herein we present a theoretical and computational study, based on Master Equations and on Fokker-Planck and Langevin descriptions, of stochastic switching in a genetic circuit of autoactivation. We show that in this circuit the intrinsic fluctuations arising from low copy numbers, which are inherently state-dependent, drive asymmetric switching. These theoretical results are consistent with experimental data that have been reported for the bistable system of the galactose signaling network in yeast. Our study reveals that intrinsic fluctuations, while not required to describe bistability, are fundamental to understanding stochastic switching and the dynamical relative stability of multiple states.
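The state-dependent fluctuations discussed above can be reproduced with an exact stochastic (Gillespie) simulation of a one-variable autoactivation circuit: a birth rate with Hill-type positive feedback plus linear degradation. All parameter values here are illustrative assumptions, not those of the yeast galactose network:

```python
import numpy as np

def gillespie_autoactivation(x0=0, t_end=500.0, a0=0.5, a=20.0, K=15.0,
                             n=4, d=1.0, seed=0):
    """Exact stochastic simulation of an autoactivating gene:
    birth rate a0 + a*x^n/(K^n + x^n), death rate d*x.
    Noise amplitude depends on the state x, as in the Master Equation view."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while t < t_end:
        birth = a0 + a * x**n / (K**n + x**n)
        death = d * x
        total = birth + death
        t += rng.exponential(1.0 / total)            # waiting time to next event
        x += 1 if rng.random() * total < birth else -1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)
```

Long runs dwell near the low state (~a0/d) and, after a rare upward fluctuation past the Hill threshold K, near the high state (~(a0+a)/d), with state-dependent noise making the two escape rates asymmetric.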
Abstract:
PURPOSE: According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics, and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pairwise Kolmogorov distances between the IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. In particular, IRC differences in metamorphic rocks such as gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional differences of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variation in IRC data with random forests. Additionally, variable importance as evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of the radon properties of rock types. This study provides an important element for radon risk communication.
Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
Abstract:
The objective of this thesis is to provide a business model framework that connects customer value to firm resources and explains the change logic of the business model. With strategic supply management, and especially dynamic value network management, as its scope, the dissertation is based on basic economic theories, transaction cost economics and the resource-based view. The main research question is how changing customer values should be taken into account when planning business in a networked environment. The main question is divided into sub-questions that form the basic research problems for the separate case studies presented in the five Publications. This research adopts the case study strategy, and the constructive research approach within it. The material consists of data from several Delphi panels and expert workshops, software pilot documents, company financial statements and information on investor relations on the companies’ web sites. The cases used in this study are a mobile multi-player game value network, smart phone and “Skype mobile” services, the business models of AOL, eBay, Google, Amazon and a telecom operator, a virtual city portal business system and a multi-play offering. The main contribution of this dissertation is bridging the gap between firm resources and customer value. This has been done by theorizing the business model concept and connecting it to both the resource-based view and customer value. This thesis contributes to the resource-based view, which deals with customer value and the firm resources needed to deliver that value, but leaves a gap in explaining how changes in customer value should be connected to changes in key resources. This dissertation also provides tools and processes for analyzing the customer value preferences of ICT services, constructing and analyzing business models, business concept innovation and conducting resource analysis.
Abstract:
Liver is unique in its capacity to regenerate in response to injury or tissue loss. Hepatocytes and other liver cells are able to proliferate and repopulate the liver. However, when this response is impaired, the contribution of hepatic progenitors becomes very relevant. Here, we present an update of recent studies on growth factors and cytokine-driven intracellular pathways that govern liver stem/progenitor cell expansion and differentiation, and the relevance of these signals in liver development, regeneration and carcinogenesis. Tyrosine kinase receptor signaling, in particular c-Met, epidermal growth factor receptors or fibroblast growth factor receptors, contributes to proliferation, survival and differentiation of liver stem/progenitor cells. Several lines of evidence suggest a dual role for the transforming growth factor (TGF)-β signaling pathway in liver stemness and differentiation. On the one hand, TGF-β mediates progression of differentiation from a progenitor stage, but on the other hand, it contributes to the expansion of liver stem cells. Hedgehog family ligands are necessary to promote hepatoblast proliferation but need to be shut off to permit subsequent hepatoblast differentiation. Along the same lines, the Wnt family and β-catenin/T-cell factor pathway is clearly involved in the maintenance of the liver stemness phenotype, and its repression is necessary for liver differentiation during development. Collectively, data indicate that liver stem/progenitor cells follow their own rules and regulations. The same signals that are essential for their activation, expansion and differentiation are good candidates to contribute, under adequate conditions, to the paradigm of transformation from a pro-regenerative to a pro-tumorigenic role. From a clinical perspective, this is a fundamental issue for liver stem/progenitor cell-based therapies.
Abstract:
In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of processes ruling biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate Microcosmus squamiger (Ascidiacea) using microsatellite loci, mitochondrial DNA sequence data and 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions - the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records - first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range. 
The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.
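Rejection ABC, at the core of the analyses above, can be sketched in a few lines: draw parameters from a prior, simulate a summary statistic, and keep the draws whose statistic is closest to the observed value. The toy below infers an admixture proportion from a single allele frequency; the model and all numbers are invented for illustration and are far simpler than the paper's multi-population analyses:

```python
import numpy as np

def rejection_abc(observed_stat, simulate, prior_draw, n_sims=20_000,
                  keep=0.01, seed=0):
    """Generic rejection ABC: sample parameters from the prior, simulate a
    summary statistic for each, and keep the fraction `keep` of draws whose
    statistic is closest to the observed one (an approximate posterior)."""
    rng = np.random.default_rng(seed)
    thetas = np.array([prior_draw(rng) for _ in range(n_sims)])
    stats = np.array([simulate(t, rng) for t in thetas])
    dist = np.abs(stats - observed_stat)
    cutoff = np.quantile(dist, keep)
    return thetas[dist <= cutoff]
```

In the toy usage below, two hypothetical source populations carry allele frequencies 0.1 and 0.9, so an introduced population with frequency 0.1 + 0.8*m pins down the admixture proportion m.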
Abstract:
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR (the European infrastructure for biological information), that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Abstract:
This thesis develops a comprehensive and flexible statistical framework for the analysis and detection of space, time and space-time clusters of environmental point data. The developed clustering methods were applied to both simulated datasets and real-world environmental phenomena; however, only the cases of forest fires in the Canton of Ticino (Switzerland) and in Portugal are expounded in this document. Normally, environmental phenomena can be modelled as stochastic point processes where each event, e.g. the forest fire ignition point, is characterised by its spatial location and occurrence in time. Additionally, information such as burned area, ignition causes, land use, topographic, climatic and meteorological features, etc., can also be used to characterise the studied phenomenon. Thereby, the space-time pattern characterisation represents a powerful tool to understand the distribution and behaviour of the events and their correlation with underlying processes, for instance, socio-economic, environmental and meteorological factors. Consequently, we propose a methodology based on the adaptation and application of statistical and fractal point process measures for both global (e.g. the Morisita Index, the Box-counting fractal method, the multifractal formalism and Ripley's K-function) and local (e.g. Scan Statistics) analysis. Many measures describing the space-time distribution of environmental phenomena have been proposed in a wide variety of disciplines; nevertheless, most of these measures are of global character and do not consider complex spatial constraints, high variability and the multivariate nature of the events.
Therefore, we propose a statistical framework that takes into account the complexities of the geographical space in which phenomena take place, by introducing the Validity Domain concept and carrying out clustering analyses on data with differently constrained geographical spaces, hence assessing the relative degree of clustering of the real distribution. Moreover, specifically for the forest fire case, this research proposes two new methodologies: one for defining and mapping the Wildland-Urban Interface (WUI), described as the interaction zone between burnable vegetation and anthropogenic infrastructures, and another for predicting fire ignition susceptibility. In this regard, the main objective of this Thesis was to carry out basic statistical/geospatial research with a strong applied component, to analyse and describe complex phenomena as well as to overcome unsolved methodological problems in the characterisation of space-time patterns, in particular forest fire occurrences. Thus, this Thesis provides a response to the increasing demand for environmental monitoring and management tools for the assessment of natural and anthropogenic hazards and risks, sustainable development, retrospective success analysis, etc. The major contributions of this work were presented at national and international conferences and published in 5 scientific journals. National and international collaborations were also established and successfully accomplished.
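Of the global measures listed above, the Morisita Index is the simplest to sketch: partition the study area into quadrats and compare the observed co-occurrence of points within quadrats to the random expectation (values near 1 indicate randomness, larger values clustering). Below is a minimal version on the unit square, not the thesis's Validity-Domain-constrained implementation:

```python
import numpy as np

def morisita_index(points, q):
    """Morisita index on a q x q quadrat grid over the unit square:
    I = Q * sum(n_i*(n_i-1)) / (N*(N-1)), ~1 for random points, >1 for clustered."""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=q, range=[[0, 1], [0, 1]])
    n = counts.ravel()
    N = n.sum()
    return q * q * (n * (n - 1)).sum() / (N * (N - 1))
```

Computing the index over a range of quadrat sizes q gives a scale-dependent clustering profile, which is how such global measures are typically read.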
Abstract:
The purpose of this thesis was to investigate creating and improving category purchasing visibility for corporate procurement by utilizing financial information. This thesis was part of the global category-driven spend analysis project of Konecranes Plc. While creating a general understanding of building category-driven corporate spend visibility, the IT architecture and the purchasing parameters needed for spend analysis were described. In the case part of the study, three manufacturing plants from the Konecranes Standard Lifting, Heavy Lifting and Services business areas were examined. This included investigating the operative IT system architecture and the processes needed for building corporate spend visibility. The key finding of this study was the identification of the processes needed for gathering purchasing data elements while creating corporate spend visibility in a fragmented source system environment. As an outcome of the study, a roadmap presenting further development areas was introduced for Konecranes.
Abstract:
Speaker diarization is the process of sorting speech according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example lectures, telephone, television and radio. Besides, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system in the meeting domain. The experimental part of this work indicates that the zero-crossing rate can be used effectively to break the audio stream into segments, and that adaptive Gaussian Models adequately fit short audio segments. Results show that 35 Gaussian Models and an average segment length of one second are optimum values for building a diarization system for the tested data. Uniting the segments uttered by the same speaker is done by bottom-up clustering with a new approach of categorizing the mixture weights.
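The zero-crossing-rate segmentation cue mentioned in the experimental part can be sketched as follows; the frame and hop sizes are arbitrary illustrative choices (25 ms and 10 ms at 16 kHz), not the thesis's settings:

```python
import numpy as np

def zero_crossing_rate(signal, frame_len=400, hop=160):
    """Per-frame zero-crossing rate: fraction of consecutive sample pairs
    whose signs differ. High-ZCR frames often mark unvoiced or noisy regions,
    which can serve as candidate segment boundaries."""
    rates = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        crossings = np.sum(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
        rates.append(crossings / (frame_len - 1))
    return np.array(rates)
```

A low-frequency voiced sound crosses zero rarely, while noise-like speech crosses on roughly half of the sample pairs, so thresholding the per-frame ZCR gives a cheap segmentation signal.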