925 resultados para Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Resumo:
In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The active learning algorithm proposed searches the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to current pruning's uncertainty, sampling is focused on most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried and sampling is focused on division of clusters showing mixed labels. The model is tested on a VHR image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.
Resumo:
Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) has been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method which clusterizes and reduces the dimensionality of input EEMs without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples, as well as to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under the influence of different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. On the other hand, river samples collected under flash flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A remarkable strength of this methodology was that outlier samples appeared naturally integrated in the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
Resumo:
Weaver syndrome, first described in 1974, is characterized by tall stature, a typical facial appearance, and variable intellectual disability. In 2011, mutations in the histone methyltransferase, EZH2, were shown to cause Weaver syndrome. To date, we have identified 48 individuals with EZH2 mutations. The mutations were primarily missense mutations occurring throughout the gene, with some clustering in the SET domain (12/48). Truncating mutations were uncommon (4/48) and only identified in the final exon, after the SET domain. Through analyses of clinical data and facial photographs of EZH2 mutation-positive individuals, we have shown that the facial features can be subtle and the clinical diagnosis of Weaver syndrome is thus challenging, especially in older individuals. However, tall stature is very common, reported in >90% of affected individuals. Intellectual disability is also common, present in ~80%, but is highly variable and frequently mild. Additional clinical features which may help in stratifying individuals to EZH2 mutation testing include camptodactyly, soft, doughy skin, umbilical hernia, and a low, hoarse cry. Considerable phenotypic overlap between Sotos and Weaver syndromes is also evident. The identification of an EZH2 mutation can therefore provide an objective means of confirming a subtle presentation of Weaver syndrome and/or distinguishing Weaver and Sotos syndromes. As mutation testing becomes increasingly accessible and larger numbers of EZH2 mutation-positive individuals are identified, knowledge of the clinical spectrum and prognostic implications of EZH2 mutations should improve.
Resumo:
Background: Current advances in genomics, proteomics and other areas of molecular biology make the identification and reconstruction of novel pathways an emerging area of great interest. One such class of pathways is involved in the biogenesis of Iron-Sulfur Clusters (ISC). Results: Our goal is the development of a new approach based on the use and combination of mathematical, theoretical and computational methods to identify the topology of a target network. In this approach, mathematical models play a central role for the evaluation of the alternative network structures that arise from literature data-mining, phylogenetic profiling, structural methods, and human curation. As a test case, we reconstruct the topology of the reaction and regulatory network for the mitochondrial ISC biogenesis pathway in S. cerevisiae. Predictions regarding how proteins act in ISC biogenesis are validated by comparison with published experimental results. For example, the predicted role of Arh1 and Yah1 and some of the interactions we predict for Grx5 both matches experimental evidence. A putative role for frataxin in directly regulating mitochondrial iron import is discarded from our analysis, which agrees with also published experimental results. Additionally, we propose a number of experiments for testing other predictions and further improve the identification of the network structure. Conclusion: We propose and apply an iterative in silico procedure for predictive reconstruction of the network topology of metabolic pathways. The procedure combines structural bioinformatics tools and mathematical modeling techniques that allow the reconstruction of biochemical networks. Using the Iron Sulfur cluster biogenesis in S. cerevisiae as a test case we indicate how this procedure can be used to analyze and validate the network model against experimental results. Critical evaluation of the obtained results through this procedure allows devising new wet lab experiments to confirm its predictions or provide alternative explanations for further improving the models.
A priori parameterisation of the CERES soil-crop models and tests against several European data sets
Resumo:
Mechanistic soil-crop models have become indispensable tools to investigate the effect of management practices on the productivity or environmental impacts of arable crops. Ideally these models may claim to be universally applicable because they simulate the major processes governing the fate of inputs such as fertiliser nitrogen or pesticides. However, because they deal with complex systems and uncertain phenomena, site-specific calibration is usually a prerequisite to ensure their predictions are realistic. This statement implies that some experimental knowledge on the system to be simulated should be available prior to any modelling attempt, and raises a tremendous limitation to practical applications of models. Because the demand for more general simulation results is high, modellers have nevertheless taken the bold step of extrapolating a model tested within a limited sample of real conditions to a much larger domain. While methodological questions are often disregarded in this extrapolation process, they are specifically addressed in this paper, and in particular the issue of models a priori parameterisation. We thus implemented and tested a standard procedure to parameterize the soil components of a modified version of the CERES models. The procedure converts routinely-available soil properties into functional characteristics by means of pedo-transfer functions. The resulting predictions of soil water and nitrogen dynamics, as well as crop biomass, nitrogen content and leaf area index were compared to observations from trials conducted in five locations across Europe (southern Italy, northern Spain, northern France and northern Germany). In three cases, the model’s performance was judged acceptable when compared to experimental errors on the measurements, based on a test of the model’s root mean squared error (RMSE). Significant deviations between observations and model outputs were however noted in all sites, and could be ascribed to various model routines. In decreasing importance, these were: water balance, the turnover of soil organic matter, and crop N uptake. A better match to field observations could therefore be achieved by visually adjusting related parameters, such as field-capacity water content or the size of soil microbial biomass. As a result, model predictions fell within the measurement errors in all sites for most variables, and the model’s RMSE was within the range of published values for similar tests. We conclude that the proposed a priori method yields acceptable simulations with only a 50% probability, a figure which may be greatly increased through a posteriori calibration. Modellers should thus exercise caution when extrapolating their models to a large sample of pedo-climatic conditions for which they have only limited information.
Resumo:
PLFC is a first-order possibilistic logic dealing with fuzzy constants and fuzzily restricted quantifiers. The refutation proof method in PLFC is mainly based on a generalized resolution rule which allows an implicit graded unification among fuzzy constants. However, unification for precise object constants is classical. In order to use PLFC for similarity-based reasoning, in this paper we extend a Horn-rule sublogic of PLFC with similarity-based unification of object constants. The Horn-rule sublogic of PLFC we consider deals only with disjunctive fuzzy constants and it is equipped with a simple and efficient version of PLFC proof method. At the semantic level, it is extended by equipping each sort with a fuzzy similarity relation, and at the syntactic level, by fuzzily “enlarging” each non-fuzzy object constant in the antecedent of a Horn-rule by means of a fuzzy similarity relation.
Resumo:
BACKGROUND: Worldwide data for cancer survival are scarce. We aimed to initiate worldwide surveillance of cancer survival by central analysis of population-based registry data, as a metric of the effectiveness of health systems, and to inform global policy on cancer control. METHODS: Individual tumour records were submitted by 279 population-based cancer registries in 67 countries for 25·7 million adults (age 15-99 years) and 75 000 children (age 0-14 years) diagnosed with cancer during 1995-2009 and followed up to Dec 31, 2009, or later. We looked at cancers of the stomach, colon, rectum, liver, lung, breast (women), cervix, ovary, and prostate in adults, and adult and childhood leukaemia. Standardised quality control procedures were applied; errors were corrected by the registry concerned. We estimated 5-year net survival, adjusted for background mortality in every country or region by age (single year), sex, and calendar year, and by race or ethnic origin in some countries. Estimates were age-standardised with the International Cancer Survival Standard weights. FINDINGS: 5-year survival from colon, rectal, and breast cancers has increased steadily in most developed countries. For patients diagnosed during 2005-09, survival for colon and rectal cancer reached 60% or more in 22 countries around the world; for breast cancer, 5-year survival rose to 85% or higher in 17 countries worldwide. Liver and lung cancer remain lethal in all nations: for both cancers, 5-year survival is below 20% everywhere in Europe, in the range 15-19% in North America, and as low as 7-9% in Mongolia and Thailand. Striking rises in 5-year survival from prostate cancer have occurred in many countries: survival rose by 10-20% between 1995-99 and 2005-09 in 22 countries in South America, Asia, and Europe, but survival still varies widely around the world, from less than 60% in Bulgaria and Thailand to 95% or more in Brazil, Puerto Rico, and the USA. For cervical cancer, national estimates of 5-year survival range from less than 50% to more than 70%; regional variations are much wider, and improvements between 1995-99 and 2005-09 have generally been slight. For women diagnosed with ovarian cancer in 2005-09, 5-year survival was 40% or higher only in Ecuador, the USA, and 17 countries in Asia and Europe. 5-year survival for stomach cancer in 2005-09 was high (54-58%) in Japan and South Korea, compared with less than 40% in other countries. By contrast, 5-year survival from adult leukaemia in Japan and South Korea (18-23%) is lower than in most other countries. 5-year survival from childhood acute lymphoblastic leukaemia is less than 60% in several countries, but as high as 90% in Canada and four European countries, which suggests major deficiencies in the management of a largely curable disease. INTERPRETATION: International comparison of survival trends reveals very wide differences that are likely to be attributable to differences in access to early diagnosis and optimum treatment. Continuous worldwide surveillance of cancer survival should become an indispensable source of information for cancer patients and researchers and a stimulus for politicians to improve health policy and health-care systems. FUNDING: Canadian Partnership Against Cancer (Toronto, Canada), Cancer Focus Northern Ireland (Belfast, UK), Cancer Institute New South Wales (Sydney, Australia), Cancer Research UK (London, UK), Centers for Disease Control and Prevention (Atlanta, GA, USA), Swiss Re (London, UK), Swiss Cancer Research foundation (Bern, Switzerland), Swiss Cancer League (Bern, Switzerland), and University of Kentucky (Lexington, KY, USA).
Resumo:
OBJECTIVE: In contrast to conventional (CONV) neuromuscular electrical stimulation (NMES), the use of "wide-pulse, high-frequencies" (WPHF) can generate higher forces than expected by the direct activation of motor axons alone. We aimed at investigating the occurrence, magnitude, variability and underlying neuromuscular mechanisms of these "Extra Forces" (EF). METHODS: Electrically-evoked isometric plantar flexion force was recorded in 42 healthy subjects. Additionally, twitch potentiation, H-reflex and M-wave responses were assessed in 13 participants. CONV (25Hz, 0.05ms) and WPHF (100Hz, 1ms) NMES consisted of five stimulation trains (20s on-90s off). RESULTS: K-means clustering analysis disclosed a responder rate of almost 60%. Within this group of responders, force significantly increased from 4% to 16% of the maximal voluntary contraction force and H-reflexes were depressed after WPHF NMES. In contrast, non-responders showed neither EF nor H-reflex depression. Twitch potentiation and resting EMG data were similar between groups. Interestingly, a large inter- and intrasubject variability of EF was observed. CONCLUSION: The responder percentage was overestimated in previous studies. SIGNIFICANCE: This study proposes a novel methodological framework for unraveling the neurophysiological mechanisms involved in EF and provides further evidence for a central contribution to EF in responders.
Resumo:
Cerebral, ocular, dental, auricular, skeletal anomalies (CODAS) syndrome (MIM 600373) was first described and named by Shehib et al, in 1991 in a single patient. The anomalies referred to in the acronym are as follows: cerebral-developmental delay, ocular-cataracts, dental-aberrant cusp morphology and delayed eruption, auricular-malformations of the external ear, and skeletal-spondyloepiphyseal dysplasia. This distinctive constellation of anatomical findings should allow easy recognition but despite this only four apparently sporadic patients have been reported in the last 20 years indicating that the full phenotype is indeed very rare with perhaps milder or a typical presentations that are allelic but without sufficient phenotypic resemblance to permit clinical diagnosis. We performed exome sequencing in three patients (an isolated case and a brother and sister sib pair) with classical features of CODAS. Sanger sequencing was used to confirm results as well as for mutation discovery in a further four unrelated patients ascertained via their skeletal features. Compound heterozygous or homozygous mutations in LONP1 were found in all (8 separate mutations; 6 missense, 1 nonsense, 1 small in-frame deletion) thus establishing the genetic basis of CODAS and the pattern of inheritance (autosomal recessive). LONP1 encodes an enzyme of bacterial ancestry that participates in protein turnover within the mitochondrial matrix. The mutations cluster at the ATP-binding and proteolytic domains of the enzyme. Biallelic inheritance and clustering of mutations confirm dysfunction of LONP1 activity as the molecular basis of CODAS but the pathogenesis remains to be explored.
Resumo:
Durant les deux derniers siècles avant notre ère, des objets de type italique, c'est-à-dire caractérisant la culture matérielle de l'Italie tardo-républicaine apparaissent progressivement en Gaule. L'identification de ces objets du quotidien et leur analyse typologique et contextuelle permettent une approche renouvelée du phénomène de romanisation de la Gaule. L'objectif de cette thèse est de mettre en exergue les modalités chronologiques, spatiales et culturelles de la diffusion de ce type de mobilier. La confrontation des résultats avec les données issues des études céramologiques et architecturales permet de brosser un tableau affiné du processus d'acculturation. La nature des sites, ainsi que la diversité des types d'objets considérés permettent de souligner la variété des comportements des populations locales face à la réception de ces mobiliers exogènes. Que ce soit dans le commerce ou en intégrant l'armée romaine, les élites locales ont joué un rôle majeur au sein des interactions avec l'Italie, de la diffusion de ces objets et de l'intégration des nouveaux modèles italiques. -- For the last two centuries before our era, italic type objects, which means objects that are characterizing the material culture of Tardo-republican Italia, are progressively appearing in Gaul. The identification of these everyday objects and their typological and contextual analysis allow a renewed approach of the Gaul romanization phenomenon. The objective of this thesis is to highlight, the chronological, spatial and cultural modalities of the diffusion of such furniture. The comparison of the results with the data provided by the ceramological and architectural studies allows to improve the description of the acculturation process.The behavioral diversity towards the reception of exogenous furniture is underlined by the nature of the sites and the diversity of the considered objects. Local elites, whether by participating in commercial exchanges or by joining the army, have played a major role in the interactions with Italia, from the diffusion of these objects to the integration of the new italic models.
Resumo:
Flood simulation studies use spatial-temporal rainfall data input into distributed hydrological models. A correct description of rainfall in space and in time contributes to improvements on hydrological modelling and design. This work is focused on the analysis of 2-D convective structures (rain cells), whose contribution is especially significant in most flood events. The objective of this paper is to provide statistical descriptors and distribution functions for convective structure characteristics of precipitation systems producing floods in Catalonia (NE Spain). To achieve this purpose heavy rainfall events recorded between 1996 and 2000 have been analysed. By means of weather radar, and applying 2-D radar algorithms a distinction between convective and stratiform precipitation is made. These data are introduced and analyzed with a GIS. In a first step different groups of connected pixels with convective precipitation are identified. Only convective structures with an area greater than 32 km2 are selected. Then, geometric characteristics (area, perimeter, orientation and dimensions of the ellipse), and rainfall statistics (maximum, mean, minimum, range, standard deviation, and sum) of these structures are obtained and stored in a database. Finally, descriptive statistics for selected characteristics are calculated and statistical distributions are fitted to the observed frequency distributions. Statistical analyses reveal that the Generalized Pareto distribution for the area and the Generalized Extreme Value distribution for the perimeter, dimensions, orientation and mean areal precipitation are the statistical distributions that best fit the observed ones of these parameters. The statistical descriptors and the probability distribution functions obtained are of direct use as an input in spatial rainfall generators.
Resumo:
Dissolved organic matter (DOM) is a complex mixture of organic compounds, ubiquitous in marine and freshwater systems. Fluorescence spectroscopy, by means of Excitation-Emission Matrices (EEM), has become an indispensable tool to study DOM sources, transport and fate in aquatic ecosystems. However the statistical treatment of large and heterogeneous EEM data sets still represents an important challenge for biogeochemists. Recently, Self-Organising Maps (SOM) has been proposed as a tool to explore patterns in large EEM data sets. SOM is a pattern recognition method which clusterizes and reduces the dimensionality of input EEMs without relying on any assumption about the data structure. In this paper, we show how SOM, coupled with a correlation analysis of the component planes, can be used both to explore patterns among samples, as well as to identify individual fluorescence components. We analysed a large and heterogeneous EEM data set, including samples from a river catchment collected under a range of hydrological conditions, along a 60-km downstream gradient, and under the influence of different degrees of anthropogenic impact. According to our results, chemical industry effluents appeared to have unique and distinctive spectral characteristics. On the other hand, river samples collected under flash flood conditions showed homogeneous EEM shapes. The correlation analysis of the component planes suggested the presence of four fluorescence components, consistent with DOM components previously described in the literature. A remarkable strength of this methodology was that outlier samples appeared naturally integrated in the analysis. We conclude that SOM coupled with a correlation analysis procedure is a promising tool for studying large and heterogeneous EEM data sets.
Resumo:
We present molecular dynamics (MD) simulations results for dense fluids of ultrasoft, fully penetrable particles. These are a binary mixture and a polydisperse system of particles interacting via the generalized exponential model, which is known to yield cluster crystal phases for the corresponding monodisperse systems. Because of the dispersity in the particle size, the systems investigated in this work do not crystallize and form disordered cluster phases. The clusteringtransition appears as a smooth crossover to a regime in which particles are mostly located in clusters, isolated particles being infrequent. The analysis of the internal cluster structure reveals microsegregation of the big and small particles, with a strong homo-coordination in the binary mixture. Upon further lowering the temperature below the clusteringtransition, the motion of the clusters" centers-of-mass slows down dramatically, giving way to a cluster glass transition. In the cluster glass, the diffusivities remain finite and display an activated temperature dependence, indicating that relaxation in the cluster glass occurs via particle hopping in a nearly arrested matrix of clusters. Finally we discuss the influence of the microscopic dynamics on the transport properties by comparing the MD results with Monte Carlo simulations.
Resumo:
Several clinical studies have reported that EEG synchrony is affected by Alzheimer’s disease (AD). In this paper a frequency band analysis of AD EEG signals is presented, with the aim of improving the diagnosis of AD using EEG signals. In this paper, multiple synchrony measures are assessed through statistical tests (Mann–Whitney U test), including correlation, phase synchrony and Granger causality measures. Moreover, linear discriminant analysis (LDA) is conducted with those synchrony measures as features. For the data set at hand, the frequency range (5-6Hz) yields the best accuracy for diagnosing AD, which lies within the classical theta band (4-8Hz). The corresponding classification error is 4.88% for directed transfer function (DTF) Granger causality measure. Interestingly, results show that EEG of AD patients is more synchronous than in healthy subjects within the optimized range 5-6Hz, which is in sharp contrast with the loss of synchrony in AD EEG reported in many earlier studies. This new finding may provide new insights about the neurophysiology of AD. Additional testing on larger AD datasets is required to verify the effectiveness of the proposed approach.