856 results for Data Driven Clustering
Abstract:
The dependence of the resistivity on diameter of heavily doped self-seeded germanium nanowires was studied over the diameter range 40 nm down to 11 nm. The experimental data reveal an initial strong reduction of the resistivity with decreasing diameter. At about 20 nm a region of slowly varying resistivity emerges, with a peak feature around 14 nm. For diameters above 20 nm, the nanowires could be described by classical means. For smaller diameters a quantum-based approach was required; here we employed the 1D Kubo–Greenwood framework, which also revealed the dominant charge carriers to be heavy holes. In both regimes the theoretical results and the experimental data agree qualitatively well, assuming a spatial spreading of the free holes towards the nanowire centre upon diameter reduction.
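For reference, a commonly quoted general form of the zero-temperature DC Kubo–Greenwood conductivity is given below; the abstract does not spell out the specific 1D expression used in the study, so this textbook form is indicative only.

```latex
\sigma(E_F) \;=\; \frac{2\pi e^{2}\hbar}{\Omega}
\sum_{n,m}\bigl|\langle n|\hat{v}_{x}|m\rangle\bigr|^{2}\,
\delta(E_n - E_F)\,\delta(E_m - E_F)
```

Here Ω is the system volume, v̂_x the velocity operator along the transport direction and E_F the Fermi energy.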
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat and to share and consume information. Meanwhile, huge amounts of data are generated by these services (which can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, owing to the sheer scale of OSN data, it is difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated content. Specifically, it aims at addressing three problems related to social media users and content: (1) How does one organize the users and the content? (2) How does one summarize the textual content so that users do not have to read every post to capture the general idea? (3) How does one identify influential users in social media to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze users and user-generated content from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and the content. (3) It proposes multi-document summarization methods to extract core information from social network content. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
Abstract:
The dominant model of atmospheric circulation posits that hot air rises, creating horizontal winds. A second major driver has recently been proposed by Makarieva and Gorshkov in their biotic pump theory (BPT), which suggests that evapotranspiration from natural closed-canopy forests causes intense condensation, and hence winds from ocean to land. Critics of the BPT argue that air movement to fill the partial vacuum caused by condensation is always isotropic, and therefore causes no net air movement (Bunyard, 2015, hdl:11232/397). This paper explores the physics of water condensation under mild atmospheric conditions, within a purpose-designed square-section 4.8 m-tall closed-system structure. Two enclosed vertical columns are connected at top and bottom by two horizontal tunnels, around which 19.5 m³ of atmospheric air can circulate freely, allowing rotary airflows in either direction. This air can be cooled and/or warmed by refrigeration pipes and a heating mat, and changes in airflow, temperature, humidity and barometric pressure are measured in real time. The study investigates whether the "hot-air-rises" model or an implosive condensation model better explains the results of more than 100 experiments. The data show a highly significant correlation (R² > 0.96, p < 0.001) between observed airflows and partial-pressure changes from condensation. While the kinetic energy of the refrigerated air falls short of that required to bring about the observed airflows by a factor of at least 30, less than a tenth of the potential kinetic energy from condensation is shown to be sufficient. The assumption that condensation of water vapour is always isotropic is therefore incorrect. Condensation can be anisotropic, and in the laboratory it does cause sustained airflow.
Abstract:
Owing to their important roles in biogeochemical cycles, phytoplankton functional types (PFTs) have been the target of an increasing number of ocean color algorithms. Yet, none of the existing methods are based on phytoplankton carbon (C) biomass, which is a fundamental biogeochemical and ecological variable and the "unit of accounting" in Earth system models. We present a novel bio-optical algorithm to retrieve size-partitioned phytoplankton carbon from ocean color satellite data. The algorithm is based on existing methods to estimate particle volume from a power-law particle size distribution (PSD). Volume is converted to carbon concentration using a compilation of allometric relationships. We quantify absolute and fractional biomass in three PFTs based on size: picophytoplankton (0.5-2 µm in diameter), nanophytoplankton (2-20 µm) and microphytoplankton (20-50 µm). The mean spatial distributions of total phytoplankton C biomass and individual PFTs, derived from global SeaWiFS monthly ocean color data, are consistent with current understanding of oceanic ecosystems, i.e., oligotrophic regions are characterized by low biomass and dominance of picoplankton, whereas eutrophic regions have high biomass to which nanoplankton and microplankton contribute relatively larger fractions. Global climatological, spatially integrated phytoplankton carbon biomass standing stock estimates using our PSD-based approach yield ~0.25 Gt of C, consistent with analogous estimates from two other ocean color algorithms and several state-of-the-art Earth system models. Satisfactory in situ closure observed between PSD and POC measurements lends support to the theoretical basis of the PSD-based algorithm. Uncertainty budget analyses indicate that absolute carbon concentration uncertainties are driven by the PSD parameter N₀, which determines the particle number concentration to first order, while uncertainties in the PFTs' fractional contributions to total C biomass are mostly due to the allometric coefficients. The C algorithm presented here, which is not empirically constrained a priori, partitions biomass into size classes and improves on the assumptions of the other approaches. However, the range of phytoplankton C biomass spatial variability globally is larger than estimated by any other model considered here, which suggests that an empirical correction to the N₀ parameter is needed, based on PSD validation statistics. These corrected absolute carbon biomass concentrations validate well against in situ POC observations.
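As an illustration of the kind of PSD-based size partitioning described above, a power-law size distribution can be integrated over each size class and cell volume converted to carbon allometrically. This is a minimal sketch only; N0, the slope XI and the allometric coefficients A, B below are placeholder values, not the algorithm's fitted parameters.

```python
import numpy as np
from scipy.integrate import quad

# Placeholder parameters (assumed, not the paper's retrieved values)
N0 = 1.0e11   # particle number scaling at the reference diameter, m^-3 µm^-1
XI = 4.0      # power-law slope of the PSD
D_REF = 1.0   # reference diameter, µm
A, B = 0.216, 0.939  # illustrative allometry: pg C per cell as A * volume**B

def number_density(d):
    """Power-law particle size distribution n(D) = N0 * (D / D_REF)**(-XI)."""
    return N0 * (d / D_REF) ** (-XI)

def carbon_per_cell(d):
    """Allometric carbon content (pg C) of a spherical cell of diameter d (µm)."""
    volume = np.pi / 6.0 * d ** 3   # cell volume in µm^3
    return A * volume ** B

def size_class_carbon(d_min, d_max):
    """Integrate carbon concentration over a diameter range (pg C m^-3)."""
    integrand = lambda d: number_density(d) * carbon_per_cell(d)
    value, _ = quad(integrand, d_min, d_max)
    return value

# Pico- (0.5-2 µm), nano- (2-20 µm) and microphytoplankton (20-50 µm) classes
classes = {"pico": (0.5, 2.0), "nano": (2.0, 20.0), "micro": (20.0, 50.0)}
carbon = {name: size_class_carbon(*bounds) for name, bounds in classes.items()}
total = sum(carbon.values())
for name, c in carbon.items():
    print(f"{name}: {c:.3e} pg C m^-3 ({100 * c / total:.1f}% of total)")
```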
Abstract:
Upwelling intensity in the South China Sea has changed over glacial-interglacial cycles in response to orbital-scale changes in the East Asian Monsoon. Here, we evaluate new multi-proxy records of two sediment cores from the north-eastern South China Sea to uncover millennial-scale changes in winter monsoon-driven upwelling over glacial Terminations I and II. On the basis of U/Th-based speleothem chronology, we compare these changes with sediment records of summer monsoon-driven upwelling east of South Vietnam. Ocean upwelling is traced by reduced (UK'37-based) temperature and increased nutrient and productivity estimates of sea surface water (δ13C of planktic foraminifera, accumulation rates of alkenones, chlorins, and total organic carbon). Accordingly, strong winter upwelling occurred north-west of Luzon (Philippines) during late Marine Isotope Stage 6.2, the Heinrich (HS) and Greenland stadials (GS) HS-11, GS-26, GS-25, HS-1, and the Younger Dryas. During these stadials, summer upwelling decreased off South Vietnam and sea surface salinity reached a maximum, suggesting a drop in monsoon rains, concurrent with speleothem records of aridity in China. In harmony with a stadial-to-interstadial see-saw pattern, winter upwelling off Luzon was in turn weak during interstadials, in particular those of glacial Terminations I and II, when summer upwelling culminated east of South Vietnam. Most likely, this upwelling terminated widespread deep-water stratification, coeval with the deglacial rise in atmospheric CO2. Yet, a synchronous maximum in precipitation fostered estuarine overturning circulation in the South China Sea, in particular as long as the Borneo Strait was closed when sea level dropped below -40 m.
Abstract:
The Bering Sea is one of the most biologically productive regions in the marine system and plays a key role in regulating the flow of waters to the Arctic Ocean and into the subarctic North Pacific Ocean. Cores from Integrated Ocean Drilling Program (IODP) Expedition 323 to the Bering Sea provide the first opportunity to obtain reconstructions from the region that extend back to the Pliocene. Previous research at Bowers Ridge, south Bering Sea, has revealed stable levels of siliceous productivity over the onset of major Northern Hemisphere Glaciation (NHG) (circa 2.85-2.73 Ma). However, diatom silica isotope records of oxygen (δ18O_diatom) and silicon (δ30Si_diatom) presented here demonstrate that this interval was associated with a progressive increase in the supply of silicic acid to the region, superimposed on a shift to a more dynamic environment characterized by colder temperatures and increased sea ice. This concluded at 2.58 Ma with a sharp increase in diatom productivity, further increases in photic zone nutrient availability and a permanent shift to colder sea surface conditions. These transitions are suggested to reflect gradually more intense nutrient leakage from the subarctic northwest Pacific Ocean, with increases in productivity further aided by increased sea ice- and wind-driven mixing in the Bering Sea. In suggesting a linkage in biogeochemical cycling between the south Bering Sea and the subarctic northwest Pacific Ocean, mainly via the Kamchatka Strait, this work highlights the need to consider the interconnectivity of these two systems when future reconstructions are carried out in the region.
Abstract:
Visual cluster analysis provides valuable tools that help analysts understand large data sets in terms of representative clusters and the relationships between them. Often, the clusters found are to be understood in the context of associated categorical, numerical or textual metadata given for the data elements. While often not part of the clustering process itself, such metadata play an important role and need to be considered during interactive cluster exploration. Traditionally, linked views allow the analyst to relate (or, loosely speaking, correlate) clusters with metadata or other properties of the underlying cluster data. Manually inspecting the distribution of metadata for each cluster in a linked-view approach is tedious, especially for large data sets, where a large search problem arises. Fully interactive search for potentially useful or interesting cluster-to-metadata relationships can be a cumbersome and lengthy process. To remedy this problem, we propose a novel approach for guiding users in discovering interesting relationships between clusters and associated metadata. Its goal is to guide the analyst through the potentially huge search space. In our work we focus on metadata of categorical type, which can be summarized for a cluster in the form of a histogram. We start from a given visual cluster representation and compute measures of interestingness defined on the distribution of metadata categories for the clusters. These measures are used to automatically score and rank the clusters for potential interestingness regarding the distribution of categorical metadata. Identified interesting relationships are highlighted in the visual cluster representation for easy inspection by the user. We present a system implementing an encompassing, yet extensible, set of interestingness scores for categorical metadata, which can also be extended to numerical metadata. Appropriate visual representations are provided for showing the visual correlations, as well as the calculated ranking scores. Focusing on clusters of time series data, we test our approach on a large real-world data set of time-oriented scientific research data, demonstrating how specific interesting views are automatically identified and how the analyst is supported in discovering interesting and visually understandable relationships.
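A minimal sketch of one way such an interestingness score could be computed is given below. The measure (KL divergence of each cluster's category histogram from the global distribution) and the toy data are illustrative assumptions, not the scores implemented in the described system.

```python
import numpy as np
from collections import Counter

def histogram(labels, categories):
    """Normalized category histogram for a list of categorical labels."""
    counts = Counter(labels)
    probs = np.array([counts.get(c, 0) for c in categories], dtype=float)
    return probs / probs.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) with a small epsilon to avoid log(0)."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def rank_clusters(cluster_metadata):
    """cluster_metadata: dict mapping cluster id -> list of categorical labels.
    Clusters whose histogram deviates most from the global one rank first."""
    categories = sorted({c for labels in cluster_metadata.values() for c in labels})
    all_labels = [c for labels in cluster_metadata.values() for c in labels]
    global_hist = histogram(all_labels, categories)
    scores = {cid: kl_divergence(histogram(labels, categories), global_hist)
              for cid, labels in cluster_metadata.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy example: cluster "A" is dominated by one category and ranks as most interesting.
clusters = {"A": ["buoy"] * 9 + ["ship"],
            "B": ["buoy", "ship", "glider", "ship", "buoy", "glider"]}
for cid, score in rank_clusters(clusters):
    print(f"cluster {cid}: interestingness = {score:.3f}")
```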
Abstract:
Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine the optimum β in flexible beta clustering. A plot of cophenetic correlation against the original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
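A minimal sketch of the cophenetic-correlation check described above, using standard SciPy routines on synthetic count data rather than the Bristol Channel zooplankton dataset:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet

# Synthetic abundance matrix: 12 samples x 8 taxa (illustrative data only)
rng = np.random.default_rng(0)
abundances = rng.poisson(lam=5, size=(12, 8)).astype(float)

dissim = pdist(abundances, metric="braycurtis")   # inter-sample resemblances
tree = linkage(dissim, method="average")          # group-average (UPGMA) clustering
coph_corr, coph_dists = cophenet(tree, dissim)    # correlation + cophenetic distances

print(f"cophenetic correlation: {coph_corr:.3f}")
# A low correlation (a scattered plot of coph_dists against dissim) would indicate
# that the dendrogram is a poor summary of the full resemblance matrix.
```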
Abstract:
Anthropogenic climate change is causing unprecedented rapid responses in marine communities, with species across many different taxonomic groups showing faster shifts in biogeographic ranges than in any other ecosystem. Spatial and temporal trends for many marine species are difficult to quantify, however, due to the lack of long-term datasets across complete geographical distributions and the occurrence of small-scale variability from both natural and anthropogenic drivers. Understanding these changes requires a multidisciplinary approach to bring together patterns identified within long-term datasets and the processes driving those patterns using biologically relevant mechanistic information to accurately attribute cause and effect. This must include likely future biological responses, and detection of the underlying mechanisms in order to scale up from the organismal level to determine how communities and ecosystems are likely to respond across a range of future climate change scenarios. Using this multidisciplinary approach will improve the use of robust science to inform the development of fit-for-purpose policy to effectively manage marine environments in this rapidly changing world.
Abstract:
Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners' performance, comprehension and interaction in given educational scenarios. The specificities of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool, based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative for classifying learners. Furthermore, the monitoring activities created a strong basis for generating automatic feedback to learners on their course participation, relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who are at risk of failing the course, enabling tutors to take timely action.
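A minimal sketch of this kind of quality-metric comparison, using scikit-learn on synthetic features; only SSE and the silhouette coefficient are shown, since the Dunn and Xie-Beni indices are not provided by scikit-learn and would need custom code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic learner-interaction features: three well-separated groups (assumed data)
rng = np.random.default_rng(42)
features = np.vstack([rng.normal(loc, 0.5, size=(50, 3)) for loc in (0.0, 3.0, 6.0)])

# Compare candidate numbers of clusters with SSE (inertia) and silhouette
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    sse = model.inertia_                              # within-cluster sum of squares
    sil = silhouette_score(features, model.labels_)   # higher is better
    print(f"k={k}: SSE={sse:.1f}, silhouette={sil:.3f}")
```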
Abstract:
This paper reports on a study of a curricular intervention for pupils (aged 10-13 years) in the UK aimed at supporting critical engagement with science-based media reports. In particular, the study focused on core elements of knowledge, skills and attitudes, identified in previous studies, that characterize critical consumers of science presented as news. This was an empirical study based on classroom observation. Data included responses from individual pupils; in addition, video recordings of group activity and intentional conversations between pupils and teachers were scrutinised. Analysis focused on core tasks relating to different elements of critical reading. Pupils demonstrated a grasp of questioning and evaluating text; however, their capacity to translate this experience into a critical response to a media report with a science component was limited, both in assessing the credibility of the text and as an element of critical reading.
Abstract:
The pathogenesis of diffuse large B-cell lymphoma (DLBCL) remains partially unknown. Analysis of the B-cell receptor of the malignant cells could contribute to a better understanding of DLBCL biology. We studied the molecular features of the immunoglobulin heavy chain (IGH) rearrangements in 165 patients diagnosed with DLBCL not otherwise specified. Clonal IGH rearrangements were amplified according to the BIOMED-2 protocol and PCR products were sequenced directly. We also analyzed the criteria for stereotyped patterns in all complete IGHV-IGHD-IGHJ (V-D-J) sequences. Complete V-D-J rearrangements were identified in 130 of 165 patients. Most cases (89%) were highly mutated, but 12 sequences were truly unmutated or minimally mutated. Three genes, IGHV4-34, IGHV3-23, and IGHV4-39, accounted for one third of the whole cohort, including an overrepresentation of IGHV4-34 (15.5% overall). Interestingly, all IGHV4-34 rearrangements and all unmutated sequences belonged to the non-germinal center B-cell-like (non-GCB) subtype. Overall, we found three cases meeting the current criteria for stereotyped heavy chain VH CDR3 sequences, two of them belonging to subsets previously described in CLL. The IGHV gene repertoire is remarkably biased, implying an antigen-driven origin in DLBCL. The particular features of the immunoglobulin sequences suggest the existence of distinct subgroups within the non-GCB subtype.
Abstract:
Key Performance Indicators (KPIs) and their predictions are widely used by enterprises for informed decision making. Nevertheless, a very important factor that is generally overlooked is that top-level strategic KPIs are actually driven by operational-level business processes. These two domains are, however, mostly segregated and analysed in silos with different Business Intelligence solutions. In this paper, we propose an approach for advanced Business Simulations that converges the two domains by utilising process-execution and business data, together with concepts from Business Dynamics (BD) and Business Ontologies, to promote better system understanding and detailed KPI predictions. Our approach incorporates the automated creation of Causal Loop Diagrams, thus empowering the analyst to critically examine the complex dependencies hidden in the massive amounts of available enterprise data. We have further evaluated our proposed approach in the context of a retail use case that involved verification of the automatically generated causal models by a domain expert.
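As an illustration of what an automatically generated Causal Loop Diagram reduces to computationally, a hedged sketch follows: a signed directed graph whose cycles are the feedback loops. The variables and signs are made-up examples, not the retail use-case model from the paper.

```python
import networkx as nx

# Causal loop diagram as a signed directed graph (illustrative variables only)
cld = nx.DiGraph()
edges = [("marketing_spend", "store_visits", "+"),
         ("store_visits", "sales_kpi", "+"),
         ("sales_kpi", "inventory_level", "-"),
         ("inventory_level", "stockouts", "-"),
         ("stockouts", "store_visits", "-"),
         ("sales_kpi", "marketing_spend", "+")]
for src, dst, sign in edges:
    cld.add_edge(src, dst, sign=sign)

# Each simple cycle is a feedback loop; an odd number of "-" links makes it balancing.
for cycle in nx.simple_cycles(cld):
    signs = [cld[u][v]["sign"] for u, v in zip(cycle, cycle[1:] + cycle[:1])]
    loop_type = "balancing" if signs.count("-") % 2 == 1 else "reinforcing"
    print(" -> ".join(cycle), f"({loop_type})")
```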