825 results for tree structured business data
Abstract:
Key Performance Indicators (KPIs) and their predictions are widely used by enterprises for informed decision making. Nevertheless, an important factor that is generally overlooked is that top-level strategic KPIs are actually driven by operational-level business processes. These two domains are, however, mostly segregated and analysed in silos with different Business Intelligence solutions. In this paper, we propose an approach for advanced Business Simulations, which converges the two domains by utilising process execution and business data, and concepts from Business Dynamics (BD) and Business Ontologies, to promote better system understanding and detailed KPI predictions. Our approach incorporates the automated creation of Causal Loop Diagrams, thus empowering the analyst to critically examine the complex dependencies hidden in the massive amounts of available enterprise data. We have further evaluated our proposed approach in the context of a retail use case that involved verification of the automatically generated causal models by a domain expert.
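To make the causal-model idea concrete, the sketch below encodes a small causal loop diagram as a signed directed graph with networkx and classifies its feedback loops as reinforcing or balancing; the variables and link polarities are hypothetical illustrations, not output of the approach described above.

# Minimal sketch of a causal loop diagram (CLD) as a signed directed graph.
# Node names and link polarities below are hypothetical, for illustration only.
import networkx as nx

cld = nx.DiGraph()
# Each edge carries a polarity: +1 (same direction) or -1 (opposite direction).
edges = [
    ("Marketing spend", "Customer demand", +1),
    ("Customer demand", "Sales volume", +1),
    ("Sales volume", "Revenue", +1),
    ("Revenue", "Marketing spend", +1),           # part of a reinforcing loop
    ("Sales volume", "Stock level", -1),
    ("Stock level", "Replenishment orders", -1),
    ("Replenishment orders", "Stock level", +1),  # part of a balancing loop
]
for src, dst, sign in edges:
    cld.add_edge(src, dst, polarity=sign)

# Feedback loops are the directed cycles of the graph; the product of the
# polarities tells us whether a loop is reinforcing (+1) or balancing (-1).
for cycle in nx.simple_cycles(cld):
    polarity = 1
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        polarity *= cld[a][b]["polarity"]
    kind = "reinforcing" if polarity > 0 else "balancing"
    print(" -> ".join(cycle), f"({kind})")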
Abstract:
This paper aims to address the knowledge gap regarding the potential intermediary role tertiary institutions can play in developing generic design thinking/design led innovation capabilities in non-designers. Specifically, it investigates the value derived from the contribution of postgraduate design students as facilitators/educators for undergraduate non-design student cohorts. It examines a design immersion workshop designed to encourage the use of design thinking capabilities for project brief development for undergraduate multi-disciplinary student teams involved in a community service learning project for a social enterprise. The workshop was facilitated by design led innovation master's students embedded in industry organisations to research the integration of design led innovation capabilities in business. Data were collected from participating non-design students and postgraduate facilitators in the form of reflective journals and semi-structured interviews. The thematic analysis provided insight into the value of design thinking/design led innovation immersion programs for both the postgraduate facilitators and the undergraduate non-design students. The research results will inform a tentative foundation prototype framework to allow for ongoing program development and research in design thinking/design led innovation integration in higher education, facilitating the development of generic capabilities required to empower future generations for business innovation and active citizenship in the 21st century knowledge economy.
Abstract:
Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that handles the isomorphism problem efficiently. Using BOCF, we define a tree-structure-guided enumeration scheme that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm, based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on real datasets compare the efficiency of BOSTER against two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
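BOCF itself is not spelled out in the abstract, but the general idea of a canonical form for rooted labelled unordered trees can be illustrated with the textbook recursive encoding sketched below (a simplified stand-in, not BOCF): each subtree is encoded as its label followed by the sorted encodings of its children, so isomorphic unordered trees receive identical strings.

# Sketch of a canonical string encoding for rooted labelled unordered trees.
# This is the classic "sort the children's encodings" form, shown only to
# illustrate how a canonical form neutralises the isomorphism problem; it is
# not the BOCF encoding described in the paper. Labels are assumed not to
# contain parentheses.

def canonical_form(label, children=()):
    """Return a canonical string for the tree rooted at `label`.

    `children` is an iterable of (label, children) pairs. Because the child
    encodings are sorted before concatenation, any reordering of siblings
    yields the same canonical string.
    """
    encoded_children = sorted(canonical_form(l, c) for l, c in children)
    return "(" + label + "".join(encoded_children) + ")"

# Two orderings of the same unordered tree produce identical encodings:
t1 = ("A", [("B", []), ("C", [("D", [])])])
t2 = ("A", [("C", [("D", [])]), ("B", [])])
assert canonical_form(*t1) == canonical_form(*t2)
print(canonical_form(*t1))   # (A(B)(C(D)))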
Abstract:
The main contribution of this work is to analyze and describe the state-of-the-art performance of answer scoring systems from the SemEval-2013 task, as well as to continue the development of an answer scoring system (EHU-ALM) developed at the University of the Basque Country. Overall, this master's thesis focuses on finding configurations that improve the results on the SemEval dataset by using attribute engineering techniques to find optimal feature subsets, along with trying different hierarchical configurations to analyze their performance against the traditional one-versus-all approach. Throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing its architecture, and, on the other hand, to improve the system by adapting it to a hierarchical configuration. To build these new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.
Abstract:
A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases.
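The elliptical slice sampling step mentioned above is a standard algorithm (Murray, Adams and MacKay, 2010); the sketch below shows a minimal version of one update for a latent vector with a zero-mean Gaussian process prior. The kernel, likelihood and toy data are placeholders and do not reproduce the paper's model.

# Minimal sketch of elliptical slice sampling for a latent function f with a
# zero-mean Gaussian process prior N(0, K). The kernel, likelihood and data
# below are illustrative placeholders.
import numpy as np

def elliptical_slice(f, chol_K, log_lik, rng):
    """One elliptical slice sampling update of the latent vector `f`.

    `chol_K` is the lower Cholesky factor of the prior covariance K, and
    `log_lik` maps a latent vector to its log-likelihood.
    """
    nu = chol_K @ rng.standard_normal(f.size)      # prior draw defining the ellipse
    log_y = log_lik(f) + np.log(rng.uniform())     # slice level
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # shrink the bracket towards theta = 0 (the current state) and retry
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Toy usage: Gaussian observations of a GP over 50 inputs (placeholder data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2) + 1e-8 * np.eye(50)
chol_K = np.linalg.cholesky(K)
y = np.sin(2.0 * np.pi * x) + 0.3 * rng.standard_normal(50)
log_lik = lambda f: -0.5 * np.sum((y - f) ** 2) / 0.3 ** 2
f = np.zeros(50)
for _ in range(200):
    f = elliptical_slice(f, chol_K, log_lik, rng)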
Abstract:
The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions datapoints in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time varying community structure in a social network.
Abstract:
New representations of tree-structured data objects, using ideas from topological data analysis, enable improved statistical analyses of a population of brain artery trees. A number of representations of each data tree arise from persistence diagrams that quantify branching and looping of vessels at multiple scales. Novel approaches to the statistical analysis, through various summaries of the persistence diagrams, lead to heightened correlations with covariates such as age and sex, relative to earlier analyses of this data set. The correlation with age continues to be significant even after controlling for correlations from earlier significant summaries.
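As a rough illustration of how a persistence-diagram summary enters such an analysis, the sketch below reduces each diagram to its total persistence and correlates that scalar with age; the diagrams and ages are random placeholders, and total persistence is only one generic summary, not necessarily among those used in the study.

# Sketch: summarising persistence diagrams and correlating the summary with a
# covariate such as age. Diagrams and ages below are synthetic placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

def total_persistence(diagram):
    """Sum of (death - birth) over all points of a persistence diagram."""
    births, deaths = diagram[:, 0], diagram[:, 1]
    return float(np.sum(deaths - births))

# One diagram per subject: an (n_points, 2) array of (birth, death) pairs.
n_subjects = 30
diagrams = [np.sort(rng.uniform(0, 1, size=(rng.integers(5, 20), 2)), axis=1)
            for _ in range(n_subjects)]
ages = rng.uniform(20, 80, size=n_subjects)

summaries = np.array([total_persistence(d) for d in diagrams])
r, p_value = pearsonr(summaries, ages)
print(f"correlation with age: r={r:.2f}, p={p_value:.3f}")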
Abstract:
Many Web applications walk the thin line between the need for dynamic data and the need to meet user performance expectations. In environments where funds are not available to constantly upgrade hardware in line with user demand, alternative approaches need to be considered. This paper introduces a ‘Data farming’ model whereby dynamic data, which is ‘grown’ in operational applications, is ‘harvested’ and ‘packaged’ for various consumer markets. Like any well-managed agricultural operation, crops are harvested according to historical and perceived demand as inferred by a self-optimising process. This approach aims to make enhanced use of available resources through better utilisation of system downtime, thereby improving application performance and increasing the availability of key business data.
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Abstract:
The debate associated with the qualifications of business school faculty has raged since the 1959 release of the Gordon–Howell and Pierson reports, which encouraged business schools in the USA to enhance their legitimacy by increasing their faculties’ doctoral qualifications and scholarly rigor. Today, the legitimacy of specific faculty qualifications remains one of the most discussed topics in management education, attracting the interest of administrators, faculty, and accreditation agencies. Based on new institutional theory and the institutional logics perspective, this paper examines convergence and innovation in business schools through an analysis of faculty hiring criteria. The qualifications examined are academic degree, scholarly publications, teaching experience, and professional experience. Three groups of schools are examined based on type of university, position within a media ranking system, and accreditation by the Association to Advance Collegiate Schools of Business. Data are gathered using a content analysis of 441 faculty postings from business schools based in the USA over two time periods. Contrary to claims of global convergence, we find most qualifications still vary by group, even in the mature US market. Moreover, innovative hiring is more likely to be found in non-elite schools.
Abstract:
When designing metaheuristic optimization methods, there is a trade-off between application range and effectiveness. For large real-world instances of combinatorial optimization problems, out-of-the-box metaheuristics often fail, and optimization methods need to be adapted to the problem at hand. Knowledge about the structure of high-quality solutions can be exploited by introducing a so-called bias into one of the components of the metaheuristic used. These problem-specific adaptations increase search performance. This thesis analyzes the characteristics of high-quality solutions for three constrained spanning tree problems: the optimal communication spanning tree problem, the quadratic minimum spanning tree problem and the bounded diameter minimum spanning tree problem. Several relevant tree properties that should be explored when analyzing a constrained spanning tree problem are identified. Based on the gained insights into the structure of high-quality solutions, efficient and robust solution approaches are designed for each of the three problems. Experimental studies analyze the performance of the developed approaches compared to the current state of the art.
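As an illustration of introducing a structural bias into a construction component, the sketch below grows a spanning tree while preferring low-weight edges through a rank-based exponential bias; the bias function and the parameter beta are illustrative choices, not the specific adaptations developed in the thesis.

# Sketch of a biased randomised spanning-tree construction. The bias towards
# low-weight edges mirrors the general idea of exploiting knowledge about
# high-quality solutions; the exponential rank bias is an illustrative choice.
import math
import random

def biased_spanning_tree(n, weight, beta=1.0, rng=random.Random(0)):
    """Grow a spanning tree over nodes 0..n-1, preferring low-weight edges.

    `weight` maps an edge (u, v) to its cost. At each step the candidate
    edges leaving the current tree are ranked by cost and sampled with
    probability proportional to exp(-beta * rank), so beta = 0 recovers a
    uniform random choice and large beta approaches Prim's greedy choice.
    """
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        candidates = [(u, v) for u in in_tree for v in range(n) if v not in in_tree]
        candidates.sort(key=lambda e: weight(*e))
        probs = [math.exp(-beta * rank) for rank in range(len(candidates))]
        total = sum(probs)
        u, v = rng.choices(candidates, weights=[p / total for p in probs])[0]
        edges.append((u, v))
        in_tree.add(v)
    return edges

# Toy usage on a random Euclidean instance (placeholder weights).
rnd = random.Random(1)
coords = {i: (rnd.random(), rnd.random()) for i in range(8)}
dist = lambda u, v: math.dist(coords[u], coords[v])
print(biased_spanning_tree(8, dist, beta=2.0))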
Abstract:
Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Considering that most water supply systems are multipurpose, operating these systems to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns can be very challenging. This study investigated improvements in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American (PNA) pattern, and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts which are statistically better than climatology. Second, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data. Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model is proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
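A minimal sketch of the tree-structured forecasting idea is shown below: a classification tree that maps climate indices and antecedent streamflow to inflow terciles and returns a probabilistic forecast via its leaf frequencies. The predictors and the synthetic data are placeholders rather than the study's inputs.

# Sketch of a tree-structured probabilistic forecast: a classification tree
# predicting seasonal inflow terciles (below/near/above normal) from climate
# indices. All data below are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n_years = 60

# Hypothetical predictors: ENSO and PDO indices plus antecedent streamflow.
X = np.column_stack([
    rng.normal(size=n_years),            # ENSO index (e.g. SST anomaly)
    rng.normal(size=n_years),            # PDO index
    rng.gamma(2.0, 1.0, size=n_years),   # antecedent-season streamflow
])
# Hypothetical seasonal inflow, loosely tied to the predictors for illustration.
inflow = 0.8 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=n_years)
terciles = np.digitize(inflow, np.quantile(inflow, [1 / 3, 2 / 3]))  # 0, 1, 2

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, terciles)
# predict_proba gives a probabilistic (tercile) forecast for a new season.
new_season = np.array([[1.2, -0.4, 2.1]])   # hypothetical index values
print(tree.predict_proba(new_season))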
Abstract:
Northwestern North America has one of the highest rates of recent temperature increase in the world, but the putative “divergence problem” in dendroclimatology potentially limits the ability of tree-ring proxy data at high latitudes to provide long-term context for current anthropogenic change. Here, summer temperatures are reconstructed from a Picea glauca maximum latewood density (MXD) chronology that shows a stable relationship to regional temperatures and spans most of the last millennium at the Firth River in northeastern Alaska. The warmest epoch in the last nine centuries is estimated to have occurred during the late twentieth century, with average temperatures over the last 30 yr of the reconstruction developed for this study [1973–2002 in the Common Era (CE)] approximately 1.3° ± 0.4°C warmer than the long-term preindustrial mean (1100–1850 CE), a change associated with rapid increases in greenhouse gases. Prior to the late twentieth century, multidecadal temperature fluctuations covary broadly with changes in natural radiative forcing. The findings presented here emphasize that tree-ring proxies can provide reliable indicators of temperature variability even in a rapidly warming climate.
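For readers unfamiliar with how such a reconstruction is calibrated, the sketch below regresses instrumental summer temperature on an MXD chronology over an assumed overlap period and applies the fit to the full record; the series, overlap window and simple linear regression are illustrative placeholders, not the Firth River analysis.

# Sketch of the calibration step behind a tree-ring density reconstruction:
# regress instrumental summer temperature on the MXD chronology over the
# overlap period, then apply the fit to the full chronology. The series below
# are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(7)
years = np.arange(1100, 2003)                        # full chronology length
mxd = rng.normal(size=years.size)                    # standardised MXD indices
instrumental = years >= 1930                         # hypothetical overlap period
temp_obs = 0.7 * mxd[instrumental] + rng.normal(scale=0.3, size=instrumental.sum())

# Linear calibration over the overlap period ...
slope, intercept = np.polyfit(mxd[instrumental], temp_obs, 1)
# ... applied to the whole chronology to reconstruct past summer temperature.
temp_recon = slope * mxd + intercept

pre_industrial = (years >= 1100) & (years <= 1850)
recent = (years >= 1973) & (years <= 2002)
print(temp_recon[recent].mean() - temp_recon[pre_industrial].mean())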
Abstract:
(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions, as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable either to distance data or to character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and the modified Farris method at recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when a constant evolutionary rate is assumed, whereas it gives a tree similar to those obtained by the maximum parsimony method and the neighbor-joining method when a varying evolutionary rate is assumed.
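The core step of the neighbor-joining method can be illustrated with the standard Q-criterion: the sketch below computes the Q matrix from a distance matrix and selects the pair of taxa to join. The distance values are a made-up example, and a full implementation would also assign branch lengths and update the distance matrix after each join.

# Sketch of one iteration of the neighbor-joining method: from a distance
# matrix, build the Q matrix and join the pair that minimises it.
import numpy as np

def nj_pair(d):
    """Return the (i, j) pair to join, based on the Q criterion."""
    n = d.shape[0]
    row_sums = d.sum(axis=1)
    q = (n - 2) * d - row_sums[:, None] - row_sums[None, :]
    np.fill_diagonal(q, np.inf)           # never join a taxon with itself
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return int(i), int(j)

# Hypothetical pairwise distances between five taxa.
d = np.array([[0.0, 5.0, 9.0, 9.0, 8.0],
              [5.0, 0.0, 10.0, 10.0, 9.0],
              [9.0, 10.0, 0.0, 8.0, 7.0],
              [9.0, 10.0, 8.0, 0.0, 3.0],
              [8.0, 9.0, 7.0, 3.0, 0.0]])
print(nj_pair(d))   # -> (0, 1) for these distances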
Abstract:
Hereditary nonpolyposis colorectal cancer (HNPCC) is an autosomal dominant disease caused by germline mutations in DNA mismatch repair (MMR) genes. The nucleotide excision repair (NER) pathway plays a very important role in cancer development. We systematically studied interactions between NER and MMR genes to identify NER gene single nucleotide polymorphism (SNP) risk factors that modify the effect of MMR mutations on risk for cancer in HNPCC. We analyzed data from polymorphisms in 10 NER genes that had been genotyped in HNPCC patients carrying MSH2 and MLH1 gene mutations. The influence of the NER gene SNPs on time to onset of colorectal cancer (CRC) was assessed using survival analysis and a semiparametric proportional hazards model. We found the median age of onset for CRC among MMR mutation carriers with the ERCC1 mutation was 3.9 years earlier than in patients with wild-type ERCC1 (median 47.7 vs 51.6, log-rank test p=0.035). The influence of the Rad23B A249V SNP on age of onset of HNPCC is age dependent (likelihood ratio test p=0.0056). Interestingly, using the likelihood ratio test, we also found evidence of genetic interactions between the MMR gene mutations and SNPs in the ERCC1 gene (C8092A) and the XPG/ERCC5 gene (D1104H), with p-values of 0.004 and 0.042, respectively. An assessment using tree-structured survival analysis (TSSA) showed distinct gene interactions in MLH1 mutation carriers and MSH2 mutation carriers. ERCC1 SNP genotypes greatly modified the age of onset of HNPCC in MSH2 mutation carriers, while no effect was detected in MLH1 mutation carriers. Given that the NER genes in this study play different roles in the NER pathway, they may have distinct influences on the development of HNPCC. The findings of this study are very important for elucidating the molecular mechanism of colon cancer development and for understanding why some carriers of MSH2 and MLH1 gene mutations develop CRC early while others never develop CRC. Overall, the findings also have important implications for the development of early detection and prevention strategies, as well as for understanding the mechanism of colorectal carcinogenesis in HNPCC.
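The kind of survival comparison described above can be sketched with the lifelines library: Kaplan-Meier fits per ERCC1 genotype and a log-rank test between the groups. The ages, event indicators and genotype labels below are synthetic placeholders, not the patient data analysed in the study.

# Sketch of the survival comparison described above: age at CRC onset in MMR
# mutation carriers, stratified by a hypothetical ERCC1 genotype, compared
# with a log-rank test. All data below are synthetic placeholders.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(3)
n = 120
ercc1_variant = rng.integers(0, 2, size=n).astype(bool)   # hypothetical genotype
# Hypothetical age at onset (or at last follow-up if censored).
age = np.where(ercc1_variant,
               rng.normal(47.7, 8.0, size=n),
               rng.normal(51.6, 8.0, size=n))
event = rng.uniform(size=n) < 0.8        # True = developed CRC, False = censored

km = KaplanMeierFitter()
km.fit(age[ercc1_variant], event[ercc1_variant], label="ERCC1 variant")
median_variant = km.median_survival_time_
km.fit(age[~ercc1_variant], event[~ercc1_variant], label="ERCC1 wild type")
median_wildtype = km.median_survival_time_

result = logrank_test(age[ercc1_variant], age[~ercc1_variant],
                      event_observed_A=event[ercc1_variant],
                      event_observed_B=event[~ercc1_variant])
print(median_variant, median_wildtype, result.p_value)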