916 results for Data Mining and its Application
Abstract:
The kinetic parameters of the pyrolysis of miscanthus and its acid hydrolysis residue (AHR) were determined using thermogravimetric analysis (TGA). The AHR was produced at the University of Limerick by treating miscanthus with 5 wt.% sulphuric acid at 175 °C as representative of a lignocellulosic acid hydrolysis product. For the TGA experiments, 3 to 6 g of sample, milled and sieved to a particle size below 250 μm, were placed in the TGA ceramic crucible. The experiments were carried out under non-isothermal conditions heating the samples from 50 to 900 °C at heating rates of 2.5, 5, 10, 17 and 25 °C/min. The activation energy (EA) of the decomposition process was determined from the TGA data by differential analysis (Friedman) and three isoconversional methods of integral analysis (Kissinger–Akahira–Sunose, Ozawa–Flynn–Wall, Vyazovkin). The activation energy ranged from 129 to 156 kJ/mol for miscanthus and from 200 to 376 kJ/mol for AHR increasing with increasing conversion. The reaction model was selected using the non-linear least squares method and the pre-exponential factor was calculated from the Arrhenius approximation. The results showed that the best fitting reaction model was the third order reaction for both feedstocks. The pre-exponential factor was in the range of 5.6 × 10¹⁰ to 3.9 × 10¹³ min⁻¹ for miscanthus and 2.1 × 10¹⁶ to 7.7 × 10²⁵ min⁻¹ for AHR.
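As an illustration of the isoconversional idea named in the abstract above, the following is a minimal sketch of the Ozawa–Flynn–Wall estimate, assuming the usual Doyle approximation ln β = const − 1.052·Ea/(R·T) at a fixed conversion. The heating rates are those quoted in the abstract; the temperatures are synthetic values generated from an assumed activation energy of 150 kJ/mol and an arbitrary constant, not measured data, and the function names are hypothetical.

```python
import math

R = 8.314  # universal gas constant, J/(mol K)


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def ofw_activation_energy(heating_rates, temperatures_k):
    """Estimate Ea (J/mol) at one conversion level.

    Regresses ln(beta) on 1/T; under the Doyle approximation the
    slope equals -1.052 * Ea / R.
    """
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(b) for b in heating_rates]
    return -linear_slope(xs, ys) * R / 1.052


# Heating rates from the abstract (degrees C per minute).
heating_rates = [2.5, 5.0, 10.0, 17.0, 25.0]

# Synthetic temperatures (K) at which a fixed conversion is reached,
# generated from ASSUMED values; real TGA curves would supply these.
TRUE_EA = 150e3   # assumed activation energy, J/mol
CONST = 35.0      # assumed intercept of the OFW line
temperatures_k = [
    1.052 * TRUE_EA / (R * (CONST - math.log(b))) for b in heating_rates
]

estimated_ea = ofw_activation_energy(heating_rates, temperatures_k)
print(f"Estimated Ea: {estimated_ea / 1000:.1f} kJ/mol")
```

Repeating the regression at several conversion levels would give the dependence of the activation energy on conversion that the abstract reports.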
Abstract:
* The work is partially supported by Grant no. NIP917 of the Ministry of Science and Education – Republic of Bulgaria.
Abstract:
The concept of knowledge is central to solving the various problems of data mining and pattern recognition in finite spaces of Boolean or multi-valued attributes. A special form of knowledge representation, called implicative regularities, is proposed for use with two powerful tools of modern logic: inductive inference and deductive inference. The first is used to extract knowledge from the data. The second is applied when the knowledge is used to calculate the values of the goal attribute. A set of efficient algorithms was developed for this purpose, dealing with Boolean functions and finite predicates represented by logical vectors and matrices.
Abstract:
This chapter provides the theoretical foundation and background of the Data Envelopment Analysis (DEA) method, some variants of the basic DEA models, and applications to various sectors. Some illustrative examples and helpful resources on DEA, including a DEA software package, are also presented. DEA is useful for measuring the relative efficiency of a variety of institutions and has its own merits and limitations. The chapter concludes that DEA results should be interpreted with caution to avoid giving wrong signals and making inappropriate recommendations.
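To make the notion of relative efficiency concrete, here is a minimal sketch of the simplest DEA case: one input, one output and constant returns to scale. In that special case the score of each unit reduces to its output/input ratio divided by the best ratio observed, so no linear-programming solver is needed. The units and their figures are invented for illustration and the function name is hypothetical; models with several inputs or outputs require solving a linear program per unit.

```python
def ccr_efficiency_single(inputs, outputs):
    """Relative efficiency for one input and one output per unit.

    Under constant returns to scale, each unit's score is its
    output/input ratio divided by the best ratio in the sample.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical decision-making units: (input used, output produced).
units = {"A": (2.0, 1.0), "B": (3.0, 3.0), "C": (5.0, 4.0)}

names = list(units)
scores = ccr_efficiency_single(
    [units[n][0] for n in names],
    [units[n][1] for n in names],
)
efficiency = dict(zip(names, scores))

for name, score in efficiency.items():
    print(f"{name}: {score:.2f}")
```

A score of 1 marks a unit on the efficient frontier of this sample only. Adding or removing units can change every score, which is one reason the results call for cautious interpretation.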
Abstract:
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems caused by “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed, and three different kinds of applications with respect to classification tasks are considered. A summary of the results obtained on the accuracy of the classification schemes is presented, with a conclusion about the search for the most appropriate feature extraction method. The problem of how to discover the knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in this integration is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build the proposed system are considered.
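As a generic illustration of eigenvector-based feature extraction, the sketch below computes the leading principal component of a small data set by power iteration and projects the samples onto it. Principal component analysis is used here only as a familiar example of the family; the abstract does not say which three approaches the paper compares. The data and function names are invented for illustration.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalise."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, direction):
    """One extracted feature per sample: its coordinate along direction."""
    return [sum(a * b for a, b in zip(r, direction)) for r in centered]


# Hypothetical two-feature samples lying close to the line y = 2x.
samples = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]]

centered = center(samples)
pc1 = leading_eigenvector(covariance(centered))
features = project(centered, pc1)
print("first principal direction:", [round(x, 3) for x in pc1])
```

The single extracted feature replaces the two original attributes; a classifier would then be trained on `features` instead of `samples`.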
Abstract:
AMS Subj. Classification: 62P10, 62H30, 68T01
Abstract:
Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in within the bigness taxonomy. Large p small n data sets for instance require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress the fact that simplicity in the sense of Ockham’s razor non-plurality principle of parsimony tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
Abstract:
The focus of this study is on governance decisions in a concurrent channels context under uncertainty. The study examines how a firm chooses to deploy its sales force in times of uncertainty, and the subsequent performance outcome of those deployment choices. The theoretical framework is based on multiple theories of governance, including transaction cost analysis (TCA), agency theory, and institutional economics. Three uncertainty variables are investigated in this study. The first two are demand and competitive uncertainty, which are considered to be industry-level forms of market uncertainty. The third, political uncertainty, is chosen because it is an important dimension of institutional environments, capturing non-economic circumstances such as regulations and political systemic issues. The study employs longitudinal secondary data from a Thai hotel chain, comprising monthly observations from January 2007 to December 2012. This hotel chain operates in four countries: Thailand, the Philippines, the United Arab Emirates (Dubai), and Egypt, all of which experienced substantial demand, competitive, and political uncertainty during the study period. This makes them ideal contexts for this study. Two econometric models, both deploying Newey-West estimations, are employed to test 13 hypotheses. The first model considers the relationship between uncertainty and governance. The second model is a version of Newey-West, using an Instrumental Variables (IV) estimator and a Two-Stage Least Squares model (2SLS), to test the direct effect of uncertainty on performance and the moderating effect of governance on the relationship between uncertainty and performance. The observed relationship between uncertainty and governance follows a core prediction of TCA: that vertical integration is the preferred choice of governance when uncertainty rises.
As for the subsequent performance outcomes, the results corroborate that uncertainty has a negative effect on performance. Importantly, the findings show that becoming more vertically integrated cannot help moderate the effect of demand and competitive uncertainty, but can significantly moderate the effect of political uncertainty. These findings have significant theoretical and practical implications, and extend our knowledge of the impact of uncertainty significantly, as well as bringing an institutional perspective to TCA. Further, they offer managers novel insight into the nature of different types of uncertainty, their impact on performance, and how channel decisions can mitigate these impacts.
Abstract:
The availability of regular power supply has been identified as one of the major stimulants for the growth and development of any nation and is thus important for a nation's economic well-being. The problems of the Nigerian power sector stem from many factors, culminating in the country's slow development and its inability to meet the power demands of its citizens despite the abundance of human and natural resources in the nation. The main aim of the research was therefore to investigate the importance and contribution of risk management to the success of projects specific to the power sector. To achieve this aim, it was pertinent to examine the efficacy of the risk management process in practice, to elucidate the various risks typically associated with projects in the power sector (construction, contractual, political, financial, design, human resource and environmental risk factors), and to determine the current state of risk management practice in Nigeria. To address these inhibiting factors, which have been subject to only limited in-depth academic research, a rigorous mixed research method was adopted (quantitative and qualitative data analysis). A review of the Nigerian power sector was also carried out as a precursor to the data collection stage. Using a purposive sampling technique, respondents were identified and a questionnaire survey was administered. The research hypotheses were tested using inferential statistics (Pearson correlation, Chi-square test, t-test and ANOVA), and the findings revealed the need for a new risk management implementation Framework.
The proposed Framework was tested within a company project to interpret the dynamism and essential benefits of risk management, with the aim of improving project performance (time), reducing the level of fragmentation (quality) and improving profitability (cost) within the Nigerian power sector, in order to bridge the gap between theory and practice. It was concluded that Nigeria’s poor risk management practices have prevented it from experiencing strong growth and development. The study concludes, however, that successful implementation of the developed risk management framework may help Nigeria attain such growth by enabling it to become more prepared and flexible in facing challenges that previously led to project failures, thus contributing to its prosperity. The research provides an original theoretical, methodological and practical contribution that adds to the project risk management body of knowledge and to the Nigerian power sector.
Abstract:
In 2002, 2003 and 2004 we collected samples of macroinvertebrates on a total of 36 occasions in Badacsony bay, in areas of open water (reed-grassy in 2003 and 2004) as well as in areas populated by reed (Phragmites australis) and cattail (Typha angustifolia). Samples were taken using a stiff hand net. The sampling site includes three microhabitats differentiated only by the aquatic plants inhabiting them. Our data were gathered by processing 208 individual samples. The quantity of macroinvertebrates is represented by a biovolume value based on volume estimates. We can identify taxa found in abundant numbers in all water types and ooze, as well as groups associated with individual microhabitats with various aquatic plants. We observe a notable difference between the years in the volume of invertebrate macrofauna, caused by the drop in water level and the multiplication of submerged macrophytes. There are smaller differences between the samples taken in reed and cattail stands. In the second half of 2003, which was a year of drought, Najas marina appeared in open waters and made it possible to support larger quantities of macroinvertebrates. In 2004, with higher water levels, Potamogeton perfoliatus occurring in the same area had an even more significant effect. This type of reed-grass may support the most macroinvertebrates during the summer. With respect to diversity, we may suspect different characteristics. The reed sampling site proved to be the richest, with the cattail microhabitat close behind; open water (with submerged macrophytes) is the least diverse microhabitat.
Abstract:
The nation's freeway systems are becoming increasingly congested. Traffic incidents are a major contributor to congestion on freeways. They are non-recurring events, such as accidents or stranded vehicles, that cause a temporary reduction in roadway capacity, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help in identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. This dissertation research attempts to improve on these previous efforts by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid decision making on traffic diversion during an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. Multiple linear regression analysis and a decision-tree-based method were applied to develop the offline models, and a rule-based method and a tree algorithm called M5P were used to develop the online models.
The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.
Abstract:
Ecotourism, a new term for low-impact nature travel, is receiving increasing attention. The author has researched the development of the U.S. ecotourism market from 1980 to 1989 in order to obtain data on the growth of this market segment. Factors involved in the growth of the U.S. ecotourism market are then examined in order to project the growth of this market during the 1990s.
Abstract:
Online Social Network (OSN) services provided by Internet companies bring people together to chat and to share and enjoy information. Meanwhile, huge amounts of data are generated by those services (which can be regarded as social media) every day, every hour, even every minute and every second. Currently, researchers are interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, the large scale of OSN data makes it difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated contents. Specifically, it aims at addressing three problems related to social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.
Abstract:
Innovation is a fundamental part of social work. In recent years there has been a shift in the innovation paradigm, making it easier to accept this relationship. National and supranational policies aimed at promoting innovation appear to be specifically guided by this idea. To be able to affirm this hypothesis, it is necessary to review the perception that social workers have of their duties. It is also useful to examine particular cases that show how such social innovation arises.
Abstract:
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.