916 resultados para Data Mining and its Application
Resumo:
The exponential growth of studies on the biological response to ocean acidification over the last few decades has generated a large amount of data. To facilitate data comparison, a data compilation hosted at the data publisher PANGAEA was initiated in 2008 and is updated on a regular basis (doi:10.1594/PANGAEA.149999). By January 2015, a total of 581 data sets (over 4 000 000 data points) from 539 papers had been archived. Here we present the developments of this data compilation five years since its first description by Nisumaa et al. (2010). Most of study sites from which data archived are still in the Northern Hemisphere and the number of archived data from studies from the Southern Hemisphere and polar oceans are still relatively low. Data from 60 studies that investigated the response of a mix of organisms or natural communities were all added after 2010, indicating a welcomed shift from the study of individual organisms to communities and ecosystems. The initial imbalance of considerably more data archived on calcification and primary production than on other processes has improved. There is also a clear tendency towards more data archived from multifactorial studies after 2010. For easier and more effective access to ocean acidification data, the ocean acidification community is strongly encouraged to contribute to the data archiving effort, and help develop standard vocabularies describing the variables and define best practices for archiving ocean acidification data.
Resumo:
Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.
Resumo:
The incredible rapid development to huge volumes of air travel, mainly because of jet airliners that appeared to the sky in the 1950s, created the need for systematic research for aviation safety and collecting data about air traffic. The structured data can be analysed easily using queries from databases and running theseresults through graphic tools. However, in analysing narratives that often give more accurate information about the case, mining tools are needed. The analysis of textual data with computers has not been possible until data mining tools have been developed. Their use, at least among aviation, is still at a moderate level. The research aims at discovering lethal trends in the flight safety reports. The narratives of 1,200 flight safety reports from years 1994 – 1996 in Finnish were processed with three text mining tools. One of them was totally language independent, the other had a specific configuration for Finnish and the third originally created for English, but encouraging results had been achieved with Spanish and that is why a Finnish test was undertaken, too. The global rate of accidents is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase, if the flight safety will not be improved. The collection of data and reporting systems have reached their top level. The focal point in increasing the flight safety is analysis. The air traffic has generally been forecasted to grow 5 – 6 per cent annually over the next two decades. During this period, the global air travel will probably double also with relatively conservative expectations of economic growth. This development makes the airline management confront growing pressure due to increasing competition, signify cant rise in fuel prices and the need to reduce the incident rate due to expected growth in air traffic volumes. All this emphasises the urgent need for new tools and methods. All systems provided encouraging results, as well as proved challenges still to be won. Flight safety can be improved through the development and utilisation of sophisticated analysis tools and methods, like data mining, using its results supporting the decision process of the executives.
Resumo:
The parameter setting of a differential evolution algorithm must meet several requirements: efficiency, effectiveness, and reliability. Problems vary. The solution of a particular problem can be represented in different ways. An algorithm most efficient in dealing with a particular representation may be less efficient in dealing with other representations. The development of differential evolution-based methods contributes substantially to research on evolutionary computing and global optimization in general. The objective of this study is to investigatethe differential evolution algorithm, the intelligent adjustment of its controlparameters, and its application. In the thesis, the differential evolution algorithm is first examined using different parameter settings and test functions. Fuzzy control is then employed to make control parameters adaptive based on an optimization process and expert knowledge. The developed algorithms are applied to training radial basis function networks for function approximation with possible variables including centers, widths, and weights of basis functions and both having control parameters kept fixed and adjusted by fuzzy controller. After the influence of control variables on the performance of the differential evolution algorithm was explored, an adaptive version of the differential evolution algorithm was developed and the differential evolution-based radial basis function network training approaches were proposed. Experimental results showed that the performance of the differential evolution algorithm is sensitive to parameter setting, and the best setting was found to be problem dependent. The fuzzy adaptive differential evolution algorithm releases the user load of parameter setting and performs better than those using all fixedparameters. Differential evolution-based approaches are effective for training Gaussian radial basis function networks.
Resumo:
This master thesis work introduces the fuzzy tolerance/equivalence relation and its application in cluster analysis. The work presents about the construction of fuzzy equivalence relations using increasing generators. Here, we investigate and research on the role of increasing generators for the creation of intersection, union and complement operators. The objective is to develop different varieties of fuzzy tolerance/equivalence relations using different varieties of increasing generators. At last, we perform a comparative study with these developed varieties of fuzzy tolerance/equivalence relations in their application to a clustering method.
Resumo:
Embora o objectivo de redução de acidentes laborais seja frequentemente invocado para justificar uma aplicação preventiva de testes de álcool e drogas no trabalho, há poucas evidências estatisticamente relevantes das pressupostas causalidade e correlação negativa entre a sujeição aos testes e os posteriores acidentes. Os dados de testes e dos acidentes ocorridos com os colaboradores de uma transportadora ferroviária portuguesa de âmbito nacional, durante anos recentes, começam agora a ser explorados, em busca de relações entre estas e outras variáveis biográficas. - Although the aim of reducing occupational accidents is frequently cited to justify preventive drug and alcohol testing at work, there is little statistically significant evidence of the assumed causality and negative correlation between exposure to testing and subsequent accidents. Data mining of tests and accidents involving employees of a Portuguese national wide railway transportation company, during recent years, is now beginning in search of relations between these and other biographical variables.
Resumo:
In the recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Nowadays, data mining is based on low-level specications of the employed techniques typically bounded to a specic analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specic platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the underlying-platform technical details. These tasks are now entrusted to the model-transformations scaffolding.
Resumo:
This special issue is a collection of the selected papers published on the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA) held in Wuhan, China in 2005. The articles focus on the innovative applications of data mining approaches to the problems that involve large data sets, incomplete and noise data, or demand optimal solutions.
Resumo:
Dinoflagellate cysts are useful for reconstructing upper water conditions. For adequate reconstructions detailed information is required about the relationship between modern day environmental conditions and the geographic distribution of cysts in sediments. This Atlas summarises the modern global distribution of 71 organicwalled dinoflagellate cyst species. The synthesis is based on the integration of literature sources together with data of 2405 globally distributed surface sediment samples that have been preparedwith a comparable methodology and taxonomy. The distribution patterns of individual cyst species are being comparedwith environmental factors that are knownto influence dinoflagellate growth, gamete production, encystment, excystment and preservation of their organic-walled cysts: surface water temperature, salinity, nitrate, phosphate, chlorophyll-a concentrations and bottom water oxygen concentrations. Graphs are provided for every species depicting the relationship between seasonal and annual variations of these parameters and the relative abundance of the species. Results have been compared with previously published records; an overview of the ecological significance as well as information about the seasonal production of each individual species is presented. The relationship between the cyst distribution and variation in the aforementioned environmental parameters was analysed by performing a canonical correspondence analysis. All tested variables showed a positive relationship on the 99% confidence level. Sea-surface temperature represents the parameter corresponding to the largest amount of variance within the dataset (40%) followed by nitrate, salinity, phosphate and bottom-water oxygen concentration, which correspond to 34%, 33%, 25% and 24% of the variance, respectively. Characterisations of selected environments as well as a discussion about how these factors could have influenced the final cyst yield in sediments are included.
Resumo:
Two hazard risk assessment matrices for the ranking of occupational health risks are described. The qualitative matrix uses qualitative measures of probability and consequence to determine risk assessment codes for hazard-disease combinations. A walk-through survey of an underground metalliferous mine and concentrator is used to demonstrate how the qualitative matrix can be applied to determine priorities for the control of occupational health hazards. The semi-quantitative matrix uses attributable risk as a quantitative measure of probability and uses qualitative measures of consequence. A practical application of this matrix is the determination of occupational health priorities using existing epidemiological studies. Calculated attributable risks from epidemiological studies of hazard-disease combinations in mining and minerals processing are used as examples. These historic response data do not reflect the risks associated with current exposures. A method using current exposure data, known exposure-response relationships and the semi-quantitative matrix is proposed for more accurate and current risk rankings.
Resumo:
Presently power system operation produces huge volumes of data that is still treated in a very limited way. Knowledge discovery and machine learning can make use of these data resulting in relevant knowledge with very positive impact. In the context of competitive electricity markets these data is of even higher value making clear the trend to make data mining techniques application in power systems more relevant. This paper presents two cases based on real data, showing the importance of the use of data mining for supporting demand response and for supporting player strategic behavior.