50 resultados para Data Mining and its Application
Resumo:
Knowledge-elicitation is a common technique used to produce rules about the operation of a plant from the knowledge that is available from human expertise. Similarly, data-mining is becoming a popular technique to extract rules from the data available from the operation of a plant. In the work reported here knowledge was required to enable the supervisory control of an aluminium hot strip mill by the determination of mill set-points. A method was developed to fuse knowledge-elicitation and data-mining to incorporate the best aspects of each technique, whilst avoiding known problems. Utilisation of the knowledge was through an expert system, which determined schedules of set-points and provided information to human operators. The results show that the method proposed in this paper was effective in producing rules for the on-line control of a complex industrial process.
Resumo:
A distributed Lagrangian moving-mesh finite element method is applied to problems involving changes of phase. The algorithm uses a distributed conservation principle to determine nodal mesh velocities, which are then used to move the nodes. The nodal values are obtained from an ALE (Arbitrary Lagrangian-Eulerian) equation, which represents a generalization of the original algorithm presented in Applied Numerical Mathematics, 54:450--469 (2005). Having described the details of the generalized algorithm it is validated on two test cases from the original paper and is then applied to one-phase and, for the first time, two-phase Stefan problems in one and two space dimensions, paying particular attention to the implementation of the interface boundary conditions. Results are presented to demonstrate the accuracy and the effectiveness of the method, including comparisons against analytical solutions where available.
Resumo:
Despite the fact that mites were used at the dawn of forensic entomology to elucidate the postmortem interval, their use in current cases remains quite low for procedural reasons such as inadequate taxonomic knowledge. A special interest is focused on the phoretic stages of some mite species, because the phoront-host specificity allows us to deduce in many occasions the presence of the carrier (usually Diptera or Coleoptera) although it has not been seen in the sampling performed in situ or in the autopsy room. In this article, we describe two cases where Poecilochirus austroasiaticus Vitzthum (Acari: Parasitidae) was sampled in the autopsy room. In the first case, we could sample the host, Thanatophilus ruficornis (Küster) (Coleoptera: Silphidae), which was still carrying phoretic stages of the mite on the body. That attachment allowed, by observing starvation/feeding periods as a function of the digestive tract filling, the establishment of chronological cycles of phoretic behavior, showing maximum peaks of phoronts during arrival and departure from the corpse and the lowest values in the phase of host feeding. From the sarcosaprophagous fauna, we were able to determine in this case a minimum postmortem interval of 10 days. In the second case, we found no Silphidae at the place where the corpse was found or at the autopsy, but a postmortem interval of 13 days could be established by the high specificity of this interspecific relationship and the departure from the corpse of this family of Coleoptera.
Resumo:
The concentrations of dissolved noble gases in water are widely used as a climate proxy to determine noble gas temperatures (NGTs); i.e., the temperature of the water when gas exchange last occurred. In this paper we make a step forward to apply this principle to fluid inclusions in stalagmites in order to reconstruct the cave temperature prevailing at the time when the inclusion was formed. We present an analytical protocol that allows us accurately to determine noble gas concentrations and isotope ratios in stalagmites, and which includes a precise manometrical determination of the mass of water liberated from fluid inclusions. Most important for NGT determination is to reduce the amount of noble gases liberated from air inclusions, as they mask the temperature-dependent noble gas signal from the water inclusions. We demonstrate that offline pre-crushing in air to subsequently extract noble gases and water from the samples by heating is appropriate to separate gases released from air and water inclusions. Although a large fraction of recent samples analysed by this technique yields NGTs close to present-day cave temperatures, the interpretation of measured noble gas concentrations in terms of NGTs is not yet feasible using the available least squares fitting models. This is because the noble gas concentrations in stalagmites are not only composed of the two components air and air saturated water (ASW), which these models are able to account for. The observed enrichments in heavy noble gases are interpreted as being due to adsorption during sample preparation in air, whereas the excess in He and Ne is interpreted as an additional noble gas component that is bound in voids in the crystallographic structure of the calcite crystals. As a consequence of our study's findings, NGTs will have to be determined in the future using the concentrations of Ar, Kr and Xe only. This needs to be achieved by further optimizing the sample preparation to minimize atmospheric contamination and to further reduce the amount of noble gases released from air inclusions.
Resumo:
We present a new subcortical structure shape modeling framework using heat kernel smoothing constructed with the Laplace-Beltrami eigenfunctions. The cotan discretization is used to numerically obtain the eigenfunctions of the Laplace-Beltrami operator along the surface of subcortical structures of the brain. The eigenfunctions are then used to construct the heat kernel and used in smoothing out measurements noise along the surface. The proposed framework is applied in investigating the influence of age (38-79 years) and gender on amygdala and hippocampus shape. We detected a significant age effect on hippocampus in accordance with the previous studies. In addition, we also detected a significant gender effect on amygdala. Since we did not find any such differences in the traditional volumetric methods, our results demonstrate the benefit of the current framework over traditional volumetric methods.
Resumo:
We present a new sparse shape modeling framework on the Laplace-Beltrami (LB) eigenfunctions. Traditionally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes by forming a Fourier series expansion. To reduce high frequency noise, only the first few terms are used in the expansion and higher frequency terms are simply thrown away. However, some lower frequency terms may not necessarily contribute significantly in reconstructing the surfaces. Motivated by this idea, we propose to filter out only the significant eigenfunctions by imposing l1-penalty. The new sparse framework can further avoid additional surface-based smoothing often used in the field. The proposed approach is applied in investigating the influence of age (38-79 years) and gender on amygdala and hippocampus shapes in the normal population. In addition, we show how the emotional response is related to the anatomy of the subcortical structures.
Resumo:
Twitter has become a dependable microblogging tool for real time information dissemination and newsworthy events broadcast. Its users sometimes break news on the network faster than traditional newsagents due to their presence at ongoing real life events at most times. Different topic detection methods are currently used to match Twitter posts to real life news of mainstream media. In this paper, we analyse tweets relating to the English FA Cup finals 2012 by applying our novel method named TRCM to extract association rules present in hash tag keywords of tweets in different time-slots. Our system identify evolving hash tag keywords with strong association rules in each time-slot. We then map the identified hash tag keywords to event highlights of the game as reported in the ground truth of the main stream media. The performance effectiveness measure of our experiments show that our method perform well as a Topic Detection and Tracking approach.
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
Resumo:
In the recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.