947 results for Model Mining


Relevance:

30.00%

Abstract:

Arsenic is a class 1 non-threshold carcinogen that is ubiquitous. It undergoes many different transformations (biotic or abiotic) between and within environmental compartments, leading to a number of chemical species with different properties and toxicities. One specific transformation is biotic As volatilization, which is coupled with As biomethylation and has been scarcely studied because of inherent sampling issues. Arsenic methylation/volatilization is also linked with methanogenesis and occurs in anaerobic environments. In China, rice straw and animal manure are very often used to produce biogas, and both can contain high amounts of As, especially if the rice is grown in areas with heavy mining or smelting industries and if Roxarsone is fed to the animals. Roxarsone is an As-containing drug that is widely used in China to control coccidian intestinal parasites, to improve feed efficiency and to promote rapid growth. Previous work has shown that this compound degrades to inorganic As under anaerobic conditions. This study focuses on biotic transformations of As in small microcosms designed as biogas digester models (BDMs), using recently validated As traps that enable direct quantification and identification of volatile As species. It is shown that, although there was a loss of soluble As in the BDMs, their conditions favored biomethylation. All reactors produced volatile As, especially those spiked with monomethylarsonic acid, with 413 ± 148 ng As (mean ± SD, n = 3), which suggests that the first methylation step, from inorganic As, is a limiting factor. The most abundant species was trimethylarsine, but the toxic arsine was present in the headspace of most of the BDMs. The results suggest that volatile As species should be monitored in biogas digesters in order to assess risks to humans working in biogas plants and to those utilizing the biogas.

Relevance:

30.00%

Abstract:

Background: Simple Sequence Repeats (SSRs) are widely used in population genetic studies, but their classical development is costly and time-consuming. The ever-increasing DNA datasets generated by high-throughput techniques offer an inexpensive alternative for SSR discovery. Expressed Sequence Tags (ESTs) have been widely used as an SSR source for plants of economic relevance, but their application to non-model species is still modest. Methods: Here, we explored the use of publicly available ESTs (GenBank at the National Center for Biotechnology Information, NCBI) for SSR development in non-model plants, focusing on genera listed by the International Union for the Conservation of Nature (IUCN). We also searched two model genera with fully annotated genomes, Arabidopsis and Oryza, for EST-SSRs and used them as controls for genome distribution analyses. Overall, we downloaded 16 031 555 sequences for 258 plant genera, which were mined for SSRs and their primers with the help of QDD1. Genome distribution analyses in Oryza and Arabidopsis were done by blasting the SSR-containing sequences against the Oryza sativa and Arabidopsis thaliana reference genomes with the Basic Local Alignment Search Tool (BLAST) on the NCBI website. Finally, we performed an empirical test to determine the performance of our EST-SSRs in a few individuals from four species of two eudicot genera, Trifolium and Centaurea. Results: We explored a total of 14 498 726 EST sequences from the dbEST database (NCBI) in 257 plant genera from the IUCN Red List. We identified a very large number (17 102) of ready-to-test EST-SSRs in most plant genera (193) at no cost. Overall, dinucleotide and trinucleotide repeats were the prevalent types, but the abundance of the various repeat types differed between taxonomic groups. Control genomes revealed that trinucleotide repeats were mostly located in coding regions, while dinucleotide repeats were largely associated with untranslated regions. The empirical test revealed considerable amplification success and transferability between congeners. Conclusions: The present work represents the first large-scale study developing SSRs from publicly accessible EST databases in threatened plants. We provide a very large number of ready-to-test EST-SSRs (17 102) for 193 genera. The cross-species transferability suggests that the number of possible target species would be large. Since trinucleotide repeats are abundant and mainly linked to exons, they might be useful in evolutionary and conservation studies. Altogether, our study strongly supports the use of EST databases as an extremely affordable and fast alternative for SSR development in threatened plants.

Relevance:

30.00%

Abstract:

Twenty production blasts in two open pit mines were monitored, in rocks with medium to very high strength. Three different blasting agents (ANFO, watergel and emulsion blend) were used, with powder factors ranging between 0.88 and 1.45 kg/m³. The excavators were front loaders and rope shovels. Mechanical properties of the rock, blasting characteristics and mucking rates were carefully measured. From these measurements, a model for calculating excavator productivity is developed, in which the production rate is the product of an ideal, maximum productivity rate and an operating efficiency. The maximum rate is a function of the dipper capacity, and the efficiency is a function of rock density, rock strength, and the explosive energy concentration in the rock. The model is statistically significant and explains up to 92% of the variance of the production rate measurements.
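
The multiplicative structure described in this abstract can be written compactly as follows. All symbols are assumptions introduced here for illustration; the abstract names the dependencies but gives no functional forms, so none are filled in.

```latex
% P      : production rate of the excavator
% P_max  : ideal, maximum productivity rate (depends on dipper capacity C_d)
% \eta   : operating efficiency (depends on rock density \rho,
%          rock strength \sigma, and explosive energy concentration e)
P = P_{\max}(C_d) \cdot \eta(\rho, \sigma, e)
```

Since the maximum rate is described as an ideal upper limit, the efficiency would be expected to satisfy \eta \le 1, although the abstract does not state this explicitly.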

Relevance:

30.00%

Abstract:

Due to recent scientific and technological advances in information systems, it is now possible to run almost every application on a mobile device. The need to make such devices more intelligent opens an opportunity to design data mining algorithms that are able to execute autonomously on local devices and provide them with knowledge. The problem behind autonomous mining is the proper configuration of the algorithm so that it produces the most appropriate results. Contextual information, together with resource information of the device, has a strong impact both on the feasibility of a particular execution and on the production of proper patterns. On the other hand, the performance of the algorithm, expressed in terms of efficacy and efficiency, depends heavily on the features of the dataset to be analyzed and on the parameter values of a particular implementation of the algorithm. However, few existing approaches deal with the autonomous configuration of data mining algorithms, and none of them deals with contextual or resource information. Both issues are especially significant for social network applications. In fact, the widespread use of social networks, and consequently the amount of information shared, has made modeling context in social applications a priority. Resource consumption also plays a crucial role in such platforms, as users access social networks mainly on their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on (i) analyzing the behavior of algorithms, (ii) mapping contextual and resource information to the most appropriate configuration, and (iii) applying the model to the case of a social recommender. Four main contributions are presented:
- The EE-Model predicts the behavior of a data mining algorithm in terms of the resources consumed and the accuracy of the mining model it will obtain.
- The SC-Mapper maps a situation defined by the context and resource state to a data mining configuration.
- SOMAR is a social activity (events and informal goings-on) recommender for mobile devices.
- D-SOMAR is an evolution of SOMAR that incorporates the configurator in order to provide updated recommendations.
Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.

Relevance:

30.00%

Abstract:

We review Web Mining techniques and describe a Bootstrap Statistics methodology applied to the optimization and verification of pattern model classifiers in Supervised Learning, for the management of a Tour-Guide Robot knowledge repository. It is virtually impossible to thoroughly test Web Page Classifiers and many other Internet Applications with purely empirical data, because human intervention is needed to generate training sets and test sets. We propose using the computer-based Bootstrap paradigm to design a test environment in which such classifiers can be checked more reliably.
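
As a rough illustration of how bootstrap resampling can stand in for additional hand-labelled test sets, the sketch below estimates a percentile confidence interval for a classifier's accuracy from a single labelled sample. It is a generic percentile bootstrap, not the methodology of the paper; the function name `bootstrap_accuracy` and its parameters are hypothetical.

```python
import random


def bootstrap_accuracy(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) for classification accuracy.

    Generic percentile bootstrap; all names here are illustrative assumptions.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    rng = random.Random(seed)
    n = len(y_true)
    # 1 where the prediction is correct, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    estimates = []
    for _ in range(n_resamples):
        # Resample the n outcomes with replacement and recompute accuracy.
        resample = [correct[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(correct) / n, lower, upper


if __name__ == "__main__":
    truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    preds = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
    print(bootstrap_accuracy(truth, preds))
```

The interval width gives a sense of how much the measured accuracy could vary if a different labelled sample of the same size had been drawn.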

Relevance:

30.00%

Abstract:

The study examines the Capital Asset Pricing Model (CAPM) for the mining sector using weekly stock returns from 27 companies traded on the New York Stock Exchange (NYSE) or on the London Stock Exchange (LSE) for the period from December 2008 to December 2010. The results support the use of the CAPM for the allocation of risk to companies. Most companies involved in precious metals (particularly gold), which have beta values below unity (Table 1), acted as safe-haven assets during the financial crisis. The R² values do not show strong explanatory power for the fitted models (R² < 70%). The estimated beta coefficients alone are not sufficient to determine the expected returns on securities, but the tests conducted on the sample data for the period analysed do not clearly reject the CAPM.
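
For readers unfamiliar with how a CAPM beta and its R² are obtained, the sketch below fits the usual single-factor regression of an asset's excess return on the market's excess return by ordinary least squares. It is a textbook formulation, not the study's own procedure; the function name `capm_fit` and the use of excess returns (returns minus a risk-free rate) are assumptions.

```python
def capm_fit(asset_excess, market_excess):
    """Fit asset = alpha + beta * market by OLS; return (alpha, beta, r_squared).

    Inputs are sequences of periodic excess returns (an assumption here).
    """
    n = len(asset_excess)
    if n != len(market_excess) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_a = sum(asset_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_excess, market_excess))
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    ss_tot = sum((a - mean_a) ** 2 for a in asset_excess)
    if var_m == 0 or ss_tot == 0:
        raise ValueError("returns must not be constant")
    beta = cov / var_m                     # slope: sensitivity to the market
    alpha = mean_a - beta * mean_m         # intercept
    ss_res = sum((a - (alpha + beta * m)) ** 2
                 for a, m in zip(asset_excess, market_excess))
    return alpha, beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    market = [0.01, -0.02, 0.03, 0.0, 0.015]
    asset = [0.001 + 0.5 * m for m in market]  # synthetic low-beta asset
    print(capm_fit(asset, market))
```

A beta below one, as in the synthetic example, means the asset moves less than proportionally with the market, which is the property the abstract reports for most precious-metal companies.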

Relevance:

30.00%

Abstract:

Ubiquitous computing software needs to be autonomous, so that essential decisions, such as how to configure a particular execution, are self-determined. Moreover, data mining plays an important role in ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states, as well as parameter settings, on data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on decision tree classifiers. We also define a taxonomy of data mining quality, so that the tradeoff between prediction accuracy and classification specificity of each behavior model, each of which classifies by a different abstraction of quality, can be scored for model selection. Behavior model constituents and class label transformations are formally defined, and the proposed approach is validated experimentally.

Relevance:

30.00%

Abstract:

In ubiquitous data stream mining applications, different devices often aim to learn concepts that are similar to some extent. In these applications, such as spam filtering or news recommendation, the concept underlying the data stream (e.g., interesting mail/news) is likely to change over time. Therefore, the resulting model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that exploits the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method in which the classifiers are selected and weighted based on their local accuracy for different partitions of the feature space. We evaluate the classification accuracy of Coll-Stream under concept drift, noise, varying partition granularity, and varying concept similarity in relation to the local underlying concept. The experimental results show that the resulting Coll-Stream model achieves stability and accuracy in a variety of situations, using both synthetic and real-world datasets.
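
The idea of weighting ensemble members by their local accuracy in each region of the feature space can be sketched as below. This is a simplified illustration and not the Coll-Stream algorithm itself: the class name `PartitionWeightedEnsemble`, the user-supplied `partition_of` function, and the Laplace-smoothed accuracy weights are all assumptions.

```python
from collections import defaultdict


class PartitionWeightedEnsemble:
    """Weighted vote where each member's weight is its accuracy in the
    feature-space region that the query point falls into (illustrative only)."""

    def __init__(self, classifiers, partition_of):
        self.classifiers = classifiers    # list of callables: x -> label
        self.partition_of = partition_of  # callable: x -> hashable region id
        # (classifier index, region) -> [correct predictions, total seen]
        self.stats = defaultdict(lambda: [0, 0])

    def update(self, x, y):
        """Record how each member performs on a newly labelled local example."""
        region = self.partition_of(x)
        for i, clf in enumerate(self.classifiers):
            entry = self.stats[(i, region)]
            entry[0] += int(clf(x) == y)
            entry[1] += 1

    def weight(self, i, region):
        """Laplace-smoothed local accuracy; 0.5 when nothing has been seen."""
        correct, total = self.stats[(i, region)]
        return (correct + 1) / (total + 2)

    def predict(self, x):
        region = self.partition_of(x)
        votes = defaultdict(float)
        for i, clf in enumerate(self.classifiers):
            votes[clf(x)] += self.weight(i, region)
        return max(votes, key=votes.get)


if __name__ == "__main__":
    members = [lambda x: int(x > 0), lambda x: 0]
    ens = PartitionWeightedEnsemble(members, lambda x: "pos" if x > 0 else "neg")
    for x in (1.0, 2.0, 4.0):
        ens.update(x, 1)
    print(ens.predict(3.0))
```

Because the statistics are updated with each new labelled example, a member that stops matching the local concept gradually loses weight in the affected regions, which is one simple way to react to concept drift.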

Relevance:

30.00%

Abstract:

Diabetes is the most common disease nowadays in all populations and in all age groups. It contributes to heart disease and increases the risk of developing kidney disease, blindness, nerve damage, and blood vessel damage. Diagnosing diabetes through proper interpretation of diabetes data is an important classification problem. Different artificial intelligence techniques have been applied to the diabetes problem. The purpose of this study is to apply artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining (DM) technique for the diagnosis of diabetes. The Pima Indians diabetes data set was used to test the proposed AMMLP model. The results obtained by AMMLP were compared with those of a decision tree (DT), a Bayesian classifier (BC) and other algorithms recently proposed by other researchers that were applied to the same database. The robustness of the algorithms is examined using classification accuracy, analysis of sensitivity and specificity, and the confusion matrix. The results obtained by AMMLP are superior to those obtained by DT and BC.
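
The evaluation measures named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, sensitivity, specificity."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1          # sick and flagged as sick
            else:
                fp += 1          # healthy but flagged as sick
        else:
            if truth == positive:
                fn += 1          # sick but missed
            else:
                tn += 1          # healthy and cleared
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else nan,  # true negative rate
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, preds))
```

Reporting sensitivity and specificity alongside accuracy matters for diagnostic problems because a classifier can reach high accuracy on an imbalanced data set while still missing many positive cases.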

Relevance:

30.00%

Abstract:

A number of factors contribute to the success of dental implant operations. Among them is the choice of the location in which the prosthetic tooth is to be implanted. This project offers a new approach to analysing jaw tissue for the purpose of selecting suitable locations for dental implant operations. The application developed takes as input a computed tomography stack of slices of the jaw and trims the data outside the jaw area, which is the region of interest. It then reconstructs a three-dimensional model of the jaw, highlighting points of interest on the reconstructed model. In addition, data mining techniques have been used to construct a prediction model based on a dataset of previous dental implant operations with observed stability values. The goal is to find patterns within the dataset that would help predict the likelihood of success of an implant.

Relevance:

30.00%

Abstract:

Objective: The main purpose of this research is the novel use of artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining tool for predicting the outcome of patients with acquired brain injury (ABI) after cognitive rehabilitation. The final goal is to increase knowledge in the field of rehabilitation theory based on cognitive affectation. Methods and materials: The data set used in this study contains records of 123 ABI patients with moderate to severe cognitive affectation (according to the Glasgow Coma Scale) who underwent rehabilitation at the Institut Guttmann Neurorehabilitation Hospital (IG) using the tele-rehabilitation platform PREVIRNEC©. The variables included in the analysis comprise the initial neuropsychological evaluation of the patient (cognitive affectation profile), the results of the rehabilitation tasks performed by the patient in PREVIRNEC©, and the outcome of the patient after 3–5 months of treatment. To predict the treatment outcome, we apply and compare three different data mining techniques: the AMMLP model, a backpropagation neural network (BPNN) and a C4.5 decision tree. Results: The prediction performance of the models was measured by ten-fold cross-validation, and several architectures were tested. The results obtained by the AMMLP model are clearly superior, with an average predictive performance of 91.56%. The BPNN and C4.5 models have average prediction accuracies of 80.18% and 89.91%, respectively. The best single AMMLP model provided a specificity of 92.38%, a sensitivity of 91.76% and a prediction accuracy of 92.07%. Conclusions: The prediction model proposed in this study makes it possible to increase knowledge about the factors contributing to the recovery of an ABI patient and to estimate treatment efficacy in individual patients. The ability to predict treatment outcomes may provide new insights toward improving effectiveness and creating personalized therapeutic interventions based on clinical evidence.
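
Ten-fold cross-validation, the evaluation protocol mentioned in this abstract, can be sketched as follows. This is a generic, unstratified version written for illustration; the helper names `k_fold_indices` and `cross_validate` and the `fit`/`predict` callables are assumptions, not the study's code.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(fit, predict, X, y, k=10, seed=0):
    """Average accuracy over k folds; fit(X, y) -> model, predict(model, x) -> label."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    # With 123 records (the data set size in the abstract), folds hold 12 or 13.
    print(sorted(len(test) for _, test in k_fold_indices(123)))
```

Each record is used for testing exactly once and for training in the remaining nine folds, so the averaged score uses all of the data without evaluating a model on examples it was trained on.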

Relevance:

30.00%

Abstract:

This paper presents a new method to extract knowledge from existing data sets, that is, to extract symbolic rules using the weights of an Artificial Neural Network. The method has been applied to a neural network with a special architecture named Enhanced Neural Network (ENN). This architecture improves on the results obtained with a multilayer perceptron (MLP). The relationship among the knowledge stored in the weights, the performance of the network and the newly implemented algorithm for acquiring rules from the weights is explained. The method itself provides a model to follow for knowledge acquisition with ENN.

Relevance:

30.00%

Abstract:

In the last few years there has been heightened interest in data treatment and analysis with the aim of discovering hidden knowledge and eliciting relationships and patterns within the data. Data mining techniques (also known as Knowledge Discovery in Databases) have been applied over a wide range of fields such as marketing, investment, fraud detection, manufacturing, telecommunications and health. In this study, well-known data mining techniques such as artificial neural networks (ANN), genetic programming (GP), forward selection linear regression (LR) and k-means clustering are proposed to the health and sports community in order to aid with resistance training prescription. Appropriate resistance training prescription is effective for developing fitness and health and for enhancing general quality of life. Resistance exercise intensity is commonly prescribed as a percentage of the one repetition maximum. The 1RM (one repetition maximum, also called dynamic muscular strength or one execution maximum) is operationally defined as the heaviest load that can be moved over a specific range of motion, one time and with correct performance. The safety of the 1RM assessment has been questioned, as such an enormous effort may lead to muscular injury. Prediction equations could help to tackle the problem by predicting the 1RM from submaximal loads, in order to avoid, or at least reduce, the associated risks. We built different models from data on 30 men who performed up to 5 sets to exhaustion at different percentages of the 1RM in the bench press, until reaching their actual 1RM. A comparison of different existing prediction equations is also carried out. The LR model seems to outperform the ANN and GP models for 1RM prediction in the range between 1 and 10 repetitions. At 75% of the 1RM some subjects (n = 5) could perform 13 repetitions with proper technique in the bench press, whilst other subjects (n = 20) performed significantly (p < 0.05) more repetitions at 70% than at 75% of their actual 1RM. The rate of perceived exertion (RPE) seems not to be a good predictor of the 1RM when all the sets are performed until exhaustion, as no significant differences (at the 0.05 significance level) were found in the RPE at 75%, 80% and 90% of the 1RM. Also, years of experience and weekly hours of strength training are better correlated with the 1RM (p < 0.05) than body weight. The O'Connor et al. 1RM prediction equation seems to emerge from the data gathered and seems to be the most accurate of the 1RM prediction equations proposed in the literature and used in this study. Epley's 1RM prediction equation is reproduced by means of data simulation from 1RM equations in the literature. Finally, future lines of research are proposed on the problem of 1RM prediction by means of genetic algorithms, neural networks and clustering techniques.
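
For reference, the two prediction equations named in this abstract are usually given in the strength-training literature in the forms sketched below. The coefficients come from that general literature and not from this abstract, so they should be treated as an assumption here; the function names are hypothetical.

```python
def epley_1rm(weight, reps):
    """Epley estimate as commonly cited: 1RM = w * (1 + r / 30)."""
    _check(weight, reps)
    return weight * (1 + reps / 30)


def oconnor_1rm(weight, reps):
    """O'Connor et al. estimate as commonly cited: 1RM = w * (1 + 0.025 * r)."""
    _check(weight, reps)
    return weight * (1 + 0.025 * reps)


def _check(weight, reps):
    # Both formulas extrapolate from a submaximal set performed to exhaustion.
    if weight <= 0 or reps < 1:
        raise ValueError("weight must be positive and reps at least 1")


if __name__ == "__main__":
    # A set of 10 repetitions to exhaustion with a 100 kg load.
    print(round(epley_1rm(100, 10), 1), round(oconnor_1rm(100, 10), 1))
```

Both equations are linear in the number of repetitions and differ only in slope (1/30 versus 0.025 per repetition), so their estimates diverge as the number of repetitions grows.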