906 results for Decision tree


Relevance: 60.00%

Abstract:

Academic and industrial research in the late 1990s brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists extract patterns, trends and links from this ever-deepening ocean of information. Two such systems, aimed at retrieving and subsequently utilizing phylogenetically relevant information, were developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process.

Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (one that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. The second chapter of this dissertation describes an intelligent compromise algorithm that relies on a flexible "beam" search principle from the Artificial Intelligence domain and uses pre-computed local topology reliability information to adjust the beam search space continuously.

However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question, either method might prove superior. It is therefore difficult (even for an expert) to tell a priori which phylogenetic reconstruction method (distance-based, ML, or perhaps maximum parsimony (MP)) should be chosen for any particular data set. A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically "difficult" data set, more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (a potentially costly mistake, both in terms of computational expense and in terms of reconstruction accuracy).

Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects the proper method automatically. It uses a classifier (a decision-tree-inducing algorithm) to map a new data set to the appropriate phylogenetic reconstruction method.
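
The flexible beam search principle mentioned above can be sketched generically. The following Python version is illustrative only: the `expand`/`score` callbacks, the beam width, and the toy bit-string task are assumptions, not the dissertation's topology-space implementation.

```python
import heapq

def beam_search(start, expand, score, beam_width=3, steps=10):
    """Generic beam search: keep only the `beam_width` best partial
    solutions at each step, pruning the rest of the search space."""
    beam = [start]
    for _ in range(steps):
        candidates = []
        for state in beam:
            candidates.extend(expand(state))
        if not candidates:
            break
        # retain only the top-scoring candidates for the next round
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

# toy example: grow a 5-character bit string, scoring by the number of 1s
best = beam_search(
    start="",
    expand=lambda s: [s + "0", s + "1"] if len(s) < 5 else [],
    score=lambda s: s.count("1"),
    beam_width=2,
    steps=5,
)
print(best)  # "11111"
```

Widening `beam_width` trades speed for completeness, which is the compromise the chapter adjusts dynamically using topology reliability information.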

Relevance: 60.00%

Abstract:

The Centers for Disease Control and Prevention (CDC) estimates that more than 2 million patients annually acquire an infection while hospitalized in U.S. hospitals for other health problems, and that 88,000 die as a direct or indirect result of these infections. Infection with Clostridium difficile is the most important cause of health-care-associated infectious diarrhea in industrialized countries. The purpose of this study was to explore the cost of the current treatment practice of beginning empiric metronidazole treatment for hospitalized patients with diarrhea prior to identification of an infectious agent. The records of 70 hospitalized patients were retrospectively analyzed to determine the pharmacologic treatment, laboratory testing, and radiographic studies ordered, and the median cost of each was determined. All patients in the study were tested for C. difficile and concurrently started on empiric metronidazole. The median direct cost for metronidazole was $7.25 per patient (95% CI 5.00, 12.721). The median direct cost for laboratory charges was $468.00 (95% CI 339.26, 552.58), and for radiology the median direct cost was $970.00 (95% CI 738.00, 3406.91). Indirect costs, which are far greater than direct costs, were not studied. At St. Luke's, if every hospitalized patient with diarrhea were empirically treated with metronidazole at a median cost of $7.25, the annual direct cost is estimated to be over $9,000.00, plus uncalculated indirect costs. In the U.S., the estimated annual direct cost may be as much as $21,750,000.00, plus indirect costs.

An unexpected and significant finding of this study was the inconsistency in testing and treatment of patients with health-care-associated diarrhea. A best-practice model for C. difficile testing and treatment was not found in the literature review. In addition to the cost savings gained by not routinely beginning empiric treatment with metronidazole, significant savings and improvements in patient care may result from a more consistent approach to the diagnosis and treatment of all patients with health-care-associated diarrhea. A decision tree model for C. difficile testing and treatment is proposed, but further research is needed to evaluate the decision arms before a validated best-practice model can be proposed.
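
As an illustration of how the arms of such a decision tree might be costed, the sketch below compares the expected direct drug cost of two hypothetical arms, using the study's $7.25 median metronidazole cost; the test cost and C. difficile positivity rate are invented for the example and are not from the study.

```python
def expected_cost(p_cdiff, cost_test, cost_treat):
    """Expected direct cost per patient for two hypothetical decision arms:
    (a) treat every patient empirically; (b) test first, treat only positives."""
    treat_all = cost_treat                         # every patient gets the drug
    test_first = cost_test + p_cdiff * cost_treat  # only positives are treated
    return treat_all, test_first

# $7.25 is the study's median metronidazole cost; the positivity
# rate (0.15) and test cost ($50) below are illustrative assumptions.
treat_all, test_first = expected_cost(p_cdiff=0.15, cost_test=50.0, cost_treat=7.25)
print(round(treat_all, 2), round(test_first, 2))
```

With these invented inputs the testing arm is dominated by the test cost, mirroring the abstract's observation that laboratory charges far exceed the drug cost; a validated model would need real probabilities for each decision arm.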

Relevance: 60.00%

Abstract:

The Wadden Sea is located in the southeastern part of the North Sea, forming an extended intertidal area along the Dutch, German and Danish coasts. It is a highly dynamic and largely natural ecosystem influenced by climatic changes and anthropogenic use of the North Sea. Changes in the environment of the Wadden Sea, whether of natural or anthropogenic origin, cannot be monitored by standard measurement methods alone, because large-area surveys of the intertidal flats are often difficult due to tides, tidal channels and unstable ground. For this reason, remote sensing offers effective monitoring tools. In this study a multi-sensor concept for classification of intertidal areas in the Wadden Sea has been developed. The basis for this method is a combined analysis of RapidEye (RE) and TerraSAR-X (TSX) satellite data, coupled with ancillary vector data on the distribution of vegetation, mussel beds and sediments. The classification of vegetation and mussel beds is based on a decision tree and a set of hierarchically structured algorithms which use object and texture features. The sediments are classified by an algorithm which uses thresholds and a majority filter. Further improvements focus on radiometric enhancement and atmospheric correction. First results show that vegetation and mussel beds can be identified with multi-sensor remote sensing. The classification of the sediments in the tidal flats is a challenge compared to vegetation and mussel beds. The results demonstrate that the sediments cannot be classified with high accuracy by their spectral properties alone, due to a similarity that is predominantly caused by their water content.
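
The majority filter used in the sediment classification step is a standard post-classification smoother. The following is a minimal pure-Python sketch; the window size and labels are illustrative, not the study's parameters.

```python
from collections import Counter

def majority_filter(labels, size=3):
    """Replace each cell of a 2-D class-label grid with the most common
    label in its size x size neighborhood (edges use truncated windows)."""
    h, w = len(labels), len(labels[0])
    r = size // 2
    out = [row[:] for row in labels]
    for i in range(h):
        for j in range(w):
            window = [labels[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = Counter(window).most_common(1)[0][0]
    return out

# a lone "mud" pixel inside a "sand" patch is smoothed away
grid = [["sand"] * 3, ["sand", "mud", "sand"], ["sand"] * 3]
print(majority_filter(grid)[1][1])  # "sand"
```

Such a filter suppresses isolated misclassified pixels, which is useful when, as here, spectrally similar sediment classes produce speckled raw classifications.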

Relevance: 60.00%

Abstract:

To deliver sample estimates with the probability foundation necessary to permit generalization from the sample data subset to the whole target population being sampled, probability sampling strategies are required to satisfy the following necessary, but not sufficient, conditions: (i) all inclusion probabilities must be greater than zero in the target population to be sampled (if some sampling units have an inclusion probability of zero, then a map accuracy assessment does not represent the entire target region depicted in the map to be assessed); (ii) the inclusion probabilities must be (a) knowable for nonsampled units and (b) known for those units selected in the sample: since the inclusion probability determines the weight attached to each sampling unit in the accuracy estimation formulas, if the inclusion probabilities are unknown, so are the estimation weights. This original work presents a novel (to the best of these authors' knowledge, the first) probability sampling protocol for quality assessment and comparison of thematic maps generated from spaceborne/airborne Very High Resolution (VHR) images, where: (I) an original Categorical Variable Pair Similarity Index (CVPSI, proposed in two different formulations) is estimated as a fuzzy degree of match between a reference and a test semantic vocabulary, which may not coincide, and (II) both symbolic pixel-based thematic quality indicators (TQIs) and sub-symbolic object-based spatial quality indicators (SQIs) are estimated with a degree of uncertainty in measurement, in compliance with the well-known Quality Assurance Framework for Earth Observation (QA4EO) guidelines. Like a decision tree, any protocol (guidelines for best practice) comprises a set of rules, equivalent to structural knowledge, and an order of presentation of the rule set, known as procedural knowledge. The combination of these two levels of knowledge makes an original protocol worth more than the sum of its parts.

The several degrees of novelty of the proposed probability sampling protocol are highlighted in this paper, at the levels of understanding of both structural and procedural knowledge, in comparison with related multi-disciplinary works selected from the existing literature. In the experimental session the proposed protocol is tested for accuracy validation of preliminary classification maps automatically generated by the Satellite Image Automatic Mapper™ (SIAM™) software product from two WorldView-2 images and one QuickBird-2 image provided by DigitalGlobe for testing purposes. In these experiments, the collected TQIs and SQIs are statistically valid, statistically significant, consistent across maps, and in agreement with theoretical expectations, visual (qualitative) evidence and quantitative quality indexes of operativeness (OQIs) claimed for SIAM™ by related papers. As a subsidiary conclusion, the statistically consistent and statistically significant accuracy validation of the SIAM™ pre-classification maps proposed in this contribution, together with the OQIs claimed for SIAM™ by related works, makes the operational (automatic, accurate, near real-time, robust, scalable) SIAM™ software product eligible for opening up new inter-disciplinary research and market opportunities, in accordance with the visionary goal of the Global Earth Observation System of Systems (GEOSS) initiative and the QA4EO international guidelines.

Relevance: 60.00%

Abstract:

This doctoral thesis falls within the field of membrane computing, a bio-inspired model of computation based on the cells of living organisms, in which multiple reactions take place simultaneously. From the structure and operation of cells, different formal models, called P systems, have been defined. These models do not attempt to model the biological behavior of a cell; rather, they abstract its basic principles in order to find new computational paradigms. P systems are non-deterministic, massively parallel models of computation, hence the interest they have attracted in recent years for solving complex problems: in many cases they can, in theory, solve NP-complete problems in polynomial or linear time. Membrane computing has also been applied in research in many other fields, above all those related to biology. At present, a great many of these computational models have been studied from a theoretical point of view; however, how they can be implemented remains an open research challenge. Several lines of work exist in this direction, based on distributed architectures or on dedicated hardware, which aim to come as close as possible to their non-deterministic and massively parallel character within a context of viability and efficiency.

This thesis proposes a static analysis of the P system as a way to optimize its execution on such platforms. The information gathered at analysis time is intended to configure the platform on which the P system will subsequently run, improving performance as a result. Specifically, the thesis takes transition P systems as the reference model for the static analysis. More concretely, the proposed static analysis aims to let each membrane determine its active rules efficiently at each evolution step, that is, the rules that meet the conditions required for application. Along these lines, the thesis addresses the problem of the usefulness states of a given membrane, which at run time allow the membrane to know at every moment the membranes with which it can communicate, a question that determines which rules can be applied. The static analysis also draws on other characteristics of the P system, such as the membrane structure, rule antecedents, rule consequents, and priorities. Once all this information has been obtained at analysis time, it is structured as a decision tree, so that at run time the membrane can obtain its active rules as efficiently as possible.

The thesis also surveys a considerable number of hardware and software architectures that different authors have proposed for implementing P systems: essentially, distributed architectures, dedicated hardware based on FPGA boards, and platforms based on PIC microcontrollers. The objective is to propose solutions that allow the results of the static analysis (usefulness states and decision trees for active rules) to be deployed on these architectures. In general, the conclusions are positive, in the sense that these optimizations integrate well into the architectures without significant penalties.
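
The active-rule test that the decision tree accelerates reduces, in its simplest form, to checking that a rule's antecedent multiset is contained in the membrane's current multiset of objects. The sketch below is a simplification that ignores usefulness states, target membranes and priorities; rule names and objects are invented.

```python
from collections import Counter

def active_rules(contents, rules):
    """Return the names of the transition-P-system rules whose antecedent
    multiset is contained in the membrane's current multiset of objects
    (a simplified sketch: real systems also check targets and priorities)."""
    avail = Counter(contents)
    return [name for name, antecedent in rules
            if all(avail[obj] >= n for obj, n in Counter(antecedent).items())]

# membrane holding objects a, a, b; r1 needs "aa", r2 needs "ab", r3 needs "bb"
rules = [("r1", "aa"), ("r2", "ab"), ("r3", "bb")]
print(active_rules("aab", rules))  # ['r1', 'r2']
```

Precomputing a decision tree over such conditions, as the thesis proposes, lets a membrane avoid re-testing every rule against its full contents at every evolution step.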

Relevance: 60.00%

Abstract:

Acquired brain injury (ABI) is one of the leading causes of death and disability in the world and is associated with high health care costs as a result of the acute treatment and long-term rehabilitation involved. Different algorithms and methods have been proposed to predict the effectiveness of rehabilitation programs. In general, research has focused on predicting the overall improvement of patients with ABI. The purpose of this study is the novel application of data mining (DM) techniques to predict the outcomes of cognitive rehabilitation in patients with ABI. We generate three predictive models that allow us to obtain new knowledge with which to evaluate and improve the effectiveness of the cognitive rehabilitation process. A decision tree (DT), a multilayer perceptron (MLP) and a general regression neural network (GRNN) were used to construct the prediction models. Ten-fold cross-validation was carried out to test the algorithms, using the patient database of the Institut Guttmann Neurorehabilitation Hospital (IG). The performance of the models was assessed through specificity, sensitivity and accuracy analysis and confusion matrix analysis. The experimental results obtained by the DT are clearly superior, with an average prediction accuracy of 90.38%, while the MLP and GRNN obtained 78.7% and 75.96%, respectively. This study increases our knowledge of the factors contributing to the recovery of ABI patients and helps to estimate treatment efficacy in individual patients.
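
The specificity, sensitivity and accuracy analysis mentioned above derives directly from the confusion matrix. A minimal sketch, with invented counts rather than the study's data:

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy

# illustrative counts only, not the study's confusion matrix
sens, spec, acc = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(sens, spec, acc)  # 0.9 0.9 0.9
```

Reporting all three, as the study does, guards against a model that achieves high accuracy simply by favoring the majority class.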

Relevance: 60.00%

Abstract:

Ubiquitous computing software needs to be autonomous, so that essential decisions, such as how to configure its particular execution, are self-determined. Moreover, data mining serves an important role in ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states, as well as parameter settings, on data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on the decision tree classifier. We also define a taxonomy on data mining quality, so that the tradeoff between prediction accuracy and classification specificity of each behavior model (each of which classifies by a different abstraction of quality) is scored for model selection. Behavior model constituents and class label transformations are formally defined, and experimental validation of the proposed approach is performed.
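
The accuracy-versus-specificity tradeoff used for model selection could be scored, for example, with a weighted sum. The scoring rule, weight, and numbers below are illustrative assumptions, not the paper's definition.

```python
def select_model(models, alpha=0.5):
    """Pick the behavior model with the best weighted tradeoff between its
    prediction accuracy and the specificity of its quality labels (here
    proxied by the number of distinct classes, normalized to [0, 1]).
    `models` holds (name, accuracy, num_classes) tuples."""
    max_classes = max(k for _, _, k in models)
    def score(model):
        _, acc, k = model
        return alpha * acc + (1 - alpha) * (k / max_classes)
    return max(models, key=score)[0]

# coarser label taxonomies are easier to predict but less informative
models = [("coarse", 0.95, 2), ("medium", 0.85, 4), ("fine", 0.70, 8)]
print(select_model(models))  # "fine"
```

Varying `alpha` shifts the selection between easy-to-predict coarse models and informative fine-grained ones, which is the tradeoff the taxonomy is meant to expose.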

Relevance: 60.00%

Abstract:

Diabetes is nowadays one of the most common diseases, across all populations and age groups. Diabetes contributes to heart disease and increases the risk of developing kidney disease, blindness, nerve damage, and blood vessel damage. Diabetes diagnosis via proper interpretation of diabetes data is an important classification problem, and different artificial intelligence techniques have been applied to it. The purpose of this study is to apply artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining (DM) technique for diabetes diagnosis. The Pima Indians diabetes dataset was used to test the proposed AMMLP model. The results obtained by AMMLP were compared with a decision tree (DT), a Bayesian classifier (BC) and other algorithms recently proposed by other researchers and applied to the same database. The robustness of the algorithms was examined using classification accuracy, sensitivity and specificity analysis, and the confusion matrix. The results obtained by AMMLP are superior to those obtained by the DT and BC.

Relevance: 60.00%

Abstract:

Objective: The main purpose of this research is the novel use of artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining tool for predicting the outcome of patients with acquired brain injury (ABI) after cognitive rehabilitation. The final goal is to increase knowledge in the field of rehabilitation theory based on cognitive affectation.

Methods and materials: The data set used in this study contains records belonging to 123 ABI patients with moderate to severe cognitive affectation (according to the Glasgow Coma Scale) who underwent rehabilitation at the Institut Guttmann Neurorehabilitation Hospital (IG) using the tele-rehabilitation platform PREVIRNEC©. The variables included in the analysis comprise the initial neuropsychological evaluation of the patient (cognitive affectation profile), the results of the rehabilitation tasks performed by the patient in PREVIRNEC© and the outcome of the patient after 3-5 months of treatment. To predict the treatment outcome, we apply and compare three different data mining techniques: the AMMLP model, a backpropagation neural network (BPNN) and a C4.5 decision tree.

Results: The prediction performance of the models was measured by ten-fold cross-validation, and several architectures were tested. The results obtained by the AMMLP model are clearly superior, with an average predictive performance of 91.56%; the BPNN and C4.5 models have average prediction accuracies of 80.18% and 89.91%, respectively. The best single AMMLP model provided a specificity of 92.38%, a sensitivity of 91.76% and a prediction accuracy of 92.07%.

Conclusions: The prediction model presented in this study increases our knowledge of the factors contributing to the recovery of ABI patients and helps to estimate treatment efficacy in individual patients. The ability to predict treatment outcomes may provide new insights toward improving effectiveness and creating personalized therapeutic interventions based on clinical evidence.

Relevance: 60.00%

Abstract:

The emergence of new horizons in the field of travel assistance management is leading to the development of cutting-edge systems focused on improving existing ones. New opportunities are also being presented, since systems tend to be more reliable and autonomous. In this paper, a self-learning embedded system for object identification based on adaptive-cooperative dynamic approaches is presented for intelligent sensor infrastructures. The proposed system is able to detect and identify moving objects using a dynamic decision tree, combining machine learning algorithms and cooperative strategies to make the system more adaptive to changing environments. The proposed system may therefore be very useful for many applications, such as shadow tolls (since several types of vehicles may be distinguished), parking optimization systems, and systems for improving traffic conditions.
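
A decision tree over simple geometric features might look like the static toy version below; the thresholds, features and class names are invented, and the paper's tree additionally adapts and cooperates at run time rather than being hard-coded.

```python
def classify_vehicle(length_m, height_m):
    """Toy static decision tree over two geometric features; the thresholds
    are illustrative assumptions (a real system would learn and adapt them)."""
    if length_m < 4.5:          # short vehicles
        return "car"
    if height_m < 2.0:          # long but low
        return "van"
    return "truck_or_bus"       # long and tall

print(classify_vehicle(3.9, 1.5))   # "car"
print(classify_vehicle(12.0, 3.5))  # "truck_or_bus"
```

Making such a tree "dynamic", as the paper proposes, means revising the split thresholds or structure as new labeled observations arrive from cooperating sensors.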

Relevance: 60.00%

Abstract:

The energy demand for operating Information and Communication Technology (ICT) systems has been growing, implying high operational costs and a consequent increase in carbon emissions. In both datacenters and telecom infrastructures, the networks account for a significant share of energy spending. There is therefore an increased demand for energy efficiency solutions, and several capabilities to save energy have been proposed. However, it is very difficult to orchestrate such energy efficiency capabilities, i.e., to coordinate or combine them in the same network, ensuring conflict-free operation and choosing the best one for a given scenario, while ensuring that a capability not suited to the current bandwidth utilization will not be applied and lead to congestion or packet loss. Nor does the literature offer a way to do this while taking business directives into account. In this regard, a method able to orchestrate different energy efficiency capabilities is proposed, considering the possible combinations and conflicts among them, as well as the best option for a given bandwidth utilization and set of network characteristics. In the proposed method, the business policies specified in a high-level interface are refined down to the network level in order to bring high-level directives into the operation, and a utility function is used to combine energy efficiency and performance requirements. A decision tree able to determine what to do in each scenario is deployed in a Software-Defined Networking environment. The proposed method was validated with different experiments testing the utility function, the extra savings obtained when combining several capabilities, decision tree interpolation, and dynamicity aspects. The orchestration proved able to find the best combination for a given scenario, achieving additional savings from the combination while ensuring conflict-free operation.
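
A utility function combining energy savings and performance, plus a bandwidth-safety check, can be sketched as follows; the weighting, capability names and all numbers are illustrative assumptions, not the paper's definitions.

```python
def utility(savings, perf, w=0.6):
    """Weighted utility combining normalized energy savings and performance
    headroom, both in [0, 1]; the weight is an illustrative assumption."""
    return w * savings + (1 - w) * perf

def best_capability(capabilities, utilization):
    """Pick the highest-utility capability among those that are safe for the
    current bandwidth utilization (avoiding congestion or packet loss)."""
    safe = [c for c in capabilities if utilization <= c["max_util"]]
    return max(safe, key=lambda c: utility(c["savings"], c["perf"]))["name"] if safe else None

caps = [
    {"name": "sleep_links", "savings": 0.60, "perf": 0.30, "max_util": 0.50},
    {"name": "rate_adapt",  "savings": 0.20, "perf": 0.80, "max_util": 0.90},
]
print(best_capability(caps, utilization=0.4))  # "sleep_links"
print(best_capability(caps, utilization=0.7))  # "rate_adapt"
```

At low utilization the aggressive capability wins on utility; at high utilization it is excluded by the safety check, which mirrors the orchestration behavior the abstract describes.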

Relevance: 60.00%

Abstract:

Three-phase induction motors are the main elements for converting electrical energy into motive mechanical energy and are applied in many production sectors. Identifying a defect in an operating motor, before it fails, can provide greater safety in decision-making about machine maintenance, reduced costs and increased availability. This thesis first presents a literature review and the general methodology for reproducing the motor defects and for applying the technique of discretizing the current and voltage signals in the time domain. A comparative study is then developed among pattern classification methods for identifying defects in these machines: Naive Bayes, k-Nearest Neighbor, Support Vector Machine (Sequential Minimal Optimization), Artificial Neural Network (multilayer perceptron), Repeated Incremental Pruning to Produce Error Reduction, and the C4.5 decision tree. The concept of Multi-Agent Systems (MAS) is also applied to support the distributed use of multiple concurrent methods for recognizing the defect patterns of faulty bearings, broken bars in the rotor squirrel cage, and short circuits between the coils of the stator winding of three-phase induction motors. In addition, strategies for defining the severity of the above defects are explored, including an investigation of the influence of supply voltage unbalance on the determination of these anomalies. The experimental data were acquired on a laboratory test bench with 1 and 2 cv (metric horsepower) motors connected directly to the mains, operating under several conditions of voltage unbalance and variations of the mechanical load applied to the motor shaft.
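
One simple way multiple concurrent classifiers can be combined in a multi-agent setting is majority voting over their proposed diagnoses. The sketch below is a simplified stand-in for the multi-agent coordination the thesis describes; the fault labels are invented.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the labels proposed by several concurrent classifiers
    (one per agent) into a single fault diagnosis by majority vote,
    a simplified stand-in for richer multi-agent coordination."""
    return Counter(predictions).most_common(1)[0][0]

# four agents diagnose the same motor; three agree on a bearing fault
votes = ["bearing_fault", "bearing_fault", "broken_bar", "bearing_fault"]
print(majority_vote(votes))  # "bearing_fault"
```

Voting lets a weak or misled classifier be outvoted by the others, which is one motivation for running several pattern recognition methods concurrently.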

Relevance: 60.00%

Abstract:

In progressive companies, succession planning plays a vital role in leadership development; other companies have neither succession planning nor ways to fill leadership gaps. On the academic side, this Capstone Project examines the role of succession planning in leadership development through various approaches and theories. Its purpose is to provide HR professionals with solid mechanisms for identifying and preparing leaders for leadership roles through proper succession mapping within a company. Through the analysis of case studies and research, recommendations are made on the formulation of a decision tree so that succession planning may take a new place in leadership development in an organization.

Relevance: 60.00%

Abstract:

Paleoceanographic archives derived from 17 marine sediment cores reconstruct the response of the Southwest Pacific Ocean to the peak interglacial, Marine Isotope Stage (MIS) 5e (ca. 125 ka). Paleo-sea surface temperature (SST) estimates were obtained from the Random Forest model (an ensemble decision tree tool) applied to core-top planktonic foraminiferal faunas calibrated to modern SSTs. The reconstructed geographic pattern of the SST anomaly (maximum SST between 120 and 132 ka minus the mean modern SST) suggests that MIS 5e conditions were generally warmer in the Southwest Pacific, especially in the western Tasman Sea, where a strengthened East Australian Current (EAC) likely extended subtropical influence to ca. 45°S off Tasmania. In contrast, the eastern Tasman Sea may have experienced modest cooling except around 45°S. The observed pattern resembles that developing under the present warming trend in the region. An increase in wind stress curl over the modern South Pacific is hypothesized to have spun up the South Pacific Subtropical Gyre, with a concurrent increase in subtropical flow in the western boundary currents, which include the EAC. However, warmer temperatures along the Subtropical Front and over the Campbell Plateau to the south suggest that the relative influence of the boundary inflows to eastern New Zealand may have differed in MIS 5e, and these currents may have followed different paths compared to today.
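
The SST anomaly defined above (maximum reconstructed SST between 120 and 132 ka minus the mean modern SST) is straightforward to compute per core site; the temperatures below are invented for illustration.

```python
def sst_anomaly(paleo_ssts, modern_ssts):
    """SST anomaly as defined in the abstract: the maximum reconstructed SST
    in the 120-132 ka window minus the mean modern SST at the same site."""
    return max(paleo_ssts) - sum(modern_ssts) / len(modern_ssts)

# illustrative values for a single core site (degrees C), not from the cores
print(round(sst_anomaly([17.2, 18.1, 18.9], [16.0, 16.8, 17.2]), 2))  # 2.23
```

A positive anomaly at a site indicates MIS 5e conditions warmer than today there; mapping the sign and magnitude across the 17 cores yields the geographic pattern the abstract interprets.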

Relevance: 60.00%

Abstract:

Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
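
Monte Carlo injection of uncertainty into categorical data can be sketched as follows; the noise model, categories and noise level are illustrative assumptions, not the article's experimental design.

```python
import random

def perturb(labels, noise, categories, rng):
    """Monte Carlo uncertainty injection: each categorical value is replaced
    by a uniformly random category with probability `noise` (a sketch)."""
    return [rng.choice(categories) if rng.random() < noise else v for v in labels]

def agreement(a, b):
    """Fraction of positions where two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(42)                      # fixed seed for reproducibility
data = ["low", "high", "low", "mid"] * 250   # 1000 illustrative labels
runs = [agreement(data, perturb(data, 0.2, ["low", "mid", "high"], rng))
        for _ in range(100)]
avg = sum(runs) / len(runs)
print(round(avg, 2))  # close to 1 - 0.2 * (2/3), i.e. about 0.87
```

Averaging over many simulated perturbations, as here, is how Monte Carlo methods quantify the impact of a chosen uncertainty level on the categorical inputs before a decision tree model is fit to them.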