786 results for Data mining models
Abstract:
Objective: Bronchial typical carcinoid tumors are low-grade malignancies. However, metastases are diagnosed in some patients. Predicting the individual risk of these metastases is relevant for determining which patients are eligible for a radical lymphadenectomy and which should be followed up because of distant metastasis risk. Our objective was to screen for predictive criteria of bronchial typical carcinoid tumor aggressiveness based on a logistic regression model using clinical, pathological and biomolecular data. Methods: A multicenter retrospective cohort study was performed, including 330 consecutive patients operated on for bronchial typical carcinoid tumors and followed up for more than 10 years in two university hospitals. The data selected to predict the individual risk of both nodal and distant metastasis were: age, gender, TNM staging, tumor diameter and location (central/peripheral), tumor immunostaining index of p53 and Ki67, Bcl2, and the extracellular density of neoformed microvessels and of collagen/elastic extracellular fibers. Results: Nodal and distant metastasis incidence was 11% and 5%, respectively. Univariate analysis identified all the studied biomarkers as related to nodal metastasis. Multivariate analysis identified one predictive variable for nodal metastasis: neoangiogenesis, quantified by the density of neoformed pathological microvessels. Distant metastasis was related to male gender. Discussion: Predictive models based on clinical and biomolecular data could be used to predict individual risk for metastasis. Patients at high individual risk of lymph node metastasis should be considered candidates for mediastinal lymphadenectomy. Those at high risk of distant metastasis should be followed up as having an aggressive disease. Conclusion: Individual risk prediction of bronchial typical carcinoid tumor metastasis for operated patients can be calculated as a function of biomolecular data.
Prediction models can detect high-risk patients, help surgeons identify patients requiring radical lymphadenectomy, and help oncologists identify those with an aggressive disease requiring prolonged follow-up. (C) 2008 European Association for Cardio-Thoracic Surgery. Published by Elsevier B.V. All rights reserved.
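As a rough illustration of how a fitted logistic regression model turns patient data into an individual risk, the sketch below evaluates the logistic function for two patients. The intercept, the coefficient values and the variable names are assumptions made up for this example; they are not taken from the study.

```python
import math

# Hypothetical coefficients for illustration only; NOT values from the study.
INTERCEPT = -4.0
COEFFICIENTS = {
    "microvessel_density": 0.08,  # assumed effect per unit of microvessel density
    "tumor_diameter_cm": 0.30,    # assumed effect per centimetre of diameter
}


def metastasis_risk(patient):
    """Return the probability predicted by a logistic regression model."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-linear))


low = metastasis_risk({"microvessel_density": 10.0, "tumor_diameter_cm": 2.0})
high = metastasis_risk({"microvessel_density": 60.0, "tumor_diameter_cm": 2.0})
print(f"low density: {low:.3f}, high density: {high:.3f}")
```

With a positive coefficient, a higher microvessel density always yields a higher predicted risk, which is how a single predictive variable can be used to rank patients.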
Abstract:
Feature selection is one of the important and frequently used techniques in data preprocessing. It can improve the efficiency and effectiveness of data mining by reducing the dimensionality of the feature space and removing irrelevant and redundant information. Feature selection can be viewed as a global optimization problem: finding a minimum set of M relevant features that describes the dataset as well as the original N attributes do. In this paper, we apply the adaptive partitioned random search strategy to our feature selection algorithm. Under this search strategy, a partition structure and an evaluation function are proposed for the feature selection problem. In theory, the algorithm guarantees the globally optimal solution while avoiding complete randomness in the search direction. The good properties of the algorithm are shown through theoretical analysis.
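The idea of a minimum feature set that describes the data as well as all N attributes can be sketched with a simple inconsistency count. Note that the example below uses plain exhaustive search over subsets, not the adaptive partitioned random search of the paper, and the evaluation function and toy data are assumptions chosen for illustration.

```python
from itertools import combinations


def inconsistency(rows, labels, subset):
    """Count rows that disagree with the majority label of rows sharing their values on `subset`."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        counts = groups.setdefault(key, {})
        counts[label] = counts.get(label, 0) + 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def minimal_feature_subset(rows, labels):
    """Return the smallest subset of feature indices that is as consistent as the full set."""
    n = len(rows[0])
    target = inconsistency(rows, labels, tuple(range(n)))
    for size in range(n + 1):  # try smaller subsets first
        for subset in combinations(range(n), size):
            if inconsistency(rows, labels, subset) <= target:
                return subset
    return tuple(range(n))


# Toy data: the label depends on features 0 and 2; feature 1 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a & c for a, b, c in rows]
selected = minimal_feature_subset(rows, labels)
print(selected)
```

Exhaustive search is only feasible for a handful of features, which is the practical motivation for guided random search strategies on larger problems.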
Abstract:
This paper presents results from field studies carried out during the 1993-1998 Australian cotton (Gossypium hirsutum L.) seasons to monitor off-target droplet movement of endosulfan (6,7,8,9,10,10-hexachloro-1,5,5a,6,9,9a-hexahydro-6,9-methano-2,4,3-benzodioxathiepin 3-oxide) insecticide applied to a commercial cotton crop. Averaged over a wide range of conditions, off-target deposition 500 m downwind of the field boundary was approximately 2% of the field-applied rate with oil-based applications and 1% with water-based applications. Mean airborne drift values recorded 100 m downwind of a single flight line were a third as much with water-based application compared with oil-based application. Calculations using a Gaussian diffusion model and the U.S. Spray Drift Task Force AgDRIFT model produced downwind drift profiles that compared favorably with experimental data. Both models and data indicate that by adopting large droplet placement (LDP) application methods and incorporating crop buffer distances, spray drift can be effectively managed.
Abstract:
Benchmarking is an important tool for organisations to improve their productivity, product quality, process efficiency or services. Through benchmarking, organisations can compare their performance with that of competitors and identify their strengths and weaknesses. This study carries out a benchmarking analysis of the main Iberian sea ports, with a special focus on the efficiency of their container terminals. To this end, data envelopment analysis (DEA) is used, since several researchers consider it the most effective method to quantify a set of key performance indicators. To obtain a more reliable diagnostic tool, DEA is used together with data mining to compare the operational data of the sea ports' container terminals during 2007. Since sea ports are global logistics networks, performance evaluation is essential to effective decision making aimed at improving their efficiency and, therefore, their competitiveness.
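To give a feel for what a DEA efficiency score is, the sketch below covers only the simplest special case: one input, one output and constant returns to scale, where each unit's score is its output/input ratio divided by the best ratio observed. The general multi-input, multi-output case requires solving a linear program per unit and is not shown. The terminal names and figures are hypothetical, not data from the study.

```python
def single_ratio_efficiency(units):
    """Relative efficiency per unit for one input and one output (constant returns to scale)."""
    ratios = {name: output / inp for name, (inp, output) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical terminals: (quay length in metres, throughput in thousand TEU).
terminals = {
    "Port A": (1000.0, 500.0),
    "Port B": (800.0, 480.0),
    "Port C": (1200.0, 360.0),
}
scores = single_ratio_efficiency(terminals)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.2f}")
```

A score of 1.0 marks a unit on the efficient frontier; lower scores indicate how far a terminal is from the best observed practice.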
Abstract:
This work consists of the development of a Criminology Support System (Sistema de Apoio à Criminologia, SAC), intended to help detectives and analysts in the proactive prevention of crime and in the management of their material and human resources, as well as to encourage studies on the high incidence of certain types of crime in a given region. Historically, solving crimes has been a prerogative of criminal justice and its specialists. With the increasing use of computer systems in the judicial system to record all data concerning crime occurrences, suspects and victims, individuals' criminal records and other data that flow within the organisation, there is a growing need to transform these data into useful information for fighting crime. The SAC takes advantage of techniques for extracting knowledge from information and applies them to a set of data on crime occurrences in a given region and time period, as well as to a set of variables that influence crime, which were studied and identified in this work. The work comprises a knowledge extraction model and an application that allows the user to supply a suitable data set, ensuring the maximum effectiveness of the model.
Abstract:
In recent years there has been huge growth and consolidation in the data mining field. Some efforts are being made to establish standards in the area, among them SEMMA and CRISP-DM. Both are growing as industry standards and define a set of sequential steps intended to guide the implementation of data mining applications. The question arose of whether there are substantial differences between them and the traditional KDD process. This paper aims to establish a parallel between these standards and the KDD process, as well as an understanding of the similarities between them.
Abstract:
This paper presents five different clustering methods to identify typical load profiles of medium voltage (MV) electricity consumers. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers' behaviour. The knowledge obtained can be used to support a decision tool, not only for utilities but also for consumers. Load profiles can be used by utilities to identify the aspects that cause system load peaks and to enable the development of specific contracts with their customers. The framework presented throughout the paper consists of several steps, namely the data pre-processing phase, the application of clustering algorithms and the evaluation of the quality of the partition, which is supported by cluster validity indices. The process ends with the analysis of the discovered knowledge. To validate the proposed framework, a case study with a real database of 208 MV consumers is used.
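The three framework steps (pre-processing, clustering, partition evaluation) can be sketched as follows. This is a minimal illustration that assumes peak normalization as the pre-processing step, k-means as the clustering method and the within-cluster sum of squared errors as a simple stand-in for a validity index; the abstract does not name the five methods or the indices actually used, and the four-point profiles are invented.

```python
def normalize(profile):
    """Pre-processing: scale a load profile by its peak so shapes are comparable."""
    peak = max(profile)
    return [value / peak for value in profile]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, iterations=20):
    """Plain k-means with a deterministic start (the first k profiles as centroids)."""
    centroids = [list(p) for p in profiles[:k]]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def within_cluster_sse(profiles, assignment, centroids):
    """Simple partition quality measure: lower means more compact clusters."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(profiles, assignment))


# Hypothetical 4-point daily profiles (kW): two day-peaking, two night-peaking consumers.
raw = [[10, 40, 40, 10], [50, 10, 10, 50], [20, 80, 82, 22], [30, 5, 6, 28]]
profiles = [normalize(p) for p in raw]
assignment, centroids = k_means(profiles, k=2)
sse_two = within_cluster_sse(profiles, assignment, centroids)
one_assignment, one_centroid = k_means(profiles, k=1)
sse_one = within_cluster_sse(profiles, one_assignment, one_centroid)
print(assignment, round(sse_two, 4), round(sse_one, 4))
```

Each centroid is a candidate typical load profile for its class; comparing the quality measure across different numbers of clusters is one way to choose among partitions.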
Abstract:
This work proposes the design of a system to create and handle Electric Vehicle (EV) charging procedures based on an intelligent process. Due to the limitations of the electrical power distribution network and the absence of smart meter devices, EV charging should be performed in a balanced way, taking into account past experience, weather information based on data mining, and simulation approaches. To allow information exchange and to support user mobility, a mobile application was also created to assist the EV driver with these processes. The proposed Smart Electric Vehicle Charging System uses Vehicle-to-Grid (V2G) technology to connect Electric Vehicles and renewable energy sources to Smart Grids (SG). The system also explores the new paradigm of Electrical Markets (EM), with deregulation of electricity production and use, in order to obtain the best conditions for commercializing electrical energy.
Abstract:
With the liberalization of the electricity market, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier. Fair insight into customers' behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper, Data Mining (DM) techniques are applied to electricity consumption data from a utility's client database. To form the different customer classes and find a set of representative consumption patterns, we used the Two-Step algorithm, which is a hierarchical clustering algorithm. Each consumer class is represented by its load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model is constructed with the C5.0 classification algorithm.
Abstract:
Electricity markets are complex environments with very particular characteristics. MASCEM is a market simulator developed to allow in-depth studies of the interactions between the players that take part in electricity market negotiations. This paper presents a new proposal for the definition of MASCEM players' strategies to negotiate in the market. The proposed methodology is multiagent based and uses reinforcement learning algorithms to give players the capability to perceive changes in the environment while adapting the formulation of their bids according to their needs, using a set of different techniques that are at their disposal.
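One simple way a player could learn which of its available bidding techniques works best is a reward-driven value update with occasional exploration. The sketch below is an assumption-laden illustration of that idea, not MASCEM's actual algorithm: the class name, the strategy names, the learning rate and the epsilon-greedy rule are all hypothetical.

```python
import random


class StrategySelector:
    """Epsilon-greedy choice among bidding strategies with an incremental value update."""

    def __init__(self, strategies, learning_rate=0.2, epsilon=0.1, seed=0):
        self.values = {name: 0.0 for name in strategies}
        self.learning_rate = learning_rate
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        names = sorted(self.values)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(names)      # explore a random strategy
        return max(names, key=self.values.get)  # exploit the best estimate so far

    def update(self, name, reward):
        # Move the estimate a fraction of the way toward the observed reward.
        self.values[name] += self.learning_rate * (reward - self.values[name])


# Hypothetical strategy names; rewards could be, for example, normalized profit per session.
selector = StrategySelector(["average_price", "aggressive", "conservative"], epsilon=0.0)
selector.update("aggressive", 1.0)
selector.update("aggressive", 1.0)
print(selector.choose(), round(selector.values["aggressive"], 2))
```

Because the estimate tracks recent rewards, a strategy that stops paying off gradually loses its lead, which is how the player adapts to changes in the market.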
Abstract:
The growing importance and influence of new resources connected to power systems has caused many changes in their operation. Environmental policies and several well-known advantages have led to the wide dissemination of renewable-based energy resources. These resources, including Distributed Generation (DG), are being connected at lower voltage levels, where Demand Response (DR) must be considered too. These changes increase the complexity of system operation, due both to new operational constraints and to the amount of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when these act as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33-bus distribution network.
Abstract:
This paper establishes a Virtual Producer/Consumer Agent (VPCA) in order to optimize the integrated management of distributed energy resources and to improve and control Demand Side Management (DSM) and its aggregated loads. The paper presents the VPCA architecture and the proposed function-based organization used to coordinate the several generation technologies, the different load types and the storage systems. This VPCA organization uses a framework based on data mining techniques to characterize the customers. The paper includes the results of several experimental test cases, using real data and taking into account electricity generation resources as well as consumption data.
Abstract:
Many current e-commerce systems provide personalization when their content is shown to users. In this sense, recommender systems make personalized suggestions and provide information about items available in the system. Nowadays, there is a vast number of methods, including data mining techniques, that can be employed for personalization in recommender systems. However, these methods are still quite vulnerable to some limitations and shortcomings related to the recommender environment. In order to deal with some of them, in this work we implement a recommendation methodology in a recommender system for tourism, where classification based on association is applied. Classification based on association methods, also named associative classification methods, are an alternative data mining technique that combines concepts from classification and association in order to allow association rules to be employed in a prediction context. The proposed methodology was evaluated in some case studies, where we could verify that it is able to reduce the limitations present in recommender systems and to enhance recommendation quality.
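The way association rules can serve a prediction task may be sketched as follows: mine rules of the form itemset -> class that meet support and confidence thresholds, rank them, and classify a new case with the first rule that matches. This is a generic illustration, not the methodology evaluated in the paper; the thresholds, the ranking by confidence and then support, and the tourism data are assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(transactions, min_support=0.3, min_confidence=0.7, max_items=2):
    """Return rules (itemset, label, confidence, support) that meet both thresholds."""
    total = len(transactions)
    itemset_counts = Counter()
    rule_counts = Counter()
    for items, label in transactions:
        for size in range(1, max_items + 1):
            for itemset in combinations(sorted(items), size):
                itemset_counts[itemset] += 1
                rule_counts[(itemset, label)] += 1
    rules = []
    for (itemset, label), count in rule_counts.items():
        support = count / total
        confidence = count / itemset_counts[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, confidence, support))
    # Rank rules: highest confidence first, then highest support.
    rules.sort(key=lambda rule: (-rule[2], -rule[3]))
    return rules


def classify(rules, items, default=None):
    """Predict with the first (best-ranked) rule whose itemset is contained in `items`."""
    for itemset, label, _confidence, _support in rules:
        if set(itemset) <= set(items):
            return label
    return default


# Hypothetical user preferences and the tourism product each user chose.
transactions = [
    ({"beach", "summer"}, "resort"),
    ({"beach", "family"}, "resort"),
    ({"museum", "winter"}, "city_tour"),
    ({"museum", "family"}, "city_tour"),
    ({"beach", "winter"}, "resort"),
]
rules = mine_class_rules(transactions)
print(classify(rules, {"beach", "autumn"}), classify(rules, {"museum"}))
```

Because every prediction traces back to an explicit rule, the recommendation can be explained to the user, which is one commonly cited appeal of associative classifiers.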
Abstract:
This paper presents an integrated system that helps both retail companies and electricity consumers define the best retail contracts and tariffs. The integrated system is composed of a Decision Support System (DSS) based on a Consumer Characterization Framework (CCF). The CCF is based on data mining techniques, applied to obtain useful knowledge about electricity consumers from large amounts of consumption data. This knowledge is acquired following an innovative and systematic approach able to identify different consumer classes, each represented by a load profile, and to characterize them using decision trees. The framework generates inputs for the knowledge base and the database of the DSS. The rule sets derived from the decision trees are integrated into the knowledge base of the DSS. The load profiles, together with the information about contracts and electricity prices, form the database of the DSS. The DSS is able to classify different consumers, present their load profiles and test different electricity tariffs and contracts. The final outputs of the DSS are a comparative economic analysis of different contracts and advice about the most economical contract for each consumer class. The presentation of the DSS is completed with an application example using a real database of consumers from the Portuguese distribution company.