907 results for Data clustering
Abstract:
This paper presents a methodology based on the knowledge discovery in databases (KDD) process to determine the failure probability of electrical equipment belonging to a real high-voltage electrical network. Data mining (DM) techniques are used to discover a set of failure probabilities and, from them, to extract knowledge about the unavailability of electrical equipment such as power transformers and high-voltage power lines. The framework comprises several steps: analysis of the real database, data pre-processing, application of DM algorithms and, finally, interpretation of the discovered knowledge. To validate the proposed methodology, a case study using real databases is presented. Because these data carry heavy uncertainty due to climate conditions, fuzzy logic was used to determine the set of failure probabilities of the electrical components in order to restore service. The results show the potential of this approach and encourage further research on the topic.
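The abstract does not specify the fuzzy system, so the following is only a minimal sketch of how fuzzy inference can turn uncertain climate readings into a failure probability. It assumes a zero-order Sugeno scheme (min for AND, weighted average for defuzzification); the inputs (wind speed in m/s, rainfall in mm/h), the fuzzy sets, the rule base and the probabilities attached to each rule are all hypothetical illustrations, not values from the paper.

```python
import math

INF = math.inf


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership of x in a trapezoidal set: rises on [a, b], equals 1 on [b, c], falls on [c, d].

    Setting a == b or c == d yields a left or right "shoulder" set without dividing by zero.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical fuzzy sets (wind speed in m/s, rainfall in mm/h).
WIND = {
    "calm": (0.0, 0.0, 5.0, 10.0),
    "moderate": (5.0, 10.0, 15.0, 20.0),
    "strong": (15.0, 20.0, INF, INF),
}
RAIN = {
    "light": (0.0, 0.0, 2.0, 5.0),
    "heavy": (2.0, 5.0, INF, INF),
}

# Hypothetical rule base: (wind set, rain set) -> crisp failure probability of the rule.
RULES = {
    ("calm", "light"): 0.01,
    ("calm", "heavy"): 0.03,
    ("moderate", "light"): 0.05,
    ("moderate", "heavy"): 0.10,
    ("strong", "light"): 0.20,
    ("strong", "heavy"): 0.40,
}


def failure_probability(wind: float, rain: float) -> float:
    """Zero-order Sugeno inference: rule strength = min of memberships, output = weighted mean."""
    num = den = 0.0
    for (w_set, r_set), p in RULES.items():
        strength = min(trapezoid(wind, *WIND[w_set]), trapezoid(rain, *RAIN[r_set]))
        num += strength * p
        den += strength
    if den == 0.0:
        raise ValueError("inputs fall outside every fuzzy set")
    return num / den
```

For example, a wind speed of 7.5 m/s is half "calm" and half "moderate", so the output blends the two matching rules instead of jumping between them; this graded response is what makes fuzzy logic attractive when the climate data are uncertain.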
Abstract:
Power system operation currently produces huge volumes of data that are still processed in a very limited way. Knowledge discovery and machine learning can exploit these data to produce relevant knowledge with a very positive impact. In competitive electricity markets these data are of even higher value, which makes the application of data mining techniques in power systems increasingly relevant. This paper presents two cases based on real data that show the importance of data mining for supporting demand response and for supporting the strategic behavior of market players.
Abstract:
This paper studies the relationships between the chromosomal DNA sequences of twenty species. We propose a methodology that combines DNA-based word frequency histograms, correlation methods and a multidimensional scaling (MDS) technique to visualize the structural information underlying chromosomes (CRs) and species. Four statistical measures (Minkowski, cosine, Pearson product-moment and Kendall τ rank correlation) are tested to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, such as DNA sequences, with almost no assumptions other than the predefined DNA "word length." It produces comprehensible three-dimensional visualizations of CR clustering and of related spatial and structural patterns. The results of the four correlation scenarios show that the high-level clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data rather than artifacts of the methods.
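The pipeline described above (word histograms, a pairwise measure, then MDS into three dimensions) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: words are counted with overlapping windows, windows containing symbols other than A/C/G/T are skipped, dissimilarities are taken as 1 minus the cosine or Pearson value, the Minkowski order `p=2` is an arbitrary default, and classical (Torgerson) MDS stands in for the paper's MDS tool. Kendall τ is left out to keep the sketch dependency-light. The helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Sequence

import numpy as np


def word_histogram(seq: str, k: int) -> np.ndarray:
    """Relative frequencies of all 4**k DNA words of length k (overlapping windows)."""
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = np.zeros(len(words))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def minkowski(x: np.ndarray, y: np.ndarray, p: float = 2.0) -> float:
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


def cosine_dissimilarity(x: np.ndarray, y: np.ndarray) -> float:
    return 1.0 - float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def pearson_dissimilarity(x: np.ndarray, y: np.ndarray) -> float:
    return 1.0 - float(np.corrcoef(x, y)[0, 1])


def classical_mds(d: np.ndarray, dims: int = 3) -> np.ndarray:
    """Torgerson MDS: double-centre the squared dissimilarities and keep the top eigenvectors."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dims]
    vals = np.clip(vals[order], 0.0, None)  # negative eigenvalues (non-Euclidean input) -> 0
    return vecs[:, order] * np.sqrt(vals)


def embed(
    seqs: Sequence[str],
    k: int = 3,
    measure: Callable[[np.ndarray, np.ndarray], float] = pearson_dissimilarity,
    dims: int = 3,
) -> np.ndarray:
    """One point per sequence; nearby points have similar word-frequency histograms."""
    hists = [word_histogram(s, k) for s in seqs]
    n = len(hists)
    d = np.array([[measure(hists[i], hists[j]) for j in range(n)] for i in range(n)])
    return classical_mds(d, dims)
```

Swapping `measure` is how the four scenarios can be compared on the same histograms. Since 1 minus a correlation is not guaranteed to be a Euclidean distance, the clipping step discards the negative eigenvalues that can then appear.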
Abstract:
This paper presents an integrated system that helps both retail companies and electricity consumers define the best retail contracts and tariffs. The integrated system consists of a Decision Support System (DSS) based on a Consumer Characterization Framework (CCF). The CCF applies data mining techniques to obtain useful knowledge about electricity consumers from large amounts of consumption data. This knowledge is acquired through an innovative and systematic approach that identifies different consumer classes, each represented by a load profile, and characterizes them using decision trees. The framework generates inputs for the knowledge base and the database of the DSS: the rule sets derived from the decision trees are integrated into the knowledge base, while the load profiles, together with information about contracts and electricity prices, form the database. The DSS can classify different consumers, present their load profiles and test different electricity tariffs and contracts. Its final outputs are a comparative economic analysis of different contracts and advice on the most economical contract for each consumer class. The presentation of the DSS is completed with an application example using a real database of consumers from the Portuguese distribution company.
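The abstract does not name the clustering algorithm, so the sketch below swaps in plain k-means (Lloyd's algorithm, with deterministic farthest-point seeding) to show how consumer classes and the tariff comparison fit together. Load profiles are assumed to be lists of kWh per period; clustering is done on peak-normalized shapes so that consumers with similar usage patterns group together regardless of size, and each class is then priced with its mean absolute profile. The `Tariff` structure, the period prices and the helper names are hypothetical, and the decision-tree characterization step is omitted.

```python
from dataclasses import dataclass


def normalise(profile):
    """Scale a load profile by its peak so that only its shape matters."""
    peak = max(profile)
    return [v / peak for v in profile] if peak else list(profile)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-point seeding; returns (labels, centroids)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


@dataclass
class Tariff:
    name: str
    prices: list  # hypothetical price per kWh in each period
    fixed: float = 0.0  # hypothetical fixed charge per billing period

    def cost(self, profile):
        return self.fixed + sum(e * p for e, p in zip(profile, self.prices))


def advise(profiles, k, tariffs):
    """Cluster consumers by load shape, then name the cheapest tariff for each class."""
    labels, _ = kmeans([normalise(p) for p in profiles], k)
    advice = {}
    for c in set(labels):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        costs = {t.name: t.cost(mean) for t in tariffs}
        advice[c] = min(costs, key=costs.get)
    return labels, advice
```

With a flat tariff and a time-of-use tariff that is cheap in the night periods, day-heavy consumers end up advised to stay on the flat tariff and night-heavy consumers to switch. This mirrors the kind of per-class comparative analysis the DSS produces.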
Abstract:
Objectives: The purpose of this article is to identify differences between surveys using paper and online questionnaires. The author has in-depth knowledge of opinion questions in survey-based research, e.g. the limitations of postal and online questionnaires. Methods: Paper and online questionnaires were used in the physician studies carried out in 1995 (doctors graduated in 1982-1991), 2000 (doctors graduated in 1982-1996), 2005 (doctors graduated in 1982-2001) and 2011 (doctors graduated in 1977-2006), and in a 2000 study of 457 family doctors. The response rates were 64%, 68%, 64%, 49% and 73%, respectively. Results: The physician studies showed differences between the methods, related to the use of paper-based versus online questionnaires and to the response rate. The online survey gave a lower response rate than the postal survey. The major advantages of the online survey were short response time, very low financial cost and the direct loading of data into the analysis software, which saved the time and resources associated with data entry. Conclusions: This article helps researchers plan the study design and choose the right data collection method.
Abstract:
This paper presents the SmartClean tool, whose purpose is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has one main advantage: the user does not need to specify the execution sequence of the data cleaning operations. To achieve this, an execution sequence was developed, and the problems are handled (i.e., detected and corrected) following that sequence. The sequence also supports incremental execution of the operations. The paper presents the underlying architecture of the tool and describes its components in detail. The validity of the tool and, consequently, of the architecture is demonstrated through a case study. Although SmartClean has cleaning capabilities at all other levels, this paper describes only those related to the attribute-value level.
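SmartClean's actual operations and sequence are not given in the abstract, so the following is only a hypothetical sketch of the idea at the attribute-value level: the tool, not the user, fixes the order in which operations run, and already-processed records are skipped so that later runs are incremental. The three operations (trimming, missing-value normalization, lower-casing), the missing-value markers and the domain check are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical attribute-value operations; SmartClean's real operation set is not shown here.
def trim(v):
    return v.strip() if isinstance(v, str) else v


def missing_to_none(v):
    return None if v in ("", "N/A", "null") else v


def lower(v):
    return v.lower() if isinstance(v, str) else v


# Fixed execution sequence chosen by the tool, not by the user. Order matters: trimming first
# turns "  " into "" so it is caught as missing, and the missing check runs before lower-casing
# so that the marker "N/A" is still recognized.
SEQUENCE = [trim, missing_to_none, lower]


@dataclass
class AttributeCleaner:
    domain: Optional[set] = None  # allowed values for the attribute (hypothetical constraint)
    done: set = field(default_factory=set)  # record ids already cleaned (incremental runs)

    def clean(self, records: dict):
        """Apply SEQUENCE to new records only; return (cleaned values, detected problems)."""
        cleaned, problems = {}, []
        for rid, value in records.items():
            if rid in self.done:
                continue
            for op in SEQUENCE:
                value = op(value)
            if value is None:
                problems.append((rid, "missing value"))
            elif self.domain is not None and value not in self.domain:
                problems.append((rid, "domain violation"))
            cleaned[rid] = value
            self.done.add(rid)
        return cleaned, problems
```

Calling `clean` a second time with a mix of old and new records processes only the new ones, which is one simple way to realize the incremental execution the abstract mentions.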
Abstract:
New business models, namely partnerships between organizations and the opportunity for companies to add data available on the web, especially on the semantic web, to their own information, have highlighted existing problems in databases, particularly those related to data quality. Poor data can reduce the competitiveness of the organizations that hold them and may even lead to their disappearance, since many of their decision-making processes are based on these data. For this reason, data cleaning is essential. Current approaches to these problems are closely tied to database schemas and specific domains. For data cleaning to be usable across different repositories, computer systems must understand the data, i.e., associated semantics are needed. The solution presented in this paper uses ontologies (i) to specify data cleaning operations and (ii) to solve the semantic heterogeneity problems of data stored in different sources. With data cleaning operations defined at a conceptual level, and with mappings between domain ontologies and an ontology derived from a database, these operations can be instantiated and proposed to the expert/specialist for execution over that database, thus enabling their interoperability.
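As a minimal sketch of the instantiation step, assuming operations are written against ontology terms and that the ontology mappings can be reduced to a lookup from (concept, property) to (table, column), the same conceptual operation can be turned into a concrete query for two differently named databases. The concept `Customer`, the properties, the table/column names and the SQL template are all hypothetical; a real system would resolve mappings through the ontologies themselves rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualOperation:
    """A cleaning operation stated over ontology terms, not database names."""
    name: str
    concept: str  # domain-ontology class (hypothetical)
    prop: str  # property of that class (hypothetical)
    sql_template: str  # uses {table} and {column} placeholders


# Hypothetical mappings: (concept, property) in the domain ontology -> (table, column)
# in the ontology derived from each database schema.
MAPPING_A = {
    ("Customer", "email"): ("clientes", "correio_eletronico"),
    ("Customer", "postalCode"): ("clientes", "cod_postal"),
}
MAPPING_B = {
    ("Customer", "email"): ("customers", "email_addr"),
}


def instantiate(op: ConceptualOperation, mapping: dict) -> str:
    """Bind a conceptual operation to one database; the result is proposed to the expert."""
    key = (op.concept, op.prop)
    if key not in mapping:
        raise KeyError(f"no mapping for {op.concept}.{op.prop}")
    table, column = mapping[key]
    return op.sql_template.format(table=table, column=column)


DETECT_MISSING_EMAIL = ConceptualOperation(
    name="detect missing values",
    concept="Customer",
    prop="email",
    sql_template="SELECT * FROM {table} WHERE {column} IS NULL OR TRIM({column}) = ''",
)
```

The operation is defined once and reused across repositories; only the mapping changes. Because identifiers are spliced into SQL text, the mappings must come from a trusted source, and the generated query is shown to the expert rather than executed automatically.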
Abstract:
OBJECTIVE: To analyze the prevalence of physiotherapy utilization and to explore the variables associated with its utilization. METHODS: A population-based cross-sectional study was carried out, including 3,100 subjects aged 20 years or older living in the urban area of Pelotas, southern Brazil. The sample was selected using a multistage protocol, with the census tracts delimited by the Instituto Brasileiro de Geografia e Estatística (Brazilian Institute of Geography and Statistics) as the primary sampling units. Data were collected through face-to-face interviews using a standardized, pre-tested questionnaire. Following descriptive and crude analyses, Poisson regression models that took the clustering of the sample into account were fitted. RESULTS: Lifetime utilization of physiotherapy was 30.2%, and utilization in the 12 months prior to the interview was reported by 4.9%. Women, elderly subjects and those from higher socioeconomic levels were more likely to use physiotherapy. Among subjects who attended physiotherapy, 66% used public health services, 25% used insurance health services and 9% had private sessions. CONCLUSIONS: This is the first population-based study on physiotherapy utilization carried out in Brazil. Utilization of physiotherapy was lower than reported in both developed and developing countries. The findings may help public health authorities organize healthcare services to meet this important demand.
Abstract:
Estimating gestational age from fetal remains is important in forensic contexts. For this purpose, forensic experts assess the pattern of dental calcification and/or study the skeleton. In skeletal studies, the length of the long-bone diaphyses is one of the most widely used methods, relying on regression tables and equations from outdated works or based on ultrasound data, whose measurements differ from those taken directly on bone. The main objective of this work is to construct regression tables and equations for the Portuguese population, based on measurements of the femur, tibia and humerus diaphyses taken on post-mortem radiographs, which differ little from measurements taken directly on bone. It also aims to determine which of the three bones is the most reliable and whether there are significant differences between female and male fetuses.
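The abstract does not state the form of the regression equations, so this is a minimal sketch assuming a simple linear model, gestational age = a + b × diaphysis length, fitted by ordinary least squares. It also assumes that "most reliable" can be judged by the smallest standard error of estimate (SEE), which is one common choice but not necessarily the authors'. No measurements from the study are used; the helper names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, standard error of estimate)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    see = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # n - 2 degrees of freedom
    return a, b, see


def r_squared(x, y, a, b):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def most_reliable(samples):
    """samples: {bone: (lengths, ages)}; pick the bone whose fit has the smallest SEE (assumed criterion)."""
    return min(samples, key=lambda bone: fit_line(*samples[bone])[2])
```

Given a new diaphysis length `L` (e.g. in mm), the estimate is `a + b * L` (e.g. in weeks), and the SEE gives a rough width for the prediction. Fitting separate lines for female and male fetuses is the natural starting point for checking the sex difference the study investigates.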
Abstract:
Copyright © 2013 Springer Netherlands.
Abstract:
V Congreso de Eficiencia y Productividad EFIUCO, Córdoba, 19-20 May 2011.
Abstract:
25th Annual Conference of the European Cetacean Society, Cadiz, Spain 21-23 March 2011.
Abstract:
27th Annual Conference of the European Cetacean Society. Setúbal, Portugal, 8-10 April 2013.
Abstract:
A large number of low-temperature geothermal fields associated with fractured rocks occur in northern Portugal. The most important surface manifestations of these hydrothermal systems appear in pull-apart tectonic basins and are strongly conditioned by the orientation of the main fault systems in the region. This work presents the interpretation of gravity gradient maps and of a 3D inversion model produced from a regional gravity survey. The horizontal gradients reveal a complex fault system. The resulting 3D density-contrast model highlights the main fault zone in the region and the depth distribution of the granitic bodies. Their relationship with the hydrothermal systems supports the conceptual models developed from hydrochemical and isotopic water analyses. This work emphasizes the role of gravity methods and analysis in better understanding the connection between the hydrothermal systems, the fractured rock pattern and the surrounding geology. (c) 2013 Elsevier B.V. All rights reserved.
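The horizontal gradient maps mentioned above can be illustrated with the total horizontal gradient, |∇g| = sqrt((∂g/∂x)² + (∂g/∂y)²), of a gridded gravity anomaly. Its maxima tend to lie over steep lateral density contrasts such as faults or granite contacts. This is a minimal sketch, not the processing used in the paper: it assumes a regular grid (rows along y, columns along x), uses central differences inside the grid and one-sided differences at the edges, and traces a contact by taking the strongest gradient in each row. The function names are hypothetical.

```python
import math


def total_horizontal_gradient(grid, dx, dy):
    """|grad g| at every node of a regular grid; grid[i][j] is row i (y) and column j (x).

    With the anomaly in mGal and spacings in metres, the result is in mGal/m.
    """
    ny, nx = len(grid), len(grid[0])
    if nx < 2 or ny < 2:
        raise ValueError("grid must be at least 2 x 2")

    def deriv(values, k, n, step):
        # Central difference inside the slice, one-sided difference at either end.
        if 0 < k < n - 1:
            return (values[k + 1] - values[k - 1]) / (2 * step)
        if k == 0:
            return (values[1] - values[0]) / step
        return (values[k] - values[k - 1]) / step

    columns = [list(col) for col in zip(*grid)]
    out = []
    for i in range(ny):
        row = []
        for j in range(nx):
            gx = deriv(grid[i], j, nx, dx)
            gy = deriv(columns[j], i, ny, dy)
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def row_maxima(thg):
    """Column index of the strongest gradient in each row: a crude trace of a roughly N-S contact."""
    return [max(range(len(row)), key=row.__getitem__) for row in thg]
```

For a synthetic step in density (anomaly 0 on one side, 10 on the other), the gradient peaks along the step in every row, which is the pattern interpreted as a fault on real gradient maps. Production work would normally use gridded-data libraries and upward continuation to suppress noise before differentiating.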