995 resultados para Mineração de dados (Computação)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Na computação científica é necessário que os dados sejam o mais precisos e exatos possível, porém a imprecisão dos dados de entrada desse tipo de computação pode estar associada às medidas obtidas por equipamentos que fornecem dados truncados ou arredondados, fazendo com que os cálculos com esses dados produzam resultados imprecisos. Os erros mais comuns durante a computação científica são: erros de truncamentos, que surgem em dados infinitos e que muitas vezes são truncados", ou interrompidos; erros de arredondamento que são responsáveis pela imprecisão de cálculos em seqüências finitas de operações aritméticas. Diante desse tipo de problema Moore, na década de 60, introduziu a matemática intervalar, onde foi definido um tipo de dado que permitiu trabalhar dados contínuos,possibilitando, inclusive prever o tamanho máximo do erro. A matemática intervalar é uma saída para essa questão, já que permite um controle e análise de erros de maneira automática. Porém, as propriedades algébricas dos intervalos não são as mesmas dos números reais, apesar dos números reais serem vistos como intervalos degenerados, e as propriedades algébricas dos intervalos degenerados serem exatamente as dos números reais. Partindo disso, e pensando nas técnicas de especificação algébrica, precisa-se de uma linguagem capaz de implementar uma noção auxiliar de equivalência introduzida por Santiago [6] que ``simule" as propriedades algébricas dos números reais nos intervalos. A linguagem de especificação CASL, Common Algebraic Specification Language, [1] é uma linguagem de especificação algébrica para a descrição de requisitos funcionais e projetos modulares de software, que vem sendo desenvolvida pelo CoFI, The Common Framework Initiative [2] a partir do ano de 1996. O desenvolvimento de CASL se encontra em andamento e representa um esforço conjunto de grandes expoentes da área de especificações algébricas no sentido de criar um padrão para a área. A dissertação proposta apresenta uma especificação em CASL do tipo intervalo, munido da aritmética de Moore, afim de que ele venha a estender os sistemas que manipulem dados contínuos, sendo possível não só o controle e a análise dos erros de aproximação, como também a verificação algébrica de propriedades do tipo de sistema aqui mencionado. A especificação de intervalos apresentada aqui foi feita apartir das especificações dos números racionais proposta por Mossakowaski em 2001 [3] e introduz a noção de igualdade local proposta por Santiago [6, 5, 4]

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The increase of applications complexity has demanded hardware even more flexible and able to achieve higher performance. Traditional hardware solutions have not been successful in providing these applications constraints. General purpose processors have inherent flexibility, since they perform several tasks, however, they can not reach high performance when compared to application-specific devices. Moreover, since application-specific devices perform only few tasks, they achieve high performance, although they have less flexibility. Reconfigurable architectures emerged as an alternative to traditional approaches and have become an area of rising interest over the last decades. The purpose of this new paradigm is to modify the device s behavior according to the application. Thus, it is possible to balance flexibility and performance and also to attend the applications constraints. This work presents the design and implementation of a coarse grained hybrid reconfigurable architecture to stream-based applications. The architecture, named RoSA, consists of a reconfigurable logic attached to a processor. Its goal is to exploit the instruction level parallelism from intensive data-flow applications to accelerate the application s execution on the reconfigurable logic. The instruction level parallelism extraction is done at compile time, thus, this work also presents an optimization phase to the RoSA architecture to be included in the GCC compiler. To design the architecture, this work also presents a methodology based on hardware reuse of datapaths, named RoSE. RoSE aims to visualize the reconfigurable units through reusability levels, which provides area saving and datapath simplification. The architecture presented was implemented in hardware description language (VHDL). It was validated through simulations and prototyping. To characterize performance analysis some benchmarks were used and they demonstrated a speedup of 11x on the execution of some applications

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of middleware technology in various types of systems, in order to abstract low-level details related to the distribution of application logic, is increasingly common. Among several systems that can be benefited from using these components, we highlight the distributed systems, where it is necessary to allow communications between software components located on different physical machines. An important issue related to the communication between distributed components is the provision of mechanisms for managing the quality of service. This work presents a metamodel for modeling middlewares based on components in order to provide to an application the abstraction of a communication between components involved in a data stream, regardless their location. Another feature of the metamodel is the possibility of self-adaptation related to the communication mechanism, either by updating the values of its configuration parameters, or by its replacement by another mechanism, in case of the restrictions of quality of service specified are not being guaranteed. In this respect, it is planned the monitoring of the communication state (application of techniques like feedback control loop), analyzing performance metrics related. The paradigm of Model Driven Development was used to generate the implementation of a middleware that will serve as proof of concept of the metamodel, and the configuration and reconfiguration policies related to the dynamic adaptation processes. In this sense was defined the metamodel associated to the process of a communication configuration. The MDD application also corresponds to the definition of the following transformations: the architectural model of the middleware in Java code, and the configuration model to XML

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three diferent cluster ensembles methods, which have been used in many works in literature: coassociation matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms, representing diferent paradigms: kmeans, ExpectationMaximization (EM), and hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods present a better performance when compared to the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated with diferent clustering algorithms

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The increasing complexity of integrated circuits has boosted the development of communications architectures like Networks-on-Chip (NoCs), as an architecture; alternative for interconnection of Systems-on-Chip (SoC). Networks-on-Chip complain for component reuse, parallelism and scalability, enhancing reusability in projects of dedicated applications. In the literature, lots of proposals have been made, suggesting different configurations for networks-on-chip architectures. Among all networks-on-chip considered, the architecture of IPNoSys is a non conventional one, since it allows the execution of operations, while the communication process is performed. This study aims to evaluate the execution of data-flow based applications on IPNoSys, focusing on their adaptation against the design constraints. Data-flow based applications are characterized by the flowing of continuous stream of data, on which operations are executed. We expect that these type of applications can be improved when running on IPNoSys, because they have a programming model similar to the execution model of this network. By observing the behavior of these applications when running on IPNoSys, were performed changes in the execution model of the network IPNoSys, allowing the implementation of an instruction level parallelism. For these purposes, analysis of the implementations of dataflow applications were performed and compared

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Software Repository Mining (MSR) is a research area that analyses software repositories in order to derive relevant information for the research and practice of software engineering. The main goal of repository mining is to extract static information from repositories (e.g. code repository or change requisition system) into valuable information providing a way to support the decision making of software projects. On the other hand, another research area called Process Mining (PM) aims to find the characteristics of the underlying process of business organizations, supporting the process improvement and documentation. Recent works have been doing several analyses through MSR and PM techniques: (i) to investigate the evolution of software projects; (ii) to understand the real underlying process of a project; and (iii) create defect prediction models. However, few research works have been focusing on analyzing the contributions of software developers by means of MSR and PM techniques. In this context, this dissertation proposes the development of two empirical studies of assessment of the contribution of software developers to an open-source and a commercial project using those techniques. The contributions of developers are assessed through three different perspectives: (i) buggy commits; (ii) the size of commits; and (iii) the most important bugs. For the opensource project 12.827 commits and 8.410 bugs have been analyzed while 4.663 commits and 1.898 bugs have been analyzed for the commercial project. Our results indicate that, for the open source project, the developers classified as core developers have contributed with more buggy commits (although they have contributed with the majority of commits), more code to the project (commit size) and more important bugs solved while the results could not indicate differences with statistical significance between developer groups for the commercial project

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main goal of Regression Test (RT) is to reuse the test suite of the latest version of a software in its current version, in order to maximize the value of the tests already developed and ensure that old features continue working after the new changes. Even with reuse, it is common that not all tests need to be executed again. Because of that, it is encouraged to use Regression Tests Selection (RTS) techniques, which aims to select from all tests, only those that reveal faults, this reduces costs and makes this an interesting practice for the testing teams. Several recent research works evaluate the quality of the selections performed by RTS techniques, identifying which one presents the best results, measured by metrics such as inclusion and precision. The RTS techniques should seek in the System Under Test (SUT) for tests that reveal faults. However, because this is a problem without a viable solution, they alternatively seek for tests that reveal changes, where faults may occur. Nevertheless, these changes may modify the execution flow of the algorithm itself, leading some tests no longer exercise the same stretch. In this context, this dissertation investigates whether changes performed in a SUT would affect the quality of the selection of tests performed by an RTS, if so, which features the changes present which cause errors, leading the RTS to include or exclude tests wrongly. For this purpose, a tool was developed using the Java language to automate the measurement of inclusion and precision averages achieved by a regression test selection technique for a particular feature of change. In order to validate this tool, an empirical study was conducted to evaluate the RTS technique Pythia, based on textual differencing, on a large web information system, analyzing the feature of types of tasks performed to evolve the SUT

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Symbolic Data Analysis (SDA) main aims to provide tools for reducing large databases to extract knowledge and provide techniques to describe the unit of such data in complex units, as such, interval or histogram. The objective of this work is to extend classical clustering methods for symbolic interval data based on interval-based distance. The main advantage of using an interval-based distance for interval-based data lies on the fact that it preserves the underlying imprecision on intervals which is usually lost when real-valued distances are applied. This work includes an approach allow existing indices to be adapted to interval context. The proposed methods with interval-based distances are compared with distances punctual existing literature through experiments with simulated data and real data interval

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Geographic Information System (GIS) are computational tools used to capture, store, consult, manipulate, analyze and print geo-referenced data. A GIS is a multi-disciplinary system that can be used by different communities of users, each one having their own interest and knowledge. This way, different knowledge views about the same reality need to be combined, in such way to attend each community. This work presents a mechanism that allows different community users access the same geographic database without knowing its particular internal structure. We use geographic ontologies to support a common and shared understanding of a specific domain: the coral reefs. Using these ontologies' descriptions that represent the knowledge of the different communities, mechanisms are created to handle with such different concepts. We use equivalent classes mapping, and a semantic layer that interacts with the ontologies and the geographic database, and that gives to the user the answers about his/her queries, independently of the used terms

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work aims to characterize the workers in mineral activities exposed to lung injuries in Parelhas Municipality, Rio Grande do Norte State, seeking to relate respiratory diseases to the mining activity. The studied area (Parelhas City), with about 19,700 inhabitants, is located in the Serido region, approximately 232 km far from Natal City. The number of people involved in informal mining activity (garimpo) in the Seridó region reaches about 5,000. These workers generally do not use any kind of individual protection equipments and develop, at early ages of greater productivity, severe forms of diseases, which end up disabling them to professional activities, family and social life. Deceases by respiratory problems (e.g. silicosis) have been reported in very young adults. A descriptive observational study was conducted based on information from the records found in Dr. José Augusto Dantas Hospital, between the years 1996- 2006. The occupational and socio-economic features of the population, which was selected by using the hospital records, were achieved through individually answered forms. The purpose was to link the occupational activities with the respiratory diseases. The next stage of the research was an observational case-control study, in the 1:1 proportion. The achieved data allowed confirming the central hypothesis of the research, which states that the pneumoconiosis cases are due to the mineral-based activities in the studied area. The final step of the investigation tried to assess the knowledge of relatives of students in public and private elementary and high schools from Parelhas City, regarding silicosis. About 15.4% of urban schools were analyzed through application of a structured questionnaire. The results show distinct socio-economic levels and a difference in the perception of the relatives of students in public and private schools, concerning silicosis. It was possible to identify the characteristics of the population economically involved with mineral-based activities and to define the group that deserves preferential attention in preventive actions. The work indicates some environmental problems caused by inadequate mining operations in the region

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work discusses the application of techniques of ensembles in multimodal recognition systems development in revocable biometrics. Biometric systems are the future identification techniques and user access control and a proof of this is the constant increases of such systems in current society. However, there is still much advancement to be developed, mainly with regard to the accuracy, security and processing time of such systems. In the search for developing more efficient techniques, the multimodal systems and the use of revocable biometrics are promising, and can model many of the problems involved in traditional biometric recognition. A multimodal system is characterized by combining different techniques of biometric security and overcome many limitations, how: failures in the extraction or processing the dataset. Among the various possibilities to develop a multimodal system, the use of ensembles is a subject quite promising, motivated by performance and flexibility that they are demonstrating over the years, in its many applications. Givin emphasis in relation to safety, one of the biggest problems found is that the biometrics is permanently related with the user and the fact of cannot be changed if compromised. However, this problem has been solved by techniques known as revocable biometrics, which consists of applying a transformation on the biometric data in order to protect the unique characteristics, making its cancellation and replacement. In order to contribute to this important subject, this work compares the performance of individual classifiers methods, as well as the set of classifiers, in the context of the original data and the biometric space transformed by different functions. Another factor to be highlighted is the use of Genetic Algorithms (GA) in different parts of the systems, seeking to further maximize their eficiency. One of the motivations of this development is to evaluate the gain that maximized ensembles systems by different GA can bring to the data in the transformed space. Another relevant factor is to generate revocable systems even more eficient by combining two or more functions of transformations, demonstrating that is possible to extract information of a similar standard through applying different transformation functions. With all this, it is clear the importance of revocable biometrics, ensembles and GA in the development of more eficient biometric systems, something that is increasingly important in the present day

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Apresenta-se um sistema computacional, denominado ICADPLUS, desenvolvido para elaboração de banco de dados, tabulação de dados, cálculo do índice CPO e análise estatística para estimação de intervalos de confiança e comparação de resultados de duas populações.Tem como objetivo apresentar método simplificado para atender necessidades de serviços de saúde na área de odontologia processando fichas utilizadas por cirurgiões dentistas em levantamentos epidemiológicos de cárie dentária. A característica principal do sistema é a dispensa de profissional especializado na área de odontologia e computação, exigindo o conhecimento mínimo de digitação por parte do usuário, pois apresenta menus simples e claros como também relatórios padronizados, sem possibilidade de erro. Possui opções para fichas de CPO segundo Klein e Palmer, CPO proposto pela OMS, CPOS segundo Klein, Palmer e Knutson, e ceo. A validação do sistema foi feita por comparação com outros métodos, permitindo recomendar sua adoção.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Historicamente a convivência entre a mineração e a população sempre foi problemática, com prejuízos enormes para esta última, que, por problemas da falta de moradia, tem de conviver com a degradação do meio ambiente ao seu redor, como a poluição através da poeira, ruído e vibração. Esse trabalho, com dados coletados na mina subterrânea de carvão Trevo, na região carbonífera de Santa Catarina, coloca em evidência a necessidade da integração Empresa x Comunidade, no sentido da melhoria das técnicas de mineração e do meio ambiente, visando a uma convivência pacífica e tendo como um dos seus principais resultados alertar e despertar a consciência da Empresa ante aos reclamos da comunidade, demonstrando-se com dados, o perigo decorrente do planejamento de uma lavra sem levar em conta as habitações em superfície.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE