997 resultados para mIneração de dados


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dentre os vários aspectos da saúde do idoso, a saúde bucal merece atenção especial pelo fato de que, historicamente, nos serviços odontológicos, não se considera esse grupo populacional como prioridade de atenção. Por isso, se faz necessária a produção de um indicador multidimensional capaz de mensurar todas as alterações bucais encontradas em um idoso, facilitando a categorização da saúde bucal como um todo. Tal indicador representará um importante instrumento capaz de elencar prioridades de atenção voltadas à população idosa. Portanto, o estudo em questão propõe a produção e validação de um indicador de saúde bucal a partir dos dados secundários coletados pelo projeto SB Brasil 2010 referente ao grupo etário de 65 a 74 anos. A amostra foi representada pelos 7619 indivíduos do grupo etário de 65 a 74 anos que participaram da pesquisa nas 5 (cinco) regiões do Brasil. Tais indivíduos foram submetidos à avaliação epidemiológica das condições de saúde bucal, a partir dos índices CPO-d, CPI e PIP. Além disso, verificou-se o uso e necessidade de prótese, bem como características sociais, econômicas e demográficas. Uma análise fatorial identificou um número relativamente pequeno de fatores comuns, através da análise de componentes principais. Após a nomenclatura dos fatores, foi realizada a soma dos escores fatoriais por indivíduo. Por último, a dicotomização dessa soma nos forneceu o indicador de saúde bucal proposto. Para esse estudo foram incluídas na análise fatorial 12 variáveis de saúde bucal oriundas do banco de dados do SB Brasil 2010 e, também 3 variáveis socioeconômicas e demográficas. Com base no critério de Kaiser, observa-se que foram retidos cinco fatores que explicaram 70,28% da variância total das variáveis incluídas no modelo. O fator 1 (um) explica sozinho 32,02% dessa variância, o fator 2 (dois) 14,78%, enquanto que os fatores 3 (três), 4 (quatro) e 5 (cinco) explicam 8,90%, 7,89% e 6,68%, respectivamente. Por meio das cargas fatoriais, o fator um foi denominado dente hígido e pouco uso de prótese , o dois doença periodontal presente , o três necessidade de reabilitação , já o quarto e quinto fator foram denominados de cárie e condição social favorável , respectivamente. Para garantir a representatividade do indicador proposto, realizou-se uma segunda análise fatorial em uma subamostra da população de idosos investigados. Por outro lado, a aplicabilidade do indicador produzido foi testada por meio da associação do mesmo com outras variáveis do estudo. Por fim, Cabe ressaltar que, o indicador aqui produzido foi capaz de agregar diver sas informações a respeito da saúde bucal e das condições sociais desses indivíduos, traduzindo assim, diversos dados em uma informação simples, que facilita o olhar dos gestores de saúde sobre as reais necessidades de intervenções em relação à saúde bucal de determinada população

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In systems that combine the outputs of classification methods (combination systems), such as ensembles and multi-agent systems, one of the main constraints is that the base components (classifiers or agents) should be diverse among themselves. In other words, there is clearly no accuracy gain in a system that is composed of a set of identical base components. One way of increasing diversity is through the use of feature selection or data distribution methods in combination systems. In this work, an investigation of the impact of using data distribution methods among the components of combination systems will be performed. In this investigation, different methods of data distribution will be used and an analysis of the combination systems, using several different configurations, will be performed. As a result of this analysis, it is aimed to detect which combination systems are more suitable to use feature distribution among the components

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Na computação científica é necessário que os dados sejam o mais precisos e exatos possível, porém a imprecisão dos dados de entrada desse tipo de computação pode estar associada às medidas obtidas por equipamentos que fornecem dados truncados ou arredondados, fazendo com que os cálculos com esses dados produzam resultados imprecisos. Os erros mais comuns durante a computação científica são: erros de truncamentos, que surgem em dados infinitos e que muitas vezes são truncados", ou interrompidos; erros de arredondamento que são responsáveis pela imprecisão de cálculos em seqüências finitas de operações aritméticas. Diante desse tipo de problema Moore, na década de 60, introduziu a matemática intervalar, onde foi definido um tipo de dado que permitiu trabalhar dados contínuos,possibilitando, inclusive prever o tamanho máximo do erro. A matemática intervalar é uma saída para essa questão, já que permite um controle e análise de erros de maneira automática. Porém, as propriedades algébricas dos intervalos não são as mesmas dos números reais, apesar dos números reais serem vistos como intervalos degenerados, e as propriedades algébricas dos intervalos degenerados serem exatamente as dos números reais. Partindo disso, e pensando nas técnicas de especificação algébrica, precisa-se de uma linguagem capaz de implementar uma noção auxiliar de equivalência introduzida por Santiago [6] que ``simule" as propriedades algébricas dos números reais nos intervalos. A linguagem de especificação CASL, Common Algebraic Specification Language, [1] é uma linguagem de especificação algébrica para a descrição de requisitos funcionais e projetos modulares de software, que vem sendo desenvolvida pelo CoFI, The Common Framework Initiative [2] a partir do ano de 1996. O desenvolvimento de CASL se encontra em andamento e representa um esforço conjunto de grandes expoentes da área de especificações algébricas no sentido de criar um padrão para a área. A dissertação proposta apresenta uma especificação em CASL do tipo intervalo, munido da aritmética de Moore, afim de que ele venha a estender os sistemas que manipulem dados contínuos, sendo possível não só o controle e a análise dos erros de aproximação, como também a verificação algébrica de propriedades do tipo de sistema aqui mencionado. A especificação de intervalos apresentada aqui foi feita apartir das especificações dos números racionais proposta por Mossakowaski em 2001 [3] e introduz a noção de igualdade local proposta por Santiago [6, 5, 4]

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increase of applications complexity has demanded hardware even more flexible and able to achieve higher performance. Traditional hardware solutions have not been successful in providing these applications constraints. General purpose processors have inherent flexibility, since they perform several tasks, however, they can not reach high performance when compared to application-specific devices. Moreover, since application-specific devices perform only few tasks, they achieve high performance, although they have less flexibility. Reconfigurable architectures emerged as an alternative to traditional approaches and have become an area of rising interest over the last decades. The purpose of this new paradigm is to modify the device s behavior according to the application. Thus, it is possible to balance flexibility and performance and also to attend the applications constraints. This work presents the design and implementation of a coarse grained hybrid reconfigurable architecture to stream-based applications. The architecture, named RoSA, consists of a reconfigurable logic attached to a processor. Its goal is to exploit the instruction level parallelism from intensive data-flow applications to accelerate the application s execution on the reconfigurable logic. The instruction level parallelism extraction is done at compile time, thus, this work also presents an optimization phase to the RoSA architecture to be included in the GCC compiler. To design the architecture, this work also presents a methodology based on hardware reuse of datapaths, named RoSE. RoSE aims to visualize the reconfigurable units through reusability levels, which provides area saving and datapath simplification. The architecture presented was implemented in hardware description language (VHDL). It was validated through simulations and prototyping. To characterize performance analysis some benchmarks were used and they demonstrated a speedup of 11x on the execution of some applications

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of clustering methods for the discovery of cancer subtypes has drawn a great deal of attention in the scientific community. While bioinformaticians have proposed new clustering methods that take advantage of characteristics of the gene expression data, the medical community has a preference for using classic clustering methods. There have been no studies thus far performing a large-scale evaluation of different clustering methods in this context. This work presents the first large-scale analysis of seven different clustering methods and four proximity measures for the analysis of 35 cancer gene expression data sets. Results reveal that the finite mixture of Gaussians, followed closely by k-means, exhibited the best performance in terms of recovering the true structure of the data sets. These methods also exhibited, on average, the smallest difference between the actual number of classes in the data sets and the best number of clusters as indicated by our validation criteria. Furthermore, hierarchical methods, which have been widely used by the medical community, exhibited a poorer recovery performance than that of the other methods evaluated. Moreover, as a stable basis for the assessment and comparison of different clustering methods for cancer gene expression data, this study provides a common group of data sets (benchmark data sets) to be shared among researchers and used for comparisons with new methods

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The use of middleware technology in various types of systems, in order to abstract low-level details related to the distribution of application logic, is increasingly common. Among several systems that can be benefited from using these components, we highlight the distributed systems, where it is necessary to allow communications between software components located on different physical machines. An important issue related to the communication between distributed components is the provision of mechanisms for managing the quality of service. This work presents a metamodel for modeling middlewares based on components in order to provide to an application the abstraction of a communication between components involved in a data stream, regardless their location. Another feature of the metamodel is the possibility of self-adaptation related to the communication mechanism, either by updating the values of its configuration parameters, or by its replacement by another mechanism, in case of the restrictions of quality of service specified are not being guaranteed. In this respect, it is planned the monitoring of the communication state (application of techniques like feedback control loop), analyzing performance metrics related. The paradigm of Model Driven Development was used to generate the implementation of a middleware that will serve as proof of concept of the metamodel, and the configuration and reconfiguration policies related to the dynamic adaptation processes. In this sense was defined the metamodel associated to the process of a communication configuration. The MDD application also corresponds to the definition of the following transformations: the architectural model of the middleware in Java code, and the configuration model to XML

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main goal of this work is to investigate the suitability of applying cluster ensemble techniques (ensembles or committees) to gene expression data. More specifically, we will develop experiments with three diferent cluster ensembles methods, which have been used in many works in literature: coassociation matrix, relabeling and voting, and ensembles based on graph partitioning. The inputs for these methods will be the partitions generated by three clustering algorithms, representing diferent paradigms: kmeans, ExpectationMaximization (EM), and hierarchical method with average linkage. These algorithms have been widely applied to gene expression data. In general, the results obtained with our experiments indicate that the cluster ensemble methods present a better performance when compared to the individual techniques. This happens mainly for the heterogeneous ensembles, that is, ensembles built with base partitions generated with diferent clustering algorithms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The increasing complexity of integrated circuits has boosted the development of communications architectures like Networks-on-Chip (NoCs), as an architecture; alternative for interconnection of Systems-on-Chip (SoC). Networks-on-Chip complain for component reuse, parallelism and scalability, enhancing reusability in projects of dedicated applications. In the literature, lots of proposals have been made, suggesting different configurations for networks-on-chip architectures. Among all networks-on-chip considered, the architecture of IPNoSys is a non conventional one, since it allows the execution of operations, while the communication process is performed. This study aims to evaluate the execution of data-flow based applications on IPNoSys, focusing on their adaptation against the design constraints. Data-flow based applications are characterized by the flowing of continuous stream of data, on which operations are executed. We expect that these type of applications can be improved when running on IPNoSys, because they have a programming model similar to the execution model of this network. By observing the behavior of these applications when running on IPNoSys, were performed changes in the execution model of the network IPNoSys, allowing the implementation of an instruction level parallelism. For these purposes, analysis of the implementations of dataflow applications were performed and compared

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Software Repository Mining (MSR) is a research area that analyses software repositories in order to derive relevant information for the research and practice of software engineering. The main goal of repository mining is to extract static information from repositories (e.g. code repository or change requisition system) into valuable information providing a way to support the decision making of software projects. On the other hand, another research area called Process Mining (PM) aims to find the characteristics of the underlying process of business organizations, supporting the process improvement and documentation. Recent works have been doing several analyses through MSR and PM techniques: (i) to investigate the evolution of software projects; (ii) to understand the real underlying process of a project; and (iii) create defect prediction models. However, few research works have been focusing on analyzing the contributions of software developers by means of MSR and PM techniques. In this context, this dissertation proposes the development of two empirical studies of assessment of the contribution of software developers to an open-source and a commercial project using those techniques. The contributions of developers are assessed through three different perspectives: (i) buggy commits; (ii) the size of commits; and (iii) the most important bugs. For the opensource project 12.827 commits and 8.410 bugs have been analyzed while 4.663 commits and 1.898 bugs have been analyzed for the commercial project. Our results indicate that, for the open source project, the developers classified as core developers have contributed with more buggy commits (although they have contributed with the majority of commits), more code to the project (commit size) and more important bugs solved while the results could not indicate differences with statistical significance between developer groups for the commercial project

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The main goal of Regression Test (RT) is to reuse the test suite of the latest version of a software in its current version, in order to maximize the value of the tests already developed and ensure that old features continue working after the new changes. Even with reuse, it is common that not all tests need to be executed again. Because of that, it is encouraged to use Regression Tests Selection (RTS) techniques, which aims to select from all tests, only those that reveal faults, this reduces costs and makes this an interesting practice for the testing teams. Several recent research works evaluate the quality of the selections performed by RTS techniques, identifying which one presents the best results, measured by metrics such as inclusion and precision. The RTS techniques should seek in the System Under Test (SUT) for tests that reveal faults. However, because this is a problem without a viable solution, they alternatively seek for tests that reveal changes, where faults may occur. Nevertheless, these changes may modify the execution flow of the algorithm itself, leading some tests no longer exercise the same stretch. In this context, this dissertation investigates whether changes performed in a SUT would affect the quality of the selection of tests performed by an RTS, if so, which features the changes present which cause errors, leading the RTS to include or exclude tests wrongly. For this purpose, a tool was developed using the Java language to automate the measurement of inclusion and precision averages achieved by a regression test selection technique for a particular feature of change. In order to validate this tool, an empirical study was conducted to evaluate the RTS technique Pythia, based on textual differencing, on a large web information system, analyzing the feature of types of tasks performed to evolve the SUT

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Symbolic Data Analysis (SDA) main aims to provide tools for reducing large databases to extract knowledge and provide techniques to describe the unit of such data in complex units, as such, interval or histogram. The objective of this work is to extend classical clustering methods for symbolic interval data based on interval-based distance. The main advantage of using an interval-based distance for interval-based data lies on the fact that it preserves the underlying imprecision on intervals which is usually lost when real-valued distances are applied. This work includes an approach allow existing indices to be adapted to interval context. The proposed methods with interval-based distances are compared with distances punctual existing literature through experiments with simulated data and real data interval

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Geographic Information System (GIS) are computational tools used to capture, store, consult, manipulate, analyze and print geo-referenced data. A GIS is a multi-disciplinary system that can be used by different communities of users, each one having their own interest and knowledge. This way, different knowledge views about the same reality need to be combined, in such way to attend each community. This work presents a mechanism that allows different community users access the same geographic database without knowing its particular internal structure. We use geographic ontologies to support a common and shared understanding of a specific domain: the coral reefs. Using these ontologies' descriptions that represent the knowledge of the different communities, mechanisms are created to handle with such different concepts. We use equivalent classes mapping, and a semantic layer that interacts with the ontologies and the geographic database, and that gives to the user the answers about his/her queries, independently of the used terms

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Vriesea minarum is a rupiculous bromeliad species, with naturally fragmented populations, restricted to the Iron Quadrangle, Minas Gerais, Brazil. It is a threatened species, which is suffering from habitat loss due to the growth of cities and mining activities. The knowledge of genetic variability in plant populations is one of the main branches of conservation genetics, linking genetic data to conservation strategies while the knowledge about plant reproductive biology can aid in understanding key aspects of their life story, as well as in the comprehension of their distribution and survival strategies. Thus, the study of diversity, richness, and genetic structure, as well as the reproductive biology of populations of V. minarum can contribute to the development of conservation actions. Chapter 1 presents the transferability of 14 microsatellite loci for V. minarum. Among the results of this chapter, we highlight the successful transferability of 10 microsatellite loci described for other species of Bromeliaceae, all of which are polymorphic. In Chapter 2, we present the genetic analyses of 12 populations of V. minarum that are distributed throughout the Iron Quadrangle. We used the 10 microsatellite loci tested in Chapter 1. The results show a low population structuring (Fst = 0.088), but with different values of genetic richness (mean = 2.566) and gene diversity (mean = 0.635) for all populations; and a high inbreeding coefficient (Gis = 0.376). These may be the result of pollinators action and/or efficient seed dispersal, thus allowing a high connectivity among populations of naturally fragmented outcrops. The reproductive biology and floral morphology of a population of V. minarum, located in the Parque Estadual da Serra do Rola-Moça, are studied in Chapter 3. This reserve is the only public environmental protection area where the species occurs. As a result of field experiments and observations, we found that the species has its flowering period from January to March, with flowers that last for two days and that it has a mixed pollination syndrome. It is primarily alogamous, but also has the capacity to be self-ferilized. It is expected that data obtained in chapters 1, 2 and 3 serve as basis for other studies with species from the ferruginous rocky fields, since until now, to our knowledge, there are no other survey of endemic species from the Iron Quadrangle, seeking to merge the genetic knowledge, with the data of the reproductive biology, with the ultimate aim of biodiversity conservation. Considering the great habitat loss for the species by mining, it becomes crucial to analyze the creation of new protected areas for its conservation

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In Fazenda Belém oil field (Potiguar Basin, Ceará State, Brazil) occur frequently sinkholes and sudden terrain collapses associated to an unconsolidated sedimentary cap covering the Jandaíra karst. This research was carried out in order to understand the mechanisms of generation of these collapses. The main tool used was Ground Penetrating Radar (GPR). This work is developed twofold: one aspect concerns methodology improvements in GPR data processing whilst another aspect concerns the geological study of the Jandaíra karst. This second aspect was strongly supported both by the analysis of outcropping karst structures (in another regions of Potiguar Basin) and by the interpretation of radargrams from the subsurface karst in Fazenda Belém. It was designed and tested an adequate flux to process GPR data which was adapted from an usual flux to process seismic data. The changes were introduced to take into account important differences between GPR and Reflection Seismic methods, in particular: poor coupling between source and ground, mixed phase of the wavelet, low signal-to-noise ratio, monochannel acquisition, and high influence of wave propagation effects, notably dispersion. High frequency components of the GPR pulse suffer more pronounced effects of attenuation than low frequency components resulting in resolution losses in radargrams. In Fazenda Belém, there is a stronger need of an suitable flux to process GPR data because both the presence of a very high level of aerial events and the complexity of the imaged subsurface karst structures. The key point of the processing flux was an improvement in the correction of the attenuation effects on the GPR pulse based on their influence on the amplitude and phase spectra of GPR signals. In low and moderate losses dielectric media the propagated signal suffers significant changes only in its amplitude spectrum; that is, the phase spectrum of the propagated signal remains practically unaltered for the usual travel time ranges. Based on this fact, it is shown using real data that the judicious application of the well known tools of time gain and spectral balancing can efficiently correct the attenuation effects. The proposed approach can be applied in heterogeneous media and it does not require the precise knowledge of the attenuation parameters of the media. As an additional benefit, the judicious application of spectral balancing promotes a partial deconvolution of the data without changing its phase. In other words, the spectral balancing acts in a similar way to a zero phase deconvolution. In GPR data the resolution increase obtained with spectral balancing is greater than those obtained with spike and predictive deconvolutions. The evolution of the Jandaíra karst in Potiguar Basin is associated to at least three events of subaerial exposition of the carbonatic plataform during the Turonian, Santonian, and Campanian. In Fazenda Belém region, during the mid Miocene, the Jandaíra karst was covered by continental siliciclastic sediments. These sediments partially filled the void space associated to the dissolution structures and fractures. Therefore, the development of the karst in this region was attenuated in comparison to other places in Potiguar Basin where this karst is exposed. In Fazenda Belém, the generation of sinkholes and terrain collapses are controlled mainly by: (i) the presence of an unconsolidated sedimentary cap which is thick enough to cover completely the karst but with sediment volume lower than the available space associated to the dissolution structures in the karst; (ii) the existence of important structural of SW-NE and NW-SE alignments which promote a localized increase in the hydraulic connectivity allowing the channeling of underground water, thus facilitating the carbonatic dissolution; and (iii) the existence of a hydraulic barrier to the groundwater flow, associated to the Açu-4 Unity. The terrain collapse mechanisms in Fazenda Belém occur according to the following temporal evolution. The meteoric water infiltrates through the unconsolidated sedimentary cap and promotes its remobilization to the void space associated with the dissolution structures in Jandaíra Formation. This remobilization is initiated at the base of the sedimentary cap where the flow increases its abrasion due to a change from laminar to turbulent flow regime when the underground water flow reaches the open karst structures. The remobilized sediments progressively fill from bottom to top the void karst space. So, the void space is continuously migrated upwards ultimately reaching the surface and causing the sudden observed terrain collapses. This phenomenon is particularly active during the raining season, when the water table that normally is located in the karst may be temporarily located in the unconsolidated sedimentary cap