838 results for text and data mining
Abstract:
Motivated by the development of a graphical representation for networks with a large number of vertices, useful for collaborative filtering applications, this work proposes the use of cohesion surfaces over a multidimensionally scaled thematic base. To this end, it combines classical multidimensional scaling and Procrustes analysis in an iterative algorithm that produces partial solutions, which are then combined into a global solution. Applied to an example of book-loan transactions from the Biblioteca Karl A. Boedecker, the proposed algorithm produces interpretable and thematically coherent outputs and achieves lower stress than the solution obtained by classical scaling.
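The combination of classical multidimensional scaling and Procrustes analysis mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' iterative algorithm: it only shows two overlapping partial embeddings being stitched into one layout. The function names, the synthetic `points`, and the choice of NumPy are all assumptions made for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, k=2):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in k dimensions."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(gram)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale


def fit_procrustes(source, target):
    """Orthogonal Procrustes fit (rotation/reflection plus translation, no scaling)."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    u, _, vt = np.linalg.svd((source - mu_s).T @ (target - mu_t))
    return mu_s, u @ vt, mu_t


def apply_transform(points, transform):
    """Apply a transform returned by fit_procrustes to any set of points."""
    mu_s, rotation, mu_t = transform
    return (points - mu_s) @ rotation + mu_t


# Hypothetical data: 12 vertices split into two overlapping blocks.
rng = np.random.default_rng(0)
points = rng.normal(size=(12, 2))
emb_a = classical_mds(pairwise_distances(points[:8]))   # vertices 0..7
emb_b = classical_mds(pairwise_distances(points[4:]))   # vertices 4..11

# Vertices 4..7 appear in both partial solutions and anchor the alignment.
transform = fit_procrustes(emb_b[:4], emb_a[4:])
aligned_b = apply_transform(emb_b, transform)
global_layout = np.vstack([emb_a, aligned_b[4:]])
```

Because the synthetic points are exactly two-dimensional, the stitched layout reproduces the original pairwise distances up to a rigid motion. With real dissimilarity data the fit would be approximate, and stress would be the natural quality measure.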
Abstract:
Variations in the phenotypic expression of heterozygous beta-thalassemia reflect the formation of different populations. To better understand the profile of heterozygous beta-thalassemia in the Brazilian population, we aimed to establish parameters to guide the diagnosis of carriers and to calculate the frequency from information stored in an electronic database. Using a data mining tool, we evaluated information on 10,960 blood samples stored in a relational database. Over the years, improved diagnostic technology has facilitated the elucidation of suspected beta-thalassemia heterozygote cases, with an average frequency of 3.5% of referred cases. We also found that the Brazilian beta-thalassemia trait shows the classic increases in Hb A2 and Hb F (60%), mainly caused by beta-zero thalassemia mutations, especially in the southeast of the country.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The increase in the volume of spatial data collected has motivated the development of geovisualisation techniques, which aim to provide an important resource to support knowledge extraction and decision making. One of these techniques is the 3D graph, which provides a dynamic and flexible enhancement to the analysis of results obtained by spatial data mining algorithms, principally when several georeferenced objects occur at the same location. The original contribution of this work is the enhancement of visual resources in a computational environment for spatial data mining; the efficiency of these techniques is then demonstrated using a real database. The application proved useful for interpreting the results obtained, such as patterns that occurred in the same locality, and for supporting activities that could be carried out based on the visualisation of results. © 2013 Springer-Verlag.
Abstract:
Background: The multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since it allows data mining to be applied to multiple tables directly, thus avoiding expensive join operations and semantic losses. This work proposes an algorithm that follows the multi-relational approach. Methods: To compare the performance of the traditional and multi-relational approaches to mining association rules, this paper presents an empirical study of PatriciaMine, a traditional algorithm, and its proposed multi-relational counterpart, MR-Radix. Results: This work showed the performance advantages of the multi-relational approach over several tables, which avoids the high cost of join operations across multiple tables and semantic losses. MR-Radix ran faster than PatriciaMine, despite handling complex multi-relational patterns. Memory usage shows a more conservative growth curve for MR-Radix than for PatriciaMine: an increase in the number of frequent items does not lead to significant growth in memory usage for MR-Radix, as it does for PatriciaMine. Conclusion: The comparative study of PatriciaMine and MR-Radix confirmed the efficacy of the multi-relational approach in the data mining process in terms of both execution time and memory usage. In addition, the proposed multi-relational algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.
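For readers unfamiliar with association rule mining, the quantities that algorithms such as PatriciaMine and MR-Radix compute can be shown on a toy, single-table example. The sketch below is a brute-force, level-wise search. It is neither PatriciaMine nor MR-Radix, it does not cover the multi-relational case, and the transactions and function names are invented for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force level-wise search; fine for toy data, not for large databases."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        level = {}
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                level[frozenset(candidate)] = s
        if not level:
            # Support never grows when items are added, so no larger
            # itemset can be frequent once a whole level is empty.
            break
        result.update(level)
    return result


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence({"butter"}, {"bread"}, transactions)
```

Tree-based miners avoid this repeated scanning by compressing the transactions into a prefix structure. The abstract reports that MR-Radix does so across several tables without joining them first.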
Abstract:
The increase in new electronic devices has generated a considerable increase in the acquisition of spatial data; hence these data are becoming more and more widely used. As with conventional data, spatial data need to be analyzed so that interesting information can be retrieved from them. Data clustering techniques can therefore be used to extract clusters from a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object's attributes. This paper presents an approach that enhances the spatial data mining process so that it can use the semantics that exist within a region. A framework, OntoSDM, was developed that enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithms' results. The experiments demonstrated a semantically improved result, generating more interesting clusters and therefore reducing the manual analysis work of an expert.
Abstract:
Advances in biomedical signal acquisition systems for motion analysis have led to low-cost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and extract the clinically relevant information from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. The thesis focuses on motor impairment in Parkinson's disease (PD). Different databases related to subjects with PD at different stages of the disease were considered. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques used in this thesis are feature selection (used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify subjects at high risk for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes, and automatically assess the severity of symptoms in the home setting.
Abstract:
The primary challenge in groundwater and contaminant transport modeling is obtaining the data needed for constructing, calibrating and testing the models. Large amounts of data are necessary for describing the hydrostratigraphy in areas with complex geology. Increasingly, states are making spatial data available that can be used as input to groundwater flow models. The appropriateness of these data for large-scale flow systems has not been tested. This study focuses on modeling a plume of 1,4-dioxane in a heterogeneous aquifer system in Scio Township, Washtenaw County, Michigan. The analysis consisted of: (1) characterization of the hydrogeology of the area and construction of a conceptual model based on publicly available spatial data, (2) development and calibration of a regional flow model for the site, (3) conversion of the regional model to a more highly resolved local model, (4) simulation of the dioxane plume, and (5) evaluation of the model's ability to simulate field data and estimation of the possible dioxane sources and subsequent migration until maximum concentrations are at or below the Michigan Department of Environmental Quality's residential cleanup standard for groundwater (85 ppb). The MODFLOW-2000 and MT3D programs were used to simulate the groundwater flow and the development and movement of the 1,4-dioxane plume, respectively. MODFLOW simulates transient groundwater flow in a quasi-3-dimensional sense, subject to a variety of boundary conditions that can simulate recharge, pumping, and surface-water/groundwater interactions. MT3D simulates solute advection with groundwater flow (using the flow solution from MODFLOW), dispersion, source/sink mixing, and chemical reaction of contaminants. This modeling approach was successful at simulating the groundwater flows by calibrating recharge and hydraulic conductivities. The plume transport was adequately simulated using literature dispersivity and sorption coefficients, although the plume geometries were not well constrained.
Abstract:
Sustainable yields from water wells in hard-rock aquifers are achieved when the well bore intersects fracture networks. Fracture networks are often not readily discernible at the surface. Lineament analysis using remotely sensed satellite imagery has been employed to identify surface expressions of fracturing, and a variety of image-analysis techniques have been successfully applied in “ideal” settings. An ideal setting for lineament detection is one where the influences of human development, vegetation, and climatic situations are minimal and hydrogeological conditions and geologic structure are known. There is not yet a well-accepted protocol for mapping lineaments, nor have different approaches been compared in non-ideal settings. A new approach for image processing and synthesis was developed to identify successful satellite imagery types for lineament analysis in non-ideal terrain. Four satellite sensors (ASTER, Landsat 7 ETM+, QuickBird, RADARSAT-1) and a digital elevation model were evaluated for lineament analysis in Boaco, Nicaragua, where the landscape is subject to varied vegetative cover, a plethora of anthropogenic features, and frequent cloud cover that limits the availability of optical satellite data. A variety of digital image processing techniques were employed and lineament interpretations were performed to obtain 12 complementary image products that were evaluated subjectively to identify lineaments. The 12 lineament interpretations were synthesized to create a raster image of lineament zone coincidence that shows the level of agreement among the 12 interpretations. A composite lineament interpretation was made using the coincidence raster to restrict lineament observations to areas where multiple interpretations (at least 4) agree. Nine of the 11 previously mapped faults were identified from the coincidence raster. An additional 26 lineaments were identified from the coincidence raster, and the locations of 10 were confirmed by field observation. Four manual pumping tests suggest that well productivity is higher for wells proximal to lineament features. Interpretations from RADARSAT-1 products were superior to interpretations from other sensor products, suggesting that quality lineament interpretation in this region requires anthropogenic features to be minimized and topographic expressions to be maximized. The approach developed in this study has the potential to improve the siting of wells in non-ideal regions.
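The coincidence raster described in this abstract amounts to a per-cell count of agreeing interpretations followed by a threshold. A minimal sketch follows, assuming each interpretation has already been rasterized to a binary grid of identical shape. The 3 x 3 grids and function names are hypothetical, and the study's actual GIS workflow is not specified in the abstract.

```python
import numpy as np


def coincidence_raster(interpretations):
    """Count, per cell, how many binary lineament rasters mark that cell."""
    stack = np.stack([np.asarray(r, dtype=bool) for r in interpretations])
    return stack.sum(axis=0)


def composite_mask(coincidence, min_agreement=4):
    """Keep only cells where at least `min_agreement` interpretations agree."""
    return coincidence >= min_agreement


# Twelve hypothetical 3 x 3 interpretations, initially empty.
rasters = [np.zeros((3, 3), dtype=bool) for _ in range(12)]
for r in rasters[:4]:
    r[0, 0] = True          # four interpretations agree on cell (0, 0)
for r in rasters[:3]:
    r[1, 1] = True          # only three agree on cell (1, 1)

counts = coincidence_raster(rasters)
mask = composite_mask(counts, min_agreement=4)
```

With the threshold of four taken from the abstract, cell (0, 0) is retained in the composite and cell (1, 1) is dropped.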