844 resultados para Data mining and knowledge discovery


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Variations in the phenotypic expression of heterozygous beta thalassemia reflect the formation of different populations. To better understand the profile of heterozygous beta-thalassemia of the Brazilian population, we aimed at establishing parameters to direct the diagnosis of carriers and calculate the frequency from information stored in an electronic database. Using a Data Mining tool, we evaluated information on 10,960 blood samples deposited in a relational database. Over the years, improved diagnostic technology has facilitated the elucidation of suspected beta thalassemia heterozygote cases with an average frequency of 3.5% of referred cases. We also found that the Brazilian beta thalassemia trait has classic increases of Hb A2 and Hb F (60%), mainly caused by mutations in beta zero thalassemia, especially in the southeast of the country.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence for soil surveys. In Brazil, most of the soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work include: (i) to compare two sampling systems by conditioned Latin hypercube to map soil classes and soil properties; (II) to retrieve information from a detailed scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and validation of the new soil map; and (III) to create and validate a soil map of a much larger and similar area from the extrapolation of information extracted from the existing soil map. Two sampling systems were created by conditioned Latin hypercube and by the cost-constrained conditioned Latin hypercube. At each prospection place, soil classification and measurement of the A horizon thickness were performed. Maps were generated and validated for each sampling system, comparing the efficiency of these methods. The conditioned Latin hypercube captured greater variability of soils and properties than the cost-constrained conditioned Latin hypercube, despite the former provided greater difficulty in field work. The conditioned Latin hypercube can capture greater soil variability and the cost-constrained conditioned Latin hypercube presents great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed scale soil map of a pilot watershed, topographical information for each soil class was extracted from a Digital Elevation Model and its derivatives, by two data mining tools. Maps were generated using each tool. The more accurate of these tools was used for extrapolation of soil information for a much larger and similar area and the generated map was validated. It was possible to retrieve the existing soil map information and apply it on a larger area containing similar soil forming factors, at much low financial cost. The KnowledgeMiner tool for data mining, and ArcSIE, used to create the soil map, presented better results and enabled the use of existing soil map to extract soil information and its application in similar larger areas at reduced costs, which is especially important in development countries with limited financial resources for such activities, such as Brazil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Methods: Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. Results: This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. Conclusion: The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The increase in new electronic devices had generated a considerable increase in obtaining spatial data information; hence these data are becoming more and more widely used. As well as for conventional data, spatial data need to be analyzed so interesting information can be retrieved from them. Therefore, data clustering techniques can be used to extract clusters of a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object’s attributes. This paper presents an approach that enhances spatial data mining process, so they can use the semantic that exists within a region. A framework was developed, OntoSDM, which enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithm’s result. The experiments demonstrated a semantically improved result, generating more interesting clusters, therefore reducing manual analysis work of an expert.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: The CUPID (Cultural and Psychosocial Influences on Disability) study was established to explore the hypothesis that common musculoskeletal disorders (MSDs) and associated disability are importantly influenced by culturally determined health beliefs and expectations. This paper describes the methods of data collection and various characteristics of the study sample. Methods/Principal Findings: A standardised questionnaire covering musculoskeletal symptoms, disability and potential risk factors, was used to collect information from 47 samples of nurses, office workers, and other (mostly manual) workers in 18 countries from six continents. In addition, local investigators provided data on economic aspects of employment for each occupational group. Participation exceeded 80% in 33 of the 47 occupational groups, and after pre-specified exclusions, analysis was based on 12,426 subjects (92 to 1018 per occupational group). As expected, there was high usage of computer keyboards by office workers, while nurses had the highest prevalence of heavy manual lifting in all but one country. There was substantial heterogeneity between occupational groups in economic and psychosocial aspects of work; three-to fivefold variation in awareness of someone outside work with musculoskeletal pain; and more than ten-fold variation in the prevalence of adverse health beliefs about back and arm pain, and in awareness of terms such as "repetitive strain injury" (RSI). Conclusions/Significance: The large differences in psychosocial risk factors (including knowledge and beliefs about MSDs) between occupational groups should allow the study hypothesis to be addressed effectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Oxygen abundances of 67 dwarf stars in the metallicity range -1.6 < [Fe/H] < -0.4 are derived from a non-LTE analysis of the 777 nm O I triplet lines. These stars have precise atmospheric parameters measured by Nissen and Schuster, who find that they separate into three groups based on their kinematics and alpha-element (Mg, Si, Ca, Ti) abundances: thick disk, high-alpha halo, and low-alpha halo. We find the oxygen abundance trends of thick-disk and high-alpha halo stars very similar. The low-alpha stars show a larger star-to-star scatter in [O/Fe] at a given [Fe/H] and have systematically lower oxygen abundances compared to the other two groups. Thus, we find the behavior of oxygen abundances in these groups of stars similar to that of the a elements. We use previously published oxygen abundance data of disk and very metal-poor halo stars to present an overall view (-2.3 < [Fe/H] < +0.3) of oxygen abundance trends of stars in the solar neighborhood. Two field halo dwarf stars stand out in their O and Na abundances. Both G53-41 and G150-40 have very low oxygen and very high sodium abundances, which are key signatures of the abundance anomalies observed in globular cluster (GC) stars. Therefore, they are likely field halo stars born in GCs. If true, we estimate that at least 3% +/- 2% of the local field metal-poor star population was born in GCs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Background Once multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since they allow applying data mining in multiple tables directly, thus avoiding expensive joining operations and semantic losses, this work proposes an algorithm with multi-relational approach. Methods Aiming to compare traditional approach performance and multi-relational for mining association rules, this paper discusses an empirical study between PatriciaMine - an traditional algorithm - and its corresponding multi-relational proposed, MR-Radix. Results This work showed advantages of the multi-relational approach in performance over several tables, which avoids the high cost for joining operations from multiple tables and semantic losses. The performance provided by the algorithm MR-Radix shows faster than PatriciaMine, despite handling complex multi-relational patterns. The utilized memory indicates a more conservative growth curve for MR-Radix than PatriciaMine, which shows the increase in demand of frequent items in MR-Radix does not result in a significant growth of utilized memory like in PatriciaMine. Conclusion The comparative study between PatriciaMine and MR-Radix confirmed efficacy of the multi-relational approach in data mining process both in terms of execution time and in relation to memory usage. Besides that, the multi-relational proposed algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Given a large image set, in which very few images have labels, how to guess labels for the remaining majority? How to spot images that need brand new labels different from the predefined ones? How to summarize these data to route the user’s attention to what really matters? Here we answer all these questions. Specifically, we propose QuMinS, a fast, scalable solution to two problems: (i) Low-labor labeling (LLL) – given an image set, very few images have labels, find the most appropriate labels for the rest; and (ii) Mining and attention routing – in the same setting, find clusters, the top-'N IND.O' outlier images, and the 'N IND.R' images that best represent the data. Experiments on satellite images spanning up to 2.25 GB show that, contrasting to the state-of-the-art labeling techniques, QuMinS scales linearly on the data size, being up to 40 times faster than top competitors (GCap), still achieving better or equal accuracy, it spots images that potentially require unpredicted labels, and it works even with tiny initial label sets, i.e., nearly five examples. We also report a case study of our method’s practical usage to show that QuMinS is a viable tool for automatic coffee crop detection from remote sensing images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important source of information in cosmology. The excellent accuracy of the recent CMB data of WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis. We focus on their optimization in order to have a complete exploitation of the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low resolution maps, the comparison of the Angular Power Spectrum (APS) extraction performances of the optimal QML method, implemented in the code called BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method has been then applied to the Planck data at large angular scales to extract the CMB APS. The same method has been applied also to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model, the possible origins for this results have been discussed. The Cromaster code instead has been applied to the 408 MHz and 1.42 GHz surveys focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, for which are necessary high accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device and its performances are compared to the devices used nowadays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Internet of Things (IoT) is the next industrial revolution: we will interact naturally with real and virtual devices as a key part of our daily life. This technology shift is expected to be greater than the Web and Mobile combined. As extremely different technologies are needed to build connected devices, the Internet of Things field is a junction between electronics, telecommunications and software engineering. Internet of Things application development happens in silos, often using proprietary and closed communication protocols. There is the common belief that only if we can solve the interoperability problem we can have a real Internet of Things. After a deep analysis of the IoT protocols, we identified a set of primitives for IoT applications. We argue that each IoT protocol can be expressed in term of those primitives, thus solving the interoperability problem at the application protocol level. Moreover, the primitives are network and transport independent and make no assumption in that regard. This dissertation presents our implementation of an IoT platform: the Ponte project. Privacy issues follows the rise of the Internet of Things: it is clear that the IoT must ensure resilience to attacks, data authentication, access control and client privacy. We argue that it is not possible to solve the privacy issue without solving the interoperability problem: enforcing privacy rules implies the need to limit and filter the data delivery process. However, filtering data require knowledge of how the format and the semantics of the data: after an analysis of the possible data formats and representations for the IoT, we identify JSON-LD and the Semantic Web as the best solution for IoT applications. Then, this dissertation present our approach to increase the throughput of filtering semantic data by a factor of ten.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Vietnam has developed rapidly over the past 15 years. However, progress was not uniformly distributed across the country. Availability, adequate visualization and analysis of spatially explicit data on socio-economic and environmental aspects can support both research and policy towards sustainable development. Applying appropriate mapping techniques allows gleaning important information from tabular socio-economic data. Spatial analysis of socio-economic phenomena can yield insights into locally-specifi c patterns and processes that cannot be generated by non-spatial applications. This paper presents techniques and applications that develop and analyze spatially highly disaggregated socioeconomic datasets. A number of examples show how such information can support informed decisionmaking and research in Vietnam.