901 resultados para Data carving tools.


Relevância:

90.00% 90.00%

Publicador:

Resumo:

To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

We hereby develop an effective and efficient testing methodology for correctness testing for file recovery tools across different file systems. We assume that the tool tester is familiar with the formats of common file types and has the ability to use the tools correctly. Our methodology first derives a testing plan to minimize the number of runs required to identify the differences in tools with respect to correctness. We also present a case study on correctness testing for file carving tools, which allows us to confirm that the number of necessary testing runs is bounded and our results are statistically sound.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence for soil surveys. In Brazil, most of the soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work include: (i) to compare two sampling systems by conditioned Latin hypercube to map soil classes and soil properties; (II) to retrieve information from a detailed scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and validation of the new soil map; and (III) to create and validate a soil map of a much larger and similar area from the extrapolation of information extracted from the existing soil map. Two sampling systems were created by conditioned Latin hypercube and by the cost-constrained conditioned Latin hypercube. At each prospection place, soil classification and measurement of the A horizon thickness were performed. Maps were generated and validated for each sampling system, comparing the efficiency of these methods. The conditioned Latin hypercube captured greater variability of soils and properties than the cost-constrained conditioned Latin hypercube, despite the former provided greater difficulty in field work. The conditioned Latin hypercube can capture greater soil variability and the cost-constrained conditioned Latin hypercube presents great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed scale soil map of a pilot watershed, topographical information for each soil class was extracted from a Digital Elevation Model and its derivatives, by two data mining tools. Maps were generated using each tool. The more accurate of these tools was used for extrapolation of soil information for a much larger and similar area and the generated map was validated. It was possible to retrieve the existing soil map information and apply it on a larger area containing similar soil forming factors, at much low financial cost. The KnowledgeMiner tool for data mining, and ArcSIE, used to create the soil map, presented better results and enabled the use of existing soil map to extract soil information and its application in similar larger areas at reduced costs, which is especially important in development countries with limited financial resources for such activities, such as Brazil.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Advances in biomedical signal acquisition systems for motion analysis have led to lowcost and ubiquitous wearable sensors which can be used to record movement data in different settings. This implies the potential availability of large amounts of quantitative data. It is then crucial to identify and to extract the information of clinical relevance from the large amount of available data. This quantitative and objective information can be an important aid for clinical decision making. Data mining is the process of discovering such information in databases through data processing, selection of informative data, and identification of relevant patterns. The databases considered in this thesis store motion data from wearable sensors (specifically accelerometers) and clinical information (clinical data, scores, tests). The main goal of this thesis is to develop data mining tools which can provide quantitative information to the clinician in the field of movement disorders. This thesis will focus on motor impairment in Parkinson's disease (PD). Different databases related to Parkinson subjects in different stages of the disease were considered for this thesis. Each database is characterized by the data recorded during a specific motor task performed by different groups of subjects. The data mining techniques that were used in this thesis are feature selection (a technique which was used to find relevant information and to discard useless or redundant data), classification, clustering, and regression. The aims were to identify high risk subjects for PD, characterize the differences between early PD subjects and healthy ones, characterize PD subtypes and automatically assess the severity of symptoms in the home setting.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. Especially, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data. Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high throughput flow cytometry data. Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical Cumulative Distribution Function and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review. Conclusions: Graphical exploratory data analytic tools are quick and useful means of assessing data quality. We propose that the described visualizations should be used as quality assessment tools and where possible, be used for quality control.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Swiss Swiss Consultant Trust Fund (CTF) support covered the period from July to December 2007 and comprised four main tasks: (1) Analysis of historic land degradation trends in the four watersheds of Zerafshan, Surkhob, Toirsu, and Vanj; (2) Translation of standard CDE GIS training materials into Russian and Tajik to enable local government staff and other specialists to use geospatial data and tools; (3) Demonstration of geospatial tools that show land degradation trends associated with land use and vegetative cover data in the project areas, (4) Preliminary training of government staff in using appropriate data, including existing information, global datasets, inexpensive satellite imagery and other datasets and webbased visualization tools like spatial data viewers, etc. The project allowed building of local awareness of, and skills in, up-to-date, inexpensive, easy-to-use GIS technologies, data sources, and applications relevant to natural resource management and especially to sustainable land management. In addition to supporting the implementation of the World Bank technical assistance activity to build capacity in the use of geospatial tools for natural resource management, the Swiss CTF support also aimed at complementing the Bank supervision work on the ongoing Community Agriculture and Watershed Management Project (CAWMP).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The sustainable management of natural resources is a key issue for sustainable development of a poor, mountainous country such as Tajikistan. In order to strengthen its agricultural and infrastructural development efforts and alleviate poverty in rural areas, spatial information and analysis are of crucial importance to improve priority setting and decision making efficiency. However, poor access to geospatial data and tools, and limited capacity in their use has greatly constrained the ability of governmental institutions to effectively assess, plan, and monitor natural resources management. The Centre for Development and Environment (CDE) has thus been mandated by the World Bank Group to provide adequate technical support to the Community Agriculture and Watershed Management Project (CAWMP). This support consists of a spatial database on soil degradation trends in 4 watersheds, capacity development in and awareness creation about geographic information technology and a spatial data exchange hub for natural resources management in Tajikistan. CDE’s support has started in July 2007 and will last until December 2007 with a possible extension in 2008.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Over the last few years, the Data Center market has increased exponentially and this tendency continues today. As a direct consequence of this trend, the industry is pushing the development and implementation of different new technologies that would improve the energy consumption efficiency of data centers. An adaptive dashboard would allow the user to monitor the most important parameters of a data center in real time. For that reason, monitoring companies work with IoT big data filtering tools and cloud computing systems to handle the amounts of data obtained from the sensors placed in a data center.Analyzing the market trends in this field we can affirm that the study of predictive algorithms has become an essential area for competitive IT companies. Complex algorithms are used to forecast risk situations based on historical data and warn the user in case of danger. Considering that several different users will interact with this dashboard from IT experts or maintenance staff to accounting managers, it is vital to personalize it automatically. Following that line of though, the dashboard should only show relevant metrics to the user in different formats like overlapped maps or representative graphs among others. These maps will show all the information needed in a visual and easy-to-evaluate way. To sum up, this dashboard will allow the user to visualize and control a wide range of variables. Monitoring essential factors such as average temperature, gradients or hotspots as well as energy and power consumption and savings by rack or building would allow the client to understand how his equipment is behaving, helping him to optimize the energy consumption and efficiency of the racks. It also would help him to prevent possible damages in the equipment with predictive high-tech algorithms.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dissertação apresentada à Escola Superior de Tecnologia do Instituto Politécnico de Castelo Branco para cumprimento dos requisitos necessários à obtenção do grau de Mestre em Desenvolvimento de Software e Sistemas Interactivos, realizada sob a orientação científica da categoria profissional do orientador Doutor Eurico Ribeiro Lopes, do Instituto Politécnico de Castelo Branco.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper surveys the context of feature extraction by neural network approaches, and compares and contrasts their behaviour as prospective data visualisation tools in a real world problem. We also introduce and discuss a hybrid approach which allows us to control the degree of discriminatory and topographic information in the extracted feature space.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The breadth and depth of available clinico-genomic information, present an enormous opportunity for improving our ability to study disease mechanisms and meet the individualised medicine needs. A difficulty occurs when the results are to be transferred 'from bench to bedside'. Diversity of methods is one of the causes, but the most critical one relates to our inability to share and jointly exploit data and tools. This paper presents a perspective on current state-of-the-art in the analysis of clinico-genomic data and its relevance to medical decision support. It is an attempt to investigate the issues related to data and knowledge integration. Copyright © 2010 Inderscience Enterprises Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^