985 results for Data handling


Relevance: 30.00%

Abstract:

Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data. The term data mining refers to the process of performing exploratory analysis on the data and building models from it. To infer patterns from data, data mining involves different approaches, such as association rule mining, classification techniques and clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform one. This research study explores various techniques for converting mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied. The results of clustering mixed-category data after conversion to a numeric data type are demonstrated using a crime data set. The thesis also proposes an extension to the well-known algorithm for handling mixed data types, to deal with data sets containing only categorical data. The proposed conversion has been validated on a breast cancer data set. Moreover, another issue with the clustering process is the visualization of the output. Different geometric techniques, such as scatter plots or projection plots, are available, but none of them displays the result by projecting the whole database; rather, they support attribute-pairwise analysis.
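
A minimal sketch of this kind of conversion (a common baseline, not necessarily the specific scheme explored in the thesis): one-hot encode the categorical attributes, standardize the numerical ones, and apply an ordinary clustering algorithm such as k-means to the resulting uniform numeric matrix.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.cluster import KMeans

# Toy crime-like records with two categorical and one numerical attribute.
df = pd.DataFrame({
    "offense":  ["burglary", "assault", "burglary", "fraud"],
    "district": ["north", "south", "north", "east"],
    "age":      [25, 34, 41, 29],
})

encode = ColumnTransformer([
    ("cat", OneHotEncoder(), ["offense", "district"]),  # categorical -> 0/1 columns
    ("num", StandardScaler(), ["age"]),                 # numeric -> zero mean, unit variance
])
X = encode.fit_transform(df)   # uniform numeric matrix
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```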

Relevance: 30.00%

Abstract:

The common GIS-based approach to regional analyses of soil organic carbon (SOC) stocks and changes is to define geographic layers for which unique sets of driving variables are derived, including land use, climate, and soils. These GIS layers, with their associated attribute data, can then be fed into a range of empirical and dynamic models. Common methodologies for collating and formatting regional data sets on land use, climate, and soils were adopted for the project Assessment of Soil Organic Carbon Stocks and Changes at National Scale (GEFSOC). This permitted the development of a uniform protocol for handling the various inputs for the dynamic GEFSOC Modelling System. Consistent soil data sets for Amazon-Brazil, the Indo-Gangetic Plains (IGP) of India, Jordan and Kenya, the case study areas considered in the GEFSOC project, were prepared using methodologies developed for the World Soils and Terrain Database (SOTER). The approach involved three main stages: (1) compiling new soil geographic and attribute data in SOTER format; (2) using expert estimates and common sense to fill selected gaps in the measured or primary data; (3) using a scheme of taxonomy-based pedotransfer rules and expert rules to derive soil parameter estimates for similar soil units with missing soil analytical data. The most appropriate approach varied from country to country, depending largely on the overall accessibility and quality of the primary soil data available in the case study areas. The secondary SOTER data sets discussed here are appropriate for a wide range of environmental applications at national scale, including agro-ecological zoning, land evaluation, modelling of soil C stocks and changes, and studies of soil vulnerability to pollution. Estimates of national-scale stocks of SOC, calculated using SOTER methods, are presented as a first example of database application. Independent estimates of SOC stocks are needed to evaluate the outcome of the GEFSOC Modelling System for current conditions of land use and climate. (C) 2007 Elsevier B.V. All rights reserved.
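
A minimal sketch of stage (3) above, with invented default values rather than SOTER's actual rule base: a taxonomy-based pedotransfer rule assigns typical parameter values to a soil unit when measured analytical data are missing, after which a derived quantity such as the SOC stock can be computed.

```python
from typing import Optional

# Hypothetical defaults per soil unit: (bulk density g/cm^3, topsoil SOC %).
# Illustrative numbers only, not SOTER's actual rule base.
PEDOTRANSFER_RULES = {
    "Ferralsol": (1.1, 2.0),
    "Vertisol":  (1.3, 1.2),
    "Arenosol":  (1.5, 0.4),
}

def estimate_soc_stock(soil_unit: str, depth_cm: float,
                       measured_soc_pct: Optional[float] = None) -> float:
    """SOC stock (kg C/m^2) = depth (m) x bulk density (kg/m^3) x C fraction."""
    bulk_density, default_soc = PEDOTRANSFER_RULES[soil_unit]
    # Fall back to the taxonomy-based estimate when analytical data are missing.
    soc_pct = measured_soc_pct if measured_soc_pct is not None else default_soc
    return (depth_cm / 100.0) * (bulk_density * 1000.0) * (soc_pct / 100.0)

print(estimate_soc_stock("Ferralsol", depth_cm=30))  # 6.6 kg C/m^2 from defaults
```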

Relevance: 30.00%

Abstract:

We have compared the biokinetics of deuterated natural (RRR) and synthetic (all-rac) alpha-tocopherol in male apoE4-carrying smokers and nonsmokers. In a randomized, crossover study, subjects underwent two 4-week treatments (400 mg/day) with undeuterated RRR- and all-rac-alpha-tocopheryl acetate separated by a 12-week washout. Before and after each supplementation period, subjects underwent a biokinetic protocol (48 h) with 150 mg deuterated RRR- or all-rac-alpha-tocopheryl acetate. During the biokinetic protocols, the elimination of endogenous plasma alpha-tocopherol was significantly faster in smokers (P < 0.05). However, smokers had a lower uptake of deuterated RRR than nonsmokers, but there was no difference in uptake of deuterated all-rac. The supplementation regimes significantly raised plasma alpha-tocopherol (P < 0.001), with no differences in response between smokers and nonsmokers or between alpha-tocopherol forms. Smokers had significantly lower excretion of alpha-carboxyethyl-hydroxychroman than nonsmokers following supplementation (P < 0.05). Nonsmokers excreted more alpha-carboxyethyl-hydroxychroman following RRR than all-rac; however, smokers did not differ in excretion between forms. At baseline, smokers had significantly lower ascorbate (P < 0.01) and higher F2-isoprostanes (P < 0.05). F2-isoprostanes in smokers remained unchanged during the study, but increased in nonsmokers following alpha-tocopherol supplementation. These data suggest that apoE4-carrying smokers and nonsmokers differ in their handling of natural and synthetic alpha-tocopherol. (c) 2006 Elsevier Inc. All rights reserved.

Relevance: 30.00%

Abstract:

This paper describes a prototype grid infrastructure, called the eMinerals minigrid, for molecular simulation scientists, which is based on an integration of shared compute and data resources. We describe the key components, namely the use of Condor pools; Linux/Unix clusters with PBS and IBM's LoadLeveler job handling tools; the use of Globus for security handling; the use of Condor-G tools for wrapping Globus job submission commands; Condor's DAGMan tool for handling workflow; the Storage Resource Broker for handling data; and the CCLRC dataportal and associated tools for both archiving data with metadata and making data available to other workers.
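
As an illustration of the workflow component, here is a minimal sketch (hypothetical job names and file paths) of how DAGMan is driven: a plain-text DAG file lists jobs and their dependencies, and condor_submit_dag hands it to DAGMan, which releases each node's Condor-G job only when its parents have completed.

```python
import subprocess

# Hypothetical three-step workflow; each .sub file would be a Condor-G
# submit file wrapping the underlying Globus job submission.
dag = """\
JOB prepare prepare.sub
JOB simulate simulate.sub
JOB archive archive.sub
PARENT prepare CHILD simulate
PARENT simulate CHILD archive
"""

with open("eminerals.dag", "w") as f:
    f.write(dag)

# DAGMan runs the jobs in dependency order.
subprocess.run(["condor_submit_dag", "eminerals.dag"], check=True)
```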

Relevance: 30.00%

Abstract:

Novel imaging techniques are playing an increasingly important role in drug development, providing insight into the mechanism of action of new chemical entities. The data sets obtained by these methods can be large, with complex inter-relationships, but the most appropriate statistical analysis for handling these data is often uncertain, precisely because of the exploratory nature of the way the data are collected. We present an example from a clinical trial using magnetic resonance imaging to assess changes in atherosclerotic plaques following treatment with a tool compound with established clinical benefit. We compared two specific approaches to handling the correlations due to physical location and repeated measurements: two-level and four-level multilevel models. The two methods identified similar structural variables, but the higher-level multilevel models had the advantage of explaining a greater proportion of the variation, and their modeling assumptions appeared to be better satisfied.
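
A minimal sketch of the two-level case (synthetic data and hypothetical variable names, not the trial's actual data): repeated plaque measurements are nested within patients and fitted as a linear mixed model with a random intercept per patient; the four-level variant would add further levels of nesting, such as artery and slice within patient.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_patients, n_slices = 20, 8
df = pd.DataFrame({
    "patient":   np.repeat(np.arange(n_patients), n_slices),
    "treatment": np.repeat(rng.integers(0, 2, n_patients), n_slices),
})
# Simulate correlated repeated measurements via a per-patient random effect.
patient_effect = rng.normal(0, 0.5, n_patients)[df["patient"]]
df["wall_area"] = 10 - 0.8 * df["treatment"] + patient_effect + rng.normal(0, 1, len(df))

# Random intercept per patient captures the within-patient correlation.
model = smf.mixedlm("wall_area ~ treatment", df, groups=df["patient"])
print(model.fit().summary())
```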

Relevance: 30.00%

Abstract:

In Britain, substantial cuts in police budgets, alongside the controversial handling of incidents such as politically sensitive enquiries, public disorder and relations with the media, have recently triggered much debate about public knowledge of and trust in the police. To date, however, little academic research has investigated how knowledge of police performance affects citizens' trust. We address this long-standing lacuna by exploring citizens' trust before and after exposure to real performance data in the context of a British police force. The results reveal that being informed of performance data affects citizens' trust significantly. Furthermore, the direction and degree of change in trust are related to variations across the different elements of the reported performance criteria. Interestingly, the volatility of citizens' trust is related to initial performance perceptions (such that citizens with low initial perceptions of police performance react more strongly to evidence of both good and bad performance than citizens with high initial perceptions), and citizens' intentions to support the police do not always correlate with their cognitive and affective trust towards the police. In discussing our findings, we explore how being transparent with performance data can both hinder and help the development of citizens' trust towards a public organisation such as the police. From our study, we pose a number of ethical challenges that practitioners face when deciding what data to highlight, to whom, and for what purpose.

Relevance: 30.00%

Abstract:

Astronomy has evolved almost exclusively through the use of spectroscopic and imaging techniques, operated separately. With the development of modern technologies, it is possible to obtain data cubes that combine both techniques simultaneously, producing images with spectral resolution. Extracting information from them can be quite complex, and hence the development of new methods of data analysis is desirable. We present a method of analysing data cubes (data from single-field observations, containing two spatial dimensions and one spectral dimension) that uses principal component analysis (PCA) to express the data in a form of reduced dimensionality, facilitating efficient extraction of information from very large data sets. PCA transforms a system of correlated coordinates into a system of uncorrelated coordinates ordered by principal components of decreasing variance. The new coordinates are referred to as eigenvectors, and the projections of the data on to these coordinates produce images we will call tomograms. The association of the tomograms (images) with the eigenvectors (spectra) is important for the interpretation of both. The eigenvectors are mutually orthogonal, and this property is fundamental for their handling and interpretation. When the data cube shows objects that present uncorrelated physical phenomena, the eigenvectors' orthogonality may be instrumental in separating and identifying them. By handling eigenvectors and tomograms, one can enhance features, extract noise, compress data, extract spectra, etc. We applied the method, for illustration purposes only, to the central region of the low-ionization nuclear emission region (LINER) galaxy NGC 4736, and demonstrate that it has a type 1 active nucleus, not known before. Furthermore, we show that it is displaced from the centre of its stellar bulge.
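
A minimal sketch of this decomposition (on a synthetic cube, not the NGC 4736 data): the cube is unfolded into a two-dimensional matrix with one row per spatial pixel, decomposed by SVD, and each principal component yields an eigenvector (an eigenspectrum) together with a tomogram, the projection of the data onto that eigenvector refolded into an image.

```python
import numpy as np

nx, ny, nl = 30, 30, 200
cube = np.random.default_rng(0).normal(size=(nx, ny, nl))  # stand-in data cube

X = cube.reshape(nx * ny, nl)   # rows: spatial pixels, columns: wavelengths
X -= X.mean(axis=0)             # centre each spectral channel
U, S, Vt = np.linalg.svd(X, full_matrices=False)

eigenspectra = Vt                              # mutually orthogonal eigenvectors (spectra)
tomograms = (X @ Vt.T).T.reshape(-1, nx, ny)   # projections refolded into images

variance_fraction = S**2 / np.sum(S**2)        # components ordered by decreasing variance
print(variance_fraction[:5])
```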

Relevance: 30.00%

Abstract:

In this paper we introduce a parametric model for handling lifetime data where an early lifetime can be related to infant-mortality failure or to wear processes, but we do not know which risk is responsible for the failure. The maximum likelihood approach and the sampling-based approach are used to obtain the inferences of interest. Some special cases of the proposed model are studied via Monte Carlo methods for the size and power of hypothesis tests. To illustrate the proposed methodology, we present an example based on a real data set.
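
A minimal sketch of the maximum likelihood side (simulated data and a two-component Weibull mixture as a stand-in, not the paper's exact model): either an early infant-mortality process or a wear-out process may produce each observed lifetime, and the mixture likelihood is maximized without knowing which risk caused each failure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
lifetimes = np.concatenate([
    weibull_min.rvs(0.7, scale=5.0, size=150, random_state=rng),   # infant mortality
    weibull_min.rvs(3.0, scale=40.0, size=350, random_state=rng),  # wear-out
])

def neg_log_lik(theta):
    k1, s1, k2, s2 = np.exp(theta[:4])     # log-parameterised to stay positive
    p = 1.0 / (1.0 + np.exp(-theta[4]))    # mixing weight in (0, 1)
    f = (p * weibull_min.pdf(lifetimes, k1, scale=s1)
         + (1.0 - p) * weibull_min.pdf(lifetimes, k2, scale=s2))
    return -np.sum(np.log(f + 1e-300))

fit = minimize(neg_log_lik, x0=[0.0, 1.0, 1.0, 3.5, 0.0], method="Nelder-Mead")
print(np.exp(fit.x[:4]), 1.0 / (1.0 + np.exp(-fit.x[4])))
```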

Relevance: 30.00%

Abstract:

Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end user gain confidence in the prediction and providing a basis for new insights about the data, confirming or rejecting previously formed hypotheses. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-complete problem, traditional model tree induction algorithms use a greedy top-down divide-and-conquer strategy, which may not converge to the globally optimal solution. In this paper, we propose a novel algorithm based on the evolutionary algorithms paradigm as an alternative heuristic for generating model trees, in order to improve convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model tree induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications. (C) 2010 Elsevier Inc. All rights reserved.
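
A minimal sketch of the evolutionary idea (a toy depth-1 "model tree" with one split point and a linear model per leaf, not the E-Motion algorithm itself): a population of candidate split points is evolved by selection and mutation, with fitness given by prediction error, instead of choosing the split greedily.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.where(x < 4, 2 * x + 1, -x + 13) + rng.normal(0, 0.3, 300)  # true split at 4

def fitness(split):
    """MSE of a depth-1 model tree: one linear model on each side of the split."""
    left, right = x < split, x >= split
    if left.sum() < 2 or right.sum() < 2:
        return np.inf
    mse = 0.0
    for mask in (left, right):
        coef = np.polyfit(x[mask], y[mask], 1)       # linear model at the leaf
        mse += np.sum((np.polyval(coef, x[mask]) - y[mask]) ** 2)
    return mse / len(x)

population = list(rng.uniform(1, 9, 20))             # candidate split points
for generation in range(40):
    parents = sorted(population, key=fitness)[:5]    # selection: keep the fittest
    population = parents + [p + rng.normal(0, 0.5)   # mutation: perturb the split
                            for p in parents for _ in range(3)]

print("best split:", sorted(population, key=fitness)[0])  # expected to approach 4
```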

Relevance: 30.00%

Abstract:

When an accurate hydraulic network model is available, direct modeling techniques are very straightforward and reliable for on-line leakage detection and localization in a large class of water distribution networks. In general, this type of technique, based on analytical models, can be seen as an application of the well-known fault detection and isolation theory for complex industrial systems. Nonetheless, a single-leak scenario with a certain leak-size pattern is usually assumed, which may not hold in real applications. Upgrading a leak detection and localization method based on a direct modeling approach to handle multiple-leak scenarios can be, on the one hand, quite straightforward but, on the other hand, highly computationally demanding for a large class of water distribution networks, given the huge number of potential water loss hotspots. This paper presents a leakage detection and localization method suitable for multiple-leak scenarios and a large class of water distribution networks. The method can be seen as an upgrade of the above-mentioned direct modeling approach, into which a global search method based on genetic algorithms has been integrated in order to estimate the network water loss hotspots and the sizes of the leaks. It is an inverse/direct modeling method that tries to benefit from both approaches: on the one hand, the exploration capability of genetic algorithms to estimate the network water loss hotspots and the sizes of the leaks; on the other hand, the straightforwardness and reliability offered by an accurate hydraulic model for assessing the network areas close to the estimated hotspots. The application of the resulting method in a district metered area (DMA) of the Barcelona water distribution network is provided and discussed. The obtained results show that leakage detection and localization under multiple-leak scenarios may be performed efficiently following an easy procedure.
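
A minimal sketch of the inverse part (a toy linearised pressure model standing in for a real hydraulic simulator such as EPANET, with invented node numbers): a genetic-style search over candidate (leak node, leak size) pairs, scored by the mismatch between simulated and measured sensor pressures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_sensors = 50, 8
# Stand-in for the hydraulic model: sensitivity of each sensor to a unit leak at each node.
sensitivity = rng.uniform(0.0, 1.0, (n_sensors, n_nodes))

def simulate(candidate):
    """Pressure drop at the sensors for a list of (node, size) leaks."""
    drop = np.zeros(n_sensors)
    for node, size in candidate:
        drop += size * sensitivity[:, node]
    return drop

measured = simulate([(12, 2.0), (37, 1.5)])   # hidden two-leak scenario

def fitness(candidate):
    return -np.sum((simulate(candidate) - measured) ** 2)

def mutate(candidate):
    out = []
    for node, size in candidate:
        if rng.random() < 0.3:                # occasionally relocate the hotspot
            node = int(rng.integers(n_nodes))
        out.append((node, max(0.0, size + rng.normal(0.0, 0.2))))
    return out

population = [[(int(rng.integers(n_nodes)), float(rng.uniform(0, 3)))
               for _ in range(2)] for _ in range(60)]
for _ in range(300):
    population.sort(key=fitness, reverse=True)
    elite = population[:10]                   # selection: keep the best candidates
    population = elite + [mutate(p) for p in elite for _ in range(5)]

print(max(population, key=fitness))           # typically recovers nodes 12 and 37
```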

Relevance: 30.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance: 30.00%

Abstract:

This work presents a methodological proposal for the acquisition of biometric data through telemetry, developed on the basis of action research and a case study. Nowadays, qualified physical-evaluation professionals must use specific devices to obtain biometric signals and data. These devices are, most of the time, high-cost and difficult to use and handle. Therefore, the methodological proposal was elaborated in order to develop, conceptually, a biotelemetric device able to acquire the desired biometric signals: oximetry, biometry, body temperature and pedometry, which are essential for the area of physical evaluation. Existing biometric sensors, possible means of remote signal transmission, and the available computer systems were researched so that data acquisition would be possible. The methodological proposal for remote acquisition of biometric signals is structured in four modules: acquisition of biometric data; conversion and transmission of biometric signals; reception and processing of biometric signals; and generation of interpretative graphs. Together, the modules aim at producing interpretative graphs of human biometric signals. To validate the proposal, a functional prototype was developed and is presented in this work.
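
A minimal sketch (hypothetical class and field names, not the prototype's actual implementation) of the four-module architecture described above, modelled as a simple data pipeline from sensor reading to an interpretable summary.

```python
from dataclasses import dataclass

@dataclass
class BiometricSample:
    spo2_pct: float        # oximetry
    temperature_c: float   # body temperature
    steps: int             # pedometry

def acquire() -> BiometricSample:
    """Module 1: read raw values from the sensors (stubbed here)."""
    return BiometricSample(spo2_pct=97.5, temperature_c=36.6, steps=4200)

def transmit(sample: BiometricSample) -> bytes:
    """Module 2: convert to a compact wire format for radio transmission."""
    return f"{sample.spo2_pct},{sample.temperature_c},{sample.steps}".encode()

def receive(payload: bytes) -> BiometricSample:
    """Module 3: decode the received signal back into a sample."""
    spo2, temp, steps = payload.decode().split(",")
    return BiometricSample(float(spo2), float(temp), int(steps))

def report(sample: BiometricSample) -> str:
    """Module 4: produce the interpretative output (text stand-in for graphs)."""
    return (f"SpO2 {sample.spo2_pct:.1f}% | "
            f"Temp {sample.temperature_c:.1f} C | Steps {sample.steps}")

print(report(receive(transmit(acquire()))))
```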

Relevance: 30.00%

Abstract:

BACKGROUND: Manual materials handling is still present in industrial settings and is associated with injuries to the lumbar spine and upper limbs. Adding handles to industrial boxes is one way of reducing task-related risks, but the position and angle of the handles, which are important factors for comfort and safety during handling, are still seldom investigated objectively. OBJECTIVES: To compare the handling of a commercial box and of prototypes with handles, and to evaluate their effects on upper-limb posture, muscle electrical activity and perceived pleasantness for different grips during handling between different heights. METHOD: Thirty-seven healthy volunteers evaluated the prototype handles, which allowed changes in position (top and bottom) and angle (0°, 15° and 30°). Wrist, elbow and shoulder movements were assessed by electrogoniometry and inclinometry. The muscle electrical activity of the wrist extensors, biceps brachii and upper trapezius was assessed with a portable electromyograph. The movement and muscle-activity recordings were synchronized. Subjective aspects of pleasantness were assessed using a visual analogue scale. RESULTS AND CONCLUSIONS: The prototypes with handles angled at 30° received the best pleasantness ratings and showed more neutral wrist postures, lower levels of upper-trapezius electromyographic activity and smaller arm-elevation angles. The different measurement methods proved complementary for evaluating the upper limbs during handling tasks.

Relevance: 30.00%

Abstract:

Background: Since the multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, allowing data mining to be applied to multiple tables directly and thus avoiding expensive join operations and semantic losses, this work proposes an algorithm with a multi-relational approach. Methods: To compare the performance of the traditional and multi-relational approaches for mining association rules, this paper presents an empirical study between PatriciaMine, a traditional algorithm, and its proposed multi-relational counterpart, MR-Radix. Results: This work showed the performance advantages of the multi-relational approach over several tables, as it avoids the high cost of join operations over multiple tables and the associated semantic losses. MR-Radix proved faster than PatriciaMine, despite handling complex multi-relational patterns. Memory usage showed a more conservative growth curve for MR-Radix than for PatriciaMine, indicating that an increasing number of frequent items in MR-Radix does not result in the significant growth in memory use seen in PatriciaMine. Conclusion: The comparative study between PatriciaMine and MR-Radix confirmed the efficacy of the multi-relational approach in the data mining process, both in execution time and in memory usage. Moreover, the proposed multi-relational algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.
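
A minimal sketch of the underlying idea (toy tables, not the MR-Radix algorithm itself): the support of a cross-table pattern is counted by walking the foreign-key relation between the tables directly, instead of materializing the joined table first.

```python
from collections import defaultdict

customers = {1: "north", 2: "south", 3: "north"}   # customer id -> region
orders = [(1, "beer"), (1, "chips"), (2, "beer"), (3, "chips"), (3, "beer")]

# Group each customer's items by following the foreign key, with no join.
items_by_customer = defaultdict(set)
for customer_id, product in orders:
    items_by_customer[customer_id].add(product)

# Support of the cross-table pattern (region=r, product=p), counted per customer.
support = defaultdict(int)
for customer_id, region in customers.items():
    for product in items_by_customer[customer_id]:
        support[(region, product)] += 1

print(dict(support))   # e.g. ('north', 'beer') is supported by 2 of 3 customers
```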