952 results for Dynamic data set visualization
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The analysis of large amounts of data is better performed by humans when the data are represented in a graphical format. Therefore, a new research area called Visual Data Mining is being developed, endeavoring to combine the number-crunching power of computers to prepare data for visualization with the ability of humans to interpret data presented graphically. This work presents the results of applying a visual data mining tool, called FastMapDB, to detect the behavioral pattern exhibited by a dataset of clinical information about hemoglobinopathies known as thalassemia. FastMapDB is a visual data mining tool that takes tabular data stored in a relational database, such as dates, numbers and texts, and, by considering them as points in a multidimensional space, maps them to a three-dimensional space. The intuitive three-dimensional representation of objects enables a data analyst to see the behavior of the characteristics of abnormal forms of hemoglobin, highlighting the differences when compared to data from a group without alteration.
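The projection step described above follows the spirit of the FastMap heuristic: repeatedly pick two distant pivot objects, project every point onto the line between them via the cosine law, and recurse on the residual distances. A minimal sketch in Python; the function name and the use of Euclidean base distances are illustrative assumptions, since FastMapDB's internals are not given in the abstract:

```python
import numpy as np

def fastmap(points, k=3):
    """Project points (n x d) into k dimensions with the FastMap heuristic.
    Hypothetical minimal sketch; not FastMapDB's actual implementation."""
    n = len(points)
    coords = np.zeros((n, k))

    def dist2(i, j, col):
        # squared original distance minus contributions of the axes found so far
        d2 = np.sum((points[i] - points[j]) ** 2)
        d2 -= np.sum((coords[i, :col] - coords[j, :col]) ** 2)
        return max(d2, 0.0)

    for col in range(k):
        # choose pivots: start anywhere, take the farthest point, repeat once
        a = 0
        b = max(range(n), key=lambda j: dist2(a, j, col))
        a = max(range(n), key=lambda j: dist2(b, j, col))
        dab2 = dist2(a, b, col)
        if dab2 == 0.0:
            break  # all remaining distances are zero
        for i in range(n):
            # cosine-law projection onto the pivot line
            coords[i, col] = (dist2(a, i, col) + dab2 - dist2(b, i, col)) / (2 * np.sqrt(dab2))
    return coords
```

The recursion on residual distances is what lets a handful of axes capture most of the original distance structure, which is why the 3-D view remains interpretable.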
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The objective of this work was to estimate the influence of cow age at calving (IDV) and Julian birth date (DJN) on weaning weight (PD) and average daily gain in the pre-weaning period (GMD) of Gir calves, determining correction factors for these effects. A total of 10,685 and 18,339 PD and GMD records of Gir calves were analyzed, drawn from the files of the Brazilian Association of Zebu Breeders (ABCZ) and belonging to 1,229 and 1,979 contemporary groups (GC), respectively. PD and GMD were pre-adjusted for the effect of calf age at weaning. The effect of IDV on PD and GMD was modeled, for males, as a quadratic-quadratic-quadratic segmented polynomial with knots (join points) at 4.1 and 12.7 years for PD and at 4.0 and 8.2 years for GMD, respectively, and, for females, as a quadratic-quadratic segmented polynomial with a knot at 3.8 years for both traits. DJN was modeled as a quadratic-quadratic segmented polynomial with a knot at 126 days for PD and 167 days for GMD. The results showed that correction factors for IDV should be determined separately for males and females and that, for DJN, each season of the year should be considered so that the differences between them are well captured. The correction factors for the cow age effect ranged from 0.94750 to 1.08033 for PD and from 0.91714 to 1.07689 for GMD in males, and from 0.90937 to 1.07415 for PD and from 0.96055 to 1.14007 for GMD in females. For the DJN effect, the range was from 0.9256 to 1.0340 for PD and from 0.9112 to 1.0551 for GMD.
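A quadratic-quadratic segmented polynomial with one knot, of the kind used above for females, can be fit by ordinary least squares using a truncated-power basis that keeps the value and the slope continuous at the knot. A minimal sketch assuming numpy; the function names are hypothetical and the paper's exact model specification may differ:

```python
import numpy as np

def fit_segmented_quadratic(x, y, knot):
    """Least-squares fit of y = b0 + b1*x + b2*x^2 + b3*(x - knot)_+^2,
    a quadratic-quadratic segmented polynomial with one knot."""
    x = np.asarray(x, float)
    X = np.column_stack([
        np.ones_like(x),                     # intercept
        x,                                   # linear term
        x ** 2,                              # quadratic term
        np.clip(x - knot, 0.0, None) ** 2,   # change of curvature past the knot
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict_segmented(beta, x, knot):
    x = np.asarray(x, float)
    return beta[0] + beta[1]*x + beta[2]*x**2 + beta[3]*np.clip(x - knot, 0.0, None)**2
```

Because the truncated term and its first derivative vanish at the knot, the two quadratic pieces join smoothly there, which is the defining property of this family of models.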
Abstract:
Fisheries in the UHE-Tucuruí reservoir on the Tocantins River, Pará, involve about 6,000 fishers and generate about R$ 4.2 million per year. The activity concentrates on three main species: tucunaré Cichla monoculus (caught with hook), pescada Plagioscion squamosissimus (caught with net and/or hook), and mapará Hypophthalmus marginatus (caught with net). To characterize the fishers and fisheries of the reservoir, build scenarios of increased fishing effort, and predict moments of conflict due to resource scarcity, information was gathered from the literature and two data collection campaigns were carried out in 1999 and 2000, involving interviews with community leaders and fishers. The following variables were considered: landings by target species (according to records provided by the fishers' guilds), fishing gear, fishers' strategies, conflicts and forms of appropriation of space, and income from the activity. These variables were fed into a dynamic model simulated in the Vensim PLE software over a 10-year period starting in 1999. The results indicate that hook fishing is the most profitable strategy and that moments of conflict due to resource scarcity may occur in the short term (2005). The methodology used for the simulations and risk analyses also proved suitable for the local reality and the available data set.
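The Vensim model itself is not reproduced in the abstract, but a stock-and-flow simulation of this general kind can be sketched as a logistic fish stock harvested under rising effort. All parameter values below are illustrative assumptions, not figures from the study:

```python
def simulate_fishery(years=10, stock=1.0, effort=1.0,
                     r=0.4, K=1.0, q=0.25, effort_growth=0.08):
    """Toy stock-and-flow model in the spirit of a Vensim simulation.
    stock and effort are normalized; r, K, q, effort_growth are assumed."""
    history = []
    for year in range(years):
        catch = min(q * effort * stock, stock)   # harvest limited by the stock
        growth = r * stock * (1 - stock / K)     # logistic regeneration
        stock = max(stock + growth - catch, 0.0)
        effort *= 1 + effort_growth              # rising-effort scenario
        history.append((year, stock, catch))
    return history
```

Running such a scenario and watching for the year when the catch can no longer keep up with effort is the kind of exercise that would locate a conflict point like the 2005 horizon mentioned above.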
Abstract:
The objective of this work was to evaluate Nelore beef cattle growth curve parameters using the Von Bertalanffy function in a nested Bayesian procedure that allowed estimation of the joint posterior distribution of the growth curve parameters, their (co)variance components, and the environmental and additive genetic components affecting them. A hierarchical model was applied; each individual had a growth trajectory described by the nonlinear function, and each parameter of this function was considered to be affected by genetic and environmental effects that were described by an animal model. Random samples of the posterior distributions were drawn using Gibbs sampling and Metropolis-Hastings algorithms. The data set consisted of a total of 145,961 BW records from 15,386 animals. Even though the curve parameters were estimated for animals with few records, all predicted mature BW values were suitable, given that the information from related animals and the structure of systematic effects were considered in the curve fitting. A large additive genetic variance for mature BW was observed. The parameter a of the growth curves, which represents asymptotic adult BW, could be used as a selection criterion to control increases in adult BW when selecting for growth rate. The effect of maternal environment on growth was carried through to maturity and should be considered when evaluating adult BW. The other growth curve parameters showed small additive genetic and maternal effects. Mature BW and parameter k, related to the slope of the curve, presented a large, positive genetic correlation. The results indicated that selection for growth rate would increase adult BW without substantially changing the shape of the growth curve. Selection to change the slope of the growth curve without modifying adult BW would be inefficient because their genetic correlation is large. However, adult BW could be considered in a selection index with its corresponding economic weight to improve the overall efficiency of beef cattle production.
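The Von Bertalanffy weight curve referred to above is W(t) = a(1 - b e^(-kt))^3, where a is the asymptotic adult BW and k governs the slope. A minimal sketch of fitting it to a single animal's records with scipy; the simulated ages and weights are illustrative assumptions, and the paper fits the curve hierarchically within a Bayesian animal model rather than with curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def von_bertalanffy(t, a, b, k):
    """W(t) = a*(1 - b*exp(-k*t))**3: a = asymptotic adult BW,
    k = maturation rate governing the slope of the curve."""
    return a * (1.0 - b * np.exp(-k * t)) ** 3

# simulated age (months) / body weight (kg) records -- illustrative values only
t = np.linspace(0.0, 60.0, 30)
rng = np.random.default_rng(1)
w = von_bertalanffy(t, 420.0, 0.6, 0.05) + rng.normal(0.0, 5.0, t.size)

# nonlinear least squares from a rough initial guess
params, _ = curve_fit(von_bertalanffy, t, w, p0=[400.0, 0.5, 0.04])
a_hat, b_hat, k_hat = params
```

In the hierarchical setting, a and k are themselves given animal-model priors, which is what lets information from relatives stabilize curves for animals with few records.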
Abstract:
Bit performance prediction has been a challenging problem for the petroleum industry. It is essential for cost reduction associated with well planning and drilling performance prediction, especially when rig leasing rates tend to follow project demand and barrel-price rises. A methodology to model and predict one of the drilling bit performance evaluators, the Rate of Penetration (ROP), is presented herein. As the parameters affecting the ROP are complex and their relationship is not easily modeled, the application of a neural network is suggested. In the present work, a dynamic neural network, based on the Auto-Regressive with Extra Input Signals (ARX) model, is used to approach the ROP modeling problem. The network was applied to a real offshore oil field data set, consisting of information from seven wells drilled with an equal-diameter bit.
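The ARX structure feeds lagged outputs and lagged extra inputs to the model; the work above replaces the linear map with a neural network. A minimal linear-ARX sketch, shown only to illustrate the lag layout (the variable names and orders are hypothetical):

```python
import numpy as np

def fit_arx(y, u, na=2, nb=2):
    """Least-squares fit of a linear ARX model
    y[t] = sum_i a_i*y[t-i] + sum_j b_j*u[t-j] + e[t].
    The referenced work trains a neural network on this same regressor."""
    lag = max(na, nb)
    rows, targets = [], []
    for t in range(lag, len(y)):
        row = [y[t - i] for i in range(1, na + 1)] + \
              [u[t - j] for j in range(1, nb + 1)]
        rows.append(row)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta  # [a_1..a_na, b_1..b_nb]
```

Here y would play the role of the ROP series and u an extra input signal such as weight on bit; swapping the least-squares map for a network keeps the regressor but captures the nonlinear relationship the abstract mentions.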
Abstract:
In geophysics and seismology, raw data need to be processed to generate useful information that can be turned into knowledge by researchers. The number of sensors acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent querying and preparing datasets for analysis than acquiring raw data. Also, a lot of good-quality data acquired at great effort can be lost forever if they are not correctly stored. Local and international cooperation will probably be reduced, and much data will never become scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of São Paulo (IAG-USP) has concentrated fully on its data management system. This report describes the efforts of the IAG-USP to set up a seismology data management system to facilitate local and international cooperation. © 2011 by the Istituto Nazionale di Geofisica e Vulcanologia. All rights reserved.
Abstract:
This paper presents a method for indirect orientation of aerial images using ground control lines extracted from airborne laser system (ALS) data. This data integration strategy has shown good potential for the automation of photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of single or multiple images can be automatically computed with a space resection procedure from data derived from different sensors. The suggested method works as follows. First, straight lines are automatically extracted from the digital aerial image (s) and from the intensity image derived from an ALS data set (S). Then, the correspondence between s and S is automatically determined. A line-based coplanarity model that establishes the relationship between straight lines in object space and in image space is used to estimate the EOP with iterated extended Kalman filtering (IEKF). Implementation and testing of the method employed data from different sensors. Experiments were conducted to assess the proposed method, and the results showed that the accuracy of the estimated EOP is a function of the ALS positional accuracy.
Abstract:
Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion of, or even the entire, data set. This paper addresses this problem by presenting a graph-based (network-based) semi-supervised learning method specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and incorporates features that make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentages of mislabeled data, in networks of different sizes and average node degrees. Importantly, these simulations reveal the existence of critical points in the mislabeled-subset size, below which the network is free of wrong-label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made between the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method increasingly outperforms the others as the percentage of mislabeled samples grows. © 2012 IEEE.
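The particle competition-cooperation mechanism is not detailed in the abstract; for context, plain graph-based label propagation, the baseline family the paper improves on, can be sketched as follows. Note that this simple scheme clamps the labeled seeds and therefore has no defense against mislabeled items, which is exactly the weakness the proposed method targets:

```python
import numpy as np

def label_propagation(W, labels, n_iter=100):
    """Plain iterative label propagation on a weighted graph W (n x n).
    labels: class index for labeled nodes, -1 for unlabeled. Shown only as
    the baseline setting; not the paper's particle-competition method."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()) - {-1})
    F = np.zeros((len(labels), len(classes)))
    for c_idx, c in enumerate(classes):
        F[labels == c, c_idx] = 1.0
    P = W / W.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    seeds = labels != -1
    for _ in range(n_iter):
        F = P @ F                          # diffuse label mass over edges
        F[seeds] = 0.0                     # re-clamp the labeled seeds
        for c_idx, c in enumerate(classes):
            F[labels == c, c_idx] = 1.0
    return np.array(classes)[F.argmax(axis=1)]
```

Because a clamped seed re-injects its label every iteration, a single mislabeled seed can contaminate its whole community, illustrating the critical-point behavior the simulations above describe.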
Abstract:
The use of saturated two-level designs is very popular, especially in industrial applications where the cost of experiments is too high. Standard classical approaches are not appropriate for analyzing data from saturated designs, since we can only obtain estimates of the main factor effects and have no degrees of freedom left to estimate the error variance. In this paper, we propose the use of empirical Bayesian procedures to draw inferences from data obtained with saturated designs. The proposed methodology is illustrated with a simulated data set. © 2013 Growing Science Ltd. All rights reserved.
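To see why a saturated design leaves no error degrees of freedom, consider a 2^(7-4) design: 8 runs estimate an intercept plus 7 orthogonal contrasts, consuming all 8 degrees of freedom. The sketch below builds such a design and screens the effects with Lenth's pseudo standard error, a standard frequentist device used here only as a simple stand-in for the paper's empirical Bayes procedure (the simulated responses and active effects are assumptions):

```python
import numpy as np
from itertools import product

# A saturated 2^(7-4) design: 8 runs, 7 two-level factors, 0 df left for error.
base = np.array(list(product([-1, 1], repeat=3)), float)
A, B, C = base[:, 0], base[:, 1], base[:, 2]
X = np.column_stack([A, B, C, A*B, A*C, B*C, A*B*C])   # 7 orthogonal contrasts

rng = np.random.default_rng(42)
true_effects = np.array([4.0, 0, 0, 0, -3.0, 0, 0])    # assumed active factors
y = 10 + X @ true_effects + rng.normal(0, 0.5, 8)

effects = X.T @ y / 8          # least-squares contrast coefficients (X'X = 8I)

# Lenth's pseudo standard error: a screening device when no error df remain
s0 = 1.5 * np.median(np.abs(effects))
trimmed = np.abs(effects)[np.abs(effects) < 2.5 * s0]
pse = 1.5 * np.median(trimmed)
active = np.abs(effects) > 2.0 * pse                   # rough activity cutoff
```

An empirical Bayes analysis would instead place a prior on the effects and shrink the small ones toward zero; both approaches exploit the assumption that most of the 7 effects are null.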
Abstract:
Pós-graduação em Ciências Cartográficas - FCT
Abstract:
Pós-graduação em Educação Matemática - IGCE
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
This article presents an experimental study of parametric identification techniques applied to the dynamic modeling of an Apache web server. An experimental setup was developed to simulate load variations on the server. The setup consists of two PCs, one running the Apache server and the other used as a load generator, issuing service requests to the Apache server. Autoregressive (AR) parametric models were estimated for different operating points and load conditions. Each operating point was defined in terms of the mean values of the input parameter MaxClients (the parameter that sets the maximum number of active processes) and of the output, the Apache server's CPU (Central Processing Unit) usage percentage. For each operating point, 600 samples were collected with a sampling interval of 5 seconds. Half of the samples collected at each operating point were used for model estimation, while the other half were used for validation. A study of the most suitable model order showed that, for an operating point with a low MaxClients value, a 7th-order AR model can be satisfactory. For higher MaxClients values, the results showed that higher-order models are needed, due to the system's inherent nonlinearities.
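An AR(p) model of the kind identified in this study can be estimated by least squares on lagged samples, using half the series for estimation as described. A minimal sketch, with a synthetic AR(2) series standing in for the MaxClients/CPU measurements (the coefficients and noise level are illustrative assumptions):

```python
import numpy as np

def fit_ar(y, order):
    """Least-squares AR(p) estimate:
    y[t] = a_1*y[t-1] + ... + a_p*y[t-p] + e[t]."""
    rows = [y[t - order:t][::-1] for t in range(order, len(y))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[order:], rcond=None)
    return theta

# synthetic AR(2) series standing in for the 600 CPU-usage samples
rng = np.random.default_rng(7)
e = rng.normal(0, 0.1, 600)
y = np.zeros(600)
for t in range(2, 600):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + e[t]

# half for estimation, half for validation, as in the article
est, val = y[:300], y[300:]
theta = fit_ar(est, order=2)
```

Model-order selection then amounts to repeating the fit for increasing p and checking the one-step prediction error on the validation half, which is how a 7th-order model could be judged satisfactory at low MaxClients.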