952 resultados para Dynamic data set visualization
Resumo:
Speckle is being used as a characterization tool for the analysis of the dynamic of slow varying phenomena occurring in biological and industrial samples. The retrieved data takes the form of a sequence of speckle images. The analysis of these images should reveal the inner dynamic of the biological or physical process taking place in the sample. Very recently, it has been shown that principal component analysis is able to split the original data set in a collection of classes. These classes can be related with the dynamic of the observed phenomena. At the same time, statistical descriptors of biospeckle images have been used to retrieve information on the characteristics of the sample. These statistical descriptors can be calculated in almost real time and provide a fast monitoring of the sample. On the other hand, principal component analysis requires longer computation time but the results contain more information related with spatial-temporal pattern that can be identified with physical process. This contribution merges both descriptions and uses principal component analysis as a pre-processing tool to obtain a collection of filtered images where a simpler statistical descriptor can be calculated. The method has been applied to slow-varying biological and industrial processes
Resumo:
In 2005, the University of Maryland acquired over 70 digital videos spanning 35 years of Jim Henson’s groundbreaking work in television and film. To support in-house discovery and use, the collection was cataloged in detail using AACR2 and MARC21, and a web-based finding aid was also created. In the past year, I created an "r-ball" (a linked data set described using RDA) of these same resources. The presentation will compare and contrast these three ways of accessing the Jim Henson Works collection, with insights gleaned from providing resource discovery using RIMMF (RDA in Many Metadata Formats).
Resumo:
Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
Resumo:
Diagnostic methods have been an important tool in regression analysis to detect anomalies, such as departures from error assumptions and the presence of outliers and influential observations with the fitted models. Assuming censored data, we considered a classical analysis and Bayesian analysis assuming no informative priors for the parameters of the model with a cure fraction. A Bayesian approach was considered by using Markov Chain Monte Carlo Methods with Metropolis-Hasting algorithms steps to obtain the posterior summaries of interest. Some influence methods, such as the local influence, total local influence of an individual, local influence on predictions and generalized leverage were derived, analyzed and discussed in survival data with a cure fraction and covariates. The relevance of the approach was illustrated with a real data set, where it is shown that, by removing the most influential observations, the decision about which model best fits the data is changed.
Resumo:
Introduction: Work disability is a major consequence of rheumatoid arthritis (RA), associated not only with traditional disease activity variables, but also more significantly with demographic, functional, occupational, and societal variables. Recent reports suggest that the use of biologic agents offers potential for reduced work disability rates, but the conclusions are based on surrogate disease activity measures derived from studies primarily from Western countries. Methods: The Quantitative Standard Monitoring of Patients with RA (QUEST-RA) multinational database of 8,039 patients in 86 sites in 32 countries, 16 with high gross domestic product (GDP) (>24K US dollars (USD) per capita) and 16 low-GDP countries (<11K USD), was analyzed for work and disability status at onset and over the course of RA and clinical status of patients who continued working or had stopped working in high-GDP versus low-GDP countries according to all RA Core Data Set measures. Associations of work disability status with RA Core Data Set variables and indices were analyzed using descriptive statistics and regression analyses. Results: At the time of first symptoms, 86% of men (range 57%-100% among countries) and 64% (19%-87%) of women <65 years were working. More than one third (37%) of these patients reported subsequent work disability because of RA. Among 1,756 patients whose symptoms had begun during the 2000s, the probabilities of continuing to work were 80% (95% confidence interval (CI) 78%-82%) at 2 years and 68% (95% CI 65%-71%) at 5 years, with similar patterns in high-GDP and low-GDP countries. Patients who continued working versus stopped working had significantly better clinical status for all clinical status measures and patient self-report scores, with similar patterns in high-GDP and low-GDP countries. However, patients who had stopped working in high-GDP countries had better clinical status than patients who continued working in low-GDP countries. The most significant identifier of work disability in all subgroups was Health Assessment Questionnaire (HAQ) functional disability score. Conclusions: Work disability rates remain high among people with RA during this millennium. In low-GDP countries, people remain working with high levels of disability and disease activity. Cultural and economic differences between societies affect work disability as an outcome measure for RA.
Resumo:
Interval-censored survival data, in which the event of interest is not observed exactly but is only known to occur within some time interval, occur very frequently. In some situations, event times might be censored into different, possibly overlapping intervals of variable widths; however, in other situations, information is available for all units at the same observed visit time. In the latter cases, interval-censored data are termed grouped survival data. Here we present alternative approaches for analyzing interval-censored data. We illustrate these techniques using a survival data set involving mango tree lifetimes. This study is an example of grouped survival data.
Resumo:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering the discrete models of lifetime regression. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, which are denominated global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed and compared to the performance of the four link functions of the regression models for grouped survival data for different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A four-parameter extension of the generalized gamma distribution capable of modelling a bathtub-shaped hazard rate function is defined and studied. The beauty and importance of this distribution lies in its ability to model monotone and non-monotone failure rate functions, which are quite common in lifetime data analysis and reliability. The new distribution has a number of well-known lifetime special sub-models, such as the exponentiated Weibull, exponentiated generalized half-normal, exponentiated gamma and generalized Rayleigh, among others. We derive two infinite sum representations for its moments. We calculate the density of the order statistics and two expansions for their moments. The method of maximum likelihood is used for estimating the model parameters and the observed information matrix is obtained. Finally, a real data set from the medical area is analysed.
Resumo:
Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicate significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Monte Carlo Markov Chain algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about its experimental results.
Resumo:
Grass reference evapotranspiration (ETo) is an important agrometeorological parameter for climatological and hydrological studies, as well as for irrigation planning and management. There are several methods to estimate ETo, but their performance in different environments is diverse, since all of them have some empirical background. The FAO Penman-Monteith (FAD PM) method has been considered as a universal standard to estimate ETo for more than a decade. This method considers many parameters related to the evapotranspiration process: net radiation (Rn), air temperature (7), vapor pressure deficit (Delta e), and wind speed (U); and has presented very good results when compared to data from lysimeters Populated with short grass or alfalfa. In some conditions, the use of the FAO PM method is restricted by the lack of input variables. In these cases, when data are missing, the option is to calculate ETo by the FAD PM method using estimated input variables, as recommended by FAD Irrigation and Drainage Paper 56. Based on that, the objective of this study was to evaluate the performance of the FAO PM method to estimate ETo when Rn, Delta e, and U data are missing, in Southern Ontario, Canada. Other alternative methods were also tested for the region: Priestley-Taylor, Hargreaves, and Thornthwaite. Data from 12 locations across Southern Ontario, Canada, were used to compare ETo estimated by the FAD PM method with a complete data set and with missing data. The alternative ETo equations were also tested and calibrated for each location. When relative humidity (RH) and U data were missing, the FAD PM method was still a very good option for estimating ETo for Southern Ontario, with RMSE smaller than 0.53 mm day(-1). For these cases, U data were replaced by the normal values for the region and Delta e was estimated from temperature data. The Priestley-Taylor method was also a good option for estimating ETo when U and Delta e data were missing, mainly when calibrated locally (RMSE = 0.40 mm day(-1)). When Rn was missing, the FAD PM method was not good enough for estimating ETo, with RMSE increasing to 0.79 mm day(-1). When only T data were available, adjusted Hargreaves and modified Thornthwaite methods were better options to estimate ETo than the FAO) PM method, since RMSEs from these methods, respectively 0.79 and 0.83 mm day(-1), were significantly smaller than that obtained by FAO PM (RMSE = 1.12 mm day(-1). (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
This article presents a statistical model of agricultural yield data based on a set of hierarchical Bayesian models that allows joint modeling of temporal and spatial autocorrelation. This method captures a comprehensive range of the various uncertainties involved in predicting crop insurance premium rates as opposed to the more traditional ad hoc, two-stage methods that are typically based on independent estimation and prediction. A panel data set of county-average yield data was analyzed for 290 counties in the State of Parana (Brazil) for the period of 1990 through 2002. Posterior predictive criteria are used to evaluate different model specifications. This article provides substantial improvements in the statistical and actuarial methods often applied to the calculation of insurance premium rates. These improvements are especially relevant to situations where data are limited.
Resumo:
This paper discusses a multi-layer feedforward (MLF) neural network incident detection model that was developed and evaluated using field data. In contrast to published neural network incident detection models which relied on simulated or limited field data for model development and testing, the model described in this paper was trained and tested on a real-world data set of 100 incidents. The model uses speed, flow and occupancy data measured at dual stations, averaged across all lanes and only from time interval t. The off-line performance of the model is reported under both incident and non-incident conditions. The incident detection performance of the model is reported based on a validation-test data set of 40 incidents that were independent of the 60 incidents used for training. The false alarm rates of the model are evaluated based on non-incident data that were collected from a freeway section which was video-taped for a period of 33 days. A comparative evaluation between the neural network model and the incident detection model in operation on Melbourne's freeways is also presented. The results of the comparative performance evaluation clearly demonstrate the substantial improvement in incident detection performance obtained by the neural network model. The paper also presents additional results that demonstrate how improvements in model performance can be achieved using variable decision thresholds. Finally, the model's fault-tolerance under conditions of corrupt or missing data is investigated and the impact of loop detector failure/malfunction on the performance of the trained model is evaluated and discussed. The results presented in this paper provide a comprehensive evaluation of the developed model and confirm that neural network models can provide fast and reliable incident detection on freeways. (C) 1997 Elsevier Science Ltd. All rights reserved.
Resumo:
An analysis of the relationships of the major arthropod groups Was undertaken using mitochondrial genome data to examine the hypotheses that Hexapoda is polyphyletic and that Collembola is more closely related to branchiopod crustaceans than insects. We sought to examine the sensitivity of this relationship to outgroup choice, data treatment. gene choice and optimality criteria used in the phylogenetic analysis of mitochondrial genome data. Additionally we sequenced the mitochondrial genome of ail archaeognathan, Nesomachilis australica. to improve taxon selection in the apterygote insects, a group poorly represented in previous mitochondrial phylogenies. The sister group of the Collembola was rarely resolved in our analyses with a significant level of support. The use of different outgroups (myriapods, nematodes, or annelids + mollusks) resulted in many different placements of Collembola. The way in which the dataset was coded for analysis (DNA, DNA with the exclusion of third codon position and as amino acids) also had marked affects on tree topology. We found that nodal Support was spread evenly throughout the 13 mitochondrial genes and the exclusion of genes resulted in significantly less resolution in the inferred trees. Optimality criteria had a much lesser effect on topology than the preceding factors; parsimony and Bayesian trees for a given data set and treatment were quite similar. We therefore conclude that the relationships of the extant arthropod groups as inferred by mitochondrial genomes are highly vulnerable to outgroup choice, data treatment and gene choice, and no consistent alternative hypothesis of Collembola's relationships is supported. Pending the resolution of these identified problems with the application of mitogenomic data to basal arthropod relationships, it is difficult to justify the rejection of hexapod monophyly, which is well supported on morphological grounds. (c) The Willi Hennig Society 2004.
Resumo:
Multitemporal Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) imagery was used to assess coastline morphological changes in southeastern Brazil. A spectral linear mixing approach (SLMA) was used to estimate fraction imagery representing amounts of vegetation, clean water (a proxy for shade) and soil. Fraction abundances were related to erosive and depositional features. Shoreline, sandy banks (including emerged and submerged banks) and sand spits were highlighted mainly by clean water and soil fraction imagery. To evaluate changes in the coastline geomorphic features, the fraction imagery generated for each data set was classified in a contextual approach using a segmentation technique and ISOSEG, an unsupervised classification. Evaluation of the classifications was performed visually and by an error matrix relating ground-truth data to classification results. Comparison of the classification results revealed an intense transformation in the coastline, and that erosive and depositional features are extremely dynamic and subject to change in short periods of time.