Digital Library

15 results for DATA SET

in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"

Multidimensional cluster stability analysis from a Brazilian Bradyrhizobium sp RFLP/PCR data set

Relevance:

100.00%

Abstract:

The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained from the analysis of phenotypic and genotypic properties. This paper presents an application of a method aimed at identifying possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. Cluster stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions, with three restriction enzymes per region. The proposed method uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations through sub-sampling techniques. The method proved effective in grouping the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, along with sub-clusters within each detected cluster. © 2008 Elsevier B.V. All rights reserved.
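The sub-sampling idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the mismatch-count distance, the pair-agreement stability score, the toy presence/absence profiles, and all function names are assumptions made for illustration only.

```python
import random
from itertools import combinations


def average_linkage(points, n_clusters):
    """Agglomerative clustering with average linkage on binary profiles."""
    def dist(a, b):
        # Assumed distance: number of mismatching positions in two profiles.
        return sum(x != y for x, y in zip(points[a], points[b]))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def labels_from(clusters, ids):
    """Map each original item id to the index of its cluster."""
    return {ids[m]: c for c, members in enumerate(clusters) for m in members}


def stability(points, n_clusters, n_rounds=20, fraction=0.8, seed=0):
    """Mean pair-agreement between full-data and sub-sample clusterings."""
    rng = random.Random(seed)
    all_ids = list(range(len(points)))
    reference = labels_from(average_linkage(points, n_clusters), all_ids)
    scores = []
    for _ in range(n_rounds):
        ids = sorted(rng.sample(all_ids, int(fraction * len(points))))
        sub = [points[i] for i in ids]
        labels = labels_from(average_linkage(sub, n_clusters), ids)
        pairs = list(combinations(ids, 2))
        # A pair agrees if it is together (or apart) in both clusterings.
        agree = sum(
            (reference[a] == reference[b]) == (labels[a] == labels[b])
            for a, b in pairs
        )
        scores.append(agree / len(pairs))
    return sum(scores) / len(scores)


# Toy presence/absence band profiles forming two well-separated groups.
profiles = [
    (1, 1, 1, 0, 0, 0), (1, 1, 1, 0, 0, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 1, 1), (0, 0, 0, 0, 1, 1), (1, 0, 0, 1, 1, 1),
]
score = stability(profiles, n_clusters=2)
```

A score near 1 suggests that the partition survives the removal of part of the strains; lower values would indicate clusters that depend on a few individual items.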

Combined search for the standard model Higgs boson decaying to bb̄ using the D0 run II data set

Relevance:

100.00%

Abstract:

We present the results of the combination of searches for the standard model Higgs boson produced in association with a W or Z boson and decaying into bb̄ using the data sample collected with the D0 detector in pp̄ collisions at √s = 1.96 TeV at the Fermilab Tevatron Collider. We derive 95% C.L. upper limits on the Higgs boson cross section relative to the standard model prediction in the mass range 100 GeV ≤ M_H ≤ 150 GeV, and we exclude Higgs bosons with masses smaller than 102 GeV at the 95% C.L. In the mass range 120 GeV ≤ M_H ≤ 145 GeV, the data exhibit an excess above the background prediction with a global significance of 1.5 standard deviations, consistent with the expectation in the presence of a standard model Higgs boson. © 2012 American Physical Society.

Influence of data structure on the estimation of the additive genetic direct and maternal covariance for early growth traits in Nellore cattle

Relevance:

70.00%

Abstract:

The objective of the present study was to investigate the effect of data structure on estimated genetic parameters and predicted breeding values of direct and maternal genetic effects for weaning weight (WW) and weight gain from birth to weaning (BWG), with or without the genetic covariance between direct and maternal effects. Records of 97,490 Nellore animals born between 1993 and 2006 on the Jacarezinho cattle raising farm were used. Two different data sets were analyzed: DI_all, which included all available progenies of dams without their own performance; and DII_all, which included DI_all + 20% of recorded progenies with maternal phenotypes. Two subsets were obtained from each data set (DI_all and DII_all): DI_1 and DII_1, which included only dams with three or fewer progenies; and DI_5 and DII_5, which included only dams with five or more progenies. (Co)variance components and heritabilities were estimated by Bayesian inference through Gibbs sampling using univariate animal models. In general, for the population and traits studied, the proportion of dams with known phenotypic information and the number of progenies per dam influenced direct and maternal heritabilities, as well as the contribution of maternal permanent environmental variance to phenotypic variance. Only small differences were observed in the genetic and environmental parameters when the genetic covariance between direct and maternal effects was set to zero in the data sets studied. Thus, including or omitting the genetic covariance between direct and maternal effects had little effect on the ranking of animals according to their breeding values for WW and BWG. Accurate estimation of genetic correlations between direct and maternal genetic effects depends on the data structure. Thus, this covariance should be set to zero in Nellore data sets in which the proportion of dams with phenotypic information is low, the number of progenies per dam is small, and pedigree relationships are poorly known. © 2012 Elsevier B.V. All rights reserved.
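As a rough illustration of how subsets such as DI_1 and DI_5 could be derived from a record file, the sketch below counts progenies per dam and filters on that count. The record layout (`calf`, `dam` keys), the function name, and the toy data are hypothetical, not taken from the study.

```python
from collections import Counter


def split_by_progeny_count(records):
    """Return (few, many): records of dams with <= 3 and with >= 5 progenies."""
    counts = Counter(r["dam"] for r in records)
    few = [r for r in records if counts[r["dam"]] <= 3]
    many = [r for r in records if counts[r["dam"]] >= 5]
    return few, many


# Hypothetical records: dam "A" has 2 calves, "B" has 4, "C" has 5.
records = (
    [{"calf": f"A{i}", "dam": "A"} for i in range(2)]
    + [{"calf": f"B{i}", "dam": "B"} for i in range(4)]
    + [{"calf": f"C{i}", "dam": "C"} for i in range(5)]
)
few, many = split_by_progeny_count(records)
```

Under these thresholds, dams with exactly four progenies belong to neither subset, so the two subsets do not cover the full data set.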

Modeling grouped survival data with time-dependent covariates

Relevance:

70.00%

Abstract:

In this article, proportional hazards and logistic models for grouped survival data were extended to incorporate time-dependent covariates. The extension was motivated by a forestry experiment designed to compare five different water stresses in Eucalyptus grandis seedlings. The response was the seedling lifetime. The data set was grouped because the seedlings were visited by the researcher on only three occasions. Shoot height was also measured on each of these occasions, which makes it a time-dependent covariate. Both extended models were applied to this example, and the results were very similar.
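One common way to prepare grouped survival data with a time-dependent covariate is to expand each subject into one row per interval at risk. The sketch below shows that expansion only; the field names, the toy seedlings, and the choice of which height value is attached to each interval are assumptions, and the abstract does not say that the authors organized their data this way.

```python
def person_period(seedlings):
    """Expand one record per seedling into one row per interval at risk."""
    rows = []
    for s in seedlings:
        for interval, height in enumerate(s["height_by_interval"], start=1):
            died = s["died_in"] == interval
            rows.append({
                "id": s["id"],
                "interval": interval,
                "stress": s["stress"],
                "height": height,    # covariate value that changes over time
                "event": int(died),  # 1 only in the interval where death occurs
            })
            if died:
                break  # no rows after the seedling has died
    return rows


# Hypothetical seedlings: one dies in interval 2, one survives all three.
seedlings = [
    {"id": 1, "stress": "S1", "height_by_interval": [12.0, 15.5], "died_in": 2},
    {"id": 2, "stress": "S2", "height_by_interval": [11.0, 14.0, 18.5], "died_in": None},
]
rows = person_period(seedlings)
```

A binary regression of `event` on interval, treatment, and `height` over such rows is the usual route to grouped-data models: a logit link gives a logistic model, and a complementary log-log link gives the grouped proportional hazards model.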

A Generalized Log-Normal Model for Grouped Survival Data

Relevance:

70.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Search for associated production of charginos and neutralinos in the trilepton final state using 2.3 fb⁻¹ of data

Relevance:

70.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Effect of cow age at calving and Julian date of birth on pre-weaning traits of Gir calves

Relevance:

70.00%

Abstract:

The objective of this work was to estimate the influence of cow age at calving (IDV) and Julian date of birth (DJN) on weaning weight (PD) and average daily gain in the pre-weaning period (GMD) of Gir calves, and to determine correction factors for these effects. A total of 10,685 and 18,339 records of PD and GMD of Gir calves, obtained from the files of the Associação Brasileira dos Criadores de Zebu (ABCZ) and belonging to 1,229 and 1,979 contemporary groups (GC), respectively, were analyzed. PD and GMD were pre-adjusted for the effect of calf age at weaning. For males, the effect of IDV on PD and GMD was modeled as a quadratic-quadratic-quadratic segmented polynomial with knots (junction points) at 4.1 and 12.7 years and at 4.0 and 8.2 years, respectively; for females, it was modeled as a quadratic-quadratic segmented polynomial with a knot (junction point) at 3.8 years for both traits. DJN was modeled as a quadratic-quadratic segmented polynomial with a knot at 126 days for PD and 167 days for GMD. The results showed that correction factors for IDV should be determined separately for males and females and that, for DJN, each season of the year should be considered so that the differences among seasons are properly observed. The correction factors for the effect of cow age ranged from 0.94750 to 1.08033 for PD and from 0.91714 to 1.07689 for GMD in males, and from 0.90937 to 1.07415 for PD and from 0.96055 to 1.14007 for GMD in females. For the effect of DJN, the factors ranged from 0.9256 to 1.0340 for PD and from 0.9112 to 1.0551 for GMD.
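A quadratic-quadratic segmented polynomial with a single knot can be written as one quadratic plus a squared "hinge" term that switches on after the knot. The sketch below uses the 126-day knot reported for PD; every coefficient, the choice of reference day, and the ratio-based definition of the correction factor are hypothetical and serve only to show the shape of the calculation.

```python
def segmented_quadratic(x, b0, b1, b2, b3, knot):
    """Quadratic before the knot, a different quadratic after it.

    The (x - knot)^2 term is zero up to the knot, so the curve and its
    slope stay continuous at the junction point.
    """
    hinge = max(0.0, x - knot) ** 2
    return b0 + b1 * x + b2 * x * x + b3 * hinge


KNOT_PD = 126  # days, knot reported in the abstract for PD
COEF = dict(b0=180.0, b1=0.10, b2=-0.0004, b3=0.0006)  # hypothetical values


def predicted_pd(day):
    """Predicted weaning weight (kg) for a given Julian birth date."""
    return segmented_quadratic(day, knot=KNOT_PD, **COEF)


def correction_factor(day, reference_day=KNOT_PD):
    """Assumed multiplicative factor: prediction at reference / prediction at day."""
    return predicted_pd(reference_day) / predicted_pd(day)
```

Multiplying an observed record by `correction_factor(day)` would rescale it to the chosen reference day under this assumed definition.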

Analysis of beef cattle longitudinal data applying a nonlinear model

Relevance:

70.00%

Abstract:

The objective of this work was to evaluate the growth curve parameters of Nelore beef cattle using the Von Bertalanffy function in a nested Bayesian procedure that allowed estimation of the joint posterior distribution of growth curve parameters, their (co)variance components, and the environmental and additive genetic components affecting them. A hierarchical model was applied; each individual had a growth trajectory described by the nonlinear function, and each parameter of this function was considered to be affected by genetic and environmental effects that were described by an animal model. Random samples of the posterior distributions were drawn using Gibbs sampling and Metropolis-Hastings algorithms. The data set consisted of a total of 145,961 BW records from 15,386 animals. Even though the curve parameters were estimated for animals with few records, all predicted mature BW were suitable, because information from related animals and the structure of systematic effects were considered in the curve fitting. A large additive genetic variance for mature BW was observed. The parameter a of growth curves, which represents asymptotic adult BW, could be used as a selection criterion to control increases in adult BW when selecting for growth rate. The effect of maternal environment on growth was carried through to maturity and should be considered when evaluating adult BW. Other growth curve parameters showed small additive genetic and maternal effects. Mature BW and parameter k, related to the slope of the curve, presented a large, positive genetic correlation. The results indicated that selection for growth rate would increase adult BW without substantially changing the shape of the growth curve. Selection to change the slope of the growth curve without modifying adult BW would be inefficient because their genetic correlation is large. However, adult BW could be considered in a selection index with its corresponding economic weight to improve the overall efficiency of beef cattle production.
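For readers unfamiliar with the growth function, the sketch below evaluates the Von Bertalanffy curve in the form commonly used for animal growth, BW(t) = a (1 - b e^(-k t))^3. The abstract names only the parameters a and k; the integration constant b, the exact parameterization, and the numeric values are assumptions chosen for illustration.

```python
import math


def von_bertalanffy(t, a, b, k):
    """Body weight at age t.

    a: asymptotic adult body weight
    b: integration constant tied to weight at t = 0 (assumed, not in the abstract)
    k: rate parameter governing how fast the asymptote is approached
    """
    return a * (1.0 - b * math.exp(-k * t)) ** 3


# Hypothetical parameters: a in kg, t in days.
params = dict(a=500.0, b=0.55, k=0.002)
weights = [von_bertalanffy(t, **params) for t in (0, 205, 365, 550, 5000)]
```

With these illustrative values the curve starts at a (1 - b)^3 and rises monotonically toward a, which is why a can be read as the adult weight.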

Conceptual model for adaptable and extensible visual data exploration

Relevance:

70.00%

Abstract:

Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in the knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages, to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated use of multiple graphical representations (that is, user actions should be capable of affecting multiple visualizations when desired) allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user-configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link them, so that user actions executed on one representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.

The Brazilian Seismographic Integrated Systems (BRASIS): Infrastructure and data management

Relevance:

70.00%

Abstract:

In geophysics and seismology, raw data need to be processed to generate useful information that can be turned into knowledge by researchers. The number of sensors that are acquiring raw data is increasing rapidly. Without good data management systems, more time can be spent in querying and preparing datasets for analyses than in acquiring raw data. Also, a lot of good quality data acquired at great effort can be lost forever if they are not correctly stored. Local and international cooperation will probably be reduced, and a lot of data will never become scientific knowledge. For this reason, the Seismological Laboratory of the Institute of Astronomy, Geophysics and Atmospheric Sciences at the University of São Paulo (IAG-USP) has concentrated fully on its data management system. This report describes the efforts of the IAG-USP to set up a seismology data management system to facilitate local and international cooperation. © 2011 by the Istituto Nazionale di Geofisica e Vulcanologia. All rights reserved.

A semi-automatic method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner data

Relevance:

70.00%

Abstract:

This paper presents a method for indirect orientation of aerial images using ground control lines extracted from airborne laser scanner (ALS) data. This data integration strategy has shown good potential in the automation of photogrammetric tasks, including the indirect orientation of images. The most important characteristic of the proposed approach is that the exterior orientation parameters (EOP) of a single image or of multiple images can be automatically computed with a space resection procedure from data derived from different sensors. The suggested method works as follows. First, straight lines are automatically extracted in the digital aerial image (s) and in the intensity image derived from an ALS data set (S). Then, the correspondence between s and S is automatically determined. A line-based coplanarity model that establishes the relationship between straight lines in the object space and in the image space is used to estimate the EOP with iterated extended Kalman filtering (IEKF). The method was implemented and tested with data from different sensors. Experiments were conducted to assess the proposed method, and the results showed that the estimation of the EOP is a function of the ALS positional accuracy.

Particle competition and cooperation to prevent error propagation from mislabeled data in semi-supervised learning

Relevance:

70.00%

Abstract:

Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion or even the entire data set. This paper aims to address this problem by presenting a graph-based (network-based) semi-supervised learning method, specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and incorporates some features that make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentages of mislabeled data, in networks of different sizes and average node degrees. Importantly, these simulations reveal the existence of critical points of the mislabeled subset size, below which the network is free of wrong label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made between the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method performs increasingly better than the others as the percentage of mislabeled samples grows. © 2012 IEEE.

A useful empirical Bayesian method to analyse industrial data from saturated factorial designs

Relevance:

70.00%

Abstract:

The use of saturated two-level designs is very popular, especially in industrial applications where the cost of experiments is too high. Standard classical approaches are not appropriate to analyze data from saturated designs, since we could only get the estimates of the main factor effects and we would not have degrees of freedom to estimate the variance of the error. In this paper, we propose the use of empirical Bayesian procedures to get inferences for data obtained from saturated designs. The proposed methodology is illustrated assuming a simulated data set. © 2013 Growing Science Ltd. All rights reserved.
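The degrees-of-freedom problem described in this abstract can be seen in a small sketch. It builds an eight-run, seven-factor two-level design from a 2^3 factorial and computes the classical main-effect estimates; the design construction, the function names, and the response values are illustrative assumptions, and the empirical Bayesian procedure proposed in the paper is not implemented here.

```python
from itertools import product


def saturated_design():
    """Eight runs, seven two-level factors: A, B, C plus their product columns."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


def main_effects(design, y):
    """Effect of each factor: mean response at +1 minus mean response at -1."""
    n = len(design)
    effects = []
    for column in zip(*design):
        effects.append(2.0 * sum(s * v for s, v in zip(column, y)) / n)
    return effects


design = saturated_design()
y = [10.0, 12.0, 11.0, 15.0, 9.0, 13.0, 10.0, 16.0]  # made-up responses
effects = main_effects(design, y)

# Eight observations are used up by the overall mean and seven effects,
# so nothing is left to estimate the error variance.
residual_df = len(design) - 1 - len(effects)
```

With zero residual degrees of freedom, the classical analysis yields effect estimates but no standard errors, which is the gap that alternative procedures for saturated designs aim to fill.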

Comparison of snow data assimilation system with GPS reflectometry snow depth in the Western United States

Relevance:

70.00%

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

The evolution of phylogeographic data sets

Relevance:

70.00%

Abstract:

Empirical phylogeographic studies have progressively sampled greater numbers of loci over time, in part motivated by theoretical papers showing that estimates of key demographic parameters improve as the number of loci increases. Recently, next-generation sequencing has been applied to questions about organismal history, with the promise of revolutionizing the field. However, no systematic assessment of how phylogeographic data sets have changed over time with respect to overall size and information content has been performed. Here, we quantify the changing nature of these genetic data sets over the past 20 years, focusing on papers published in Molecular Ecology. We found that the number of independent loci, the total number of alleles sampled and the total number of single nucleotide polymorphisms (SNPs) per data set have improved over time, with particularly dramatic increases within the past 5 years. Interestingly, uniparentally inherited organellar markers (e.g. animal mitochondrial and plant chloroplast DNA) continue to represent an important component of phylogeographic data. Single-species studies (cf. comparative studies) that focus on vertebrates (particularly fish and, to some extent, birds) represent the gold standard of phylogeographic data collection. Based on the current trajectory seen in our survey data, forecast modelling indicates that the median number of SNPs per data set for studies published by the end of the year 2016 may approach ~20,000. This survey provides baseline information for understanding the evolution of phylogeographic data sets and underscores the fact that development of analytical methods for handling very large genetic data sets will be critical for facilitating growth of the field.