311 results for outliers
Abstract:
Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. © 2012 Springer-Verlag.
Abstract:
A meta-analysis was used to evaluate the performance of piglets in the post-weaning period, without imposition of a sanitary challenge, fed diets containing spray-dried blood plasma (SDBP). Piglets face normal challenges in the post-weaning period, such as environmental stress and the replacement of the liquid diet with a solid one. References regarding sanitary challenges were disregarded in this study; only data regarding normal and expected challenges were considered. Data were obtained from indexed journals, with information extracted from the material and methods and results sections of pre-selected scientific articles. First, the database was analyzed graphically to observe the distribution of the data and the presence of outliers. Afterwards, correlation and variance-covariance analyses were carried out. The database contained a total of 23 articles. The average initial weight of the piglets was 8.02 kg (4.00-9.28 kg) and the average initial age was 27 days (14-32 days). The average duration of feeding diets containing SDBP was 9 days (6-28 days). SDBP increased feed conversion by 20.2% (P<0.05) during the initial period. Feed conversion during the total period was 10.2% higher (P<0.05) for animals fed SDBP. Average daily weight gain and daily feed intake were not affected (P>0.05) during the entire period, but average daily gain was higher (P<0.05) for animals fed SDBP during the initial period. The initial age of supplementation influenced the average daily weight gain and average daily feed intake of animals fed SDBP. For animals up to 35 days of age, results were better than those obtained with diets without SDBP supplementation. In the early post-weaning period, for piglets weaned at up to 35 days of age, SDBP supplementation positively influenced average daily weight gain and feed conversion. © 2013 Elsevier B.V.
Abstract:
Mechanized sugarcane planting is becoming increasingly widespread in Brazil because it offers higher operability and better working conditions than other types of planting. Studies on this topic are scarce in Brazil. In this context, the aim of this study was to evaluate the operational quality of mechanized sugarcane planting in two work shifts by means of statistical process control. The mechanized planting was carried out in March 2012, and the statistical design was completely randomized with two treatments, totaling 40 replications for the day shift and 40 replications for the night shift. The variables evaluated were speed, engine rotation, engine oil pressure, engine water temperature, effective field capacity, and hourly and effective fuel consumption. The statistical control charts showed that the variation in this process is not caused by random intrinsic factors. The tractor alignment error showed outliers in both the day and night shifts, indicating a possible delay in receiving the signal. Engine water temperature and effective fuel consumption showed lower variability in the night shift, with average values of 81 °C and 22.66 L ha⁻¹, respectively. Hourly fuel consumption had greater variability, and consequently lower quality, during the night shift, with an average of 25.46 L h⁻¹, while the day shift showed 26.86 L h⁻¹.
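The abstract does not say which type of control chart was used. As an illustration only, the sketch below assumes an individuals (X) chart with limits derived from the average moving range, a common choice when each replication is a single reading. The function names and the temperature readings are hypothetical and are not taken from the study.

```python
def individuals_chart_limits(values):
    """Return (lower, center, upper) limits of an individuals (X) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    # Moving ranges between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two observations.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width


def out_of_control(values):
    """Indices of observations that fall outside the control limits."""
    lower, _, upper = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical engine water temperatures (deg C); not data from the study.
example_temps = [81.0, 80.5, 81.2, 80.8, 81.1, 80.9,
                 81.3, 80.7, 95.0, 81.0, 80.6, 81.2]
flagged = out_of_control(example_temps)
```

Points returned by `out_of_control` are candidates for investigation rather than proof of a special cause; a full analysis would also inspect the moving-range chart and run rules.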
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Pós-graduação em Geociências e Meio Ambiente - IGCE
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Pós-graduação em Biometria - IBB
Abstract:
Pós-graduação em Biociências - FCLAS
Abstract:
Brucella species are Gram-negative bacteria that infect mammals. Recently, two unusual strains (Brucella inopinata BO1(T) and B. inopinata-like BO2) have been isolated from human patients, and their similarity to some atypical brucellae isolated from Australian native rodent species was noted. Here we present a phylogenomic analysis of the draft genome sequences of BO1(T) and BO2 and of the Australian rodent strains 83-13 and NF2653 that shows that they form two groups well separated from the other sequenced Brucella spp. Several important differences were noted. Neither BO1(T) nor BO2 agglutinated significantly when live or inactivated cells were exposed to monospecific A and M antisera against O-side chain sugars composed of N-formyl-perosamine. While BO1(T) maintained the genes required to synthesize a typical Brucella O-antigen, BO2 lacked many of these genes but still produced a smooth LPS (lipopolysaccharide). Most missing genes were found in the wbk region involved in O-antigen synthesis in classic smooth Brucella spp. In their place, BO2 carries four genes that other bacteria use for making a rhamnose-based O-antigen. Electrophoretic, immunoblot, and chemical analyses showed that BO2 carries an antigenically different O-antigen made of repeating hexose-rich oligosaccharide units that made the LPS water-soluble, which contrasts with the homopolymeric O-antigen of other smooth brucellae that have a phenol-soluble LPS. The results demonstrate the existence of a group of early-diverging brucellae with traits that depart significantly from those of the Brucella species described thus far. IMPORTANCE This report examines differences between genomes from four new Brucella strains and those from the classic Brucella spp. Our results show that the four new strains are outliers with respect to the previously known Brucella strains and yet are part of the genus, forming two new clades.
The analysis revealed important information about the evolution and survival mechanisms of Brucella species, helping reshape our knowledge of this important zoonotic pathogen. One discovery of special importance is that one of the strains, BO2, produces an O-antigen distinct from any that has been seen in any other Brucella isolates to date.
Abstract:
We report a morphology-based approach for the automatic identification of outlier neurons, as well as its application to the NeuroMorpho.org database, with more than 5,000 neurons. Each neuron in a given analysis is represented by a feature vector composed of 20 measurements, which are then projected into a two-dimensional space by applying principal component analysis. Bivariate kernel density estimation is then used to obtain the probability distribution for the group of cells, so that the cells with highest probabilities are understood as archetypes while those with the smallest probabilities are classified as outliers. The potential of the methodology is illustrated in several cases involving uniform cell types as well as cell types for specific animal species. The results provide insights regarding the distribution of cells, yielding single and multi-variate clusters, and they suggest that outlier cells tend to be more planar and tortuous. The proposed methodology can be used in several situations involving one or more categories of cells, as well as for detection of new categories and possible artifacts.
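The pipeline described in this abstract (feature vectors, projection to two dimensions by PCA, bivariate kernel density estimation, lowest-density cells flagged as outliers) can be sketched as follows. This is a minimal pure-Python illustration under several assumptions that the abstract does not confirm: PCA is computed by power iteration on the unstandardized covariance matrix, the kernel is a Gaussian product kernel with Scott's rule bandwidths, and the number of cells to flag (`n_outliers`) is chosen by the user. All function names are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(mat, iters=500):
    """Dominant eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(mat)
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    v1 = top_eigenvector(cov)
    lam1 = sum(v1[i] * cov[i][j] * v1[j] for i in range(d) for j in range(d))
    # Deflate the first component so power iteration finds the second one.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2 = top_eigenvector(deflated)
    return [(sum(r[j] * v1[j] for j in range(d)),
             sum(r[j] * v2[j] for j in range(d))) for r in x]


def kde_densities(points):
    """Gaussian product-kernel density estimate evaluated at each 2-D point."""
    n = len(points)
    bandwidths = []
    for k in range(2):
        vals = [p[k] for p in points]
        mean = sum(vals) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
        # Scott's rule for two dimensions: h = sd * n^(-1/6).
        bandwidths.append(max(sd, 1e-12) * n ** (-1.0 / 6.0))
    hx, hy = bandwidths
    scale = n * 2.0 * math.pi * hx * hy
    return [sum(math.exp(-0.5 * (((px - qx) / hx) ** 2 + ((py - qy) / hy) ** 2))
                for qx, qy in points) / scale
            for px, py in points]


def find_outliers(rows, n_outliers=1):
    """Indices of the n_outliers rows with the lowest estimated density."""
    dens = kde_densities(pca_2d(rows))
    order = sorted(range(len(rows)), key=lambda i: dens[i])
    return order[:n_outliers]
```

Sorting the same densities in descending order would give the highest-density cells, which the abstract refers to as archetypes. In practice the features would likely need to be standardized before PCA, since morphological measurements come in different units.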
Abstract:
Model diagnostics is an integral part of model determination and an important part of the model diagnostics is residual analysis. We adapt and implement residuals considered in the literature for the probit, logistic and skew-probit links under binary regression. New latent residuals for the skew-probit link are proposed here. We have detected the presence of outliers using the residuals proposed here for different models in a simulated dataset and a real medical dataset.
Abstract:
Context. Lithium abundances in open clusters are a very effective probe of mixing processes, and their study can help us to understand the large depletion of lithium that occurs in the Sun. Owing to its age and metallicity, the open cluster M 67 is especially interesting in this respect. Many studies of lithium abundances in M 67 have been performed, but a homogeneous global analysis of lithium in stars from subsolar masses up to the most massive members has yet to be accomplished for a large sample based on high-quality spectra. Aims. We test our non-standard models, which were calibrated using the Sun, with observational data. Methods. We collect literature data to analyze, for the first time in a homogeneous way, the non-local thermal equilibrium lithium abundances of all observed single stars in M 67 more massive than ∼0.9 M☉. Our grid of evolutionary models is computed assuming a non-standard mixing at metallicity [Fe/H] = 0.01, using the Toulouse-Geneva evolution code. Our analysis starts from the entrance into the zero-age main sequence. Results. Lithium in M 67 is a tight function of mass for stars more massive than the Sun, apart from a few outliers. A plateau in lithium abundances is observed for turn-off stars. Both less massive (M ≤ 1.10 M☉) and more massive (M ≥ 1.28 M☉) stars are more depleted than those in the plateau. There is a significant scatter in lithium abundances for any given mass M ≤ 1.1 M☉. Conclusions. Our models qualitatively reproduce most of the features described above, although the predicted depletion of lithium is 0.45 dex smaller than observed for masses in the plateau region, i.e. between 1.1 and 1.28 solar masses. More work is clearly needed to accurately reproduce the observations. Despite hints that chromospheric activity and rotation play a role in lithium depletion, no firm conclusion can be drawn with the presently available data.
Abstract:
To understand the regulatory dynamics of transcription factors (TFs) and their interplay with other cellular components, we have integrated the transcriptional, protein-protein, and allosteric or equivalent interactions that mediate the physiological activity of TFs in Escherichia coli. To study this integrated network, we computed a set of network measurements followed by principal component analysis (PCA), investigated the correlations between network structure and dynamics, and carried out a procedure for motif detection. In particular, we show that outliers identified in the integrated network based on their network properties correspond to previously characterized global transcriptional regulators. Furthermore, outliers are highly and widely expressed across conditions, thus supporting their global nature in controlling many genes in the cell. Motifs revealed that TFs not only interact physically with each other but also obtain feedback from signals delivered by signaling proteins, supporting the extensive cross-talk between different types of networks. Our analysis can lead to the development of a general framework for detecting and understanding global regulatory factors in regulatory networks, and it reinforces the importance of integrating multiple types of interactions in underpinning the interrelationships between them.
Abstract:
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses.
Abstract:
Background. With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration. Results. Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion. In light of our results, Support Vector Regression is favored for microarray normalization because it is more robust than the other methods in estimating the normalization curve.