978 results for Variable selection
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
The study was conducted at two locations in southern Brazil, in tillage in the 2009/2010 season, on eight sunflower hybrids, aiming to determine the correlations and path coefficients of primary and secondary characters on the main variable, achene productivity. The correlations were similar between environments. Head diameter and thousand-achene mass had a significant influence on sunflower productivity. Based on the magnitude of the direct and indirect effects, we highlight the effect of all primary components on the main variable, in addition to the good coefficient of determination and the low residual effect. The secondary component, the number of achenes, despite its significant direct effect on productivity, was indirectly influenced by the primary components, making it an undesirable character for selection.
Abstract:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also present a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
Abstract:
A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects to estimate breeding values when only a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete data set for this herd comprised 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with MTDFREML software to estimate fixed and random effects solutions using this complete data set. The estimated additive effects were taken as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes in animal rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: the first was a uniform distribution (U) and the second assumed that marker effects were normally distributed (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating greater similarity between the animal rankings from these models and the ranking based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. 
The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation retained by marker and polygenic effects assumed together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
The non-classical human leukocyte antigen (HLA) class I genes present a very low rate of variation. So far, only 10 HLA-E alleles encoding three proteins have been described, but only two are frequently found in worldwide populations. Because of their historical background, Brazilians are very suitable for population genetic studies. Therefore, 104 bone marrow donors from Brazil were evaluated for HLA-E exons 1-4. Seven variation sites were found, including two known single nucleotide polymorphisms (SNPs) at positions +424 and +756 and five new SNPs at positions +170 (intron 1), +1294 (intron 3), +1625, +1645 and +1857 (exon 4). Haplotyping analysis showed eight haplotypes: three known alleles, E*01:01:01, E*01:03:01 and E*01:03:02:01, and five new HLA-E alleles that carry the new variation sites. The HLA-E*01:01:01 allele was the predominant haplotype (62.50%), followed by E*01:03:02:01 (24.52%). Selective neutrality tests disclosed an interesting pattern of selective pressures in which balancing selection is probably shaping allele frequency distributions at an SNP at exon 3 (codon 107), while sequence diversity at exon 4 and the non-coding regions is facing significant purifying pressure. Even in an admixed population such as the Brazilian one, the HLA-E locus is very conserved, presenting few polymorphic SNPs in the coding region.
Abstract:
Background The evolutionary advantages of selective attention are unclear. Since the study of selective attention began, it has been suggested that the nervous system only processes the most relevant stimuli because of its limited capacity [1]. An alternative proposal is that action planning requires the inhibition of irrelevant stimuli, which forces the nervous system to limit its processing [2]. An evolutionary approach might provide additional clues to clarify the role of selective attention. Methods We developed Artificial Life simulations wherein animals were repeatedly presented two objects, "left" and "right", each of which could be "food" or "non-food." The animals' neural networks (multilayer perceptrons) had two input nodes, one for each object, and two output nodes to determine if the animal ate each of the objects. The neural networks also had a variable number of hidden nodes, which determined whether or not it had enough capacity to process both stimuli (Table 1). The evolutionary relevance of the left and the right food objects could also vary depending on how much the animal's fitness was increased when ingesting them (Table 1). We compared sensory processing in animals with or without limited capacity, which evolved in simulations in which the objects had the same or different relevances. Table 1. Nine sets of simulations were performed, varying the values of food objects and the number of hidden nodes in the neural networks. The values of left and right food were swapped during the second half of the simulations. Non-food objects were always worth -3. The evolution of neural networks was simulated by a simple genetic algorithm. Fitness was a function of the number of food and non-food objects each animal ate and the chromosomes determined the node biases and synaptic weights. 
During each simulation, 10 populations of 20 individuals each evolved in parallel for 20,000 generations, then the relevance of food objects was swapped and the simulation was run again for another 20,000 generations. The neural networks were evaluated by their ability to identify the two objects correctly. The detectability (d') for the left and the right objects was calculated using Signal Detection Theory [3]. Results and conclusion When both stimuli were equally relevant, networks with two hidden nodes only processed one stimulus and ignored the other. With four or eight hidden nodes, they could correctly identify both stimuli. When the stimuli had different relevances, the d' for the most relevant stimulus was higher than the d' for the least relevant stimulus, even when the networks had four or eight hidden nodes. We conclude that selection mechanisms arose in our simulations depending not only on the size of the neural networks but also on the stimuli's relevance for action.
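The detectability measure d' mentioned above can be illustrated with a minimal sketch using the standard Signal Detection Theory definition, d' = z(hit rate) - z(false-alarm rate). The function name, the count-based interface and the log-linear correction (adding 0.5 to each cell) are assumptions made for illustration; the abstract does not say how the authors computed the rates.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates strictly between 0 and 1 so the inverse normal CDF is finite.
    This correction is a common convention, not taken from the abstract.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical counts: "food" eaten 90 of 100 times, "non-food" eaten 10 of 100 times.
    print(d_prime(90, 10, 10, 90))
```

A network that eats food and non-food equally often yields d' near zero, which is how an ignored stimulus would appear under this measure.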
Abstract:
Variable rate sprinklers (VRS) have been developed to promote localized water application in irrigated areas. In precision irrigation, a VRS that permits better control of flow adjustment and, at the same time, provides satisfactory radial distribution profiles for various pressures and flow rates is really necessary. The objective of this work was to evaluate the performance and radial distribution profiles of a developed VRS which varies the nozzle cross-sectional area by moving a pin in or out using a stepper motor. Field tests were performed under different conditions of service pressure, rotation angles imposed on the pin and flow rate, which resulted in maximal water throw radii ranging from 7.30 to 10.38 m. In the experiments in which the service pressure remained constant, the maximal throw radius varied from 7.96 to 8.91 m. Averages of repetitions performed without wind or with winds of less than 1.3 m s^-1 were used. The VRS with the four-stream deflector resulted in a greater water application throw radius than the six-stream deflector. However, the six-stream deflector had greater precipitation intensities, as well as better distribution. Thus, selection of the deflector to be used should be based on project requirements, respecting the difference in the obtained results. With a small opening of the nozzle, the VRS produced small water droplets that visually appeared suitable for foliar chemigation. Regarding the comparison between the estimated and observed flow rates, the stepper motor produced excellent results.
Abstract:
This thesis presents a creative and practical approach to dealing with the problem of selection bias. Selection bias may be the most vexing problem in program evaluation or in any line of research that attempts to assert causality. Some of the greatest minds in economics and statistics have scrutinized the problem of selection bias, with the resulting approaches – Rubin's Potential Outcome Approach (Rosenbaum and Rubin, 1983; Rubin, 1991, 2001, 2004) or Heckman's Selection Model (Heckman, 1979) – being widely accepted and used as the best fixes. These solutions to the bias that arises in particular from self-selection are imperfect, and many researchers, when feasible, reserve their strongest causal inference for data from experimental rather than observational studies. The innovative aspect of this thesis is to propose a data transformation that allows the presence of selection bias to be measured and tested in an automatic and multivariate way. The approach involves the construction of a multi-dimensional conditional space of the X matrix in which the bias associated with the treatment assignment has been eliminated. Specifically, we propose the use of a partial dependence analysis of the X-space as a tool for investigating the dependence relationship between a set of observable pre-treatment categorical covariates X and a treatment indicator variable T, in order to obtain a measure of bias according to their dependence structure. The measure of selection bias is then expressed in terms of the inertia due to the dependence between X and T that has been eliminated. Given the measure of selection bias, we propose a multivariate test of imbalance in order to check whether the detected bias is significant, using the asymptotic distribution of the inertia due to T (Estadella et al. 2005) and preserving the multivariate nature of the data. 
Further, we propose the use of a clustering procedure as a tool to find groups of comparable units on which to estimate local causal effects, and the use of the multivariate test of imbalance as a stopping rule in choosing the best cluster solution set. The method is non-parametric: it does not call for modeling the data based on some underlying theory or assumption about the selection process, but instead calls for using the existing variability within the data and letting the data speak. The idea of proposing this multivariate approach to measure selection bias and test balance comes from the consideration that in applied research all aspects of multivariate balance not represented in the univariate variable-by-variable summaries are ignored. The first part contains an introduction to evaluation methods as part of public and private decision processes and a review of the literature on evaluation methods. The attention is focused on Rubin's Potential Outcome Approach, matching methods, and briefly on Heckman's Selection Model. The second part focuses on some limitations of conventional methods, with particular attention to the problem of how to test balance correctly. The third part contains the original contribution proposed, a simulation study that checks the performance of the method for a given dependence setting, and an application to a real data set. Finally, we discuss, conclude and explain our future perspectives.
Abstract:
Major histocompatibility complex (MHC) antigen-presenting genes are the most variable loci in vertebrate genomes. Host-parasite co-evolution is assumed to maintain the excessive polymorphism in the MHC loci. However, the molecular mechanisms underlying the striking diversity in the MHC remain contentious. The extent to which recombination contributes to the diversity at MHC loci in natural populations is still controversial, and there have been only a few comparative studies that make quantitative estimates of recombination rates. In this study, we performed a comparative analysis of 15 different ungulate species to estimate the population recombination rate and to quantify levels of selection. As expected, for all species we observed signatures of strong positive selection, and identified individual residues experiencing selection that were congruent with those constituting the peptide-binding region of the human DRB gene. In addition, however, for each species we also observed recombination rates that were significantly different from zero on the basis of likelihood-permutation tests, and in other non-quantitative analyses. Patterns of synonymous and non-synonymous sequence diversity were consistent with differing demographic histories between species, but recent simulation studies by other authors suggest that inference of selection and recombination is likely to be robust to such deviations from standard models. If high rates of recombination are common in MHC genes of other taxa, re-evaluation of many inference-based phylogenetic analyses of MHC loci, such as estimates of the divergence time of alleles and trans-specific polymorphism, may be required.
Abstract:
The majority of mutations that cause isolated GH deficiency type II (IGHD II) affect splicing of GH-1 transcripts and produce a dominant-negative 17.5-kDa GH isoform lacking exon 3, which further leads to disruption of the GH secretory pathway. Clinical variability in the severity of the IGHD II phenotype depending on the GH-1 gene alteration has been reported, and in vitro and transgenic animal data suggest that the onset and severity of the phenotype relate to the proportion of 17.5-kDa isoform produced. The removal of GH in IGHD creates a positive feedback loop driving more GH expression, which may itself increase 17.5-kDa isoform production from alternate splice sites in the mutated GH-1 allele. In this study, we aimed to test this idea by comparing the impact of stimulated expression by glucocorticoids on the production of different GH isoforms from wild-type (wt) and mutant GH-1 genes, relying on the glucocorticoid regulatory element within intron 1 in the GH-1 gene. AtT-20 cells were transfected with wt-GH or mutated GH-1 variants (5'IVS-3 + 2-bp T->C; 5'IVS-3 + 6 bp T->C; ISEm1: IVS-3 + 28 G->A) known to cause clinical IGHD II of varying severity. Cells were stimulated with 1 and 10 μM dexamethasone (DEX) for 24 h, after which the relative amounts of GH-1 splice variants were determined by semiquantitative and quantitative (TaqMan) RT-PCR. In the absence of DEX, only around 1% of wt-GH-1 transcripts were the 17.5-kDa isoform, whereas the three mutant GH-1 variants produced 29, 39, and 78% of the 17.5-kDa isoform. DEX stimulated total GH-1 gene transcription from all constructs. Notably, however, DEX increased the amount of 17.5-kDa GH isoform relative to the 22- and 20-kDa isoforms produced from the mutated GH-1 variants, but not from wt-GH-1. This DEX-induced enhancement of 17.5-kDa GH isoform production, up to 100% in the most severe case, was completely blocked by the addition of RU486. 
In other studies, we measured cell proliferation rates, annexin V staining, and DNA fragmentation in cells transfected with the same GH-1 constructs. The results showed that the 5'IVS-3 + 2-bp GH-1 gene mutation had a more severe impact on those measures than the splice site mutations within 5'IVS-3 + 6 bp or ISE +28, in line with the clinical severity observed with these mutations. Our finding that the proportion of 17.5-kDa isoform produced from mutant GH-1 alleles increases with increased drive for gene expression may help to explain the variable onset, progression, and severity observed in IGHD II.
Abstract:
In a matched experimental design, the effectiveness of matching in reducing bias and increasing power depends on the strength of the association between the matching variable and the outcome of interest. In particular, in the design of a community health intervention trial, the effectiveness of a matched design, where communities are matched according to some community characteristic, depends on the strength of the correlation between the matching characteristic and the change in the health behavior being measured. We attempt to estimate the correlation between community characteristics and changes in health behaviors in four datasets from community intervention trials and observational studies. Community characteristics that are highly correlated with changes in health behaviors would potentially be effective matching variables in studies of health intervention programs designed to change those behaviors. Among the community characteristics considered, the urban-rural character of the community was the most highly correlated with changes in health behaviors. The correlations between Per Capita Income, Percent Low Income & Percent aged over 65 and changes in health behaviors were marginally statistically significant (p < 0.08).
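Why the strength of the correlation matters can be seen in a small worked sketch. Under the simplifying textbook assumption that the two units of a matched pair have equal outcome variance sigma2 and correlation rho, the variance of their difference is 2 * sigma2 * (1 - rho), so matching reduces variance by a factor of 1 / (1 - rho) relative to two independent units. The function names are hypothetical, and this idealized model ignores the loss of degrees of freedom that matching a small number of communities can incur; it is not the estimation method of the study.

```python
def paired_difference_variance(sigma2, rho):
    """Variance of Y1 - Y2 for a matched pair with common variance sigma2
    and within-pair correlation rho (assumed equal-variance model)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 2.0 * sigma2 * (1.0 - rho)


def relative_efficiency(rho):
    """Ratio of unmatched to matched difference variance: 1 / (1 - rho)."""
    if not -1.0 <= rho < 1.0:
        raise ValueError("rho must lie in [-1, 1)")
    return 1.0 / (1.0 - rho)


if __name__ == "__main__":
    # Illustrative correlations only; none of these values come from the study.
    for rho in (0.0, 0.2, 0.5, 0.8):
        print(rho, relative_efficiency(rho))
```

With rho = 0 matching gains nothing under this model, which is why weakly correlated community characteristics make poor matching variables.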
Abstract:
Experimental work and analysis were done to investigate engine startup robustness and emissions of a flex-fuel spark ignition (SI) direct injection (DI) engine. The vaporization and other characteristics of ethanol fuel blends present a challenge at engine startup. Strategies to reduce the enrichment requirements for the first engine startup cycle and emissions for the second and third fired cycles at 25°C ± 1°C engine and intake air temperature were investigated. Research work was conducted on a single-cylinder SIDI engine with gasoline and E85 fuels to study the effect on the first fired cycle of engine startup. Piston configurations that included a compression ratio change (11 vs 15.5) and a piston geometry change (flat top vs bowl) were tested, along with changes in intake cam timing (95, 110, 125) and fuel pressure (0.4 MPa vs 3 MPa). The goal was to replicate the engine speed, manifold pressure, fuel pressure and testing temperature from an engine startup trace in order to investigate the first fired cycle of the engine. Results showed that the bowl piston enabled engine starts at lower equivalence ratios with gasoline fuel, while also showing lower IMEP at the same equivalence ratio compared to the flat top piston. With E85, the bowl piston showed reduced IMEP as compression ratio increased at the same equivalence ratio. A preference for constant intake valve timing across fuels seemed to indicate that the flat top piston might be a good flex-fuel piston. Significant improvements were seen with the higher-CR bowl piston with high fuel pressure starts, but no improvement was seen with low fuel pressures. Simulation work was conducted in GT-POWER to analyze the initial three cycles of engine startup for the same set of hardware used in the experiments. A validated steady-state model was modified for startup conditions. The results allowed an understanding of the relative residual levels and IMEP at the test points in the cam phasing space. 
This allowed the selection of additional test points that enable the use of higher residual levels, eliminating those with smaller trapped mass incapable of producing the IMEP required for proper engine turnover. Experimental results from the second phase of testing, covering the second and third startup cycles, revealed that both E10 and E85 prefer the same SOI of 240° bTDC in those cycles for the flat top piston and high injection pressures. The optimal cam timing for startup with E85 showed that it tolerates more residuals than E10. Higher internal residuals drive down the equivalence ratio (Φ) requirement for both fuels up to their combustion stability limit; this is thought to be a direct benefit to vaporization due to increased cycle start temperature. Benefits are shown for an advanced IMOP and retarded EMOP strategy at engine startup. Overall, the amount of residuals preferred by an engine for E10 fuel at startup is thought to be constant across engine speed, which could enable easier selection of optimized cam positions across the startup speeds.
Abstract:
Invasive species often evolve rapidly in response to the novel biotic and abiotic conditions in their introduced range. Such adaptive evolutionary changes might play an important role in the success of some invasive species. Here, we investigated whether introduced European populations of the South African ragwort Senecio inaequidens (Asteraceae) have genetically diverged from native populations. We carried out a greenhouse experiment in which 12 South African and 11 European populations were grown for several months at two levels of nutrient availability, as well as in the presence or absence of a generalist insect herbivore. We found that, in contrast to a current hypothesis, plants from introduced populations had a significantly lower reproductive output, but higher allocation to root biomass, and they were more tolerant to insect herbivory. Moreover, introduced populations were less genetically variable, but displayed greater plasticity in response to fertilization. Finally, introduced populations were phenotypically most similar to a subset of native populations from mountainous regions in southern Africa. Taking into account the species' likely history of introduction, our data support the idea that the invasion success of Senecio inaequidens in Central Europe is based on selective introduction of specific preadapted and plastic genotypes rather than on adaptive evolution in the introduced range.
Abstract:
Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Except for GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
Abstract:
Road accidents are a very relevant issue in many countries, and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. For the selection of the set of explanatory variables and the response transformation parameter within the Bayesian framework, TIM and 3IM (two-input and three-input model) procedures are proposed. The procedure also uses the DIC and pseudo-R2 goodness-of-fit criteria. The model to which the methodology is applied is a dynamic regression model with Box-Cox transformation (BCT) for the explanatory variables and autoregressive (AR) structure for the response. An initial set of 22 explanatory variables is identified. The effects of these factors on the fatal accident frequency in Spain during 2000-2012 are estimated. The dependent variable is constructed considering the stochastic trend component.
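For readers unfamiliar with the Box-Cox transformation named above, the following minimal sketch shows the standard one-parameter form: (x^lambda - 1) / lambda for lambda != 0 and log(x) for lambda = 0. The function name is hypothetical, and the sketch illustrates only the transformation itself, not the Bayesian selection of lambda or the TIM and 3IM procedures described in the abstract.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a strictly positive value x.

    lam == 0 is the limiting case and gives the natural logarithm.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


if __name__ == "__main__":
    # Hypothetical values; lam = 1 is a shifted identity, lam = 0 is the log.
    for lam in (1.0, 0.5, 0.0):
        print(lam, [box_cox(v, lam) for v in (1.0, 4.0, 9.0)])
```

Treating lambda as a parameter lets a single model family cover linear (lambda = 1) and logarithmic (lambda = 0) relationships, which is why it can be selected alongside the explanatory variables.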