7 results for VARIABLE SELECTION
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Dimensionality reduction is employed in visual data analysis as a way of obtaining reduced spaces for high-dimensional data or of mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation in reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high-dimensional data, taking the user's input into account. It employs Partial Least Squares (PLS), a statistical tool that retrieves latent spaces focusing on the discriminability of the data. The method employs a training set to build a highly precise model that can then be applied very effectively to a much larger data set. The reduced data set can be displayed using various existing visualization techniques. The training data is important for encoding the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge, and it is capable of working with small and unbalanced training sets.
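The idea of fitting PLS on a small labelled training set and then projecting a larger data set can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a NIPALS-style PLS1 with a single numeric label column (-1/+1 for two classes), and the function names (`pls1_fit`, `pls1_transform`) and all data values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components=2):
    """Fit a NIPALS-style PLS1 model; returns (feature means, weights, loadings)."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    weights, loadings = [], []
    for _ in range(n_components):
        # Weight vector: feature-space direction with maximal covariance with the labels.
        w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(d)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # no remaining covariance with the labels
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]          # scores of the training samples
        tt = dot(t, t)
        p = [sum(row[j] * ti for row, ti in zip(Xc, t)) / tt for j in range(d)]
        # Deflate X so that the next component captures new information.
        Xc = [[row[j] - ti * p[j] for j in range(d)] for row, ti in zip(Xc, t)]
        weights.append(w)
        loadings.append(p)
    return means, weights, loadings


def pls1_transform(X, model):
    """Project samples into the latent space learned from the training set."""
    means, weights, loadings = model
    scores = []
    for row in X:
        r = [v - m for v, m in zip(row, means)]
        s = []
        for w, p in zip(weights, loadings):
            t = dot(r, w)
            s.append(t)
            r = [a - t * b for a, b in zip(r, p)]
        scores.append(s)
    return scores


# Hypothetical training set: two classes that differ mainly in the first feature.
train_X = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [3.0, 5.1, 0.3], [3.3, 4.2, 0.0]]
train_y = [-1.0, -1.0, 1.0, 1.0]
model = pls1_fit(train_X, train_y, n_components=2)

# Hypothetical unlabelled points mapped to 2D coordinates for plotting.
new_X = [[0.9, 4.5, 0.2], [3.1, 4.6, 0.1]]
scores = pls1_transform(new_X, model)
```

Each row of `scores` is a 2D coordinate that could be passed to any scatter-plot tool. Adding user-labelled samples to the training set and refitting is one way the feedback loop described in the abstract could be realised.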
Abstract:
Current methods for quality control of sugar cane are performed on extracted juice using several methodologies, often requiring appreciable time and chemicals (sometimes toxic), which makes the methods expensive and not green. The present study proposes the use of X-ray spectrometry together with chemometric methods as an innovative alternative technique for determining sugar cane quality parameters, specifically sucrose concentration, POL, and fiber content. Measurements were performed on stem, leaf, and juice, and those applied directly to the stem provided the best results. Prediction models for sugar cane stem determinations, built from a single 60 s irradiation using portable X-ray fluorescence equipment, allow % sucrose, % fiber, and POL to be estimated simultaneously. Average relative deviations of around 8% in the prediction step are acceptable considering that the measurements were made in the field. These results may indicate the best period to cut a particular crop and may also help evaluate the quality of sugar cane for the sugar and alcohol industries.
Abstract:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also present a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
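The penalty-constant problem mentioned above can be illustrated with a small sketch. It is deliberately simplified and is not the smallest maximizer criterion itself: it assumes full fixed-order Markov chains rather than context trees, and it uses a plain BIC score of the form log-likelihood minus `c` times the number of free parameters times log n. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def max_log_likelihood(seq, order, start):
    """Maximized log-likelihood of an order-k chain, using symbols from index `start` on."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(start, len(seq)):
        context = tuple(seq[i - order:i])
        context_counts[context] += 1
        transition_counts[(context, seq[i])] += 1
    return sum(
        count * math.log(count / context_counts[context])
        for (context, _), count in transition_counts.items()
    )


def select_order_bic(seq, max_order, c):
    """Return the order that maximizes the BIC score for penalty constant c."""
    n = len(seq) - max_order          # every order is scored on the same symbols
    alphabet_size = len(set(seq))
    best_order, best_score = 0, -math.inf
    for order in range(max_order + 1):
        n_params = (alphabet_size ** order) * (alphabet_size - 1)
        score = max_log_likelihood(seq, order, max_order) - c * n_params * math.log(n)
        if score > best_score:        # strict comparison keeps the smallest order on ties
            best_order, best_score = order, score
    return best_order


# Toy sequence with an exact order-2 structure: the pattern 0, 0, 1 repeated.
sequence = [0, 0, 1] * 100

order_small_c = select_order_bic(sequence, max_order=3, c=0.5)
order_large_c = select_order_bic(sequence, max_order=3, c=100.0)
```

With the moderate constant the score recovers order 2, while the very large constant collapses the choice to order 0. That sensitivity to `c` is the kind of dependence a constant-free procedure is meant to remove.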
Abstract:
A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects for estimating breeding values when only a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete data set for this herd comprised 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with MTDFREML software to estimate fixed and random effect solutions using the complete data. The estimated additive effects were taken as the reference breeding values for those animals. The individual observed phenotype for each trait was adjusted for the fixed and random effect solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals in this herd, only 3160 were genotyped for 106 SNP markers. Three models were compared in terms of changes in animal rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects, and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: the first was a uniform distribution (U), and the second assumed that marker effects were normally distributed (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating greater similarity between the animal ranks from these models and the rank based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC.
The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation retained by marker and polygenic effects assumed together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.
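The comparison of animal ranks against the reference breeding values relies on a rank correlation. The abstract does not say which coefficient was used, so the sketch below assumes Spearman's coefficient, computed as the Pearson correlation of average ranks. The breeding values are hypothetical and chosen only to show the calculation.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical values for five animals (not taken from the study).
reference_ebv = [0.1, 0.5, 0.3, 0.9, 0.7]   # reference breeding values
model_ebv = [0.2, 0.4, 0.35, 0.8, 0.9]      # breeding values predicted by one model

rho = spearman(reference_ebv, model_ebv)
```

A coefficient close to 1 means the model orders the animals almost exactly as the reference does, which is the sense in which models 3_U and 3_N are described as more similar to the reference ranking.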
Abstract:
The non-classical human leukocyte antigen (HLA) class I genes present a very low rate of variation. So far, only 10 HLA-E alleles encoding three proteins have been described, but only two are frequently found in worldwide populations. Because of their historical background, Brazilians are very suitable for population genetic studies. Therefore, 104 bone marrow donors from Brazil were evaluated for HLA-E exons 1-4. Seven variation sites were found, including two known single nucleotide polymorphisms (SNPs) at positions +424 and +756 and five new SNPs at positions +170 (intron 1), +1294 (intron 3), +1625, +1645 and +1857 (exon 4). Haplotyping analysis showed eight haplotypes: three known alleles, E*01:01:01, E*01:03:01 and E*01:03:02:01, and five new HLA-E alleles that carry the new variation sites. The HLA-E*01:01:01 allele was the predominant haplotype (62.50%), followed by E*01:03:02:01 (24.52%). Selective neutrality tests disclosed an interesting pattern of selective pressures in which balancing selection is probably shaping allele frequency distributions at an SNP in exon 3 (codon 107), while sequence diversity in exon 4 and the non-coding regions is under significant purifying pressure. Even in an admixed population such as the Brazilian one, the HLA-E locus is very conserved, presenting few polymorphic SNPs in the coding region.
Abstract:
Background: The evolutionary advantages of selective attention are unclear. Since the study of selective attention began, it has been suggested that the nervous system only processes the most relevant stimuli because of its limited capacity [1]. An alternative proposal is that action planning requires the inhibition of irrelevant stimuli, which forces the nervous system to limit its processing [2]. An evolutionary approach might provide additional clues to clarify the role of selective attention. Methods: We developed Artificial Life simulations wherein animals were repeatedly presented with two objects, "left" and "right", each of which could be "food" or "non-food." The animals' neural networks (multilayer perceptrons) had two input nodes, one for each object, and two output nodes to determine whether the animal ate each of the objects. The neural networks also had a variable number of hidden nodes, which determined whether or not the network had enough capacity to process both stimuli (Table 1). The evolutionary relevance of the left and the right food objects could also vary depending on how much the animal's fitness was increased when ingesting them (Table 1). We compared sensory processing in animals with or without limited capacity, which evolved in simulations in which the objects had the same or different relevances. Table 1. Nine sets of simulations were performed, varying the values of food objects and the number of hidden nodes in the neural networks. The values of left and right food were swapped during the second half of the simulations. Non-food objects were always worth -3. The evolution of neural networks was simulated by a simple genetic algorithm. Fitness was a function of the number of food and non-food objects each animal ate, and the chromosomes determined the node biases and synaptic weights.
During each simulation, 10 populations of 20 individuals each evolved in parallel for 20,000 generations, then the relevance of food objects was swapped and the simulation was run again for another 20,000 generations. The neural networks were evaluated by their ability to identify the two objects correctly. The detectability (d') for the left and the right objects was calculated using Signal Detection Theory [3]. Results and conclusion: When both stimuli were equally relevant, networks with two hidden nodes only processed one stimulus and ignored the other. With four or eight hidden nodes, they could correctly identify both stimuli. When the stimuli had different relevances, the d' for the more relevant stimulus was higher than the d' for the less relevant stimulus, even when the networks had four or eight hidden nodes. We conclude that selection mechanisms arose in our simulations depending not only on the size of the neural networks but also on the stimuli's relevance for action.
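The detectability measure d' from Signal Detection Theory is the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that a "hit" is a food object that was eaten and a "false alarm" is a non-food object that was eaten. It also applies a log-linear correction (adding 0.5 to each count) so that rates of exactly 0 or 1 do not produce infinite values. Neither that mapping nor the correction is stated in the abstract, and the counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    false_alarm_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf   # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical counts for one object position over 200 presentations.
d_attended = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
d_weak = d_prime(hits=60, misses=40, false_alarms=40, correct_rejections=60)
d_ignored = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
```

A network that eats food and non-food at the same rate yields a d' of zero for that object, which corresponds to a stimulus being ignored. Larger values indicate that the object is being discriminated.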
Abstract:
Variable rate sprinklers (VRS) have been developed to promote localized water application in irrigated areas. In precision irrigation, VRS that permit better control of flow adjustment and, at the same time, provide satisfactory radial distribution profiles for various pressures and flow rates are needed. The objective of this work was to evaluate the performance and radial distribution profiles of a newly developed VRS that varies the nozzle cross-sectional area by moving a pin in or out using a stepper motor. Field tests were performed under different conditions of service pressure, rotation angle imposed on the pin, and flow rate, which resulted in maximum water throw radii ranging from 7.30 to 10.38 m. In the experiments in which the service pressure remained constant, the maximum throw radius varied from 7.96 to 8.91 m. Averages of repetitions performed without wind or with winds of less than 1.3 m s^-1 were used. The VRS with the four-stream deflector resulted in a greater water throw radius than the six-stream deflector. However, the six-stream deflector had greater precipitation intensities, as well as better distribution. Thus, selection of the deflector to be used should be based on project requirements, taking the differences in the obtained results into account. With a small nozzle opening, the VRS produced small water droplets that, on visual inspection, appeared suitable for foliar chemigation. Regarding the comparison between the estimated and observed flow rates, the stepper motor produced excellent results.