13 results for stochastic search variable selection

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance:

100.00%

Publisher:

Abstract:

Dimensionality reduction is employed in visual data analysis as a way of obtaining reduced spaces for high-dimensional data or of mapping data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation in reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high-dimensional data, taking into account the user's input. It employs Partial Least Squares (PLS), a statistical tool that retrieves latent spaces focusing on the discriminability of the data. The method employs a training set to build a highly precise model that can then be applied very effectively to a much larger data set. The reduced data set can be exhibited using various existing visualization techniques. The training data is important for encoding the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge, and it is capable of working with small and unbalanced training sets.

Relevance:

100.00%

Publisher:

Abstract:

Current methods for quality control of sugar cane are performed on extracted juice using several methodologies, often requiring appreciable time and chemicals (possibly toxic), which makes the methods expensive and not environmentally friendly. The present study proposes the use of X-ray spectrometry together with chemometric methods as an innovative, alternative technique for determining sugar cane quality parameters, specifically sucrose concentration, POL, and fiber content. Measurements were performed in stem, leaf, and juice, and those applied directly to the stem provided the best results. Prediction models for sugar cane stem determinations with a single 60 s irradiation using portable X-ray fluorescence equipment allow the % sucrose, % fiber, and POL to be estimated simultaneously. Average relative deviations of around 8% in the prediction step are acceptable considering that the measurements were made in the field. These results may indicate the best period to cut a particular crop and may also help evaluate the quality of sugar cane for the sugar and alcohol industries.

Relevance:

30.00%

Publisher:

Abstract:

The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences, which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, which is a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of the optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also carry out a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
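To make the role of the penalty constant concrete, the following is a minimal sketch of standard BIC selection for a fixed-order Markov chain. It is not the smallest maximizer criterion and it does not fit a variable length chain; the function names and the `penalty` argument are assumptions introduced here for illustration only.

```python
import math
from collections import Counter


def log_likelihood(seq, order, start):
    """Maximized log-likelihood of an order-`order` Markov chain on seq[start:]."""
    ctx_counts, trans_counts = Counter(), Counter()
    for t in range(start, len(seq)):
        ctx = seq[t - order:t]          # empty string when order == 0
        ctx_counts[ctx] += 1
        trans_counts[(ctx, seq[t])] += 1
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in trans_counts.items())


def bic_order(seq, max_order, penalty=0.5):
    """Pick the order maximizing log-likelihood - penalty * (#params) * log(n).

    `penalty` is the constant whose choice the abstract describes as a problem;
    0.5 is the textbook value, used here only as an assumption.
    """
    alphabet = len(set(seq))
    n = len(seq) - max_order            # same positions scored for every order
    scores = {}
    for k in range(max_order + 1):
        n_params = alphabet ** k * (alphabet - 1)
        scores[k] = (log_likelihood(seq, k, max_order)
                     - penalty * n_params * math.log(n))
    return max(scores, key=scores.get), scores


# A strictly alternating sequence is fully explained by one symbol of memory.
order, scores = bic_order("ab" * 200, max_order=3)
```

Changing `penalty` can change the selected order on real data, which is the sensitivity that a constant-free procedure is meant to remove.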

Relevance:

30.00%

Publisher:

Abstract:

A data set from a commercial Nellore beef cattle selection program was used to compare breeding models that did or did not include marker effects to estimate breeding values when only a reduced number of animals have phenotypic, genotypic and pedigree information available. The complete data set for this herd was composed of 83,404 animals measured for weaning weight (WW), post-weaning gain (PWG), scrotal circumference (SC) and muscle score (MS), corresponding to 116,652 animals in the relationship matrix. Single-trait analyses were performed with MTDFREML software to estimate fixed and random effects solutions using the complete data. The estimated additive effects were taken as the reference breeding values for those animals. The individual observed phenotype of each trait was adjusted for fixed and random effects solutions, except for direct additive effects. The adjusted phenotype, composed of the additive and residual parts of the observed phenotype, was used as the dependent variable for model comparison. Among all measured animals of this herd, only 3160 animals were genotyped for 106 SNP markers. Three models were compared in terms of changes in animals' rank, global fit and predictive ability. Model 1 included only polygenic effects, model 2 included only marker effects and model 3 included both polygenic and marker effects. Bayesian inference via Markov chain Monte Carlo methods, performed with TM software, was used to analyze the data for model comparison. Two different priors were adopted for marker effects in models 2 and 3: the first was a uniform distribution (U) and the second assumed that marker effects were normally distributed (N). Higher rank correlation coefficients were observed for models 3_U and 3_N, indicating a greater similarity between the animals' rank under these models and the rank based on the reference breeding values. Model 3_N presented a better global fit, as demonstrated by its low DIC. The best models in terms of predictive ability were models 1 and 3_N. Differences due to the prior assumed for marker effects in models 2 and 3 could be attributed to the better ability of the normal prior to handle collinear effects. Models 2_U and 2_N presented the worst performance, indicating that this small set of markers should not be used to genetically evaluate animals with no data, since its predictive ability is restricted. In conclusion, model 3_N presented a slight superiority when a reduced number of animals have phenotypic, genotypic and pedigree information. This could be attributed to the variation retained by marker and polygenic effects assumed together and to the normal prior assumed for marker effects, which deals better with the collinearity between markers. (C) 2012 Elsevier B.V. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

The non-classical human leukocyte antigen (HLA) class I genes present a very low rate of variation. So far, only 10 HLA-E alleles encoding three proteins have been described, but only two are frequently found in worldwide populations. Because of their historical background, Brazilians are very suitable for population genetic studies. Therefore, 104 bone marrow donors from Brazil were evaluated for HLA-E exons 1-4. Seven variation sites were found, including two known single nucleotide polymorphisms (SNPs) at positions +424 and +756 and five new SNPs at positions +170 (intron 1), +1294 (intron 3), +1625, +1645 and +1857 (exon 4). Haplotyping analysis showed eight haplotypes: three known ones, E*01:01:01, E*01:03:01 and E*01:03:02:01, and five new HLA-E alleles that carry the new variation sites. The HLA-E*01:01:01 allele was the predominant haplotype (62.50%), followed by E*01:03:02:01 (24.52%). Selective neutrality tests disclosed an interesting pattern of selective pressures in which balancing selection is probably shaping allele frequency distributions at an SNP in exon 3 (codon 107), while sequence diversity at exon 4 and the non-coding regions is facing significant purifying pressure. Even in an admixed population such as the Brazilian one, the HLA-E locus is very conserved, presenting few polymorphic SNPs in the coding region.

Relevance:

30.00%

Publisher:

Abstract:

Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on an adequate choice of values for a number of parameters (e.g., the kernel and regularization parameters). In the current work, we propose the combination of meta-learning and search algorithms to deal with the problem of SVM parameter selection. In this combination, given a new problem to be solved, meta-learning is employed to recommend SVM parameter values based on parameter configurations that have been successfully adopted in previous similar problems. The parameter values returned by meta-learning are then used as initial search points by a search technique, which further explores the parameter space. We expect the initial solutions provided by meta-learning to be located in good regions of the search space (i.e., closer to optimum solutions). Hence, the search algorithm would need to evaluate fewer candidate solutions when looking for an adequate solution. In this work, we investigate the combination of meta-learning with two search algorithms: Particle Swarm Optimization and Tabu Search. The implemented hybrid algorithms were used to select the values of two SVM parameters in the regression domain. These combinations were compared with the use of the search algorithms without meta-learning. The experimental results on a set of 40 regression problems showed that, on average, the proposed hybrid methods obtained lower error rates than their components applied in isolation.
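The warm-start idea can be sketched roughly as follows. Everything here is hypothetical: the nearest-neighbour recommender, the plain random local search (a stand-in for Particle Swarm Optimization or Tabu Search, not an implementation of either), the `history` records, and the toy error surface that replaces an actual SVM cross-validation error.

```python
import math
import random


def recommend(meta_features, history, k=1):
    """Return the best-known parameters of the k most similar past problems."""
    ranked = sorted(history,
                    key=lambda rec: math.dist(meta_features, rec["meta"]))
    return [rec["best_params"] for rec in ranked[:k]]


def local_search(error, start, iters=200, scale=0.5, seed=0):
    """Random local search that only accepts improving candidates."""
    rng = random.Random(seed)
    best, best_err = start, error(start)
    for _ in range(iters):
        cand = tuple(p + rng.gauss(0.0, scale) for p in best)
        cand_err = error(cand)
        if cand_err < best_err:
            best, best_err = cand, cand_err
    return best, best_err


def toy_error(params):
    """Hypothetical error surface over (log C, log gamma); not a real SVM."""
    log_c, log_gamma = params
    return (log_c - 2.0) ** 2 + (log_gamma + 3.0) ** 2


# Hypothetical meta-data: meta-features and best parameters of past problems.
history = [
    {"meta": (0.1, 0.9), "best_params": (1.5, -2.5)},
    {"meta": (0.8, 0.2), "best_params": (-4.0, 4.0)},
]
starts = recommend((0.15, 0.85), history, k=1)
best, err = min((local_search(toy_error, s) for s in starts),
                key=lambda result: result[1])
```

Because the search only accepts improvements, the final error can never be worse than that of the recommended starting point; a good recommendation mainly reduces how many candidates must be evaluated.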

Relevance:

30.00%

Publisher:

Abstract:

We show how to construct a topological Markov map of the interval whose invariant probability measure is the stationary law of a given stochastic chain of infinite order. In particular we characterize the maps corresponding to stochastic chains with memory of variable length. The problem treated here is the converse of the classical construction of the Gibbs formalism for Markov expanding maps of the interval.

Relevance:

30.00%

Publisher:

Abstract:

Although the automatic identification of nontechnical losses has been extensively studied, the problem of selecting the most representative features in order to boost identification accuracy and to characterize possible illegal consumers has not attracted much attention in this context. In this paper, we focus on this problem by reviewing three evolutionary-based techniques for feature selection, and we also introduce one of them in this context. The results demonstrated that selecting the most representative features can greatly improve the classification accuracy of possible frauds in datasets composed of industrial and commercial profiles.

Relevance:

30.00%

Publisher:

Abstract:

Background: The evolutionary advantages of selective attention are unclear. Since the study of selective attention began, it has been suggested that the nervous system only processes the most relevant stimuli because of its limited capacity [1]. An alternative proposal is that action planning requires the inhibition of irrelevant stimuli, which forces the nervous system to limit its processing [2]. An evolutionary approach might provide additional clues to clarify the role of selective attention. Methods: We developed Artificial Life simulations wherein animals were repeatedly presented with two objects, "left" and "right", each of which could be "food" or "non-food." The animals' neural networks (multilayer perceptrons) had two input nodes, one for each object, and two output nodes to determine whether the animal ate each of the objects. The neural networks also had a variable number of hidden nodes, which determined whether or not they had enough capacity to process both stimuli (Table 1). The evolutionary relevance of the left and the right food objects could also vary depending on how much the animal's fitness increased when ingesting them (Table 1). We compared sensory processing in animals with or without limited capacity, which evolved in simulations in which the objects had the same or different relevances. Table 1. Nine sets of simulations were performed, varying the values of food objects and the number of hidden nodes in the neural networks. The values of left and right food were swapped during the second half of the simulations. Non-food objects were always worth -3. The evolution of neural networks was simulated by a simple genetic algorithm. Fitness was a function of the number of food and non-food objects each animal ate, and the chromosomes determined the node biases and synaptic weights. During each simulation, 10 populations of 20 individuals each evolved in parallel for 20,000 generations; then the relevance of food objects was swapped and the simulation was run for another 20,000 generations. The neural networks were evaluated by their ability to identify the two objects correctly. The detectability (d') for the left and the right objects was calculated using Signal Detection Theory [3]. Results and conclusion: When both stimuli were equally relevant, networks with two hidden nodes only processed one stimulus and ignored the other. With four or eight hidden nodes, they could correctly identify both stimuli. When the stimuli had different relevances, the d' for the most relevant stimulus was higher than the d' for the least relevant stimulus, even when the networks had four or eight hidden nodes. We conclude that selection mechanisms arose in our simulations depending not only on the size of the neural networks but also on the stimuli's relevance for action.
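For readers unfamiliar with the detectability measure, d' in Signal Detection Theory is the difference between the z-transformed hit rate and false-alarm rate. The sketch below assumes that a "hit" is eating a food object and a "false alarm" is eating a non-food object; that mapping, the function name, and the 0.5 count correction are assumptions of this sketch, not details taken from the abstract.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Detectability d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each count (a log-linear correction, assumed here) keeps
    the rates strictly between 0 and 1, where the z-transform is finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# An animal that eats food and non-food equally often cannot tell them apart.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
# An animal that mostly eats food and mostly rejects non-food discriminates well.
good = d_prime(hits=90, misses=10, false_alarms=10, correct_rejections=90)
```

A d' near zero means the object is effectively ignored, while larger values mean the network distinguishes food from non-food for that object.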

Relevance:

30.00%

Publisher:

Abstract:

Variable rate sprinklers (VRS) have been developed to promote localized water application in irrigated areas. In precision irrigation, there is a real need for a VRS that permits better control of flow adjustment and, at the same time, provides satisfactory radial distribution profiles for various pressures and flow rates. The objective of this work was to evaluate the performance and radial distribution profiles of a newly developed VRS that varies the nozzle cross-sectional area by moving a pin in or out using a stepper motor. Field tests were performed under different conditions of service pressure, rotation angle imposed on the pin, and flow rate, which resulted in maximal water throw radii ranging from 7.30 to 10.38 m. In the experiments in which the service pressure remained constant, the maximal throw radius varied from 7.96 to 8.91 m. Averages of repetitions performed without wind or with winds below 1.3 m s⁻¹ were used. The VRS with the four-stream deflector resulted in a greater water throw radius than the six-stream deflector. However, the six-stream deflector had greater precipitation intensities, as well as better distribution. Thus, the deflector should be selected based on project requirements, taking into account the differences in the results obtained. With a small nozzle opening, the VRS produced small water droplets that, on visual inspection, appeared suitable for foliar chemigation. In the comparison between estimated and observed flow rates, the stepper motor produced excellent results.

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: To investigate the occurrence of hearing loss in individuals with HIV/AIDS and to characterize it regarding type and degree. RESEARCH STRATEGY: A systematic review was conducted of the literature found in the electronic databases PubMed, EMBASE, ADOLEC, IBECS, Web of Science, Scopus, Lilacs and SciELO. SELECTION CRITERIA: The search strategy was directed by a specific question: "Is hearing loss part of the framework of HIV/AIDS manifestations?", and the selection criteria of the studies involved coherence with the proposed theme, evidence levels 1, 2 or 3, and language (Portuguese, English and Spanish). DATA ANALYSIS: We found 698 studies. After an analysis of the title and abstract, 91 were selected for full reading. Of these, 38 met the proposed criteria and were included in the review. RESULTS: The studies reported the presence of conductive, sensorineural, and mixed hearing loss, of variable degrees and audiometric configurations, in addition to tinnitus and vestibular disorders. The etiology can be attributed to opportunistic infections, ototoxic drugs or the action of the virus itself. Auditory evoked potentials have been used as markers of neurological alterations, even in patients with normal hearing. CONCLUSION: HIV/AIDS patients may present hearing loss. Thus, programs for the prevention and treatment of AIDS must involve actions aimed at auditory health.

Relevance:

30.00%

Publisher:

Abstract:

We developed a stochastic lattice model to describe vector-borne diseases (such as yellow fever or dengue). The model is spatially structured and its dynamical rules take into account the diffusion of vectors. We consider a bipartite lattice consisting of one sub-lattice occupied by humans and another occupied by mosquitoes. With each lattice site we associate a stochastic variable that describes the occupation and the health state of a single individual (mosquito or human). Disease transmission in the human population follows dynamics similar to the Susceptible-Infected-Recovered (SIR) model, while disease transmission in the mosquito population follows dynamics analogous to the Susceptible-Infected-Susceptible (SIS) model with mosquito diffusion. The occurrence of an epidemic is directly related to the conditional probability of finding infected mosquitoes (humans) in the presence of susceptible humans (mosquitoes) in the neighborhood. The diffusion of mosquitoes can facilitate the formation of Susceptible-Infected pairs, enabling an increase in the size of the epidemic. Using an asynchronous dynamic update, we study disease transmission in a population initially formed by susceptible individuals following the introduction of a single infected mosquito (human). We find that this model exhibits a continuous phase transition related to the existence or non-existence of an epidemic. By means of mean-field approximations and Monte Carlo simulations, we investigate the epidemic threshold and the phase diagram in terms of the diffusion probability and the infection probability.
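A rough Monte Carlo sketch of such a model is given below. The abstract does not state the update rules, so every rule here is an assumption: a checkerboard split of a periodic square lattice into human and mosquito sites, infection with probability proportional to the fraction of infected neighbours of the other species, and diffusion implemented as a swap between diagonal mosquito sites. All names and default parameter values are hypothetical.

```python
import random

H_S, H_I, H_R = "hS", "hI", "hR"    # human states (SIR)
M_S, M_I = "mS", "mI"               # mosquito states (SIS)


def neighbors(i, j, size, diagonal=False):
    """Periodic neighbours: axial ones belong to the other sub-lattice,
    diagonal ones to the same sub-lattice (size must be even)."""
    steps = ([(1, 1), (1, -1), (-1, 1), (-1, -1)] if diagonal
             else [(1, 0), (-1, 0), (0, 1), (0, -1)])
    return [((i + di) % size, (j + dj) % size) for di, dj in steps]


def step(lat, p_infect, p_recover, p_diffuse, rng):
    """One asynchronous update of a randomly chosen site."""
    size = len(lat)
    i, j = rng.randrange(size), rng.randrange(size)
    state = lat[i][j]
    if state in (M_S, M_I) and rng.random() < p_diffuse:
        # Assumed diffusion rule: swap with a random diagonal mosquito site.
        k, l = rng.choice(neighbors(i, j, size, diagonal=True))
        lat[i][j], lat[k][l] = lat[k][l], lat[i][j]
    elif state in (H_S, M_S):
        source = M_I if state == H_S else H_I
        frac = sum(lat[k][l] == source for k, l in neighbors(i, j, size)) / 4
        if rng.random() < p_infect * frac:
            lat[i][j] = H_I if state == H_S else M_I
    elif state == H_I and rng.random() < p_recover:
        lat[i][j] = H_R                  # humans recover permanently (SIR)
    elif state == M_I and rng.random() < p_recover:
        lat[i][j] = M_S                  # mosquitoes become susceptible again


def run(size=20, steps=20000, p_infect=0.8, p_recover=0.2,
        p_diffuse=0.3, seed=0):
    """Start from all-susceptible plus one infected mosquito; return counts."""
    rng = random.Random(seed)
    lat = [[H_S if (i + j) % 2 == 0 else M_S for j in range(size)]
           for i in range(size)]
    lat[0][1] = M_I                      # (0 + 1) is odd, so a mosquito site
    for _ in range(steps):
        step(lat, p_infect, p_recover, p_diffuse, rng)
    flat = [s for row in lat for s in row]
    return {s: flat.count(s) for s in (H_S, H_I, H_R, M_S, M_I)}


counts = run(size=10, steps=5000, seed=1)
```

In a sketch like this, the final number of recovered humans (`counts[H_R]`) plays the role of the epidemic size, which could be scanned over `p_infect` and `p_diffuse` to look for a threshold.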

Relevance:

30.00%

Publisher:

Abstract:

We consider a general class of mathematical models for stochastic gene expression where the transcription rate is allowed to depend on a promoter state variable that can take an arbitrary (finite) number of values. We provide the solution of the master equations in the stationary limit, based on a factorization of the stochastic transition matrix that separates timescales and relative interaction strengths, and we express its entries in terms of parameters that have a natural physical and/or biological interpretation. The solution illustrates the capacity of multiple-state promoters to generate multimodal distributions of gene products, without the need for feedback. Furthermore, using the example of a three-state promoter operating at low, high, and intermediate expression levels, we show that using multiple-state operons will typically lead to a significant reduction of noise in the system. The underlying mechanism is that a three-state promoter can change its level of expression from low to high by passing through an intermediate state with a much smaller increase of fluctuations than by means of a direct transition.
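The abstract describes an analytical stationary solution. As a complementary illustration only, the sketch below simulates the same kind of model numerically with the Gillespie algorithm, a different technique from the factorization the abstract mentions. The function name, the rate values, and the linear degradation term are assumptions introduced here.

```python
import random


def simulate_promoter(switch, transcription, degradation, t_end, seed=0):
    """Gillespie simulation; returns the time-averaged number of gene products.

    switch[a][b]     : rate of promoter transitions a -> b (diagonal ignored)
    transcription[a] : production rate while the promoter is in state a
    degradation      : per-molecule decay rate (assumed linear)
    """
    rng = random.Random(seed)
    t, state, n = 0.0, 0, 0
    weighted_sum = 0.0
    while t < t_end:
        # List every reaction that can fire from the current configuration.
        events = [(rate, "switch", b) for b, rate in enumerate(switch[state])
                  if b != state and rate > 0]
        if transcription[state] > 0:
            events.append((transcription[state], "birth", None))
        if n > 0 and degradation > 0:
            events.append((degradation * n, "death", None))
        total = sum(rate for rate, _, _ in events)
        if total == 0:
            break
        dt = min(rng.expovariate(total), t_end - t)
        weighted_sum += n * dt           # accumulate n weighted by dwell time
        t += dt
        if t >= t_end:
            break
        # Choose which reaction fires, with probability proportional to rate.
        pick = rng.random() * total
        for rate, kind, target in events:
            pick -= rate
            if pick < 0:
                break
        if kind == "switch":
            state = target
        elif kind == "birth":
            n += 1
        else:
            n -= 1
    return weighted_sum / t if t > 0 else 0.0


# Hypothetical three-state promoter: low (0), intermediate (1), high (2),
# with transitions only between adjacent states.
switch = [[0.0, 0.1, 0.0],
          [0.1, 0.0, 0.1],
          [0.0, 0.1, 0.0]]
mean_products = simulate_promoter(switch, [1.0, 10.0, 50.0], 1.0, 500.0)
```

Recording a histogram of `n` instead of only its mean would show whether the chosen rates produce a multimodal distribution of gene products.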