53 results for attribute subset selection

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance: 30.00%

Abstract:

Clustering is a difficult task: there is no single cluster definition and the data can have more than one underlying structure. Pareto-based multi-objective genetic algorithms (e.g., MOCK, Multi-Objective Clustering with automatic K-determination, and MOCLE, Multi-Objective Clustering Ensemble) were proposed to tackle these problems. However, the output of such algorithms can often contain a high number of partitions, making it difficult for an expert to manually analyze all of them. To deal with this problem, we present two selection strategies, both based on the corrected Rand index, to choose a subset of solutions. To test them, we apply them to the sets of solutions produced by MOCK and MOCLE for several datasets. The study was also extended to select a reduced set of partitions from the initial population of MOCLE. These analyses show that both versions of the proposed selection strategy are very effective: they can significantly reduce the number of solutions while keeping the quality and the diversity of the partitions in the original set of solutions. (C) 2010 Elsevier B.V. All rights reserved.
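To make the idea of selecting partitions with the corrected Rand index concrete, here is a minimal Python sketch. It is not the pair of strategies from the paper, whose details the abstract does not give. The greedy filter `select_diverse` and its `threshold` parameter are hypothetical, chosen only to illustrate how a similarity index between partitions can prune near-duplicate solutions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Corrected (adjusted) Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions are trivially identical
    # Pairs of items placed together in both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


def select_diverse(partitions, threshold=0.9):
    """Hypothetical greedy filter: keep a partition only if its index with
    every partition kept so far is below the threshold."""
    kept = []
    for candidate in partitions:
        if all(adjusted_rand_index(candidate, other) < threshold for other in kept):
            kept.append(candidate)
    return kept
```

For example, `select_diverse([[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]])` drops the second partition, which is the first one with its labels swapped, and keeps the other two.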

Relevance: 30.00%

Abstract:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.
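The general idea of a filter that clusters features and keeps one representative per cluster can be sketched as follows. This is an illustration under stated assumptions, not the algorithm proposed in the paper: the use of absolute Pearson correlation, the greedy assignment rule, and the `threshold` parameter are all hypothetical. Note that the number of clusters is not fixed in advance; it emerges from the data and the threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cluster_features(columns, threshold=0.8):
    """Greedy clustering (hypothetical rule): a feature joins the first cluster
    whose representative (its first member) is strongly correlated with it;
    otherwise it starts a new cluster."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            representative = columns[cluster[0]]
            if abs(pearson(values, representative)) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def select_features(columns, threshold=0.8):
    """Keep one representative feature per cluster."""
    return [cluster[0] for cluster in cluster_features(columns, threshold)]
```

With `columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}`, features `a` and `b` fall into one cluster and `c` into another, so `select_features(columns)` returns `["a", "c"]`.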

Relevance: 30.00%

Abstract:

We use an inequality due to Bochnak and Lojasiewicz, which follows from the Curve Selection Lemma of real algebraic geometry, in order to prove that, given a $C^r$ function $f : U \subset \mathbb{R}^m \to \mathbb{R}$, we have $\lim_{y \to x,\ y \in \operatorname{crit}(f)} \frac{|f(y) - f(x)|}{|y - x|^r} = 0$ for all $x \in \operatorname{crit}(f)' \cap U$, where $\operatorname{crit}(f) = \{x \in U \mid df(x) = 0\}$. This shows that the so-called Morse decomposition of the critical set, used in the classical proof of the Morse-Sard theorem, is not necessary: the conclusion of the Morse decomposition lemma holds for the whole critical set. We use this result to give a simple proof of the classical Morse-Sard theorem (with sharp differentiability assumptions).

Relevance: 30.00%

Abstract:

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies to feature selection in the context of pattern recognition. The known approaches to this problem are branch-and-bound algorithms and heuristics that partially explore the search space. Branch-and-bound algorithms are equivalent to the full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the other known ones by exploring the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm, which is based on the representation and exploration of the search space by new lattice properties proven here. Several experiments with well-known public data indicate the superiority of the proposed method over sequential floating forward selection (SFFS), a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method obtained better or equal results in similar or even smaller computational time. (C) 2009 Elsevier Ltd. All rights reserved.
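The U-shape property in (ii) is what makes pruning possible: along a chain of nested subsets, once the cost starts to rise it cannot fall again, so the rest of the chain can be skipped. The Python sketch below illustrates only this single-chain step. It is not the branch-and-bound algorithm of the paper, and both `chain_minimum` and the toy cost function in the usage note are hypothetical.

```python
def chain_minimum(chain, cost):
    """Walk a chain of nested subsets and stop at the first strict cost increase.

    Assumes the cost is U-shaped along the chain, so no subset after the
    first increase can beat the best one found so far on this chain.
    Returns (best_subset, best_cost, number_of_cost_evaluations).
    """
    best, best_cost = chain[0], cost(chain[0])
    evaluations = 1
    for subset in chain[1:]:
        current = cost(subset)
        evaluations += 1
        if current > best_cost:
            break  # rising branch of the U: prune the remainder of the chain
        best, best_cost = subset, current
    return best, best_cost, evaluations
```

As a toy example, take the cost `lambda s: (len(s) - 2) ** 2`, which depends only on subset size and is therefore U-shaped on every chain. For the chain `[frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc"), frozenset("abcd")]` the walk returns `frozenset("ab")` with cost 0 after four evaluations, never visiting the last subset.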

Relevance: 20.00%

Abstract:

Grapholita molesta (Lepidoptera: Tortricidae) is one of the main pests of peach trees in Brazil, causing fruit losses of 3-5%. Among possible biological control agents, Trichogramma pretiosum (Hymenoptera: Trichogrammatidae) has been found in peach orchards. Our objectives were to study the rearing of T. pretiosum in eggs of G. molesta and Anagasta kuehniella (Lepidoptera: Pyralidae), and to select lineages of this parasitoid that have the potential to control G. molesta. The best lineages were selected from 5 populations of T. pretiosum collected from organically cultivated peach orchards. The study was done under controlled temperature (25 +/- 2 degrees C), relative humidity (70 +/- 10%) and 14:10 h (light:dark) photoperiod conditions. Grapholita molesta eggs were found to be adequate hosts for the development of T. pretiosum, and the parameters for number of parasitized eggs, percent parasitized eggs, and sex ratio were similar to those for A. kuehniella eggs. The highest rate of parasitism of G. molesta eggs occurred in eggs with up to 48 h of embryonic development. Among the lineages of T. pretiosum that were collected, HO8, PO8, PEL, and L3M showed the best biological performance and are therefore indicated for semi-field and field studies for biological control of the oriental fruit moth.

Relevance: 20.00%

Abstract:

Background: The criteria and timing for nerve surgery in infants with obstetric brachial plexopathy remain controversial. Our aim was to develop a new method for early prognostic assessment to assist this decision process. Methods: Fifty-four patients with unilateral obstetric brachial plexopathy who were ten to sixty days old underwent bilateral motor-nerve-conduction studies of the axillary, musculocutaneous, proximal radial, distal radial, median, and ulnar nerves. The ratio between the amplitude of the compound muscle action potential of the affected limb and that of the healthy side was called the axonal viability index. The patients were followed and classified in three groups according to the clinical outcome. We analyzed the receiver operating characteristic curve of each index to define the best cutoff point to detect patients with a poor recovery. Results: The best cutoff points on the axonal viability index for each nerve (and its sensitivity and specificity) were <10% (88% and 89%, respectively) for the axillary nerve, 0% (88% and 73%) for the musculocutaneous nerve, <20% (82% and 97%) for the proximal radial nerve, <50% (82% and 97%) for the distal radial nerve, and <50% (59% and 97%) for the ulnar nerve. The indices from the proximal radial, distal radial, and ulnar nerves had better specificities compared with the most frequently used clinical criterion: absence of biceps function at three months of age. Conclusions: The axonal viability index yields an earlier and more specific prognostic estimation of obstetric brachial plexopathy than does the clinical criterion of biceps function, and we believe it may be useful in determining surgical indications in these patients.
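The index defined above is a simple ratio, and the reported cutoffs can be written down as a lookup. The Python sketch below is only an arithmetic illustration of the abstract, not a clinical tool. The function names and the millivolt unit are assumptions, and the cutoff table contains only the values the abstract reports (no cutoff is given there for the median nerve).

```python
# Cutoffs in percent, as reported in the abstract ("index below cutoff").
# The musculocutaneous nerve is reported with a cutoff of exactly 0%.
CUTOFFS_PERCENT = {
    "axillary": 10,
    "proximal radial": 20,
    "distal radial": 50,
    "ulnar": 50,
}


def axonal_viability_index(affected_amplitude_mv, healthy_amplitude_mv):
    """Ratio of compound muscle action potential amplitudes, in percent.

    The unit (mV) is an assumption; any unit works if both sides use it.
    """
    if healthy_amplitude_mv <= 0:
        raise ValueError("healthy-side amplitude must be positive")
    return 100.0 * affected_amplitude_mv / healthy_amplitude_mv


def below_cutoff(nerve, index_percent):
    """True if the index falls in the range the abstract associates with poor recovery."""
    if nerve == "musculocutaneous":
        return index_percent == 0
    return index_percent < CUTOFFS_PERCENT[nerve]
```

For instance, an affected-side amplitude of 0.5 against 5.0 on the healthy side gives an index of 10%, which is not below the axillary cutoff (<10%) but is below the proximal radial cutoff (<20%).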

Relevance: 20.00%

Abstract:

Background: Considering the broad variation in the expression of housekeeping genes among tissues and experimental situations, studies using quantitative RT-PCR require strict definition of adequate endogenous controls. For glioblastoma, the most common type of tumor in the central nervous system, there was no previous report regarding this issue. Results: Here we show that, amongst seven frequently used housekeeping genes, TBP and HPRT1 are adequate references for glioblastoma gene expression analysis. Evaluation of the expression levels of 12 target genes utilizing different endogenous controls revealed that the normalization method applied might introduce errors in the estimation of relative quantities. Genes whose expression levels do not significantly differ between tumor and normal tissues can be considered either increased or decreased if unsuitable reference genes are applied. Most importantly, genes showing significant differences in expression levels between tumor and normal tissues can be missed. We also demonstrated that the Holliday Junction Recognizing Protein, a novel DNA repair protein overexpressed in lung cancer, is extremely overexpressed in glioblastoma, with a median change of about 134-fold. Conclusion: Altogether, our data show the relevance of prior validation of candidate control genes for each experimental model and indicate TBP plus HPRT1 as suitable references for studies on glioblastoma gene expression.

Relevance: 20.00%

Abstract:

A variety of factors influence prey selection by predators. Because Barn Owls (Tyto alba) and Burrowing Owls (Athene cunicularia) differ in size and foraging tactics, we expected differential predation on small mammal prey. We hypothesized that the Barn Owl, an active predator, would prey on smaller and younger individuals than the Burrowing Owl, a sit-and-wait predator. We used pellet analyses to evaluate selection of small mammals by the two owls in relation to prey species, age, and size at the Ecological Station of Itirapina, state of Sao Paulo, in southeastern Brazil. Small mammals constituted most of the prey individuals and biomass in the diet of Barn Owls. Although Burrowing Owls consumed a wider range of taxa, small mammals represented one-third of all biomass consumed. With respect to small mammals, Barn Owls foraged selectively relative to prey species, size, and age. Burrowing Owls foraged opportunistically relative to prey species, but selectively relative to prey size and age. Barn Owls selected smaller and younger (juvenile and subadult) individuals of the delicate vesper mouse (Calomys tener), and Burrowing Owls preyed more on larger and older (subadult only) individuals. Morphology and behavior of both prey and predators may explain this differential predation. Our data suggest that the active predator feeds on smaller and younger prey, and the sit-and-wait predator takes relatively larger and older prey.

Relevance: 20.00%

Abstract:

Human respiratory syncytial virus (HRSV) is the major cause of lower respiratory tract infections in children under 5 years of age and the elderly, causing annual disease outbreaks during the fall and winter. Multiple lineages of the HRSVA and HRSVB serotypes co-circulate within a single outbreak and display a strongly temporal pattern of genetic variation, with a replacement of dominant genotypes occurring during consecutive years. In the present study we utilized phylogenetic methods to detect and map sites subject to adaptive evolution in the G protein of HRSVA and HRSVB. A total of 29 and 23 amino acid sites were found to be putatively positively selected in HRSVA and HRSVB, respectively. Several of these sites defined genotypes and lineages within genotypes in both groups, and correlated well with epitopes previously described in group A. Remarkably, 18 of these positively selected sites tended to revert in time to a previous codon state, producing a "flip-flop" phylogenetic pattern. Such frequent evolutionary reversals in HRSV are indicative of a combination of frequent positive selection, reflecting the changing immune status of the human population, and a limited repertoire of functionally viable amino acids at specific amino acid sites.

Relevance: 20.00%

Abstract:

Background: The malaria parasite Plasmodium falciparum exhibits abundant genetic diversity, and this diversity is key to its success as a pathogen. Previous efforts to study genetic diversity in P. falciparum have begun to elucidate the demographic history of the species, as well as patterns of population structure and patterns of linkage disequilibrium within its genome. Such studies will be greatly enhanced by new genomic tools and recent large-scale efforts to map genomic variation. To that end, we have developed a high throughput single nucleotide polymorphism (SNP) genotyping platform for P. falciparum. Results: Using an Affymetrix 3,000 SNP assay array, we found roughly half the assays (1,638) yielded high quality, 100% accurate genotyping calls for both major and minor SNP alleles. Genotype data from 76 global isolates confirm significant genetic differentiation among continental populations and varying levels of SNP diversity and linkage disequilibrium according to geographic location and local epidemiological factors. We further discovered that nonsynonymous and silent (synonymous or noncoding) SNPs differ with respect to within-population diversity, interpopulation differentiation, and the degree to which allele frequencies are correlated between populations. Conclusions: The distinct population profile of nonsynonymous variants indicates that natural selection has a significant influence on genomic diversity in P. falciparum, and that many of these changes may reflect functional variants deserving of follow-up study. Our analysis demonstrates the potential for new high-throughput genotyping technologies to enhance studies of population structure, natural selection, and ultimately enable genome-wide association studies in P. falciparum to find genes underlying key phenotypic traits.

Relevance: 20.00%

Abstract:

Background: Plasmodium vivax malaria is a major public health challenge in Latin America, Asia and Oceania, with 130-435 million clinical cases per year worldwide. Invasion of host blood cells by P. vivax mainly depends on a type I membrane protein called Duffy binding protein (PvDBP). The erythrocyte-binding motif of PvDBP is a 170 amino-acid stretch located in its cysteine-rich region II (PvDBP(II)), which is the most variable segment of the protein. Methods: To test whether diversifying natural selection has shaped the nucleotide diversity of PvDBP(II) in Brazilian populations, this region was sequenced in 122 isolates from six different geographic areas. A Bayesian method was applied to test for the action of natural selection under a population genetic model that incorporates recombination. The analysis was integrated with a structural model of PvDBP(II), and T- and B-cell epitopes were localized on the 3-D structure. Results: The results suggest that: (i) recombination plays an important role in determining the haplotype structure of PvDBP(II), and (ii) PvDBP(II) appears to contain neutrally evolving codons as well as codons evolving under natural selection. Diversifying selection preferentially acts on sites identified as epitopes, particularly on amino acid residues 417, 419, and 424, which show strong linkage disequilibrium. Conclusions: This study shows that some polymorphisms of PvDBP(II) are present near the erythrocyte-binding domain and might serve to elude antibodies that inhibit cell invasion. Therefore, these polymorphisms should be taken into account when designing vaccines aimed at eliciting antibodies to inhibit erythrocyte invasion.

Relevance: 20.00%

Abstract:

Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drug design, as well as for planning new high-throughput experiments. Methods have been developed for gene network modeling and identification from expression profiles. However, an important open problem is how to validate such approaches and their results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG).
The experimental results indicate that the inference method was sensitive to variation in the average degree k, with its network recovery rate decreasing as k increased. The signal size was important for the inference method to achieve better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensitive enough to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.

Relevance: 20.00%

Abstract:

Background: Feature selection is a pattern recognition approach to choose important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state, e.g., normal tissues and several types of cancer, or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples to perform an adequate estimate of the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to point out the best solution for each application. Results: The intent of this work is to provide an open-source multiplatform graphical environment for bioinformatics problems, which supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system. Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. Moreover, the environment can be used in different pattern recognition applications, although its main focus is bioinformatics tasks.

Relevance: 20.00%

Abstract:

Context tree models were introduced by Rissanen in [25] as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper investigates non-asymptotic properties of two popular procedures of context tree estimation: Rissanen's algorithm Context and penalized maximum likelihood. After showing how they are related, we prove finite horizon bounds for the probability of overestimation and underestimation. Concerning overestimation, no boundedness or loss-of-memory conditions are required: the proof relies on new deviation inequalities for empirical probabilities, which are of independent interest. The underestimation properties rely on classical hypotheses for processes of infinite memory. These results improve on and generalize the bounds obtained in Duarte et al. (2006) [12], Galves et al. (2008) [18], Galves and Leonardi (2008) [17], Leonardi (2010) [22], refining asymptotic results of Buhlmann and Wyner (1999) [4] and Csiszar and Talata (2006) [9]. (C) 2011 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

This text addresses museums' role in the production of knowledge and how objects are transformed into documents when museums incorporate them. On accepting the effects of such transformation, museums start working not only with material goods, but also with symbolic goods. The collection manager or exhibition curator communicates through documents rather than bringing to light their intrinsic content. In this sense, every process involving museum documents, from the selection of collections to exhibitions, has a given rhetorical and ideological nature. Museums must search for meanings through correlations established in the process of producing information. Exhibitions should present objects in multiple contexts, giving visitors the opportunity to participate and attribute their own meanings to them.