11 results for Data representation
in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"
Abstract:
Feature selection aims to find the most important information in order to save computational effort and data storage. We formulated this task as a combinatorial optimization problem, since the exponential growth of possible solutions makes an exhaustive search infeasible. In this work, we propose a new nature-inspired feature selection technique based on bat behavior, namely, the binary bat algorithm. The wrapper approach combines the exploration power of the bats with the speed of the optimum-path forest classifier to find a better data representation. Experiments on public datasets have shown that the proposed technique can indeed improve the effectiveness of the optimum-path forest and outperform some well-known swarm-based techniques. © 2013 Elsevier Inc. All rights reserved.
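A minimal sketch of the wrapper idea described in this abstract, under stated assumptions: each bat holds a 0/1 mask over the features, velocities are pulled toward the best mask found so far, and a sigmoid transfer function turns each velocity into a bit. The loudness and pulse-rate mechanics of the full bat algorithm are omitted and replaced by simple greedy acceptance, and the hypothetical `toy_fitness` stands in for the accuracy of the optimum-path forest classifier, which is not implemented here. All names and parameter values are illustrative, not the authors' implementation.

```python
import math
import random


def binary_bat_select(fitness, n_features, n_bats=10, n_iter=30,
                      f_min=0.0, f_max=2.0, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Each bat's position is a binary mask; velocities start at zero.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    scores = [fitness(p) for p in pos]
    best_idx = max(range(n_bats), key=lambda i: scores[i])
    best, best_score = pos[best_idx][:], scores[best_idx]

    for _ in range(n_iter):
        for i in range(n_bats):
            # Random frequency in [f_min, f_max] scales the velocity update.
            freq = f_min + (f_max - f_min) * rng.random()
            cand = []
            for j in range(n_features):
                vel[i][j] += (pos[i][j] - best[j]) * freq
                # Sigmoid transfer: velocity -> probability of setting the bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                cand.append(1 if rng.random() < prob else 0)
            score = fitness(cand)
            # Greedy acceptance (simplification of loudness/pulse-rate rules).
            if score >= scores[i]:
                pos[i], scores[i] = cand, score
            if score > best_score:
                best, best_score = cand[:], score
    return best, best_score


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy on the masked data."""
    informative = (0, 2)  # assumed informative features, for illustration
    hits = sum(1 for j in informative if mask[j])
    return hits - 0.1 * sum(mask)  # small penalty per selected feature


mask, score = binary_bat_select(toy_fitness, n_features=6)
```

In a real wrapper, `fitness` would train and evaluate the classifier on the features selected by the mask, which is where the classifier's speed matters.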
Abstract:
Graduate Program in Geography - FCT
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Husserl left many unpublished drafts explaining (or trying to explain) his views on spatial representation and geometry, particularly those collected in the second part of Studien zur Arithmetik und Geometrie (Hua XXI), but no fully articulated work on the subject. In this paper, I put forward an interpretation of what those views might have been. Husserl, I claim, distinguished among different conceptions of space: the space of perception (constituted from sensorial data by intentionally motivated psychic functions), that of physical geometry (or idealized perceptual space), the space of the mathematical science of physical nature (in which science, not only raw perception, has a say), and the abstract spaces of mathematics (free creations of the mathematical mind), each with its own geometrical structure. Perceptual space is proto-Euclidean and the space of physical geometry Euclidean, but mathematical physics, Husserl allowed, may find it convenient to represent physical space with a non-Euclidean structure. Mathematical spaces, in turn, can be endowed, he thinks, with any geometry mathematicians may find interesting. Many other related questions are addressed here, in particular those concerning the a priori or a posteriori character of the many geometric features of perceptual space (bearing in mind that there are at least two different notions of a priori in Husserl, which we may call the conceptual and the transcendental a priori). I conclude with an overview of Weyl's ideas on the matter, since his philosophical conceptions can often be traced back to his former master, Husserl.
Abstract:
This paper presents a proposal for the semantic treatment of ambiguous homographic forms in Brazilian Portuguese and offers linguistic strategies for its computational implementation in Systems of Natural Language Processing (SNLP). Pustejovsky's Generative Lexicon was used as the theoretical model. From this model, the Qualia Structure (QS), with its Formal, Telic, Agentive and Constitutive roles, was selected as one of the linguistic and semantic devices for disambiguating homonymous forms. So that the analyzed and treated data could be manipulated, we built a Lexical Knowledge Base (LKB) in which lexical items are correlated and interconnected by different kinds of semantic relations in the QS and by ontological information.
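As a rough illustration of how a Qualia Structure could be stored and used for disambiguation, the sketch below models the four roles as sets of cue words and picks the sense whose cues overlap most with the context. The entries for the homograph "manga" (fruit vs. sleeve), the cue words, and the overlap rule are all assumptions made for this example; they are not taken from the paper's Lexical Knowledge Base.

```python
from dataclasses import dataclass, field


@dataclass
class QualiaStructure:
    formal: set = field(default_factory=set)        # what kind of thing it is
    constitutive: set = field(default_factory=set)  # its parts or material
    telic: set = field(default_factory=set)         # its purpose or function
    agentive: set = field(default_factory=set)      # how it comes into being

    def cues(self):
        """All cue words across the four roles."""
        return self.formal | self.constitutive | self.telic | self.agentive


@dataclass
class LexicalEntry:
    lemma: str
    sense: str
    qualia: QualiaStructure


def disambiguate(entries, context_words):
    """Return the entry whose qualia cues overlap most with the context."""
    context = set(context_words)
    return max(entries, key=lambda e: len(e.qualia.cues() & context))


# Illustrative (hypothetical) entries for the homograph "manga".
manga_fruit = LexicalEntry("manga", "fruit", QualiaStructure(
    formal={"fruta"}, constitutive={"casca", "polpa"},
    telic={"comer"}, agentive={"colher"}))
manga_sleeve = LexicalEntry("manga", "sleeve", QualiaStructure(
    formal={"parte"}, constitutive={"tecido"},
    telic={"cobrir"}, agentive={"costurar"}))

chosen = disambiguate([manga_fruit, manga_sleeve], ["comer", "polpa"])
```

A fuller knowledge base would also encode the semantic relations between entries and the ontological information the abstract mentions; this sketch covers only the role-matching step.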
Abstract:
Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximately 23,500 genes, of which only approximately 1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in the knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages, to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated use of multiple graphical representations (coordinated in the sense that user actions should be capable of affecting multiple visualizations when desired) allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user-configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link them, so that user actions executed on one representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.
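The coordination requirement described in this abstract can be sketched as a small publish/subscribe arrangement: a coordinator keeps, per action type, the set of views that should react, and that set can be changed at runtime. The class names `Coordinator` and `View`, the `"select"` action, and the payload format are hypothetical choices for illustration and do not come from the paper's model.

```python
from collections import defaultdict


class Coordinator:
    """Routes user actions between views linked on that action."""

    def __init__(self):
        self._links = defaultdict(set)  # action name -> linked views

    def link(self, action, view):
        self._links[action].add(view)

    def unlink(self, action, view):
        self._links[action].discard(view)

    def broadcast(self, action, source, payload):
        # Propagate to every linked view except the one that originated it.
        for view in list(self._links[action]):
            if view is not source:
                view.receive(action, payload)


class View:
    """Stand-in for an independently developed visualization component."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.log = []  # actions applied to this view, in order

    def user_action(self, action, payload):
        self.log.append((action, payload))
        self.coordinator.broadcast(action, self, payload)

    def receive(self, action, payload):
        self.log.append((action, payload))


hub = Coordinator()
scatter = View("scatterplot", hub)
geo = View("map", hub)
hub.link("select", scatter)
hub.link("select", geo)

scatter.user_action("select", [1, 2])  # also reaches the map view
hub.unlink("select", geo)              # reconfigured at runtime
scatter.user_action("select", [3])     # no longer reaches the map view
```

Keeping the links in the coordinator rather than in the views is one way to let components be developed independently and wired together later by the user.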
Abstract:
The rise in boiling point of grapefruit juice was experimentally measured at soluble solids concentrations in the range of 9.3-60.6 °Brix and pressures between 6.0 × 10³ and 9.0 × 10⁴ Pa. Different approaches to representing the experimental data, including Dühring's rule, the Antoine equation and empirical models proposed in the literature, were tested. In the range of 9.3-29.0 °Brix, the rise in boiling point was nearly independent of pressure, varying only with juice concentration. Considerable deviations from this behavior began to occur at concentrations higher than 29.0 °Brix. The experimental data could be best predicted by fitting an empirical model consisting of a single equation that takes into account the dependence of the rise in boiling point on pressure and concentration. © SAGE Publications 2007.
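To make the measured quantity concrete, the sketch below computes a boiling point rise as the difference between a solution's boiling temperature and that of pure water at the same pressure, using the Antoine equation log10(P) = A - B / (C + T) for water. The coefficients are commonly tabulated values for water (P in mmHg, T in °C, roughly 1-100 °C) and are an assumption here, not values from the paper. The paper's fitted empirical model is not given in the abstract and is not reproduced.

```python
import math

# Assumed Antoine coefficients for water (P in mmHg, T in deg C, ~1-100 deg C).
A, B, C = 8.07131, 1730.63, 233.426
PA_PER_MMHG = 133.322


def water_boiling_point_c(pressure_pa):
    """Boiling temperature of pure water (deg C) at the given pressure (Pa)."""
    p_mmhg = pressure_pa / PA_PER_MMHG
    # Antoine equation solved for T: T = B / (A - log10(P)) - C
    return B / (A - math.log10(p_mmhg)) - C


def boiling_point_rise_c(pressure_pa, solution_boiling_c):
    """Rise in boiling point (deg C) of a solution over pure water."""
    return solution_boiling_c - water_boiling_point_c(pressure_pa)


# The pressure limits quoted in the abstract, for pure water only.
t_low = water_boiling_point_c(6.0e3)
t_high = water_boiling_point_c(9.0e4)
```

A measured juice boiling temperature would be passed as `solution_boiling_c`; no such measurements are given in the abstract, so none are shown here.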
Abstract:
This paper reports the use of chromatographic profiles of volatiles to determine disease markers in plants, in this case leaves of Eucalyptus globulus contaminated by the necrotrophic fungus Teratosphaeria nubilosa. The volatile fraction was isolated by headspace solid phase microextraction (HS-SPME) and analyzed by comprehensive two-dimensional gas chromatography-fast quadrupole mass spectrometry (GC×GC-qMS). To correlate the metabolic profile described by the chromatograms with the presence of the infection, unfolded-partial least squares discriminant analysis (U-PLS-DA) with orthogonal signal correction (OSC) was employed. The proposed method was verified to be independent of factors such as the age of the harvested plants. Manipulation of the resulting mathematical model also yielded graphic representations similar to real chromatograms, which allowed the tentative identification of more than 40 compounds potentially useful as disease biomarkers for this plant/pathogen pair. The proposed methodology can be considered highly reliable, since the diagnosis is based on the whole chromatographic profile rather than on the detection of a single analyte. © 2013 Elsevier B.V.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)