895 results for classification and equivalence classes
Abstract:
The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomics to assess and predict the downstream effects of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomics at its core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.
Abstract:
Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited by the existing challenges of applying these methods to neuroimaging data. In this study, first, a type of data leakage caused by slice-level data splitting, introduced during training and validation of a 2D CNN, is surveyed and a quantitative assessment of the model’s performance overestimation is presented. Second, an interpretable, leakage-free deep learning software package written in Python, with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model taking brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction outcome for the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed, aimed at 1) classifying Alzheimer's disease and healthy subjects, 2) examining the neural correlates of the disease that causes cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to capture a biased deep learning model. Structural magnetic resonance imaging (MRI) data of 200 subjects was used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cerebral cortex atrophy were highlighted by the visualization tools.
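The slice-level leakage described in this abstract arises when 2D slices from the same subject end up in both the training and the validation set. A minimal sketch of a subject-level split that avoids this is given below; it is an illustration only, not the thesis software, and the function name `subject_level_split`, the subject identifiers, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def subject_level_split(slices, test_fraction=0.2, seed=0):
    """Split (subject_id, slice) pairs so that no subject is in both sets.

    `slices` is an iterable of (subject_id, slice_data) tuples. The split is
    drawn over subjects, not over individual slices.
    """
    by_subject = defaultdict(list)
    for subject_id, slice_data in slices:
        by_subject[subject_id].append(slice_data)

    # Shuffle the subject list reproducibly, then hold out whole subjects.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])

    train = [(s, x) for s in subjects if s not in test_subjects
             for x in by_subject[s]]
    test = [(s, x) for s in subjects if s in test_subjects
            for x in by_subject[s]]
    return train, test


# Hypothetical toy data: 5 subjects with 4 slices each.
toy = [(f"sub-{i:02d}", f"slice-{j}") for i in range(5) for j in range(4)]
train_set, test_set = subject_level_split(toy, test_fraction=0.2, seed=0)
train_ids = {s for s, _ in train_set}
test_ids = {s for s, _ in test_set}
```

By contrast, shuffling the 20 toy slices directly and cutting at 80% would almost always place slices of one subject on both sides of the split, which is the situation the abstract identifies as the source of overestimated performance.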
Abstract:
Spectral sensors are a wide class of devices that are extremely useful for detecting essential information about the environment and materials with a high degree of selectivity. Recently, they have achieved levels of integration and implementation costs low enough to suit fast, small, and non-invasive monitoring systems. However, the useful information is hidden in the spectra and is difficult to decode, so mathematical algorithms are needed to infer the values of the variables of interest from the acquired data. Among the different families of predictive modeling, Principal Component Analysis and the techniques derived from it can provide very good performance with small computational and memory requirements. For these reasons, they allow the prediction to be implemented even in embedded and autonomous devices. In this thesis, I present four practical applications of these algorithms to the prediction of different variables: moisture of soil, moisture of concrete, freshness of anchovies/sardines, and concentration of gases. In all of these cases, the workflow is the same. Initially, an acquisition campaign is performed to acquire both spectra and the variables of interest from samples. These data are then used as input for the creation of the prediction models, to solve both classification and regression problems. From these models, an array of calibration coefficients is derived and used to implement the prediction in an embedded system. The presented results show that this workflow was successfully applied to very different scientific fields, yielding autonomous and non-invasive devices able to predict the value of physical parameters of choice from new spectral acquisitions.
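The last step of the workflow above, applying an array of calibration coefficients to a new spectrum, can be sketched as follows. This assumes a linear model of the kind obtained from principal component regression, where the loadings and the regression weights can be folded into one coefficient per wavelength; the abstract does not give the exact form used in the thesis, and every name and number below is hypothetical.

```python
def predict_from_calibration(spectrum, mean_spectrum, coefficients, intercept):
    """Apply a linear calibration: intercept + sum(b_i * (x_i - mean_i)).

    Only one multiply-accumulate per wavelength is needed, which is why this
    kind of model is cheap enough for an embedded device.
    """
    if not (len(spectrum) == len(mean_spectrum) == len(coefficients)):
        raise ValueError("spectrum, mean and coefficients must match in length")
    total = intercept
    for x, m, b in zip(spectrum, mean_spectrum, coefficients):
        total += b * (x - m)
    return total


# Hypothetical calibration for a 3-channel sensor (illustrative values only).
mean_spectrum = [0.5, 0.4, 0.3]
coefficients = [2.0, -1.0, 0.5]
intercept = 10.0

new_spectrum = [0.6, 0.4, 0.1]
prediction = predict_from_calibration(
    new_spectrum, mean_spectrum, coefficients, intercept
)
```

For these toy values the prediction is approximately 10.1, that is, 10.0 + 2.0 × 0.1 − 1.0 × 0.0 + 0.5 × (−0.2).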
Abstract:
Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transformer architectures have achieved impressive results in almost every NLP task, such as Text Classification, Machine Translation, and Language Generation. Over time, transformers have continued to improve thanks to larger corpora and bigger networks, reaching hundreds of billions of parameters. Training and deploying such large models has become prohibitively expensive, so that only large high-tech companies can afford to train them. Therefore, a lot of research has been dedicated to reducing model size. In this thesis, we investigate the effects of Vocabulary Transfer and Knowledge Distillation for compressing large Language Models. The goal is to combine these two methodologies to further compress models without significant loss of performance. In particular, we designed different combination strategies and conducted a series of experiments on different vertical domains (medical, legal, news) and downstream tasks (Text Classification and Named Entity Recognition). Four different methods involving Vocabulary Transfer (VIPI), with and without a Masked Language Modelling (MLM) step and with and without Knowledge Distillation, are compared against a baseline that assigns random vectors to new elements of the vocabulary. Results indicate that VIPI effectively transfers information from the original vocabulary and that MLM is beneficial. Vocabulary transfer and knowledge distillation are also found to be orthogonal to one another, so they may be applied jointly. Applying knowledge distillation first and vocabulary transfer afterwards is recommended. Finally, model performance under vocabulary transfer does not always show a consistent trend as the vocabulary size is reduced. Hence, the vocabulary size should be selected empirically by evaluation on the downstream task, similarly to hyperparameter tuning.
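For readers unfamiliar with Knowledge Distillation, a minimal sketch of one common formulation is shown below: the student is trained to match the teacher's temperature-softened output distribution through a KL-divergence term. This is a generic textbook form and an assumption here; the abstract does not state which loss the thesis uses, and the function names and logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(p_teacher, p_student) if p > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class text classification example.
teacher = [3.0, 1.0, 0.2]
student = [2.0, 1.5, 0.1]
loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's logits and positive otherwise; in practice it is usually combined with the ordinary task loss on the labelled data.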
Abstract:
We report four cases of surgically treated intracranial arachnoid cysts, one with a cyst-peritoneal shunt and three with craniotomy and arachnoid membrane resection. Their classification and etiopathogenesis are discussed, with particular attention to the different methods of treatment, comparing the drastic complications (adversities) with the favorable outcomes in severe clinical cases (plasticity) treated at our institution.
Abstract:
The creation of the Brazilian Program for the Modernization of Horticulture by the Secretariat of Agriculture and Supply of the State of São Paulo at CEAGESP established the standardization of fruit and vegetables in the following aspects: degree of coloration, shape, size grades, defects and packing. The main goal of this research is therefore to correlate the classification given by the Brazilian Program with the one used by the wholesalers at CEAGESP, verifying whether the established norms are being fulfilled for the cultivars Carmem and Debora (SAKATA SEED). The results showed that, for cultivar Carmem, the averages of the observed values do not depart from the norms created by the Program for the small and medium sizes. For cultivar Debora, however, the results showed differences between the adopted classifications. The tomatoes were devalued because they had been sold below the standard indicated by the Brazilian Program.
Abstract:
This paper tries to show that the developments in linguistic sciences are better viewed as stages in a single research program, rather than as different ideological -isms. The first part contains an overview of the structuralists' beliefs about the universality and equivalence of human languages, and their search for syntactic universals. In the second part, we will see that the generative program, in its turn, tries to answer why language is a universal faculty in the human species and addresses questions about its form, its development and its use. The paper also gives a brief glimpse of the tentative answers the program has been giving to each of these issues.
Abstract:
The main purpose of this paper is to question the relationship between theory and practice, or basic and applied research, in the domain of Applied Linguistics and classroom discourse. To achieve this aim, some theoretical texts, some recorded and transcribed classes, and some teachers' and students' opinions about reading and writing were analysed. Results have shown that 1) practice is not the direct application of theoretical data: the relationship between them is not as simple as some applied linguists seem to believe, because of the action of the unconscious in the constitution of subjectivity; 2) the conceptualization of the theoretical issues takes place in a confused and disorderly manner, mixed up with personal experiences and previous knowledge (practice). We intend to question the assumption that practice is secondary to theory.
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
Universidade Estadual de Campinas. Faculdade de Educação Física
Abstract:
In Natural Language Processing (NLP) symbolic systems, several linguistic phenomena, for instance the thematic role relationships between sentence constituents, such as AGENT, PATIENT, and LOCATION, can be accounted for by the employment of a rule-based grammar. Another approach to NLP concerns the use of the connectionist model, which has the benefits of learning, generalization and fault tolerance, among others. A third option merges the two previous approaches into a hybrid one: a symbolic thematic theory is used to supply the connectionist network with initial knowledge. Inspired by neuroscience, a symbolic-connectionist hybrid system called BIO theta PRED (BIOlogically plausible thematic (theta) symbolic-connectionist PREDictor) is proposed, designed to reveal the thematic grid assigned to a sentence. Its connectionist architecture comprises, as input, a featural representation of the words (based on the verb/noun WordNet classification and on the classical semantic microfeature representation) and, as output, the thematic grid assigned to the sentence. BIO theta PRED is designed to "predict" the thematic (semantic) roles assigned to words in a sentence context, employing a biologically inspired training algorithm and architecture, and adopting a psycholinguistic view of thematic theory.
Abstract:
Gender may produce different characteristics in the manifestation of systemic lupus erythematosus (SLE). The present study investigated the influence of gender on clinical and laboratory features, autoantibodies and histopathological classes of lupus nephritis (LN). A total of 81 patients diagnosed with SLE (ACR criteria) and active nephritis, who underwent renal biopsy between 1999 and 2004, and who had frozen serum samples and clinical data available from the time of biopsy, were selected for this study. The presence of anti-P and antichromatin antibodies was measured using ELISA, and anti-dsDNA was measured using indirect immunofluorescence. All of the renal biopsies were reviewed in a blinded manner by the same expert renal pathologist. The charts were extensively reviewed for demographic and renal features obtained at the time of the biopsy. Of the 81 patients, 11 (13.6%) were male. Male and female lupus patients were of similar age and race, and had similar durations of lupus and renal disease. The female patients had more cutaneous (95.7 vs. 45.5%, P = 0.0001) and haematological (52.9 vs. 18.2%, P = 0.04) involvement than the male SLE patients. In addition, the articular data, central nervous system analyses, serositis findings and SLEDAI scores were similar in both groups. Positivity for anti-dsDNA, anti-ribosomal P and antichromatin did not differ between the two groups, and both groups showed similarly low C3 or C4 serum levels. Our analysis indicated that no histopathological class of LN was predominant in either males or females. Interestingly, the serum creatinine levels were higher in the male SLE patients than in the female SLE group (3.16 ± 2.49 vs. 1.99 ± 1.54 mg/dL, P = 0.03), with an increased frequency of high creatinine (81.8 vs. 47.1%, P = 0.04) as well as a higher renal activity index (7.6 ± 3.5 vs. 4.8 ± 3.5, P = 0.02). In addition, whilst the mean levels of proteinuria, cylindruria and serum albumin were markedly altered, they were comparable between men and women with lupus. Moreover, the frequencies of dialysis, renal transplantation and death were similar between the two groups. These data suggest that male patients had more severe LN than women diagnosed with this renal abnormality.
Abstract:
The Edinburgh-Cape Blue Object Survey is a major survey to discover blue stellar objects brighter than B ~ 18 in the southern sky. It is planned to cover an area of sky of 10 000 deg² with |b| > 30° and δ < 0°. The blue stellar objects are selected by automatic techniques from U and B pairs of UK Schmidt Telescope plates scanned with the COSMOS measuring machine. Follow-up photometry and spectroscopy are being obtained with the SAAO telescopes to classify objects brighter than B = 16.5. This paper describes the survey, the techniques used to extract the blue stellar objects, the photometric methods and accuracy, the spectroscopic classification, and the limits and completeness of the survey.
Abstract:
Glioblastoma multiforme (GBM) is the most common and lethal type of brain cancer. To identify the genetic alterations in GBMs, we sequenced 20,661 protein coding genes, determined the presence of amplifications and deletions using high-density oligonucleotide arrays, and performed gene expression analyses using next-generation sequencing technologies in 22 human tumor samples. This comprehensive analysis led to the discovery of a variety of genes that were not known to be altered in GBMs. Most notably, we found recurrent mutations in the active site of isocitrate dehydrogenase 1 (IDH1) in 12% of GBM patients. Mutations in IDH1 occurred in a large fraction of young patients and in most patients with secondary GBMs and were associated with an increase in overall survival. These studies demonstrate the value of unbiased genomic analyses in the characterization of human brain cancer and identify a potentially useful genetic alteration for the classification and targeted therapy of GBMs.