819 results for Machine learning.
Abstract:
In this thesis, we analyzed the problem of creating a shopping assistance system that can be integrated into web and mobile e-commerce applications built with the technologies provided by Marketcloud. Marketcloud is a project that aims to provide tools for building, maintaining, managing, distributing, and promoting such applications, while limiting the development costs and difficulties borne by companies that want to offer e-commerce services. After discussing the main aspects of the Marketcloud project, we analyzed the needs of the companies interested in developing the assistance system, as well as the expectations of the end users (the customers). We also discussed why, in this case, it was necessary and preferable not to use solutions already available on the market. Finally, we designed and implemented a web application that includes this system and can be integrated immediately with the services Marketcloud has already developed, and we tested its results, performance, problems, and possible future developments. At the end of the implementation work, the system and the application give the end user three functions: search by category, free-text search, and product recommendation. Free-text search is handled by a system of successive filters, together with a multi-layer neural network equipped with a suitable machine learning algorithm so that it can learn from users' choices. Product recommendation uses a ranking system. The performance of the neural network was analyzed in detail.
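The abstract names a chain of successive filters for free-text search and a ranking step for recommendations, but it does not describe either one in detail. The following is a minimal sketch under stated assumptions. The `Product` fields, the example filters, and ranking by a single `popularity` score are all hypothetical, and the neural network component is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Product:
    name: str
    category: str
    price: float
    popularity: float  # hypothetical ranking signal (e.g. normalized sales)


# Hypothetical interface: a filter is any predicate on a product
Filter = Callable[[Product], bool]


def successive_filters(products: Iterable[Product], filters: List[Filter]) -> List[Product]:
    """Apply each filter in turn, narrowing the candidate set at every step."""
    candidates = list(products)
    for keep in filters:
        candidates = [p for p in candidates if keep(p)]
    return candidates


def rank(products: List[Product], k: int) -> List[Product]:
    """Return the top-k products by descending popularity (assumed ranking criterion)."""
    return sorted(products, key=lambda p: p.popularity, reverse=True)[:k]


catalog = [
    Product("red shoe", "shoes", 49.0, 0.9),
    Product("blue shoe", "shoes", 89.0, 0.7),
    Product("red shirt", "shirts", 19.0, 0.8),
    Product("green shoe", "shoes", 39.0, 0.4),
]
# e.g. a free-text query such as "cheap shoes" mapped to two hypothetical filters
query_filters = [
    lambda p: p.category == "shoes",
    lambda p: p.price < 60.0,
]
hits = successive_filters(catalog, query_filters)
top = rank(hits, k=1)
print([p.name for p in hits], [p.name for p in top])
```

Each filter only ever sees the survivors of the previous one. Cheap, selective filters can therefore be placed first to shrink the set before any costlier step runs.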
Abstract:
The framework in question is an environment designed to apply Machine Learning techniques (in particular Random Forests) to the stages of the SGM (Semi-Global Matching) stereo matching algorithm, in order to improve its accuracy over the standard version. This thesis has two aims. The first is to modify some of the framework's settings so that it better fits the directionality of the scanlines, by introducing rectangular, orthogonal support windows and by training separate forests for each individual scanline. The second is to extend its functionality with new features: the distance from the nearest directional edge and the distinctiveness, both computed on the Left images of the stereo pair, and the directional edges on the disparity maps. The ultimate goal is to run a variety of tests on the Middlebury 2014 and KITTI datasets and collect data showing whether each change had a positive or negative effect.
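The scanline directionality mentioned above comes from SGM's cost aggregation, which runs a 1-D dynamic program along each path direction r. The standard recurrence (Hirschmüller's formulation) is L_r(p, d) = C(p, d) + min(L_r(p-r, d), L_r(p-r, d±1) + P1, min_k L_r(p-r, k) + P2) - min_k L_r(p-r, k). Below is a minimal sketch of that recurrence for a single left-to-right scanline. It assumes the per-pixel matching costs have already been computed as nested lists, and the penalties `p1` and `p2` are illustrative values. It is not the Random Forest framework described in the thesis.

```python
from typing import List


def aggregate_scanline(cost: List[List[float]], p1: float, p2: float) -> List[List[float]]:
    """SGM path costs L_r along one left-to-right scanline.

    cost[x][d] is the (precomputed) matching cost of pixel x at disparity d.
    p1 penalizes disparity steps of 1; p2 (>= p1) penalizes larger jumps.
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]  # first pixel on the path has no predecessor: L_r = C
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]  # same disparity, or any large jump
            if d > 0:
                candidates.append(prev[d - 1] + p1)  # previous pixel at d-1
            if d < n_disp - 1:
                candidates.append(prev[d + 1] + p1)  # previous pixel at d+1
            # subtracting prev_min keeps values bounded without changing the argmin
            row.append(cost[x][d] + min(candidates) - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(agg: List[List[float]]) -> List[int]:
    """For each pixel, pick the disparity with the lowest aggregated cost."""
    return [min(range(len(r)), key=r.__getitem__) for r in agg]


costs = [[1.0, 4.0, 4.0], [4.0, 1.0, 4.0]]
agg = aggregate_scanline(costs, p1=1.0, p2=3.0)
print(agg, winner_takes_all(agg))
```

Full SGM sums such path costs over several directions (at least 8 are usually recommended) before the winner-takes-all step. The directional features and per-scanline forests described above act on these individual paths.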
Abstract:
Every day, smartphone and tablet users share an enormous amount of information through various applications, often without realizing it. Current operating systems lack mechanisms that adequately protect the user, and this gap drove this research toward the development of a new framework. An in-depth study of the state of the art in solutions with the same goals was required. Both theoretical and practical models were examined, including a careful analysis of their code. The work was carried out in close contact with colleagues at the University of Central Florida, sharing knowledge with them, and led to important results. It produced a custom framework for managing privacy in mobile applications. The framework was developed specifically for Android OS and requires root permissions to operate. It leverages the functionality offered by the Xposed Framework, which lets it modify the operating system without changing the code of Android or of the applications running on it. The framework monitors the access that running applications make to the user's sensitive information and estimates how important that information is to the user. The information it collects about the user's preferences and ratings is used to build a decision model. A machine-learning algorithm then uses this model to improve the system's interaction with the user and to predict the user's own privacy decisions. This thesis achieves the goals above and pays particular attention to limiting how intrusive the privacy management system is in the user's everyday experience with mobile devices.
Abstract:
Robust and accurate identification of intervertebral discs from low-resolution, sparse MRI scans is essential for automated scan planning of MRI spine scans. This paper presents a graphical-model-based solution for detecting both the positions and the orientations of intervertebral discs from low-resolution, sparse MRI scans. Unlike existing graphical-model-based methods, the proposed method does not require a training process on training data, and it can automatically determine the number of vertebrae visible in the image. Experiments on 25 low-resolution, sparse spine MRI data sets verified its performance.
Abstract:
There has been limited analysis of the effects of hepatocellular carcinoma (HCC) on liver metabolism and circulating endogenous metabolites. Here, we report the findings of a plasma metabolomic investigation of HCC patients by ultraperformance liquid chromatography-electrospray ionization-quadrupole time-of-flight mass spectrometry (UPLC-ESI-QTOFMS), random forests machine learning algorithm, and multivariate data analysis. Control subjects included healthy individuals as well as patients with liver cirrhosis or acute myeloid leukemia. We found that HCC was associated with increased plasma levels of glycodeoxycholate, deoxycholate 3-sulfate, and bilirubin. Accurate mass measurement also indicated upregulation of biliverdin and the fetal bile acids 7α-hydroxy-3-oxochol-4-en-24-oic acid and 3-oxochol-4,6-dien-24-oic acid in HCC patients. A quantitative lipid profiling of patient plasma was also conducted by ultraperformance liquid chromatography-electrospray ionization-triple quadrupole mass spectrometry (UPLC-ESI-TQMS). By this method, we found that HCC was also associated with reduced levels of lysophosphocholines and in 4 of 20 patients with increased levels of lysophosphatidic acid [LPA(16:0)], where it correlated with plasma α-fetoprotein levels. Interestingly, when fatty acids were quantitatively profiled by gas chromatography-mass spectrometry (GC-MS), we found that lignoceric acid (24:0) and nervonic acid (24:1) were virtually absent from HCC plasma. Overall, this investigation illustrates the power of the new discovery technologies represented in the UPLC-ESI-QTOFMS platform combined with the targeted, quantitative platforms of UPLC-ESI-TQMS and GC-MS for conducting metabolomic investigations that can engender new insights into cancer pathobiology.
Abstract:
To enhance understanding of the metabolic indicators of type 2 diabetes mellitus (T2DM) pathogenesis and progression, the urinary metabolomes of well-characterized rhesus macaques (normal, or spontaneously and naturally diabetic) were examined. High-resolution ultra-performance liquid chromatography coupled with the accurate mass determination of time-of-flight mass spectrometry was used to analyze spot urine samples from normal (n = 10) and T2DM (n = 11) male monkeys. The machine-learning algorithm random forests classified urine samples as coming from either normal or T2DM monkeys. The metabolites important for developing the classifier were further examined for their biological significance. Random forests models had a misclassification error of less than 5%. Metabolites were identified based on accurate masses (<10 ppm) and confirmed by tandem mass spectrometry of authentic compounds. Urinary compounds significantly increased (p < 0.05) in the T2DM group compared with the normal group included glycine betaine (9-fold), citric acid (2.8-fold), kynurenic acid (1.8-fold), glucose (68-fold), and pipecolic acid (6.5-fold). When compared with the conventional definition of T2DM, the metabolites were also useful in defining the T2DM condition. The urinary elevations in glycine betaine and pipecolic acid (as well as proline) indicated defective reabsorption in the kidney proximal tubules by SLC6A20, a Na(+)-dependent transporter. The mRNA levels of SLC6A20 were significantly reduced in the kidneys of monkeys with T2DM. These observations were validated in the db/db mouse model of T2DM. This study provides convincing evidence of the power of metabolomics for identifying functional changes at many levels in the omics pipeline.
Abstract:
This investigation uses simulation to explore the inherent tradeoffs of controlling high-speed, highly robust walking robots while minimizing energy consumption. It uses a novel controller that optimizes the robustness, energy economy, and speed of a simulated robot on rough terrain. With this controller, the user can adjust their priorities among these three outcome measures and systematically generate a performance curve that assesses the tradeoffs between them.
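The abstract does not state how the controller combines its three objectives. One common way to let a user "adjust priorities" and trace a tradeoff curve is weighted-sum scalarization. The sketch below assumes that approach and replaces the actual robot simulation with three made-up candidate controller outcomes.

```python
from typing import Dict, List, Tuple

# Hypothetical outcomes of three simulated controller settings:
# (speed in m/s, robustness score in [0, 1], energy cost)
CANDIDATES: Dict[str, Tuple[float, float, float]] = {
    "cautious": (0.5, 0.95, 0.20),
    "balanced": (1.0, 0.85, 0.25),
    "sprint": (1.8, 0.50, 0.60),
}


def score(metrics: Tuple[float, float, float], w_speed: float, w_robust: float, w_energy: float) -> float:
    """Weighted-sum objective; energy is a cost, so it is subtracted."""
    speed, robust, energy = metrics
    return w_speed * speed + w_robust * robust - w_energy * energy


def best_for(w_speed: float, w_robust: float, w_energy: float) -> str:
    """Name of the candidate that maximizes the weighted objective."""
    return max(CANDIDATES, key=lambda name: score(CANDIDATES[name], w_speed, w_robust, w_energy))


def tradeoff_curve(steps: int, w_energy: float = 0.5) -> List[Tuple[float, str]]:
    """Shift priority from robustness to speed in `steps` equal increments."""
    curve = []
    for i in range(steps + 1):
        w_speed = i / steps
        curve.append((w_speed, best_for(w_speed, 1.0 - w_speed, w_energy)))
    return curve


for w, choice in tradeoff_curve(4):
    print(f"speed weight {w:.2f}: {choice}")
```

A weighted sum only reaches solutions on the convex part of the tradeoff surface. Whether the original controller uses this scalarization or a different one is not stated.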
Abstract:
HIV (Human Immunodeficiency Virus) has infected millions of people worldwide, and many of them are unaware that they are infected. With such a virus, it becomes vital to understand how it functions at the molecular level. There is currently no vaccine and no way to eradicate the virus from an infected person. Any information about how the virus interacts with its host therefore greatly increases the chances of understanding how HIV works and brings scientists one step closer to being able to combat such a destructive virus. Thousands of HIV genomes have been sequenced and are available in many public online databases. Attributes linked to each sequence include the viral load within the host and how sick the patient currently is. Being able to predict a patient's stage of infection is valuable, as it could aid in choosing treatment options and using medication properly. Our approach, which analyzes region-specific amino acid composition for selected genes, has predicted patient disease state with an accuracy of up to 85.4%. Moreover, we output a set of sequence-based classification rules that may prove useful for predicting the expected clinical outcome of an infected patient.
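The abstract does not specify which genes, which regions, or which classifier were used. The following minimal sketch therefore shows only the feature-extraction idea of region-specific amino acid composition. The toy protein and the region boundaries are hypothetical. The resulting vectors could then be passed to any rule-producing classifier.

```python
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard amino acid in seq (other symbols are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def region_features(protein: str, regions: List[Tuple[int, int]]) -> List[float]:
    """Concatenate 20 composition values per (start, end) region, 0-based, end-exclusive."""
    features: List[float] = []
    for start, end in regions:
        comp = composition(protein[start:end])
        features.extend(comp[aa] for aa in AMINO_ACIDS)
    return features


# Hypothetical toy protein and region boundaries
vec = region_features("MKKLLAA", [(0, 3), (3, 7)])
print(len(vec))
```

Working with per-region compositions, rather than whole-gene compositions, keeps localized shifts in residue usage from being diluted by the rest of the sequence.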
Abstract:
The task considered in this paper is the performance evaluation of region segmentation algorithms in the ground-truth-based paradigm. Given a machine segmentation and a ground-truth segmentation, performance measures are needed to quantify how well the two agree. We propose to treat image segmentation as a data clustering problem and, as a consequence, to use the measures for comparing clusterings that were developed in statistics and machine learning. This yields a variety of performance measures that have not previously been used in image processing. In particular, some of these measures have the highly desirable property of being a metric. Experimental results on both synthetic and real data are reported to validate the measures and compare them with others.
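The abstract does not list which clustering-comparison measures were adopted. One well-known measure from that literature that is a true metric is Meilă's variation of information (VI), defined as VI(A, B) = H(A|B) + H(B|A). Below is a minimal sketch that treats each segmentation as a flattened sequence of region labels. It assumes the two sequences list the same pixels in the same order.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def variation_of_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """VI(A, B) = H(A|B) + H(B|A), in nats, for two labelings of the same pixels."""
    if len(a) != len(b):
        raise ValueError("both segmentations must label the same number of pixels")
    n = len(a)
    count_a = Counter(a)          # region sizes in segmentation A
    count_b = Counter(b)          # region sizes in segmentation B
    joint = Counter(zip(a, b))    # overlap sizes between region pairs
    vi = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n
        # -p(x,y) * [log p(x|y) + log p(y|x)]
        vi -= p_xy * (math.log(n_xy / count_b[y]) + math.log(n_xy / count_a[x]))
    return vi


machine = [0, 0, 1, 1, 1, 2]   # hypothetical flattened label image
truth = [5, 5, 7, 7, 7, 7]
print(variation_of_information(machine, truth))
```

A value of 0 means the two segmentations are identical up to a relabeling of regions. Because VI is symmetric and satisfies the triangle inequality, distances between segmentations can be compared consistently.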
Abstract:
Advances in computational biology have made it possible to monitor thousands of features simultaneously. High-throughput technologies provide a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classifying samples into two or more categories is almost always of interest to scientists. In this paper, we address classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression. Our extension builds on a previous approach, Iteratively ReWeighted Partial Least Squares (IRWPLS; Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that phrasing the problem in a generalized linear model setting and applying bias correction to the likelihood to avoid (quasi-)separation often yields lower classification error rates.
Abstract:
The developmental processes and functions of an organism are controlled by its genes and by the proteins derived from these genes. Identifying key genes and reconstructing gene networks can provide a model that helps us understand the regulatory mechanisms behind the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes and to construct their regulatory networks. I have also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. Based on these data sets, the dissertation has two major parts, which together span six chapters. The first part deals with developing association methods for rare variants using genotype data (chapters 4 and 5). The second part deals with developing and/or evaluating statistical methods that identify genes and TFs involved in biological processes and construct their regulatory networks from gene expression data (chapters 2, 3, and 6). For the first part, I have developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative and qualitative traits. Simulation results showed that, in most settings, the proposed method has improved power over the existing weighted sum (WS) method. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling for population stratification by adjusting the data for ancestry using principal components. This method was applied to the GAW 17 data and was able to find several disease risk genes.
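The abstract says only that the first rare-variant method is "based on kernel machine learning". Kernel-machine association tests of this family, of which SKAT is a well-known example, use a score statistic Q = r^T K r. Here r holds the null-model residuals and K is a kernel over the subjects' genotypes. Below is a minimal sketch that uses a weighted linear kernel and an intercept-only null model with no covariates. Both choices are assumptions. The p-value, which requires a mixture-of-chi-squares null distribution, is omitted.

```python
from typing import List, Sequence


def kernel_score_statistic(
    genotypes: List[Sequence[int]],  # genotypes[i][j]: minor-allele count of variant j in subject i
    y: Sequence[float],              # trait values (e.g. 0/1 for a qualitative trait)
    weights: Sequence[float],        # per-variant kernel weights (hypothetical choice)
) -> float:
    """Q = r^T G W G^T r for a weighted linear kernel, with r = y - mean(y).

    Equivalent to summing w_j * (sum_i g_ij * r_i) ** 2 over variants j.
    """
    n = len(y)
    mean_y = sum(y) / n
    resid = [yi - mean_y for yi in y]  # residuals under an intercept-only null model
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(genotypes[i][j] * resid[i] for i in range(n))  # per-variant score
        q += w * s_j ** 2
    return q


# Hypothetical toy data: 4 subjects, 2 rare variants, binary trait
G = [[1, 0], [0, 0], [0, 1], [0, 0]]
print(kernel_score_statistic(G, [1, 0, 1, 0], [1.0, 1.0]))
```

Squaring each per-variant score before summing means that variants with opposite effect directions do not cancel. This is one reason kernel tests can gain power over burden-style sums such as WS.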
For the second part, I have worked on three problems. The first problem involved evaluating eight gene association methods. A very comprehensive comparison of these methods, with further analysis, clearly demonstrates both the distinct and the shared aspects of their performance. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and to reconstruct their hierarchical regulatory networks. This algorithm produced highly significant results, and this is the first report of such hierarchical networks for these pathways. The third problem involved developing another algorithm, called the top-down graphical Gaussian model, which identifies the network governed by a specific TF. The network produced by this algorithm was shown to be highly accurate.
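A graphical Gaussian model links two genes when their partial correlation, given all other genes, is nonzero. Partial correlations come from the inverse covariance (precision) matrix Ω as ρ_ij = -ω_ij / sqrt(ω_ii ω_jj). The abstract does not describe the bottom-up and top-down algorithms themselves, so the following is only a sketch of this shared building block. It assumes a small, well-conditioned covariance matrix. With many genes and few arrays the sample covariance is singular, and a regularized estimate would be needed instead.

```python
from typing import List

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov: Matrix) -> Matrix:
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), with omega = inverse(cov)."""
    omega = invert(cov)
    n = len(cov)
    return [
        [1.0 if i == j else -omega[i][j] / (omega[i][i] * omega[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain X -> Y -> Z: X and Z are correlated, but independent given Y
cov = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 2.0, 3.0]]
pc = partial_correlations(cov)
print([[round(v, 3) for v in row] for row in pc])
```

In this toy chain, X and Z have a marginal correlation of about 0.58 but a partial correlation of 0. A graphical Gaussian model would therefore draw edges X-Y and Y-Z but no edge X-Z.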
Abstract:
Important food crops like rice are constantly exposed to various stresses that can have devastating effects on their survival and productivity. Being sessile, these highly evolved organisms have developed elaborate molecular machinery to sense a mixture of stress signals and elicit a precise response that minimizes damage. However, recent discoveries have revealed that the interplay of these stress regulatory and signaling molecules is highly complex and remains largely unknown. In this work, we conducted a large-scale analysis of differential gene expression using advanced computational methods to dissect the regulation of stress response, which is at the heart of all the molecular changes leading to the observed phenotypic susceptibility. In terms of productivity loss, drought is one of the most important stress conditions. We performed genomic and proteomic analyses of epigenetic and miRNA mechanisms in the regulation of drought-responsive genes in rice and found subsets of genes with striking properties. Overexpressed genesets contained a higher number of epigenetic marks, miRNA targets, and transcription factors that regulate drought tolerance. Underexpressed genesets, on the other hand, were poor in these features but rich in metabolic genes with multiple co-expression partners, and these genes contribute substantially to drought resistance. Identifying and characterizing the patterns exhibited by differentially expressed genes holds the key to uncovering the synergistic and antagonistic components of the crosstalk between stress response mechanisms. We performed a meta-analysis of drought and bacterial stresses in rice and Arabidopsis and identified hundreds of shared genes. We found a high level of conservation of gene expression between these stresses. Weighted co-expression network analysis detected two tight clusters of genes, made up of master transcription factors and signaling genes, with strikingly opposite expression status.
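Weighted co-expression network analysis (as popularized by the WGCNA method) starts from a soft-thresholded adjacency. Each gene pair's correlation is raised to a power β, so strong correlations dominate without a hard cutoff. Below is a minimal sketch of that first step, using made-up expression profiles and an illustrative β = 6. The module (cluster) detection that the abstract refers to is not shown.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr: List[Sequence[float]], beta: float = 6) -> List[List[float]]:
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** beta (beta is illustrative)."""
    n = len(expr)
    return [
        [1.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
        for i in range(n)
    ]


def connectivity(adj: List[List[float]]) -> List[float]:
    """Whole-network connectivity k_i = sum over j != i of a_ij."""
    return [sum(row) - row[i] for i, row in enumerate(adj)]


# Hypothetical expression profiles: 4 genes x 4 samples
genes = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],     # perfectly co-expressed with gene 0
    [4, 3, 2, 1],     # perfectly anti-correlated with gene 0
    [1, -1, -1, 1],   # uncorrelated with the others
]
adj = soft_threshold_adjacency(genes)
print(connectivity(adj))
```

Because this adjacency is unsigned, the anti-correlated gene joins the same tightly connected group. Distinguishing opposite expression status, as in the two clusters described above, requires looking at the sign of the correlations as well.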
To comprehensively identify the stress-responsive genes shared by multiple abiotic and biotic stresses in rice, we performed separate meta-analyses of microarray studies from seven abiotic and six biotic stresses and found more than thirteen hundred shared stress-responsive genes. Various machine learning techniques using these genes classified the stresses with high accuracy, both into two major classes (abiotic and biotic stresses) and into multiple classes of individual stresses. These techniques also identified the top genes showing distinct expression patterns. Functional enrichment and co-expression network analysis revealed the different roles of plant hormones and transcription factors in conserved and non-conserved genesets in the regulation of stress response.
Abstract:
Activation of the peroxisome proliferator-activated receptor alpha (PPARalpha) is associated with increased fatty acid catabolism and is commonly targeted for the treatment of hyperlipidemia. To identify latent, endogenous biomarkers of PPARalpha activation, and hence of increased fatty acid beta-oxidation, healthy human volunteers were given fenofibrate orally for 2 weeks and their urine was profiled by UPLC-QTOFMS. Biomarkers identified by the machine learning algorithm random forests included significant depletion by day 14 of both pantothenic acid (>5-fold) and acetylcarnitine (>20-fold). These observations are consistent with known targets of PPARalpha, including pantothenate kinase and genes encoding proteins involved in the transport and synthesis of acylcarnitines. Serum cholesterol (-12.7%), triglycerides (-25.6%), and uric acid (-34.7%) were also found to be reduced by day 14, together with urinary propylcarnitine (>10-fold), isobutyrylcarnitine (>2.5-fold), (S)-(+)-2-methylbutyrylcarnitine (5-fold), and isovalerylcarnitine (>5-fold). The specificity of these biomarkers as indicators of PPARalpha activation was demonstrated using the Ppara-null mouse. Urinary pantothenic acid and acylcarnitines may prove useful indicators of PPARalpha-induced fatty acid beta-oxidation in humans. This study illustrates the utility of a pharmacometabolomic approach for understanding drug effects on lipid metabolism in both human populations and inbred mouse models.