861 resultados para Support vectors machine
Resumo:
The multiple-instance learning (MIL) model has been successful in areas such as drug discovery and content-based image-retrieval. Recently, this model was generalized and a corresponding kernel was introduced to learn generalized MIL concepts with a support vector machine. While this kernel enjoyed empirical success, it has limitations in its representation. We extend this kernel by enriching its representation and empirically evaluate our new kernel on data from content-based image retrieval, biological sequence analysis, and drug discovery. We found that our new kernel generalized noticeably better than the old one in content-based image retrieval and biological sequence analysis and was slightly better or even with the old kernel in the other applications, showing that an SVM using this kernel does not overfit despite its richer representation.
Resumo:
The reproductive performance of cattle may be influenced by several factors, but mineral imbalances are crucial in terms of direct effects on reproduction. Several studies have shown that elements such as calcium, copper, iron, magnesium, selenium, and zinc are essential for reproduction and can prevent oxidative stress. However, toxic elements such as lead, nickel, and arsenic can have adverse effects on reproduction. In this paper, we applied a simple and fast method of multi-element analysis to bovine semen samples from Zebu and European classes used in reproduction programs and artificial insemination. Samples were analyzed by inductively coupled plasma spectrometry (ICP-MS) using aqueous medium calibration and the samples were diluted in a proportion of 1:50 in a solution containing 0.01% (vol/vol) Triton X-100 and 0.5% (vol/vol) nitric acid. Rhodium, iridium, and yttrium were used as the internal standards for ICP-MS analysis. To develop a reliable method of tracing the class of bovine semen, we used data mining techniques that make it possible to classify unknown samples after checking the differentiation of known-class samples. Based on the determination of 15 elements in 41 samples of bovine semen, 3 machine-learning tools for classification were applied to determine cattle class. Our results demonstrate the potential of support vector machine (SVM), multilayer perceptron (MLP), and random forest (RF) chemometric tools to identify cattle class. Moreover, the selection tools made it possible to reduce the number of chemical elements needed from 15 to just 8.
Resumo:
Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.
Resumo:
Recent experimental evidence has suggested a neuromodulatory deficit in Alzheimer's disease (AD). In this paper, we present a new electroencephalogram (EEG) based metric to quantitatively characterize neuromodulatory activity. More specifically, the short-term EEG amplitude modulation rate-of-change (i.e., modulation frequency) is computed for five EEG subband signals. To test the performance of the proposed metric, a classification task was performed on a database of 32 participants partitioned into three groups of approximately equal size: healthy controls, patients diagnosed with mild AD, and those with moderate-to-severe AD. To gauge the benefits of the proposed metric, performance results were compared with those obtained using EEG spectral peak parameters which were recently shown to outperform other conventional EEG measures. Using a simple feature selection algorithm based on area-under-the-curve maximization and a support vector machine classifier, the proposed parameters resulted in accuracy gains, relative to spectral peak parameters, of 21.3% when discriminating between the three groups and by 50% when mild and moderate-to-severe groups were merged into one. The preliminary findings reported herein provide promising insights that automated tools may be developed to assist physicians in very early diagnosis of AD as well as provide researchers with a tool to automatically characterize cross-frequency interactions and their changes with disease.
Resumo:
The continuous increase of genome sequencing projects produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publically available. However the sole sequencing process of a genome is able to determine just raw nucleotide sequences. This is only the first step of the genome annotation process that will deal with the issue of assigning biological information to each sequence. The annotation process is done at each different level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished only by in vitro analysis procedures resulting extremely expensive and time consuming when applied at a this large scale level. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow a fast, reliable, and automated annotation of genomes and proteins starting from aminoacidic sequences. The first part of the work was focused on the implementation of a new machine learning based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo, and was developed in 2006. The main peculiarity of the method is to be independent from biases present in the training dataset, which causes the over‐prediction of the most represented examples in all the other available predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm with the creation of the so called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells and three, kingdom‐specific, predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo reported to outperform all the currently available state‐of‐the‐art methods for this prediction task. BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it in a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each aminoacidic sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity‐based annotations. In the second part of the work a new, machine learning based, method was implemented for the prediction of GPI‐anchored proteins. Basically the method is able to efficiently predict from the raw aminoacidic sequence both the presence of the GPI‐anchor (by means of an SVM), and the position in the sequence of the post‐translational modification event, the so called ω‐site (by means of an Hidden Markov Model (HMM)). The method is called GPIPE and reported to greatly enhance the prediction performances of GPI‐anchored proteins over all the previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI‐anchored proteins by maintaining a rate of false positive prediction as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI‐anchored proteins were predicted, 561 of which are found in H. sapiens. In average 1% of a proteome is predicted as GPI‐anchored. A statistical analysis was performed onto the composition of the regions surrounding the ω‐site that allowed the definition of specific aminoacidic abundances in the different considered regions. Furthermore the hypothesis that compositional biases are present among the four major eukaryotic kingdoms, proposed in literature, was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe
Resumo:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has lead to enormous progress in the life science. Among the most important innovations is the microarray tecnology. It allows to quantify the expression for thousands of genes simultaneously by measurin the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousand, and a sample noise in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex disease such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. these samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Eventhough today methods to analyse the data are welle developed and close to reach a standard organization (through the effort of preposed International project like Microarray Gene Expression Data -MGED- Society [1]) it is not unfrequant to stumble in a clinician's question that do not have a compelling statistical method that could permit to answer it.The contribution of this dissertation in deciphering disease regards the development of new approaches aiming at handle open problems posed by clinicians in handle specific experimental designs. In Chapter 1 starting from a biological necessary introduction, we revise the microarray tecnologies and all the important steps that involve an experiment from the production of the array, to the quality controls ending with preprocessing steps that will be used into the data analysis in the rest of the dissertation. While in Chapter 2 a critical review of standard analysis methods are provided stressing most of problems that In Chapter 3 is introduced a method to adress the issue of unbalanced design of miacroarray experiments. In microarray experiments, experimental design is a crucial starting-point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples it should be collected between the two classes. However in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC) A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets via beta and exponential distribution. The results of all three algorithms over low- noise data sets seems acceptable However, on a real unbalanced two-channel data set reagardin Chronic Lymphocitic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes Although standard algorithms perform well over low-noise simulated data sets, multi-SAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to adress similarities evaluation in a three-class prblem by means of Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences could have a crucial role. In some cases similarities can give useful and, sometimes even more, important information. The goal, given three classes, could be to establish, with a certain level of confidence, if the third one is similar to the first or the second one. In this work we show that Relevance Vector Machine (RVM) [2] could be a possible solutions to the limitation of standard supervised classification. In fact, RVM offers many advantages compared, for example, with his well-known precursor (Support Vector Machine - SVM [3]). Among these advantages, the estimate of posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on Tumor-Grade-three-class problem, so we have 67 samples of grade I (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, then evaluate the third class G2 as test-set to obtain the probability for samples of G2 to be member of class G1 or class G3. The analysis showed that breast cancer samples of grade II have a molecular profile more similar to breast cancer samples of grade I. Looking at the literature this result have been guessed, but no measure of significance was gived before.
Resumo:
Motivation An actual issue of great interest, both under a theoretical and an applicative perspective, is the analysis of biological sequences for disclosing the information that they encode. The development of new technologies for genome sequencing in the last years, opened new fundamental problems since huge amounts of biological data still deserve an interpretation. Indeed, the sequencing is only the first step of the genome annotation process that consists in the assignment of biological information to each sequence. Hence given the large amount of available data, in silico methods became useful and necessary in order to extract relevant information from sequences. The availability of data from Genome Projects gave rise to new strategies for tackling the basic problems of computational biology such as the determination of the tridimensional structures of proteins, their biological function and their reciprocal interactions. Results The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from the nucleotide and aminoacidic sequences, by taking advantage of the information provided by the comparison of the genome sequences from different species. In the first part of the work a comprehensive large scale genome comparison of 599 organisms is described. 2,6 million of sequences coming from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover the adopted similarity threshold produced clusters that are homogeneous on the structural point of view and that can be used for structural annotation of uncharacterized sequences. The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis the codon composition of a non redundant database comprising 116 prokaryotic genomes has been analyzed and it has been showed that a cross genomic approach can allow the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy in discriminating thermophilic coding sequences equal to 95%. This result outperform those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict if a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.
Resumo:
Facial expression recognition is one of the most challenging research areas in the image recognition ¯eld and has been actively studied since the 70's. For instance, smile recognition has been studied due to the fact that it is considered an important facial expression in human communication, it is therefore likely useful for human–machine interaction. Moreover, if a smile can be detected and also its intensity estimated, it will raise the possibility of new applications in the future
Resumo:
The term Ambient Intelligence (AmI) refers to a vision on the future of the information society where smart, electronic environment are sensitive and responsive to the presence of people and their activities (Context awareness). In an ambient intelligence world, devices work in concert to support people in carrying out their everyday life activities, tasks and rituals in an easy, natural way using information and intelligence that is hidden in the network connecting these devices. This promotes the creation of pervasive environments improving the quality of life of the occupants and enhancing the human experience. AmI stems from the convergence of three key technologies: ubiquitous computing, ubiquitous communication and natural interfaces. Ambient intelligent systems are heterogeneous and require an excellent cooperation between several hardware/software technologies and disciplines, including signal processing, networking and protocols, embedded systems, information management, and distributed algorithms. Since a large amount of fixed and mobile sensors embedded is deployed into the environment, the Wireless Sensor Networks is one of the most relevant enabling technologies for AmI. WSN are complex systems made up of a number of sensor nodes which can be deployed in a target area to sense physical phenomena and communicate with other nodes and base stations. These simple devices typically embed a low power computational unit (microcontrollers, FPGAs etc.), a wireless communication unit, one or more sensors and a some form of energy supply (either batteries or energy scavenger modules). WNS promises of revolutionizing the interactions between the real physical worlds and human beings. Low-cost, low-computational power, low energy consumption and small size are characteristics that must be taken into consideration when designing and dealing with WSNs. To fully exploit the potential of distributed sensing approaches, a set of challengesmust be addressed. Sensor nodes are inherently resource-constrained systems with very low power consumption and small size requirements which enables than to reduce the interference on the physical phenomena sensed and to allow easy and low-cost deployment. They have limited processing speed,storage capacity and communication bandwidth that must be efficiently used to increase the degree of local ”understanding” of the observed phenomena. A particular case of sensor nodes are video sensors. This topic holds strong interest for a wide range of contexts such as military, security, robotics and most recently consumer applications. Vision sensors are extremely effective for medium to long-range sensing because vision provides rich information to human operators. However, image sensors generate a huge amount of data, whichmust be heavily processed before it is transmitted due to the scarce bandwidth capability of radio interfaces. In particular, in video-surveillance, it has been shown that source-side compression is mandatory due to limited bandwidth and delay constraints. Moreover, there is an ample opportunity for performing higher-level processing functions, such as object recognition that has the potential to drastically reduce the required bandwidth (e.g. by transmitting compressed images only when something ‘interesting‘ is detected). The energy cost of image processing must however be carefully minimized. Imaging could play and plays an important role in sensing devices for ambient intelligence. Computer vision can for instance be used for recognising persons and objects and recognising behaviour such as illness and rioting. Having a wireless camera as a camera mote opens the way for distributed scene analysis. More eyes see more than one and a camera system that can observe a scene from multiple directions would be able to overcome occlusion problems and could describe objects in their true 3D appearance. In real-time, these approaches are a recently opened field of research. In this thesis we pay attention to the realities of hardware/software technologies and the design needed to realize systems for distributed monitoring, attempting to propose solutions on open issues and filling the gap between AmI scenarios and hardware reality. The physical implementation of an individual wireless node is constrained by three important metrics which are outlined below. Despite that the design of the sensor network and its sensor nodes is strictly application dependent, a number of constraints should almost always be considered. Among them: • Small form factor to reduce nodes intrusiveness. • Low power consumption to reduce battery size and to extend nodes lifetime. • Low cost for a widespread diffusion. These limitations typically result in the adoption of low power, low cost devices such as low powermicrocontrollers with few kilobytes of RAMand tenth of kilobytes of program memory with whomonly simple data processing algorithms can be implemented. However the overall computational power of the WNS can be very large since the network presents a high degree of parallelism that can be exploited through the adoption of ad-hoc techniques. Furthermore through the fusion of information from the dense mesh of sensors even complex phenomena can be monitored. In this dissertation we present our results in building several AmI applications suitable for a WSN implementation. The work can be divided into two main areas:Low Power Video Sensor Node and Video Processing Alghoritm and Multimodal Surveillance . Low Power Video Sensor Nodes and Video Processing Alghoritms In comparison to scalar sensors, such as temperature, pressure, humidity, velocity, and acceleration sensors, vision sensors generate much higher bandwidth data due to the two-dimensional nature of their pixel array. We have tackled all the constraints listed above and have proposed solutions to overcome the current WSNlimits for Video sensor node. We have designed and developed wireless video sensor nodes focusing on the small size and the flexibility of reuse in different applications. The video nodes target a different design point: the portability (on-board power supply, wireless communication), a scanty power budget (500mW),while still providing a prominent level of intelligence, namely sophisticated classification algorithmand high level of reconfigurability. We developed two different video sensor node: The device architecture of the first one is based on a low-cost low-power FPGA+microcontroller system-on-chip. The second one is based on ARM9 processor. Both systems designed within the above mentioned power envelope could operate in a continuous fashion with Li-Polymer battery pack and solar panel. Novel low power low cost video sensor nodes which, in contrast to sensors that just watch the world, are capable of comprehending the perceived information in order to interpret it locally, are presented. Featuring such intelligence, these nodes would be able to cope with such tasks as recognition of unattended bags in airports, persons carrying potentially dangerous objects, etc.,which normally require a human operator. Vision algorithms for object detection, acquisition like human detection with Support Vector Machine (SVM) classification and abandoned/removed object detection are implemented, described and illustrated on real world data. Multimodal surveillance: In several setup the use of wired video cameras may not be possible. For this reason building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. Energy efficiency for wireless smart camera networks is one of the major efforts in distributed monitoring and surveillance community. For this reason, building an energy efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. The Pyroelectric Infra-Red (PIR) sensors have been used to extend the lifetime of a solar-powered video sensor node by providing an energy level dependent trigger to the video camera and the wireless module. Such approach has shown to be able to extend node lifetime and possibly result in continuous operation of the node.Being low-cost, passive (thus low-power) and presenting a limited form factor, PIR sensors are well suited for WSN applications. Moreover techniques to have aggressive power management policies are essential for achieving long-termoperating on standalone distributed cameras needed to improve the power consumption. We have used an adaptive controller like Model Predictive Control (MPC) to help the system to improve the performances outperforming naive power management policies.
Resumo:
In der Erdöl– und Gasindustrie sind bildgebende Verfahren und Simulationen auf der Porenskala im Begriff Routineanwendungen zu werden. Ihr weiteres Potential lässt sich im Umweltbereich anwenden, wie z.B. für den Transport und Verbleib von Schadstoffen im Untergrund, die Speicherung von Kohlendioxid und dem natürlichen Abbau von Schadstoffen in Böden. Mit der Röntgen-Computertomografie (XCT) steht ein zerstörungsfreies 3D bildgebendes Verfahren zur Verfügung, das auch häufig für die Untersuchung der internen Struktur geologischer Proben herangezogen wird. Das erste Ziel dieser Dissertation war die Implementierung einer Bildverarbeitungstechnik, die die Strahlenaufhärtung der Röntgen-Computertomografie beseitigt und den Segmentierungsprozess dessen Daten vereinfacht. Das zweite Ziel dieser Arbeit untersuchte die kombinierten Effekte von Porenraumcharakteristika, Porentortuosität, sowie die Strömungssimulation und Transportmodellierung in Porenräumen mit der Gitter-Boltzmann-Methode. In einer zylindrischen geologischen Probe war die Position jeder Phase auf Grundlage der Beobachtung durch das Vorhandensein der Strahlenaufhärtung in den rekonstruierten Bildern, das eine radiale Funktion vom Probenrand zum Zentrum darstellt, extrahierbar und die unterschiedlichen Phasen ließen sich automatisch segmentieren. Weiterhin wurden Strahlungsaufhärtungeffekte von beliebig geformten Objekten durch einen Oberflächenanpassungsalgorithmus korrigiert. Die Methode der „least square support vector machine” (LSSVM) ist durch einen modularen Aufbau charakterisiert und ist sehr gut für die Erkennung und Klassifizierung von Mustern geeignet. Aus diesem Grund wurde die Methode der LSSVM als pixelbasierte Klassifikationsmethode implementiert. Dieser Algorithmus ist in der Lage komplexe geologische Proben korrekt zu klassifizieren, benötigt für den Fall aber längere Rechenzeiten, so dass mehrdimensionale Trainingsdatensätze verwendet werden müssen. Die Dynamik von den unmischbaren Phasen Luft und Wasser wird durch eine Kombination von Porenmorphologie und Gitter Boltzmann Methode für Drainage und Imbibition Prozessen in 3D Datensätzen von Böden, die durch synchrotron-basierte XCT gewonnen wurden, untersucht. Obwohl die Porenmorphologie eine einfache Methode ist Kugeln in den verfügbaren Porenraum einzupassen, kann sie dennoch die komplexe kapillare Hysterese als eine Funktion der Wassersättigung erklären. Eine Hysterese ist für den Kapillardruck und die hydraulische Leitfähigkeit beobachtet worden, welche durch die hauptsächlich verbundenen Porennetzwerke und der verfügbaren Porenraumgrößenverteilung verursacht sind. Die hydraulische Konduktivität ist eine Funktion des Wassersättigungslevels und wird mit einer makroskopischen Berechnung empirischer Modelle verglichen. Die Daten stimmen vor allem für hohe Wassersättigungen gut überein. Um die Gegenwart von Krankheitserregern im Grundwasser und Abwässern vorhersagen zu können, wurde in einem Bodenaggregat der Einfluss von Korngröße, Porengeometrie und Fluidflussgeschwindigkeit z.B. mit dem Mikroorganismus Escherichia coli studiert. Die asymmetrischen und langschweifigen Durchbruchskurven, besonders bei höheren Wassersättigungen, wurden durch dispersiven Transport aufgrund des verbundenen Porennetzwerks und durch die Heterogenität des Strömungsfeldes verursacht. Es wurde beobachtet, dass die biokolloidale Verweilzeit eine Funktion des Druckgradienten als auch der Kolloidgröße ist. Unsere Modellierungsergebnisse stimmen sehr gut mit den bereits veröffentlichten Daten überein.
Resumo:
It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with approximately 90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and the splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed their expressions while 630 remain unchanged in any of the three life stages analyzed. Furthermore, we identified 298 alternatively splicing events, a small subset of which could lead to dual localization of the corresponding proteins.
Resumo:
In clinical diagnostics, it is of outmost importance to correctly identify the source of a metastatic tumor, especially if no apparent primary tumor is present. Tissue-based proteomics might allow correct tumor classification. As a result, we performed MALDI imaging to generate proteomic signatures for different tumors. These signatures were used to classify common cancer types. At first, a cohort comprised of tissue samples from six adenocarcinoma entities located at different organ sites (esophagus, breast, colon, liver, stomach, thyroid gland, n = 171) was classified using two algorithms for a training and test set. For the test set, Support Vector Machine and Random Forest yielded overall accuracies of 82.74 and 81.18%, respectively. Then, colon cancer liver metastasis samples (n = 19) were introduced into the classification. The liver metastasis samples could be discriminated with high accuracy from primary tumors of colon cancer and hepatocellular carcinoma. Additionally, colon cancer liver metastasis samples could be successfully classified by using colon cancer primary tumor samples for the training of the classifier. These findings demonstrate that MALDI imaging-derived proteomic classifiers can discriminate between different tumor types at different organ sites and in the same site.
Resumo:
Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the Bag of Features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5,000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10,000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.
Resumo:
In this paper, we propose novel methodologies for the automatic segmentation and recognition of multi-food images. The proposed methods implement the first modules of a carbohydrate counting and insulin advisory system for type 1 diabetic patients. Initially the plate is segmented using pyramidal mean-shift filtering and a region growing algorithm. Then each of the resulted segments is described by both color and texture features and classified by a support vector machine into one of six different major food classes. Finally, a modified version of the Huang and Dom evaluation index was proposed, addressing the particular needs of the food segmentation problem. The experimental results prove the effectiveness of the proposed method achieving a segmentation accuracy of 88.5% and recognition rate equal to 87%
Resumo:
Smart homes for the aging population have recently started attracting the attention of the research community. The "health state" of smart homes is comprised of many different levels; starting with the physical health of citizens, it also includes longer-term health norms and outcomes, as well as the arena of positive behavior changes. One of the problems of interest is to monitor the activities of daily living (ADL) of the elderly, aiming at their protection and well-being. For this purpose, we installed passive infrared (PIR) sensors to detect motion in a specific area inside a smart apartment and used them to collect a set of ADL. In a novel approach, we describe a technology that allows the ground truth collected in one smart home to train activity recognition systems for other smart homes. We asked the users to label all instances of all ADL only once and subsequently applied data mining techniques to cluster in-home sensor firings. Each cluster would therefore represent the instances of the same activity. Once the clusters were associated to their corresponding activities, our system was able to recognize future activities. To improve the activity recognition accuracy, our system preprocessed raw sensor data by identifying overlapping activities. To evaluate the recognition performance from a 200-day dataset, we implemented three different active learning classification algorithms and compared their performance: naive Bayesian (NB), support vector machine (SVM) and random forest (RF). Based on our results, the RF classifier recognized activities with an average specificity of 96.53%, a sensitivity of 68.49%, a precision of 74.41% and an F-measure of 71.33%, outperforming both the NB and SVM classifiers. Further clustering markedly improved the results of the RF classifier. An activity recognition system based on PIR sensors in conjunction with a clustering classification approach was able to detect ADL from datasets collected from different homes. Thus, our PIR-based smart home technology could improve care and provide valuable information to better understand the functioning of our societies, as well as to inform both individual and collective action in a smart city scenario.