Digital Library

804 results for Pixel-based Classification

Comparative performance evaluation of hepatitis C virus genotyping based on the 5' untranslated region versus partial sequencing of the NS5B region of Brazilian patients with chronic hepatitis C

Relevance: 30.00%

Abstract:

Background Genotyping of hepatitis C virus (HCV) has become an essential tool for prognosis and prediction of treatment duration. The aim of this study was to compare two HCV genotyping methods: reverse hybridization line probe assay (LiPA v.1) and partial sequencing of the NS5B region. Methods Plasma of 171 patients with chronic hepatitis C were screened using both a commercial method (LiPA HCV Versant, Siemens, Tarrytown, NY, USA) and different primers targeting the NS5B region for PCR amplification and sequencing analysis. Results Comparison of the HCV genotyping methods showed no difference in the classification at the genotype level. However, a total of 82/171 samples (47.9%) including misclassification, non-subtypable, discrepant and inconclusive results were not classified by LiPA at the subtype level but could be discriminated by NS5B sequencing. Of these samples, 34 samples of genotype 1a and 6 samples of genotype 1b were classified at the subtype level using sequencing of NS5B. Conclusions Sequence analysis of NS5B for genotyping HCV provides precise genotype and subtype identification and an accurate epidemiological representation of circulating viral strains.

Classification of polymer insulator hydrophobicity based on digital image processing

Relevance: 30.00%

Abstract:

Although hydrophobicity is usually difficult to determine in the field, it has been pointed out as a good option for monitoring the aging of polymeric outdoor insulators. For this purpose, digital image processing of photos taken of wet insulators is currently the main technique. However, important challenges remain: images taken under uncontrolled illumination can interfere with the analysis, and there are no standard surfaces with different levels of hydrophobicity. In this paper, the photo image samples were digitally filtered to reduce the influence of illumination, and hydrophobic surface samples were prepared by wetting silicon surfaces with a water-alcohol solution. Furthermore, no previous studies were found that try to quantify and relate these properties in a mathematical function that electrical companies could use in the field. Based on these considerations, high-quality images of many hydrophobic surfaces were obtained, and three image processing methodologies, each associated with several digital filters, were compared: the fractal dimension and two Haralick texture descriptors, entropy and homogeneity. The Haralick entropy descriptor, filtered with the White Top-Hat filter, gave the best result for classifying hydrophobicity.
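As a rough illustration of the two Haralick descriptors named in this abstract, the sketch below computes entropy and homogeneity from a gray-level co-occurrence matrix (GLCM) built from horizontally adjacent pixels. The function names, the single horizontal offset, the base-2 logarithm and the tiny test images are assumptions made for illustration; the paper's filters (such as the White Top-Hat) and its actual parameters are not reproduced here.

```python
import math
from collections import Counter

def glcm_horizontal(image):
    """Normalized co-occurrence probabilities of (left, right) pixel pairs."""
    counts = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def haralick_entropy(p):
    """Entropy: -sum p * log2(p) over the non-zero GLCM entries."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def haralick_homogeneity(p):
    """Homogeneity: sum of p(i, j) / (1 + (i - j)^2)."""
    return sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())

# Hypothetical 4x4 gray-level images (levels 0..3), for illustration only.
flat = [[1, 1, 1, 1]] * 4
textured = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

for name, img in (("flat", flat), ("textured", textured)):
    p = glcm_horizontal(img)
    print(name, haralick_entropy(p), haralick_homogeneity(p))
```

A uniform surface gives low entropy and high homogeneity, while a strongly alternating pattern does the opposite, which is the kind of contrast a texture-based classifier can exploit.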

Long-term effects of oil pollution in mangrove forests (Baixada Santista, Southeast Brazil) detected using a GIS-based multitemporal analysis of aerial photographs

Relevance: 30.00%

Abstract:

Oil spills are potential threats to the integrity of highly productive coastal wetlands, such as mangrove forests. In October 1983, a mangrove area of nearly 300 ha located on the southeastern coast of Brazil was impacted by a 3.5 million liter crude oil spill released by a broken pipeline. In order to assess the long-term effects of oil pollution on mangrove vegetation, we carried out a GIS-based multitemporal analysis of aerial photographs of the years 1962, 1994, 2000 and 2003. Photointerpretation, visual classification, class quantification, ground-truth and vegetation structure data were combined to evaluate the oil impact. Before the spill, the mangroves exhibited a homogeneous canopy and well-developed stands. More than ten years after the spill, the mangrove vegetation exhibited three distinct zones reflecting the long-term effects of the oil pollution. The most impacted zone (10.5 ha) presented dead trees, exposed substrate and recovering stands with reduced structural development. We suggest that the distinct impact and recovery zones reflect the spatial variability of oil removal rates in the mangrove forest. This study identifies the multitemporal analysis of aerial photographs as a useful tool for assessing a system's capacity for recovery and monitoring the long-term residual effects of pollutants on vegetation dynamics, thus giving support to mangrove forest management and conservation.

Classification of the severity of diabetic neuropathy: a new approach taking uncertainties into account using fuzzy logic

Relevance: 30.00%

Abstract:

OBJECTIVE: This study proposes a new approach that considers uncertainty in predicting and quantifying the presence and severity of diabetic peripheral neuropathy. METHODS: A rule-based fuzzy expert system was designed by four experts in diabetic neuropathy. The model variables were used to classify neuropathy in diabetic patients, defining it as mild, moderate, or severe. System performance was evaluated by means of the Kappa agreement measure, comparing the results of the model with those generated by the experts in an assessment of 50 patients. Accuracy was evaluated by an ROC curve analysis obtained based on 50 other cases; the results of those clinical assessments were considered to be the gold standard. RESULTS: According to the Kappa analysis, the model was in moderate agreement with expert opinions. The ROC analysis (evaluation of accuracy) determined an area under the curve equal to 0.91, demonstrating very good consistency in classifying patients with diabetic neuropathy. CONCLUSION: The model efficiently classified diabetic patients with different degrees of neuropathy severity. In addition, the model provides a way to quantify diabetic neuropathy severity and allows a more accurate patient condition assessment.
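The agreement measure mentioned in this abstract can be sketched as follows. This is a minimal example that assumes the unweighted Cohen's kappa for two raters; the abstract does not say which Kappa variant was used, and the labels below are invented, not data from the study.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical severity labels for six patients (not data from the study).
model = ["mild", "mild", "moderate", "severe", "severe", "moderate"]
experts = ["mild", "moderate", "moderate", "severe", "severe", "mild"]
kappa = cohen_kappa(model, experts)
print(kappa)
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is preferred over simple percent agreement when class frequencies are uneven.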

Evolving decision trees with beam search-based initialization and lexicographic multi-objective evaluation

Relevance: 30.00%

Abstract:

Decision tree induction algorithms represent one of the most popular techniques for dealing with classification problems. However, traditional decision-tree induction algorithms implement a greedy approach for node splitting that is inherently susceptible to local optima convergence. Evolutionary algorithms can avoid the problems associated with a greedy search and have been successfully employed to the induction of decision trees. Previously, we proposed a lexicographic multi-objective genetic algorithm for decision-tree induction, named LEGAL-Tree. In this work, we propose extending this approach substantially, particularly w.r.t. two important evolutionary aspects: the initialization of the population and the fitness function. We carry out a comprehensive set of experiments to validate our extended algorithm. The experimental results suggest that it is able to outperform both traditional algorithms for decision-tree induction and another evolutionary algorithm in a variety of application domains.
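One way to read "lexicographic multi-objective evaluation" is sketched below: objectives are compared in priority order, and a lower-priority objective is consulted only when the higher-priority ones differ by no more than a tolerance. The objective choice (accuracy, then tree size), the tolerance values and the function name are assumptions for illustration and are not taken from the LEGAL-Tree paper.

```python
def lexicographic_better(a, b, tolerances):
    """Return True if candidate a beats candidate b.

    a, b: tuples of objective values, highest priority first, all to be
    maximized (negate any objective that should be minimized).
    tolerances: per-objective thresholds below which a difference is ignored.
    """
    for value_a, value_b, tol in zip(a, b, tolerances):
        if abs(value_a - value_b) > tol:
            return value_a > value_b
    return False  # equivalent on every objective within tolerance

# Hypothetical trees described as (accuracy, -number_of_nodes).
tree_small = (0.90, -7)
tree_large = (0.91, -25)

# With a 0.02 accuracy tolerance the two trees tie on accuracy,
# so the smaller tree wins on the second objective.
print(lexicographic_better(tree_small, tree_large, (0.02, 0)))
```

Unlike a weighted sum, this comparison needs no scaling between objectives, only an ordering and a tolerance for each.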

Dynamic texture segmentation based on deterministic partially self-avoiding walks

Relevance: 30.00%

Abstract:

Recently there has been considerable interest in dynamic textures due to the explosive growth of multimedia databases. In addition, dynamic textures appear in a wide range of videos, which makes them very important in applications that model physical phenomena. Thus, dynamic textures have emerged as a new field of investigation that extends static or spatial textures to the spatio-temporal domain. In this paper, we propose a novel approach for dynamic texture segmentation based on automata theory and the k-means algorithm. In this approach, a feature vector is extracted for each pixel by applying deterministic partially self-avoiding walks on three orthogonal planes of the video. Then, these feature vectors are clustered by the well-known k-means algorithm. Although the k-means algorithm has shown interesting results, it only guarantees convergence to a local minimum, which affects the final segmentation result. To overcome this drawback, we compare six methods for initializing k-means. The experimental results demonstrate the effectiveness of our proposed approach compared to state-of-the-art segmentation methods.
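The sensitivity of k-means to its starting point can be illustrated with a small sketch. It clusters per-pixel feature vectors with Lloyd's algorithm from several random starts and keeps the run with the lowest within-cluster sum of squares. Random restarts are used here only as a stand-in: the abstract does not name the six initialization methods it compares, and the feature vectors below are invented.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            for p in points]

def kmeans(points, k, seed, iterations=50):
    """Lloyd's algorithm from one random start; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as initial centres
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    labels = assign(points, centroids)
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

def best_of_restarts(points, k, seeds):
    """Run k-means once per seed and keep the run with the lowest inertia."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda run: run[1])

# Hypothetical 2-D per-pixel feature vectors forming two obvious groups.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, inertia = best_of_restarts(features, k=2, seeds=range(5))
print(labels, inertia)
```

In a segmentation setting each label would be written back to the pixel its feature vector came from, giving one region per cluster.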

A GIS based model for solving the planar Huff problem considering different demand distributions and forbidden regions

Relevance: 30.00%

Abstract:

In this paper, we have used Geographical Information Systems (GIS) to solve the planar Huff problem considering different demand distributions and forbidden regions. Most papers on competitive location problems assume that demand is aggregated in a finite set of points. In a few other cases, the models assume that demand is distributed over the feasible region according to a functional form, mainly a uniform distribution. In this case, in addition to the discrete and uniform demand distributions, we have considered demand represented by a population surface model, that is, a raster map in which each pixel has an associated value corresponding to the population living in the area that it covers...
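A minimal sketch of the raster-based setting described here, under stated assumptions: each pixel's population is split among facilities in proportion to attractiveness divided by a power of distance (the usual Huff form), and candidate sites are restricted to pixel centres outside a forbidden set. The grid search, the decay exponent, the minimum-distance clamp and all names are hypothetical simplifications; the planar problem in the paper is continuous and is not solved this way.

```python
import math

def pull(r, c, site, decay, min_dist):
    """Attraction of a site (row, col, attractiveness) on the pixel at (r, c)."""
    distance = max(math.hypot(r - site[0], c - site[1]), min_dist)
    return site[2] / distance ** decay

def huff_capture(population, competitors, new_site, decay=2.0, min_dist=0.5):
    """Expected demand captured by new_site on a population raster."""
    captured = 0.0
    for r, row in enumerate(population):
        for c, people in enumerate(row):
            own = pull(r, c, new_site, decay, min_dist)
            rivals = sum(pull(r, c, f, decay, min_dist) for f in competitors)
            captured += people * own / (own + rivals)
    return captured

def best_site(population, competitors, forbidden, attractiveness=1.0):
    """Grid search over pixel centres that are not in the forbidden set."""
    candidates = [(r, c) for r, row in enumerate(population)
                  for c in range(len(row)) if (r, c) not in forbidden]
    return max(candidates, key=lambda rc: huff_capture(
        population, competitors, (rc[0], rc[1], attractiveness)))

# Hypothetical 1x3 population raster with one competitor at the left pixel.
raster = [[1, 1, 10]]
rivals = [(0, 0, 1.0)]
print(best_site(raster, rivals, forbidden=set()))
print(best_site(raster, rivals, forbidden={(0, 2)}))
```

Marking the most populated pixel as forbidden pushes the chosen site to the nearest allowed pixel, which shows how forbidden regions change the optimum.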

Stripe based clothes segmentation

Relevance: 30.00%

Abstract:

In this paper, a clothes segmentation method for fashion parsing is described. This method does not rely on a prior pose estimation but on people segmentation. Therefore, novel and classic segmentation techniques have been considered and improved in order to achieve accurate people segmentation. Unlike other methods described in the literature, the output is the bounding box and the predominant color of each piece of clothing rather than a pixel-level segmentation. The proposal is based on dividing the person area into an initial fixed number of stripes, which are later fused according to the similarity of their color distributions. To assess the quality of the proposed method, experiments are carried out with the Fashionista dataset, which is widely used in the fashion parsing community.
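The stripe idea can be sketched roughly as follows. The person region is cut into a fixed number of horizontal bands, and adjacent bands are fused when their colors are close. Using the mean RGB color and a Euclidean threshold is an assumption made to keep the example short; the abstract speaks of color distributions and does not give the actual similarity measure.

```python
import math

def mean_color(rows):
    """Average (r, g, b) over all pixels in a block of image rows."""
    pixels = [px for row in rows for px in row]
    return tuple(sum(channel) / len(pixels) for channel in zip(*pixels))

def color_distance(a, b):
    """Euclidean distance between two colors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stripe_segments(person, n_stripes, threshold):
    """Split rows into n_stripes bands, then fuse adjacent bands of similar color.

    Returns a list of (first_row, end_row_exclusive, mean_color).
    """
    height = len(person)
    bounds = [round(i * height / n_stripes) for i in range(n_stripes + 1)]
    segments = []
    for top, bottom in zip(bounds, bounds[1:]):
        if top == bottom:
            continue  # more stripes than rows: skip empty bands
        color = mean_color(person[top:bottom])
        if segments and color_distance(segments[-1][2], color) < threshold:
            first = segments[-1][0]
            segments[-1] = (first, bottom, mean_color(person[first:bottom]))
        else:
            segments.append((top, bottom, color))
    return segments

# Hypothetical person crop: four red rows above four blue rows, two pixels wide.
red, blue = (255, 0, 0), (0, 0, 255)
person = [[red, red]] * 4 + [[blue, blue]] * 4
segments = stripe_segments(person, n_stripes=4, threshold=50)
print(segments)
```

Each fused segment yields a vertical extent and a predominant color, which matches the kind of output the abstract describes (a box and a color per garment rather than a pixel mask).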

Mass spectrometry-based protein profiling strategies for biomarker discovery in liver and inflammatory bowel diseases

Relevance: 30.00%

Abstract:

The study of protein expression profiles for biomarker discovery in serum and in mammalian cell populations requires the continuous improvement and combination of protein/peptide separation techniques, mass spectrometry, and statistical and bioinformatic approaches. In this thesis, two different mass spectrometry-based protein profiling strategies were developed and applied to liver and inflammatory bowel diseases (IBDs) for the discovery of new biomarkers. The first, based on bulk solid-phase extraction combined with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) and chemometric analysis of serum samples, was applied to the study of serum protein expression profiles both in IBDs (Crohn’s disease and ulcerative colitis) and in liver diseases (cirrhosis, hepatocellular carcinoma, viral hepatitis). The approach allowed the enrichment of serum proteins/peptides, owing to the high interaction surface between analytes and solid phase, and a high recovery, owing to the elution step performed directly on the MALDI target plate. Furthermore, the use of a chemometric algorithm to select the variables with the highest discriminant power made it possible to evaluate patterns of 20-30 proteins involved in the differentiation and classification of serum samples from healthy donors and diseased patients. These protein profiles discriminate among the pathologies with optimal classification and prediction abilities. In particular, in the study of inflammatory bowel diseases, after the analysis using C18 of 129 serum samples from healthy donors and from Crohn’s disease, ulcerative colitis and inflammatory control patients, a classification ability of 90.7% and a prediction ability of 72.9% were obtained. In the study of liver diseases (hepatocellular carcinoma, viral hepatitis and cirrhosis), a prediction ability of 80.6% was achieved using IDA-Cu(II) as the extraction procedure. 
The identification of the selected proteins by MALDI-TOF/TOF MS analysis, or by their selective enrichment followed by enzymatic digestion and MS/MS analysis, may give useful information for identifying new biomarkers involved in the diseases. The second mass spectrometry-based protein profiling strategy was based on a label-free liquid chromatography electrospray ionization quadrupole time-of-flight differential analysis approach (LC ESI-QTOF MS), combined with targeted MS/MS analysis of only the identified differences. The strategy was used for biomarker discovery in IBDs, and in particular in Crohn’s disease. The enriched serum peptidome and the subcellular fractions of intestinal epithelial cells (IECs) from healthy donors and Crohn’s disease patients were analysed. Combining the enrichment step for low molecular weight serum proteins with the LC-MS approach made it possible to evaluate a pattern of peptides derived from specific exoprotease activity in the coagulation and complement activation pathways. Among these peptides, particularly interesting was the discovery of clusters of peptides from fibrinopeptide A, apolipoprotein E and A4, and complement C3 and C4. Further studies need to be performed to evaluate the specificity of these clusters and validate the results, in order to develop a rapid serum diagnostic test. Label-free LC ESI-QTOF MS differential analysis of the subcellular fractions of IECs from Crohn’s disease patients and healthy donors revealed many proteins that could be involved in the inflammation process. Among them, heat shock protein 70, tryptase alpha-1 precursor and proteins whose upregulation can be explained by the increased activity of IECs in Crohn’s disease were identified. Follow-up studies for the validation of the results and the in-depth investigation of the inflammation pathways involved in the disease will be performed. 
Both the developed mass spectrometry-based protein profiling strategies have been proved to be useful tools for the discovery of disease biomarkers that need to be validated in further studies.

A skin surface characterization system based on capacitive image analysis

Relevance: 30.00%

Abstract:

During the last few years, several methods have been proposed to study and evaluate characteristic properties of the human skin using non-invasive approaches. Mostly, these methods cover aspects related either to dermatology, to analyze skin physiology and to evaluate the effectiveness of medical treatments in skin diseases, or to dermocosmetics and cosmetic science, to evaluate, for example, the effectiveness of anti-aging treatments. For these purposes a routine approach must be followed. Although very accurate and high-resolution measurements can be achieved with conventional methods, such as optical or mechanical profilometry, their use is quite limited, primarily because of the high cost of the required instrumentation, which is also usually cumbersome; these are some of the limitations for routine analysis. This thesis aims to investigate the feasibility of a non-invasive skin characterization system based on the analysis of capacitive images of the skin surface. The system relies on a portable CMOS capacitive device that gives a capacitance map of the skin micro-relief at a resolution of 50 microns/pixel. To extract characteristic features of the skin topography, image analysis techniques such as watershed segmentation and wavelet analysis have been used to detect the main structures of interest: the wrinkles and plateaus of the typical micro-relief pattern. To validate the method, the features extracted from a dataset of skin capacitive images, acquired during dermatological examinations of a group of healthy volunteers, have been compared with the age of the subjects involved, showing good correlation with the skin ageing effect. Detailed analysis of the output of the capacitive sensor compared with optical profilometry of silicone replicas of the same skin area has revealed the potential and some limitations of this technology. 
Also, applications to follow-up studies, as needed to objectively evaluate the effectiveness of treatments in a routine manner, are discussed.

Development of a new galaxy classification technique and its application to the zCOSMOS survey

Relevance: 30.00%

Abstract:

The purpose of this thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of the galaxies (spectral features, colours, morphological indices), and help unveil the evolution of these parameters from z ~ 1 to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects (~10,000 galaxies in the redshift range 0 < z ≲ 1.2) and its great reliability in redshift and spectral property determinations, we first adopt and extend the classification cube method, as developed by Mignoli et al. (2009), to exploit the bimodal properties of galaxies (spectral, photometric and morphological) separately, and then combine these three subclassifications. We use this classification method as a test for a newly devised statistical classification, based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering up to 8 different properties simultaneously. The PCA+UFP analysis is a very powerful and robust tool to probe the nature and the evolution of galaxies in a survey. It allows the classification of galaxies to be defined with less uncertainty, adding the flexibility to be adapted to different parameters: being a fuzzy classification, it avoids the problems of a hard classification, such as the classification cube presented in the first part of the article. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data, and for this reason it can be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high. 
"Early" and "late" type galaxies are well defined by the spectral, photometric and morphological properties, both when these are considered separately and the classifications then combined (classification cube) and when they are treated as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are "averaged out" during the process. This method allowed us to observe the downsizing effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determination of the transition mass M_cross is in significant agreement with other values in the literature.
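The difference between a hard and a fuzzy classification can be made concrete with a short sketch. It uses the standard fuzzy c-means membership formula as a stand-in; the Unsupervised Fuzzy Partition method used in the thesis has its own distance definition, which is not reproduced here. The two cluster centres and the fuzzifier m = 2 are assumptions for illustration.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster.

    Memberships are proportional to (1 / squared_distance) ** (1 / (m - 1))
    and sum to 1. A point lying exactly on a centre belongs fully to it.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    if 0.0 in d2:
        hits = d2.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in d2]
    weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical cluster centres in a 2-D principal-component space.
blue_cloud, red_clump = (0.0, 0.0), (4.0, 0.0)
print(fuzzy_memberships((1.0, 0.0), [blue_cloud, red_clump]))
print(fuzzy_memberships((2.0, 0.0), [blue_cloud, red_clump]))
```

An object midway between the two centres receives equal membership in both instead of being forced into one class, which is how a fuzzy scheme softens the effect of a single noisy measurement.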

Low-power reconfigurable systems for distributed monitoring applications (Sistemi riconfigurabili a basso consumo per applicazioni di monitoraggio distribuito)

Relevance: 30.00%

Abstract:

The term Ambient Intelligence (AmI) refers to a vision of the future of the information society in which smart electronic environments are sensitive and responsive to the presence of people and their activities (context awareness). In an ambient intelligence world, devices work in concert to support people in carrying out their everyday activities, tasks and rituals in an easy, natural way, using information and intelligence hidden in the network connecting these devices. This promotes the creation of pervasive environments that improve the quality of life of the occupants and enhance the human experience. AmI stems from the convergence of three key technologies: ubiquitous computing, ubiquitous communication and natural interfaces. Ambient intelligent systems are heterogeneous and require excellent cooperation between several hardware/software technologies and disciplines, including signal processing, networking and protocols, embedded systems, information management, and distributed algorithms. Since a large number of fixed and mobile sensors are embedded in the environment, Wireless Sensor Networks (WSNs) are one of the most relevant enabling technologies for AmI. WSNs are complex systems made up of a number of sensor nodes that can be deployed in a target area to sense physical phenomena and communicate with other nodes and base stations. These simple devices typically embed a low-power computational unit (microcontrollers, FPGAs, etc.), a wireless communication unit, one or more sensors and some form of energy supply (either batteries or energy scavenger modules). WSNs promise to revolutionize the interactions between the real physical world and human beings. Low cost, low computational power, low energy consumption and small size are characteristics that must be taken into consideration when designing and dealing with WSNs. To fully exploit the potential of distributed sensing approaches, a set of challenges must be addressed. 
Sensor nodes are inherently resource-constrained systems with very low power consumption and small size requirements, which enable them to reduce interference with the physical phenomena sensed and allow easy and low-cost deployment. They have limited processing speed, storage capacity and communication bandwidth, which must be used efficiently to increase the degree of local "understanding" of the observed phenomena. Video sensors are a particular case of sensor nodes. This topic holds strong interest for a wide range of contexts such as military, security, robotics and, most recently, consumer applications. Vision sensors are extremely effective for medium- to long-range sensing because vision provides rich information to human operators. However, image sensors generate a huge amount of data, which must be heavily processed before it is transmitted because of the scarce bandwidth of radio interfaces. In particular, in video surveillance, it has been shown that source-side compression is mandatory due to limited bandwidth and delay constraints. Moreover, there is ample opportunity for performing higher-level processing functions, such as object recognition, which has the potential to drastically reduce the required bandwidth (e.g. by transmitting compressed images only when something "interesting" is detected). The energy cost of image processing must, however, be carefully minimized. Imaging plays, and could further play, an important role in sensing devices for ambient intelligence. Computer vision can, for instance, be used for recognising persons and objects and recognising behaviour such as illness and rioting. Having a wireless camera as a camera mote opens the way for distributed scene analysis. More eyes see more than one, and a camera system that can observe a scene from multiple directions would be able to overcome occlusion problems and could describe objects in their true 3D appearance. In real time, these approaches are a recently opened field of research. 
In this thesis we pay attention to the realities of hardware/software technologies and the design needed to realize systems for distributed monitoring, attempting to propose solutions to open issues and to fill the gap between AmI scenarios and hardware reality. The physical implementation of an individual wireless node is constrained by three important metrics, which are outlined below. Although the design of the sensor network and its sensor nodes is strictly application dependent, a number of constraints should almost always be considered. Among them are a small form factor to reduce node intrusiveness, low power consumption to reduce battery size and extend node lifetime, and low cost for widespread diffusion. These limitations typically result in the adoption of low-power, low-cost devices such as low-power microcontrollers with a few kilobytes of RAM and tens of kilobytes of program memory, with which only simple data processing algorithms can be implemented. However, the overall computational power of the WSN can be very large, since the network presents a high degree of parallelism that can be exploited through the adoption of ad hoc techniques. Furthermore, through the fusion of information from the dense mesh of sensors, even complex phenomena can be monitored. In this dissertation we present our results in building several AmI applications suitable for a WSN implementation. The work can be divided into two main areas: Low Power Video Sensor Nodes and Video Processing Algorithms, and Multimodal Surveillance. Low Power Video Sensor Nodes and Video Processing Algorithms: In comparison to scalar sensors, such as temperature, pressure, humidity, velocity, and acceleration sensors, vision sensors generate much higher bandwidth data due to the two-dimensional nature of their pixel array. We have tackled all the constraints listed above and have proposed solutions to overcome the current WSN limits for video sensor nodes. 
We have designed and developed wireless video sensor nodes focusing on small size and flexibility of reuse in different applications. The video nodes target a different design point: portability (on-board power supply, wireless communication) and a scanty power budget (500 mW), while still providing a prominent level of intelligence, namely sophisticated classification algorithms and a high level of reconfigurability. We developed two different video sensor nodes: the device architecture of the first is based on a low-cost, low-power FPGA+microcontroller system-on-chip; the second is based on an ARM9 processor. Both systems, designed within the above-mentioned power envelope, could operate continuously with a Li-Polymer battery pack and a solar panel. Novel low-power, low-cost video sensor nodes are presented which, in contrast to sensors that just watch the world, are capable of comprehending the perceived information in order to interpret it locally. Featuring such intelligence, these nodes would be able to cope with tasks such as recognition of unattended bags in airports, persons carrying potentially dangerous objects, etc., which normally require a human operator. Vision algorithms for object detection and acquisition, such as human detection with Support Vector Machine (SVM) classification and abandoned/removed object detection, are implemented, described and illustrated on real-world data. Multimodal surveillance: In several setups the use of wired video cameras may not be possible. For this reason, building an energy-efficient wireless vision network for monitoring and surveillance is one of the major efforts in the sensor network community. Energy efficiency for wireless smart camera networks is likewise one of the major efforts in the distributed monitoring and surveillance community. 
Pyroelectric Infra-Red (PIR) sensors have been used to extend the lifetime of a solar-powered video sensor node by providing an energy-level-dependent trigger to the video camera and the wireless module. Such an approach has been shown to extend node lifetime and possibly result in continuous operation of the node. Being low-cost, passive (thus low-power) and having a limited form factor, PIR sensors are well suited for WSN applications. Moreover, techniques for aggressive power management are essential to reduce power consumption and achieve long-term operation of standalone distributed cameras. We have used an adaptive controller, Model Predictive Control (MPC), to help the system improve its performance, outperforming naive power management policies.

Determinants of cesarean delivery, a population-based study in the Emilia Romagna Region

Relevance: 30.00%

Abstract:

Cesarean Delivery (CD) rates are rising in many parts of the world. In order to define strategies to reduce them, it is important to explore the role of clinical and organizational factors. The objective of this thesis is to describe contemporary CD practice and to study clinical and organizational variables as determinants of CD in all women who gave birth between 2005 and June 2010 in the Emilia Romagna region (Italy). All hospital discharge abstracts of women who delivered between 2005 and mid-2010 in the region were selected and linked with birth certificates. In addition to descriptive statistics, multilevel Poisson regression models and a classification tree were used to study the role of clinical and organizational variables (teaching or non-teaching hospital, birth volumes, time and day of delivery). Substantial inter-hospital variability in CD rate was found, and this was only partially explained by the variables considered. The most important risk factors for CD were previous CD (RR 4.95; 95% CI: 4.85-5.05), cord prolapse (RR 3.51; 95% CI: 2.96-4.16), and malposition/malpresentation (RR 2.72; 95% CI: 2.66-2.77). Delivery between 7 pm and 7 am and on non-working days protected against CD in all subgroups, including those with a small number of elective CDs, while delivery at a teaching hospital and birth volumes were not statistically significant risk factors. The classification tree shows that previous CD and malposition/malpresentation are the most important variables discriminating between high and low risk of CD. These results indicate that other factors not considered here might explain CD variability, and they do not provide clear evidence that small hospitals perform poorly in terms of CD rate. Some strategies to reduce CD could be found by focusing on the differences in delivery practice between day and night and between working-day and non-working-day deliveries.
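For readers unfamiliar with the "RR; 95% CI" notation above, the sketch below computes a crude relative risk and its confidence interval from a 2x2 table, using the usual log-scale standard error. This is a simplification: the thesis estimated its RRs with multilevel Poisson regression, which this example does not reproduce, and the counts below are invented.

```python
import math

def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total, z=1.96):
    """Crude relative risk with a log-scale (Wald) confidence interval.

    Returns (rr, lower, upper); z = 1.96 gives an approximate 95% interval.
    """
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    rr = risk_exposed / risk_unexposed
    se_log = math.sqrt(1 / exposed_events - 1 / exposed_total
                       + 1 / unexposed_events - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: 50 of 100 exposed and 25 of 100 unexposed had the outcome.
rr, lower, upper = relative_risk(50, 100, 25, 100)
print(rr, lower, upper)
```

An interval that excludes 1 indicates a statistically significant association at the chosen level, and a narrow interval like those reported above reflects a large number of deliveries.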

The APSEL4D Monolithic Active Pixel Sensor and its Usage in a Single Electron Interference Experiment

Relevance: 30.00%

Abstract:

We have realized a data acquisition chain for the use and characterization of APSEL4D, a 32 x 128 Monolithic Active Pixel Sensor developed as a prototype for frontier experiments in high energy particle physics. In particular, a transition board was built to convert between the chip and FPGA voltage levels and to enhance signal quality. A Xilinx Spartan-3 FPGA was used for real-time data processing, chip control and communication with a personal computer through a USB 2.0 port. For this purpose, firmware was written in VHDL. Finally, a graphical user interface for online system monitoring, hit display and chip control, based on windows and widgets, was developed in C++ using the Qt and Qwt libraries. APSEL4D and the full acquisition chain were characterized for the first time with the electron beam of a transmission electron microscope and with 55Fe and 90Sr radioactive sources. In addition, a beam test was performed at the T9 station of the CERN PS, where hadrons with a momentum of 12 GeV/c are available. The very high time resolution of APSEL4D (up to 2.5 Mfps, but used at 6 kfps) was fundamental in realizing a single-electron Young experiment using nanometric double slits obtained by a FIB technique. On high-statistics samples, it was possible to observe the interference and diffraction of single isolated electrons traveling inside a transmission electron microscope. For the first time, information on the distribution of the arrival times of the single electrons has been extracted.

Learning with Kernels on Graphs: DAG-based kernels, data streams and RNA function prediction.

Relevance: 30.00%

Abstract:

In many application domains data can be naturally represented as graphs. When applying analytical solutions to a given problem is infeasible, machine learning techniques can be a viable way to solve it. Classical machine learning techniques are defined for data represented in vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results from the point of view of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.
