813 resultados para microarray data classification
Resumo:
There are different ways to do cluster analysis of categorical data in the literature and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economical constraints. Main approaches for clustering are usually distinguished into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, basing on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, and one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? In order to answer, two approaches, namely a latent class model (mixture of multinomial distributions) and a partition around medoids one, are evaluated and compared by Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indexes in a fairly wide simulation study. Simulation outcomes are plotted in bi-dimensional graphs via Multidimensional Scaling; size of points is proportional to the number of points that overlap and different colours are used according to the cluster membership.
Resumo:
Il medulloblastoma (MB) è il tumore più comune del sistema nervoso centrale. Attualmente si riesce a curare solo il 50-60% dei pazienti che tuttavia presentano gravi effetti collaterali dovuti alle cure molto aggressive. Una recente classificazione molecolare ha ridistribuito le varianti più comuni di MB in quattro distinti gruppi. In particolare, il gruppo SHH e D sono caratterizzati da un’ alta espressione di MYCN ed associati a prognosi sfavorevole. MYCN è coinvolto nella regolazione della proliferazione cellulare, differenziazione, apoptosi, angiogenesi e metastasi. L’obiettivo di questo lavoro è la valutazione dell’attività antitumorale di oligonucleotidi diretti contro MYCN, sia in vitro su diverse linee cellulari di MB che in vivo in modelli murini xenograft ortotopici di MB. I risultati hanno dimostrato un’ottima inibizione della crescita in linee cellulari di MB, accompagnata da una riduzione della trascrizione genica e dei livelli proteici di MYCN. Inoltre, sono stati confermati tramite RT-PCR alcuni dei geni trovati significativamente variati nell’esperimento di microarray , dopo il trattamento. Molto interessanti sono stati geni quali BIRC5, che risulta down regolato mentre il gene p21 risulta up regolato in tutte le linee cellulari di MB utilizzate. Inoltre, sono stati generati modelli murini di MB utilizzando cellule precedentemente trasfettate per esprimere il gene della luciferasi e valutarne così la crescita tumorale tramite imaging bioluminescente in vivo (BLI), al fine di poter testare l’attività antitumorale data dagli oligonucleotidi. I risultati hanno dimostrato una buona risposta al trattamento come rilevato dalla tecnica di BLI. I dati preliminari prodotti, dimostrano che MYCN potrebbe esser un buon target per la terapia del MB nei casi in cui è overespresso. In particolare, una sua inibizione potrebbe presentare effetti indesiderati moderati in quanto è poco espresso dopo la nascita.
Resumo:
The diagnosis, grading and classification of tumours has benefited considerably from the development of DCE-MRI which is now essential to the adequate clinical management of many tumour types due to its capability in detecting active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves vs time is a very simple yet operator dependent procedure, therefore more objective approaches have been developed in order to facilitate comparison between studies. In so called model free approaches, descriptive or heuristic information extracted from time series raw data have been used for tissue classification. The main issue concerning these schemes is that they have not a direct interpretation in terms of physiological properties of the tissues. On the other hand, model based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied on region of interests opportunely selected by the physician. This approach has the advantage to provide parameters directly related to the pathophysiological properties of the tissue such as vessel permeability, local regional blood flow, extraction fraction, concentration gradient between plasma and extravascular-extracellular space. Anyway, nonlinear modelling is computational demanding and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is investigate the use of semi-quantitative and quantitative parameters for segmentation and classification of breast lesion. The objectives can be subdivided as follow: describe the principal techniques to evaluate time intensity curve in DCE-MRI with focus on kinetic model proposed in literature; to evaluate the influence in parametrization choice for a classic bi-compartmental kinetic models; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; to evaluate performance of machine learning techniques training for segmentation and classification of breast lesion.
Resumo:
The objective of this dissertation is to study the structure and behavior of the Atmospheric Boundary Layer (ABL) in stable conditions. This type of boundary layer is not completely well understood yet, although it is very important for many practical uses, from forecast modeling to atmospheric dispersion of pollutants. We analyzed data from the SABLES98 experiment (Stable Atmospheric Boundary Layer Experiment in Spain, 1998), and compared the behaviour of this data using Monin-Obukhov's similarity functions for wind speed and potential temperature. Analyzing the vertical profiles of various variables, in particular the thermal and momentum fluxes, we identified two main contrasting structures describing two different states of the SBL, a traditional and an upside-down boundary layer. We were able to determine the main features of these two states of the boundary layer in terms of vertical profiles of potential temperature and wind speed, turbulent kinetic energy and fluxes, studying the time series and vertical structure of the atmosphere for two separate nights in the dataset, taken as case studies. We also developed an original classification of the SBL, in order to separate the influence of mesoscale phenomena from turbulent behavior, using as parameters the wind speed and the gradient Richardson number. We then compared these two formulations, using the SABLES98 dataset, verifying their validity for different variables (wind speed and potential temperature, and their difference, at different heights) and with different stability parameters (zita or Rg). Despite these two classifications having completely different physical origins, we were able to find some common behavior, in particular under weak stability conditions.
Resumo:
Ziel dieser Dissertation ist die experimentelle Charakterisierung und quantitative Beschreibung der Hybridisierung von komplementären Nukleinsäuresträngen mit oberflächengebundenen Fängermolekülen für die Entwicklung von integrierten Biosensoren. Im Gegensatz zu lösungsbasierten Verfahren ist mit Microarray Substraten die Untersuchung vieler Nukleinsäurekombinationen parallel möglich. Als biologisch relevantes Evaluierungssystem wurde das in Eukaryoten universell exprimierte Actin Gen aus unterschiedlichen Pflanzenspezies verwendet. Dieses Testsystem ermöglicht es, nahe verwandte Pflanzenarten auf Grund von geringen Unterschieden in der Gen-Sequenz (SNPs) zu charakterisieren. Aufbauend auf dieses gut studierte Modell eines House-Keeping Genes wurde ein umfassendes Microarray System, bestehend aus kurzen und langen Oligonukleotiden (mit eingebauten LNA-Molekülen), cDNAs sowie DNA und RNA Targets realisiert. Damit konnte ein für online Messung optimiertes Testsystem mit hohen Signalstärken entwickelt werden. Basierend auf den Ergebnissen wurde der gesamte Signalpfad von Nukleinsärekonzentration bis zum digitalen Wert modelliert. Die aus der Entwicklung und den Experimenten gewonnen Erkenntnisse über die Kinetik und Thermodynamik von Hybridisierung sind in drei Publikationen zusammengefasst die das Rückgrat dieser Dissertation bilden. Die erste Publikation beschreibt die Verbesserung der Reproduzierbarkeit und Spezifizität von Microarray Ergebnissen durch online Messung von Kinetik und Thermodynamik gegenüber endpunktbasierten Messungen mit Standard Microarrays. Für die Auswertung der riesigen Datenmengen wurden zwei Algorithmen entwickelt, eine reaktionskinetische Modellierung der Isothermen und ein auf der Fermi-Dirac Statistik beruhende Beschreibung des Schmelzüberganges. Diese Algorithmen werden in der zweiten Publikation beschrieben. Durch die Realisierung von gleichen Sequenzen in den chemisch unterschiedlichen Nukleinsäuren (DNA, RNA und LNA) ist es möglich, definierte Unterschiede in der Konformation des Riboserings und der C5-Methylgruppe der Pyrimidine zu untersuchen. Die kompetitive Wechselwirkung dieser unterschiedlichen Nukleinsäuren gleicher Sequenz und die Auswirkungen auf Kinetik und Thermodynamik ist das Thema der dritten Publikation. Neben der molekularbiologischen und technologischen Entwicklung im Bereich der Sensorik von Hybridisierungsreaktionen oberflächengebundener Nukleinsäuremolekülen, der automatisierten Auswertung und Modellierung der anfallenden Datenmengen und der damit verbundenen besseren quantitativen Beschreibung von Kinetik und Thermodynamik dieser Reaktionen tragen die Ergebnisse zum besseren Verständnis der physikalisch-chemischen Struktur des elementarsten biologischen Moleküls und seiner nach wie vor nicht vollständig verstandenen Spezifizität bei.
Resumo:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.
Resumo:
Intelligent Transport Systems (ITS) consists in the application of ICT to transport to offer new and improved services to the mobility of people and freights. While using ITS, travellers produce large quantities of data that can be collected and analysed to study their behaviour and to provide information to decision makers and planners. The thesis proposes innovative deployments of classification algorithms for Intelligent Transport System with the aim to support the decisions on traffic rerouting, bus transport demand and behaviour of two wheelers vehicles. The first part of this work provides an overview and a classification of a selection of clustering algorithms that can be implemented for the analysis of ITS data. The first contribution of this thesis is an innovative use of the agglomerative hierarchical clustering algorithm to classify similar travels in terms of their origin and destination, together with the proposal for a methodology to analyse drivers’ route choice behaviour using GPS coordinates and optimal alternatives. The clusters of repetitive travels made by a sample of drivers are then analysed to compare observed route choices to the modelled alternatives. The results of the analysis show that drivers select routes that are more reliable but that are more expensive in terms of travel time. Successively, different types of users of a service that provides information on the real time arrivals of bus at stop are classified using Support Vector Machines. The results shows that the results of the classification of different types of bus transport users can be used to update or complement the census on bus transport flows. Finally, the problem of the classification of accidents made by two wheelers vehicles is presented together with possible future application of clustering methodologies aimed at identifying and classifying the different types of accidents.
Resumo:
Satellite image classification involves designing and developing efficient image classifiers. With satellite image data and image analysis methods multiplying rapidly, selecting the right mix of data sources and data analysis approaches has become critical to the generation of quality land-use maps. In this study, a new postprocessing information fusion algorithm for the extraction and representation of land-use information based on high-resolution satellite imagery is presented. This approach can produce land-use maps with sharp interregional boundaries and homogeneous regions. The proposed approach is conducted in five steps. First, a GIS layer - ATKIS data - was used to generate two coarse homogeneous regions, i.e. urban and rural areas. Second, a thematic (class) map was generated by use of a hybrid spectral classifier combining Gaussian Maximum Likelihood algorithm (GML) and ISODATA classifier. Third, a probabilistic relaxation algorithm was performed on the thematic map, resulting in a smoothed thematic map. Fourth, edge detection and edge thinning techniques were used to generate a contour map with pixel-width interclass boundaries. Fifth, the contour map was superimposed on the thematic map by use of a region-growing algorithm with the contour map and the smoothed thematic map as two constraints. For the operation of the proposed method, a software package is developed using programming language C. This software package comprises the GML algorithm, a probabilistic relaxation algorithm, TBL edge detector, an edge thresholding algorithm, a fast parallel thinning algorithm, and a region-growing information fusion algorithm. The county of Landau of the State Rheinland-Pfalz, Germany was selected as a test site. The high-resolution IRS-1C imagery was used as the principal input data.
Resumo:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.
Resumo:
This thesis is aimed to assess similarities and mismatches between the outputs from two independent methods for the cloud cover quantification and classification based on quite different physical basis. One of them is the SAFNWC software package designed to process radiance data acquired by the SEVIRI sensor in the VIS/IR. The other is the MWCC algorithm, which uses the brightness temperatures acquired by the AMSU-B and MHS sensors in their channels centered in the MW water vapour absorption band. At a first stage their cloud detection capability has been tested, by comparing the Cloud Masks they produced. These showed a good agreement between two methods, although some critical situations stand out. The MWCC, in effect, fails to reveal clouds which according to SAFNWC are fractional, cirrus, very low and high opaque clouds. In the second stage of the inter-comparison the pixels classified as cloudy according to both softwares have been. The overall observed tendency of the MWCC method, is an overestimation of the lower cloud classes. Viceversa, the more the cloud top height grows up, the more the MWCC not reveal a certain cloud portion, rather detected by means of the SAFNWC tool. This is what also emerges from a series of tests carried out by using the cloud top height information in order to evaluate the height ranges in which each MWCC category is defined. Therefore, although the involved methods intend to provide the same kind of information, in reality they return quite different details on the same atmospheric column. The SAFNWC retrieval being very sensitive to the top temperature of a cloud, brings the actual level reached by this. The MWCC, by exploiting the capability of the microwaves, is able to give an information about the levels that are located more deeply within the atmospheric column.
Resumo:
PURPOSE: The aim of the present study was to assess the oral mucosal health status of young male adults (aged 18 to 24 years) in Switzerland and to correlate their clinical findings with self-reported risk factors such as tobacco use and alcohol consumption. MATERIALS AND METHODS: Data on the oral health status of 615 Swiss Army recruits were collected using a standardised self-reported questionnaire, followed by an intraoral examination. Positive clinical findings were classified as (1) common conditions and anatomical variants, (2) reactive lesions, (3) benign tumour lesions and (4) premalignant lesions. The main locations of the oral mucosal findings were recorded on a topographical classification chart. Using correlational statistics, the findings were further associated with the known risk factors such as tobacco use and alcohol consumption. RESULTS: A total of 468 findings were diagnosed in 327 (53.17%) of the 615 subjects. In total, 445 findings (95.09%) were classified as common conditions, anatomical variants and reactive soft-tissue lesions. In the group of reactive soft-tissue lesions, there was a significantly higher percentage of smokers (P < 0.001) and subjects with a combination of smoking and alcohol consumption (P < 0.001). Eight lesions were clinically diagnosed as oral leukoplakias associated with smokeless tobacco. The prevalence of precursor lesions in the population examined was over 1%. CONCLUSIONS: Among young male adults in Switzerland, a significant number of oral mucosal lesions can be identified, which strongly correlate with tobacco use. To improve primary and secondary prevention, young adults should therefore be informed more extensively about the negative effects of tobacco use on oral health.
Resumo:
We present an update on clinical evaluation, staging, classification and treatment of canal cholesteatoma, including a meta-analysis of clinical data of the last 30 years.
Resumo:
We conducted a qualitative, multicenter study using a focus group design to explore the lived experiences of persons with any kind of primary sleep disorder with regard to functioning and contextual factors using six open-ended questions related to the International Classification of Functioning, Disability and Health (ICF) components. We classified the results using the ICF as a frame of reference. We identified the meaningful concepts within the transcribed data and then linked them to ICF categories according to established linking rules. The six focus groups with 27 participants yielded a total of 6986 relevant concepts, which were linked to a total of 168 different second-level ICF categories. From the patient perspective, the ICF components: (1) Body Functions; (2) Activities & Participation; and (3) Environmental Factors were equally represented; while (4) Body Structures appeared poignantly less frequently. Out of the total number of concepts, 1843 concepts (26%) were assigned to the ICF component Personal Factors, which is not yet classified but could indicate important aspects of resource management and strategy development of those who have a sleep disorder. Therefore, treatment of patients with sleep disorders must not be limited to anatomical and (patho-)physiological changes, but should also consider a more comprehensive view that includes patient's demands, strategies and resources in daily life and the contextual circumstances surrounding the individual.
Resumo:
We conducted an explorative, cross-sectional, multi-centre study in order to identify the most common problems of people with any kind of (primary) sleep disorder in a clinical setting using the International Classification of Functioning, Disability and Health (ICF) as a frame of reference. Data were collected from patients using a structured face-to-face interview of 45-60 min duration. A case record form for health professionals containing the extended ICF Checklist, sociodemographic variables and disease-specific variables was used. The study centres collected data of 99 individuals with sleep disorders. The identified categories include 48 (32%) for body functions, 13 (9%) body structures, 55 (37%) activities and participation and 32 (22%) for environmental factors. 'Sleep functions' (100%) and 'energy and drive functions', respectively, (85%) were the most severely impaired second-level categories of body functions followed by 'attention functions' (78%) and 'temperament and personality functions' (77%). With regard to the component activities and participation, patients felt most restricted in the categories of 'watching' (e.g. TV) (82%), 'recreation and leisure' (75%) and 'carrying out daily routine' (74%). Within the component environmental factors the categories 'support of immediate family', 'health services, systems and policies' and 'products or substances for personal consumption [medication]' were the most important facilitators; 'time-related changes', 'light' and 'climate' were the most important barriers. The study identified a large variety of functional problems reflecting the complexity of sleep disorders. The ICF has the potential to provide a comprehensive framework for the description of functional health in individuals with sleep disorders in a clinical setting.