898 results for multi-class classification
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization of a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the group of individuals. Even though methods to analyse these data are now well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches that address open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array, through quality controls, to the preprocessing steps used in the data analysis in the rest of the dissertation. Chapter 2 provides a critical review of standard analysis methods, stressing the main open problems. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to that of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
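As an illustration of the resampling scheme described above, the sketch below scores probes by their recurrence across repeated comparisons of the less populated class with equally sized random subsamples of the more populated class. A Welch t-test stands in for the SAM statistic, and all names are illustrative, not the thesis implementation.

```python
import numpy as np
from scipy import stats

def multisam_scores(expr_lpc, expr_mpc, n_iter=1000, alpha=0.05, seed=None):
    """Score each probe by how often it is called differentially expressed
    when the LPC is compared against random MPC subsamples of the same size.

    expr_lpc : array (n_probes, n_lpc_samples)
    expr_mpc : array (n_probes, n_mpc_samples)
    Returns an integer score per probe in [0, n_iter].
    Note: a Welch t-test stands in for the SAM statistic here.
    """
    rng = np.random.default_rng(seed)
    n_probes, n_lpc = expr_lpc.shape
    scores = np.zeros(n_probes, dtype=int)
    for _ in range(n_iter):
        # draw a random MPC subsample of the same size as the LPC
        cols = rng.choice(expr_mpc.shape[1], size=n_lpc, replace=False)
        sub = expr_mpc[:, cols]
        # probe-wise two-sample test (stand-in for one SAM run)
        _, pvals = stats.ttest_ind(expr_lpc, sub, axis=1, equal_var=False)
        scores += (pvals < alpha).astype(int)
    return scores

# probes with score > 300 (as in the CLL analysis) would then be retained
```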
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4]. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature to address the similarity issue. This is a highly important, but often overlooked, option of any practical pattern recognition system. We focused on a three-class tumour-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class G2 as a test set to obtain, for each sample of G2, the probability of membership in class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
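A minimal sketch of the similarity idea used in Chapter 4: train a probabilistic classifier on G1 versus G3 and read off, for each G2 sample, the posterior probability of the G1 class. A logistic regression stands in for the RVM here (an actual RVM implementation would be substituted), and the function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def grade2_similarity(X_g1, X_g3, X_g2):
    """Train on grade 1 vs grade 3 expression profiles, then report, for each
    grade 2 sample, the posterior probability of belonging to the G1 class.
    A logistic model stands in for the RVM in this sketch."""
    X_train = np.vstack([X_g1, X_g3])
    y_train = np.concatenate([np.zeros(len(X_g1)), np.ones(len(X_g3))])  # 0 = G1, 1 = G3
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    p_g1 = clf.predict_proba(X_g2)[:, 0]  # posterior P(G1 | x) for each G2 sample
    return p_g1                           # values > 0.5 suggest a G1-like profile
```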
Abstract:
This report presents an innovative satellite-based monitoring approach applied to the Iraqi Marshlands to survey the extent and distribution of marshland re-flooding and to assess the development of wetland vegetation cover. The study, conducted in collaboration with MEEO Srl, makes use of images collected by the (A)ATSR sensor onboard the ESA ENVISAT satellite to gather data at multiple temporal scales, and an analysis was adopted to observe the evolution of marshland re-flooding. The methodology uses a multi-temporal pixel-based approach based on classification maps produced by the classification tool SOIL MAPPER ®. The catalogue of the classification maps is available as a web service through the Service Support Environment Portal (SSE, supported by ESA). The inundation of the Iraqi marshlands, which has been continuous since April 2003, is characterized by a high degree of variability, ad-hoc interventions and uncertainty. Given the security constraints and vastness of the Iraqi marshlands, as well as cost-effectiveness considerations, satellite remote sensing was the only viable tool to observe the changes taking place on a continuous basis. The proposed system (ALCS – AATSR LAND CLASSIFICATION SYSTEM) avoids the direct use of the (A)ATSR images and foresees the application of LULCC evolution models directly to the 'stock' of classified maps. This approach is made possible by the availability of a 13-year classified image database, conceived and implemented in the CARD project (http://earth.esa.int/rtd/Projects/#CARD). The approach presented here evolves toward an innovative, efficient and fast method to exploit the potential of multi-temporal LULCC analysis of (A)ATSR images. The two main objectives of this work are both linked to a form of assessment: the first is to assess the modelling ability of the ALCS web application using AATSR images classified with SOIL MAPPER ®, and the second is to evaluate the magnitude, character and extent of wetland rehabilitation.
Abstract:
Satellite image classification involves designing and developing efficient image classifiers. With satellite image data and image analysis methods multiplying rapidly, selecting the right mix of data sources and data analysis approaches has become critical to the generation of quality land-use maps. In this study, a new postprocessing information fusion algorithm for the extraction and representation of land-use information based on high-resolution satellite imagery is presented. This approach can produce land-use maps with sharp interregional boundaries and homogeneous regions. The proposed approach is conducted in five steps. First, a GIS layer - ATKIS data - was used to generate two coarse homogeneous regions, i.e. urban and rural areas. Second, a thematic (class) map was generated by use of a hybrid spectral classifier combining the Gaussian Maximum Likelihood algorithm (GML) and the ISODATA classifier. Third, a probabilistic relaxation algorithm was performed on the thematic map, resulting in a smoothed thematic map. Fourth, edge detection and edge thinning techniques were used to generate a contour map with pixel-width interclass boundaries. Fifth, the contour map was superimposed on the thematic map by use of a region-growing algorithm with the contour map and the smoothed thematic map as two constraints. For the operation of the proposed method, a software package was developed in the C programming language. This software package comprises the GML algorithm, a probabilistic relaxation algorithm, the TBL edge detector, an edge thresholding algorithm, a fast parallel thinning algorithm, and a region-growing information fusion algorithm. The county of Landau in the state of Rheinland-Pfalz, Germany, was selected as a test site. The high-resolution IRS-1C imagery was used as the principal input data.
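To make the second step concrete, here is a generic Gaussian Maximum Likelihood pixel classifier, assuming per-class multivariate normal spectra; it is a textbook GML sketch, not the C software package described in the abstract.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gml_classify(pixels, class_samples):
    """Gaussian Maximum Likelihood classification of pixel spectra.

    pixels        : array (n_pixels, n_bands)
    class_samples : dict mapping class label -> training array (n_i, n_bands)
    Returns the most likely class label for each pixel.
    """
    labels = list(class_samples)
    log_lik = np.empty((pixels.shape[0], len(labels)))
    for j, lab in enumerate(labels):
        X = class_samples[lab]
        mean = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        # log-likelihood of every pixel under this class's Gaussian model
        log_lik[:, j] = multivariate_normal(mean, cov, allow_singular=True).logpdf(pixels)
    return np.array(labels)[np.argmax(log_lik, axis=1)]
```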
Abstract:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio and TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs and forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information that enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This can help determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model, which is language independent and whose key features are simplicity and generality, making it attractive with respect to previous, more sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification settings, comparing its performance with that of two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature, with reference to both single-domain and cross-domain tasks, in 2-class (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, and this work also indicates the way forward to enhance performance: a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in 2-class Single-Domain Sentiment Classification, future work will also involve validating these results in tasks with more than 2 classes.
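One plausible reading of a Markov-chain document model, sketched below for the 2-class case: estimate word-to-word transition probabilities per class and assign each document to the class whose chain gives it the higher likelihood. The exact model in the dissertation may differ; the smoothing and tokenization choices here are illustrative.

```python
from collections import defaultdict
import math

def train_chain(docs):
    """Count word-to-word transitions over a list of tokenized documents
    belonging to one sentiment class."""
    counts = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for a, b in zip(doc, doc[1:]):
            counts[a][b] += 1
    return counts

def log_likelihood(doc, counts, vocab_size):
    """Add-one smoothed log-likelihood of the token sequence under a chain."""
    ll = 0.0
    for a, b in zip(doc, doc[1:]):
        total = sum(counts[a].values())
        ll += math.log((counts[a][b] + 1) / (total + vocab_size))
    return ll

def classify(doc, pos_counts, neg_counts, vocab_size):
    """Assign the class whose Markov chain scores the document higher."""
    return ("positive"
            if log_likelihood(doc, pos_counts, vocab_size)
            >= log_likelihood(doc, neg_counts, vocab_size)
            else "negative")
```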
Abstract:
It has been suggested that there are several distinct phenotypes of childhood asthma or childhood wheezing. Here, we review the research relating to these phenotypes, with a focus on the methods used to define and validate them. Childhood wheezing disorders manifest themselves in a range of observable (phenotypic) features such as lung function, bronchial responsiveness, atopy and a highly variable time course (prognosis). The underlying causes are not sufficiently understood to define disease entities based on aetiology. Nevertheless, there is a need for a classification that would (i) facilitate research into aetiology and pathophysiology, (ii) allow targeted treatment and preventive measures and (iii) improve the prediction of long-term outcome. Classical attempts to define phenotypes have been one-dimensional, relying on few or single features such as triggers (exclusive viral wheeze vs. multiple trigger wheeze) or time course (early transient wheeze, persistent and late onset wheeze). These definitions are simple but essentially subjective. Recently, a multi-dimensional approach has been adopted. This approach is based on a wide range of features and relies on multivariate methods such as cluster or latent class analysis. Phenotypes identified in this manner are more complex but arguably more objective. Although phenotypes have an undisputed standing in current research on childhood asthma and wheezing, there is confusion about the meaning of the term 'phenotype' causing much circular debate. If phenotypes are meant to represent 'real' underlying disease entities rather than superficial features, there is a need for validation and harmonization of definitions. The multi-dimensional approach allows validation by replication across different populations and may contribute to a more reliable classification of childhood wheezing disorders and to improved precision of research relying on phenotype recognition, particularly in genetics. Ultimately, the underlying pathophysiology and aetiology will need to be understood to properly characterize the diseases causing recurrent wheeze in children.
Abstract:
We conducted an explorative, cross-sectional, multi-centre study in order to identify the most common problems of people with any kind of (primary) sleep disorder in a clinical setting, using the International Classification of Functioning, Disability and Health (ICF) as a frame of reference. Data were collected from patients using a structured face-to-face interview of 45-60 min duration. A case record form for health professionals containing the extended ICF Checklist, sociodemographic variables and disease-specific variables was used. The study centres collected data on 99 individuals with sleep disorders. The identified categories include 48 (32%) for body functions, 13 (9%) for body structures, 55 (37%) for activities and participation and 32 (22%) for environmental factors. 'Sleep functions' (100%) and 'energy and drive functions' (85%) were the most severely impaired second-level categories of body functions, followed by 'attention functions' (78%) and 'temperament and personality functions' (77%). With regard to the component activities and participation, patients felt most restricted in the categories of 'watching' (e.g. TV) (82%), 'recreation and leisure' (75%) and 'carrying out daily routine' (74%). Within the component environmental factors, the categories 'support of immediate family', 'health services, systems and policies' and 'products or substances for personal consumption [medication]' were the most important facilitators; 'time-related changes', 'light' and 'climate' were the most important barriers. The study identified a large variety of functional problems reflecting the complexity of sleep disorders. The ICF has the potential to provide a comprehensive framework for the description of functional health in individuals with sleep disorders in a clinical setting.
Abstract:
Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements. These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. In addition, these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). While the type-I suppressor selectively forms efficient barriers for copper inter-diffusion on chloride-terminated electrode surfaces, we identified a type-II suppressor that interacts non-selectively with any kind of anion chemisorbed on copper (chloride, sulfate, sulfonate). Type-I suppressors are vital for the superconformal copper growth mode in Damascene processing and show an antagonistic interaction with SPS (Bis-Sodium-Sulfopropyl-Disulfide) which involves the deactivation of this suppressor chemistry. This suppressor deactivation is rationalized in terms of compositional changes in the layer of chemisorbed anions due to the competition of chloride and MPS (Mercaptopropane Sulfonic Acid) for adsorption sites on the metallic copper surface. MPS is the product of dissociative SPS adsorption within the preexisting chloride matrix on the copper surface. The non-selectivity in the adsorption behavior of the type-II suppressor is rationalized in terms of anion/cation pairing effects of the poly-cationic suppressor and the anion-modified copper substrate. Atomic-scale insights into the competitive Cl/MPS adsorption are gained from in situ STM (Scanning Tunneling Microscopy) using single-crystalline copper surfaces as model substrates. Type-III suppressors are a third class of suppressors. In the case of type-I and type-II suppressor chemistries, the resulting steady-state deposition conditions are completely independent of the particular succession of additive adsorption. In contrast, a strong dependence of the suppressing capabilities on the sequence of additive adsorption ("first come, first served" principle) is observed for the type-III suppressor. This behavior is explained by a suppressor barrier that impedes not only copper inter-diffusion but also the transport of other additives (e.g. SPS) to the copper surface. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
In 1998-2001 Finland suffered the most severe insect outbreak ever recorded, covering over 500,000 hectares. The outbreak was caused by the common pine sawfly (Diprion pini L.) and has continued in the study area, Palokangas, ever since. To find a good method to monitor this type of outbreak, the purpose of this study was to examine the efficacy of multi-temporal ERS-2 and ENVISAT SAR imagery for estimating Scots pine (Pinus sylvestris L.) defoliation. Three methods were tested: unsupervised k-means clustering, supervised linear discriminant analysis (LDA) and logistic regression. In addition, I assessed whether harvested areas could be differentiated from the defoliated forest using the same methods. Two different speckle filters were used to determine the effect of filtering on the SAR imagery and the subsequent results. Logistic regression performed best, producing a classification accuracy of 81.6% (kappa 0.62) with two classes (no defoliation, >20% defoliation). With two classes, LDA accuracy was at best 77.7% (kappa 0.54) and k-means 72.8% (kappa 0.46). In general, the largest speckle filter, a 5 x 5 image window, performed best. When additional classes were added, the accuracy usually degraded step by step. The results were good, but because of the limitations of the study they should be confirmed with independent data before firm conclusions can be drawn about their reliability. The limitations include the small amount of field data and, thus, problems with accuracy assessment (no separate testing data), as well as the lack of meteorological data from the imaging dates.
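A minimal sketch of how a two-class accuracy and kappa figure of this kind can be obtained with logistic regression; feature and variable names are illustrative, and cross-validation is used here only because the study lacked separate testing data.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import cross_val_predict

def evaluate_defoliation_classifier(X, y):
    """X: multi-temporal SAR backscatter features per plot (illustrative),
    y: 0 = no defoliation, 1 = >20% defoliation.
    Returns cross-validated overall accuracy and Cohen's kappa."""
    clf = LogisticRegression(max_iter=1000)
    pred = cross_val_predict(clf, X, y, cv=5)  # in lieu of separate test data
    return accuracy_score(y, pred), cohen_kappa_score(y, pred)
```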
Abstract:
OBJECTIVE: The aim of this study was to estimate intra- and post-operative risk using the American Society of Anaesthesiologists (ASA) classification, which is an important predictor for an intervention and for the entire operating programme. STUDY DESIGN: In this retrospective study, 4435 consecutive patients undergoing elective and emergency surgery at the Gynaecological Clinic of the University Hospital of Zurich were included. The ASA classification for pre-operative risk assessment was determined by an anaesthesiologist after a thorough physical examination. We observed several pre-, intra- and post-operative parameters, such as age, body mass index, duration of anaesthesia, duration of surgery, blood loss, duration of post-operative stay, complicated post-operative course, morbidity and mortality. The investigation of the different risk factors was carried out with a multiple linear regression model for the log-transformed duration of hospitalisation. RESULTS: Age and obesity were responsible for a higher ASA classification. ASA grade correlated with the duration of anaesthesia and the duration of the surgery itself. There was a significant difference in blood loss between ASA grades I (113+/-195 ml) and III (222+/-470 ml) and between grades II (176+/-432 ml) and III. The duration of post-operative hospitalisation could also be correlated with ASA class: ASA class I=1.7+/-3.0 days, ASA class II=3.6+/-4.3 days, ASA class III=6.8+/-8.2 days, and ASA class IV=6.2+/-3.9 days. The mean post-operative in-hospital stay was 2.5+/-4.0 days without complications, and 8.7+/-6.7 days with post-operative complications. The multiple linear regression model showed that the ASA classification is not the only parameter carrying important information about the duration of hospitalisation: parameters such as age, class of diagnosis and post-operative complications also influence the duration of hospitalisation. CONCLUSION: This study shows that the ASA classification can be used as a good and early available predictor for the planning of an intervention in gynaecological surgery. The ASA classification helps the surgeon to assess the peri-operative risk profile, from which important information can be derived for the planning of the operating programme.
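A generic sketch of the model family described above: ordinary least squares on the log-transformed post-operative stay with ASA grade and other covariates as predictors. The column names are hypothetical, not those of the study database.

```python
import numpy as np
import statsmodels.formula.api as smf

def fit_stay_model(df):
    """df is a pandas DataFrame with illustrative columns: stay_days, asa,
    age, diagnosis_class, complications.  Fits OLS on log(stay_days) with
    ASA grade and other covariates, mirroring the model family in the study."""
    df = df.assign(log_stay=np.log(df["stay_days"]))
    model = smf.ols("log_stay ~ C(asa) + age + C(diagnosis_class) + complications",
                    data=df)
    return model.fit()  # inspect .summary() for coefficient estimates
```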
Abstract:
Audio-visual documents obtained from German TV news are classified according to the IPTC topic categorization scheme. To this end, standard text classification techniques are adapted to speech, video, and non-speech audio. For each of the three modalities, word analogues are generated: sequences of syllables for speech, "video words" based on low-level color features (color moments, color correlogram and color wavelet), and "audio words" based on low-level spectral features (spectral envelope and spectral flatness) for non-speech audio. Such audio and video words provide a means to represent the different modalities in a uniform way. The frequencies of the word analogues represent the audio-visual documents: the standard bag-of-words approach. Support vector machines are used for supervised classification in a 1 vs. n setting. Classification based on speech outperforms all other single modalities. Combining speech with non-speech audio improves classification. Classification is further improved by supplementing speech and non-speech audio with video words. Optimal F-scores range between 62% and 94%, corresponding to 50% - 84% above chance. The optimal combination of modalities depends on the category to be recognized. The construction of audio and video words from low-level features provides a good basis for the integration of speech, non-speech audio and video.
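A minimal sketch of the bag-of-words setup over word analogues with a linear SVM in a one-vs-rest setting; the token strings and names are illustrative, and the actual feature extraction from speech, video and non-speech audio is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_topic_classifier():
    """Bag-of-words over word analogues (syllables, video words, audio words
    joined by spaces, e.g. "syl_ta syl_ges vid_017 aud_203") with a linear
    SVM trained one-vs-rest over the IPTC categories."""
    return make_pipeline(CountVectorizer(), OneVsRestClassifier(LinearSVC()))

# usage (illustrative):
# clf = build_topic_classifier()
# clf.fit(train_docs, train_labels)
# predicted = clf.predict(test_docs)
```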
Abstract:
Many applications, such as telepresence, virtual reality, and interactive walkthroughs, require a three-dimensional (3D) model of real-world environments. Methods such as lightfields, geometric reconstruction and computer vision use cameras to acquire visual samples of the environment and construct a model. Unfortunately, obtaining models of real-world locations is a challenging task. In particular, important environments are often actively in use, containing moving objects, such as people entering and leaving the scene. The methods previously listed have difficulty in capturing the color and structure of the environment in the presence of moving and temporary occluders. We describe a class of cameras called lag cameras. The main concept is to generalize a camera to take samples over space and time. Such a camera can easily and interactively detect moving objects while continuously moving through the environment. Moreover, since both the lag camera and the occluder are moving, the scene behind the occluder is captured by the lag camera even from viewpoints where the occluder lies between the lag camera and the hidden scene. We demonstrate an implementation of a lag camera, complete with analysis and captured environments.
Abstract:
In this paper, we investigate how a multilinear model can be used to represent human motion data. Based on technical modes (referring to degrees of freedom and number of frames) and natural modes that typically appear in the context of a motion capture session (referring to actor, style, and repetition), the motion data is encoded in the form of a high-order tensor. This tensor is then reduced using N-mode singular value decomposition. Our experiments show that the reduced model approximates the original motion better than previously introduced PCA-based approaches. Furthermore, we discuss how the tensor representation may be used as a valuable tool for the synthesis of new motions.
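A generic truncated N-mode (higher-order) SVD sketch, illustrating how such a motion tensor can be reduced mode by mode; this is a plain HOSVD in NumPy, not the paper's implementation, and the example mode ordering is only an assumption.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize the tensor along the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_truncate(tensor, ranks):
    """Truncated higher-order SVD: one orthonormal factor matrix per mode
    plus a core tensor of shape `ranks`."""
    factors = []
    for mode, r in enumerate(ranks):
        # left singular vectors of the mode-n unfolding, truncated to rank r
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for U in factors:
        # contract the current leading mode with U^T; tensordot appends the
        # reduced mode at the end, so after all modes the order is restored
        core = np.tensordot(core, U.T, axes=([0], [1]))
    return core, factors

# example (illustrative mode order): actor x style x repetition x frame x DOF
# core, factors = hosvd_truncate(motion_tensor, ranks=(3, 2, 2, 30, 20))
```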
Abstract:
The grasping of virtual objects has been an active research field for several years. Solutions providing realistic grasping rely on special hardware or require time-consuming parameterizations. Therefore, we introduce a flexible grasping algorithm enabling grasping without computationally complex physics. Objects can be grasped and manipulated with multiple fingers. In addition, multiple objects can be manipulated simultaneously with our approach. Through the use of contact sensors, the technique is easily configurable and versatile enough to be used in different scenarios.
Abstract:
Having to carry input devices can be inconvenient when interacting with wall-sized, high-resolution tiled displays. Such displays are typically driven by a cluster of computers. Running existing games on a cluster is non-trivial, and the performance attained using software solutions like Chromium is not good enough. This paper presents a touch-free, multi-user, human-computer interface for wall-sized displays that enables completely device-free interaction. The interface is built using 16 cameras and a cluster of computers, and is integrated with the games Quake 3 Arena (Q3A) and Homeworld. The two games were parallelized using two different approaches in order to run on a 7x4-tile, 21-megapixel display wall with good performance. The touch-free interface enables interaction with a latency of 116 ms, of which 81 ms are due to the camera hardware. The rendering performance of the games is compared to that of their sequential counterparts running on the display wall using Chromium. Parallel Q3A's framerate is an order of magnitude higher than when using Chromium. The parallel version of Homeworld performed on par with the sequential version, which did not run at all using Chromium. Informal use of the touch-free interface indicates that it works better for controlling Q3A than Homeworld.
Abstract:
This paper presents different application scenarios for which the registration of sub-sequence reconstructions or multi-camera reconstructions is essential for successful camera motion estimation and 3D reconstruction from video. The registration is achieved by merging unconnected feature point tracks between the reconstructions. One application is drift removal for sequential camera motion estimation of long sequences. The state-of-the-art in drift removal is to apply a RANSAC approach to find unconnected feature point tracks. In this paper an alternative spectral algorithm for pairwise matching of unconnected feature point tracks is used. It is then shown that the algorithms can be combined and applied to novel scenarios where independent camera motion estimations must be registered into a common global coordinate system. In the first scenario multiple moving cameras, which capture the same scene simultaneously, are registered. A second new scenario occurs in situations where the tracking of feature points during sequential camera motion estimation fails completely, e.g., due to large occluding objects in the foreground, and the unconnected tracks of the independent reconstructions must be merged. In the third scenario image sequences of the same scene, which are captured under different illuminations, are registered. Several experiments with challenging real video sequences demonstrate that the presented techniques work in practice.