899 resultados para binary to multi-class classifiers
Resumo:
Questionnaire data may contain missing values because certain questions do not apply to all respondents. For instance, questions addressing particular attributes of a symptom, such as frequency, triggers or seasonality, are only applicable to those who have experienced the symptom, while for those who have not, responses to these items will be missing. This missing information does not fall into the category 'missing by design', rather the features of interest do not exist and cannot be measured regardless of survey design. Analysis of responses to such conditional items is therefore typically restricted to the subpopulation in which they apply. This article is concerned with joint multivariate modelling of responses to both unconditional and conditional items without restricting the analysis to this subpopulation. Such an approach is of interest when the distributions of both types of responses are thought to be determined by common parameters affecting the whole population. By integrating the conditional item structure into the model, inference can be based both on unconditional data from the entire population and on conditional data from subjects for whom they exist. This approach opens new possibilities for multivariate analysis of such data. We apply this approach to latent class modelling and provide an example using data on respiratory symptoms (wheeze and cough) in children. Conditional data structures such as that considered here are common in medical research settings and, although our focus is on latent class models, the approach can be applied to other multivariate models.
Resumo:
Cable-stayed bridges represent nowadays key points in transport networks and their seismic behavior needs to be fully understood, even beyond the elastic range of materials. Both nonlinear dynamic (NL-RHA) and static (pushover) procedures are currently available to face this challenge, each with intrinsic advantages and disadvantages, and their applicability in the study of the nonlinear seismic behavior of cable-stayed bridges is discussed here. The seismic response of a large number of finite element models with different span lengths, tower shapes and class of foundation soil is obtained with different procedures and compared. Several features of the original Modal Pushover Analysis (MPA) are modified in light of cable-stayed bridge characteristics, furthermore, an extension of MPA and a new coupled pushover analysis (CNSP) are suggested to estimate the complex inelastic response of such outstanding structures subjected to multi-axial strong ground motions.
Resumo:
This paper addresses the question of maximizing classifier accuracy for classifying task-related mental activity from Magnetoencelophalography (MEG) data. We propose the use of different sources of information and introduce an automatic channel selection procedure. To determine an informative set of channels, our approach combines a variety of machine learning algorithms: feature subset selection methods, classifiers based on regularized logistic regression, information fusion, and multiobjective optimization based on probabilistic modeling of the search space. The experimental results show that our proposal is able to improve classification accuracy compared to approaches whose classifiers use only one type of MEG information or for which the set of channels is fixed a priori.
Resumo:
The multi-dimensional classification problem is a generalisation of the recently-popularised task of multi-label classification, where each data instance is associated with multiple class variables. There has been relatively little research carried out specific to multi-dimensional classification and, although one of the core goals is similar (modelling dependencies among classes), there are important differences; namely a higher number of possible classifications. In this paper we present method for multi-dimensional classification, drawing from the most relevant multi-label research, and combining it with important novel developments. Using a fast method to model the conditional dependence between class variables, we form super-class partitions and use them to build multi-dimensional learners, learning each super-class as an ordinary class, and thus explicitly modelling class dependencies. Additionally, we present a mechanism to deal with the many class values inherent to super-classes, and thus make learning efficient. To investigate the effectiveness of this approach we carry out an empirical evaluation on a range of multi-dimensional datasets, under different evaluation metrics, and in comparison with high-performing existing multi-dimensional approaches from the literature. Analysis of results shows that our approach offers important performance gains over competing methods, while also exhibiting tractable running time.
Resumo:
The binding of killer cell Ig-like Receptors (KIR) to their Class I MHC ligands was shown previously to be characterized by extremely rapid association and dissociation rate constants. During experiments to investigate the biochemistry of receptor–ligand binding in more detail, the kinetic parameters of the interaction were observed to alter dramatically in the presence of Zn2+ but not other divalent cations. The basis of this phenomenon is Zn2+-induced multimerization of the KIR molecules as demonstrated by BIAcore, analytical ultracentrifugation, and chemical cross-linking experiments. Zn2+-dependent multimerization of KIR may be critical for formation of the clusters of KIR and HLA-C molecules, the “natural killer (NK) cell immune synapse,” observed at the site of contact between the NK cell and target cell.
Resumo:
Proteasomes are involved in the proteolytic generation of major histocompatibility complex (MHC) class I epitopes but their exact role has not been elucidated. We used highly purified murine 20S proteasomes for digestion of synthetic 22-mer and 41/44-mer ovalbumin partial sequences encompassing either an immunodominant or a marginally immunogenic epitope. At various times, digests were analyzed by pool sequencing and by semiquantitative electrospray ionization mass spectrometry. Most dual cleavage fragments derived from 22-mer peptides were 7-10 amino acids long, with octa- and nonamers predominating. Digestion of 41/44-mer peptides initially revealed major cleavage sites spaced by two size ranges, 8 or 9 amino acids and 14 or 15 amino acids, followed by further degradation of the latter as well as of larger single cleavage fragments. The final size distribution was slightly broader than that of fragments derived from 22-mer peptides. The majority of peptide bonds were cleaved, albeit with vastly different efficiencies. This resulted in multiple overlapping proteolytic fragments including a limited number of abundant peptides. The immunodominant epitope was generated abundantly whereas only small amounts of the marginally immunogenic epitope were detected. The frequency distributions of amino acids flanking proteasomal cleavage sites are correlated to that reported for corresponding positions of MHC class I binding peptides. The results suggest that proteasomal degradation products may include fragments with structural properties similar to MHC class I binding peptides. Proteasomes may thus be involved in the final stages of proteolytic epitope generation, often without the need for downstream proteolytic events.
Resumo:
This thesis presents a thorough and principled investigation into the application of artificial neural networks to the biological monitoring of freshwater. It contains original ideas on the classification and interpretation of benthic macroinvertebrates, and aims to demonstrate their superiority over the biotic systems currently used in the UK to report river water quality. The conceptual basis of a new biological classification system is described, and a full review and analysis of a number of river data sets is presented. The biological classification is compared to the common biotic systems using data from the Upper Trent catchment. This data contained 292 expertly classified invertebrate samples identified to mixed taxonomic levels. The neural network experimental work concentrates on the classification of the invertebrate samples into biological class, where only a subset of the sample is used to form the classification. Other experimentation is conducted into the identification of novel input samples, the classification of samples from different biotopes and the use of prior information in the neural network models. The biological classification is shown to provide an intuitive interpretation of a graphical representation, generated without reference to the class labels, of the Upper Trent data. The selection of key indicator taxa is considered using three different approaches; one novel, one from information theory and one from classical statistical methods. Good indicators of quality class based on these analyses are found to be in good agreement with those chosen by a domain expert. The change in information associated with different levels of identification and enumeration of taxa is quantified. The feasibility of using neural network classifiers and predictors to develop numeric criteria for the biological assessment of sediment contamination in the Great Lakes is also investigated.
Resumo:
Binary distributed representations of vector data (numerical, textual, visual) are investigated in classification tasks. A comparative analysis of results for various methods and tasks using artificial and real-world data is given.
Resumo:
Resource discovery is one of the key services in digitised cultural heritage collections. It requires intelligent mining in heterogeneous digital content as well as capabilities in large scale performance; this explains the recent advances in classification methods. Associative classifiers are convenient data mining tools used in the field of cultural heritage, by applying their possibilities to taking into account the specific combinations of the attribute values. Usually, the associative classifiers prioritize the support over the confidence. The proposed classifier PGN questions this common approach and focuses on confidence first by retaining only 100% confidence rules. The classification tasks in the field of cultural heritage usually deal with data sets with many class labels. This variety is caused by the richness of accumulated culture during the centuries. Comparisons of classifier PGN with other classifiers, such as OneR, JRip and J48, show the competitiveness of PGN in recognizing multi-class datasets on collections of masterpieces from different West and East European Fine Art authors and movements.
Resumo:
2010 Mathematics Subject Classification: Primary 35S05; Secondary 35A17.
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
The objective of this study was to develop a GIS-based multi-class index overlay model to determine areas susceptible to inland flooding during extreme precipitation events in Broward County, Florida. Data layers used in the method include Airborne Laser Terrain Mapper (ALTM) elevation data, excess precipitation depth determined through performing a Soil Conservation Service (SCS) Curve Number (CN) analysis, and the slope of the terrain. The method includes a calibration procedure that uses "weights and scores" criteria obtained from Hurricane Irene (1999) records, a reported 100-year precipitation event, Doppler radar data and documented flooding locations. Results are displayed in maps of Eastern Broward County depicting types of flooding scenarios for a 100-year, 24-hour storm based on the soil saturation conditions. As expected the results of the multi-class index overlay analysis showed that an increase for the potential of inland flooding could be expected when a higher antecedent moisture condition is experienced. The proposed method proves to have some potential as a predictive tool for flooding susceptibility based on a relatively simple approach.
Resumo:
In this paper, the problem of semantic place categorization in mobile robotics is addressed by considering a time-based probabilistic approach called dynamic Bayesian mixture model (DBMM), which is an improved variation of the dynamic Bayesian network. More specifically, multi-class semantic classification is performed by a DBMM composed of a mixture of heterogeneous base classifiers, using geometrical features computed from 2D laserscanner data, where the sensor is mounted on-board a moving robot operating indoors. Besides its capability to combine different probabilistic classifiers, the DBMM approach also incorporates time-based (dynamic) inferences in the form of previous class-conditional probabilities and priors. Extensive experiments were carried out on publicly available benchmark datasets, highlighting the influence of the number of time-slices and the effect of additive smoothing on the classification performance of the proposed approach. Reported results, under different scenarios and conditions, show the effectiveness and competitive performance of the DBMM.
Resumo:
In this Thesis a series of numerical models for the evaluation of the seasonal performance of reversible air-to-water heat pump systems coupled to residential and non-residential buildings are presented. The exploitation of the energy saving potential linked to the adoption of heat pumps is a hard task for designers due to the influence on their energy performance of several factors, like the external climate variability, the heat pump modulation capacity, the system control strategy and the hydronic loop configuration. The aim of this work is to study in detail all these aspects. In the first part of this Thesis a series of models which use a temperature class approach for the prediction of the seasonal performance of reversible air source heat pumps are shown. An innovative methodology for the calculation of the seasonal performance of an air-to-water heat pump has been proposed as an extension of the procedure reported by the European standard EN 14825. This methodology can be applied not only to air-to-water single-stage heat pumps (On-off HPs) but also to multi-stage (MSHPs) and inverter-driven units (IDHPs). In the second part, dynamic simulation has been used with the aim to optimize the control systems of the heat pump and of the HVAC plant. A series of dynamic models, developed by means of TRNSYS, are presented to study the behavior of On-off HPs, MSHPs and IDHPs. The main goal of these dynamic simulations is to show the influence of the heat pump control strategies and of the lay-out of the hydronic loop used to couple the heat pump to the emitters on the seasonal performance of the system. A particular focus is given to the modeling of the energy losses linked to on-off cycling.
Resumo:
In the Hammersley-Aldous-Diaconis process, infinitely many particles sit in R and at most one particle is allowed at each position. A particle at x, whose nearest neighbor to the right is at y, jumps at rate y - x to a position uniformly distributed in the interval (x, y). The basic coupling between trajectories with different initial configuration induces a process with different classes of particles. We show that the invariant measures for the two-class process can be obtained as follows. First, a stationary M/M/1 queue is constructed as a function of two homogeneous Poisson processes, the arrivals with rate, and the (attempted) services with rate rho > lambda Then put first class particles at the instants of departures (effective services) and second class particles at the instants of unused services. The procedure is generalized for the n-class case by using n - 1 queues in tandem with n - 1 priority types of customers. A multi-line process is introduced; it consists of a coupling (different from Liggett's basic coupling), having as invariant measure the product of Poisson processes. The definition of the multi-line process involves the dual points of the space-time Poisson process used in the graphical construction of the reversed process. The coupled process is a transformation of the multi-line process and its invariant measure is the transformation described above of the product measure.