33 resultados para Gender classification model
Resumo:
Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification with face-based classification being the most popular. In some real-world scenarios the face may be partially occluded. In these circumstances a classification based on individual parts of the face known as local features must be adopted. We investigate gender classification using lip movements. We show for the first time that important gender specific information can be obtained from the way in which a person moves their lips during speech. Furthermore our study indicates that the lip dynamics during speech provide greater gender discriminative information than simply lip appearance. We also show that the lip dynamics and appearance contain complementary gender information such that a model which captures both traits gives the highest overall classification result. We use Discrete Cosine Transform based features and Gaussian Mixture Modelling to model lip appearance and dynamics and employ the XM2VTS database for our experiments. Our experiments show that a model which captures lip dynamics along with appearance can improve gender classification rates by between 16-21% compared to models of only lip appearance.
Resumo:
The diagnosis of myelodysplastic syndrome (MDS) currently relies primarily on the morphologic assessment of the patient's bone marrow and peripheral blood cells. Moreover, prognostic scoring systems rely on observer-dependent assessments of blast percentage and dysplasia. Gene expression profiling could enhance current diagnostic and prognostic systems by providing a set of standardized, objective gene signatures. Within the Microarray Innovations in LEukemia study, a diagnostic classification model was investigated to distinguish the distinct subclasses of pediatric and adult leukemia, as well as MDS. Overall, the accuracy of the diagnostic classification model for subtyping leukemia was approximately 93%, but this was not reflected for the MDS samples giving only approximately 50% accuracy. Discordant samples of MDS were classified either into acute myeloid leukemia (AML) or
Resumo:
Mobile malware has continued to grow at an alarming rate despite on-going mitigation efforts. This has been much more prevalent on Android due to being an open platform that is rapidly overtaking other competing platforms in the mobile smart devices market. Recently, a new generation of Android malware families has emerged with advanced evasion capabilities which make them much more difficult to detect using conventional methods. This paper proposes and investigates a parallel machine learning based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.
Resumo:
Artificial neural network (ANN) methods are used to predict forest characteristics. The data source is the Southeast Alaska (SEAK) Grid Inventory, a ground survey compiled by the USDA Forest Service at several thousand sites. The main objective of this article is to predict characteristics at unsurveyed locations between grid sites. A secondary objective is to evaluate the relative performance of different ANNs. Data from the grid sites are used to train six ANNs: multilayer perceptron, fuzzy ARTMAP, probabilistic, generalized regression, radial basis function, and learning vector quantization. A classification and regression tree method is used for comparison. Topographic variables are used to construct models: latitude and longitude coordinates, elevation, slope, and aspect. The models classify three forest characteristics: crown closure, species land cover, and tree size/structure. Models are constructed using n-fold cross-validation. Predictive accuracy is calculated using a method that accounts for the influence of misclassification as well as measuring correct classifications. The probabilistic and generalized regression networks are found to be the most accurate. The predictions of the ANN models are compared with a classification of the Tongass national forest in southeast Alaska based on the interpretation of satellite imagery and are found to be of similar accuracy.
Resumo:
The application of chemometrics in food science has revolutionized the field by allowing the creation of models able to automate a broad range of applications such as food authenticity and food fraud detection. In order to create effective and general models able to address the complexity of real life problems, a vast amount of varied training samples are required. Training dataset has to cover all possible types of sample and instrument variability. However, acquiring a varied amount of samples is a time consuming and costly process, in which collecting samples representative of the real world variation is not always possible, specially in some application fields. To address this problem, a novel framework for the application of data augmentation techniques to spectroscopic data has been designed and implemented. This is a carefully designed pipeline of four complementary and independent blocks which can be finely tuned depending on the desired variance for enhancing model's robustness: a) blending spectra, b) changing baseline, c) shifting along x axis, and d) adding random noise.
This novel data augmentation solution has been tested in order to obtain highly efficient generalised classification model based on spectroscopic data. Fourier transform mid-infrared (FT-IR) spectroscopic data of eleven pure vegetable oils (106 admixtures) for the rapid identification of vegetable oil species in mixtures of oils have been used as a case study to demonstrate the influence of this pioneering approach in chemometrics, obtaining a 10% improvement in classification which is crucial in some applications of food adulteration.
Resumo:
Analyses regularly feature claims that European welfare states are in the process of creating an adult worker model. The theoretical and empirical basis of this argument is examined here by looking first at the conceptual foundations of the adult worker model formulation and then at the extent to which social policy reform in western Europe fits with the argument. It is suggested that the adult worker formulation is under-specified. A framework incorporating four dimensions—the treatment of individuals vis-à-vis their family role and status for the purposes of social rights, the treatment of care, the treatment of the family as a social institution, and the extent to which gender inequality is problematized—is developed and then applied. The empirical analysis reveals a strong move towards individualization as social policy promotes and valorizes individual agency and self-sufficiency and shifts some childcare from the family. Yet evidence is also found of continued (albeit changed) familism. Rather than an unequivocal move to an individualized worker model then, a dual earner, gender-specialized, family arrangement is being promoted. The latter is the middle way between the old dependencies and the new “independence.” This makes for complexity and even ambiguity in policy, a manifestation of which is that reform within countries involves concurrent moves in several directions.
Resumo:
his paper uses fuzzy-set ideal type analysis to assess the conformity of European leave regulations to four theoretical ideal typical divisions of labour: male breadwinner, caregiver parity, universal breadwinner and universal caregiver. In contrast to the majority of previous studies, the focus of this analysis is on the extent to which leave regulations promote gender equality in the family and the transformation of traditional gender roles. The results of this analysis demonstrate that European countries cluster into five models that only partly coincide with countries’ geographical proximity. Second, none of the countries considered constitutes a universal caregiver model, while the male breadwinner ideal continues to provide the normative reference point for parental leave regulations in a large number of European states. Finally, we witness a growing emphasis at the national and EU levels concerning the universal breadwinner ideal, which leaves gender inequality in unpaid work unproblematized.
Resumo:
The applicability of ultra-short-term wind power prediction (USTWPP) models is reviewed. The USTWPP method proposed extracts featrues from historical data of wind power time series (WPTS), and classifies every short WPTS into one of several different subsets well defined by stationary patterns. All the WPTS that cannot match any one of the stationary patterns are sorted into the subset of nonstationary pattern. Every above WPTS subset needs a USTWPP model specially optimized for it offline. For on-line application, the pattern of the last short WPTS is recognized, then the corresponding prediction model is called for USTWPP. The validity of the proposed method is verified by simulations.
Resumo:
Logistic regression and Gaussian mixture model (GMM) classifiers have been trained to estimate the probability of acute myocardial infarction (AMI) in patients based upon the concentrations of a panel of cardiac markers. The panel consists of two new markers, fatty acid binding protein (FABP) and glycogen phosphorylase BB (GPBB), in addition to the traditional cardiac troponin I (cTnI), creatine kinase MB (CKMB) and myoglobin. The effect of using principal component analysis (PCA) and Fisher discriminant analysis (FDA) to preprocess the marker concentrations was also investigated. The need for classifiers to give an accurate estimate of the probability of AMI is argued and three categories of performance measure are described, namely discriminatory ability, sharpness, and reliability. Numerical performance measures for each category are given and applied. The optimum classifier, based solely upon the samples take on admission, was the logistic regression classifier using FDA preprocessing. This gave an accuracy of 0.85 (95% confidence interval: 0.78-0.91) and a normalised Brier score of 0.89. When samples at both admission and a further time, 1-6 h later, were included, the performance increased significantly, showing that logistic regression classifiers can indeed use the information from the five cardiac markers to accurately and reliably estimate the probability AMI. © Springer-Verlag London Limited 2008.
Resumo:
Discrete Conditional Phase-type (DC-Ph) models consist of a process component (survival distribution) preceded by a set of related conditional discrete variables. This paper introduces a DC-Ph model where the conditional component is a classification tree. The approach is utilised for modelling health service capacities by better predicting service times, as captured by Coxian Phase-type distributions, interfaced with results from a classification tree algorithm. To illustrate the approach, a case-study within the healthcare delivery domain is given, namely that of maternity services. The classification analysis is shown to give good predictors for complications during childbirth. Based on the classification tree predictions, the duration of childbirth on the labour ward is then modelled as either a two or three-phase Coxian distribution. The resulting DC-Ph model is used to calculate the number of patients and associated bed occupancies, patient turnover, and to model the consequences of changes to risk status.
Resumo:
To improve the performance of classification using Support Vector Machines (SVMs) while reducing the model selection time, this paper introduces Differential Evolution, a heuristic method for model selection in two-class SVMs with a RBF kernel. The model selection method and related tuning algorithm are both presented. Experimental results from application to a selection of benchmark datasets for SVMs show that this method can produce an optimized classification in less time and with higher accuracy than a classical grid search. Comparison with a Particle Swarm Optimization (PSO) based alternative is also included.
Resumo:
Background
G protein-coupled receptors (GPCRs) constitute one of the largest groupings of eukaryotic proteins, and represent a particularly lucrative set of pharmaceutical targets. They play an important role in eukaryotic signal transduction and physiology, mediating cellular responses to a diverse range of extracellular stimuli. The phylum Platyhelminthes is of considerable medical and biological importance, housing major pathogens as well as established model organisms. The recent availability of genomic data for the human blood fluke Schistosoma mansoni and the model planarian Schmidtea mediterranea paves the way for the first comprehensive effort to identify and analyze GPCRs in this important phylum.
Results
Application of a novel transmembrane-oriented approach to receptor mining led to the discovery of 117 S. mansoni GPCRs, representing all of the major families; 105 Rhodopsin, 2 Glutamate, 3 Adhesion, 2 Secretin and 5 Frizzled. Similarly, 418 Rhodopsin, 9 Glutamate, 21 Adhesion, 1 Secretin and 11 Frizzled S. mediterranea receptors were identified. Among these, we report the identification of novel receptor groupings, including a large and highly-diverged Platyhelminth-specific Rhodopsin subfamily, a planarian-specific Adhesion-like family, and atypical Glutamate-like receptors. Phylogenetic analysis was carried out following extensive gene curation. Support vector machines (SVMs) were trained and used for ligand-based classification of full-length Rhodopsin GPCRs, complementing phylogenetic and homology-based classification.
Conclusions
Genome-wide investigation of GPCRs in two platyhelminth genomes reveals an extensive and complex receptor signaling repertoire with many unique features. This work provides important sequence and functional leads for understanding basic flatworm receptor biology, and sheds light on a lucrative set of anthelmintic drug targets.
Resumo:
Discrete Conditional Phase-type (DC-Ph) models are a family of models which represent skewed survival data conditioned on specific inter-related discrete variables. The survival data is modeled using a Coxian phase-type distribution which is associated with the inter-related variables using a range of possible data mining approaches such as Bayesian networks (BNs), the Naïve Bayes Classification method and classification regression trees. This paper utilizes the Discrete Conditional Phase-type model (DC-Ph) to explore the modeling of patient waiting times in an Accident and Emergency Department of a UK hospital. The resulting DC-Ph model takes on the form of the Coxian phase-type distribution conditioned on the outcome of a logistic regression model.
Resumo:
A fast screening method was developed to assess the pathogenicity of a diverse collection of environmental and clinical Burkholderia cepacia complex isolates in the nematode Caenorhabditis elegans. The method was validated by comparison with the standard slow-killing assay. We observed that the pathogenicity of B. cepacia complex isolates in C. elegans was strain-dependent but species-independent. The wide range of observed pathogenic phenotypes agrees with the high degree of phenotypic variation among species of the B. cepacia complex and suggests that the taxonomic classification of a given strain within the complex cannot predict pathogenicity.