879 results for Gender classification model
Abstract:
Lipids available in fingermark residue represent important targets for enhancement and dating techniques. While it is well known that lipid composition varies among fingermarks of the same donor (intra-variability) and between fingermarks of different donors (inter-variability), the extent of this variability remains uncharacterised. Thus, this work aimed at studying qualitatively and quantitatively the initial lipid composition of the fingermark residue of 25 different donors. Among the 104 detected lipids, 43 were reported for the first time in the literature. Furthermore, palmitic acid, squalene, cholesterol, myristyl myristate and myristyl myristoleate were quantified and their correlation within fingermark residue was highlighted. Ten compounds were then selected and further studied as potential targets for dating or enhancement techniques. It was shown that their relative standard deviation was significantly lower for the intra-variability than for the inter-variability. Moreover, the use of data pretreatments could significantly reduce this variability. Based on these observations, an objective donor classification model was proposed. Hierarchical cluster analysis was conducted on the pre-treated data and the fingermarks of the 25 donors were classified into two main groups, corresponding to "poor" and "rich" lipid donors. The robustness of this classification was tested using fingermark replicates of selected donors: 86% of these replicates were correctly classified, showing the potential of such a donor classification model for research purposes, i.e. for selecting representative donors based on compounds of interest.
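A minimal sketch of the clustering step described above, assuming Ward-linkage hierarchical clustering over standardised, log-transformed abundances and a forced two-group cut; the donor data, the pretreatment and the linkage choice are illustrative placeholders, not the paper's exact pipeline.

```python
# Illustrative sketch (not the paper's exact pipeline): hierarchical cluster
# analysis on log-transformed, standardised compound abundances, cut into two
# groups ("poor" vs. "rich" lipid donors). Donor data below are fabricated.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(0)
# rows = donors, columns = the ten target compounds (hypothetical values)
abundances = rng.lognormal(mean=2.0, sigma=0.8, size=(25, 10))

pretreated = zscore(np.log(abundances), axis=0)     # one possible pretreatment
tree = linkage(pretreated, method="ward")           # agglomerative clustering
groups = fcluster(tree, t=2, criterion="maxclust")  # force two donor groups

print(groups)  # cluster label (1 or 2) per donor
```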
Abstract:
The paper presents a novel method for monitoring network optimisation based on a recent machine learning technique, the support vector machine. It is problem-oriented in the sense that it directly answers the question of whether an advised spatial location is important for the classification model. The method can be used to increase the accuracy of classification models by taking a small number of additional measurements. Traditionally, monitoring network optimisation is performed by analysing kriging variances. A comparison of the method with the traditional approach is presented on a real case study with climate data.
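A rough sketch of the general idea under stated assumptions (synthetic stations, a fabricated class boundary, hypothetical hyperparameters): the stations that end up as support vectors are the ones that matter most to the classification model, and a candidate location's distance to the decision boundary hints at how informative an extra measurement there would be. This illustrates the concept only, not the paper's exact procedure.

```python
# Conceptual sketch only: fit an SVM on existing station coordinates and
# inspect which stations become support vectors -- those sit near the decision
# boundary and carry most of the information for the classification model.
# All data here are synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(60, 2))                 # station x, y (synthetic)
labels = (coords[:, 0] + coords[:, 1] > 100).astype(int)   # e.g. a climate class

model = SVC(kernel="rbf", C=10.0, gamma="scale").fit(coords, labels)
print("stations most relevant to the boundary:", model.support_)

# |decision_function| ranks candidate new locations: small values lie near the
# boundary, where an additional measurement would help the classifier most.
candidates = rng.uniform(0, 100, size=(5, 2))
print(np.abs(model.decision_function(candidates)))
```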
Abstract:
The main objective of the study is to form a framework that provides tools to recognise and classify items whose demand is not smooth but varies greatly in size and/or frequency. The framework will then be combined with two other classification methods in order to form a three-dimensional classification model. Forecasting and inventory control of these abnormal demand items are difficult; therefore, another objective of this study is to find out which statistical forecasting method is most suitable for forecasting abnormal demand items. The accuracy of the different methods is measured by comparing the forecasts to the actual demand. Moreover, the study also aims at finding proper alternatives for the inventory control of abnormal demand items. The study is quantitative and the methodology is a case study. The research methods consist of theory, numerical data, current-state analysis and testing of the framework in the case company. The results of the study show that the framework makes it possible to recognise and classify the abnormal demand items. It is also noticed that the inventory performance of abnormal demand items differs significantly from that of smoothly demanded items, which makes the recognition of abnormal demand items very important.
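For illustration only, below is one widely used scheme for recognising such items, the ADI/CV² (Syntetos-Boylan) classification; the thesis's own framework is not described in enough detail here to reproduce, so the thresholds and labels are the customary ones rather than the study's.

```python
# Hedged sketch: a common scheme for flagging items whose demand is lumpy or
# intermittent, based on the average inter-demand interval (ADI) and the
# squared coefficient of variation of demand sizes (CV^2). Thresholds 1.32 and
# 0.49 are the customary cut-offs, not values from the thesis.
import numpy as np

def classify_demand(history):
    """history: array of per-period demand quantities (zeros allowed)."""
    history = np.asarray(history, dtype=float)
    nonzero = history[history > 0]
    if len(nonzero) < 2:
        return "insufficient data"
    adi = len(history) / len(nonzero)                   # avg. periods per demand
    cv2 = (nonzero.std(ddof=1) / nonzero.mean()) ** 2   # size variability
    if adi <= 1.32 and cv2 <= 0.49:
        return "smooth"
    if adi > 1.32 and cv2 <= 0.49:
        return "intermittent"
    if adi <= 1.32 and cv2 > 0.49:
        return "erratic"
    return "lumpy"

print(classify_demand([0, 5, 0, 0, 12, 0, 1, 0, 0, 30]))  # -> "lumpy"
```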
Abstract:
This thesis describes a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple localized image features, such as moments, extracted from orthogonal-view video silhouettes of human walking motion. A suite of time-integration methods, spanning a range of coarseness of time aggregation and of modeling of feature distributions, is applied to these image features to create a suite of gait sequence representations. Despite their simplicity, the resulting feature vectors contain enough information to perform well on human identification and gender classification tasks. We demonstrate the accuracy of recognition on gait video sequences collected over different days and times and under varying lighting environments. Each of the integration methods is investigated for its advantages and disadvantages. An improved gait representation is built based on our experiences with the initial set of gait representations. In addition, we show gender classification results using our gait appearance features, the effect of our heuristic feature selection method, and the significance of individual features.
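As a toy illustration of the kind of localized feature involved, the sketch below computes a few raw and central image moments from a fabricated binary silhouette; the thesis's actual feature set and its time aggregation are not reproduced here.

```python
# Rough sketch: raw and central image moments of a binary walking silhouette
# (here a fabricated toy mask). Per-frame features like these would then be
# aggregated over time in several ways.
import numpy as np

def silhouette_moments(mask):
    """mask: 2-D binary array, 1 = foreground pixel of the silhouette."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                          # area
    cx, cy = xs.mean(), ys.mean()          # centroid
    mu20 = ((xs - cx) ** 2).mean()         # horizontal spread
    mu02 = ((ys - cy) ** 2).mean()         # vertical spread
    mu11 = ((xs - cx) * (ys - cy)).mean()  # orientation-related cross term
    return np.array([m00, cx, cy, mu20, mu02, mu11])

toy = np.zeros((64, 32), dtype=int)
toy[10:54, 12:20] = 1                      # crude stand-in for a silhouette
print(silhouette_moments(toy))
```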
Abstract:
This paper is concerned with the use of a genetic algorithm to select financial ratios for corporate distress classification models. For this purpose, the fitness value associated with a set of ratios is made to reflect the requirements of maximizing the amount of information available to the model and minimizing the collinearity between the model inputs. A case study involving 60 failed and continuing British firms in the period 1997-2000 is used for illustration. The classification model based on ratios selected by the genetic algorithm compares favorably with a model employing ratios usually found in the financial distress literature.
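A minimal, self-contained sketch of such a selection loop, assuming a simple bitmask genetic algorithm whose fitness rewards correlation with the failure label and penalises pairwise collinearity; the data, operators and weighting are all placeholders, not the paper's implementation.

```python
# Illustrative sketch (assumed details, not the paper's implementation): a tiny
# genetic algorithm over bitmasks of candidate ratios. The fitness rewards
# class-separating information (absolute correlation with the failed/continuing
# label) and penalises collinearity (mean absolute pairwise correlation among
# the selected ratios).
import numpy as np

rng = np.random.default_rng(42)
n_firms, n_ratios = 60, 20
X = rng.normal(size=(n_firms, n_ratios))             # fabricated ratio matrix
y = rng.integers(0, 2, size=n_firms)                 # 0 = continuing, 1 = failed

def fitness(mask):
    idx = np.flatnonzero(mask)
    if len(idx) < 2:
        return -np.inf
    info = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in idx])
    corr = np.abs(np.corrcoef(X[:, idx], rowvar=False))
    collin = (corr.sum() - len(idx)) / (len(idx) * (len(idx) - 1))
    return info - collin                              # trade-off, weights assumed

pop = rng.integers(0, 2, size=(30, n_ratios))
for _ in range(100):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]           # truncation selection
    cut = rng.integers(1, n_ratios, size=30)          # one-point crossover
    children = np.array([np.r_[parents[rng.integers(10)][:c],
                               parents[rng.integers(10)][c:]] for c in cut])
    flip = rng.random(children.shape) < 0.02          # mutation
    pop = np.where(flip, 1 - children, children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected ratio indices:", np.flatnonzero(best))
```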
Abstract:
Different studies suggest that inner facial features are not the only discriminative features for tasks such as person identification or gender classification. Indeed, they have shown an influence on these tasks of features that are part of the local face context, such as hair. However, object-centered approaches that ignore local context dominate research in computer-vision-based facial analysis. In this paper, we performed an analysis to study which areas and which resolutions are diagnostic for the gender classification problem. We first demonstrate the importance of contextual features for gender classification in human observers using a psychophysical "bubbles" technique.
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually and requires adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data; secondly, classify new incoming unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), or as stationary or streaming depending on the characteristics of the data and the rate of change underlying them. Throughout this thesis, we deal with the classification problem under three different settings, namely supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically use Bayesian network classifiers as models.

The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data, proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used to solve the Parkinson's disease prediction problem, namely multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy, as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables.

The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect: the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out-of-date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and bootstrapping to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files should be continuously classified as malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, while also achieving good classification performance.

Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
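As a rough illustration of the drift-monitoring ingredient named in the third contribution, here is a minimal Page-Hinkley detector; feeding it, for example, the negative average log-likelihood of each incoming batch makes a drop in model fit show up as an alarm. The delta and lam values are arbitrary tuning choices, not parameters from the thesis.

```python
# Minimal sketch of a generic Page-Hinkley drift detector: it watches a stream
# of scores and raises an alarm when the cumulative deviation from the running
# mean exceeds a threshold. Parameters delta and lam are tuning assumptions.
class PageHinkley:
    def __init__(self, delta=0.005, lam=50.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

    def update(self, x):
        """Feed one score; return True when drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean
        self.cum += x - self.mean - self.delta       # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.lam:       # drift detected
            self.__init__(self.delta, self.lam)      # reset after the alarm
            return True
        return False

# usage: feed -average_log_likelihood per batch; adapt/relearn the model on True
detector = PageHinkley(delta=0.01, lam=5.0)
for score in [1.0, 1.1, 0.9, 1.0, 3.5, 3.6, 3.4, 3.7, 3.5, 3.6]:
    if detector.update(score):
        print("drift detected at score", score)
```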
Abstract:
Exposure to counter-stereotypic gender role models (e.g., a woman engineer) has been shown to successfully reduce the application of biased gender stereotypes. We tested the hypothesis that such efforts may more generally lessen the application of stereotypic knowledge in other (non-gendered) domains. Specifically, based on the notion that counter-stereotypes can stimulate a lesser reliance on heuristic thinking, we predicted that contesting gender stereotypes would eliminate a more general group prototypicality bias in the selection of leaders. Three studies supported this hypothesis. After exposing participants to a counter-stereotypic gender role model, group prototypicality no longer predicted leadership evaluation and selection. We discuss the implications of these findings for groups and organizations seeking to capitalize on the benefits of an increasingly diverse workforce.
Abstract:
This study examined the association of theoretically guided and empirically identified psychosocial variables with the co-occurrence of risky sexual behavior and alcohol consumption among university students. The study utilized event analysis to determine whether risky sex occurred during the same event in which alcohol was consumed. Relevant conceptualizations included alcohol disinhibition, self-efficacy, and social network theories. Predictor variables included negative condom attitudes, general risk taking, drinking motives, mistrust, social group membership, and gender. Factor analysis was employed to identify dimensions of drinking motives. Measured risky sex behaviors were (a) sex without a condom, (b) sex with people not known very well, (c) sex with injecting drug users (IDUs), (d) sex with people without knowing whether they had an STD, and (e) sex while using drugs. A purposive sample was used and included 222 male and female students recruited from a major urban university. Chi-square analysis was used to determine whether participants were more likely to engage in risky sex behavior in different alcohol use contexts. These contexts were only when drinking, only when not drinking, and when drinking or not. The chi-square findings did not support the hypothesis that university students who use alcohol with sex will engage in riskier sex. These results added to the literature by extending other similar findings to a university student sample. For each of the observed risky sex behaviors, discriminant analysis was used to determine whether the predictor variables would differentiate the drinking contexts, or whether the behavior occurred. Results from the discriminant analyses indicated that sex with people not known very well was the only behavior for which there were significant discriminant functions. Gender and enhancement drinking motives were important constructs in the classification model. Limitations of the study and implications for future research, social work practice, and policy are discussed.
Abstract:
In order to reduce serious health incidents, individuals at high risk need to be identified as early as possible so that effective intervention and preventive care can be provided. This requires regular and efficient assessments of risk within communities that are the first point of contact for individuals. Clinical Decision Support Systems (CDSSs) have been developed to help with the task of risk assessment; however, such systems and their underpinning classification models are tailored towards those with clinical expertise. Communities where regular risk assessments are required lack such expertise. This paper presents the continuation of the GRiST research team's efforts to disseminate clinical expertise to communities. Based on our earlier published findings, this paper introduces the framework and skeleton for a data collection and risk classification model that evaluates data redundancy in real time, detects the risk-informative data and guides the risk assessors towards collecting those data. By doing so, it enables non-experts within the communities to conduct reliable mental health risk triage.
Abstract:
A method using the ring-oven technique for pre-concentration in filter paper discs and near-infrared hyperspectral imaging is proposed to identify four detergent and dispersant additives and to determine their concentration in gasoline. Different approaches were used to select the best image data processing in order to gather the relevant spectral information. This was attained by selecting the pixels of the region of interest (ROI), using a pre-calculated threshold value of the PCA scores arranged as histograms, to select the spectra set; summing up the selected spectra to achieve representativeness; and compensating for the superimposed filter paper spectral information, also supported by score histograms for each individual sample. The best classification model was achieved using linear discriminant analysis and a genetic algorithm (LDA/GA), whose correct classification rate in the external validation set was 92%. Prior classification of the type of additive present in the gasoline is necessary to define the PLS model required for its quantitative determination. Considering that two of the additives studied present high spectral similarity, a PLS regression model was constructed to predict their content in gasoline, while two additional models were used for the remaining additives. The results of the external validation of these regression models showed a mean percentage error of prediction ranging from 5 to 15%.
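As a small, hedged sketch of the quantification step: a PLS regression from already pre-processed, summed spectra to additive content, with synthetic spectra standing in for the real NIR data and an assumed number of latent variables.

```python
# Hedged sketch of the PLS quantification step. Spectra and concentrations are
# synthetic stand-ins; the number of latent variables is an assumption.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
spectra = rng.normal(size=(40, 200))            # 40 samples x 200 wavelengths
conc = rng.uniform(50, 500, size=40)            # additive content (fictive units)
spectra += np.outer(conc, rng.normal(size=200)) * 1e-3   # inject a linear signal

X_tr, X_te, y_tr, y_te = train_test_split(spectra, conc, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
pred = pls.predict(X_te).ravel()
print("mean % error:", np.mean(np.abs(pred - y_te) / y_te) * 100)
```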
Abstract:
One hundred and fifteen cachaça samples, derived from distillation in copper stills (73) or in stainless steel stills (42), were analyzed for thirty-five analytes by chromatography and inductively coupled plasma optical emission spectrometry. The analytical data were treated through Factor Analysis (FA), Partial Least Squares Discriminant Analysis (PLS-DA) and Quadratic Discriminant Analysis (QDA). The FA explained 66.0% of the database variance. PLS-DA showed that it is possible to distinguish between the two groups of cachaças using 52.8% of the database variance. QDA was used to build a classification model using acetaldehyde, ethyl carbamate, isobutyl alcohol, benzaldehyde, acetic acid and formaldehyde as chemical descriptors. The model presented 91.7% accuracy in predicting the apparatus in which unknown samples were distilled.
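A brief sketch of a QDA classifier of the kind described, using the six descriptors named above; the measurement values are fabricated, and only the descriptor list and the 73/42 copper vs. stainless-steel split come from the abstract.

```python
# Sketch under stated assumptions: quadratic discriminant analysis on the six
# chemical descriptors named in the abstract, with fabricated measurements.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

descriptors = ["acetaldehyde", "ethyl carbamate", "isobutyl alcohol",
               "benzaldehyde", "acetic acid", "formaldehyde"]

rng = np.random.default_rng(3)
X = rng.lognormal(mean=1.0, sigma=0.5, size=(115, len(descriptors)))
y = np.array([0] * 73 + [1] * 42)        # 0 = copper still, 1 = stainless steel

qda = QuadraticDiscriminantAnalysis().fit(X, y)
unknown = rng.lognormal(mean=1.0, sigma=0.5, size=(1, 6))
label = qda.predict(unknown)[0]
print("predicted apparatus:", "copper" if label == 0 else "stainless steel")
```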
Abstract:
Objective: Several limitations of published bioelectrical impedance analysis (BIA) equations have been reported. The aims were to develop, in a multiethnic elderly population, a new prediction equation and to cross-validate it along with some published BIA equations for estimating fat-free mass, using deuterium oxide dilution as the reference method. Design and setting: Cross-sectional study of elderly people from five developing countries. Methods: Total body water (TBW) measured by deuterium dilution was used to determine fat-free mass (FFM) in 383 subjects. Anthropometric and BIA variables were also measured. Only 377 subjects were included in the analysis, randomly divided into development and cross-validation groups after stratification by gender. Stepwise model selection was used to generate the model and Bland-Altman analysis was used to test agreement. Results: FFM = 2.95 - 3.89 (Gender) + 0.514 (Ht²/Z) + 0.090 (Waist) + 0.156 (Body weight). The model fit parameters were an R² of 0.88, a total F-ratio of 314.3, and an SEE of 3.3. None of the published BIA equations met the criteria for agreement. The new BIA equation underestimated FFM by just 0.3 kg in the cross-validation sample. The mean difference between FFM by TBW and FFM by the new BIA equation was not significant; 95% of the differences were within the limits of agreement of -6.3 to 6.9 kg of FFM. There was no significant association between the differences and their averages (r = 0.008, p = 0.2). Conclusions: This new BIA equation offers a valid option, compared with some of the currently published BIA equations, for estimating FFM in elderly subjects from five developing countries.
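A worked example of the reported prediction equation under stated assumptions: height (Ht, cm) and impedance (Z, ohm) enter as Ht²/Z, the usual BIA resistance index, and the 0/1 coding of the gender indicator is not given in the abstract, so the inputs below are purely illustrative.

```python
# Worked example of the reported equation; all input values are hypothetical.
def ffm_kg(gender, height_cm, impedance_ohm, waist_cm, weight_kg):
    # gender is the study's 0/1 indicator (coding not stated in the abstract)
    return (2.95 - 3.89 * gender + 0.514 * (height_cm ** 2 / impedance_ohm)
            + 0.090 * waist_cm + 0.156 * weight_kg)

# e.g. gender code 0, 160 cm, Z = 520 ohm, 88 cm waist, 62 kg body weight:
print(round(ffm_kg(0, 160, 520, 88, 62), 1))   # -> 45.8 kg fat-free mass
```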
Abstract:
With the electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment, all consumers are free to choose their electricity supplier. A fair insight into customers' behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper, Data Mining (DM) techniques are applied to electricity consumption data from a utility's client database. To form the different customer classes and find a set of representative consumption patterns, we have used the Two-Step algorithm, which is a hierarchical clustering algorithm. Each consumer class is represented by the load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model is constructed with the C5.0 classification algorithm.
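A rough sketch of the two-stage idea with stand-ins for tools that scikit-learn does not provide: agglomerative clustering substitutes for the Two-Step algorithm and a CART decision tree substitutes for C5.0; the load-profile data are synthetic.

```python
# Sketch of the cluster-then-characterize workflow with substitute algorithms
# (AgglomerativeClustering for Two-Step, DecisionTreeClassifier for C5.0).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
profiles = rng.random(size=(300, 24))          # 300 consumers x 24 hourly loads

# step 1: group consumers into classes of similar consumption patterns
classes = AgglomerativeClustering(n_clusters=4).fit_predict(profiles)
load_profiles = np.array([profiles[classes == k].mean(axis=0) for k in range(4)])
print("representative load profile per class:", load_profiles.shape)

# step 2: characterise each class with a classification tree (in practice one
# would use contract/billing attributes rather than the raw loads)
tree = DecisionTreeClassifier(max_depth=3).fit(profiles, classes)
print(tree.predict(profiles[:5]), classes[:5])
```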