879 resultados para Gender classification model


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sin duda, el rostro humano ofrece mucha más información de la que pensamos. La cara transmite sin nuestro consentimiento señales no verbales, a partir de las interacciones faciales, que dejan al descubierto nuestro estado afectivo, actividad cognitiva, personalidad y enfermedades. Estudios recientes [OFT14, TODMS15] demuestran que muchas de nuestras decisiones sociales e interpersonales derivan de un previo análisis facial de la cara que nos permite establecer si esa persona es confiable, trabajadora, inteligente, etc. Esta interpretación, propensa a errores, deriva de la capacidad innata de los seres humanas de encontrar estas señales e interpretarlas. Esta capacidad es motivo de estudio, con un especial interés en desarrollar métodos que tengan la habilidad de calcular de manera automática estas señales o atributos asociados a la cara. Así, el interés por la estimación de atributos faciales ha crecido rápidamente en los últimos años por las diversas aplicaciones en que estos métodos pueden ser utilizados: marketing dirigido, sistemas de seguridad, interacción hombre-máquina, etc. Sin embargo, éstos están lejos de ser perfectos y robustos en cualquier dominio de problemas. La principal dificultad encontrada es causada por la alta variabilidad intra-clase debida a los cambios en la condición de la imagen: cambios de iluminación, oclusiones, expresiones faciales, edad, género, etnia, etc.; encontradas frecuentemente en imágenes adquiridas en entornos no controlados. Este de trabajo de investigación estudia técnicas de análisis de imágenes para estimar atributos faciales como el género, la edad y la postura, empleando métodos lineales y explotando las dependencias estadísticas entre estos atributos. Adicionalmente, nuestra propuesta se centrará en la construcción de estimadores que tengan una fuerte relación entre rendimiento y coste computacional. Con respecto a éste último punto, estudiamos un conjunto de estrategias para la clasificación de género y las comparamos con una propuesta basada en un clasificador Bayesiano y una adecuada extracción de características. Analizamos en profundidad el motivo de porqué las técnicas lineales no han logrado resultados competitivos hasta la fecha y mostramos cómo obtener rendimientos similares a las mejores técnicas no-lineales. Se propone un segundo algoritmo para la estimación de edad, basado en un regresor K-NN y una adecuada selección de características tal como se propuso para la clasificación de género. A partir de los experimentos desarrollados, observamos que el rendimiento de los clasificadores se reduce significativamente si los ´estos han sido entrenados y probados sobre diferentes bases de datos. Hemos encontrado que una de las causas es la existencia de dependencias entre atributos faciales que no han sido consideradas en la construcción de los clasificadores. Nuestro resultados demuestran que la variabilidad intra-clase puede ser reducida cuando se consideran las dependencias estadísticas entre los atributos faciales de el género, la edad y la pose; mejorando el rendimiento de nuestros clasificadores de atributos faciales con un coste computacional pequeño. Abstract Surely the human face provides much more information than we think. The face provides without our consent nonverbal cues from facial interactions that reveal our emotional state, cognitive activity, personality and disease. Recent studies [OFT14, TODMS15] show that many of our social and interpersonal decisions derive from a previous facial analysis that allows us to establish whether that person is trustworthy, hardworking, intelligent, etc. This error-prone interpretation derives from the innate ability of human beings to find and interpret these signals. This capability is being studied, with a special interest in developing methods that have the ability to automatically calculate these signs or attributes associated with the face. Thus, the interest in the estimation of facial attributes has grown rapidly in recent years by the various applications in which these methods can be used: targeted marketing, security systems, human-computer interaction, etc. However, these are far from being perfect and robust in any domain of problems. The main difficulty encountered is caused by the high intra-class variability due to changes in the condition of the image: lighting changes, occlusions, facial expressions, age, gender, ethnicity, etc.; often found in images acquired in uncontrolled environments. This research work studies image analysis techniques to estimate facial attributes such as gender, age and pose, using linear methods, and exploiting the statistical dependencies between these attributes. In addition, our proposal will focus on the construction of classifiers that have a good balance between performance and computational cost. We studied a set of strategies for gender classification and we compare them with a proposal based on a Bayesian classifier and a suitable feature extraction based on Linear Discriminant Analysis. We study in depth why linear techniques have failed to provide competitive results to date and show how to obtain similar performances to the best non-linear techniques. A second algorithm is proposed for estimating age, which is based on a K-NN regressor and proper selection of features such as those proposed for the classification of gender. From our experiments we note that performance estimates are significantly reduced if they have been trained and tested on different databases. We have found that one of the causes is the existence of dependencies between facial features that have not been considered in the construction of classifiers. Our results demonstrate that intra-class variability can be reduced when considering the statistical dependencies between facial attributes gender, age and pose, thus improving the performance of our classifiers with a reduced computational cost.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

En los últimos años han surgido nuevos campos de las tecnologías de la información que exploran el tratamiento de la gran cantidad de datos digitales existentes y cómo transformarlos en conocimiento explícito. Las técnicas de Procesamiento del Lenguaje Natural (NLP) son capaces de extraer información de los textos digitales presentados en forma narrativa. Además, las técnicas de machine learning clasifican instancias o ejemplos en función de sus atributos, en distintas categorías, aprendiendo de otros previamente clasificados. Los textos clínicos son una gran fuente de información no estructurada; en consecuencia, información no explotada en su totalidad. Algunos términos usados en textos clínicos se encuentran en una situación de afirmación, negación, hipótesis o histórica. La detección de esta situación es necesaria para la estructuración de información, pero a su vez tiene una gran complejidad. Extrayendo características lingüísticas de los elementos, o tokens, de los textos mediante NLP; transformando estos tokens en instancias y las características en atributos, podemos mediante técnicas de machine learning clasificarlos con el objetivo de detectar si se encuentran afirmados, negados, hipotéticos o históricos. La selección de los atributos que cada token debe tener para su clasificación, así como la selección del algoritmo de machine learning utilizado son elementos cruciales para la clasificación. Son, de hecho, los elementos que componen el modelo de clasificación. Consecuentemente, este trabajo aborda el proceso de extracción de características, selección de atributos y selección del algoritmo de machine learning para la detección de la negación en textos clínicos en español. Se expone un modelo para la clasificación que, mediante el algoritmo J48 y 35 atributos obtenidos de características lingüísticas (morfológicas y sintácticas) y disparadores de negación, detecta si un token está negado en 465 frases provenientes de textos clínicos con un F-Score del 73%, una exhaustividad del 66% y una precisión del 81% con una validación cruzada de 10 iteraciones. ---ABSTRACT--- New information technologies have emerged in the recent years which explore the processing of the huge amount of existing digital data and its transformation into knowledge. Natural Language Processing (NLP) techniques are able to extract certain features from digital texts. Additionally, through machine learning techniques it is feasible to classify instances according to different categories, learning from others previously classified. Clinical texts contain great amount of unstructured data, therefore information not fully exploited. Some terms (tokens) in clinical texts appear in different situations such as affirmed, negated, hypothetic or historic. Detecting this situation is necessary for the structuring of this data, however not simple. It is possible to detect whether if a token is negated, affirmed, hypothetic or historic by extracting its linguistic features by NLP; transforming these tokens into instances, the features into attributes, and classifying these instances through machine learning techniques. Selecting the attributes each instance must have, and choosing the machine learning algorithm are crucial issues for the classification. In fact, these elements set the classification model. Consequently, this work approaches the features retrieval as well as the attributes and algorithm selection process used by machine learning techniques for the detection of negation in clinical texts in Spanish. We present a classification model which, through J48 algorithm and 35 attributes from linguistic features (morphologic and syntactic) and negation triggers, detects whether if a token is negated in 465 sentences from historical records, with a result of 73% FScore, 66% recall and 81% precision using a 10-fold cross-validation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Uma imagem engloba informação que precisa ser organizada para interpretar e compreender seu conteúdo. Existem diversas técnicas computacionais para extrair a principal informação de uma imagem e podem ser divididas em três áreas: análise de cor, textura e forma. Uma das principais delas é a análise de forma, por descrever características de objetos baseadas em seus pontos fronteira. Propomos um método de caracterização de imagens, por meio da análise de forma, baseada nas propriedades espectrais do laplaciano em grafos. O procedimento construiu grafos G baseados nos pontos fronteira do objeto, cujas conexões entre vértices são determinadas por limiares T_l. A partir dos grafos obtêm-se a matriz de adjacência A e a matriz de graus D, as quais definem a matriz Laplaciana L=D -A. A decomposição espectral da matriz Laplaciana (autovalores) é investigada para descrever características das imagens. Duas abordagens são consideradas: a) Análise do vetor característico baseado em limiares e a histogramas, considera dois parâmetros o intervalo de classes IC_l e o limiar T_l; b) Análise do vetor característico baseado em vários limiares para autovalores fixos; os quais representam o segundo e último autovalor da matriz L. As técnicas foram testada em três coleções de imagens: sintéticas (Genéricas), parasitas intestinais (SADPI) e folhas de plantas (CNShape), cada uma destas com suas próprias características e desafios. Na avaliação dos resultados, empregamos o modelo de classificação support vector machine (SVM), o qual avalia nossas abordagens, determinando o índice de separação das categorias. A primeira abordagem obteve um acerto de 90 % com a coleção de imagens Genéricas, 88 % na coleção SADPI, e 72 % na coleção CNShape. Na segunda abordagem, obtém-se uma taxa de acerto de 97 % com a coleção de imagens Genéricas; 83 % para SADPI e 86 % no CNShape. Os resultados mostram que a classificação de imagens a partir do espectro do Laplaciano, consegue categorizá-las satisfatoriamente.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The paper proposes an ISE (Information goal, Search strategy, Evaluation threshold) user classification model based on Information Foraging Theory for understanding user interaction with content-based image retrieval (CBIR). The proposed model is verified by a multiple linear regression analysis based on 50 users' interaction features collected from a task-based user study of interactive CBIR systems. To our best knowledge, this is the first principled user classification model in CBIR verified by a formal and systematic qualitative analysis of extensive user interaction data. Copyright 2010 ACM.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This is the first of two linked papers exploring decision making in nursing which integrate research evidence from different clinical and academic disciplines. Currently there are many decision-making theories, each with their own distinctive concepts and terminology, and there is a tendency for separate disciplines to view their own decision-making processes as unique. Identifying good nursing decisions and where improvements can be made is therefore problematic, and this can undermine clinical and organizational effectiveness, as well as nurses' professional status. Within the unifying framework of psychological classification, the overall aim of the two papers is to clarify and compare terms, concepts and processes identified in a diversity of decision-making theories, and to demonstrate their underlying similarities. It is argued that the range of explanations used across disciplines can usefully be re-conceptualized as classification behaviour. This paper explores problems arising from multiple theories of decision making being applied to separate clinical disciplines. Attention is given to detrimental effects on nursing practice within the context of multidisciplinary health-care organizations and the changing role of nurses. The different theories are outlined and difficulties in applying them to nursing decisions highlighted. An alternative approach based on a general model of classification is then presented in detail to introduce its terminology and the unifying framework for interpreting all types of decisions. The classification model is used to provide the context for relating alternative philosophical approaches and to define decision-making activities common to all clinical domains. This may benefit nurses by improving multidisciplinary collaboration and weakening clinical elitism.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Original method and technology of systemological «Unit-Function-Object» analysis for solving complete ill-structured problems is proposed. The given visual grapho-analytical UFO technology for the fist time combines capabilities and advantages of the system and object approaches and can be used for business reengineering and for information systems design. UFO- technology procedures are formalized by pattern-theory methods and developed by embedding systemological conceptual classification models into the system-object analysis and software tools. Technology is based on natural classification and helps to investigate deep semantic regularities of subject domain and to take proper account of system-classes essential properties the most objectively. Systemological knowledge models are based on method which for the first time synthesizes system and classification analysis. It allows creating CASE-toolkit of a new generation for organizational modelling for companies’ sustainable development and competitive advantages providing.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Clinical decision support systems (CDSSs) often base their knowledge and advice on human expertise. Knowledge representation needs to be in a format that can be easily understood by human users as well as supporting ongoing knowledge engineering, including evolution and consistency of knowledge. This paper reports on the development of an ontology specification for managing knowledge engineering in a CDSS for assessing and managing risks associated with mental-health problems. The Galatean Risk and Safety Tool, GRiST, represents mental-health expertise in the form of a psychological model of classification. The hierarchical structure was directly represented in the machine using an XML document. Functionality of the model and knowledge management were controlled using attributes in the XML nodes, with an accompanying paper manual for specifying how end-user tools should behave when interfacing with the XML. This paper explains the advantages of using the web-ontology language, OWL, as the specification, details some of the issues and problems encountered in translating the psychological model to OWL, and shows how OWL benefits knowledge engineering. The conclusions are that OWL can have an important role in managing complex knowledge domains for systems based on human expertise without impeding the end-users' understanding of the knowledge base. The generic classification model underpinning GRiST makes it applicable to many decision domains and the accompanying OWL specification facilitates its implementation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Effective conservation and management of top predators requires a comprehensive understanding of their distributions and of the underlying biological and physical processes that affect these distributions. The Mid-Atlantic Bight shelf break system is a dynamic and productive region where at least 32 species of cetaceans have been recorded through various systematic and opportunistic marine mammal surveys from the 1970s through 2012. My dissertation characterizes the spatial distribution and habitat of cetaceans in the Mid-Atlantic Bight shelf break system by utilizing marine mammal line-transect survey data, synoptic multi-frequency active acoustic data, and fine-scale hydrographic data collected during the 2011 summer Atlantic Marine Assessment Program for Protected Species (AMAPPS) survey. Although studies describing cetacean habitat and distributions have been previously conducted in the Mid-Atlantic Bight, my research specifically focuses on the shelf break region to elucidate both the physical and biological processes that influence cetacean distribution patterns within this cetacean hotspot.

In Chapter One I review biologically important areas for cetaceans in the Atlantic waters of the United States. I describe the study area, the shelf break region of the Mid-Atlantic Bight, in terms of the general oceanography, productivity and biodiversity. According to recent habitat-based cetacean density models, the shelf break region is an area of high cetacean abundance and density, yet little research is directed at understanding the mechanisms that establish this region as a cetacean hotspot.

In Chapter Two I present the basic physical principles of sound in water and describe the methodology used to categorize opportunistically collected multi-frequency active acoustic data using frequency responses techniques. Frequency response classification methods are usually employed in conjunction with net-tow data, but the logistics of the 2011 AMAPPS survey did not allow for appropriate net-tow data to be collected. Biologically meaningful information can be extracted from acoustic scattering regions by comparing the frequency response curves of acoustic regions to theoretical curves of known scattering models. Using the five frequencies on the EK60 system (18, 38, 70, 120, and 200 kHz), three categories of scatterers were defined: fish-like (with swim bladder), nekton-like (e.g., euphausiids), and plankton-like (e.g., copepods). I also employed a multi-frequency acoustic categorization method using three frequencies (18, 38, and 120 kHz) that has been used in the Gulf of Maine and Georges Bank which is based the presence or absence of volume backscatter above a threshold. This method is more objective than the comparison of frequency response curves because it uses an established backscatter value for the threshold. By removing all data below the threshold, only strong scattering information is retained.

In Chapter Three I analyze the distribution of the categorized acoustic regions of interest during the daytime cross shelf transects. Over all transects, plankton-like acoustic regions of interest were detected most frequently, followed by fish-like acoustic regions and then nekton-like acoustic regions. Plankton-like detections were the only significantly different acoustic detections per kilometer, although nekton-like detections were only slightly not significant. Using the threshold categorization method by Jech and Michaels (2006) provides a more conservative and discrete detection of acoustic scatterers and allows me to retrieve backscatter values along transects in areas that have been categorized. This provides continuous data values that can be integrated at discrete spatial increments for wavelet analysis. Wavelet analysis indicates significant spatial scales of interest for fish-like and nekton-like acoustic backscatter range from one to four kilometers and vary among transects.

In Chapter Four I analyze the fine scale distribution of cetaceans in the shelf break system of the Mid-Atlantic Bight using corrected sightings per trackline region, classification trees, multidimensional scaling, and random forest analysis. I describe habitat for common dolphins, Risso’s dolphins and sperm whales. From the distribution of cetacean sightings, patterns of habitat start to emerge: within the shelf break region of the Mid-Atlantic Bight, common dolphins were sighted more prevalently over the shelf while sperm whales were more frequently found in the deep waters offshore and Risso’s dolphins were most prevalent at the shelf break. Multidimensional scaling presents clear environmental separation among common dolphins and Risso’s dolphins and sperm whales. The sperm whale random forest habitat model had the lowest misclassification error (0.30) and the Risso’s dolphin random forest habitat model had the greatest misclassification error (0.37). Shallow water depth (less than 148 meters) was the primary variable selected in the classification model for common dolphin habitat. Distance to surface density fronts and surface temperature fronts were the primary variables selected in the classification models to describe Risso’s dolphin habitat and sperm whale habitat respectively. When mapped back into geographic space, these three cetacean species occupy different fine-scale habitats within the dynamic Mid-Atlantic Bight shelf break system.

In Chapter Five I present a summary of the previous chapters and present potential analytical steps to address ecological questions pertaining the dynamic shelf break region. Taken together, the results of my dissertation demonstrate the use of opportunistically collected data in ecosystem studies; emphasize the need to incorporate middle trophic level data and oceanographic features into cetacean habitat models; and emphasize the importance of developing more mechanistic understanding of dynamic ecosystems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This essay explores whether the gender constructions in Joe Abercrombie’s Best Served Cold and Juliet Marillier’s Daughter of the Forest question or contribute to existing gender categories. The analysis is performed using Raewynn Connell’s gender structure model, Brian Attebery’s theory of fantasy as a "fuzzy set" and Maria Nikolajeva’s schedule for stereotypical gender traits. Thus, both of the texts were analyzed to determine if their contents, structures and reader responses create opportunities or act limiting, how the main characters are portrayed and how the books various power-, production-, emotional- and symbolic relations look like. The result of the analysis is that both of the books portray patriarchal worlds, sexual division of labor, misogyny and gender-binding statements. The characters in Daughter of the Forest are quite stereotypical, with some traits that exceed their gender, whilst the characters in Best Served Cold are all portrayed with traditionally manly traits (even the female main character). Therefor one can say that Best Served cold’s female protagonist is the only element in the books that fully questions prevailing gender categories.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

O presente trabalho tem como principal objetivo contribuir para o desenvolvimento de um modelo de gestão da cadeia de abastecimento sustentável, posteriormente operacionalizado num conjunto de empresas. O principal elemento diferenciador do modelo apresentado é o seu cariz operacional, focado na fase da implementação com a integração de um conjunto de práticas de apoio à sustentabilidade. Para responder às pressões da conjuntura económica atual, das alterações climáticas, da escassez de recursos e das desigualdades sociais é necessário desenvolver de forma consolidada e abrangente um novo paradigma de gestão nas empresas. Muitas destas pressões fazem-se sentir nas atividades da Gestão da Cadeia de Abastecimento. O grande desafio é conseguir que as empresas obtenham bons resultados económicos, sociais e ambientais. A sustentabilidade tem sido abordada como a área de estudo donde deverá emergir este novo paradigma de gestão. Atendendo a esta problemática, a principal questão de investigação do presente trabalho é “Como se implementa a Sustentabilidade na Gestão da Cadeia de Abastecimento?” A metodologia de investigação partiu da revisão da literatura que permitiu estruturar um conjunto de pressupostos teóricos, estruturados num modelo conceptual sobre a implementação da sustentabilidade na Gestão da Cadeia de Abastecimento. O modelo foi aplicado em dois grupos de estudos empíricos: Análise Qualitativa de Relatórios de Sustentabilidade publicados por seis empresas com atividade em Portugal (Sonae; Lipor, Galp; EDP; Portucel e AutoEuropa); e o desenvolvimento de dois Estudos de Caso nas empresas Bosch Termotecnologia e Gestamp Aveiro. Os resultados permitiram o desenvolvimento de um Modelo Teórico de Implementação da Sustentabilidade na Gestão da Cadeia de Abastecimento. Bem como, um modelo de classificação das ferramentas de apoio à implementação da sustentabilidade adequadas a cada etapa que constitui o modelo de implementação. No desenvolvimento deste trabalho, acreditou-se que o caminho da sustentabilidade é possível e tangível. Os modelos desenvolvidos explicam que a integração da sustentabilidade se enceta pela estruturação da área da sustentabilidade na organização, prosseguindo com o processo de implementação constituído por quatro etapas: Envolvimento, Execução, Monitorização e Comunicação. A implementação necessita de ser abrangente a toda a cadeia de valor e apoiada num conjunto de ferramentas adequadas a cada fase de implementação.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents our work at 2016 FIRE CHIS. Given a CHIS query and a document associated with that query, the task is to classify the sentences in the document as relevant to the query or not; and further classify the relevant sentences to be supporting, neutral or opposing to the claim made in the query. In this paper, we present two different approaches to do the classification. With the first approach, we implement two models to satisfy the task. We first implement an information retrieval model to retrieve the sentences that are relevant to the query; and then we use supervised learning method to train a classification model to classify the relevant sentences into support, oppose or neutral. With the second approach, we only use machine learning techniques to learn a model and classify the sentences into four classes (relevant & support, relevant & neutral, relevant & oppose, irrelevant & neutral). Our submission for CHIS uses the first approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we describe one of the approaches of the participation of Universidade de Évora. Our approach is similar to usual methods where text is preprocessed, features are extracted, and then used in SVMs with cross validation. The main difference is that features used come from averages of word embeddings, specifically word2vec vectors. Using PAN 2016 dataset, we were able to achieve 44.8% and 68.2% for English age and gender classification respectively. We were also able to achieve 51.3% and 67.1% accuracy for Spanish age and gender classification. Finally, we report 71.9% accuracy for Dutch age classification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes various experiments done to investigate author profiling of tweets in 4 different languages – English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different person- ality dimensions – extroversion, stability, agreeableness, open- ness, and conscientiousness. Different sets of features were tested – bag-of-words, word ngrams, POS ngrams, and average of word embeddings. SVM was used as the classifier. Tfidf worked best for most English tasks while for most of the tasks from the other languages, the combination of the best features worked better.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hematological cancers are a heterogeneous family of diseases that can be divided into leukemias, lymphomas, and myelomas, often called “liquid tumors”. Since they cannot be surgically removable, chemotherapy represents the mainstay of their treatment. However, it still faces several challenges like drug resistance and low response rate, and the need for new anticancer agents is compelling. The drug discovery process is long-term, costly, and prone to high failure rates. With the rapid expansion of biological and chemical "big data", some computational techniques such as machine learning tools have been increasingly employed to speed up and economize the whole process. Machine learning algorithms can create complex models with the aim to determine the biological activity of compounds against several targets, based on their chemical properties. These models are defined as multi-target Quantitative Structure-Activity Relationship (mt-QSAR) and can be used to virtually screen small and large chemical libraries for the identification of new molecules with anticancer activity. The aim of my Ph.D. project was to employ machine learning techniques to build an mt-QSAR classification model for the prediction of cytotoxic drugs simultaneously active against 43 hematological cancer cell lines. For this purpose, first, I constructed a large and diversified dataset of molecules extracted from the ChEMBL database. Then, I compared the performance of different ML classification algorithms, until Random Forest was identified as the one returning the best predictions. Finally, I used different approaches to maximize the performance of the model, which achieved an accuracy of 88% by correctly classifying 93% of inactive molecules and 72% of active molecules in a validation set. This model was further applied to the virtual screening of a small dataset of molecules tested in our laboratory, where it showed 100% accuracy in correctly classifying all molecules. This result is confirmed by our previous in vitro experiments.