956 results for Active Appearance Models
Abstract:
Health professionals in several fields (pediatricians, nutritionists, orthopedists, endocrinologists, dentists, etc.) use bone age assessment to diagnose growth disorders in children. Through interviews with diagnostic imaging specialists and a review of the literature, we identified the Tanner and Whitehouse (TW) method as the most effective. Although it achieves better results than other methods, it is still not the most widely used because of its complexity. This work presents the possibility of automating the method so that it can be used more widely. It also addresses two important steps in bone age evaluation: the identification and the classification of regions of interest. Even in radiographs in which the hand was not positioned suitably for the TW method, the finger identification algorithm showed good results. Likewise, Active Appearance Models (AAM) showed good results in identifying regions of interest, even in radiographs with large variations in contrast and brightness. Appearance-based classification of the epiphyses into their stages of development also gave good results; the middle epiphysis of finger III (the middle finger) was chosen to demonstrate performance. The final results show an average hit rate of 90%, and in the misclassified cases the error was only one stage away from the correct stage.
Abstract:
In a clinical setting, pain is reported either through patient self-report or via an observer. Such measures are problematic because they 1) are subjective and 2) give no specific timing information. Coding pain as a series of facial action units (AUs) can avoid these issues, as it can be used to gain an objective measure of pain on a frame-by-frame basis. Using video data from patients with shoulder injuries, in this paper we describe an active appearance model (AAM)-based system that can automatically detect the frames in video in which a patient is in pain. This pain data set highlights the many challenges associated with spontaneous emotion detection, particularly those of expression and head movement due to the patient's reaction to pain. We show that the AAM can deal with these movements and can achieve significant improvements in both AU and pain detection performance compared to current state-of-the-art approaches, which utilize similarity-normalized appearance features only.
Abstract:
Spontaneous facial expressions differ from posed ones in appearance, timing and accompanying head movements. Still images cannot provide timing or head movement information directly. Indirectly, however, the distances between key points on a face, extracted from a still image using active shape models, can capture some movement and pose changes. This information is superimposed on information about non-rigid facial movement that is also part of the expression. Does geometric information improve the discrimination between spontaneous and posed facial expressions arising from discrete emotions? We investigate the performance of a machine vision system that discriminates between posed and spontaneous versions of six basic emotions using SIFT appearance-based features and FAP geometric features. Experimental results on the NVIE database demonstrate that fusing in geometric information leads to only a marginal improvement over appearance features. Using fused features, surprise is the easiest emotion to distinguish (83.4% accuracy), while disgust is the most difficult (76.1%). Our results show that the facial regions important for discriminating posed from spontaneous versions of one emotion differ from those important for classifying that emotion against other emotions. The distribution of the selected SIFT features shows that the mouth is more important for sadness, while the nose is more important for surprise; both the nose and the mouth are important for disgust, fear, and happiness. Eyebrows, eyes, nose and mouth are all important for anger.
In the pursuit of effective affective computing: the relationship between features and registration
Abstract:
For facial expression recognition systems to be applicable in the real world, they need to be able to detect and track a previously unseen person's face and its facial movements accurately in realistic environments. A highly plausible solution involves performing a "dense" form of alignment, where 60-70 fiducial facial points are tracked with high accuracy. The problem is that, in practice, this type of dense alignment had so far been impossible to achieve in a generic sense, mainly due to poor reliability and robustness. Instead, many expression detection methods have opted for a "coarse" form of face alignment, followed by the application of a biologically inspired appearance descriptor such as the histogram of oriented gradients or Gabor magnitudes. Encouragingly, recent advances in a number of dense alignment algorithms, e.g., constrained local models (CLMs), have demonstrated both high reliability and accuracy for unseen subjects. This raises the question: aside from countering illumination variation, what do these appearance descriptors do that standard pixel representations do not? In this paper, we show that, when close to perfect alignment is obtained, there is no real benefit in employing these different appearance-based representations (under consistent illumination conditions). When misalignment does occur, however, we show that these appearance descriptors do work well by encoding robustness to alignment error. For this work, we compared two popular methods for dense alignment (subject-dependent active appearance models versus subject-independent CLMs) on the task of action-unit detection. These comparisons were conducted through a battery of experiments across various publicly available data sets (i.e., CK+, Pain, M3, and GEMEP-FERA). We also report our performance in the 2011 Facial Expression Recognition and Analysis Challenge for the subject-independent task.
Abstract:
In this paper we propose a framework for both gradient descent image and object alignment in the Fourier domain. Our method centers upon the classical Lucas & Kanade (LK) algorithm where we represent the source and template/model in the complex 2D Fourier domain rather than in the spatial 2D domain. We refer to our approach as the Fourier LK (FLK) algorithm. The FLK formulation is advantageous when one pre-processes the source image and template/model with a bank of filters (e.g. oriented edges, Gabor, etc.) as: (i) it can handle substantial illumination variations, (ii) the inefficient pre-processing filter bank step can be subsumed within the FLK algorithm as a sparse diagonal weighting matrix, (iii) unlike traditional LK the computational cost is invariant to the number of filters and as a result far more efficient, and (iv) this approach can be extended to the inverse compositional form of the LK algorithm where nearly all steps (including Fourier transform and filter bank pre-processing) can be pre-computed leading to an extremely efficient and robust approach to gradient descent image matching. Further, these computational savings translate to non-rigid object alignment tasks that are considered extensions of the LK algorithm such as those found in Active Appearance Models (AAMs).
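Points (ii) and (iii) rest on Parseval's relation combined with the convolution theorem: the squared error summed over a bank of filtered signals equals a single weighted squared error in the Fourier domain, where the weights are the summed power spectra of the filters. The sketch below checks this numerically. It is an illustration only, not the authors' implementation: it assumes 1D signals, circular (periodic) convolution and a naive DFT, and all function names and test values are hypothetical.

```python
import cmath

def dft(x):
    """Naive unnormalized discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(x, g):
    """Circular convolution of signal x with a (shorter) filter g."""
    n = len(x)
    return [sum(g[m] * x[(t - m) % n] for m in range(len(g))) for t in range(n)]

def filtered_ssd_spatial(a, b, filters):
    """Sum of squared differences after filtering with every filter in the bank."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    return sum(v * v for g in filters for v in circular_convolve(diff, g))

def filtered_ssd_fourier(a, b, filters):
    """Same quantity computed with one diagonal weighting in the Fourier domain."""
    n = len(a)
    diff_hat = dft([ai - bi for ai, bi in zip(a, b)])
    # Diagonal weights: summed power spectra of the zero-padded filters.
    weights = [0.0] * n
    for g in filters:
        g_hat = dft(list(g) + [0.0] * (n - len(g)))
        weights = [w + abs(gk) ** 2 for w, gk in zip(weights, g_hat)]
    # Parseval's relation for the unnormalized DFT introduces the 1/n factor.
    return sum(w * abs(d) ** 2 for w, d in zip(weights, diff_hat)) / n

# Hypothetical source, template and filter bank (assumed values).
source = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5, 0.8, 0.3]
template = [0.3, 0.6, 0.5, 0.2, 0.4, 0.9, 0.1, 0.7]
bank = [[1.0, -1.0], [0.25, 0.5, 0.25], [1.0, 0.0, -1.0]]

print(filtered_ssd_spatial(source, template, bank))
print(filtered_ssd_fourier(source, template, bank))
```

The two printed values agree up to floating-point error. Because the weights depend only on the filter bank, they can be computed once, so the cost of evaluating the error in the Fourier form does not grow with the number of filters.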
Abstract:
Active Appearance Models (AAMs) employ a paradigm of inverting a synthesis model of how an object can vary in terms of shape and appearance. As a result, the ability of AAMs to register an unseen object image is intrinsically linked to two factors. First, how well the synthesis model can reconstruct the object image. Second, the degrees of freedom in the model. Fewer degrees of freedom yield a higher likelihood of good fitting performance. In this paper we look at how these seemingly contrasting factors can complement one another for the problem of AAM fitting of an ensemble of images stemming from a constrained set (e.g. an ensemble of face images of the same person).
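The trade-off between reconstruction ability and degrees of freedom can be illustrated with a toy linear synthesis model, in which an instance is generated as a mean vector plus a weighted sum of orthonormal modes. This is a minimal sketch under stated assumptions: the modes are hand-picked rather than learned from annotated training images, only a single 4-dimensional vector is modelled rather than joint shape and appearance, and all names and values are hypothetical. It shows only the first factor (more modes reconstruct an unseen instance at least as well); it does not model fitting performance.

```python
import math

def project(x, mean, modes):
    """Model parameters: coefficients of (x - mean) on each orthonormal mode."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(c * m for c, m in zip(centered, mode)) for mode in modes]

def synthesize(params, mean, modes):
    """Generate an instance: the mean plus a weighted sum of the modes."""
    out = list(mean)
    for p, mode in zip(params, modes):
        out = [o + p * m for o, m in zip(out, mode)]
    return out

def reconstruction_error(x, mean, modes):
    """Euclidean distance between x and its best reconstruction by the model."""
    recon = synthesize(project(x, mean, modes), mean, modes)
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))

# Hypothetical mean and hand-picked orthonormal modes (assumed values).
mean = [1.0, 1.0, 1.0, 1.0]
modes = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, 0.5, -0.5, -0.5],
]
unseen = [2.0, 0.5, 1.5, 3.0]

# Error when the model keeps 0, 1, 2 or 3 modes (degrees of freedom).
errors = [reconstruction_error(unseen, mean, modes[:k])
          for k in range(len(modes) + 1)]
print(errors)
```

The error never increases as modes are added, which is the reconstruction side of the trade-off; each extra mode is also one more parameter that a fitting procedure has to estimate.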
Abstract:
Novel techniques have been developed for the automatic recognition of human behaviour in challenging environments using information from visual and infra-red camera feeds. The techniques have been applied to two scenarios: recognising drivers' speech from lip movements, and recognising the behaviour of an audience watching a movie from facial features and body movements. The outcomes of the research in these two areas will be useful for improving the performance of voice recognition in automobiles for voice-based control, and for obtaining accurate movie interest ratings based on live audience response analysis.
Abstract:
Age estimation from facial images is receiving increasing attention for applications such as age-based access control and age-adaptive targeted marketing. Since even humans can be misled by the complex biological processes involved, finding a robust method remains a research challenge. In this paper, we propose a new framework that integrates Active Appearance Models (AAM), Local Binary Patterns (LBP), Gabor wavelets (GW) and Local Phase Quantization (LPQ) to obtain a highly discriminative feature representation able to model shape, appearance, wrinkles and skin spots. In addition, this paper proposes a novel flexible hierarchical age estimation approach consisting of a multi-class Support Vector Machine (SVM), which classifies a subject into an age group, followed by a Support Vector Regression (SVR), which estimates a specific age. Errors in the classification step, caused by the hard boundaries between age classes, are compensated for in the specific age estimation by a flexible overlapping of the age ranges. The performance of the proposed approach was evaluated on the FG-NET Aging and MORPH Album 2 datasets, achieving a mean absolute error (MAE) of 4.50 and 5.86 years respectively. The robustness of the proposed approach was also evaluated on a merge of both datasets, where a MAE of 5.20 years was achieved. Furthermore, we compared the age estimates made by humans with those of the proposed approach, and the comparison showed that the machine outperforms humans. The proposed approach is competitive with the current state of the art and provides additional robustness to blur, lighting and expression variation, brought about by the local phase features.
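The idea of overlapping age ranges can be sketched as follows: each group-specific regressor is trained on a range widened beyond the hard class boundaries, so that a subject sent to a neighbouring group by the classifier still lands in a regressor that has seen similar ages. This is only a sketch of the data partitioning. The SVM and SVR themselves are omitted, and the group boundaries, the overlap of 5 years, the sample ages and the function names are all hypothetical, not values from the paper.

```python
def overlapping_ranges(boundaries, overlap):
    """Widen each [low, high] age group by `overlap` years on both sides,
    clipped to the overall age span."""
    groups = list(zip(boundaries[:-1], boundaries[1:]))
    return [(max(boundaries[0], lo - overlap), min(boundaries[-1], hi + overlap))
            for lo, hi in groups]

def training_subset(ages, age_range):
    """Indices of the training samples whose age falls inside a widened range."""
    lo, hi = age_range
    return [i for i, age in enumerate(ages) if lo <= age <= hi]

# Hypothetical hard class boundaries used by the group classifier.
boundaries = [0, 13, 20, 40, 70]
ranges = overlapping_ranges(boundaries, overlap=5)

# Hypothetical training ages; each regressor trains on its widened range.
ages = [3, 12, 15, 19, 22, 38, 41, 66]
subsets = [training_subset(ages, r) for r in ranges]

print(ranges)
print(subsets)
```

With these assumed values, the sample aged 15 belongs to the hard group 13-20 but also appears in the training subsets of both neighbouring groups, so a misclassification into either neighbour can still be corrected by that group's regressor.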
Abstract:
Segmentation of medical imagery is a challenging problem due to the complexity of the images, as well as to the absence of models of the anatomy that fully capture the possible deformations in each structure. Brain tissue is a particularly complex structure, and its segmentation is an important step for studies in temporal change detection of morphology, as well as for 3D visualization in surgical planning. In this paper, we present a method for segmentation of brain tissue from magnetic resonance images that is a combination of three existing techniques from the Computer Vision literature: EM segmentation, binary morphology, and active contour models. Each of these techniques has been customized for the problem of brain tissue segmentation in a way that the resultant method is more robust than its components. Finally, we present the results of a parallel implementation of this method on IBM's supercomputer Power Visualization System for a database of 20 brain scans each with 256x256x124 voxels and validate those against segmentations generated by neuroanatomy experts.
Abstract:
This paper presents an optimum user-steered boundary tracking approach for image segmentation, which simulates the behavior of water flowing through a riverbed. The riverbed approach was devised using the image foresting transform with a never-exploited connectivity function. We analyze its properties in the derived image graphs and discuss its theoretical relation with other popular methods such as live wire and graph cuts. Several experiments show that riverbed can significantly reduce the number of user interactions (anchor points), as compared to live wire for objects with complex shapes. This paper also includes a discussion about how to combine different methods in order to take advantage of their complementary strengths.
Abstract:
Statistical shape analysis techniques commonly employed in the medical imaging community, such as active shape models or active appearance models, rely on principal component analysis (PCA) to decompose shape variability into a reduced set of interpretable components. In this paper we propose principal factor analysis (PFA) as an alternative and complementary tool to PCA, providing a decomposition into modes of variation that can be more easily interpreted while still being an efficient linear technique that performs dimensionality reduction (as opposed to independent component analysis, ICA). The key difference between PFA and PCA is that PFA models the covariance between variables, rather than the total variance in the data. The added value of PFA is illustrated on 2D landmark data of corpora callosa outlines. Then, a study of the 3D shape variability of the human left femur is performed. Finally, we report results on vector-valued 3D deformation fields resulting from non-rigid registration of ventricles in MRI of the brain.
Abstract:
There is a need for accurate predictions of ecosystem carbon (C) and water fluxes in field conditions. Previous research has shown that ecosystem properties can be predicted from community abundance-weighted means (CWM) of plant functional traits and measures of trait variability within a community (FDvar). The capacity for traits to predict C and water fluxes, and the seasonal dependency of these trait-function relationships, have not been fully explored. Here we measured daytime C and water fluxes over four seasons in grasslands of a range of successional ages in southern England. In a model selection procedure, we related these fluxes to environmental covariates and plant biomass measures before adding CWM and FDvar plant trait measures that were scaled up from measures of individual plants grown in greenhouse conditions. Models describing fluxes in periods of low biological activity contained few predictors, which were usually abiotic factors. In more biologically active periods, models contained more predictors, including plant trait measures. Field-based plant biomass measures were generally better predictors of fluxes than CWM and FDvar traits. However, when these measures were used in combination, traits accounted for additional variation. Where traits were significant predictors, their identity often reflected seasonal vegetation dynamics. These results suggest that database-derived trait measures can improve the prediction of ecosystem C and water fluxes. Controlled studies and those involving more detailed flux measurements are required to validate and explore these findings, a worthwhile effort given the potential for using simple vegetation measures to help predict landscape-scale fluxes.
Abstract:
Purpose: Proper delineation of ocular anatomy in 3D imaging is a major challenge, particularly when developing treatment plans for ocular diseases. Magnetic Resonance Imaging (MRI) is nowadays used in clinical practice for diagnosis confirmation and treatment planning of retinoblastoma in infants, where it serves as a source of information complementary to fundus or ultrasound imaging. Here we present a framework to fully automatically segment the eye anatomy in MRI based on 3D Active Shape Models (ASM). We validate the results and present a proof of concept for automatically segmenting pathological eyes. Material and Methods: Manual and automatic segmentation were performed on 24 images of healthy children's eyes (3.29±2.15 years). Imaging was performed using a 3T MRI scanner. The ASM comprises the lens, the vitreous humor, the sclera and the cornea. The model was fitted by first automatically detecting the position of the eye center, the lens and the optic nerve, then aligning the model and fitting it to the patient. We validated our segmentation method using leave-one-out cross-validation. The segmentation results were evaluated by measuring the overlap using the Dice Similarity Coefficient (DSC) and the mean distance error. Results: We obtained a DSC of 94.90±2.12% for the sclera and the cornea, 94.72±1.89% for the vitreous humor and 85.16±4.91% for the lens. The mean distance error was 0.26±0.09 mm. The entire process took 14 s on average per eye. Conclusion: We provide a reliable and accurate tool that enables clinicians to automatically segment the sclera, the cornea, the vitreous humor and the lens using MRI. We additionally present a proof of concept for fully automatically segmenting pathological eyes. This tool reduces the time needed for eye shape delineation and thus can help clinicians when planning eye treatment and confirming the extent of the tumor.
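For reference, the Dice Similarity Coefficient used above measures the overlap of two segmentations A and B as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for segmentations represented as sets of voxel coordinates; the representation, the function name, the convention for two empty masks and the toy masks are assumptions for illustration, not details from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two sets of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy example: two 4-voxel segmentations sharing 3 voxels (assumed values).
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
automatic = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}

print(dice_coefficient(manual, automatic))  # 2 * 3 / (4 + 4) = 0.75
```

A value of 1 means the two segmentations are identical and 0 means they do not overlap at all; the percentages reported above are this value multiplied by 100.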
Abstract:
The present work describes a new methodology for the automatic detection of the glottal space in laryngeal images taken from 15 videos recorded with videostroboscopic equipment by the ENT service of the Gregorio Marañón Hospital in Madrid. The system is based on active contour models (snakes). In the pre-processing stage, the algorithm combines traditional techniques (thresholding and median filtering) with more sophisticated techniques such as anisotropic filtering, which yields an image suitable for the snakes. The threshold is set to 85% of the maximum peak of the image histogram; above this value the pixel information is not relevant. The anisotropic filter makes it possible to distinguish two intensity levels, one for the background and one for the glottis. The initialization is based on the magnitude of the GVF field, which ensures an automatic process for selecting the initial contour. The performance of the algorithm is validated using the Pratt coefficient and compared against a manual segmentation and another automatic method based on the watershed transform.
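The validation metric can be sketched as follows, assuming that the "Pratt coefficient" refers to Pratt's figure of merit for contours: the average over detected contour points of 1 / (1 + α d²), where d is the distance to the nearest reference contour point, normalized by the larger of the two point counts. The scaling constant α = 1/9 is a commonly used value and is an assumption here, as are the function name and the toy contours; the abstract does not state these details.

```python
def pratt_figure_of_merit(detected, ideal, alpha=1.0 / 9.0):
    """Pratt's figure of merit between a detected and a reference contour.

    Both contours are lists of (x, y) pixel coordinates. Returns a value in
    (0, 1], where 1 means every detected point lies on the reference contour
    and both contours have the same number of points.
    """
    if not detected or not ideal:
        return 0.0
    total = 0.0
    for x, y in detected:
        # Squared distance from this detected point to the nearest ideal point.
        d2 = min((x - ix) ** 2 + (y - iy) ** 2 for ix, iy in ideal)
        total += 1.0 / (1.0 + alpha * d2)
    return total / max(len(detected), len(ideal))

# Toy contours (assumed values): a reference segment and a copy shifted 3 px.
reference = [(0, 0), (1, 0), (2, 0), (3, 0)]
shifted = [(x, y + 3) for x, y in reference]

print(pratt_figure_of_merit(reference, reference))
print(pratt_figure_of_merit(shifted, reference))
```

With α = 1/9, a contour displaced by 3 pixels scores about 0.5, so the measure penalizes both misplaced contour points and a mismatch in the number of points between the automatic and the manual segmentation.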