This paper considers the goal of identifying disease subgroups based on differences in observed symptom profiles. Commonly referred to as phenotype identification, this task is often addressed with unsupervised clustering techniques. We investigate the application of a Dirichlet Process mixture (DPM) model to this task. The model is defined by placing a Dirichlet Process (DP) prior on the unknown components of a mixture model, which allows uncertainty about the partitioning of the observed data into homogeneous subgroups to be expressed. To exemplify this approach, we consider an application to phenotype identification in Parkinson’s disease (PD), with symptom profiles collected using the Unified Parkinson’s Disease Rating Scale (UPDRS).
Keywords: Clustering, Dirichlet Process mixture, Parkinson’s disease, UPDRS.
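One way to see how a DP expresses uncertainty over partitions is through the Chinese restaurant process, the distribution over partitions that a DP induces. The sketch below is illustrative only and is not taken from the paper: the function name, the concentration parameter `alpha`, and the seed are assumptions, and it samples a prior partition without fitting any mixture model to symptom data.

```python
import random


def sample_crp_partition(n_items, alpha, seed=None):
    """Draw one random partition of n_items from a Chinese restaurant process.

    alpha is the (assumed) DP concentration parameter: larger values
    tend to produce more clusters.
    """
    rng = random.Random(seed)
    assignments = []    # cluster index for each item, in arrival order
    cluster_sizes = []  # current number of items in each cluster
    for _ in range(n_items):
        # Join existing cluster k with weight size_k, or open a new
        # cluster with weight alpha (weights are normalised by choices()).
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)
        else:
            cluster_sizes[k] += 1
        assignments.append(k)
    return assignments, cluster_sizes


if __name__ == "__main__":
    labels, sizes = sample_crp_partition(n_items=20, alpha=1.0, seed=0)
    print(labels)
    print(sizes)
```

Repeated draws with different seeds give different partitions and different numbers of clusters, which is the sense in which the number of subgroups is left uncertain rather than fixed in advance.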


This thesis addressed issues that have prevented qualitative researchers from using thematic discovery algorithms. The central hypothesis evaluated whether allowing qualitative researchers to interact with thematic discovery algorithms and incorporate domain knowledge improved their ability to address research questions and trust the derived themes. Non-negative Matrix Factorisation and Latent Dirichlet Allocation find latent themes within document collections but these algorithms are rarely used, because qualitative researchers do not trust and cannot interact with the themes that are automatically generated. The research determined the types of interactivity that qualitative researchers require and then evaluated interactive algorithms that matched these requirements. Theoretical contributions included the articulation of design guidelines for interactive thematic discovery algorithms, the development of an Evaluation Model and a Conceptual Framework for Interactive Content Analysis.


Combining local spatio-temporal features with a bag-of-visual-words model is a popular approach to human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting the extracted features into a form suitable for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification in order to improve overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure by taking the class labels into account and improve recognition accuracy significantly. MedLDA maximizes the likelihood and the within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while css-LDA learns separate class-specific topics instead of a common set of topics across the entire dataset. In our representation, topics are learned first and each video is then represented as a topic proportion vector, which is comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of these two representation techniques through experiments on two popular datasets. The results show significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.


The object of this dissertation is to study globally defined bounded p-harmonic functions on Cartan-Hadamard manifolds and Gromov hyperbolic metric measure spaces. Such functions are constructed by solving the so-called Dirichlet problem at infinity: finding a p-harmonic function on the space that extends continuously to the boundary at infinity and attains given boundary values there. The dissertation consists of an overview and three published research articles. In the first article, the Dirichlet problem at infinity is considered for more general A-harmonic functions on Cartan-Hadamard manifolds. In the special case of two dimensions, the Dirichlet problem at infinity is solved assuming only that the sectional curvature has a certain upper bound. A sharpness result is proved for this upper bound. In the second article, the Dirichlet problem at infinity is solved for p-harmonic functions on Cartan-Hadamard manifolds under the assumption that, outside a compact set, the sectional curvature is bounded from above and from below by functions that depend on the distance to a fixed point. The curvature bounds allow examples of quadratic decay and examples of exponential growth. In the final article, a generalization of the Dirichlet problem at infinity for p-harmonic functions is considered on Gromov hyperbolic metric measure spaces. Existence and uniqueness results are proved, and Cartan-Hadamard manifolds are considered as an application.


It is important to identify the ``correct'' number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M1 and M2 as given by $C_{d \times w} = M1_{d \times t} \times M2_{t \times w}$, where d is the number of documents present in the corpus, w is the size of the vocabulary, and t is the number of topics. The quality of the split depends on ``t'', the number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; this is shown by a `dip' at the right value for `t'.
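The symmetric KL-divergence at the core of such a measure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify which distributions are derived from the matrix factors, so the inputs here are simply two strictly positive discrete distributions of equal length, and the function names and the divergence values in the usage example are made up.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length sequences.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL-divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def pick_num_topics(divergence_by_t):
    """Return the candidate number of topics with the smallest divergence,
    i.e. the location of the dip."""
    return min(divergence_by_t, key=divergence_by_t.get)


if __name__ == "__main__":
    # Hypothetical divergence values for candidate topic counts t.
    divergences = {5: 1.8, 10: 0.9, 15: 0.4, 20: 0.7, 25: 1.3}
    print(pick_num_topics(divergences))
```

In practice the divergence would be recomputed for each candidate `t` after fitting LDA with that many topics, and the candidate at the dip would be selected.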


Song selection and mood are interdependent. If we capture a song’s sentiment, we can infer the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified by genre, which does not entirely reflect sentiment. We therefore require an unsupervised scheme to mine sentiments. Sentiments are classified into either two classes (positive/negative) or multiple classes (happy/angry/sad/...), depending on the application. We are interested in analyzing the feelings evoked by a song, which involves multi-class sentiments. To mine the hidden sentimental structure behind a song in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods, and the topics mined by LDA can represent moods. This yields a scheme for grouping songs of similar mood. For validation, we use a dataset of songs annotated with 6 moods by users of a particular website.


In this paper, we present a novel approach that uses topic models based on Latent Dirichlet Allocation (LDA) to generate single-document summaries. Our approach differs from other LDA-based approaches in that we identify the summary topics which best describe a given document and extract sentences only from those paragraphs within the document that are highly correlated with the summary topics. This ensures that our summaries always highlight the crux of the document without paying any attention to the grammar and structure of the document. Finally, we evaluate our summaries on the DUC 2002 single-document summarization corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.


We prove that given a Hecke-Maass form $f$ for $SL(2, \mathbb{Z})$ and a sufficiently large prime $q$, there exists a primitive Dirichlet character $\chi$ of conductor $q$ such that the L-values $L(1/2, f \otimes \chi)$ and $L(1/2, \chi)$ do not vanish.


The loss of genetic diversity is proceeding rapidly. To preserve and conserve plant genetic resources, it is necessary to inventory and characterize them (agronomically, morphologically, genetically, biochemically, etc.) in order to describe and differentiate the genetic material. Germplasm conservation strategies should be based on preserving populations in their habitat (in situ) and outside their habitat (ex situ). The present study was carried out from October to November 2002, with sampling in the Pacific zone in the departments of Chinandega, León, Managua, Masaya, Granada, Carazo and Rivas. Its objectives were to propose a descriptor guide for the pitahaya crop (Hylocereus undatus Britt & Rosse), to define descriptors that determine similarity and relationships among the different genetic materials, and to produce a catalogue of the quantitative characters of the floral and vegetative structure of this species, by means of descriptive statistics, correlation analysis, and numerical taxonomy techniques such as principal component analysis (PCA) and cluster analysis (CA). The species was found to flower with the first rains from May to June; the production period runs from May to November, with the highest yields obtained between August and September. Pitahaya is widely distributed in the country, grown in home gardens and commercially, supplying mainly the local market of Masaya and, to a lesser extent, the other departments. The best fruits were found in La Concepción (Cerro San Ignacio), owing to the environmental conditions and the adaptation of the genetic material in the area. It has multiple uses: as fresh fruit, as livestock feed, and for medicinal purposes.
The variable style diameter had a coefficient of variation (C.V.) of 83.11% and peel weight a C.V. of 61.24%, these two showing the greatest variation. The variables basal diameter of the flower and number of petals had C.V. values of 10.60% and 9.37% respectively, showing the least variation. Likewise, the shape of the lower and upper bracts, as well as the primary and secondary color, showed similarities. Principal component analysis determined that the first three components account for 46.83% of the total variation. The variables VOLFRU, PESFRU, LONFRU, PESCAS, VOLPUL and DIAFRU make up the first component, which accounts for 19.24%; the variables COLFRU, UNIFES, DIAESTI and LONESP make up the second component, which accounts for 14.81%; and NUMBRF, DIAEST and DIAFLO make up the third component, which accounts for 12.78%. These variables can be used to evaluate pitahaya materials.


Given that plantain cultivation in Nicaragua faces serious problems that affect yield, chiefly the use of propagation material of poor genetic and phytosanitary quality and poor agronomic management, the present study was carried out with the objectives of evaluating the dynamics of vegetative growth of plantain (Musa spp.) vitroplants, cultivar Cuerno (AAB), under field conditions; determining the effect of desuckering on yield; and identifying early traits related to yield by means of linear correlations, in order to facilitate the selection of mother plants as seed sources. The study was established at the El Plantel Experimental Center, located at km 42 of the Masaya-Tipitapa highway, municipality of Sambrano. Six blocks were established; in three of them desuckering was applied at seven months, which consisted of the total elimination of the suckers present by extracting them completely. A randomized complete block design (BCA) with a single-factor arrangement was used, with three blocks per treatment. Each block consisted of 4 rows 12 m long, containing 28 plantain plants spaced 2 m between plants and 2 m between rows. Ten plants from the useful plot area were evaluated. The variables evaluated were plant height (cm), pseudostem diameter (cm), number of leaves, leaf length and leaf width (cm); and the yield variables number of hands per bunch, number of fingers per bunch, finger length of the first hand (cm), finger length of the second-to-last hand (cm), diameter of the central finger of the first hand (cm), diameter of the central finger of the second-to-last hand (cm), bunch weight (kg), rachis length (cm) and rachis diameter (cm). A statistical difference was found only in the number of fingers, with maximum values of 30.57 fingers per bunch, for an estimated yield of 76,425 fingers per hectare, within acceptable ranges.
The lowest values occurred in plants with suckers, with 25.10 fingers per bunch, for a yield of 62,150 fingers per hectare.
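The per-hectare figure of 76,425 fingers (dedos) is consistent with the stated 2 m by 2 m spacing, as the short calculation below illustrates. It is a sketch under assumptions that the abstract does not state explicitly: that yield is computed as fingers per bunch times plants per hectare, that each plant produces one bunch, and that the whole hectare is planted at the trial spacing. The function names are illustrative.

```python
SQUARE_METRES_PER_HECTARE = 10_000


def plants_per_hectare(plant_spacing_m, row_spacing_m):
    """Planting density implied by a regular grid spacing."""
    return SQUARE_METRES_PER_HECTARE / (plant_spacing_m * row_spacing_m)


def fingers_per_hectare(fingers_per_bunch, plant_spacing_m=2.0, row_spacing_m=2.0):
    """Estimated fingers per hectare, assuming one bunch per plant."""
    return fingers_per_bunch * plants_per_hectare(plant_spacing_m, row_spacing_m)


if __name__ == "__main__":
    print(plants_per_hectare(2.0, 2.0))        # density at 2 m x 2 m
    print(round(fingers_per_hectare(30.57)))   # estimate for the maximum value
```

At 2,500 plants per hectare, 30.57 fingers per bunch gives 76,425 fingers per hectare, matching the reported value. Under the same assumptions, 25.10 fingers per bunch would give 62,750 rather than the reported 62,150; the abstract does not give enough detail to explain the difference.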