985 results for Probabilistic Models


Relevance:

60.00%

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

60.00%

Abstract:

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Because the probabilistic models of microarray and SAGE technologies differ in important ways, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results: A new framework is proposed to select specific genes that distinguish different biological states based on the analysis of SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals derived from a probabilistic model of SAGE measurements are then used to identify, among all gene groups selected by the strong-genes method, those that distinguish the different states most reliably. A score that combines the credibility and bolstered error values is proposed to rank the groups of candidate genes. Results obtained using SAGE data from gliomas are presented, corroborating the proposed methodology. Conclusion: A model of counting data such as SAGE provides additional statistical information that allows a more robust analysis, and the methodology described in the paper incorporates this information. The method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and it may be adapted to other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for building classifiers.
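The bolstered error replaces each training sample by a probability kernel and measures the kernel mass that falls on the wrong side of the decision boundary. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a nearest-mean linear classifier, spherical Gaussian kernels with a fixed, hypothetical width `sigma`, and labels coded as +1/-1. `rank_gene_subsets` is a hypothetical helper that scores every gene pair by bolstered error, loosely mirroring a search for strong genes; the credibility-interval step based on the SAGE probabilistic model is not reproduced.

```python
import math
from itertools import combinations
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def nearest_mean_classifier(X, y):
    """Hypothetical helper: linear rule sign(w.x + b) through the midpoint of the two class means."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == -1]
    m_pos = [sum(col) / len(pos) for col in zip(*pos)]
    m_neg = [sum(col) / len(neg) for col in zip(*neg)]
    w = [a - c for a, c in zip(m_pos, m_neg)]
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, m_pos, m_neg))
    return w, b


def bolstered_resubstitution_error(X, y, w, b, sigma):
    """Gaussian-bolstered resubstitution error of sign(w.x + b); labels are +1/-1."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        return 0.5  # degenerate rule with no separating direction: report chance level
    total = 0.0
    for x, label in zip(X, y):
        # signed distance to the hyperplane, positive when x lies on its own class's side
        margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        # mass of the kernel N(x, sigma^2 I) lying on the wrong side of the hyperplane
        total += _PHI(-margin / sigma)
    return total / len(X)


def rank_gene_subsets(X, y, size=2, sigma=1.0):
    """Score every gene subset of the given size by bolstered error (lower is better)."""
    scores = []
    for genes in combinations(range(len(X[0])), size):
        Xs = [[x[g] for g in genes] for x in X]
        w, b = nearest_mean_classifier(Xs, y)
        scores.append((bolstered_resubstitution_error(Xs, y, w, b, sigma), genes))
    return sorted(scores)
```

For a linear rule the kernel mass on the wrong side has the closed form Φ(-d/σ), where d is the signed distance of the sample to the hyperplane, so no sampling is needed; non-linear classifiers would require numerical integration or Monte Carlo sampling of the kernels.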

Relevance:

60.00%

Abstract:

Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. To decide whether a given probability is sufficient, the most common approach is Bayesian binary classification, in which the probability of the model characterizing the sequence family of interest is compared with that of an alternative probability model. A null model can serve as this alternative; this is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, including the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study evaluating the impact of the choice of null model on the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification, which is crucial for reducing the costs of biological validation. Results: In all tests, the target null model produced the lowest number of false positives when random sequences were used as the test set. The study was performed on DNA sequences, using GC content as the measure of compositional bias, but the results should also hold for protein sequences. To broaden the applicability of the results, the study used randomly generated sequences. Previous studies were performed on amino acid sequences, used only one probabilistic model (HMM) and a specific benchmark, and lacked more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the uniform model exhibits a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies.
In these cases the target model is more dependable for biological validation due to its higher specificity.
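The effect of the null model on a log-odds score can be seen with a deliberately simple example. This sketch assumes a position-independent family model rather than the HMMs used by HMMER, SAM and INFERNAL, and the GC-rich family distribution and test sequence are hypothetical. A sequence that merely shares the family's 70% GC content scores about +8.2 nats against the uniform null, but about 0.01 against the target null estimated from its own composition, which is the kind of compositional false positive the target null suppresses.

```python
from collections import Counter
from math import log

ALPHABET = "ACGT"


def log_odds_score(seq, model, null):
    """Sum over residues of log(P_model / P_null); positive values favour the family model."""
    return sum(log(model[r] / null[r]) for r in seq)


def uniform_null():
    return {r: 1.0 / len(ALPHABET) for r in ALPHABET}


def target_null(seq, pseudocount=1.0):
    """Null model estimated from the residue composition of the target sequence itself."""
    counts = Counter(seq)
    total = len(seq) + pseudocount * len(ALPHABET)
    return {r: (counts[r] + pseudocount) / total for r in ALPHABET}


# Hypothetical GC-rich family model (position-independent, for illustration only)
family = {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15}
# An unrelated sequence that only shares the family's GC content
seq = "GC" * 35 + "AT" * 15

score_uniform = log_odds_score(seq, family, uniform_null())
score_target = log_odds_score(seq, family, target_null(seq))
```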

Relevance:

60.00%

Abstract:

Forecasting the time, location, nature, and scale of volcanic eruptions is one of the most urgent aspects of modern applied volcanology. The reliability of probabilistic forecasting procedures is strongly related to the reliability of the input information provided, which requires objective criteria for interpreting historical and monitoring data. For this reason, both detailed analysis of past data and more basic research into the processes of volcanism are fundamental tasks of a continuous information-gain process; in this way, the precursor events of eruptions can be better interpreted in terms of their physical meaning, with correlated uncertainties. This should lead to better predictions of the nature of eruptive events. In this work we have studied different problems associated with long- and short-term eruption forecasting. First, we discuss different approaches to the analysis of the eruptive history of a volcano, most of them generally applied for long-term eruption forecasting; furthermore, we present a model based on the characteristics of a Brownian passage-time process to describe recurrent eruptive activity, and apply it to long-term, time-dependent eruption forecasting (Chapter 1). Conversely, in an effort to define further monitoring parameters as input data for short-term eruption forecasting in probabilistic models (for example, the Bayesian Event Tree for eruption forecasting, BET_EF), we analyze some characteristics of typical seismic activity recorded at active volcanoes; in particular, we use methodologies that may be applied to analyze long-period (LP) events (Chapter 2) and volcano-tectonic (VT) seismic swarms (Chapter 3). Our analyses are generally oriented toward tracking phenomena that can provide information about magmatic processes.
Finally, we discuss some possible ways to integrate the results presented in Chapters 1 (for long-term EF), 2 and 3 (for short-term EF) in the BET_EF model (Chapter 4).
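The Brownian passage-time (BPT) model is the inverse Gaussian distribution parameterized by the mean recurrence time μ and the aperiodicity α. A minimal sketch of how such a renewal model yields a time-dependent forecast, assuming the standard BPT form and hypothetical parameter values (the fitted values from Chapter 1 are not given here): the probability of an eruption in the next window Δt, given that time t has elapsed since the last one, is (F(t + Δt) - F(t)) / (1 - F(t)).

```python
from math import exp, pi, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def bpt_pdf(t, mu, alpha):
    """BPT density with mean recurrence time mu and aperiodicity alpha."""
    if t <= 0:
        return 0.0
    return sqrt(mu / (2 * pi * alpha**2 * t**3)) * exp(-((t - mu) ** 2) / (2 * mu * alpha**2 * t))


def bpt_cdf(t, mu, alpha):
    """Closed-form BPT (inverse Gaussian) CDF; exp(2/alpha^2) restricts it to moderate alpha."""
    if t <= 0:
        return 0.0
    u1 = (sqrt(t / mu) - sqrt(mu / t)) / alpha
    u2 = (sqrt(t / mu) + sqrt(mu / t)) / alpha
    return _PHI(u1) + exp(2 / alpha**2) * _PHI(-u2)


def conditional_eruption_probability(elapsed, window, mu, alpha):
    """P(eruption in (elapsed, elapsed + window] | no eruption during the first `elapsed` years)."""
    f_now = bpt_cdf(elapsed, mu, alpha)
    f_later = bpt_cdf(elapsed + window, mu, alpha)
    return (f_later - f_now) / (1.0 - f_now)


# Hypothetical volcano: mean repose 10 yr, aperiodicity 0.5, 8 yr since the last eruption
p_next_year = conditional_eruption_probability(elapsed=8.0, window=1.0, mu=10.0, alpha=0.5)
```

Unlike a Poisson (memoryless) model, this conditional probability changes as the repose time grows, which is what makes the forecast time-dependent.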

Relevance:

60.00%

Abstract:

In a study of Lunar and Mars settlement concepts, the fundamental design assumptions in five technical areas were analyzed against a model list of occupational and environmental health concerns. The technical areas included the proposed science projects to be supported, habitat and construction issues, closed ecosystem issues, the "MMM" issues (mining, material processing, and manufacturing), and the human elements of physiology, behavior and mission approach. Four major lessons were learned. First, it is possible to relate public health concerns to complex technological development in a proactive design mode, which has the potential for long-term cost savings. Second, it became very apparent that, before any nation or international group commits to spending the billions needed to start and complete a lunar settlement over the next century, a significantly different approach from those previously proposed must be taken to solve the closed ecosystem and "MMM" problems. Third, the health concerns and technology issues to be addressed for human exploration of space appear to be fundamentally those to be solved for human habitation of the Earth (as a closed ecosystem) in the 21st century. Finally, it is proposed that ecosystem design modeling must develop new tools based on probabilistic models, as a step up from closed-circuit models.

Relevance:

60.00%

Abstract:

In this work, a probabilistic model was developed that uses derived probability density function theory to estimate the mean annual nitrate load transported by surface runoff, based on a functional relationship between runoff and nitrate load. The deterministic hydrological and water-quality model Simulator for Water Resources in Rural Basins - Water Quality (SWRRB-WQ) was used to estimate the nitrate load in surface runoff. This model takes as input the daily precipitation observed at the Olavarría Airport station from 1988 to 2002. A new methodology that estimates the uncertainty in the observed values was applied to calibrate the model. Both the probabilistic and the deterministic models were applied to a rural sub-basin of the Tapalqué stream (Buenos Aires Province, Argentina), and the nitrate loads estimated with the two models were then compared with the observations made at the stream section studied. The results show that the mean nitrate load obtained with the probabilistic model is of the same order of magnitude as the mean values observed and estimated with the SWRRB-WQ hydrological and water-quality model.
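Derived-distribution theory obtains the PDF of the load from the PDF of runoff through the functional relationship between them: if L = g(Q) is monotonic, f_L(l) = f_Q(g⁻¹(l)) · |dg⁻¹(l)/dl|. The sketch below assumes, purely for illustration, a power-law relation L = a·Q^b and exponentially distributed runoff with mean θ; the abstract does not state the relation or distribution actually used, and the parameter values are hypothetical. It checks the mean load computed from the derived PDF against the closed form a·Γ(b + 1)·θ^b.

```python
import math


def derived_load_pdf(load, a, b, theta):
    """PDF of L = a*Q**b for runoff Q ~ Exponential(mean theta): f_L(l) = f_Q(q(l)) |dq/dl|."""
    if load <= 0:
        return 0.0
    q = (load / a) ** (1.0 / b)  # inverse of the load-runoff relation
    dq_dl = q / (b * load)       # derivative of the inverse relation
    return math.exp(-q / theta) / theta * dq_dl


def mean_load_closed_form(a, b, theta):
    """E[L] = a * E[Q**b] = a * Gamma(b + 1) * theta**b for exponential runoff."""
    return a * math.gamma(b + 1) * theta**b


def mean_load_numeric(a, b, theta, l_max, n=200_000):
    """Midpoint-rule estimate of E[L], the integral of l * f_L(l) over (0, l_max)."""
    h = l_max / n
    return h * sum((i + 0.5) * h * derived_load_pdf((i + 0.5) * h, a, b, theta) for i in range(n))
```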

Relevance:

60.00%

Abstract:

Stereo video techniques are effective for estimating the space–time wave dynamics over an area of the ocean. Indeed, a stereo camera view allows retrieval of both spatial and temporal data whose statistical content is richer than that of time series retrieved from point wave probes. We present an application of the Wave Acquisition Stereo System (WASS) to the analysis of offshore video measurements of gravity waves in the Northern Adriatic Sea and near the southern coast of the Crimean peninsula, in the Black Sea. We use classical epipolar techniques to reconstruct the sea surface from the stereo pairs sequentially in time, i.e., as a sequence of spatial snapshots. We also present a variational approach that exploits the entire image data set to provide a global space–time imaging of the sea surface, i.e., a simultaneous reconstruction of several spatial snapshots of the surface that guarantees continuity of the sea surface in both space and time. Analysis of the WASS measurements shows that the sea surface can be accurately estimated in space and time together, yielding directional spectra and wave statistics at a point in time that agree well with probabilistic models. In particular, WASS stereo imaging captures typical features of the wave surface, especially the crest-to-trough asymmetry due to second-order nonlinearities, and the observed shapes of large waves are fairly well described by theoretical models based on the theory of quasi-determinism (Boccotti, 2000). Further, we investigate space–time extremes of the observed stationary sea states, i.e., the largest surface wave heights expected over a given area during the sea state duration. The WASS analysis provides the first experimental proof that a space–time extreme is generally larger than that observed in time via point measurements, in agreement with predictions based on stochastic theories for global maxima of Gaussian fields.
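Once a stereo pair has been rectified, the epipolar constraint reduces the search for a match to a single image row, and the depth of a surface point follows from its disparity. The sketch assumes ideal, already-rectified pinhole cameras with focal length `f` (in pixels), baseline `baseline` and principal point (`cx`, `cy`); calibration, dense matching and the variational space–time reconstruction of WASS are not reproduced, and any parameter values are hypothetical.

```python
def project_rectified(point, f, baseline, cx, cy):
    """Project a 3-D point (left-camera frame) into a rectified pair; returns (u_left, u_right, v)."""
    X, Y, Z = point
    u_left = cx + f * X / Z
    u_right = cx + f * (X - baseline) / Z  # right camera shifted by the baseline along x
    v = cy + f * Y / Z                     # same row in both images after rectification
    return u_left, u_right, v


def triangulate_rectified(u_left, u_right, v, f, baseline, cx, cy):
    """Recover (X, Y, Z) from a match lying on the same row of a rectified pair."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    Z = f * baseline / disparity  # depth is inversely proportional to disparity
    X = (u_left - cx) * Z / f
    Y = (v - cy) * Z / f
    return X, Y, Z
```

Applying the triangulation to every matched pixel gives a 3-D point cloud from which the sea-surface elevation can be gridded, one snapshot per frame pair.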

Relevance:

60.00%

Abstract:

There are about 1,300 large dams in Spain, 20% of which were built before 1960. The fact that many old dams are still in operation has prompted interest in reevaluating their safety using new or updated tools that incorporate state-of-the-art failure modes, geotechnical concepts and new safety assessment techniques. For instance, one common design approach for gravity dams considers sliding through the dam-foundation interface, using a simple linear Mohr-Coulomb failure criterion with constant friction angle and cohesion parameters. However, aspects such as the persistence of joint sets in the rock mass below the dam foundation, other failure criteria proposed for rock joints and rock masses (e.g., the Hoek-Brown criterion), or the volumetric strains that occur during plastic failure of rock masses (i.e., the influence of dilatancy) are often not considered during the original dam design. In this context, an analytical methodology is proposed herein to assess the sliding stability of concrete dams, considering an extended failure mechanism in the rock foundation characterized by the presence of an inclined, impersistent joint set.
In particular, the possibility of a preexisting sub-horizontal and impersistent joint set is considered, with a potential failure surface that could extend through the rock mass; the safety factor is therefore computed by combining the strength along the rock joint (using the nonlinear Barton and Choubey (1977) and Barton and Bandis (1990) failure criteria) and along the rock mass (using the nonlinear failure criterion of Hoek and Brown (1980) in its generalized form from Hoek et al. (2002)). The proposed methodology also considers the influence of a non-associated flow rule, incorporated through a constant dilation angle (Hoek and Brown 1997). The new analytical methodology is used to assess the dam stability conditions with both a deterministic and a probabilistic model, which yield the sliding safety factor and the probability of failure, respectively. The deterministic model, implemented in MATLAB, is validated against numerical solutions computed with the finite difference code FLAC 6.0. The proposed deterministic model gives results very similar to those computed with FLAC; however, since the new formulation can be implemented in a spreadsheet, its computational cost is significantly lower, which makes parametric analyses of the influence of the different input parameters on the dam's safety easier to conduct. Once the model is validated, parametric analyses are conducted using the main parameters that describe the dam's foundation. From this study, the impact of the most influential parameters on the sliding stability analysis is obtained, and the error associated with the flow-rule assumption is assessed. The probability of failure is obtained using the First Order Reliability Method (FORM), and the probabilistic model is then validated using Monte Carlo simulation.
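The two families of non-linear failure criteria named above can be written compactly. This sketch uses the generalized Hoek-Brown expressions (Hoek et al. 2002), with a disturbance factor `d`, and the Barton-Choubey (1977) peak shear strength of a rock joint; the Barton-Bandis (1990) scale corrections, the dilatancy treatment and the combination of strengths along the failure surface are not included, and stresses are assumed to be in consistent units (e.g., MPa).

```python
import math


def hoek_brown_2002_sigma1(sigma3, sigma_ci, gsi, mi, d=0.0):
    """Generalised Hoek-Brown (2002): major principal stress at failure for a given sigma3."""
    mb = mi * math.exp((gsi - 100.0) / (28.0 - 14.0 * d))
    s = math.exp((gsi - 100.0) / (9.0 - 3.0 * d))
    a = 0.5 + (math.exp(-gsi / 15.0) - math.exp(-20.0 / 3.0)) / 6.0
    base = mb * sigma3 / sigma_ci + s
    if base < 0:
        raise ValueError("sigma3 is below the rock-mass tensile strength")
    return sigma3 + sigma_ci * base**a


def barton_choubey_tau(sigma_n, jrc, jcs, phi_r_deg):
    """Barton-Choubey (1977) peak shear strength of a joint under effective normal stress sigma_n."""
    angle = jrc * math.log10(jcs / sigma_n) + phi_r_deg  # mobilised friction angle in degrees
    return sigma_n * math.tan(math.radians(angle))
```

With GSI = 100 and d = 0 the Hoek-Brown expression reduces to the intact-rock form (s = 1, a = 0.5), and with JRC = 0 the Barton-Choubey expression reduces to a purely frictional joint with the residual angle, which are convenient sanity checks.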
Results obtained with both methodologies show good agreement for cases in which the rock mass follows a non-associated flow rule. For cases with an associated flow rule, errors between 0.70% and 66% are obtained, with the best agreement for cases at, or close to, limit equilibrium conditions. The main advantage of FORM for sliding stability analyses of gravity dams is its computational efficiency: Monte Carlo simulations require at least 4 hours per execution, whereas FORM requires only 1 to 3 minutes per execution.
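The FORM-versus-Monte-Carlo comparison can be illustrated with a much simpler limit state than the one in the thesis. The sketch assumes sliding on a single plane with a linear Mohr-Coulomb criterion, g = c·A + N·tan φ - T, with cohesion c and friction angle φ as independent normal variables; all numbers are hypothetical. FORM locates the design point in standard-normal space with the Hasofer-Lind/Rackwitz-Fiessler iteration and approximates the probability of failure as Φ(-β), while crude Monte Carlo counts samples with g < 0.

```python
import math
import random
from statistics import NormalDist

# Hypothetical single-plane sliding check (illustrative numbers, not from the thesis):
# g = c*A + N*tan(phi) - T, where g < 0 means sliding.
AREA, NORMAL_FORCE, DRIVING_FORCE = 100.0, 5000.0, 4000.0
MEANS = [20.0, 35.0]  # cohesion c (kPa), friction angle phi (degrees)
STDS = [6.0, 3.0]     # assumed independent normal variables


def limit_state(x):
    c, phi = x
    return c * AREA + NORMAL_FORCE * math.tan(math.radians(phi)) - DRIVING_FORCE


def limit_state_u(u):
    """Limit state in standard-normal space, x = mean + std * u."""
    return limit_state([m + s * ui for m, s, ui in zip(MEANS, STDS, u)])


def form_hlrf(tol=1e-6, max_iter=50, step=1e-6):
    """HL-RF iteration for the design point; returns (beta, probability of failure)."""
    u = [0.0] * len(MEANS)
    for _ in range(max_iter):
        g0 = limit_state_u(u)
        grad = []
        for i in range(len(u)):  # forward-difference gradient
            shifted = list(u)
            shifted[i] += step
            grad.append((limit_state_u(shifted) - g0) / step)
        norm2 = sum(gi * gi for gi in grad)
        scale = (sum(gi * ui for gi, ui in zip(grad, u)) - g0) / norm2
        u_next = [scale * gi for gi in grad]
        converged = math.dist(u_next, u) < tol
        u = u_next
        if converged:
            break
    beta = math.hypot(*u)  # distance from the (safe) mean point to the design point
    return beta, NormalDist().cdf(-beta)


def monte_carlo_pf(n=200_000, seed=1):
    """Crude Monte Carlo estimate of P(g < 0) for comparison with FORM."""
    rng = random.Random(seed)
    failures = sum(
        limit_state([rng.gauss(m, s) for m, s in zip(MEANS, STDS)]) < 0 for _ in range(n)
    )
    return failures / n
```

Even in this toy case the cost gap is visible: FORM needs only a handful of limit-state evaluations, whereas Monte Carlo needs hundreds of thousands to resolve a probability of failure of the order of 10⁻².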

Relevance:

60.00%

Abstract:

This doctoral dissertation proposes a model of the behavior of dental-clinic patients, based on service-quality perception (SERVQUAL), loyalty, Relational Marketing and relevant socioeconomic characteristics. In particular, the field study was carried out in the region of Madrid, Spain, during 2012 and 2013. The first stage in building the model consisted of data gathering. For this purpose, five interviews were conducted with expert dentists, and two different types of surveys were administered: one to the universe of dental-clinic patients and the other to the universe of dentists working in dental clinics in the Community of Madrid.
Samples of 200 patient surveys and 220 surveys of active dentists registered with the Ilustre Colegio Oficial de Odontólogos y Estomatólogos de la I Región Madrid were collected. In the second stage, data analysis, induction and synthesis of the final model were performed. The probabilistic graphical models methodology was used to build the final model, specifically a Bayesian Network, in which the variables (nodes) and their statistical and causal dependencies (directed arcs) were integrated and modeled, thus representing the knowledge obtained from the survey data and the scientific knowledge derived from previous research in the field. A Bayesian Network consisting of six main nodes was obtained, of which two are directly observable, “Revisit Intention” and “SERVQUAL”, while the remaining four are submodels (groupings of variables): “Attitudinal”, “Disease Information”, “Socioeconomical” and “Services”. The main conclusions derived from using the model as an inference tool, together with the analysis of the interviews, are: (i) the variables inside the “Attitudinal” node are the most sensitive and significant. Imputing particular values for the variables that make up the “Attitudinal” node (“RelationalMk”, “Satisfaction”, “Recommendation” and “Friendship”) yields high posterior probabilities for the loyalty of the dental-clinic patient, measured as revisit intention. (ii) In the “Disease Information” node, the causal dependence of “SERVQUAL” on “Perception of disease” stands out when “Perception of disease” is imputed, showing that the patient's perception of disease severity significantly conditions the perception of service quality.
As a notable example, imputing particular values for the “Clinic_Type” variable yields high posterior probabilities for “SERVQUAL” and “Revisit Intention”, showing that the clinic type significantly influences service-quality perception and loyalty (revisit intention). (iii) In the “Socioeconomical” submodel, the variable “Sex” proved non-significant; however, the “Age” and “Income” variables showed high variability in their posterior probabilities when some variable of the “Services” submodel was imputed, showing that these variables condition the intention to purchase services (“Services”), especially in the 30 to 50 age range for patients with incomes between 3000€ and 4000€. (iv) In the “Services” submodel, dental-clinic patients showed high prior probabilities of purchasing services such as oral and gingival therapy, “Dental Health Education” and “Parking”. (v) The behavioral loyalty measures used in the model, “Visit/year” and “Time_clinic”, did not add significant information, nor did the attitudinal loyalty variable “Churn Efford”. (vi) The interviews with expert dentists show that owners of traditional clinics have little willingness to apply new commercial strategies, owing to a lack of training in commercial management and a lack of resources and tools. In general, there is widespread resistance to new business models for dental clinics, especially franchises and their commercial policies. All of this evidences a lack of business management in the sector. As future lines of research, a deeper study of some statistical and causal relationships is proposed, such as SERVQUAL → Services, Satisfaction → Services, RelationalMK → Services and Perception of disease → Satisfaction, as well as new variables measuring behavioral loyalty that could improve the model, for example, patient spending and profitability per visit.
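Imputing a value in a Bayesian network amounts to conditioning on evidence and reading off posterior probabilities. The sketch below runs exact inference by enumeration on a hypothetical three-node chain, Clinic_Type → SERVQUAL → Revisit_Intention, with made-up conditional probability tables; it only shows the mechanics and does not reproduce the thesis's network, variables or estimates.

```python
from itertools import product

# Hypothetical conditional probability tables (illustration only)
P_CLINIC = {"traditional": 0.6, "franchise": 0.4}
P_SERVQUAL = {"traditional": {"high": 0.7, "low": 0.3},
              "franchise": {"high": 0.4, "low": 0.6}}
P_REVISIT = {"high": {"yes": 0.85, "no": 0.15},
             "low": {"yes": 0.35, "no": 0.65}}
DOMAINS = {"Clinic_Type": ["traditional", "franchise"],
           "SERVQUAL": ["high", "low"],
           "Revisit_Intention": ["yes", "no"]}


def joint(a):
    """Chain-rule factorisation of the joint probability of a full assignment."""
    return (P_CLINIC[a["Clinic_Type"]]
            * P_SERVQUAL[a["Clinic_Type"]][a["SERVQUAL"]]
            * P_REVISIT[a["SERVQUAL"]][a["Revisit_Intention"]])


def query(target, evidence):
    """Posterior distribution of `target` given `evidence` (dict variable -> value), by enumeration."""
    names = list(DOMAINS)
    scores = {}
    for values in product(*DOMAINS.values()):
        a = dict(zip(names, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # skip assignments inconsistent with the imputed values
        scores[a[target]] = scores.get(a[target], 0.0) + joint(a)
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


# "Imputing" Clinic_Type and reading the posterior of Revisit_Intention
posterior = query("Revisit_Intention", {"Clinic_Type": "traditional"})
```

Enumeration is exponential in the number of variables, so networks of realistic size are usually queried with variable elimination or junction-tree algorithms instead.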

Relevance:

60.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance:

60.00%

Abstract:

For second-hand products sold with warranty, the expected warranty cost of an item to the manufacturer depends on (i) the age and/or usage, as well as the maintenance history, of the item and (ii) the terms of the warranty policy. The paper develops probabilistic models to compute the expected warranty cost to the manufacturer when the items are sold with free replacement or pro rata warranties. © 2000 Elsevier Science Ltd. All rights reserved.
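For a second-hand item, what matters is the residual life after its current age A. A sketch, assuming a Weibull lifetime (hypothetical `shape` and `scale`), minimal repair of every failure under a free replacement warranty, and a linear refund under a pro rata warranty; these modeling choices are illustrative and are not necessarily those of the paper. Under minimal repair, the expected number of failures during a warranty of length W is the increase of the cumulative hazard, Λ(A + W) - Λ(A).

```python
import math


def weibull_cumulative_hazard(t, shape, scale):
    return (t / scale) ** shape


def frw_minimal_repair_cost(age, warranty, shape, scale, repair_cost):
    """FRW with minimal repair: repair cost times expected failures in (age, age + warranty]."""
    return repair_cost * (
        weibull_cumulative_hazard(age + warranty, shape, scale)
        - weibull_cumulative_hazard(age, shape, scale)
    )


def prw_refund_cost(age, warranty, shape, scale, price, n=10_000):
    """PRW: refund price * (1 - x / warranty) if the used item fails x time units after sale."""

    def survival(t):
        return math.exp(-weibull_cumulative_hazard(t, shape, scale))

    def density(t):
        return shape / scale * (t / scale) ** (shape - 1) * survival(t)

    h = warranty / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoint rule over the warranty period
        # residual-life density of an item that has already survived to `age`
        total += price * (1 - x / warranty) * density(age + x) / survival(age)
    return total * h
```

With shape = 1 (constant hazard) both expected costs are independent of the item's age, while shape > 1 (wear-out) makes older items more expensive to warrant.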

Relevance:

60.00%

Abstract:

It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system, PhiVis [Bishop98a], in several directions: (1) We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. (2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. (3) Using tools from differential geometry, we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
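The Generative Topographic Mapping that serves as the basic building block can be sketched in a few lines of NumPy. This is a minimal single GTM trained by EM, assuming a 10×10 latent grid, a 4×4 grid of Gaussian basis functions, PCA initialization and a small weight penalty `alpha`; the hierarchical tree, magnification factors and directional curvatures described in the abstract are not implemented, and all settings are hypothetical.

```python
import numpy as np


def _grid(n):
    g = np.linspace(-1.0, 1.0, n)
    return np.array([[a, b] for a in g for b in g])


def gtm_fit(T, grid_size=10, n_rbf=4, alpha=1e-3, n_iter=30):
    """Fit one GTM to data T (N x D) by EM; returns latent grid, basis matrix, weights, beta, log-likelihoods."""
    N, D = T.shape
    X = _grid(grid_size)                  # K latent points on a regular 2-D grid
    C = _grid(n_rbf)                      # RBF centres on a coarser grid
    width = 2.0 / (n_rbf - 1)             # RBF width equal to the centre spacing (a heuristic)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    Phi = np.hstack([np.exp(-d2 / (2 * width**2)), np.ones((len(X), 1))])  # K x (M + 1)

    # PCA initialisation: map the latent grid onto the plane of the two leading components
    mean = T.mean(axis=0)
    _, S, Vt = np.linalg.svd(T - mean, full_matrices=False)
    eig = S**2 / N
    Y0 = mean + X @ (Vt[:2] * np.sqrt(eig[:2])[:, None])
    W = np.linalg.lstsq(Phi, Y0, rcond=None)[0]
    beta = 1.0 / eig.mean()               # simple initial noise precision

    K = len(X)
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each latent point for each data point
        dist = (((Phi @ W)[:, None, :] - T[None, :, :]) ** 2).sum(-1)  # K x N
        log_p = -0.5 * beta * dist
        log_norm = np.logaddexp.reduce(log_p, axis=0)                 # N
        R = np.exp(log_p - log_norm)
        history.append(float(np.sum(log_norm + 0.5 * D * np.log(beta / (2 * np.pi)) - np.log(K))))
        # M-step: regularised least squares for W, then the noise precision beta
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (alpha / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)
        dist = (((Phi @ W)[:, None, :] - T[None, :, :]) ** 2).sum(-1)
        beta = N * D / float(np.sum(R * dist))
    return X, Phi, W, beta, history


def gtm_project(T, X, Phi, W, beta):
    """Posterior-mean latent coordinates of each data point (the points drawn in the 2-D plot)."""
    log_p = -0.5 * beta * (((Phi @ W)[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    R = np.exp(log_p - np.logaddexp.reduce(log_p, axis=0))
    return R.T @ X
```

In a hierarchical system, each child plot would be a further model of this kind trained on the data weighted by the responsibilities of the region selected in its parent plot.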

Relevance:

60.00%

Abstract:

It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system, PhiVis [Bishop98a], in several directions: (1) We allow for non-linear projection manifolds. The basic building block is the Generative Topographic Mapping (GTM). (2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. (3) Using tools from differential geometry, we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the ancestor visualization plots which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 18-dimensional data sets.