578 results for Dirichlet-multinomial


Relevance:

10.00%

Publisher:

Abstract:

With the recognition of the importance of evidence-based medicine, there is an emerging need for methods to systematically synthesize available data. Specifically, methods that provide accurate estimates of the characteristics of diagnostic tests are needed to help physicians make better clinical decisions. To provide more flexible approaches to the meta-analysis of diagnostic tests, we developed three Bayesian generalized linear models. Two of these models, a bivariate normal and a bivariate binomial model, analyzed pairs of sensitivity and specificity values while incorporating the correlation between these two outcome variables. Noninformative independent uniform priors were used for the variances of sensitivity and specificity and for their correlation. We also applied an inverse Wishart prior to check the sensitivity of the results to the choice of prior. The third model was a multinomial model in which the test results were modeled as multinomial random variables. All three models can include specific imaging techniques as covariates in order to compare their performance. Vague normal priors were assigned to the coefficients of the covariates. The computations were carried out with BUGS (Bayesian inference Using Gibbs Sampling), an implementation of Markov chain Monte Carlo techniques. We investigated the properties of the three proposed models through extensive simulation studies. We also applied these models to a previously published meta-analysis dataset on cervical cancer as well as to an unpublished melanoma dataset. In general, our findings show that the point estimates of sensitivity and specificity were consistent between the Bayesian and frequentist bivariate normal and binomial models. However, in the simulation studies, the estimates of the correlation coefficient from the Bayesian bivariate models were less accurate than those obtained by frequentist estimation, regardless of which prior distribution was used for the covariance matrix.
The Bayesian multinomial model consistently underestimated sensitivity and specificity regardless of the sample size and correlation coefficient. In conclusion, the Bayesian bivariate binomial model provides the most flexible framework for future applications because of the following strengths: (1) it facilitates direct comparison between different tests; (2) it captures the variability in both sensitivity and specificity simultaneously, as well as the correlation between the two; and (3) it can be applied directly to sparse data without ad hoc correction.
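Sensitivity is TP/(TP+FN) and specificity is TN/(TN+FP). The usual normal-approximation (frequentist) bivariate model works on their logits, which are undefined when a study reports a zero cell, so a common ad hoc fix is to add 0.5 to every cell. The Python sketch below illustrates that conventional preprocessing only, not the authors' Bayesian binomial model; the function name and the 0.5 default are illustrative assumptions.

```python
import math

def logit_sens_spec(tp, fn, tn, fp, correction=0.5):
    """Return (logit sensitivity, its variance, logit specificity, its variance).

    A continuity correction is added to every cell only when some cell is zero;
    this is the kind of ad hoc fix that a binomial likelihood makes unnecessary.
    """
    if 0 in (tp, fn, tn, fp):
        tp, fn, tn, fp = (c + correction for c in (tp, fn, tn, fp))
    logit_se = math.log(tp / fn)   # logit(TP / (TP + FN)) simplifies to log(TP / FN)
    logit_sp = math.log(tn / fp)   # logit(TN / (TN + FP)) simplifies to log(TN / FP)
    var_se = 1 / tp + 1 / fn       # delta-method variance of the logit sensitivity
    var_sp = 1 / tn + 1 / fp       # delta-method variance of the logit specificity
    return logit_se, var_se, logit_sp, var_sp

# A sparse study with no false negatives needs the correction to be usable at all.
sparse_study = logit_sens_spec(20, 0, 45, 5)
```

A binomial likelihood uses the raw counts directly, so a zero cell simply contributes a likelihood term and needs no adjustment.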

Abstract:

Objective. To examine associations between parental monitoring and adolescent alcohol and drug use. Methods. A total of 981 seventh-grade students from 10 inner-city middle schools were surveyed at the 3-month follow-up of an HIV, STD, and pregnancy prevention program. Data from the 549 control subjects were used for the analyses. Multinomial logistic regression was used to examine associations between five parental monitoring variables and substance use, coded as low risk [never drank alcohol or used drugs (0)], moderate risk [drank alcohol, no drug use (1)], and high risk [both drank alcohol and used drugs, or used drugs only (2)]. Results. Participants were 58.3% female, 39.6% African American, and 43.8% Hispanic, with a mean age of 13.3 years. Lifetime alcohol use was 47.9%, and lifetime drug use was 14.9%. After adjustment for gender, age, race, and family structure, each individual parental monitoring variable (perceived parental monitoring, less permissive parental monitoring, greater supervision (public places), greater supervision (teen clubs), and less time spent with older teens) was significant and protective for the moderate- and high-risk groups. When all 5 variables were entered into a single model, only perceived parental monitoring was significantly associated with the moderate-risk group (OR=0.40, 95% CI 0.29-0.55). For the high-risk group, 3 variables were significantly protective: perceived parental monitoring (OR=0.28, 95% CI 0.18-0.42), less time spent with older teens (OR=0.75, 95% CI 0.60-0.93), and greater supervision (public places) (OR=0.79, 95% CI 0.64-0.99). Conclusion. The association between parental monitoring and substance use is complex and varies across risk levels. Implications for intervention development are addressed.
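In a multinomial (baseline-category) logistic regression, each risk group gets its own equation contrasting it with the low-risk reference, and each reported OR is exp(β) for one predictor in one of those equations. The Python sketch below reproduces that structure using the two slopes implied by the reported ORs for perceived parental monitoring (0.40 and 0.28); the intercepts and the one-unit change in the monitoring score are hypothetical.

```python
import math

def category_probs(eta_moderate, eta_high):
    """Baseline-category logit: 'low risk' is the reference with linear predictor 0."""
    denom = 1.0 + math.exp(eta_moderate) + math.exp(eta_high)
    return {
        "low": 1.0 / denom,
        "moderate": math.exp(eta_moderate) / denom,
        "high": math.exp(eta_high) / denom,
    }

# Slopes implied by the reported ORs for perceived parental monitoring
# in the model with all five monitoring variables (others held fixed).
BETA_MODERATE = math.log(0.40)
BETA_HIGH = math.log(0.28)
# Hypothetical intercepts, chosen only for illustration.
B0_MODERATE, B0_HIGH = -0.2, -1.5

def probs_at(monitoring_score):
    return category_probs(B0_MODERATE + BETA_MODERATE * monitoring_score,
                          B0_HIGH + BETA_HIGH * monitoring_score)

p0, p1 = probs_at(0.0), probs_at(1.0)
# Odds of each risk group vs. low risk after a one-unit increase, divided by the odds before.
or_moderate = (p1["moderate"] / p1["low"]) / (p0["moderate"] / p0["low"])
or_high = (p1["high"] / p1["low"]) / (p0["high"] / p0["low"])
```

Because the moderate- and high-risk equations have separate coefficients, the same variable can carry different ORs at different risk levels, which is the pattern the results describe.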

Abstract:

Introduction. Injury mortality was classically described as having a trimodal distribution, with immediate deaths at the scene, early deaths due to hemorrhage, and late deaths from organ failure. We hypothesized that the development of trauma systems has improved prehospital care, early resuscitation, and critical care, and has altered this pattern. Methods. This is a population-based study of all trauma deaths in an urban county with a mature trauma system (n=678; median age 33 years; 81% male; 43% gunshot wounds, 20% motor vehicle crashes). Deaths were classified as immediate (at the scene), early (in hospital, ≤4 hours from injury), or late (>4 hours after injury). Multinomial regression was used to identify independent predictors of immediate and early vs. late deaths, adjusted for age, gender, race, intention, mechanism, toxicology, and cause of death. Results. There were 416 (61%) immediate, 199 (29%) early, and 63 (10%) late deaths. Immediate deaths remained unchanged, and early deaths occurred much earlier (median 52 minutes vs. 120 minutes). However, unlike the classic trimodal distribution, there was no late peak. Intentional injuries, alcohol intoxication, asphyxia, and injuries to the head and chest were independent predictors of immediate death. Alcohol intoxication and chest injuries were predictors of early death, while pelvic fractures and blunt assaults were associated with late death. Conclusion. Trauma deaths now have a bimodal distribution. The elimination of the late peak likely reflects advances in resuscitation and critical care that have reduced organ failure. Further reductions in mortality will likely come from the prevention of intentional injuries and of injuries associated with alcohol intoxication.

Abstract:

Much attention has been given to treating Operation Iraqi Freedom/Operation Enduring Freedom (OIF/OEF) Veterans with posttraumatic stress disorder (PTSD). However, little attention has been given to Veterans who do not meet the diagnostic criteria for PTSD but who may still benefit from intervention. Research is needed on how racial/ethnic background, level of social support, and comorbid mental health disorders affect OIF/OEF Veterans with varying levels of PTSD. The purpose of this dissertation is to examine the associations of comorbid Axis I disorders, race/ethnicity, and different levels of postdeployment social support and unit support with varying levels of PTSD among OIF/OEF Veterans. Data for this dissertation came from postdeployment screenings of OIF/OEF Veterans at a large Veterans Affairs hospital in southeast Texas. To examine the study hypotheses, we conducted multinomial logistic regressions on the clinician-reported data. The first article examined the prevalence of subthreshold and full PTSD and compared Axis I and alcohol use comorbidity rates among 1,362 OIF/OEF Veterans with varying levels of PTSD. The results suggest that OIF/OEF Veterans with subthreshold PTSD experience levels of psychological distress similar to those of Veterans with full PTSD, and they highlight the need to provide timely and appropriate mental health services to individuals who may not meet the diagnostic criteria for full PTSD. The second article investigated the association of postdeployment social support and unit support with varying levels of PTSD by race/ethnicity among 1,115 OIF/OEF Veterans. Postdeployment social support was found to be a protective factor against the development of PTSD among White, Black, and Hispanic Veterans, while deployment unit support was a protective factor only among Black Veterans. These results suggest that OIF/OEF Veterans of all races/ethnicities can benefit from strong social support systems.
The results of this study can help to inform treatment and interventions for OIF/OEF Veterans with varying levels of PTSD and social support.

Abstract:

Objectives: The purpose of this study is to understand the perceived effects of patient-dental staff communication and cultural diversity on the utilization of dental services in the U.S. by Saudi Arabian students who live in the U.S. and are enrolled in the King Abdullah Scholarship program. Methods: The study design was an analytical cross-sectional study. Data for this study were obtained from the Saudi Dental Services Utilization Survey, a voluntary internet survey available online for one month through Facebook. Ordered logistic regression and multinomial logistic regression analyses were used to measure the relationships of patient-dental staff communication and cultural diversity with the utilization of dental services. Results: Eight hundred and forty-seven responses were analyzed. Overall, the majority of Saudi students reported an excellent communication experience with dental providers in the U.S. More than 58% of respondents reported at least one regular dental visit in the previous year. Factors that influenced the use of regular dental care were the dentist's explanation of the treatment plan, the response of dental staff to the patient's needs, respectful and polite dental staff, dental staff kindness, the availability of up-to-date equipment, and overall communication with the dentist. However, the utilization of emergency dental care was not associated with any measure of patient-dental provider communication. Anticipated future utilization of dental care was associated with all aspects of patient-dental staff communication measured in this survey. Furthermore, greater utilization of regular dental care was related to respondents' perception of the importance of trustworthy dental staff, while the importance of a dentist's reputation was only marginally associated. Respondents' perception of the dentist's reputation was associated with greater use of emergency dental services.
Respondents were more likely to anticipate using dental care in the future if they perceived trustworthy dental staff and the dentist's reputation as factors influencing their use of dental services. Conclusions: Patient-dental staff communication was partially associated with utilization of regular dental care, not associated with utilization of emergency dental care, and broadly associated with anticipated future utilization of dental care. In addition, trustworthy dental staff and a dentist's reputation were considered strong influencing factors in the utilization of dental services.

Abstract:

Introduction: The average age of onset of breast cancer among Hispanic women is 50 years, more than a decade earlier than among non-Hispanic white women. Age at diagnosis is an important prognostic factor for breast cancer; younger age at onset is more likely to be associated with advanced disease, poorer prognosis, hormone receptor-negative tumors, and a greater likelihood of hereditary breast cancer. Studies of breast cancer risk factors, including reproductive risk factors, family history of breast cancer, and breast cancer subtype, have been conducted predominantly in non-Hispanic whites. Breast cancer is a heterogeneous disease comprising clinically, biologically, and epidemiologically distinct subtypes that also differ in their risk factors. The associations of reproductive risk factors and family history with breast cancer have been well documented in the literature. However, only a few studies have assessed these associations by breast cancer subtype in Hispanic populations. Methods: To assess the associations of reproductive risk factors and family history with breast cancer, we conducted three separate studies. First, we conducted a case-control study of 172 Mexican-American breast cancer cases and 344 age-matched controls residing in Harris County, TX, to assess reproductive and other risk factors. We used logistic regression to assess differences between cases and controls, adjusted for age at diagnosis and birthplace, and then multinomial logistic regression to compare reproductive risk factors among breast tumor subtypes. In a second study, we identified 139 breast cancer patients with a first- or second-degree family history of breast cancer and 298 without a family history from the ELLA Bi-National Breast Cancer Study.
In this analysis, we also fitted a multinomial logistic regression to evaluate associations between family history of breast cancer and breast cancer subtypes, and a logistic regression to estimate associations between breast cancer screening practices and family history of breast cancer. In the final study, we used a cross-sectional design with 7,279 Mexican-American women in the Mano a Mano Cohort Study. We evaluated associations between family history of breast cancer and breast cancer risk factors, including body mass index (BMI), lifestyle factors, migration history, and adherence to American Cancer Society (ACS) guidelines. Results: In our first analysis, reproductive risk factors differed in the magnitude and direction of their associations when cases and controls were stratified by age and birthplace. In our second study, a family history of breast cancer, and having at least one relative diagnosed at an early age (<50 years), was associated with triple-negative breast cancer (TNBC). Mammography prior to breast cancer diagnosis was associated with family history of breast cancer. In our third study, which assessed lifestyle factors, migration history, and family history of breast cancer, we found that women with a first-degree family history of breast cancer were more likely to be overweight or obese than their counterparts without a family history. There was no indication that having a family history led women to practice healthier lifestyle behaviors or to adhere to the ACS guidelines for cancer prevention. Conclusions: We observed that among Mexican-American women, the associations between reproductive risk factors and breast cancer differed by where the woman was born (US or Mexico). Having a family history of breast cancer, especially a first- or second-degree relative diagnosed at a younger age, was strongly associated with the TNBC subtype. These results are consistent with other published studies in this area.
Further, our results indicate that women with strong family histories of breast cancer are more likely to undergo mammography but not to engage in healthier lifestyle behaviors.

Abstract:

This study analyzes the relative importance of the factors that influence the decision to produce for foreign markets in the Chilean agricultural sector. Using data obtained from personal interviews with 368 farmers, the market/production decision was estimated with a multinomial logit model. Three market/production alternatives were analyzed: production aimed at the external market, production for the internal market but with expectations of being exported, and production targeted only at the internal market. Marginal effects, odds ratios, and predicted probabilities were used to identify the relevance of each variable. The results showed that a producer who is male, has a higher educational level, rents rather than owns the land, has an irrigated farm, and is located in an area with a high concentration of exporting producers will have a high probability of producing exportables. However, the factor with the highest impact on producing for the external market is the geographic concentration of exporting producers, that is, an export spillover effect. Indeed, when the concentration changes from 0 to its maximum (0.26), the odds of producing exportables rather than traditional products increase by a factor of 70 (against a factor of 10 in the case of irrigation).
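The factor of 70 can be turned into the implied logit coefficient with one line of arithmetic, assuming (as in a standard multinomial logit) that concentration enters the exportables-vs-traditional log-odds linearly. A minimal Python sketch under that assumption; the variable names are illustrative:

```python
import math

# Reported: odds of exportables vs. traditional products rise by a factor of 70
# when the concentration of exporting producers goes from 0 to its maximum, 0.26.
delta_concentration = 0.26
odds_factor = 70.0

# Under a linear log-odds, factor = exp(beta * delta), so beta = ln(factor) / delta.
beta_concentration = math.log(odds_factor) / delta_concentration

# Odds multiplier for a 0.01 increase in concentration (one percentage point).
per_point = math.exp(beta_concentration * 0.01)

# Irrigation is binary (0 -> 1), so its factor of 10 is simply exp(beta_irrigation).
beta_irrigation = math.log(10.0)
```

The implied coefficient is roughly 16.3, so each extra percentage point of exporter concentration multiplies the odds by about 1.18; the factor of 70 comes from accumulating that effect over the whole observed range.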

Abstract:

This study analyzes households' attitudes toward flood risk in Cotonou, specifically whether they are willing to leave flood-prone zones. It also analyzes attitudes toward the management of waste and dirty water. The data came from two sources: a survey conducted in March 2011 of one hundred and fifty randomly selected households living in flood-prone areas of Cotonou, and the 2006 Benin Living Standard Survey (the part covering 1,586 households in Cotonou). Climate data were also used. A multinomial probability model is used for the econometric analysis of attitudes toward flood risk, while attitudes toward the management of waste and dirty water are analyzed with a simple logit model. The results show that 55.3% of households agreed to move elsewhere, while 44.7% refused [because they are better off where they are (10.67%), because of the proximity of their activities (19.33%), or because the best way is to build infrastructure that will protect against floods, and the family house (14.67%)]. The authorities need to consider an alternative to their current policy, such as building socio-economic housing outside Cotonou and offering it to households living in flood-prone areas. In addition, access to formal education needs to be strengthened.
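The percentages reported for the 150 surveyed households can be checked arithmetically. The integer counts below are inferred, not reported in the abstract: they are the counts out of 150 that reproduce the published percentages, and the three reasons for refusing add up to the 44.7% who refused. A small Python check:

```python
# Counts inferred from the published percentages (n = 150); not reported directly.
N_HOUSEHOLDS = 150
agreed = 83
refusal_reasons = {
    "better off here": 16,
    "proximity of activities": 29,
    "protective infrastructure / family house": 22,
}

def pct(count):
    """Share of all surveyed households, in percent."""
    return 100 * count / N_HOUSEHOLDS

refused = sum(refusal_reasons.values())
summary = {"agreed": round(pct(agreed), 1), "refused": round(pct(refused), 1)}
reasons_pct = {reason: round(pct(n), 2) for reason, n in refusal_reasons.items()}
```

All three reason percentages are expressed over all 150 households, not over the 67 who refused, which is why they sum to 44.7% rather than 100%.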

Abstract:

We examine transport mode decisions by multinational firms to shed light on the role of freight logistics in multinational activity. Using a firm-level survey in Southeast Asia, we show that foreign ownership has a significantly positive and quantitatively large impact on the likelihood that air or sea transportation is chosen over truck shipping. This result is robust to controlling for shipping distance, cross-border freight, and transport infrastructure. Foreign-owned exporters and importers both tend to use air/sea transportation. Thus, our analysis presents a new distinction between multinational and domestic firms in their choice of transport modes.

Abstract:

Multi-dimensional Bayesian network classifiers (MBCs) are probabilistic graphical models recently proposed to deal with multi-dimensional classification problems, where each instance in the data set has to be assigned to more than one class variable. In this paper, we propose a Markov blanket-based approach for learning MBCs from data. It consists of determining the Markov blanket around each class variable using the HITON algorithm and then specifying the directionality over the MBC subgraphs. Our approach is applied to the problem of predicting the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39) in order to estimate the health-related quality of life of Parkinson’s patients. Fivefold cross-validation experiments were carried out on randomly generated synthetic data sets, the Yeast data set, and a real-world Parkinson’s disease data set containing 488 patients. The experimental study, including comparison with additional Bayesian network-based approaches, back propagation for multi-label learning, multi-label k-nearest neighbor, multinomial logistic regression, ordinary least squares, and censored least absolute deviations, shows encouraging results in terms of predictive accuracy as well as the identification of dependence relationships among class and feature variables.

Abstract:

The implementation of a charging policy for heavy goods vehicles in European Union (EU) member countries has been imposed to reflect the costs of construction and maintenance of infrastructure as well as externalities such as congestion, accidents, and environmental impact. In this context, EU countries approved the Eurovignette directive (1999/62/EC) and its amending directive (2006/38/EC), which established a legal framework to regulate the system of tolls. Even if this regulation seeks to increase the efficiency of freight transport, it will trigger direct and indirect effects on Spain’s regional economies by increasing transport costs. This paper presents the development of a multiregional input-output methodology (MRIO) with elastic trade coefficients to predict interregional trade, using transport attributes integrated in multinomial logit models. This method is highly useful for the ex-ante evaluation of transport policies because it accounts for the sensitivity of road freight transport costs and can determine the regional distributive and substitution economic effects in countries like Spain, whose regions differ in socio-demographic and economic attributes. It will thus be possible to determine cost-effective strategies under different policy scenarios. The MRIO model is then used to determine the impact on the employment rate of imposing a charge in the Madrid-Sevilla corridor in Spain. This methodology is important for measuring the impact on the employment rate, since the employment rate is one of the main macroeconomic indicators of Spain’s regional and national economic situation. A previous study (DESTINO) used an MRIO method to estimate the employment impacts of road pricing policy across Spanish regions, considering a fuel tax charge (€/liter) over the entire shortest-cost-path network for freight transport. It found that the variation in employment is expected to be substantial for some regions and negligible for others.
For example, in that Spanish case study, regional employment showed reductions ranging from 16.1% (Rioja) to 1.4% (Madrid region). This range seems to be related either to the intensity of freight transport in each region or to the dependence of regions on transport-intensive economic sectors. In fact, regions with freight-transport-intensive sectors will lose more jobs, while regions with a predominantly service economy undergo a fairly insignificant loss of employment. This paper focuses on evaluating a freight transport vehicle-kilometer charge (€/km) on a non-tolled motorway corridor (A-4) between Madrid and Sevilla (517 km). The results of implementing this road pricing policy show that the employment reductions are not as large as those found in the previous study, because this corridor does not affect the whole freight transport system of Spain.
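The abstract makes trade coefficients elastic by feeding transport attributes into multinomial logit models. As a hypothetical illustration only (not the paper's calibrated MRIO model), the Python sketch below computes the shares of one destination region's purchases coming from competing origin regions, with utility falling linearly in generalized transport cost, and shows how a €/km charge on the corridor shifts those shares. Apart from the 517 km corridor length, all origins, costs, distances, and the cost coefficient are invented for illustration.

```python
import math

def mnl_shares(utilities):
    """Multinomial logit shares; the max is subtracted for numerical stability."""
    m = max(utilities.values())
    expu = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(expu.values())
    return {k: v / total for k, v in expu.items()}

def trade_shares(origins, beta_cost, charge_per_km=0.0):
    """origins: name -> (alpha, base_cost_eur, km_on_charged_corridor).

    Utility = alpha - beta_cost * (base cost + charge * charged km). Hypothetical form.
    """
    utilities = {
        name: alpha - beta_cost * (cost + charge_per_km * km)
        for name, (alpha, cost, km) in origins.items()
    }
    return mnl_shares(utilities)

# Hypothetical suppliers of one destination region in the south.
origins = {
    "Madrid": (0.0, 900.0, 517.0),     # ships along the charged A-4 corridor
    "Andalucia": (0.3, 300.0, 0.0),    # intra-regional supply, no charged km
    "Valencia": (-0.1, 700.0, 0.0),    # uses other routes
}
before = trade_shares(origins, beta_cost=0.002)
after = trade_shares(origins, beta_cost=0.002, charge_per_km=0.15)
```

In an MRIO setting, shares like these would stand in for fixed trade coefficients, so a change in flows would propagate through the input-output system to regional output and employment.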

Abstract:

Twitter lists organise Twitter users into multiple, often overlapping, sets. We believe that these lists capture some form of emergent semantics, which may be useful to characterise. In this paper we describe an approach to such characterisation, which consists of deriving semantic relations between lists and users by analysing the co-occurrence of keywords in list names. We use the vector space model and Latent Dirichlet Allocation to obtain similar keywords according to co-occurrence patterns. These results are then compared to similarity measures relying on WordNet and to existing Linked Data sets. Results show that co-occurrence of keywords based on the members of the lists produces more synonyms and results more closely correlated with WordNet similarity measures.
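One way to realise the vector space part of this approach is to represent each keyword by the users who belong to lists whose names contain it, and then compare keywords by cosine similarity. The Python sketch below assumes that representation, whitespace tokenisation of list names, and raw membership counts as weights; the paper's exact weighting and tokenisation are not given in the abstract, and the example lists are invented.

```python
from collections import defaultdict
import math

def keyword_user_vectors(lists):
    """lists: iterable of (list_name, member_ids). Returns keyword -> {user: count}."""
    vectors = defaultdict(lambda: defaultdict(int))
    for name, members in lists:
        for kw in set(name.lower().replace("-", " ").split()):
            for user in members:
                vectors[kw][user] += 1   # keyword co-occurs with this list member
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical lists: two technology lists sharing most members, one unrelated list.
example_lists = [
    ("tech news", {"a", "b", "c"}),
    ("technology", {"a", "b", "c", "d"}),
    ("football", {"x", "y"}),
]
vectors = keyword_user_vectors(example_lists)
sim_related = cosine(vectors["tech"], vectors["technology"])
sim_unrelated = cosine(vectors["tech"], vectors["football"])
```

Here "tech" and "technology" never appear in the same list name, yet they come out similar (about 0.87) because their lists share members, which is the member-based co-occurrence signal the results refer to.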

Abstract:

This paper describes a novel approach to phonotactic language identification (LID) in which, instead of using soft counts based on phoneme lattices, we use a posteriorgram to obtain n-gram counts. The high-dimensional vectors of counts are reduced to low-dimensional units, for which we adapted the commonly used term i-vectors. The reduction is based on multinomial subspace modeling and is designed to work in the total-variability space. The proposed technique was tested on the NIST 2009 LRE set, with better results than a system based on soft counts (Cavg on 30 s: 3.15% vs. 3.43%), and with very good results when fused with an acoustic i-vector LID system (Cavg on 30 s: 2.4% for the acoustic system alone vs. 1.25% for the fusion). The proposed technique is also compared with another low-dimensional projection system based on PCA. Compared with the original soft counts, the proposed technique provides better results, reduces the problems due to sparse counts, and avoids the need to prune the lattices.
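To make "n-gram counts from a posteriorgram" concrete, one simple scheme treats consecutive frames as independent and accumulates expected counts as products of posteriors. The Python sketch below assumes exactly that frame-level scheme; it is an illustration, not necessarily the counting used in the paper, and frame-level counting inflates self-transitions compared with phone-level segments.

```python
import numpy as np

def expected_ngram_counts(posteriorgram):
    """posteriorgram: (T, P) array; row t holds the phoneme posteriors at frame t.

    Returns expected unigram counts (P,) and bigram counts (P, P), where
    bigrams[a, b] = sum over t of p_t(a) * p_{t+1}(b), assuming frame independence.
    """
    post = np.asarray(posteriorgram, dtype=float)
    unigrams = post.sum(axis=0)            # expected occurrences of each phoneme
    bigrams = post[:-1].T @ post[1:]       # sum over t of outer(p_t, p_{t+1})
    return unigrams, bigrams

# Three frames over a two-phoneme inventory.
uni, bi = expected_ngram_counts([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Because the counts are soft, every bigram receives some mass whenever both phonemes have nonzero posteriors, which is one reason such count vectors are less sparse than hard lattice-path counts.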

Abstract:

The Law for the Promotion and Development of Biofuels in Mexico, adopted in 2007, allows for the production of bioethanol and biodiesel. This production may conflict with food production and with natural ecosystems, and this thesis develops a microeconometric model that can serve as a basis for anticipating such conflicts and for designing agricultural policy measures to enhance the compatibility of biofuel production with food production and the conservation of natural ecosystems.
We estimate the elasticity of the area devoted to food crops with respect to changes in the economic margins of energy crops, using a sample of farms in three states of Mexico (Hidalgo, Querétaro, and Tamaulipas) and a mixed multinomial logit model. We found that this elasticity is significant, and we show how it can be useful for anticipating changes in the area under food crops and forests. The impact of various scenarios for crop gross margins on farmers' decisions is assessed, and the model is shown to be useful for detecting long-term trends of change in the crop mix, including forests.
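The key quantity here is a cross-elasticity from a mixed (random-coefficient) multinomial logit. Below is a minimal simulation-based Python sketch. It assumes a single random coefficient β on the gross margin, normally distributed across farmers, and hypothetical alternatives (food crop, energy crop, forest), margins, and parameters; the thesis's actual specification is not given in the abstract. For alternatives i ≠ j the simulated elasticity is E_ij = (x_j / P_i) · mean_r[−β_r L_ir L_jr], where L_ir is the logit probability under draw r.

```python
import numpy as np

def mixed_logit_cross_elasticity(margins, mu, sigma, i, j, draws=20000, seed=0):
    """Elasticity of the probability (land share) of alternative i with respect to
    the margin of alternative j (i != j), where utility_k = beta * margin_k and
    beta ~ Normal(mu, sigma) across farmers. All inputs are hypothetical."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(mu, sigma, size=(draws, 1))           # one coefficient per draw
    v = beta * np.asarray(margins, dtype=float)              # utilities, shape (draws, K)
    v -= v.max(axis=1, keepdims=True)                        # numerical stability
    ev = np.exp(v)
    L = ev / ev.sum(axis=1, keepdims=True)                   # logit probabilities per draw
    P_i = L[:, i].mean()                                     # simulated choice probability
    dPi_dxj = -(beta[:, 0] * L[:, i] * L[:, j]).mean()       # simulated derivative
    return margins[j] / P_i * dPi_dxj

# Hypothetical margins for [food crop, energy crop, forest].
margins = [1.0, 1.2, 0.8]
e_food_wrt_energy = mixed_logit_cross_elasticity(margins, mu=0.5, sigma=0.3, i=0, j=1)
```

With sigma=0 the expression collapses to the plain multinomial logit cross-elasticity −β x_j P_j; a positive mean coefficient gives a negative value, meaning higher energy-crop margins shrink the food-crop area.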

Abstract:

Nowadays, with the continuous and rapid evolution of information technologies and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such an enormous amount of data usually cannot be done manually, and requires the use of suitable machine learning and data mining techniques. Classification is one of the most important techniques and has been applied successfully in several areas. In general, classification consists of two main steps: first, learning a classification model, or classifier, from a training data set, and second, classifying new data instances using the learned classifier. Classification is supervised when all labels are present in the training data (i.e., fully labeled data), semi-supervised when only some labels are known (i.e., partially labeled data), and unsupervised when all labels are absent from the training data (i.e., unlabeled data). In addition to this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; it can also be categorized as stationary or time-changing depending on the characteristics of the data and the underlying rate of change. Throughout this thesis, we address the classification problem from three different perspectives, namely supervised multi-dimensional stationary classification, semi-supervised uni-dimensional time-changing classification, and supervised multi-dimensional time-changing classification. To carry out this task, we have mainly used Bayesian classifiers as models.
The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian classifiers from stationary data. The methods are proposed from two different points of view. The first method, called CB-MBC, is based on a greedy forward wrapper variable-selection strategy, while the second, called MB-MBC, is a filter variable-selection strategy with a constraint-based approach built on Markov blankets. Both methods have been applied to two important real-world problems, namely the prediction of reverse transcriptase and protease inhibitors for human immunodeficiency virus type 1 (HIV-1) infection, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC with state-of-the-art multi-dimensional classification methods, as well as with methods commonly used to solve the Parkinson’s disease prediction problem, namely multinomial logistic regression, ordinary least squares, and censored least absolute deviations. In both applications, the results were promising with respect to classification accuracy and to the analysis of the graphical structures, which identify known and novel interactions among the variables. The second contribution, addressing the semi-supervised uni-dimensional time-changing classification problem, consists of a new method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very fast generation process and in their concept-drift aspect.
That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model obsolete so that it must be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping to quantify and detect three possible types of change: in the predictors, in the class posterior, or in both. Then, if any change is detected, a new classification model is learned using the EM algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, since it can be applied to several classification models. Using two different models, the naive Bayes classifier and logistic regression, CPL-DS has been tested on synthetic data streams and has also been applied to the real-world problem of malware detection, in which newly received files must be continuously classified as malware or goodware. The experimental results show that our method is effective at detecting different types of change in partially labeled data streams and also achieves good classification accuracy. Finally, the third contribution, addressing the supervised multi-dimensional time-changing classification problem, consists of two adaptive methods, namely Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood as a metric together with the Page-Hinkley test. Then, if concept drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian classifier. The experimental study carried out on synthetic multi-dimensional data streams shows the merits of the proposed adaptive methods.
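The adaptive methods described here monitor the average log-likelihood with the Page-Hinkley test. A minimal Python sketch of the standard Page-Hinkley test follows; it detects an upward shift in the mean of a stream, so a drop in log-likelihood is detected by feeding it the negated values. The tolerance `delta` and threshold values are illustrative assumptions, since the thesis's settings are not given here.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    To flag a drop in average log-likelihood, feed it the negated values.
    delta (tolerated drift) and threshold (lambda) are illustrative defaults.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0        # m_T: cumulative deviation from the running mean
        self.min_cum = 0.0    # M_T: minimum of m_t seen so far

    def update(self, x):
        """Add one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n              # running mean of the stream
        self.cum += x - self.mean - self.delta              # update m_T
        self.min_cum = min(self.min_cum, self.cum)          # update M_T
        return self.cum - self.min_cum > self.threshold     # alarm when m_T - M_T > lambda

# Negated average log-likelihood: stable at 1.0, then jumping to 3.0 after a drift.
detector = PageHinkley()
alarms_before = [detector.update(1.0) for _ in range(100)]
alarms_after = [detector.update(3.0) for _ in range(10)]
```

After an alarm, a caller would either adapt the classifier locally (as LA-MB-MBC does) or relearn it (as GA-MB-MBC does) and then call `reset()` to start monitoring the new concept.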
ABSTRACT
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such huge amounts of data usually cannot be done manually and requires adequate machine learning and data mining techniques. Classification is one of the most important of these techniques and has been successfully applied in many areas. Roughly speaking, classification consists of two main steps: first, learning a classification model, or classifier, from the available training data; and second, using the learned classifier to classify new, unseen data instances. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional, depending on whether there is one class variable or several; it can also be categorized as stationary or streaming, depending on the characteristics of the data and the rate of change underlying it. In this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To this end, we mainly use Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers (MBCs) from stationary data.
The two methods are proposed from different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study compares CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, the results are promising in terms of classification accuracy, and the analysis of the learned MBC graphical structures identifies both known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their susceptibility to concept drift. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature drift, conditional drift (a change in the class posterior), or dual drift (both). Then, if any drift occurs, a new classification model is learned using the expectation-maximization (EM) algorithm; otherwise, the current classification model is kept unchanged.
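The drift-detection step of CPL-DS can be illustrated with a minimal sketch. It assumes a single discrete variable observed in two consecutive batches, Laplace smoothing of the empirical distributions (the `alpha` parameter), and a bootstrap null distribution obtained by resampling both batches from their pooled data. The function names, the 95% level, and the number of resamples are hypothetical choices, and the exact test used by CPL-DS may differ. Running the test on feature values would flag feature drift; an analogous test on the class posterior would flag conditional drift (not shown), and `drift_type` merely combines the two flags into the three kinds named above.

```python
import math
import random
from collections import Counter


def kl_divergence(p_sample, q_sample, categories, alpha=1.0):
    """KL(P || Q) between Laplace-smoothed empirical distributions of two
    samples of one discrete variable (alpha is an assumed smoothing constant)."""
    cp, cq = Counter(p_sample), Counter(q_sample)
    k = len(categories)
    kl = 0.0
    for c in categories:
        p = (cp[c] + alpha) / (len(p_sample) + alpha * k)
        q = (cq[c] + alpha) / (len(q_sample) + alpha * k)
        kl += p * math.log(p / q)
    return kl


def bootstrap_drift_test(old_batch, new_batch, n_boot=500, level=0.95, seed=0):
    """Return (drift_detected, observed_kl, threshold).

    Hypothetical procedure: the threshold is the `level` quantile of KL values
    between bootstrap batches drawn with replacement from the pooled data,
    i.e. under the hypothesis that both batches share one distribution."""
    rng = random.Random(seed)
    categories = sorted(set(old_batch) | set(new_batch))
    observed = kl_divergence(old_batch, new_batch, categories)
    pooled = list(old_batch) + list(new_batch)
    null_kls = sorted(
        kl_divergence(
            rng.choices(pooled, k=len(old_batch)),
            rng.choices(pooled, k=len(new_batch)),
            categories,
        )
        for _ in range(n_boot)
    )
    threshold = null_kls[min(int(level * n_boot), n_boot - 1)]
    return observed > threshold, observed, threshold


def drift_type(feature_drift, conditional_drift):
    """Combine the two flags into the three kinds of drift named in the text."""
    if feature_drift and conditional_drift:
        return "dual"
    if feature_drift:
        return "feature"
    if conditional_drift:
        return "conditional"
    return None


if __name__ == "__main__":
    old = ["a"] * 80 + ["b"] * 20
    new = ["a"] * 20 + ["b"] * 80
    detected, kl, thr = bootstrap_drift_test(old, new)
    print(detected, round(kl, 3), round(thr, 3))
    print(drift_type(detected, False))
```

Fixing the random seed keeps the bootstrap threshold reproducible between runs; in a real stream the old batch would be the data the current classifier was learned from.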
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested on synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift in partially labeled data streams and also achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out on synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
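The monitoring step shared by LA-MB-MBC and GA-MB-MBC can be sketched as a Page-Hinkley test applied to the average log-likelihood of each incoming batch. In this minimal sketch, the class and function names, the tolerance `delta`, the threshold `lam`, and the reset-after-alarm policy are assumptions. The test is written in its variant for detecting a drop in the mean, on the reasoning that a drift should lower the log-likelihood of new data under the current model. The local or global adaptation that follows an alarm is not shown.

```python
class PageHinkley:
    """Page-Hinkley test for a drop in the mean of a monitored score, here
    assumed to be the average log-likelihood of each incoming batch."""

    def __init__(self, delta=0.005, lam=5.0):
        self.delta = delta  # tolerated magnitude of change (assumed value)
        self.lam = lam      # alarm threshold lambda (assumed value)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the scores seen so far
        self.cum = 0.0      # m_T = sum_t (x_t - mean_t + delta)
        self.max_cum = 0.0  # M_T = max over t of m_t

    def update(self, x):
        """Add one score; return True if a drop in its mean is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.max_cum = max(self.max_cum, self.cum)
        return self.max_cum - self.cum > self.lam


def monitor(scores, detector):
    """Return the indices at which drift is signalled, resetting the detector
    after each alarm (the point where the classifier would be adapted)."""
    alarms = []
    for t, score in enumerate(scores):
        if detector.update(score):
            alarms.append(t)
            detector.reset()
    return alarms


if __name__ == "__main__":
    stream = [-2.0] * 50 + [-5.0] * 20
    print(monitor(stream, PageHinkley(delta=0.005, lam=5.0)))
```

With these settings, a stream whose average log-likelihood drops from -2.0 to -5.0 after 50 batches triggers a single alarm at batch index 51; a larger `lam` delays alarms but makes false alarms rarer.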