941 results for Maximum likelihood – Expectation maximization (ML-EM)


Relevance:

100.00%

Publisher:

Abstract:

ABSTRACT Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques that has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new incoming unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or it can also be categorized into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. 
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies considered, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures identifying known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting nature. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out of date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. 
CPL-DS is general as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, while also achieving good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
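Both streaming contributions rely on monitoring a score for change over time. The Page-Hinkley test mentioned above can be sketched as follows; in the thesis the monitored stream would be the per-batch average log-likelihood, while the parameter values below are illustrative only:

```python
def page_hinkley(stream, delta=0.005, lam=1.0):
    """Detect a downward shift in the mean of a stream (e.g., the average
    log-likelihood of the current classifier). Returns the 0-based index
    of the first alarm, or None if no drift is detected."""
    mean = 0.0
    m = 0.0       # cumulative deviation m_t
    m_min = 0.0   # running minimum of m_t
    for t, x in enumerate(stream, start=1):
        mean += (x - mean) / t       # incremental mean of the stream
        m += mean - x - delta        # grows when x falls below the mean
        m_min = min(m_min, m)
        if m - m_min > lam:          # alarm threshold exceeded
            return t - 1
    return None
```

On an alarm, a locally adaptive scheme would update only the affected parts of the model, while a globally adaptive one would relearn from scratch.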

Relevance:

100.00%

Publisher:

Abstract:

In the maximum parsimony (MP) and minimum evolution (ME) methods of phylogenetic inference, evolutionary trees are constructed by searching for the topology that shows the minimum number of mutational changes required (M) and the smallest sum of branch lengths (S), respectively, whereas in the maximum likelihood (ML) method the topology showing the highest maximum likelihood (A) of observing a given data set is chosen. However, the theoretical basis of the optimization principle remains unclear. We therefore examined the relationships of M, S, and A for the MP, ME, and ML trees with those for the true tree by using computer simulation. The results show that M and S are generally greater for the true tree than for the MP and ME trees when the number of nucleotides examined (n) is relatively small, whereas A is generally lower for the true tree than for the ML tree. This finding indicates that the optimization principle tends to give incorrect topologies when n is small. To deal with this disturbing property of the optimization principle, we suggest that more attention should be given to testing the statistical reliability of an estimated tree rather than to finding the optimal tree with excessive effort. When a reliability test is conducted, simplified MP, ME, and ML algorithms such as the neighbor-joining method generally give conclusions about phylogenetic inference very similar to those obtained by the more extensive tree search algorithms.
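For a fixed topology, the parsimony score M above can be computed site by site with Fitch's algorithm. A minimal sketch for fully bifurcating trees, with the topology encoded as nested 2-tuples of leaf names (an illustrative convention, not tied to any particular package):

```python
def fitch_score(tree, seqs):
    """Minimum number of state changes (M) required by one topology.
    `tree` is a nested 2-tuple of leaf names; `seqs` maps names to
    aligned sequences of equal length."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for site in range(n_sites):
        changes = 0

        def state_set(node):
            nonlocal changes
            if isinstance(node, str):            # leaf: observed state
                return {seqs[node][site]}
            left, right = map(state_set, node)   # Fitch rule at internal node
            if left & right:
                return left & right
            changes += 1                         # empty intersection: one change
            return left | right

        state_set(tree)
        total += changes
    return total
```

Searching over topologies to minimize this score is exactly the MP optimization whose small-n behaviour the abstract questions.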

Relevance:

100.00%

Publisher:

Abstract:

This paper deals with the estimation of a time-invariant channel spectrum from its own nonuniform samples, assuming there is a bound on the channel’s delay spread. Except for this last assumption, this is the basic estimation problem in systems providing channel spectral samples. However, as shown in the paper, the delay spread bound leads us to view the spectrum as a band-limited signal, rather than the Fourier transform of a tapped delay line (TDL). Using this alternative model, a linear estimator is presented that approximately minimizes the expected root-mean-square (RMS) error for a deterministic channel. Its main advantage over the TDL is that it takes into account the spectrum’s smoothness (time width), thus providing a performance improvement. The proposed estimator is compared numerically with the maximum likelihood (ML) estimator based on a TDL model in pilot-assisted channel estimation (PACE) for OFDM.
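The delay-spread bound can be exploited in a simple way: restrict the model to delays in [0, tau_max] and solve a least-squares problem for the delay-domain coefficients. The sketch below is a simplified stand-in for the estimator in the paper; the grid size and variable names are assumptions:

```python
import numpy as np

def estimate_spectrum(f_samp, h_samp, tau_max, f_eval, n_taps=6):
    """Least-squares channel-spectrum estimate from nonuniform frequency
    samples, assuming all delay components lie in [0, tau_max]."""
    taus = np.linspace(0.0, tau_max, n_taps)           # assumed delay grid
    A = np.exp(-2j * np.pi * np.outer(f_samp, taus))   # model matrix
    c, *_ = np.linalg.lstsq(A, h_samp, rcond=None)     # delay coefficients
    return np.exp(-2j * np.pi * np.outer(f_eval, taus)) @ c
```

With delays confined to a short interval, the spectrum varies smoothly in frequency, which is the smoothness the paper's band-limited model exploits over a plain TDL fit.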

Relevance:

100.00%

Publisher:

Abstract:

This work has, as its objective, the development of non-invasive and low-cost systems for monitoring and automatically diagnosing specific neonatal diseases by means of the analysis of suitable video signals. We focus on monitoring infants potentially at risk of diseases characterized by the presence or absence of rhythmic movements of one or more body parts. Seizures and respiratory diseases are specifically considered, but the approach is general. Seizures are defined as sudden neurological and behavioural alterations. They are age-dependent phenomena and the most common sign of central nervous system dysfunction. Neonatal seizures have onset within the 28th day of life in newborns at term and within the 44th week of conceptional age in preterm infants. Their main causes are hypoxic-ischaemic encephalopathy, intracranial haemorrhage, and sepsis. Studies indicate an incidence rate of neonatal seizures of 0.2% of live births, 1.1% for preterm neonates, and 1.3% for infants weighing less than 2500 g at birth. Neonatal seizures can be classified into four main categories: clonic, tonic, myoclonic, and subtle. Seizures in newborns have to be promptly and accurately recognized in order to establish timely treatments that could avoid an increase of the underlying brain damage. Respiratory diseases related to the occurrence of apnoea episodes may be caused by cerebrovascular events. Among the wide range of causes of apnoea, besides seizures, a relevant one is Congenital Central Hypoventilation Syndrome (CCHS). With a reported prevalence of 1 in 200,000 live births, CCHS, formerly known as Ondine's curse, is a rare life-threatening disorder characterized by a failure of the automatic control of breathing, caused by mutations in a gene classified as PHOX2B. CCHS manifests itself, in the neonatal period, with episodes of cyanosis or apnoea, especially during quiet sleep. The reported mortality rates range from 8% to 38% of newborns with genetically confirmed CCHS. 
Nowadays, CCHS is considered a disorder of autonomic regulation, with related risk of sudden infant death syndrome (SIDS). Currently, the standard method of diagnosis for both diseases is based on polysomnography, which combines a set of sensors such as electroencephalography (EEG), electromyography (EMG) and electrocardiography (ECG) electrodes, elastic belt sensors, a pulse oximeter and nasal flow meters. This monitoring system is very expensive, time-consuming, moderately invasive and requires particularly skilled medical personnel, not always available in a Neonatal Intensive Care Unit (NICU). Therefore, automatic, real-time and non-invasive monitoring equipment able to reliably recognize these diseases would be of significant value in the NICU. A very appealing monitoring tool to automatically detect neonatal seizures or breathing disorders may be based on acquiring, through a network of sensors, e.g., a set of video cameras, the movements of the newborn's body (e.g., limbs, chest) and properly processing the relevant signals. An automatic multi-sensor system could be used to permanently monitor every patient in the NICU or specific patients at home. Furthermore, a wire-free technique may be more user-friendly and highly desirable when used with infants, in particular with newborns. This work has focused on a reliable method to estimate the periodicity in pathological movements based on the use of the Maximum Likelihood (ML) criterion. In particular, average differential luminance signals from multiple Red, Green and Blue (RGB) cameras or depth-sensor devices are extracted and the presence or absence of a significant periodicity is analysed in order to detect possible pathological conditions. The efficacy of this monitoring system has been measured on the basis of video recordings provided by the Department of Neurosciences of the University of Parma. 
Concerning clonic seizures, a kinematic analysis was performed to establish a relationship between neonatal seizures and the human inborn pattern of quadrupedal locomotion. Moreover, we decided to build simulators able to replicate the symptomatic movements characteristic of the diseases under consideration. The reason is, essentially, the opportunity to have, at any time, a 'subject' on which to test the continuously evolving detection algorithms. Finally, we have developed a smartphone app, called 'Smartphone based contactless epilepsy detector' (SmartCED), able to detect neonatal clonic seizures and warn the user about the occurrence in real time.
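Under a white-Gaussian-noise assumption, the ML estimate of a single oscillation frequency is the location of the periodogram peak, and the peak's share of the total power gives a simple presence test. A sketch of this idea on a sampled luminance signal; the 0.3 threshold and the frame rate in the usage note are illustrative values, not those of the thesis:

```python
import numpy as np

def detect_periodicity(signal, fs, threshold=0.3):
    """Return (dominant frequency in Hz, periodic?) for a sampled
    luminance signal; the periodogram peak is the ML frequency estimate
    for a single sinusoid in white Gaussian noise."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the DC component
    spec = np.abs(np.fft.rfft(x)) ** 2     # periodogram
    k = int(np.argmax(spec[1:])) + 1       # skip the zero-frequency bin
    freq = k * fs / len(x)
    score = spec[k] / spec.sum()           # peak's share of total power
    return freq, score > threshold
```

Applied to the average differential luminance of a limb region at a typical video frame rate (e.g., 30 fps), a strong peak in the 1–5 Hz band would flag a candidate clonic episode.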

Relevance:

100.00%

Publisher:

Abstract:

We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters, which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) The method involves a relatively small number of parameters and hence training is relatively easy and fast. (4) Unlike many other recognition schemes it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
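The elastic matching step can be pictured with a stripped-down version of the two EM steps: ink pixels are softly assigned to the Gaussian generators (E-step), and each generator moves to the weighted mean of its ink (M-step). The spline parameterization and affine instantiation parameters of the paper are omitted here; the bead count, variance and iteration count are assumptions:

```python
import numpy as np

def em_fit_beads(points, means, var=0.05, iters=25):
    """EM for Gaussian 'ink generators' with fixed spherical variance:
    the E-step softly assigns each ink point to a bead, the M-step moves
    each bead to the weighted mean of its ink."""
    for _ in range(iters):
        d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        resp = np.exp(-d2 / (2.0 * var))
        resp /= resp.sum(axis=1, keepdims=True)    # E-step responsibilities
        w = resp.sum(axis=0)
        means = (resp.T @ points) / w[:, None]     # M-step bead update
    return means
```

In the full model the bead positions are not free but constrained to lie along the digit's B-spline, so the M-step updates the spline control points instead.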

Relevance:

100.00%

Publisher:

Abstract:

In the present study, multilayer perceptron (MLP) neural networks were applied to help in the diagnosis of obstructive sleep apnoea syndrome (OSAS). Oxygen saturation (SaO2) recordings from nocturnal pulse oximetry were used for this purpose. We performed time and spectral analysis of these signals to extract 14 features related to OSAS. The performance of two different MLP classifiers was compared: maximum likelihood (ML) and Bayesian (BY) MLP networks. A total of 187 subjects suspected of suffering from OSAS took part in the study. Their SaO2 signals were divided into a training set with 74 recordings and a test set with 113 recordings. BY-MLP networks achieved the best performance on the test set with 85.58% accuracy (87.76% sensitivity and 82.39% specificity). These results were substantially better than those provided by ML-MLP networks, which were affected by overfitting and achieved an accuracy of 76.81% (86.42% sensitivity and 62.83% specificity). Our results suggest that the Bayesian framework is preferable for implementing our MLP classifiers. The proposed BY-MLP networks could be used for early OSAS detection. They could help overcome the difficulties of nocturnal polysomnography (PSG) and thus reduce the demand for such studies.
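The overfitting contrast between the ML and Bayesian networks has a compact illustration: under a zero-mean Gaussian prior on the weights, Bayesian (MAP) training of a log-linear model reduces to ML training plus weight decay. A sketch on logistic regression rather than a full MLP; all hyperparameter values are illustrative:

```python
import numpy as np

def train_logreg(X, y, alpha=0.0, lr=0.1, steps=500):
    """Gradient-descent fit of logistic regression: alpha=0 gives plain
    maximum likelihood; alpha>0 adds the Gaussian-prior (weight-decay)
    penalty that underlies the Bayesian treatment of network weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + alpha * w  # log-loss + decay gradient
        w -= lr * grad
    return w
```

On separable data the ML weights grow without bound (the overfitting seen in the ML-MLP results), while the penalized weights stay shrunk yet still classify correctly.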

Relevance:

100.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: Primary 62F35; Secondary 62P99

Relevance:

100.00%

Publisher:

Abstract:

The correlated probit model is frequently used for multiple ordinal outcomes since it allows different correlation structures to be incorporated seamlessly. The estimation of the probit model parameters based on direct maximization of the limited-information maximum likelihood is a numerically intensive procedure. We propose an extension of the EM algorithm for obtaining maximum likelihood estimates for a correlated probit model for multiple ordinal outcomes. The algorithm is implemented in R, the free software environment for statistical computing and graphics. We present two simulation studies to examine the performance of the developed algorithm. We apply the model to data on 121 women with cervical or endometrial cancer. Patients developed normal tissue reactions as a result of post-operative external beam pelvic radiotherapy. In this work we focused on modeling the effects of a genetic factor on early skin and early urogenital tissue reactions and on assessing the strength of association between the two types of reactions. We established that there was an association between skin reactions and polymorphism XRCC3 codon 241 (C>T) (rs861539) and that skin and urogenital reactions were positively correlated. ACM Computing Classification System (1998): G.3.
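The building block of that likelihood is the ordered probit: a latent normal variable is cut into categories by thresholds. A univariate sketch (the paper's model adds correlated latent variables across outcomes, which is what the EM extension handles); function and variable names are illustrative:

```python
import math
from statistics import NormalDist

def ordered_probit_loglik(beta, cuts, x, y):
    """Log-likelihood of a univariate ordered probit: latent y* = beta*x + e
    with e ~ N(0, 1); category j is observed when y* lies between
    cuts[j-1] and cuts[j]."""
    Phi = NormalDist().cdf
    bounds = [-math.inf] + list(cuts) + [math.inf]  # category thresholds
    ll = 0.0
    for xi, yi in zip(x, y):
        p = Phi(bounds[yi + 1] - beta * xi) - Phi(bounds[yi] - beta * xi)
        ll += math.log(p)
    return ll
```

With correlated outcomes these interval probabilities become multivariate normal rectangle probabilities, which is why direct maximization is numerically intensive and an EM treatment of the latent variables is attractive.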

Relevance:

100.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 62F12, 62M05, 62M09, 62M10, 60G42.

Relevance:

100.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 62E16,62F15, 62H12, 62M20.

Relevance:

100.00%

Publisher:

Abstract:

Polynomial phase modulated (PPM) signals have been shown to provide improved error rate performance with respect to conventional modulation formats under additive white Gaussian noise and fading channels in single-input single-output (SISO) communication systems. In this dissertation, systems with two and four transmit antennas using PPM signals were presented. In both cases we employed full-rate space-time block codes in order to take advantage of the multipath channel. For two transmit antennas, we used the orthogonal space-time block code (OSTBC) proposed by Alamouti and performed symbol-wise decoding by estimating the phase coefficients of the PPM signal using three different methods: maximum likelihood (ML), sub-optimal ML (S-ML) and the high-order ambiguity function (HAF). In the case of four transmit antennas, we used the full-rate quasi-OSTBC (QOSTBC) proposed by Jafarkhani. However, in order to ensure the best error rate performance, PPM signals were selected so as to maximize the QOSTBC’s minimum coding gain distance (CGD). Since this method does not always provide a unique solution, an additional criterion known as the maximum channel interference coefficient (CIC) was proposed. Through Monte Carlo simulations it was shown that by using QOSTBCs along with properly selected PPM constellations based on the CGD and CIC criteria, full diversity in flat fading channels, and thus a low bit error rate (BER) at high signal-to-noise ratios (SNR), can be ensured. Lastly, the performance of symbol-wise decoding for QOSTBCs was evaluated. In this case a quasi-zero-forcing method was used to decouple the received signal, and it was shown that although this technique reduces the decoding complexity of the system, there is a penalty to be paid in terms of error rate performance at high SNRs.
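For the two-antenna case, the appeal of the Alamouti OSTBC is that ML decoding decouples into symbol-wise decisions. A sketch of the standard combiner, independent of the PPM constellation discussed above; it assumes perfect channel knowledge and the usual transmission pattern:

```python
def alamouti_decode(r1, r2, h1, h2):
    """Symbol-wise combiner for the 2x1 Alamouti code with known channel
    gains h1, h2. Slot 1 transmits (s1, s2); slot 2 transmits
    (-conj(s2), conj(s1)). Returns the decoupled symbol estimates."""
    g = abs(h1) ** 2 + abs(h2) ** 2                      # diversity gain
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / g
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / g
    return s1_hat, s2_hat
```

Each estimate can then be sliced independently against the constellation, which is what makes symbol-wise ML decoding of the PPM phase coefficients tractable.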

Relevance:

100.00%

Publisher:

Abstract:

In this research the integration of nanostructures and micro-scale devices was investigated using silica nanowires, with the aim of developing a simple yet robust nanomanufacturing technique for improving the detection parameters of chemical and biological sensors. This has been achieved with the use of a dielectric barrier layer to restrict nanowire growth to site-specific locations, which removes the need for post-growth processing by making it possible to place nanostructures on pre-patterned substrates. Nanowires were synthesized using the Vapor-Liquid-Solid growth method. Process parameters (temperature and time) and manufacturing aspects (structural integrity and biocompatibility) were investigated. Silica nanowires were observed experimentally to determine how their physical and chemical properties could be tuned for integration into existing sensing structures. Growth kinetics experiments performed using gold and palladium catalysts at 1050°C for 60 minutes in an open-tube furnace yielded dense and consistent silica nanowire growth. This consistent growth enabled growth-model fitting using maximum likelihood estimation (MLE) and Bayesian hierarchical modeling. Transmission electron microscopy studies revealed the nanowires to be amorphous, and X-ray diffraction confirmed the composition to be SiO2. Silica nanowires were monitored in epithelial breast cancer media using impedance spectroscopy to test biocompatibility, due to potential in vivo use as a diagnostic aid. It was found that palladium-catalyzed silica nanowires were toxic to breast cancer cells; however, nanowires were inert at 1 μg/mL concentrations. Additionally, a method for direct nanowire integration was developed that allowed silica nanowires to be grown directly into interdigitated sensing structures. This technique eliminates the need for physical nanowire transfer, thus preserving nanowire structure and performance integrity, and further reduces fabrication cost. 
Successful nanowire integration was physically verified using scanning electron microscopy and confirmed electrically using electrochemical impedance spectroscopy of immobilized prostate-specific antigen (PSA). The experiments described above serve as a guideline for addressing the metallurgical challenges in nanoscale integration of materials with varying composition and for understanding the effects of nanomaterials on biological structures that come in contact with the human body.
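One way to picture the growth-model fitting step: if nanowire length follows a power law in growth time with log-normal scatter, the ML estimate of the parameters is an ordinary least-squares line in log-log space. This is an illustrative stand-in under that stated assumption, not the actual growth model or Bayesian hierarchy used in the dissertation:

```python
import math

def fit_growth_mle(times, lengths):
    """ML fit of L(t) = a * t**b assuming i.i.d. Gaussian noise on
    log-length, in which case maximum likelihood reduces to ordinary
    least squares on (log t, log L)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(L) for L in lengths]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
        sum((x - xbar) ** 2 for x in xs)      # slope = growth exponent
    a = math.exp(ybar - b * xbar)             # intercept = prefactor
    return a, b
```

A Bayesian hierarchical version would place priors on a and b and let them vary by catalyst, pooling information across growth runs.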

Relevance:

100.00%

Publisher:

Abstract:

People go through their life making all kinds of decisions, and some of these decisions affect their demand for transportation, for example, their choices of where to live and where to work, how and when to travel and which route to take. Transport-related choices are typically time dependent and characterized by a large number of alternatives that can be spatially correlated. This thesis deals with models that can be used to analyze and predict discrete choices in large-scale networks. The proposed models and methods are highly relevant for, but not limited to, transport applications. We model decisions as sequences of choices within the dynamic discrete choice framework, also known as parametric Markov decision processes. Such models are known to be difficult to estimate and to apply to make predictions because dynamic programming problems need to be solved in order to compute choice probabilities. In this thesis we show that it is possible to exploit the network structure and the flexibility of dynamic programming so that the dynamic discrete choice modeling approach is not only useful for modeling time-dependent choices, but also makes it easier to model large-scale static choices. The thesis consists of seven articles containing a number of models and methods for estimating, applying and testing large-scale discrete choice models. In the following we group the contributions under three themes: route choice modeling, large-scale multivariate extreme value (MEV) model estimation and nonlinear optimization algorithms. Five articles are related to route choice modeling. We propose different dynamic discrete choice models that allow paths to be correlated, based on the MEV and mixed logit models. The resulting route choice models become expensive to estimate, and we deal with this challenge by proposing innovative methods that reduce the estimation cost. 
For example, we propose a decomposition method that not only opens up the possibility of mixing, but also speeds up the estimation for simple logit models, which also has implications for traffic simulation. Moreover, we compare the utility maximization and regret minimization decision rules, and we propose a misspecification test for logit-based route choice models. The second theme is related to the estimation of static discrete choice models with large choice sets. We establish that a class of MEV models can be reformulated as dynamic discrete choice models on the networks of correlation structures. These dynamic models can then be estimated quickly using dynamic programming techniques and an efficient nonlinear optimization algorithm. Finally, the third theme focuses on structured quasi-Newton techniques for estimating discrete choice models by maximum likelihood. We examine and adapt switching methods that can be easily integrated into usual optimization algorithms (line search and trust region) to accelerate the estimation process. The proposed dynamic discrete choice models and estimation methods can be used in various discrete choice applications. In the area of big data analytics, models that can deal with large choice sets and sequential choices are important. Our research can therefore be of interest in various demand analysis applications (predictive analytics) or can be integrated with optimization models (prescriptive analytics). Furthermore, our studies indicate the potential of dynamic programming techniques in this context, even for static models, which opens up a variety of future research directions.
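At the core of all the route choice models above sits the logit choice probability, which the maximum likelihood estimation repeatedly evaluates. A minimal, numerically stable version:

```python
import math

def logit_probs(utilities):
    """Multinomial logit choice probabilities
    P(i) = exp(V_i) / sum_j exp(V_j), computed with a max-shift so that
    large utilities do not overflow."""
    m = max(utilities)
    exps = [math.exp(v - m) for v in utilities]   # shift for stability
    s = sum(exps)
    return [e / s for e in exps]
```

In the dynamic (recursive) formulation, the deterministic utilities V_i are replaced by instantaneous utilities plus downstream expected values obtained from the dynamic programming step, but the probability formula is unchanged.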

Relevance:

100.00%

Publisher:

Abstract:

ABSTRACT. Phylogenies and molecular clocks of the diatoms have largely been inferred from SSU rDNA sequences. A new phylogeny of diatoms was estimated using four gene markers, SSU and LSU rDNA, rbcL and psbA (4352 bp in total), with 42 diatom species. The four gene trees analysed with maximum likelihood (ML) and Bayesian (BI) analyses recovered a monophyletic origin of the new diatom classes with high bootstrap support, which has been controversial with single gene markers using single outgroups and alignments that do not take the secondary structure of the SSU gene into account. The divergence times of the classes were calculated from an ML tree in the Multidivtime program using a Bayesian estimation allowing for simultaneous constraints from the fossil record and varying rates of molecular evolution on different branches of the phylogenetic tree. These divergence times are generally in agreement with those proposed by other clocks using single genes, with the exception that the pennates appear much earlier, suggesting a longer Cretaceous fossil record that has yet to be sampled. Ghost lineages (i.e. the discrepancy between first appearance (FA) and the molecular clock age of origin of an extant taxon) were revealed in the pennate lineage, whereas ghost lineages in the centric lineages previously reported by others are reviewed with reference to the earlier literature.
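The substitution models behind such ML analyses correct observed sequence differences for multiple hits; the simplest, Jukes-Cantor, gives a closed-form ML distance. A small sketch, not the four-gene partitioned model used in the study:

```python
import math

def jukes_cantor_distance(seq1, seq2):
    """ML estimate of substitutions per site under the Jukes-Cantor
    model: d = -3/4 * ln(1 - 4p/3), where p is the observed fraction of
    mismatched sites between two aligned sequences."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because d grows faster than p as sequences saturate, such corrections are what allow molecular clock estimates to push divergence times deeper than the raw differences suggest.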
