2 resultados para modeling algorithms

em DigitalCommons@The Texas Medical Center


Relevância:

70.00% 70.00%

Publicador:

Resumo:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU" lays out the theoretical background for the project. There are several core concepts presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrests, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables. Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model, and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is in defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature. In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data is represented by the standard one value per variable paradigm and is widely employed in a host of clinical models and tools. These are often represented by a number present in a given cell of a table. Clinical latent features derived, rather than directly measured, data elements that more accurately represent a particular clinical phenomenon than any of the directly measured data elements in isolation. The second two classes are unique to the time series data elements. The first of these is the raw data elements. These are represented by multiple values per variable, and constitute the measured observations that are typically available to end users when they review time series data. These are often represented as dots on a graph. The final class of data results from performing time series analysis. This class of data represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood that a representation of the time series data elements is produced that is able to distinguish between two or more classes of outcomes. The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU" provides a detailed description, start to finish, of the methods required to prepare the data, build, and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time series based models are unfeasible due to the relatively large number of data elements and the complexity of preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies of each of the steps, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) are issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data in order to conform to a predefined structure that was specified during the design phase; and normalizing variable families rather than individual variable instances. The final manuscript, entitled: "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit" presents the results that were obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the Receiver Operating Characteristic curve increased from a baseline of 87% to 98% by including the trend analysis. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy as compared to the baseline multivariate model, but diminished classification accuracy as compared to when just the trend analysis features were added (ie, without adding the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve the performance beyond that which was achieved by exclusion of the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Essential biological processes are governed by organized, dynamic interactions between multiple biomolecular systems. Complexes are thus formed to enable the biological function and get dissembled as the process is completed. Examples of such processes include the translation of the messenger RNA into protein by the ribosome, the folding of proteins by chaperonins or the entry of viruses in host cells. Understanding these fundamental processes by characterizing the molecular mechanisms that enable then, would allow the (better) design of therapies and drugs. Such molecular mechanisms may be revealed trough the structural elucidation of the biomolecular assemblies at the core of these processes. Various experimental techniques may be applied to investigate the molecular architecture of biomolecular assemblies. High-resolution techniques, such as X-ray crystallography, may solve the atomic structure of the system, but are typically constrained to biomolecules of reduced flexibility and dimensions. In particular, X-ray crystallography requires the sample to form a three dimensional (3D) crystal lattice which is technically di‑cult, if not impossible, to obtain, especially for large, dynamic systems. Often these techniques solve the structure of the different constituent components within the assembly, but encounter difficulties when investigating the entire system. On the other hand, imaging techniques, such as cryo-electron microscopy (cryo-EM), are able to depict large systems in near-native environment, without requiring the formation of crystals. The structures solved by cryo-EM cover a wide range of resolutions, from very low level of detail where only the overall shape of the system is visible, to high-resolution that approach, but not yet reach, atomic level of detail. In this dissertation, several modeling methods are introduced to either integrate cryo-EM datasets with structural data from X-ray crystallography, or to directly interpret the cryo-EM reconstruction. Such computational techniques were developed with the goal of creating an atomic model for the cryo-EM data. The low-resolution reconstructions lack the level of detail to permit a direct atomic interpretation, i.e. one cannot reliably locate the atoms or amino-acid residues within the structure obtained by cryo-EM. Thereby one needs to consider additional information, for example, structural data from other sources such as X-ray crystallography, in order to enable such a high-resolution interpretation. Modeling techniques are thus developed to integrate the structural data from the different biophysical sources, examples including the work described in the manuscript I and II of this dissertation. At intermediate and high-resolution, cryo-EM reconstructions depict consistent 3D folds such as tubular features which in general correspond to alpha-helices. Such features can be annotated and later on used to build the atomic model of the system, see manuscript III as alternative. Three manuscripts are presented as part of the PhD dissertation, each introducing a computational technique that facilitates the interpretation of cryo-EM reconstructions. The first manuscript is an application paper that describes a heuristics to generate the atomic model for the protein envelope of the Rift Valley fever virus. The second manuscript introduces the evolutionary tabu search strategies to enable the integration of multiple component atomic structures with the cryo-EM map of their assembly. Finally, the third manuscript develops further the latter technique and apply it to annotate consistent 3D patterns in intermediate-resolution cryo-EM reconstructions. The first manuscript, titled An assembly model for Rift Valley fever virus, was submitted for publication in the Journal of Molecular Biology. The cryo-EM structure of the Rift Valley fever virus was previously solved at 27Å-resolution by Dr. Freiberg and collaborators. Such reconstruction shows the overall shape of the virus envelope, yet the reduced level of detail prevents the direct atomic interpretation. High-resolution structures are not yet available for the entire virus nor for the two different component glycoproteins that form its envelope. However, homology models may be generated for these glycoproteins based on similar structures that are available at atomic resolutions. The manuscript presents the steps required to identify an atomic model of the entire virus envelope, based on the low-resolution cryo-EM map of the envelope and the homology models of the two glycoproteins. Starting with the results of the exhaustive search to place the two glycoproteins, the model is built iterative by running multiple multi-body refinements to hierarchically generate models for the different regions of the envelope. The generated atomic model is supported by prior knowledge regarding virus biology and contains valuable information about the molecular architecture of the system. It provides the basis for further investigations seeking to reveal different processes in which the virus is involved such as assembly or fusion. The second manuscript was recently published in the of Journal of Structural Biology (doi:10.1016/j.jsb.2009.12.028) under the title Evolutionary tabu search strategies for the simultaneous registration of multiple atomic structures in cryo-EM reconstructions. This manuscript introduces the evolutionary tabu search strategies applied to enable a multi-body registration. This technique is a hybrid approach that combines a genetic algorithm with a tabu search strategy to promote the proper exploration of the high-dimensional search space. Similar to the Rift Valley fever virus, it is common that the structure of a large multi-component assembly is available at low-resolution from cryo-EM, while high-resolution structures are solved for the different components but lack for the entire system. Evolutionary tabu search strategies enable the building of an atomic model for the entire system by considering simultaneously the different components. Such registration indirectly introduces spatial constrains as all components need to be placed within the assembly, enabling the proper docked in the low-resolution map of the entire assembly. Along with the method description, the manuscript covers the validation, presenting the benefit of the technique in both synthetic and experimental test cases. Such approach successfully docked multiple components up to resolutions of 40Å. The third manuscript is entitled Evolutionary Bidirectional Expansion for the Annotation of Alpha Helices in Electron Cryo-Microscopy Reconstructions and was submitted for publication in the Journal of Structural Biology. The modeling approach described in this manuscript applies the evolutionary tabu search strategies in combination with the bidirectional expansion to annotate secondary structure elements in intermediate resolution cryo-EM reconstructions. In particular, secondary structure elements such as alpha helices show consistent patterns in cryo-EM data, and are visible as rod-like patterns of high density. The evolutionary tabu search strategy is applied to identify the placement of the different alpha helices, while the bidirectional expansion characterizes their length and curvature. The manuscript presents the validation of the approach at resolutions ranging between 6 and 14Å, a level of detail where alpha helices are visible. Up to resolution of 12 Å, the method measures sensitivities between 70-100% as estimated in experimental test cases, i.e. 70-100% of the alpha-helices were correctly predicted in an automatic manner in the experimental data. The three manuscripts presented in this PhD dissertation cover different computation methods for the integration and interpretation of cryo-EM reconstructions. The methods were developed in the molecular modeling software Sculptor (http://sculptor.biomachina.org) and are available for the scientific community interested in the multi-resolution modeling of cryo-EM data. The work spans a wide range of resolution covering multi-body refinement and registration at low-resolution along with annotation of consistent patterns at high-resolution. Such methods are essential for the modeling of cryo-EM data, and may be applied in other fields where similar spatial problems are encountered, such as medical imaging.