920 results for Multivariate data analysis


Relevance:

100.00%

Publisher:

Abstract:

INTRODUCTION Outcome data after management of acetabular fractures via anterior approaches, with a focus on age and on fractures associated with roof impaction, central dislocation and/or quadrilateral plate displacement, are rare.

METHODS Between October 2005 and April 2009, a series of 59 patients (mean age 57 years, range 13-91) with fractures involving the anterior column was treated using the modified Stoppa approach, either alone or, for reduction of displaced iliac wing or low anterior column fractures, in combination with the first window of the ilioinguinal approach or the modified Smith-Petersen approach, respectively. Surgical data, accuracy of reduction, clinical and radiographic outcome at mid-term, and the need for endoprosthetic replacement in the postoperative course (defined as failure) were assessed; uni- and multivariate regression analyses were performed to identify independent predictors of failure (e.g. age, nonanatomical reduction, acetabular roof impaction, central dislocation, quadrilateral plate displacement). Outcome was assessed for all patients overall and by age in particular; patients were subdivided into two groups according to age (group "<60 yrs", group "≥60 yrs").

RESULTS Forty-three of 59 patients (mean age 54 years, range 13-89) were available for evaluation. Of these, anatomic reduction was achieved in 72% of cases. Nonanatomical reduction was the only multivariate predictor of subsequent total hip replacement (adjusted hazard ratio 23.5; p<0.01). A significantly higher rate of nonanatomical reduction was observed in the presence of acetabular roof impaction (p=0.01). Total hip replacement was performed in 16% of all patients, and in 69% of patients with preserved hips the clinical results were excellent or good at a mean follow-up of 35±10 months (range 24-55). No statistically significant differences were observed between the two groups.

CONCLUSION Nonanatomical reconstruction of the articular surfaces puts joint-preserving management of acetabular fractures through an isolated or combined modified Stoppa approach at risk of failure, resulting in total joint replacement at mid-term. In the elderly, joint-preserving surgery is worth considering, as promising clinical and radiographic results can be obtained at mid-term.
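
The multivariate predictor reported above comes from a time-to-event regression. As a hypothetical sketch of how such an adjusted hazard ratio is typically estimated in Python (using the third-party lifelines package; all data and column names below are simulated, not the study's):

```python
# Hypothetical sketch of a multivariate Cox proportional hazards analysis,
# the kind of model behind an "adjusted hazard ratio". Simulated data only.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "age":            rng.integers(20, 90, n),
    "nonanatomical":  rng.integers(0, 2, n),   # 1 = nonanatomical reduction
    "roof_impaction": rng.integers(0, 2, n),   # 1 = acetabular roof impaction
})
# Simulated time to failure: nonanatomical reduction shortens survival.
df["months"] = rng.exponential(60 / (1 + 3 * df["nonanatomical"]))
df["failure"] = (df["months"] < 55).astype(int)  # failure = total hip replacement
df["months"] = df["months"].clip(upper=55)       # administrative censoring

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="failure")
cph.print_summary()  # the exp(coef) column holds the adjusted hazard ratios
```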

Relevance:

100.00%

Publisher:

Abstract:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis was based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (Affymetrix U133 gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short- vs. long-term survival) and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death directly to the gene expression profile. We propose a Bayesian ensemble model for survival prediction that is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the Cox proportional hazards (CPH), Weibull, and accelerated failure time (AFT) survival models. We overcome the lack of conjugacy using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting. Our proposed models also provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from biological/clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.
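
A minimal sketch of the first modeling step described in this abstract, a Random Forest for binary survival classes with importance-based gene selection, with a simulated expression matrix standing in for the U133 microarray profiles:

```python
# Minimal sketch: Random Forest classifier for short- vs long-term survival
# plus importance-based gene selection. All data below are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_patients, n_genes = 100, 500
X = rng.normal(size=(n_patients, n_genes))   # rows = patients, cols = genes
# Simulated labels: the first five genes carry the survival signal.
y = (X[:, :5].sum(axis=1) + rng.normal(size=n_patients) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())

rf.fit(X, y)
top_genes = np.argsort(rf.feature_importances_)[::-1][:20]
print("candidate prognostic genes:", top_genes)   # indices into the gene list
```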

Relevance:

100.00%

Publisher:

Abstract:

Cloud computing provides a promising solution to the genomics data deluge resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can gain access to scalable and cost-effective computational resources. However, the large size of NGS data causes significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, in which the NGS data are processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks in which the NGS sequences can be processed independently of one another. We also provide the elastream package, which supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both the time and the cost of computation.
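
The elastream package itself is not reproduced here; the following is only a minimal sketch of the underlying streaming idea, handing each record to the analysis process the moment it arrives rather than after the full transfer completes. The downstream command ("wc -l") is a placeholder for a real analysis tool:

```python
# Sketch of the streaming idea only (not the elastream API): consume FASTQ
# records from an incoming byte stream and pipe them to an analysis process
# immediately, instead of waiting for the whole file to finish landing.
import subprocess
import sys

def fastq_records(stream):
    """Yield one 4-line FASTQ record at a time from a byte stream."""
    while True:
        record = [stream.readline() for _ in range(4)]
        if not record[0]:          # end of stream
            return
        yield b"".join(record)

analysis = subprocess.Popen(["wc", "-l"], stdin=subprocess.PIPE)
for record in fastq_records(sys.stdin.buffer):
    analysis.stdin.write(record)   # analysis runs while the transfer continues
analysis.stdin.close()
analysis.wait()
```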

Relevance:

100.00%

Publisher:

Abstract:

The ascertainment and analysis of adverse reactions to investigational agents present a significant challenge because of the infrequency of these events, their subjective nature, and the low priority of safety evaluations in many clinical trials. A one-year review of antibiotic trials published in medical journals demonstrates the lack of standards in identifying and reporting these potentially fatal conditions. The review also illustrates the low probability of observing and detecting rare events in typical clinical trials, which include fewer than 300 subjects. Uniform standards for ascertainment and reporting are suggested, including operational definitions of study subjects. Meta-analysis of selected antibiotic trials using multivariate regression analysis indicates that meaningful conclusions may be drawn from data from multiple studies that are pooled in a scientifically rigorous manner.
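
The low detection probability for rare events in trials of fewer than 300 subjects follows from a simple binomial argument; a short illustration with arbitrary incidence rates:

```python
# The chance of seeing a rare adverse event at least once in an n-subject
# trial, assuming independent events with true incidence p:
#     P(at least one) = 1 - (1 - p)**n
n = 300
for denom in (100, 1000, 10000):
    p = 1 / denom
    prob = 1 - (1 - p) ** n
    print(f"incidence 1 in {denom:>6}: P(>=1 event in {n} subjects) = {prob:.3f}")
# A reaction striking 1 in 10,000 patients shows up in only ~3% of such
# trials, which is the argument for rigorously pooling data across studies.
```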

Relevance:

100.00%

Publisher:

Abstract:

The first manuscript, entitled "Time-Series Analysis as Input for Clinical Predictive Modeling: Modeling Cardiac Arrest in a Pediatric ICU", lays out the theoretical background for the project. Several core concepts are presented in this paper. First, traditional multivariate models (where each variable is represented by only one value) provide single point-in-time snapshots of patient status: they are incapable of characterizing deterioration. Since deterioration is consistently identified as a precursor to cardiac arrest, we maintain that the traditional multivariate paradigm is insufficient for predicting arrests. We identify time series analysis as a method capable of characterizing deterioration in an objective, mathematical fashion, and describe how to build a general foundation for predictive modeling using time series analysis results as latent variables.

Building a solid foundation for any given modeling task involves addressing a number of issues during the design phase. These include selecting the proper candidate features on which to base the model and selecting the most appropriate tool to measure them. We also identified several unique design issues that are introduced when time series data elements are added to the set of candidate features. One such issue is defining the duration and resolution of time series elements required to sufficiently characterize the time series phenomena being considered as candidate features for the predictive model. Once the duration and resolution are established, there must also be explicit mathematical or statistical operations that produce the time series analysis result to be used as a latent candidate feature.

In synthesizing the comprehensive framework for building a predictive model based on time series data elements, we identified at least four classes of data that can be used in the model design. The first two classes are shared with traditional multivariate models: multivariate data and clinical latent features. Multivariate data follow the standard one-value-per-variable paradigm and are widely employed in a host of clinical models and tools; they are often represented by a number in a given cell of a table. Clinical latent features are derived, rather than directly measured, data elements that represent a particular clinical phenomenon more accurately than any of the directly measured data elements in isolation. The remaining two classes are unique to time series data elements. The first of these is the raw data elements: multiple values per variable, constituting the measured observations that are typically available to end users when they review time series data, often represented as dots on a graph. The final class of data results from performing time series analysis, and it represents the fundamental concept on which our hypothesis is based. The specific statistical or mathematical operations are up to the modeler to determine, but we generally recommend that a variety of analyses be performed in order to maximize the likelihood of producing a representation of the time series data elements that is able to distinguish between two or more classes of outcomes.
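
As a hypothetical illustration of such an operation (not one prescribed by the manuscript), a fixed window of vital-sign observations can be reduced to its least-squares slope and used as a latent "trend" feature:

```python
# Hypothetical example of a statistical operation that turns a time series
# element into a latent candidate feature: the least-squares slope ("trend")
# of a fixed-duration window of observations. The values below are invented.
import numpy as np

def trend_feature(times_min, values):
    """Return the regression slope (units per minute) over the window."""
    slope, _intercept = np.polyfit(times_min, values, deg=1)
    return slope

# A 30-minute window of systolic blood pressure sampled every 5 minutes.
t = np.arange(0, 31, 5)
sbp = np.array([112, 110, 107, 103, 100, 95, 91])
print(f"SBP trend: {trend_feature(t, sbp):.2f} mmHg/min")  # negative = falling
```
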
The second manuscript, entitled "Building Clinical Prediction Models Using Time Series Data: Modeling Cardiac Arrest in a Pediatric ICU", provides a detailed description, start to finish, of the methods required to prepare the data and to build and validate a predictive model that uses the time series data elements determined in the first paper. One of the fundamental tenets of the second paper is that manual implementations of time-series-based models are infeasible because of the relatively large number of data elements and the complexity of the preprocessing that must occur before data can be presented to the model. Each of the seventeen steps is analyzed from the perspective of how it may be automated, when necessary. We identify the general objectives and available strategies for each step, and we present our rationale for choosing a specific strategy for each step in the case of predicting cardiac arrest in a pediatric intensive care unit. Another issue brought to light by the second paper is that the individual steps required to use time series data for predictive modeling are more numerous and more complex than those used for modeling with traditional multivariate data. Even after the complexities attributable to the design phase (addressed in our first paper) have been accounted for, the management and manipulation of the time series elements (the preprocessing steps in particular) raise issues that are not present in a traditional multivariate modeling paradigm. In our methods, we present the issues that arise from the time series data elements: defining a reference time; imputing and reducing time series data to conform to a predefined structure specified during the design phase; and normalizing variable families rather than individual variable instances.

The final manuscript, entitled "Using Time-Series Analysis to Predict Cardiac Arrest in a Pediatric Intensive Care Unit", presents the results obtained by applying the theoretical construct and its associated methods (detailed in the first two papers) to the case of cardiac arrest prediction in a pediatric intensive care unit. Our results showed that utilizing the trend analysis from the time series data elements reduced the number of classification errors by 73%. The area under the receiver operating characteristic curve increased from a baseline of 87% to 98% when the trend analysis was included. In addition to the performance measures, we were also able to demonstrate that adding raw time series data elements without their associated trend analyses improved classification accuracy compared with the baseline multivariate model, but diminished classification accuracy compared with adding the trend analysis features alone (i.e., without the raw time series data elements). We believe this phenomenon was largely attributable to overfitting, which is known to increase as the ratio of candidate features to class examples rises. Furthermore, although we employed several feature reduction strategies to counteract the overfitting problem, they failed to improve performance beyond that achieved by excluding the raw time series elements. Finally, our data demonstrated that pulse oximetry and systolic blood pressure readings tend to start diminishing about 10-20 minutes before an arrest, whereas heart rates tend to diminish rapidly less than 5 minutes before an arrest.
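
A compressed, hypothetical sketch of three of the preprocessing issues named in the second manuscript, using pandas and entirely invented timestamps and values:

```python
# Sketch of three preprocessing issues: anchoring observations to a
# reference time, imputing/reducing to a predefined resolution, and
# normalizing a variable family. All values below are invented.
import pandas as pd

raw = pd.DataFrame(
    {"heart_rate": [138.0, 131.0, None, 119.0, 112.0]},
    index=pd.to_datetime(["10:02", "10:07", "10:15", "10:22", "10:28"]),
)
reference_time = pd.Timestamp("10:30")   # e.g., the moment of arrest

# 1. Re-express timestamps as minutes before the reference time.
raw["min_before_ref"] = (reference_time - raw.index).total_seconds() / 60

# 2. Impute and reduce to the resolution fixed during the design phase (5 min).
fixed = raw["heart_rate"].resample("5min").mean().interpolate()

# 3. Normalize across the variable family rather than each instance alone.
normalized = (fixed - fixed.mean()) / fixed.std()
print(normalized)
```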

Relevance:

100.00%

Publisher:

Abstract:

Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The methods currently used for qPCR data analysis, including the threshold cycle (CT) method and linear and non-linear model-fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate and can therefore distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each pair of consecutive PCR cycles, we subtracted the fluorescence of the earlier cycle from that of the later cycle, transforming raw data from n cycles into n-1 differences. Linear regression was then applied to the natural logarithm of the transformed data, and the amplification efficiency and initial DNA molecule number were calculated for each PCR run. To evaluate the new method, we compared it in terms of accuracy and precision with the original linear regression method under three background corrections: the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, namely threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and the linear mixed model was adopted, the taking-difference linear regression method was superior, giving an accurate estimate of the initial DNA amount and a reasonable estimate of PCR amplification efficiencies. When the max R2 and max slope criteria were used, the original linear regression method gave an accurate estimate of the initial DNA amount. Overall, the taking-difference linear regression method avoids the error introduced by subtracting an unknown background and is therefore theoretically more accurate and reliable. The method is easy to perform, and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
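
A minimal sketch of the taking-difference idea on simulated, noise-free data: with per-cycle fluorescence F_k = B + F0·E^k, the consecutive differences D_k = F0·(E-1)·E^k no longer contain the background B, so ln(D_k) is linear in the cycle number and a plain regression recovers both E and F0:

```python
# Taking-difference method on simulated, noise-free exponential-phase data.
# F_k = B + F0 * E**k  (B = background, E = amplification efficiency), so
#     D_k = F_{k+1} - F_k = F0 * (E - 1) * E**k
# is background-free and ln(D_k) is linear in k.
import numpy as np

B, F0, E = 50.0, 2.0, 1.9          # "true" background, initial amount, efficiency
cycles = np.arange(1, 16)
F = B + F0 * E ** cycles           # simulated fluorescence readings

D = np.diff(F)                     # taking-difference: the background cancels
k = cycles[:-1]
slope, intercept = np.polyfit(k, np.log(D), deg=1)

E_hat = np.exp(slope)
F0_hat = np.exp(intercept) / (E_hat - 1)
print(f"estimated efficiency E = {E_hat:.3f}, initial amount F0 = {F0_hat:.3f}")
```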

Relevance:

100.00%

Publisher:

Abstract:

The reduction in sea ice along the SE Greenland coast during the last century has severely impacted ice-rafting to this area. In order to reconstruct ice-rafting and oceanographic conditions in the area of Denmark Strait during the last ~150 years, we conducted a multiproxy study on three short (20 cm) sediment cores from outer Kangerdlugssuaq Trough (~300 m water depth). The proxy-based data obtained have been compared with historical and instrumental data to gain a better understanding of the ice sheet-ocean interactions in the area. A robust chronology has been developed based on 210Pb and 137Cs measurements on core PO175GKC#9 (~66.2°N, 32°W) and expanded to the two adjacent cores based on correlations between calcite weight percent records. Our proxy records include sea-ice and phytoplankton biomarkers, and a variety of mineralogical determinations based on the <2 mm sediment fraction, including identification by quantitative X-ray diffraction, ice-rafted debris counts on the 63-150 µm sand fraction, and source identifications based on the composition of Fe oxides in the 45-250 µm fraction. A multivariate statistical analysis indicated significant correlations between our proxy records and historical data, especially with the mean annual temperature data from Stykkishólmur (Iceland) and the storis index (historical observations of sea-ice export via the East Greenland Current). In particular, the biological proxies (calcite weight percent, IP25, and total organic carbon %) showed significant linkage with the storis index. Our records show two distinct intervals in the recent history of the SE Greenland coast. The first of these (AD 1850-1910) shows predominantly perennial sea-ice conditions in the area, while the second (AD 1910-1990) shows more seasonally open water conditions.
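
The proxy-to-historical-data linkages reported above rest on pairwise correlation within the multivariate analysis; a purely illustrative example with invented values standing in for interval-averaged IP25 and storis-index series:

```python
# Purely illustrative: correlating a biological proxy record with a
# historical sea-ice series of the kind used above. Numbers are invented.
import numpy as np
from scipy.stats import pearsonr

ip25 = np.array([0.9, 1.1, 1.4, 1.2, 0.8, 0.5, 0.4, 0.3])   # proxy per interval
storis = np.array([32, 38, 45, 40, 27, 18, 15, 11])          # sea-ice export obs.

r, p = pearsonr(ip25, storis)
print(f"r = {r:.2f}, p = {p:.4f}")
```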

Relevance:

100.00%

Publisher:

Abstract:

Sites 1085, 1086 and 1087 were drilled off South Africa during Ocean Drilling Program (ODP) Leg 175 to investigate the Benguela Current System. While previous studies have focused on reconstructing the Neogene palaeoceanographic and palaeoclimatic history of these sites, palynology has been largely ignored, except for the Late Pliocene and Quaternary. This study presents palynological data from the upper Middle Miocene to lower Upper Pliocene sediments in Holes 1085A, 1086A and 1087C that provide complementary information about the history of the area. Abundant and diverse marine palynomorphs (mainly dinoflagellate cysts), rare spores and pollen, and dispersed organic matter have been recovered. Multivariate statistical analysis of dispersed organic matter identified three palynofacies assemblages (A, B, C) in the most continuous hole (1085A), which were defined primarily by amorphous organic matter (AOM) and, to a lesser extent, by black debris, structured phytoclasts, degraded phytoclasts, and marine palynomorphs. Ecostratigraphic interpretation based on dinoflagellate cyst, spore-pollen and palynofacies data allowed us to identify several palaeoceanographic and palaeoclimatic signals. First, the late Middle Miocene was subtropical, and its sediments contained the highest percentages of land-derived organic matter, even though they are rich in AOM (palynofacies assemblage A). Second, the Late Miocene was cool-temperate and characterized by periods of intensified upwelling, increased productivity, abundant and diverse oceanic dinoflagellate cysts, and the highest percentages of AOM (palynofacies assemblage C). Third, the Early to early Late Pliocene was warm-temperate, with some dry intervals (an increase in grass pollen) and intensified upwelling. Fourth, the Neogene "carbonate crash" identified in other southern oceans was recognized in two palynofacies A samples in Hole 1085A that are nearly barren of dinoflagellate cysts: one Middle Miocene sample (590 mbsf, 13.62 Ma) and one Upper Miocene sample (355 mbsf, 6.5 Ma). Finally, the extremely low percentages of pollen suggest sparse vegetation on the adjacent landmass, and Namib desert conditions were already in existence during the late Middle Miocene.
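
As a hypothetical illustration (not the authors' actual procedure), hierarchical clustering of per-sample component percentages is one common way such palynofacies assemblages are delineated:

```python
# Hypothetical sketch: hierarchical clustering of per-sample palynofacies
# component percentages into three candidate assemblages. Invented values.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows = samples; columns = % AOM, % black debris, % structured phytoclasts,
# % degraded phytoclasts, % marine palynomorphs.
X = np.array([
    [70, 10,  8, 7, 5],
    [68, 12,  9, 6, 5],
    [85,  3,  2, 4, 6],
    [83,  4,  3, 4, 6],
    [55, 20, 12, 8, 5],
    [52, 22, 13, 9, 4],
])

Z = linkage(X, method="ward")
assemblage = fcluster(Z, t=3, criterion="maxclust")
print(assemblage)   # three groups of samples -> candidate assemblages A, B, C
```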

Relevance:

100.00%

Publisher:

Abstract:

The Global Ocean Sampling (GOS) expedition is currently the largest and geographically most comprehensive metagenomic dataset, including samples from the Atlantic, Pacific, and Indian Oceans. This study makes use of the wide range of environmental conditions and habitats encompassed within the GOS sites in order to investigate the ecological structuring of bacterial and archaeal taxon ranks. Community structures based on taxonomically classified 16S ribosomal RNA (rRNA) gene fragments at phylum, class, order, family, and genus rank levels were examined using multivariate statistical analysis, and the results were inspected in the context of oceanographic environmental variables and structured habitat classifications. At all taxon rank levels, community structures of neritic, oceanic, estuarine biomes, as well as other exotic biomes (salt marsh, lake, mangrove), were readily distinguishable from each other. A strong structuring of the communities with chlorophyll a concentration and a weaker yet significant structuring with temperature and salinity were observed. Furthermore, there were significant correlations between community structures and habitat classification. These results were used for further investigation of one-to-one relationships between taxa and environment and provided indications for ecological preferences shaped by primary production for both cultured and uncultured bacterial and archaeal clades.
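
One common way to examine such community structuring, sketched here with invented counts rather than the GOS data (the study's exact multivariate procedure may differ), is an ordination of Bray-Curtis dissimilarities:

```python
# Sketch: Bray-Curtis dissimilarity on a site-by-genus abundance table,
# followed by non-metric multidimensional scaling (NMDS). Invented counts.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
counts = rng.poisson(lam=20, size=(10, 40)).astype(float)   # 10 sites x 40 genera
rel = counts / counts.sum(axis=1, keepdims=True)            # relative abundances

dis = squareform(pdist(rel, metric="braycurtis"))
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           random_state=0)
coords = nmds.fit_transform(dis)
print(coords[:3])   # 2-D site positions, to be related to biome / chlorophyll a
```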

Relevance:

100.00%

Publisher:

Abstract:

The spatial variability of Vertisol properties is relevant for identifying zones with physical degradation. This requires determining the origin and distribution of spatial variability patterns. The objectives of the present work were (i) to quantify the spatial structure of different physical properties collected from a Vertisol, (ii) to search for potential correlations between different spatial patterns, and (iii) to identify relevant components through multivariate spatial analysis. The study was conducted on a Vertisol (Typic Hapludert) dedicated to sugarcane (Saccharum officinarum L.) production for the last sixty years. We used six soil properties collected on a square grid of 225 points: penetrometer resistance (PR), total porosity, fragmentation dimension (Df), vertical electrical conductivity (ECv), horizontal electrical conductivity (ECh), and soil water content (WC). All the original data sets were z-transformed before geostatistical analysis. Three different types of semivariogram model were necessary to fit the individual experimental semivariograms, suggesting that the spatial variability patterns are of different natures. Soil water content showed the largest nugget effect (C0 = 0.933), while soil total porosity showed the largest range of spatial correlation (A = 43.92 m). The bivariate geostatistical analysis also revealed significant cross-semivariance between different paired soil properties; however, four different semivariogram models were required in that case. This indicates an underlying co-regionalization between different soil properties, which is of interest for delineating management zones within sugarcane fields. Cross-semivariograms showed larger correlation ranges than the individual, univariate semivariograms (A ≥ 29 m). All of these findings were supported by multivariate spatial analysis, which showed the influence of soil tillage operations, harvesting machinery, and irrigation water distribution on the status of the investigated area.
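
A minimal sketch of the core geostatistical tool behind these results, an empirical semivariogram computed from z-scored observations, using simulated coordinates and values rather than the study's 225-point grid data:

```python
# Empirical semivariogram of z-scored point observations:
#     gamma(h) = (1 / (2 * N(h))) * sum over pairs at lag h of (z_i - z_j)**2
# Coordinates and values below are simulated.
import numpy as np

rng = np.random.default_rng(3)
xy = rng.uniform(0, 100, size=(225, 2))   # sampling locations (m)
z = rng.normal(size=225)                  # a z-transformed soil property

d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)  # pair distances
sq = (z[:, None] - z[None, :]) ** 2                           # squared diffs
i, j = np.triu_indices(len(z), k=1)                           # each pair once

for lo, hi in zip(range(0, 45, 5), range(5, 50, 5)):          # 5 m lag bins
    mask = (d[i, j] >= lo) & (d[i, j] < hi)
    if mask.any():
        gamma = sq[i, j][mask].mean() / 2.0
        print(f"lag {lo:2d}-{hi:2d} m: gamma = {gamma:.3f} ({mask.sum()} pairs)")
```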