958 results for multivariate binary data
Abstract:
[EN] A thermodynamic study is carried out on binary systems composed of propyl ethanoate with six alkanes, from pentane to decane. Vapor pressures of the ester and the isobaric vapor-liquid equilibria of the six mixtures were measured at 101.32 kPa in a small-capacity ebulliometer; the mixing properties $y^E = v^E, h^E$ were also measured over a range of temperatures at atmospheric pressure. Adequate correlations are obtained for the surfaces $y^E = y^E(x,T)$, together with an interpretation of the behavior of the mixtures that also draws on $c_p^E$ data from the literature.
Abstract:
This thesis describes several studies on the development of physical analysis methods coupled with multivariate statistical techniques to assess the quality and authenticity of vegetable oils and dairy products. Using physical instruments reduces the cost and time required by classical analyses and can at the same time provide a different set of information concerning both the quality and the authenticity of products. For such methods to work well, robust statistical models must be built from data sets that are correctly collected and representative of the field of application. In this thesis, vegetable oils and several types of cheese were analysed (in particular pecorino cheeses in two research studies and Parmigiano-Reggiano in another). Various analytical instruments (physical methods) were used, in particular spectroscopy, differential thermal analysis and the electronic nose, in addition to traditional separation methods. The data obtained from the analyses were processed with several statistical techniques, mainly partial least squares, multiple linear regression and linear discriminant analysis.
Abstract:
Millisecond Pulsars (MSPs) are fast rotating, highly magnetized neutron stars. According to the "canonical recycling scenario", MSPs form in binary systems containing a neutron star which is spun up through mass accretion from the evolving companion. The final stage therefore consists of a binary made of an MSP and the core of the deeply peeled companion. In recent years, however, an increasing number of systems deviating from these expectations have been discovered, strongly indicating that our understanding of MSPs is far from complete. The identification of the optical companions to binary MSPs is crucial to constrain the formation and evolution of these objects. In dense environments such as Globular Clusters (GCs), it also provides insight into the internal dynamics of the cluster. Using deep photometric data acquired with both space and ground-based telescopes, we identified 5 new companions to MSPs, three of them located in GCs and two in the Galactic Field. The three new identifications in GCs increased by 50% the number of such objects known before this Thesis. They are all non-degenerate stars, at odds with the expectations of the "canonical recycling scenario". These results therefore suggest either that transitory phases should also be taken into account, or that dynamical processes, such as exchange interactions, play a crucial role in the evolution of MSPs. We also performed a spectroscopic follow-up of the companion to PSR J1740-5340A in the GC NGC 6397, confirming that it is a deeply peeled star descending from a ~0.8 Msun progenitor. This nicely confirms the theoretical expectations about the formation and evolution of MSPs.
Abstract:
The atmosphere exerts a global influence on the movement of heat and humidity between the continents and thus significantly affects climate variability. Information about atmospheric circulation is of major importance for understanding different climatic conditions. Dust deposits from maar lakes and dry maars of the Eifel Volcanic Field (Germany) are therefore used as proxy data for the reconstruction of past aeolian dynamics.
In this thesis two sediment cores from the Eifel region are examined: core SM3 from Lake Schalkenmehren and core DE3 from the Dehner dry maar. Both cores contain the tephra of the Laacher See eruption, which is dated to 12,900 years before present. Taken together, the cores cover the last 60,000 years: SM3 the Holocene and DE3 the marine isotope stages MIS-3 and MIS-2, respectively. The frequencies of glacial dust storm events and their paleo wind directions are detected by high-resolution grain size and provenance analysis of the lake sediments. For this purpose two different methods are applied: geochemical measurements of the sediment using µXRF scanning and the particle analysis method RADIUS (rapid particle analysis of digital images by ultra-high-resolution scanning of thin sections).
It is shown that single dust layers in the lake sediment are characterized by an increased content of aeolian-transported carbonate particles. The limestone-bearing Eifel-North-South zone is the most likely source of the carbonate-rich aeolian dust in the lake sediments of the Dehner dry maar. The dry maar is located on the western side of the Eifel-North-South zone. Thus, carbonate-rich aeolian sediment is most likely transported towards the Dehner dry maar by easterly winds.
A methodology is developed which limits the detection to the aeolian-transported carbonate particles in the sediment, the RADIUS-carbonate module.
In summary, during the marine isotope stage MIS-3 the storm frequency and the east wind frequency are both increased in comparison to MIS-2. These results lead to the suggestion that atmospheric circulation was affected by more turbulent conditions during MIS-3 in comparison to the more stable atmospheric circulation during the full glacial conditions of MIS-2.
The results of the investigations of the dust records are finally evaluated in relation to a study of atmospheric general circulation models (AGCMs) for a comprehensive interpretation. Here, AGCM experiments (ECHAM3 and ECHAM4) with different prescribed sea surface temperature (SST) patterns are used to develop a synoptic interpretation of long-persisting east wind conditions (LEWIC) and of east wind storm events, which are suggested to lead to an enhanced accumulation of sediment transported by easterly winds to the proxy site of the Dehner dry maar.
The basic observations made on the proxy record are also illustrated in the 10 m wind vectors in the different model experiments under glacial conditions with different prescribed sea surface temperature patterns. Furthermore, the analysis of long-persisting east wind conditions in the AGCM data shows a stronger seasonality under glacial conditions: all the different experiments are characterized by an increase in the relative importance of the LEWIC during spring and summer. The different glacial experiments consistently show a shift from a long-lasting high over the Baltic Sea towards the NW, directly above the Scandinavian Ice Sheet, together with a concurrently enhanced westerly circulation over the North Atlantic.
This thesis is a comprehensive analysis of atmospheric circulation patterns during the last glacial period. It has been possible to reconstruct important elements of the glacial paleo climate in Central Europe.
While the proxy data from sediment cores lead to a binary signal of the wind direction changes (east versus west wind), a synoptic interpretation using atmospheric circulation models is successful. This shows a possible distribution of high and low pressure areas and thus the direction and strength of wind fields which have the capacity to transport dust. In conclusion, the combination of numerical models, to enhance understanding of processes in the climate system, with proxy data from the environmental record is the key to a comprehensive approach to paleo climatic reconstruction.
Abstract:
The Large Hadron Collider (LHC), located at the CERN laboratories in Geneva, is the largest particle accelerator in the world. One of the main research fields at the LHC is the study of the Higgs boson, the latest particle discovered by the ATLAS and CMS experiments. Because the production cross section of the Higgs boson is small, only a large data sample offers the chance to study the properties of this particle. To perform these searches it is desirable to avoid contamination of the signal signature by the number and variety of background processes produced in pp collisions at the LHC. Great importance is therefore attached to the study of multivariate methods which, compared to the standard cut-based analysis, can enhance the signal selection of a Higgs boson produced in association with a top quark pair through a dileptonic final state (ttH channel). The data collected up to 2012 are not sufficient to provide a significant number of ttH events; however, the methods applied in this thesis will provide a powerful tool for the larger data samples that will be collected during the next LHC data-taking period.
Abstract:
Data Distribution Management (DDM) is a component of the High Level Architecture standard. Its task is to detect overlaps between update and subscription extents efficiently. This thesis discusses the need for a framework and the reasons why one was implemented. Testing algorithms for a fair comparison, libraries that ease the implementation of algorithms, and automation of the build phase were fundamental motivations for starting work on the framework. The main motivation was that, in surveying the scientific literature on DDM and its various algorithms, we noticed that each article generated its own ad hoc data for testing. One goal of this framework is therefore to make it possible to compare algorithms on a consistent data set. We decided to test the framework in the cloud in order to obtain a more reliable comparison between runs by different users. Two of the most widely used services were considered: Amazon AWS EC2 and Google App Engine. The advantages and disadvantages of each are presented, along with the reason for choosing Google App Engine. Four algorithms were developed: Brute Force, Binary Partition, Improved Sort, and Interval Tree Matching. Tests were run on execution time and peak memory usage. The results show that Interval Tree Matching and Improved Sort are the most efficient. All tests were run on the sequential versions of the algorithms, so there may be room for a further reduction in execution time for the Interval Tree Matching algorithm.
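The matching problem named in this abstract can be illustrated with a minimal sketch of the brute-force approach. It assumes one-dimensional extents represented as closed intervals (lower, upper); the function names and the sample data are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Assumption: an extent is a closed 1-D interval (lower, upper) with lower <= upper.
Extent = Tuple[float, float]


def overlaps(a: Extent, b: Extent) -> bool:
    """Return True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def brute_force_matching(updates: List[Extent],
                         subscriptions: List[Extent]) -> List[Tuple[int, int]]:
    """Compare every update extent with every subscription extent.

    Runs in O(n * m) time; returns (update index, subscription index) pairs.
    """
    matches = []
    for i, update in enumerate(updates):
        for j, subscription in enumerate(subscriptions):
            if overlaps(update, subscription):
                matches.append((i, j))
    return matches


# Hypothetical sample data.
updates = [(0.0, 2.0), (5.0, 6.0)]
subscriptions = [(1.0, 3.0), (6.0, 8.0), (10.0, 12.0)]
print(brute_force_matching(updates, subscriptions))
```

The quadratic cost of this baseline is what sorting-based and tree-based matching algorithms aim to reduce.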
Abstract:
We have investigated the use of hierarchical clustering of flow cytometry data to classify samples of conventional central chondrosarcoma, a malignant cartilage-forming tumor of uncertain cellular origin, according to similarities with surface marker profiles of several known cell types. Human primary chondrosarcoma cells, articular chondrocytes, mesenchymal stem cells, fibroblasts, and a panel of tumor cell lines of chondrocytic or epithelial origin were clustered based on the expression profile of eleven surface markers. For clustering, eight hierarchical clustering algorithms, three distance metrics, as well as several approaches for data preprocessing, including multivariate outlier detection, logarithmic transformation, and z-score normalization, were systematically evaluated. By selecting clustering approaches shown to give reproducible results for cluster recovery of known cell types, primary conventional central chondrosarcoma cells could be grouped in two main clusters with distinctive marker expression signatures: one group clustering together with mesenchymal stem cells (CD49b-high/CD10-low/CD221-high) and a second group clustering close to fibroblasts (CD49b-low/CD10-high/CD221-low). Hierarchical clustering also revealed substantial differences between primary conventional central chondrosarcoma cells and established chondrosarcoma cell lines, with the latter not only segregating apart from primary tumor cells and normal tissue cells, but clustering together with cell lines of epithelial lineage. Our study provides a foundation for the use of hierarchical clustering applied to flow cytometry data as a powerful tool to classify samples according to marker expression patterns, which could lead to the discovery of new cancer subtypes.
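As a rough illustration of the kind of pipeline evaluated here, the sketch below applies z-score normalization per marker and then agglomerative clustering. The choice of Euclidean distance with average linkage is an assumption made for the example; the abstract does not say which of the eight algorithms and three metrics were selected, and the marker values are invented.

```python
import math
from statistics import mean, pstdev
from typing import List


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column (marker) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1: List[int], c2: List[int]) -> float:
        # Average pairwise distance between members of the two clusters.
        return mean(euclidean(rows[i], rows[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [(dist(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Hypothetical expression values: 4 samples x 2 markers.
samples = [[1.0, 5.0], [1.2, 4.8], [5.0, 1.0], [5.1, 1.1]]
print(average_linkage(zscore_columns(samples), n_clusters=2))
```

Normalizing before clustering keeps a marker with a large numeric range from dominating the distance.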
Abstract:
The SWISSspine registry is the first mandatory registry of its kind in the history of Swiss orthopaedics and it follows the principle of "coverage with evidence development". Its goal is the generation of evidence for a decision by the Swiss federal office of health about reimbursement of the concerned technologies and treatments by the basic health insurance of Switzerland. The recently developed and clinically implemented Dynardi total disc arthroplasty (TDA) accounted for 10% of the implanted lumbar TDAs in the registry. We compared the outcomes of patients treated with Dynardi to those of the recipients of the other TDAs in the registry. Between March 2005 and October 2009, 483 patients with single-level TDA were documented in the registry. The 52 patients with a single Dynardi lumbar disc prosthesis implanted by two surgeons (CE and OS) were compared to the 431 patients who received one of the other prostheses. Data were collected in a prospective, observational multicenter mode. Surgery, implant, 3-month, 1-year, and 2-year follow-up forms as well as comorbidity, NASS and EQ-5D questionnaires were collected. For statistical analyses, the Wilcoxon signed-rank test and chi-square test were used. Multivariate regression analyses were also performed. Significant and clinically relevant reduction of low back pain and leg pain as well as improvement in quality of life were seen in both groups (P < 0.001 postop vs. preop). There were no inter-group differences regarding postoperative pain levels, intraoperative and follow-up complications or revision procedures with a new hospitalization. However, significantly more Dynardi patients achieved a minimum clinically relevant low back pain alleviation of 18 VAS points and a quality of life improvement of 0.25 EQ-5D points. Patients with a Dynardi prosthesis showed outcomes similar to those of patients receiving the other TDAs in terms of postoperative low back and leg pain, complications, and revision procedures.
A higher likelihood of achieving a minimum clinically relevant improvement of low back pain and quality of life was observed in Dynardi patients. This difference might be due to the large number of surgeons using other TDAs compared to only two surgeons using the Dynardi TDA, with corresponding variations in patient selection, patient-physician interaction and other factors, which cannot be assessed in a registry study.
Abstract:
The objective of this study was to characterize empirically the association between vaccination coverage and the size and occurrence of measles epidemics in Germany. In order to achieve this we analysed data routinely collected by the Robert Koch Institute, which comprise the weekly number of reported measles cases at all ages as well as estimates of vaccination coverage at the average age of entry into the school system. Coverage levels within each federal state of Germany are incorporated into a multivariate time-series model for infectious disease counts, which captures occasional outbreaks by means of an autoregressive component. The observed incidence pattern of measles for all ages is best described by using the log proportion of unvaccinated school starters in the autoregressive component of the model.
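To make the role of the covariate concrete, here is a minimal sketch of one way an autoregressive component can depend on the log proportion of unvaccinated school starters. The functional form (conditional mean equal to a baseline plus a weight times last week's count, with a log-linear weight) and all parameter names and values are assumptions for illustration only; the abstract does not give the model equations.

```python
import math


def epidemic_weight(unvaccinated_fraction: float, alpha: float, beta: float) -> float:
    """Autoregressive weight lam, with log(lam) = alpha + beta * log(unvaccinated fraction).

    alpha and beta are hypothetical coefficients, not estimates from the study.
    """
    return math.exp(alpha + beta * math.log(unvaccinated_fraction))


def expected_cases(prev_cases: int, coverage: float,
                   nu: float, alpha: float, beta: float) -> float:
    """Conditional mean for one region: mu_t = nu + lam * y_{t-1}.

    nu is an assumed baseline (endemic) rate; coverage is a fraction in [0, 1).
    """
    lam = epidemic_weight(1.0 - coverage, alpha, beta)
    return nu + lam * prev_cases


# Hypothetical values: 50 cases last week, baseline of 1 case per week.
for coverage in (0.85, 0.90, 0.95):
    print(coverage, expected_cases(50, coverage, nu=1.0, alpha=0.0, beta=1.0))
```

Under these assumptions with a positive beta, higher coverage shrinks the autoregressive weight, so an outbreak of a given size is expected to generate fewer follow-on cases.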
Abstract:
Currently, a variety of linear and nonlinear measures is in use to investigate spatiotemporal interrelation patterns of multivariate time series. Whereas the former are by definition insensitive to nonlinear effects, the latter detect both nonlinear and linear interrelation. In the present contribution we employ a uniform surrogate-based approach, which is capable of disentangling interrelations that significantly exceed random effects and interrelations that significantly exceed linear correlation. The bivariate version of the proposed framework is explored using a simple model allowing for separate tuning of coupling and nonlinearity of interrelation. To demonstrate applicability of the approach to multivariate real-world time series we investigate resting state functional magnetic resonance imaging (rsfMRI) data of two healthy subjects as well as intracranial electroencephalograms (iEEG) of two epilepsy patients with focal onset seizures. The main findings are that for our rsfMRI data interrelations can be described by linear cross-correlation. Rejection of the null hypothesis of linear iEEG interrelation occurs predominantly for epileptogenic tissue as well as during epileptic seizures.
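The first of the two questions (does an interrelation exceed random effects?) can be sketched with a simple surrogate test. This example uses random-shuffle surrogates and zero-lag Pearson correlation, both of which are simplifying assumptions: shuffling also destroys autocorrelation, and the abstract does not specify the surrogate type or the interrelation measures used. Testing against linear correlation would need surrogates that preserve linear structure, which this sketch does not attempt.

```python
import random
from statistics import mean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Zero-lag Pearson correlation; assumes neither series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def surrogate_p_value(x: Sequence[float], y: Sequence[float],
                      n_surrogates: int = 199, seed: int = 0) -> float:
    """Fraction of shuffled surrogates whose |correlation| reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = list(y)
        rng.shuffle(shuffled)  # destroys any temporal relation to x
        if abs(pearson(x, shuffled)) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_surrogates + 1)


# Hypothetical, strongly coupled pair of series.
x = [float(t) for t in range(30)]
y = [2.0 * v + 1.0 for v in x]
print(surrogate_p_value(x, y))
```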
Abstract:
The occupant impact velocity (OIV) and acceleration severity index (ASI) are competing measures of crash severity used to assess occupant injury risk in full-scale crash tests involving roadside safety hardware, e.g. guardrail. Delta-V, or the maximum change in vehicle velocity, is the traditional metric of crash severity for real world crashes. This study compares the ability of the OIV, ASI, and delta-V to discriminate between serious and non-serious occupant injury in real world frontal collisions. Vehicle kinematics data from event data recorders (EDRs) were matched with detailed occupant injury information for 180 real world crashes. Cumulative probability of injury risk curves were generated using binary logistic regression for belted and unbelted data subsets. By comparing the available fit statistics and performing a separate ROC curve analysis, the more computationally intensive OIV and ASI were found to offer no significant predictive advantage over the simpler delta-V.
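A minimal sketch of the two ingredients mentioned here, a logistic injury risk curve and an ROC comparison, is given below. The coefficients and the crash records are invented for illustration; they are not fitted values or data from the study.

```python
import math
from typing import Sequence


def injury_risk(severity: float, b0: float, b1: float) -> float:
    """Logistic risk curve: P(serious injury) = 1 / (1 + exp(-(b0 + b1 * severity))).

    b0 and b1 are hypothetical coefficients.
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * severity)))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen injured case has a higher
    score than a randomly chosen non-injured case; ties count as one half.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical delta-V values (km/h) and injury outcomes (1 = serious).
delta_v = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
serious = [0, 0, 1, 0, 1, 1]
print(roc_auc(delta_v, serious))
```

Because the AUC depends only on the ranking of cases, computing it for each candidate metric on the same crashes gives a direct way to compare their discriminating ability.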
Abstract:
Objectives: Previous research conducted in the late 1980s suggested that vehicle impacts following an initial barrier collision increase severe occupant injury risk. Now over 25 years old, the data are no longer representative of the currently installed barriers or the present US vehicle fleet. The purpose of this study is to provide a present-day assessment of secondary collisions and to determine if current full-scale barrier crash testing criteria provide an indication of secondary collision risk for real-world barrier crashes. Methods: To characterize secondary collisions, 1,363 (596,331 weighted) real-world barrier midsection impacts selected from 13 years (1997-2009) of in-depth crash data available through the National Automotive Sampling System (NASS) / Crashworthiness Data System (CDS) were analyzed. Scene diagrams and available scene photographs were used to determine roadside and barrier specific variables unavailable in NASS/CDS. Binary logistic regression models were developed for second event occurrence and resulting driver injury. To investigate current secondary collision crash test criteria, 24 full-scale crash test reports were obtained for common non-proprietary US barriers, and the risk of secondary collisions was determined using recommended evaluation criteria from National Cooperative Highway Research Program (NCHRP) Report 350. Results: Secondary collisions were found to occur in approximately two thirds of crashes where a barrier is the first object struck. Barrier lateral stiffness, post-impact vehicle trajectory, vehicle type, and pre-impact tracking conditions were found to be statistically significant contributors to secondary event occurrence. The presence of a second event was found to increase the likelihood of a serious driver injury by a factor of 7 compared to cases with no second event present. The NCHRP Report 350 exit angle criterion was found to underestimate the risk of secondary collisions in real-world barrier crashes.
Conclusions: Consistent with previous research, collisions following a barrier impact are not an infrequent event and substantially increase driver injury risk. The results suggest that using exit-angle based crash test criteria alone to assess secondary collision risk is not sufficient to predict second collision occurrence for real-world barrier crashes.
Abstract:
Publication bias and related biases in meta-analysis are often examined by visually checking for asymmetry in funnel plots of treatment effect against its standard error. Formal statistical tests of funnel plot asymmetry have been proposed, but when applied to binary outcome data these can give false-positive rates that are higher than the nominal level in some situations (large treatment effects, or few events per trial, or all trials of similar sizes). We develop a modified linear regression test for funnel plot asymmetry based on the efficient score and its variance, Fisher's information. The performance of this test is compared to the other proposed tests in simulation analyses based on the characteristics of published controlled trials. When there is little or no between-trial heterogeneity, this modified test has a false-positive rate close to the nominal level while maintaining similar power to the original linear regression test ('Egger' test). When the degree of between-trial heterogeneity is large, none of the tests that have been proposed has uniformly good properties.
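The quantities named in this abstract can be sketched as follows. The example assumes the standard hypergeometric expressions for the efficient score Z and its variance V for the log odds ratio of a 2x2 table, and it assumes that the asymmetry regression takes the form of an ordinary least-squares fit of Z/sqrt(V) on sqrt(V) whose intercept measures asymmetry. Neither formula is stated in the abstract, so both should be read as assumptions; the trial counts are invented, and no significance test is computed.

```python
import math
from typing import List, Tuple

# A trial is (a, b, c, d): events and non-events in the treatment arm,
# then events and non-events in the control arm.
Trial = Tuple[int, int, int, int]


def efficient_score(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Efficient score Z and Fisher information V for the log odds ratio.

    Assumes no empty margins, otherwise V is zero.
    """
    n = a + b + c + d
    z = a - (a + b) * (a + c) / n
    v = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    return z, v


def asymmetry_intercept(trials: List[Trial]) -> float:
    """Intercept of the least-squares line of Z/sqrt(V) against sqrt(V)."""
    xs, ys = [], []
    for a, b, c, d in trials:
        z, v = efficient_score(a, b, c, d)
        xs.append(math.sqrt(v))
        ys.append(z / math.sqrt(v))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx


# Hypothetical trials of different sizes.
trials = [(12, 38, 20, 30), (5, 15, 9, 11), (30, 170, 45, 155)]
print(asymmetry_intercept(trials))
```

A full test would compare the intercept with its standard error; an intercept far from zero suggests that small and large trials estimate systematically different effects.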
Abstract:
The positive and negative predictive values are standard measures used to quantify the predictive accuracy of binary biomarkers when the outcome being predicted is also binary. When the biomarkers are instead being used to predict a failure time outcome, there is no standard way of quantifying predictive accuracy. We propose a natural extension of the traditional predictive values to accommodate censored survival data. We discuss not only quantifying predictive accuracy using these extended predictive values, but also rigorously comparing the accuracy of two biomarkers in terms of their predictive values. Using a marginal regression framework, we describe how to estimate differences in predictive accuracy and how to test whether the observed difference is statistically significant.
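For reference, the traditional predictive values for a binary biomarker and a binary outcome can be computed from a 2x2 table as in the sketch below. It covers only the standard case that the abstract starts from, not the proposed extension to censored survival data, and the counts are invented.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2x2 table.

    PPV = P(outcome present | marker positive) = tp / (tp + fp)
    NPV = P(outcome absent  | marker negative) = tn / (tn + fn)
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one marker-positive and one marker-negative subject")
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical counts: 50 marker-positive and 50 marker-negative subjects.
ppv, npv = predictive_values(tp=40, fp=10, fn=5, tn=45)
print(ppv, npv)
```

Unlike sensitivity and specificity, both values depend on how common the outcome is in the population studied.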
Abstract:
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of "signature" protein profiles specific to each pathologic state (e.g., normal vs. cancer) or differential profiles between experimental conditions (e.g., treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data analytic strategy for discovering protein biomarkers based on such high-dimensional mass-spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data analytic strategy takes properties of the SELDI mass-spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of the data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate.
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
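The reduction of intensities to binary peak indicators can be sketched as below. The peak rule used here (a point is a peak if its intensity is strictly greater than every other point within a fixed window) and the window size are assumptions chosen for illustration; the abstract does not specify the neighborhood definition, and the intensities are invented.

```python
from typing import List


def peak_indicators(intensity: List[float], half_window: int) -> List[int]:
    """Return 1 where a point is the strict maximum of its neighborhood, else 0.

    The neighborhood of index i is the half_window points on each side,
    truncated at the ends of the spectrum.
    """
    n = len(intensity)
    flags = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighbours = intensity[lo:i] + intensity[i + 1:hi]
        is_peak = bool(neighbours) and intensity[i] > max(neighbours)
        flags.append(1 if is_peak else 0)
    return flags


# Hypothetical intensities at consecutive mass-per-charge points.
spectrum = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 1.0]
print(peak_indicators(spectrum, half_window=1))
```

Working with peak indicators rather than raw intensities discards the magnitude of y, which is the property the abstract describes as having high coefficients of variation.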