949 results for monotone missing data
Abstract:
This article reports on a search for dark matter pair production in association with bottom or top quarks in 20.3 fb−1 of pp collisions collected at √s = 8 TeV by the ATLAS detector at the LHC. Events with large missing transverse momentum are selected when produced in association with high-momentum jets, of which one or more are identified as jets containing b-quarks. Final states with top quarks are selected by requiring a high jet multiplicity and in some cases a single lepton. The data are found to be consistent with the Standard Model expectations, and limits are set on the mass scale of effective field theories that describe scalar and tensor interactions between dark matter and Standard Model particles. Limits on the dark matter–nucleon cross-section for spin-independent and spin-dependent interactions are also provided. These limits are particularly strong for low-mass dark matter. Using a simplified model, constraints are set on the mass of dark matter and of a coloured mediator suitable to explain a possible signal of annihilating dark matter.
Abstract:
Results of a search for new phenomena in final states with an energetic jet and large missing transverse momentum are reported. The search uses 20.3 fb−1 of √s = 8 TeV data collected in 2012 with the ATLAS detector at the LHC. Events are required to have at least one jet with pT > 120 GeV and no leptons. Nine signal regions are considered with increasing missing transverse momentum requirements between ETmiss > 150 GeV and ETmiss > 700 GeV. Good agreement is observed between the number of events in data and Standard Model expectations. The results are translated into exclusion limits on models with large extra spatial dimensions, pair production of weakly interacting dark matter candidates, and production of very light gravitinos in a gauge-mediated supersymmetric model. In addition, limits on the production of an invisibly decaying Higgs-like boson leading to similar topologies in the final state are presented.
Abstract:
The results of a search for supersymmetry in final states containing at least one isolated lepton (electron or muon), jets and large missing transverse momentum with the ATLAS detector at the Large Hadron Collider (LHC) are reported. The search is based on proton-proton collision data at a centre-of-mass energy √s = 8 TeV collected in 2012, corresponding to an integrated luminosity of 20 fb−1. No significant excess above the Standard Model expectation is observed. Limits are set on the parameters of a minimal universal extra dimensions model, excluding a compactification radius of 1/Rc = 950 GeV for a cut-off scale times radius (ΛRc) of approximately 30, as well as on sparticle masses for various supersymmetric models. Depending on the model, the search excludes gluino masses up to 1.32 TeV and squark masses up to 840 GeV.
Abstract:
Results of a search for new phenomena in events with large missing transverse momentum and a Higgs boson decaying to two photons are reported. Data from proton–proton collisions at a center-of-mass energy of 8 TeV and corresponding to an integrated luminosity of 20.3 fb−1 have been collected with the ATLAS detector at the LHC. The observed data are well described by the expected Standard Model backgrounds. Upper limits on the cross section of events with large missing transverse momentum and a Higgs boson candidate are also placed. Exclusion limits are presented for models of physics beyond the Standard Model featuring dark-matter candidates.
Abstract:
Background: Natural Killer (NK) cells are thought to protect from residual leukemic cells in patients receiving stem cell transplantation. However, multiple retrospective analyses of patient data have yielded conflicting conclusions regarding a putative role of NK cells and the essential NK cell recognition events mediating a protective effect against leukemia. Further, an NK cell-mediated protective effect against primary leukemia in vivo has not been shown directly. Methodology/Principal Findings: Here we addressed whether NK cells have the potential to control chronic myeloid leukemia (CML) arising from the transplantation of BCR-ABL1 oncogene-expressing primary bone marrow precursor cells into lethally irradiated recipient mice. These analyses identified missing-self recognition as the only NK cell-mediated recognition strategy able to significantly protect from the development of CML disease in vivo. Conclusion: Our data provide a proof of principle that NK cells can control primary leukemic cells in vivo. Since the presence of NK cells reduced the abundance of leukemia-propagating cancer stem cells, the data raise the possibility that NK cell recognition has the potential to cure CML, which may be difficult using small-molecule BCR-ABL1 inhibitors. Finally, our findings validate approaches to treat leukemia using antibody-based blockade of self-specific inhibitory MHC class I receptors.
Abstract:
Invasive candidiasis is a frequent life-threatening complication in critically ill patients. Early diagnosis followed by prompt treatment aimed at improving outcome while minimizing unnecessary antifungal use remains a major challenge in the ICU setting. Timely patient selection thus plays a key role in clinically efficient and cost-effective management. Approaches combining clinical risk factors and Candida colonization data have improved our ability to identify such patients early. While the negative predictive value of scores and predicting rules is up to 95 to 99%, the positive predictive value is much lower, ranging between 10 and 60%. Accordingly, if a positive score or rule is used to guide the start of antifungal therapy, many patients may be treated unnecessarily. Candida biomarkers display higher positive predictive values; however, they lack sensitivity and are thus not able to identify all cases of invasive candidiasis. The (1→3)-β-D-glucan (BG) assay, a panfungal antigen test, is recommended as a complementary tool for the diagnosis of invasive mycoses in high-risk hemato-oncological patients. Its role in the more heterogeneous ICU population remains to be defined. More efficient clinical selection strategies combined with high-performance laboratory tools are needed in order to treat the right patients at the right time while keeping the costs of screening and therapy as low as possible. The new approach proposed by Posteraro and colleagues in the previous issue of Critical Care meets these requirements. A single positive BG value in medical patients admitted to the ICU with sepsis and expected to stay for more than 5 days preceded the documentation of candidemia by 1 to 3 days with an unprecedented diagnostic accuracy. Applying this one-point fungal screening to a selected subset of ICU patients with an estimated 15 to 20% risk of developing candidemia is an appealing and potentially cost-effective approach.
If confirmed by multicenter investigations, and extended to surgical patients at high risk of invasive candidiasis after abdominal surgery, this Bayesian-based risk stratification approach aimed at maximizing clinical efficiency by minimizing health care resource utilization may substantially simplify the management of critically ill patients at risk of invasive candidiasis.
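The dependence of predictive values on pre-test risk described above can be made concrete with Bayes' theorem. The sketch below is purely illustrative: the sensitivity and specificity figures are assumed values, not those of the BG assay, and it simply shows why restricting screening to a subset with 15 to 20% risk raises the positive predictive value.

```python
# Illustrative only: sensitivity/specificity values are assumptions,
# not measured properties of the BG assay.

def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    tp = sensitivity * prevalence            # true positives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence      # false negatives
    tn = specificity * (1 - prevalence)      # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Unselected ICU population (low prevalence) vs. the selected subset with
# an estimated 15-20% candidemia risk mentioned in the abstract.
for prev in (0.02, 0.15, 0.20):
    ppv, npv = predictive_values(prev, sensitivity=0.90, specificity=0.85)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these assumed test characteristics, the PPV climbs from roughly 11% at 2% prevalence to about 60% at 20% prevalence, while the NPV stays high throughout, mirroring the 10-60% PPV range quoted for scores and rules.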
Abstract:
This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities, etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period.
This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually occur in situations where any non-zero expenditure is not small. While this analysis is based on economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables).
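The sub-compositional question posed above — drop the alcohol/tobacco part, re-close, and compare the two groups — can be sketched in a few lines. Everything below is invented for illustration: the component order (housing, food, utilities, alcohol) and the expenditure shares are hypothetical, and comparing compositional centers (closed geometric means) is just one simple way to look for sub-compositional similarity.

```python
# Hypothetical data; component order (housing, food, utilities, alcohol)
# and all shares are invented for illustration.
import math

def close(xs):
    """Re-close a composition so its parts sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def subcomposition(xs, drop):
    """Remove the component at index `drop` and re-close the rest."""
    return close([x for i, x in enumerate(xs) if i != drop])

def geometric_center(rows):
    """Component-wise geometric mean, re-closed (the compositional center)."""
    n = len(rows)
    gm = [math.exp(sum(math.log(r[j]) for r in rows) / n)
          for j in range(len(rows[0]))]
    return close(gm)

# Teetotal households have a structural zero on alcohol, so only the
# (housing, food, utilities) sub-composition is comparable across groups.
teetotal = [[0.55, 0.30, 0.15], [0.50, 0.35, 0.15]]
drinkers = [[0.50, 0.28, 0.12, 0.10], [0.45, 0.33, 0.12, 0.10]]
drinkers_sub = [subcomposition(r, drop=3) for r in drinkers]

print(geometric_center(teetotal))
print(geometric_center(drinkers_sub))
```

If the two centers (and, in a fuller analysis, the whole logratio covariance structures) are close, that is evidence for the sub-compositional independence the abstract asks about.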
Abstract:
R, from http://www.r-project.org/, is 'GNU S': a language and environment for statistical computing and graphics, in which many classical and modern statistical techniques have been implemented; many are supplied as packages. There are 8 standard packages and many more are available through the CRAN family of Internet sites, http://cran.r-project.org. We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, annotation of individual data points, etc.); dealing with zeros and missing values in compositional data sets, with R procedures for the simple and multiplicative replacement strategies; and time series analysis of compositional data. We'll present the current status of MixeR development and illustrate its use on selected data sets.
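The core operations the MixeR abstract lists follow Aitchison's standard definitions, which can be sketched compactly. This is not the MixeR package's actual API — the function names and the toy compositions below are illustrative — just the textbook operations: perturbation (component-wise product, re-closed), powering (component-wise power, re-closed), and the Aitchison distance via centred log-ratios.

```python
# Textbook Aitchison operations; names and data are illustrative,
# not the MixeR package's API.
import math

def close(xs):
    """Re-close a composition so its parts sum to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def perturb(x, y):
    """Perturbation: component-wise product, re-closed."""
    return close([a * b for a, b in zip(x, y)])

def power(x, alpha):
    """Power transformation: component-wise power, re-closed."""
    return close([a ** alpha for a in x])

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coefficients."""
    def clr(z):
        g = math.exp(sum(math.log(v) for v in z) / len(z))
        return [math.log(v / g) for v in z]
    return math.dist(clr(x), clr(y))

x = close([1.0, 2.0, 3.0])
y = close([3.0, 2.0, 1.0])
print(perturb(x, y))
print(power(x, 2.0))
print(aitchison_distance(x, y))
```

Perturbation and powering make the simplex a vector space, which is why replacement strategies for zeros are judged by their coherence with exactly these operations.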
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zeros is usually understood as "a trace too small to measure", it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved.
As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values in compositional data sets is introduced.
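The multiplicative replacement discussed above is simple enough to sketch directly: each rounded zero is set to a small value δ, and the non-zero parts are rescaled by a common multiplicative factor so the composition still sums to its original constant. The data and the choice of δ below are illustrative; in practice δ is chosen relative to the detection limit of each part.

```python
# Sketch of multiplicative zero replacement; the data and delta are
# illustrative, and delta is normally tied to each part's detection limit.

def multiplicative_replacement(x, delta, total=1.0):
    """Replace zeros in a composition x (summing to `total`) by `delta`,
    shrinking the non-zero parts multiplicatively to preserve the sum."""
    zero_mass = sum(delta for v in x if v == 0.0)  # total imputed mass
    scale = 1.0 - zero_mass / total                # common shrink factor
    return [delta if v == 0.0 else v * scale for v in x]

x = [0.6, 0.3, 0.1, 0.0]  # one rounded zero
r = multiplicative_replacement(x, delta=0.005)
print(r)
```

Because all non-zero parts are scaled by the same factor, their mutual ratios are untouched — this is exactly why, as the abstract notes, the covariance structure of zero-free subcompositions is preserved, unlike under the additive replacement.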
Abstract:
PURPOSE: Health-related quality of life (HRQoL) is considered a representative outcome in the evaluation of chronic disease management initiatives emphasizing patient-centered care. We evaluated the association between receipt of processes-of-care (PoC) for diabetes and HRQoL. METHODS: This cross-sectional study used self-reported data from non-institutionalized adults with diabetes in a Swiss canton. Outcomes were the physical and mental composite scores (PCS, MCS) of the Short Form 12 health survey (SF-12) and the Audit of Diabetes-Dependent Quality of Life (ADDQoL). Main exposure variables were receipt of six PoC for diabetes in the past 12 months, and the Patient Assessment of Chronic Illness Care (PACIC) score. We performed linear regressions to examine the association between PoC, PACIC and the three composites of HRQoL. RESULTS: Mean age of the 519 patients was 64.5 years (SD 11.3); 60% were male, 87% reported type 2 or undetermined diabetes and 48% had had diabetes for over 10 years. Mean HRQoL scores were SF-12 PCS: 43.4 (SD 10.5), SF-12 MCS: 47.0 (SD 11.2) and ADDQoL: -1.6 (SD 1.6). In adjusted models including all six PoC simultaneously, receipt of influenza vaccine was associated with lower ADDQoL (β=-0.4, p≤0.01) and foot examination was negatively associated with SF-12 PCS (β=-1.8, p≤0.05). There was no association, or a trend towards a negative association, when these PoC were reported as combined measures. The PACIC score was associated only with the SF-12 MCS (β=1.6, p≤0.05). CONCLUSIONS: PoC for diabetes did not show a consistent association with HRQoL in this cross-sectional analysis. This may reflect a lag between the time a process is received and its effect on health-related quality of life. Further research is needed to study this complex phenomenon.
Abstract:
Therapeutic drug monitoring (TDM) can be defined as the measurement of drug in biological samples to individualise treatment by adapting drug dose to improve efficacy and/or reduce toxicity. The cytotoxic drugs are characterised by steep dose-response relationships and narrow therapeutic windows. Inter-individual pharmacokinetic (PK) variability is often substantial. There are, however, a multitude of reasons why TDM has never been fully implemented in daily oncology practice. These include difficulties in establishing appropriate concentration targets, the common use of combination chemotherapies and the paucity of published data from pharmacological trials. The situation is different with targeted therapies. The large interindividual PK variability is influenced by the pharmacogenetic background of the patient (e.g. cytochrome P450 and ABC transporter polymorphisms), patient characteristics such as adherence to treatment, and environmental factors (drug-drug interactions). Retrospective studies have shown that targeted drug exposure correlates with treatment response in various cancers. Evidence for imatinib currently exists; evidence is emerging for other compounds including nilotinib, dasatinib, erlotinib, sunitinib, sorafenib and mammalian target of rapamycin (mTOR) inhibitors. Applications for TDM during oral targeted therapies may best be reserved for particular situations including lack of therapeutic response, severe or unexpected toxicities, anticipated drug-drug interactions and concerns over adherence to treatment. There are still few data in favour of TDM approaches with monoclonal antibodies (mAbs), although encouraging results have been reported with rituximab and cetuximab. TDM of mAbs is not yet supported by scientific evidence. Considerable effort should be made for targeted therapies to better define concentration-effect relationships and to perform comparative randomised trials of classic dosing versus pharmacokinetically guided adaptive dosing.
Abstract:
We consider the problem of estimating the mean hospital cost of stays of a class of patients (e.g., a diagnosis-related group) as a function of patient characteristics. The statistical analysis is complicated by the asymmetry of the cost distribution, the possibility of censoring on the cost variable, and the occurrence of outliers. These problems have often been treated separately in the literature, and a method offering a joint solution to all of them is still missing. Indirect procedures have been proposed, combining an estimate of the duration distribution with an estimate of the conditional cost for a given duration. We propose a parametric version of this approach, allowing for asymmetry and censoring in the cost distribution and providing a mean cost estimator that is robust in the presence of extreme values. In addition, the new method takes covariate information into account.
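The abstract above combines three complications — a right-skewed cost distribution, censoring, and outliers. The sketch below illustrates only the first two ingredients of such a parametric approach and is not the authors' estimator: it fits a lognormal cost distribution by maximum likelihood, handling censored stays through the survival function. The data, the lognormal choice, and the crude grid search are all assumptions for illustration; the actual method is additionally robust to outliers and incorporates covariates.

```python
# Illustrative only: invented data, assumed lognormal model, crude grid
# search. Censored observations contribute log-survival terms, uncensored
# ones contribute log-density terms.
import math

def normal_logsf(z):
    """log of the standard normal survival function."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def loglik(mu, sigma, costs, censored):
    ll = 0.0
    for x, cens in zip(costs, censored):
        z = (math.log(x) - mu) / sigma
        if cens:  # we only know the true cost exceeds x
            ll += normal_logsf(z)
        else:     # lognormal log-density
            ll += -math.log(x * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    return ll

costs = [1200, 1500, 2100, 3400, 8000, 900, 2500, 6000]
censored = [False, False, False, True, False, False, True, False]

# Grid search over (mu, sigma) on the log scale.
best = max(
    ((mu / 100, s / 100) for mu in range(650, 851) for s in range(20, 151)),
    key=lambda p: loglik(p[0], p[1], costs, censored),
)
print("MLE (mu, sigma) on log scale:", best)
print("implied mean cost:", math.exp(best[0] + best[1] ** 2 / 2))
```

Dropping or truncating the censored stays instead would bias the mean cost downward, which is why the censoring terms belong in the likelihood.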
Abstract:
This paper presents an Italian-to-Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish-Catalan and Spanish-Italian. A lightweight manual postprocessing is carried out in order to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.
Abstract:
Nowadays, Species Distribution Models (SDMs) are a widely used tool. Using different statistical approaches, these models reconstruct the realized niche of a species using presence data and a set of variables, often topoclimatic. Their range of uses is quite large, from understanding single-species requirements, to the creation of nature reserves based on species hotspots, to the modeling of climate change impact, etc. Most of the time these models use variables at a resolution of 50 km x 50 km or 1 km x 1 km. However, in some cases these models are used with resolutions below the kilometer scale and are thus called high resolution models (100 m x 100 m or 25 m x 25 m). Quite recently a new kind of data has emerged enabling precision up to 1 m x 1 m and thus allowing very high resolution modeling. However, these new variables are very costly and need an important amount of time to be processed. This is especially the case when these variables are used in complex calculations like model projections over large areas. Moreover, the importance of very high resolution data in SDMs has not been assessed yet and is not well understood. Some basic knowledge of what drives species presences and absences is still missing. Indeed, it is not clear whether in mountain areas like the Alps coarse topoclimatic gradients drive species distributions, whether fine-scale temperature or topography are more important, or whether their importance can be neglected when balanced against competition or stochasticity. In this thesis I investigated the importance of very high resolution data (2-5 m) in species distribution models using very high resolution topographic, climatic or edaphic variables over a 2000 m elevation gradient in the Western Swiss Alps. I also investigated more local responses of these variables for a subset of species living in this area at two precise elevation belts.
During this thesis I showed that high resolution data necessitate very good datasets (species and variables for the models) to produce satisfactory results. Indeed, in mountain areas, temperature is the most important factor driving species distributions, and it needs to be modeled at very fine resolution, instead of being interpolated over large surfaces, to produce satisfactory results. Despite the intuitive idea that topography should be very important at high resolution, results are mixed. Looking at the importance of variables over a large gradient buffers their importance: topographic factors have been shown to be highly important at the subalpine level, but their importance decreases at lower elevations. Whereas at the montane level edaphic and land-use factors are more important, high resolution topographic data are more important at the subalpine level. Finally, the biggest improvement in the models comes when edaphic variables are added. Indeed, adding soil variables is of high importance, and variables like pH surpass the usual topographic variables in SDMs in terms of importance in the models. To conclude, high resolution is very important in modeling but necessitates very good datasets. Increasing only the resolution of the usual topoclimatic predictors is not sufficient, and the use of edaphic predictors has been highlighted as fundamental to produce significantly better models. This is of primary importance, especially if these models are used to reconstruct communities or as a basis for biodiversity assessments.
Abstract:
Longitudinal surveys are increasingly used to collect event history data on person-specific processes such as transitions between labour market states. Survey-based event history data pose a number of challenges for statistical analysis. These challenges include survey errors due to sampling, non-response, attrition and measurement. This study deals with non-response, attrition and measurement errors in event history data and the bias they cause in event history analysis. The study also discusses some choices faced by a researcher using longitudinal survey data for event history analysis and demonstrates their effects. These choices include whether a design-based or a model-based approach is taken, which subset of data to use and, if a design-based approach is taken, which weights to use. The study takes advantage of the possibility of using combined longitudinal survey-register data. The Finnish subset of the European Community Household Panel (FI ECHP) survey for waves 1–5 was linked at person level with longitudinal register data. Unemployment spells were used as the study variables of interest. Lastly, a simulation study was conducted in order to assess the statistical properties of the Inverse Probability of Censoring Weighting (IPCW) method in a survey data context. The study shows how combined longitudinal survey-register data can be used to analyse and compare the non-response and attrition processes, test the type of missingness mechanism and estimate the size of the bias due to non-response and attrition. In our empirical analysis, initial non-response turned out to be a more important source of bias than attrition. Reported unemployment spells were subject to seam effects, omissions and, to a lesser extent, overreporting. The use of proxy interviews tended to cause spell omissions. An often-ignored phenomenon, classification error in reported spell outcomes, was also found in the data.
Neither the Missing At Random (MAR) assumption about the non-response and attrition mechanisms, nor the classical assumptions about measurement errors, turned out to be valid. Measurement errors both in spell durations and in spell outcomes were found to cause bias in estimates from event history models. Low measurement accuracy affected the estimates of the baseline hazard most. The design-based estimates based on data from respondents to all waves of interest, weighted by the last-wave weights, displayed the largest bias. Using all the available data, including the spells of attriters until the time of attrition, helped to reduce attrition bias. Lastly, the simulation study showed that the IPCW correction to design weights reduces bias due to dependent censoring in design-based Kaplan-Meier and Cox proportional hazards model estimators. The study discusses the implications of the results for survey organisations collecting event history data, researchers using surveys for event history analysis, and researchers who develop methods to correct for non-sampling biases in event history data.
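The IPCW idea evaluated in the simulation study above can be sketched as a weighted Kaplan-Meier estimator: each subject carries a weight, e.g. a design weight times the inverse of its estimated probability of remaining uncensored. The toy data below and the uniform weights are illustrative; estimating the censoring probabilities themselves (typically from a model for dropout) is the part this sketch leaves out.

```python
# Illustrative sketch: weighted Kaplan-Meier, where `weights` would be
# IPCW (x design) weights; the toy data and unit weights are invented.

def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier survival curve.

    times   -- observed time for each subject (event or censoring)
    events  -- True if the subject had the event, False if censored
    weights -- per-subject weight (e.g. design weight / P(uncensored))
    Returns a list of (time, survival) pairs at the distinct event times.
    """
    curve, surv = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, e, w in zip(times, events, weights)
                   if e and ti == t)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve

times = [3, 5, 5, 8, 10, 12]
events = [True, True, False, True, False, True]
weights = [1.0] * 6  # unit weights reduce this to the ordinary Kaplan-Meier
for t, s in weighted_km(times, events, weights):
    print(t, round(s, 3))
```

Upweighting subjects who resemble those lost to dependent censoring is what lets the weighted estimator correct the bias the ordinary, unweighted curve would show.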