910 results for Missing data
Abstract:
Results of a search for new phenomena in events with an energetic photon and large missing transverse momentum in proton-proton collisions at √s = 7 TeV are reported. Data collected by the ATLAS experiment at the LHC corresponding to an integrated luminosity of 4.6 fb⁻¹ are used. Good agreement is observed between the data and the Standard Model predictions. The results are translated into exclusion limits on models with large extra spatial dimensions and on pair production of weakly interacting dark matter candidates.
Abstract:
A search for squarks and gluinos in final states containing jets, missing transverse momentum and no high-pT electrons or muons is presented. The data represent the complete sample recorded in 2011 by the ATLAS experiment in 7 TeV proton-proton collisions at the Large Hadron Collider, with a total integrated luminosity of 4.7 fb⁻¹. No excess above the Standard Model background expectation is observed. Gluino masses below 860 GeV and squark masses below 1320 GeV are excluded at the 95% confidence level in simplified models containing only squarks of the first two generations, a gluino octet and a massless neutralino, for squark or gluino masses below 2 TeV, respectively. Squarks and gluinos with equal masses below 1410 GeV are excluded. In minimal supergravity/constrained minimal supersymmetric Standard Model models with tan β = 10, A0 = 0 and μ > 0, squarks and gluinos of equal mass are excluded for masses below 1360 GeV. Constraints are also placed on the parameter space of supersymmetric models with compressed spectra. These limits considerably extend the region of supersymmetric parameter space excluded by previous measurements with the ATLAS detector.
Abstract:
A search for supersymmetry (SUSY) in events with large missing transverse momentum, jets, at least one hadronically decaying tau lepton and zero or one additional light leptons (electron/muon) has been performed using 20.3 fb⁻¹ of proton-proton collision data at √s = 8 TeV recorded with the ATLAS detector at the Large Hadron Collider. No excess above the Standard Model background expectation is observed in the various signal regions and 95% confidence level upper limits on the visible cross section for new phenomena are set. The results of the analysis are interpreted in several SUSY scenarios, significantly extending previous limits obtained in the same final states. In the framework of minimal gauge-mediated SUSY breaking models, values of the SUSY breaking scale Λ below 63 TeV are excluded, independently of tan β. Exclusion limits are also derived for an mSUGRA/CMSSM model, in both the R-parity-conserving and R-parity-violating cases. A further interpretation is presented in a framework of natural gauge mediation, in which the gluino is assumed to be the only light coloured sparticle, and gluino masses below 1090 GeV are excluded.
Abstract:
A search for squarks and gluinos in final states containing high-pT jets, missing transverse momentum and no electrons or muons is presented. The data were recorded in 2012 by the ATLAS experiment in √s = 8 TeV proton-proton collisions at the Large Hadron Collider, with a total integrated luminosity of 20.3 fb⁻¹. Results are interpreted in a variety of simplified and specific supersymmetry-breaking models assuming that R-parity is conserved and that the lightest neutralino is the lightest supersymmetric particle. An exclusion limit at the 95% confidence level on the mass of the gluino is set at 1330 GeV for a simplified model incorporating only a gluino and the lightest neutralino. For a simplified model involving the strong production of first- and second-generation squarks, squark masses below 850 GeV (440 GeV) are excluded for a massless lightest neutralino, assuming mass-degenerate (single light-flavour) squarks. In mSUGRA/CMSSM models with tan β = 30, A0 = −2m0 and μ > 0, squarks and gluinos of equal mass are excluded for masses below 1700 GeV. Additional limits are set for non-universal Higgs mass models with gaugino mediation and for simplified models involving the pair production of gluinos, each decaying to a top squark and a top quark, with the top squark decaying to a charm quark and a neutralino. These limits extend the region of supersymmetric parameter space excluded by previous searches with the ATLAS detector.
Abstract:
Continuous variables are one of the major data types collected by survey organizations. They can be incomplete, in which case the data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum the values within cells defined by combinations of features. In this thesis, I present novel multiple imputation (MI) methods that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method is for limiting the disclosure risk of the continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from the magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution so that the marginals of the synthetic data are guaranteed to sum to the original totals. At the same time, I present methods for assessing disclosure risks in releasing such synthetic magnitude microdata. The illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
The second method is for releasing synthetic continuous microdata via a nonstandard MI approach. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals: the basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach in limiting the posterior disclosure risk.
The third method is for imputing missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence. The new method separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., containing many missing values) ones. The sub-model structure for the focused variables is more complex than that for the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help to improve estimation accuracy, which is examined in several simulation studies. The method is applied to data from the American Community Survey.
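As a rough illustration of the fixed-marginal idea behind the first method, the Python sketch below exploits the fact that independent Poisson cell counts conditioned on their total are multinomially distributed, so synthetic cells always reproduce the original marginal sum exactly; the rates and totals are made up for illustration, and this is not the thesis's full mixture-of-Poissons model.

```python
# Illustrative sketch (not the thesis's full mixture-of-Poissons model):
# independent Poisson cell counts conditioned on their observed total are
# multinomially distributed, so synthetic cells can be drawn that are
# guaranteed to reproduce the original marginal sum exactly.
import numpy as np

rng = np.random.default_rng(0)

def synthesize_row(total, rates):
    """Draw synthetic non-negative integer cells that sum to `total`.

    `rates` are Poisson intensities for each cell (assumed known here;
    in practice they would be estimated, e.g. within mixture components).
    """
    probs = np.asarray(rates, dtype=float)
    probs = probs / probs.sum()
    return rng.multinomial(total, probs)

# Example: one establishment row with a fixed reported total of 120
# spread over 4 confidential cells.
synthetic = synthesize_row(120, rates=[5.0, 20.0, 40.0, 15.0])
print(synthetic, synthetic.sum())   # the cells always sum to 120
```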
Abstract:
After years of deliberation, the EU Commission sped up the reform process of a common EU digital policy considerably in 2015 by launching the EU digital single market strategy. In particular, two core initiatives of the strategy were agreed upon: the General Data Protection Regulation and the Network and Information Security (NIS) Directive law texts. A new initiative was additionally launched addressing the role of online platforms. This paper focuses on the platform privacy rationale behind the data protection legislation, primarily based on the proposal for a new EU-wide General Data Protection Regulation. We analyse the legislation rationale from an Information Systems perspective to understand the role user data plays in creating platforms that we identify as “processing silos”. Generative digital infrastructure theories are used to explain the innovative mechanisms that are thought to govern the notion of digitalization and the successful business models that are affected by digitalization. We foresee continued judicial data protection challenges with the now proposed Regulation as the adoption of the “Internet of Things” continues. The findings of this paper illustrate that many of the existing issues can be addressed through legislation from a platform perspective. We conclude by proposing three modifications to the governing rationale, which would improve not only platform privacy for the data subject, but also entrepreneurial efforts in developing intelligent service platforms. The first modification is aimed at improving service differentiation on platforms by lessening the ability of incumbent global actors to lock in the user base to their service/platform. The second modification posits limiting the current unwanted tracking ability of syndicates by separating authentication and data store services from any processing entity. Thirdly, we propose a change in how security and data protection policies are reviewed, suggesting a third-party auditing procedure.
Abstract:
This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset are used in the case study. Covariate values in some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others for solving missing covariate problems, the results it produces can differ substantially from the real values of the missing covariates. This study also shows that, in general, results obtained from the regression method are more accurate than those of the mean imputation method, but at the cost of a higher computational expense. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
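For readers unfamiliar with the first two strategies compared in the paper, the toy Python sketch below contrasts mean imputation with regression imputation on synthetic data; the data, column roles and linear model are illustrative assumptions, not the paper's engineering-asset case study.

```python
# Minimal sketch of mean imputation vs. regression imputation on a toy
# two-column covariate matrix with 30% of the second column missing.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=200)   # correlated covariates
missing = rng.random(200) < 0.3                         # mask for missing entries
X_obs = X.copy()
X_obs[missing, 1] = np.nan

# Mean imputation: replace every missing value with the observed column mean.
mean_imputed = X_obs.copy()
mean_imputed[missing, 1] = np.nanmean(X_obs[:, 1])

# Regression imputation: fit column 1 on column 0 using complete cases,
# then predict the missing entries.
cc = ~missing
slope, intercept = np.polyfit(X_obs[cc, 0], X_obs[cc, 1], deg=1)
reg_imputed = X_obs.copy()
reg_imputed[missing, 1] = slope * X_obs[missing, 0] + intercept

for name, imp in [("mean", mean_imputed), ("regression", reg_imputed)]:
    rmse = np.sqrt(np.mean((imp[missing, 1] - X[missing, 1]) ** 2))
    print(f"{name} imputation RMSE: {rmse:.3f}")
```

On correlated covariates like these, the regression imputations track the true values far more closely than the column mean, mirroring the paper's finding that regression is more accurate but more expensive than mean imputation.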
Abstract:
In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.
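The greedy, confidence-driven scheduling idea can be pictured roughly as in the Python sketch below; generate_queries, confidence and execute are hypothetical placeholders standing in for WebPut's actual components, and the loop is only a simplified reading of the algorithm described in the abstract.

```python
# Simplified sketch of greedy, confidence-based imputation scheduling.
def impute_greedily(missing_cells, generate_queries, confidence, execute):
    """Always fill the missing cell whose best imputation query scores highest.

    `generate_queries(cell)` yields candidate web queries for a cell,
    `confidence(query)` scores a query, and `execute(query)` returns an
    extracted value or None. All three are hypothetical placeholders.
    Filling one cell can raise the confidence of queries for other cells,
    so queries are re-scored after every imputation.
    """
    imputed, remaining = {}, set(missing_cells)
    while remaining:
        # Score the best available query for each remaining cell.
        best = {cell: max(generate_queries(cell), key=confidence)
                for cell in remaining}
        # Impute the cell whose best query is most confident.
        cell = max(best, key=lambda c: confidence(best[c]))
        value = execute(best[cell])
        if value is not None:
            imputed[cell] = value   # a filled value can inform later queries
        remaining.discard(cell)
    return imputed
```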
Abstract:
This thesis takes a new data mining approach for analyzing road/crash data by developing models for the whole road network and generating a crash risk profile. Roads with an elevated crash risk due to road surface friction deficit are identified. The regression tree model, predicting road segment crash rate, is applied in a novel deployment coined regression tree extrapolation that produces a skid resistance/crash rate curve. Using extrapolation allows the method to be applied across the network and cope with the high proportion of missing road surface friction values. This risk profiling method can be applied in other domains.
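A much-simplified, single-predictor reading of the regression tree extrapolation idea is sketched below in Python: fit a regression tree of crash rate on skid resistance and evaluate it over a grid to trace a curve. The data are synthetic and the depth and leaf-size settings are arbitrary; the thesis's actual models use the full road-network feature set.

```python
# One-predictor sketch of a skid resistance / crash rate curve from a
# regression tree, evaluated over a grid that extends beyond the data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
skid = rng.uniform(0.3, 0.7, size=500)                   # skid resistance values
crash_rate = np.exp(-6 * skid) + 0.05 * rng.random(500)  # higher risk at low friction

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=30)
tree.fit(skid.reshape(-1, 1), crash_rate)

grid = np.linspace(0.2, 0.8, 61).reshape(-1, 1)          # includes out-of-sample values
curve = tree.predict(grid)                               # piecewise-constant risk curve
for s, r in zip(grid[::10, 0], curve[::10]):
    print(f"skid resistance {s:.2f} -> predicted crash rate {r:.3f}")
```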
Abstract:
Recently, vision-based systems have been deployed in professional sports to track the ball and players to enhance analysis of matches. Due to their unobtrusive nature, vision-based approaches are preferred to wearable sensors (e.g. GPS or RFID sensors) as they do not require players or balls to be instrumented prior to matches. Unfortunately, in continuous team sports where players need to be tracked continuously over long periods of time (e.g. 35 minutes in field hockey or 45 minutes in soccer), current vision-based tracking approaches are not reliable enough to provide fully automatic solutions. As such, human intervention is required to fix up missed or false detections. However, in instances where a human cannot intervene due to the sheer amount of data being generated, the data cannot be used because of the missing/noisy detections. In this paper, we investigate two representations based on raw player detections (and not tracking) which are immune to missed and false detections. Specifically, we show that both team occupancy maps and centroids can be used to detect team activities, while the occupancy maps can also be used to retrieve specific team activities. An evaluation on over 8 hours of field hockey data captured at a recent international tournament demonstrates the validity of the proposed approach.
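The two detection-based representations can be illustrated with the short Python sketch below, which builds a per-frame team occupancy map and centroid from raw (x, y) detections; the zone grid and pitch dimensions are assumed values for illustration, not the settings used in the paper.

```python
# Toy per-frame team occupancy map and centroid from raw player detections.
import numpy as np

FIELD_X, FIELD_Y = 91.4, 55.0          # field-hockey pitch size in metres
GRID_X, GRID_Y = 6, 4                  # assumed zone grid

def occupancy_map(detections):
    """detections: array-like of shape (n, 2) holding (x, y) player detections."""
    det = np.asarray(detections, dtype=float)
    counts, _, _ = np.histogram2d(det[:, 0], det[:, 1],
                                  bins=[GRID_X, GRID_Y],
                                  range=[[0, FIELD_X], [0, FIELD_Y]])
    return counts.ravel()              # one fixed-length vector per frame

def centroid(detections):
    det = np.asarray(detections, dtype=float)
    return det.mean(axis=0)            # needs no player identities or tracks

frame = [(10.2, 30.1), (45.0, 20.5), (60.3, 41.7)]
print(occupancy_map(frame), centroid(frame))
```

Because both representations are computed frame by frame from whatever detections are present, a missed or false detection only perturbs the counts slightly rather than breaking a track, which is the robustness the abstract appeals to.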
Abstract:
In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple level and the database level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.
Abstract:
Purpose: Potential positive associations between youth physical activity and wellness scores could emphasize the value of youth physical activity engagement and promotion interventions, beyond the many established physiological and psychological benefits of increased physical activity. The purpose of this study was to explore the associations between adolescents' self-reported physical activity and wellness. Methods: This investigation included 493 adolescents (165 males and 328 females) aged between 12 and 15 years. The participants were recruited from six secondary schools of varying socioeconomic status within a metropolitan area. Students were administered the Five-Factor Wellness Inventory and the International Physical Activity Questionnaire for Adolescents to assess wellness and physical activity, respectively. Results: Data indicated that significant associations between physical activity and wellness existed. Self-reported physical activity was positively associated with four dimensions (friendship, gender identity, spirituality, and exercise), with the higher-order factor physical self, and with total wellness, and negatively associated with self-care, self-worth, love, and cultural identity. Conclusion: This study suggests that relationships exist between self-reported physical activity and various elements of wellness. Future research should use controlled trials of physical activity and wellness to establish causal links among youth populations. Understanding the nature of these relationships, including causality, has implications for the justification of youth physical activity promotion interventions and the development of youth physical activity engagement programs.
Abstract:
This paper proposes a method, based on polychotomous discrete choice methods, to impute a continuous measure of income when only a bracketed measure of income is available, and then only for a subset of the observations. The method is shown to perform well with CPS data. © 1991.
Abstract:
When crystallization screening is conducted many outcomes are observed, but typically the only trial recorded in the literature is the condition that yielded the crystal(s) used for subsequent diffraction studies. The initial hit that was optimized and the results of all the other trials are lost. These missing results contain information that would be useful for an improved general understanding of crystallization. This paper provides a report of a crystallization data exchange (XDX) workshop organized by several international large-scale crystallization screening laboratories to discuss how this information may be captured and utilized. A group that administers a significant fraction of the world's crystallization screening results was convened, together with chemical and structural data informaticians and computational scientists who specialize in creating and analysing large disparate data sets. The development of a crystallization ontology for the crystallization community was proposed. This paper (by the attendees of the workshop) provides the thoughts and rationale leading to this conclusion. This is brought to the attention of the wider audience of crystallographers so that they are aware of these early efforts and can contribute to the process going forward. © 2012 International Union of Crystallography. All rights reserved.
Abstract:
Due to their unobtrusive nature, vision-based approaches to tracking sports players have been preferred over wearable sensors as they do not require the players to be instrumented for each match. Unfortunately, due to the heavy occlusion between players, variation in resolution and pose, and fluctuating illumination conditions, tracking players continuously is still an unsolved vision problem. For tasks like clustering and retrieval, having noisy data (i.e. missing and false player detections) is problematic as it generates discontinuities in the input data stream. One method of circumventing this issue is to use an occupancy map, where the field is discretised into a series of zones and a count of player detections in each zone is obtained. A series of frames can then be concatenated to represent a set-play or example of team behaviour. A problem with this approach, though, is that its compressibility is low (i.e. the variability in the feature space is incredibly high). In this paper, we propose the use of a bilinear spatiotemporal basis model with a role representation to clean up the noisy detections, which operates in a low-dimensional space. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras, evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector, and compared the results to manually labelled data.
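As a very reduced stand-in for the temporal part of the bilinear spatiotemporal basis model, the Python sketch below projects a noisy (role-ordered) trajectory onto a few smooth DCT basis functions; this is an illustrative simplification of low-dimensional trajectory smoothing, not the paper's full bilinear model or role-assignment procedure, and the data are synthetic.

```python
# Reduced sketch: smooth a noisy trajectory by least-squares projection
# onto a small discrete-cosine temporal basis, suppressing the jumps
# that missed or false detections introduce.
import numpy as np

def dct_basis(n_frames, k):
    """First k discrete-cosine basis vectors over n_frames time steps."""
    t = (np.arange(n_frames) + 0.5) / n_frames
    return np.stack([np.cos(np.pi * j * t) for j in range(k)], axis=1)  # (n_frames, k)

def smooth_trajectory(xy, k=8):
    """xy: (n_frames, 2) noisy positions for one role; returns smoothed positions."""
    B = dct_basis(len(xy), k)
    coeffs, *_ = np.linalg.lstsq(B, xy, rcond=None)   # low-dimensional coefficients
    return B @ coeffs

# Toy example: a straight-line run corrupted by detection noise.
n = 200
truth = np.stack([np.linspace(0, 50, n), np.linspace(10, 30, n)], axis=1)
noisy = truth + np.random.default_rng(3).normal(0, 2.0, size=truth.shape)
print(np.abs(smooth_trajectory(noisy) - truth).mean())   # error well below the noise level
```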