910 resultados para Missing Data
Resumo:
Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
Resumo:
Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.
Resumo:
Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational dataassimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Dataassimilation schemes seek to combine observations and models in a statistically optimal way taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak constraint variational dataassimilationsystem. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and EarthObservationdata (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously. This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors supposed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameter (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.
Resumo:
Nearly all chemistry–climate models (CCMs) have a systematic bias of a delayed springtime breakdown of the Southern Hemisphere (SH) stratospheric polar vortex, implying insufficient stratospheric wave drag. In this study the Canadian Middle Atmosphere Model (CMAM) and the CMAM Data Assimilation System (CMAM-DAS) are used to investigate the cause of this bias. Zonal wind analysis increments from CMAMDAS reveal systematic negative values in the stratosphere near 608S in winter and early spring. These are interpreted as indicating a bias in the model physics, namely, missing gravity wave drag (GWD). The negative analysis increments remain at a nearly constant height during winter and descend as the vortex weakens, much like orographic GWD. This region is also where current orographic GWD parameterizations have a gap in wave drag, which is suggested to be unrealistic because of missing effects in those parameterizations. These findings motivate a pair of free-runningCMAMsimulations to assess the impact of extra orographicGWDat 608S. The control simulation exhibits the cold-pole bias and delayed vortex breakdown seen in the CCMs. In the simulation with extra GWD, the cold-pole bias is significantly reduced and the vortex breaks down earlier. Changes in resolved wave drag in the stratosphere also occur in response to the extra GWD, which reduce stratospheric SH polar-cap temperature biases in late spring and early summer. Reducing the dynamical biases, however, results in degraded Antarctic column ozone. This suggests that CCMs that obtain realistic column ozone in the presence of an overly strong and persistent vortex may be doing so through compensating errors.
Resumo:
Knowledge spillover theory of entrepreneurship and the prevailing theory of economic growth treat opportunities as endogenous and generally focus on opportunity recognition by entrepreneurs. New knowledge created endogenously results in knowledge spillovers enabling inventors and entrepreneurs to commercialize it. This article discusses that knowledge spillover entrepreneurship depends not only on ordinary human capital, but more importantly also on creativity embodied in creative individuals and diverse urban environments that attract creative classes. This might result in self-selection of creative individuals into entrepreneurship or enable entrepreneurs to recognize creativity and commercialize it. This creativity theory of knowledge spillover entrepreneurship is tested utilizing data on European cities.
Resumo:
Background Despite the promising benefits of adaptive designs (ADs), their routine use, especially in confirmatory trials, is lagging behind the prominence given to them in the statistical literature. Much of the previous research to understand barriers and potential facilitators to the use of ADs has been driven from a pharmaceutical drug development perspective, with little focus on trials in the public sector. In this paper, we explore key stakeholders’ experiences, perceptions and views on barriers and facilitators to the use of ADs in publicly funded confirmatory trials. Methods Semi-structured, in-depth interviews of key stakeholders in clinical trials research (CTU directors, funding board and panel members, statisticians, regulators, chief investigators, data monitoring committee members and health economists) were conducted through telephone or face-to-face sessions, predominantly in the UK. We purposively selected participants sequentially to optimise maximum variation in views and experiences. We employed the framework approach to analyse the qualitative data. Results We interviewed 27 participants. We found some of the perceived barriers to be: lack of knowledge and experience coupled with paucity of case studies, lack of applied training, degree of reluctance to use ADs, lack of bridge funding and time to support design work, lack of statistical expertise, some anxiety about the impact of early trial stopping on researchers’ employment contracts, lack of understanding of acceptable scope of ADs and when ADs are appropriate, and statistical and practical complexities. Reluctance to use ADs seemed to be influenced by: therapeutic area, unfamiliarity, concerns about their robustness in decision-making and acceptability of findings to change practice, perceived complexities and proposed type of AD, among others. Conclusions There are still considerable multifaceted, individual and organisational obstacles to be addressed to improve uptake, and successful implementation of ADs when appropriate. Nevertheless, inferred positive change in attitudes and receptiveness towards the appropriate use of ADs by public funders are supportive and are a stepping stone for the future utilisation of ADs by researchers.
Resumo:
Children's views are essential to enabling schools to fulfil their duties under the Special Educational Needs and Disability Act 2001 and create inclusive learning environments. Arguably children are the best source of information about the ways in which schools support their learning and what barriers they encounter. Accessing this requires a deeper level of reflection than simply asking what children find difficult. It is also a challenge to ensure that the views of all children contribute including those who find communication difficult. Development work in five schools is drawn on to analyse the ways in which teachers used suggestions for three interview activities. The data reveals the strengths and limitations of different ways of supporting the communication process.
Resumo:
In numerical weather prediction, parameterisations are used to simulate missing physics in the model. These can be due to a lack of scientific understanding or a lack of computing power available to address all the known physical processes. Parameterisations are sources of large uncertainty in a model as parameter values used in these parameterisations cannot be measured directly and hence are often not well known; and the parameterisations themselves are also approximations of the processes present in the true atmosphere. Whilst there are many efficient and effective methods for combined state/parameter estimation in data assimilation (DA), such as state augmentation, these are not effective at estimating the structure of parameterisations. A new method of parameterisation estimation is proposed that uses sequential DA methods to estimate errors in the numerical models at each space-time point for each model equation. These errors are then fitted to pre-determined functional forms of missing physics or parameterisations that are based upon prior information. We applied the method to a one-dimensional advection model with additive model error, and it is shown that the method can accurately estimate parameterisations, with consistent error estimates. Furthermore, it is shown how the method depends on the quality of the DA results. The results indicate that this new method is a powerful tool in systematic model improvement.
Resumo:
The results of a search for squarks and gluinos using data from p (p) over bar collisions recorded at a center-of-mass energy of 1.96 TeV by the DO detector at the Fermilab Tevatron Collider are reported. The topologies analyzed consist of acoplanar-jet and multijet events with large missing transverse energy. No evidence for the production of squarks or gluinos was found in a data sample of 310pb(-1). Lower limits of 325 and 241 GeV were derived at the 95% C.L. on the squark and gluino masses, respectively, within the framework of minimal supergravity with tan beta=3, A(0) = 0, and mu < 0. (c) 2006 Elsevier B.V. All rights reserved.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
We present a search for the pair production of scalar top quarks, (t) over tilde, using 995 pb(-1) of data collected in p (p) over bar collisions with the Dempty set detector at the Fermilab Tevatron Collider at root s = 1.96 TeV. Both scalar top quarks are assumed to decay into a charm quark and a neutralino ((chi) over tilde (0)(1)), where (chi) over tilde (0)(1) is the lightest supersymmetric particle. This leads to a final state with two acoplanar charm jets and missing transverse energy. We find the yield of such events to be consistent with the standard model expectation, and exclude sets of (t) over tilde and (chi) over tilde (0)(1) masses at the 95% C.L. that substantially extend the domain excluded by previous searches. Published by Elsevier B.V.
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Search for supersymmetry in pp collisions at 7 TeV in events with jets and missing transverse energy
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)