956 results for Models for count data
Abstract:
We have searched for periodic variations of the electronic recoil event rate in the (2-6) keV energy range recorded between February 2011 and March 2012 with the XENON100 detector, adding up to 224.6 live days in total. Following a detailed study to establish the stability of the detector and its background contributions during this run, we performed an unbinned profile likelihood analysis to identify any periodicity up to 500 days. We find a global significance of less than 1 sigma for all periods, suggesting no statistically significant modulation in the data. While the local significance for an annual modulation is 2.8 sigma, the analysis of a multiple-scatter control sample and the phase of the modulation disfavor a dark matter interpretation. The DAMA/LIBRA annual modulation, interpreted as a dark matter signature with axial-vector coupling of WIMPs to electrons, is excluded at 4.8 sigma.
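The core of such an analysis is an unbinned likelihood scanned over trial periods, with amplitude and phase profiled out at each one. The following is a minimal sketch of that idea in Python; the simulated event stream, the injected 365-day modulation, and all numbers are illustrative, not the XENON100 data or pipeline.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
T = 500.0                     # observation span in days (illustrative)
A_TRUE, P_TRUE = 0.3, 365.0   # injected modulation for the toy dataset

# simulate event times from a modulated rate by thinning a uniform proposal
t_cand = rng.uniform(0.0, T, size=8000)
keep = rng.uniform(size=t_cand.size) < 0.5 * (1 + A_TRUE * np.cos(2 * np.pi * t_cand / P_TRUE))
events = t_cand[keep]

def nll(params, period, t):
    """Negative log-likelihood of the normalized density 1 + a*cos(2*pi*t/P + phi)."""
    a, phi = params
    if not -0.99 < a < 0.99:
        return np.inf
    dens = 1.0 + a * np.cos(2 * np.pi * t / period + phi)
    # exact normalization of the density over [0, T]
    w = 2 * np.pi / period
    norm = T + (a / w) * (np.sin(w * T + phi) - np.sin(phi))
    return -np.sum(np.log(dens / norm))

def profile_loglike(period, t):
    """Profile the likelihood over amplitude and phase at a fixed trial period."""
    fits = [minimize(nll, x0, args=(period, t), method="Nelder-Mead")
            for x0 in [(0.1, 0.0), (0.1, np.pi)]]
    return -min(f.fun for f in fits)

periods = np.linspace(100.0, 500.0, 41)
ll = np.array([profile_loglike(p, events) for p in periods])
best_period = periods[np.argmax(ll)]
```

In a full analysis the profile-likelihood-ratio curve over all trial periods, calibrated with toy datasets, yields the local and global significances quoted in the abstract.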
Abstract:
BACKGROUND Antiretroviral therapy (ART) initiation is now recommended irrespective of CD4 count. However, data on the relationship between CD4 count at ART initiation and loss to follow-up (LTFU) are limited and conflicting. METHODS We conducted a cohort analysis including all adults initiating ART (2008-2012) at three public sector sites in South Africa. LTFU was defined as no visit in the 6 months before database closure. The Kaplan-Meier estimator and Cox proportional hazards models examined the relationship between CD4 count at ART initiation and 24-month LTFU. Final models were adjusted for demographics, year of ART initiation and programme expansion, and corrected for unascertained mortality. RESULTS Among 17 038 patients, the median CD4 count at initiation increased from 119 (IQR 54-180) in 2008 to 257 (IQR 175-318) in 2012. In unadjusted models, observed LTFU was associated with both CD4 counts <100 cells/μL and CD4 counts ≥300 cells/μL. After adjustment, patients with CD4 counts ≥300 cells/μL were 1.35 (95% CI 1.12 to 1.63) times as likely to be LTFU after 24 months compared with those with a CD4 count of 150-199 cells/μL. This increased risk for patients with CD4 counts ≥300 cells/μL was largest in the first 3 months on treatment. Correction for unascertained deaths attenuated the association between CD4 counts <100 cells/μL and LTFU, while the association between CD4 counts ≥300 cells/μL and LTFU persisted. CONCLUSIONS Patients initiating ART at higher CD4 counts may be at increased risk of LTFU. With programmes initiating patients at higher CD4 counts, models of ART delivery need to be reoriented to support long-term retention.
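The Kaplan-Meier estimator at the core of this analysis is a product over event times of one-minus-the-hazard at each time. A minimal numpy sketch, with a toy cohort rather than the study's data:

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations : follow-up time for each subject
    events    : 1 if the endpoint (here, LTFU) was observed, 0 if censored
    Returns the distinct event times and the survival probability after each.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    surv = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                  # still followed just before t
        d = np.sum((durations == t) & (events == 1))      # events exactly at t
        s *= 1.0 - d / at_risk                            # product-limit update
        surv.append(s)
    return event_times, np.array(surv)

# toy cohort: times in months, 1 = LTFU observed, 0 = censored
times, surv = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
```

Note how the subject censored at month 3 still counts in the risk set at months 1 and 2 but not at month 4; that is what distinguishes the product-limit estimate from a naive event fraction.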
Abstract:
A problem frequently encountered in Data Envelopment Analysis (DEA) is that the total number of inputs and outputs included tends to be too large relative to the sample size. One way to counter this problem is to combine several inputs (or outputs) into (meaningful) aggregate variables, thereby reducing the dimension of the input (or output) vector. A direct effect of input aggregation is to reduce the number of constraints. This, in turn, alters the optimal value of the objective function. In this paper, we show how a statistical test proposed by Banker (1993) may be applied to test the validity of a specific way of aggregating several inputs. An empirical application using data from Indian manufacturing for the year 2002-03 is included as an example of the proposed test.
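The efficiency scores whose distribution Banker's test compares come from solving one linear program per decision-making unit (DMU). A minimal sketch of the input-oriented CCR envelopment LP with `scipy.optimize.linprog`; the two-DMU data are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """Input-oriented CCR efficiency of DMU j0.

    X : (n_inputs, n_dmus) input matrix; Y : (n_outputs, n_dmus) output matrix.
    Solves: min theta  s.t.  X @ lam <= theta * X[:, j0],  Y @ lam >= Y[:, j0],  lam >= 0,
    with decision vector z = [theta, lam_1, ..., lam_n].
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(1 + n)
    c[0] = 1.0                                   # minimize theta
    # input constraints: X @ lam - theta * x0 <= 0
    A_in = np.hstack([-X[:, [j0]], X])
    b_in = np.zeros(m)
    # output constraints: -Y @ lam <= -y0  (i.e. Y @ lam >= y0)
    A_out = np.hstack([np.zeros((s, 1)), -Y])
    b_out = -Y[:, j0]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([b_in, b_out]),
                  bounds=[(0, None)] * (1 + n))
    return res.x[0]

# two DMUs, one input, one output: B uses twice the input of A for the same output
X = np.array([[2.0, 4.0]])
Y = np.array([[1.0, 1.0]])
```

Aggregating inputs removes rows from `A_in`, which is exactly the constraint reduction the paper's test evaluates.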
Abstract:
We present a framework for fitting multiple random walks to animal movement paths consisting of ordered sets of step lengths and turning angles. Each step and turn is assigned to one of a number of random walks, each characteristic of a different behavioral state. Behavioral state assignments may be inferred purely from movement data or may include the habitat type in which the animals are located. Switching between different behavioral states may be modeled explicitly using a state transition matrix estimated directly from data, or switching probabilities may take into account the proximity of animals to landscape features. Model fitting is undertaken within a Bayesian framework using the WinBUGS software. These methods allow for identification of different movement states using several properties of observed paths and lead naturally to the formulation of movement models. Analysis of relocation data from elk released in east-central Ontario, Canada, suggests a biphasic movement behavior: elk are either in an "encamped" state, in which step lengths are small and turning angles are high, or in an "exploratory" state, in which daily step lengths are several kilometers and turning angles are small. Animals encamp in open habitat (agricultural fields and open forest), but the exploratory state is not associated with any particular habitat type.
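The paper fits its state-switching model in WinBUGS; the underlying idea of separating "encamped" short steps from "exploratory" long steps can be illustrated with a much simpler two-component mixture fitted by EM. This sketch assumes exponential step-length distributions and ignores turning angles and state switching for brevity; the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
# simulate steps from an "encamped" state (short steps) and an "exploratory" state (long steps)
steps = np.concatenate([rng.exponential(0.1, 600), rng.exponential(5.0, 400)])

def em_two_exponentials(x, n_iter=200):
    """EM for a two-component exponential mixture of step lengths."""
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])  # component means
    w = np.array([0.5, 0.5])                                     # mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each state for each step
        dens = w / mu * np.exp(-x[:, None] / mu)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and component means
        w = r.mean(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return w, mu

weights, means = em_two_exponentials(steps)
```

The full model in the abstract replaces the independent mixing weights with a transition matrix (a hidden Markov structure) and adds a turning-angle distribution per state.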
Abstract:
The discrete-time Markov chain is commonly used to describe changes of health states for chronic diseases in a longitudinal study. Statistical inference for comparing treatment effects or finding determinants of disease progression usually requires estimation of transition probabilities. In many situations, when the outcome data have some missing observations or the variable of interest (called a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze the categorical latent outcome. For (1), different missing mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM and a procedure that is not a data augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation. This method was also extended to cover the computation of standard errors. The proposed methods are demonstrated with a schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research are also discussed.
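The forward-backward procedure mentioned for case (2) combines a forward pass (filtering) and a backward pass to obtain the posterior probability of each hidden state at each time, plus the likelihood needed for parameter estimation. A minimal numpy sketch with a toy two-state, two-symbol model (the matrices are invented for illustration):

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward-backward smoothing for a discrete hidden Markov model.

    pi  : (N,) initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    Returns (posterior state probabilities per time step, data log-likelihood).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # posterior P(state_t | all obs)
    return gamma, np.log(alpha[-1].sum())

pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
obs = [0, 1, 1]
gamma, loglik = forward_backward(pi, A, B, obs)
```

For long sequences, production implementations rescale `alpha` and `beta` at each step to avoid underflow; that detail is omitted here for clarity.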
Abstract:
Prevalent sampling is an efficient and focused approach to the study of the natural history of disease. Right-censored time-to-event data observed in prospective prevalent cohort studies are often subject to left-truncated sampling. Left-truncated samples are not randomly selected from the population of interest and carry a selection bias. Extensive studies have focused on estimating the unbiased distribution from left-truncated samples. However, in many applications, the exact date of disease onset is not observed. For example, in an HIV infection study, the exact HIV infection time is not observable, but it is known that the infection occurred between two observable dates. Meeting these challenges motivated our study. We propose parametric models to estimate the unbiased distribution of left-truncated, right-censored time-to-event data with uncertain onset times. We first consider data from length-biased sampling, a special case of left-truncated sampling, and then extend the proposed method to general left-truncated sampling. With a parametric model, we construct the full likelihood given a biased sample with unobservable disease onset. The parameters are estimated by maximizing the constructed likelihood while adjusting for the selection bias and the unobservable exact onset. Simulations are conducted to evaluate the finite-sample performance of the proposed methods. We apply the proposed method to an HIV infection study, estimating the unbiased survival function and covariate coefficients.
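The flavor of adjusting a likelihood for selection bias can be shown in the simplest case the abstract mentions, length-biased sampling, with an exponential underlying model, no censoring and no onset uncertainty (both simplifications are mine, for brevity). Under length-biased sampling the observed density is g(t) = t f(t)/E[T], which for Exponential(λ) is a Gamma(2, 1/λ) density, and the MLE has the closed form 2/mean(t) to check the numeric fit against:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
LAM_TRUE = 0.5
# a length-biased sample of an underlying Exponential(lam) survival time is a
# Gamma(shape=2, scale=1/lam) draw: g(t) = lam**2 * t * exp(-lam * t)
t_obs = rng.gamma(shape=2.0, scale=1.0 / LAM_TRUE, size=5000)

def neg_loglik(lam, t):
    """Negative log-likelihood of the length-biased exponential density."""
    if lam <= 0:
        return np.inf
    return -np.sum(2 * np.log(lam) + np.log(t) - lam * t)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), args=(t_obs,), method="bounded")
lam_hat = res.x
```

The dissertation's full likelihood additionally integrates over the interval in which the unobserved onset is known to lie and accounts for right censoring; the structure (maximize a bias-adjusted likelihood) is the same.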
Abstract:
Ever since its discovery, Eocene Thermal Maximum 2 (ETM2; ~53.7 Ma) has been considered one of the "little brothers" of the Paleocene-Eocene Thermal Maximum (PETM; ~56 Ma), as it displays similar characteristics including abrupt warming, ocean acidification, and biotic shifts. One of the remaining key questions is what effect these lesser climate perturbations had on ocean circulation and ventilation and, ultimately, biotic disruptions. Here we characterize ETM2 sections of the NE Atlantic (Deep Sea Drilling Project Sites 401 and 550) using multispecies benthic foraminiferal stable isotopes, grain size analysis, XRF core scanning, and carbonate content. The magnitude of the carbon isotope excursion (0.85-1.10 per mil) and bottom water warming (2-2.5°C) during ETM2 appears slightly smaller than in South Atlantic records. The comparison of the lateral δ13C gradient between the North and South Atlantic reveals that a transient circulation switch took place during ETM2, a pattern similar to that observed for the PETM. New grain size and published faunal data support this hypothesis by indicating a reduction in deepwater current velocity. Following ETM2, we record a distinct intensification of bottom water currents influencing Atlantic carbonate accumulation and biotic communities, while a dramatic and persistent clay reduction hints at a weakening of the regional hydrological cycle. Our findings highlight the similarities and differences between the PETM and ETM2. Moreover, the heterogeneity of hyperthermal expression emphasizes the need to characterize each hyperthermal event and its background conditions specifically, to minimize artifacts in global climate and carbonate burial models for the early Paleogene.
Abstract:
Studies on the impact of historical, current and future global change require very high-resolution climate data (≤1 km) as a basis for modelled responses, meaning that data from digital climate models generally require substantial rescaling. Another shortcoming of available datasets on past climate is that the effects of sea level rise and fall are not considered. Without such information, the study of glacial refugia or early Holocene plant and animal migration is incomplete if not impossible. Sea level at the last glacial maximum (LGM) was approximately 125 m lower, creating substantial additional terrestrial area for which no current baseline data exist. Here, we introduce a novel, gridded climate dataset for the LGM that is both very high resolution (1 km) and extends to the LGM sea and land mask. We developed two methods to extend current terrestrial precipitation and temperature data to areas between the current and LGM coastlines. The absolute interpolation error is below 1°C for 98.9% and below 0.5°C for 87.8% of all pixels within the first two 1-arc-degree distance zones. We use the change factor method with these newly assembled baseline data to downscale five global circulation models of LGM climate to a resolution of 1 km for Europe. As additional variables we calculate 19 'bioclimatic' variables, which are often used in climate change impact studies on biological diversity. The new LGM climate maps are well suited for analysing refugia and migration during Holocene warming following the LGM.
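The change factor (or "delta") method used here computes a coarse GCM anomaly (LGM minus modern), interpolates it to the fine grid, and adds it to the high-resolution modern baseline. A minimal sketch with invented toy fields; real applications would use ratios rather than differences for precipitation and proper geographic interpolation:

```python
import numpy as np
from scipy.ndimage import zoom

def change_factor_downscale(baseline_hr, gcm_modern, gcm_lgm):
    """Change-factor ('delta') downscaling for temperature.

    baseline_hr         : high-resolution modern climatology
    gcm_modern, gcm_lgm : coarse GCM fields for the modern and LGM periods
    The coarse LGM-minus-modern anomaly is interpolated to the fine grid
    and added to the high-resolution baseline.
    """
    anomaly = gcm_lgm - gcm_modern
    factor = np.array(baseline_hr.shape) / np.array(anomaly.shape)
    anomaly_hr = zoom(anomaly, factor, order=1)   # bilinear interpolation
    return baseline_hr + anomaly_hr

# toy fields: 4x4 coarse GCM grid, 16x16 high-resolution baseline
baseline = np.full((16, 16), 10.0)
gcm_now = np.full((4, 4), 12.0)
gcm_lgm = np.full((4, 4), 4.0)
lgm_hr = change_factor_downscale(baseline, gcm_now, gcm_lgm)
```

The method's key assumption is that the fine-scale spatial pattern of the modern baseline also held at the LGM, which is why extending the baseline itself to the exposed LGM shelf areas is the paper's central contribution.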
Abstract:
The Spanish National Library (Biblioteca Nacional de España, BNE) and the Ontology Engineering Group of Universidad Politécnica de Madrid are working on the joint project "Preliminary Study of Linked Data", whose aim is to enrich the Web of Data with the BNE authority and bibliographic records. To this end, they are transforming the BNE information to RDF following the Linked Data principles proposed by Tim Berners-Lee.
Abstract:
We present a framework specially designed to deal with structurally complex data, where all individuals have the same structure, as is the case in many medical domains. A structurally complex individual may be composed of any type of single-valued or multivalued attributes, including time series, for example. These attributes are structured according to domain-dependent hierarchies. Our aim is to generate reference models of population groups. These models represent the population archetype and are very useful for supporting such important tasks as diagnosis, detecting fraud, analyzing patient evolution, identifying control groups, etc.
Abstract:
Using the Bayesian approach as the model selection criterion, the main purpose of this study is to establish a practical road accident model that provides better interpretation and prediction performance. For this purpose we use a structural explanatory model with an autoregressive error term. The model estimation is carried out through Bayesian inference and the best model is selected based on goodness-of-fit measures. To further cross-validate the model estimation, prediction analyses were also performed. As the road safety measure, the number of fatal accidents in Spain during 2000-2011 was employed. The results of the variable selection process show that the factors explaining fatal road accidents are mainly exposure, economic factors, and surveillance and legislative measures. The model selection shows that the impact of economic factors on fatal accidents during the period under study has been higher than that of surveillance and legislative measures.
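A regression with autoregressive errors of the kind described can be illustrated with the classical iterated Cochrane-Orcutt procedure, shown here as a frequentist stand-in for the paper's Bayesian estimation (the Bayesian version would place priors on the same coefficients and AR parameter). All data below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.normal(size=n)
# structural model with AR(1) errors: y = 2 + 1.5*x + u, with u_t = 0.7*u_{t-1} + e_t
u = np.zeros(n)
e = rng.normal(scale=0.5, size=n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + e[t]
y = 2.0 + 1.5 * x + u

def cochrane_orcutt(x, y, n_iter=10):
    """Iterated Cochrane-Orcutt: fit OLS, estimate the AR(1) coefficient from
    the residuals, quasi-difference the data, and re-fit until stable."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(n_iter):
        resid = y - X @ beta
        rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
        # quasi-difference; the (1 - rho) intercept column keeps beta[0] on the
        # original scale
        ys = y[1:] - rho * y[:-1]
        Xs = np.column_stack([np.full(len(ys), 1 - rho), x[1:] - rho * x[:-1]])
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    return beta, rho

beta_hat, rho_hat = cochrane_orcutt(x, y)
```

Ignoring the AR structure would leave the coefficient estimates consistent but their uncertainty badly understated, which is precisely why the error term matters for model comparison.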
Abstract:
Stream mining is a set of cutting-edge techniques designed to process streams of data in real time in order to extract knowledge. In the particular case of classification, stream mining has to adapt its behaviour to volatile underlying data distributions, a phenomenon known as concept drift. Concept drift may render predictive models invalid, so they must be updated to represent the actual concepts in the data. In this context, there is a specific type of concept drift, known as recurrent concept drift, where the concepts represented by the data have already appeared in the past. In those cases the learning effort could be saved, or at least minimized, by applying a previously trained model. This can be extremely useful in ubiquitous environments characterized by resource-constrained devices. To deal with this scenario, meta-models can be used to enhance the drift detection mechanisms of data stream algorithms by representing and predicting when a change will occur. There are real-world situations where a concept reappears, as in intrusion detection systems (IDS), where the same incidents, or adaptations of them, tend to reappear over time. In these environments, early prediction of drift based on better knowledge of past models can help anticipate the change, improving the efficiency of the model in terms of the training instances needed. Using meta-models as a recurrent drift detection mechanism also opens the possibility of sharing concept representations among different data mining processes. Such exchanges could improve the accuracy of the resulting local model, as it may benefit from patterns similar to the local concept that were observed in other scenarios but not yet locally.
This would also improve the efficiency of the training instances used during classification, since the exchange of models would aid in applying already trained recurrent models previously seen by any of the collaborating devices. That is, the scope of recurrence detection and representation is broadened. In fact, the detection, representation and exchange of concept drift patterns would be extremely useful for law enforcement activities fighting cybercrime. Information exchange being one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific scope of critical infrastructure protection it is crucial to have information exchange mechanisms at both the strategic and technical levels. The exchange of concept drift detection schemes in cybersecurity environments would aid in preventing, detecting and effectively responding to threats in cyberspace. Furthermore, as a complement to meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. When reusing a previously trained model, a rough comparison between concepts is usually made using Boolean logic. Introducing fuzzy-logic comparisons between models could lead to more efficient reuse of previously seen concepts, by applying not just identical models but also similar ones. This work addresses these open issues by means of: the MMPRec system, which integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment to share meta-models between different devices; and a recurrent drift generator that allows the usefulness of recurrent drift systems, such as MMPRec, to be tested. Moreover, this thesis presents an experimental validation of the proposed contributions using synthetic and real datasets.
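The idea of reusing a pooled model for a recurring concept, with a graded similarity check instead of an exact match, can be sketched as follows. Everything here is invented for illustration and is not the MMPRec implementation: the `ThresholdModel` classifier, the 0.9 agreement threshold standing in for a fuzzy similarity function, and the three-phase stream in which the first concept recurs.

```python
import numpy as np

rng = np.random.default_rng(5)

def make_stream(concept, n=300):
    """1-D stream: label = x > 0.5 under concept A, x < 0.5 under concept B."""
    x = rng.uniform(0, 1, n)
    y = (x > 0.5).astype(int) if concept == "A" else (x < 0.5).astype(int)
    return x, y

class ThresholdModel:
    """Trivial classifier: predicts x > 0.5, with polarity learned from data."""
    def fit(self, x, y):
        self.flip = np.mean(y[x > 0.5]) < 0.5
        return self
    def predict(self, x):
        p = (x > 0.5).astype(int)
        return 1 - p if self.flip else p
    def accuracy(self, x, y):
        return np.mean(self.predict(x) == y)

pool = []        # previously learned concept models
history = []
for concept in ["A", "B", "A"]:            # concept A recurs at the end
    x, y = make_stream(concept)
    # graded reuse: take the pool model whose agreement on a probe window
    # exceeds a similarity threshold, instead of requiring exact equality
    probe_x, probe_y = x[:30], y[:30]
    scores = [m.accuracy(probe_x, probe_y) for m in pool]
    if scores and max(scores) >= 0.9:
        model, reused = pool[int(np.argmax(scores))], True
    else:
        model, reused = ThresholdModel().fit(x, y), False
        pool.append(model)
    history.append((reused, model.accuracy(x, y)))
```

When concept A reappears, the stored model clears the similarity threshold and is reused without retraining, which is exactly the saving the thesis targets on resource-constrained devices.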
Abstract:
Understanding spatial distributions and how environmental conditions influence catch-per-unit-effort (CPUE) is important for increased fishing efficiency and sustainable fisheries management. This study investigated the relationship between CPUE, spatial factors, temperature, and depth using generalized additive models. Combinations of factors, and not one single factor, were frequently included in the best model. Parameters which best described CPUE varied by geographic region. The amount of variance, or deviance, explained by the best models ranged from a low of 29% (halibut, Charlotte region) to a high of 94% (sablefish, Charlotte region). Depth, latitude, and longitude influenced most species in several regions. On the broad geographic scale, depth was associated with CPUE for every species, except dogfish. Latitude and longitude influenced most species, except halibut (Areas 4 A/D), sablefish, and cod. Temperature was important for describing distributions of halibut in Alaska, arrowtooth flounder in British Columbia, dogfish, Alaska skate, and Aleutian skate. The species-habitat relationships revealed in this study can be used to create improved fishing and management strategies.
Abstract:
Phase equilibrium data regression is an unavoidable task necessary to obtain the appropriate values for any model to be used in separation equipment design for chemical process simulation and optimization. The accuracy of this process depends on different factors such as the experimental data quality, the selected model and the calculation algorithm. The present paper summarizes the results and conclusions achieved in our research on the capabilities and limitations of the existing GE models and on strategies that can be included in the correlation algorithms to improve convergence and avoid inconsistencies. The NRTL model has been selected as a representative local composition model. New capabilities of this model, but also several relevant limitations, have been identified, and some examples of the application of a modified NRTL equation are discussed. Furthermore, a regression algorithm has been developed that allows the advisable simultaneous regression of all condensed-phase equilibrium regions present in ternary systems at constant T and P. It includes specific strategies designed to avoid some of the pitfalls frequently found in commercial regression tools for phase equilibrium calculations. Most of the proposed strategies are based on the geometrical interpretation of the lowest common tangent plane equilibrium criterion, which allows an unambiguous comprehension of the behavior of the mixtures. The paper aims to show all the work as a whole in order to reveal the efforts that must still be devoted to overcoming the difficulties that remain in the phase equilibrium data regression problem.
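For reference, the binary form of the NRTL activity coefficient model that such a regression fits to data is short enough to state directly. The parameter values below are arbitrary illustrative numbers, not fitted constants from the paper:

```python
import numpy as np

def nrtl_binary(x1, tau12, tau21, alpha=0.3):
    """Activity coefficients of a binary mixture from the NRTL model.

    tau12, tau21 : dimensionless binary interaction parameters
    alpha        : non-randomness parameter (0.3 is a common default)
    """
    x2 = 1.0 - x1
    G12 = np.exp(-alpha * tau12)
    G21 = np.exp(-alpha * tau21)
    ln_g1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                     + tau12 * G12 / (x2 + x1 * G12)**2)
    ln_g2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                     + tau21 * G21 / (x1 + x2 * G21)**2)
    return np.exp(ln_g1), np.exp(ln_g2)

g1, g2 = nrtl_binary(0.4, tau12=1.2, tau21=0.8)
```

A regression algorithm of the kind the paper describes wraps a function like this in an objective that compares predicted and experimental equilibrium compositions, which is where the convergence pitfalls and the tangent-plane consistency checks come in.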
Abstract:
Substantial retreat or disintegration of numerous ice shelves has been observed on the Antarctic Peninsula. The ice shelf in the Prince Gustav Channel retreated gradually from the late 1980s and broke up in 1995. Tributary glaciers reacted with speed-up, surface lowering and increased ice discharge, consequently contributing to sea level rise. We present a detailed long-term study (1993-2014) of the dynamic response of Sjögren Inlet glaciers to the disintegration of the Prince Gustav Ice Shelf. We analyzed various remote sensing datasets to observe the reactions of the glaciers to the loss of the buttressing ice shelf. A strong increase in ice surface velocities was observed, with maximum flow speeds reaching 2.82±0.48 m/d in 2007 and 1.50±0.32 m/d in 2004 at Sjögren and Boydell glaciers, respectively. Subsequently, the flow velocities decelerated; however, in late 2014 we still measured about twice the values of our first measurements in 1996. The tributary glaciers retreated 61.7±3.1 km² behind the former grounding line of the ice shelf. In regions below 1000 m a.s.l., a mean surface lowering of -68±10 m (-3.1 m/a) was observed in the period 1993-2014. The lowering rate decreased to -2.2 m/a in recent years. Based on the surface lowering rates, geodetic mass balances of the glaciers were derived for different time steps. A high mass loss rate of -1.21±0.36 Gt/a was found in the earliest period (1993-2001). Due to the dynamic adjustment of the glaciers to the new boundary conditions, the ice mass loss decreased to -0.59±0.11 Gt/a in the period 2012-2014, resulting in an average mass loss rate of -0.89±0.16 Gt/a (1993-2014). Including the retreat of the ice front and grounding line, a total mass change of -38.5±7.7 Gt and a contribution to sea level rise of 0.061±0.013 mm were computed. Analysis of the ice flux revealed that available bedrock elevation estimates at Sjögren Inlet are too shallow and are the major source of uncertainty in ice flux computations.
This temporally dense time series analysis of Sjögren Inlet glaciers shows that the adjustment of the tributary glaciers to ice shelf disintegration is still ongoing and provides detailed information on the changes in glacier dynamics.
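The arithmetic behind a geodetic mass balance is simple enough to sketch: integrate the surface-elevation change rate over the glacier area and convert volume to mass with an assumed density. The area and density values below are illustrative stand-ins, not the paper's inventory; 850 kg/m³ is a commonly assumed volume-to-mass conversion density for glacier surveys, and roughly 362.5 Gt of ice corresponds to 1 mm of global mean sea level.

```python
def geodetic_mass_balance(mean_dh_per_yr, area_km2, density=850.0):
    """Mass change rate (Gt/a) from a mean surface-elevation change rate (m/a)."""
    volume_m3 = mean_dh_per_yr * area_km2 * 1e6   # m^3 of ice-equivalent per year
    return volume_m3 * density / 1e12             # kg per year -> Gt per year

def sea_level_mm(mass_gt):
    """Global-mean sea-level contribution (mm) of a mass change (Gt)."""
    return -mass_gt / 362.5

rate = geodetic_mass_balance(-3.1, 100.0)   # e.g. -3.1 m/a over a 100 km^2 area
```

The paper's pixel-wise version sums elevation-change rates over elevation bins rather than using a single mean, and propagates the density assumption into the stated uncertainties.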