970 results for Zero-inflated Count Data
Abstract:
The attached file was created with Scientific WorkPlace LaTeX.
Abstract:
Background: An important goal in the management of oligoarticular juvenile arthritis would be to alter the disease course through early therapy. We studied the effect of early intra-articular corticosteroid injections on the odds of reaching an active joint count of zero and inactive disease. Methods: Demographic, clinical, and treatment data for patients with juvenile oligoarthritis enrolled in a prospective, longitudinal, pan-Canadian study were collected over 2 years. An early injection was defined as one received within the first 3 months after diagnosis. Generalized estimating equations were used for the statistical analysis. Results: Three hundred and ten patients were included. One hundred and eleven (35.8%) received an early injection; these patients had more active disease at study entry. Patients exposed to an early injection had similar odds of reaching an active joint count of zero, OR 1.52 (95% CI 0.68-3.37), p=0.306, but were significantly less likely to reach inactive disease, OR 0.35 (95% CI 0.14-0.88), p=0.026. Interpretation: In this cohort of 310 patients with juvenile oligoarthritis, early corticosteroid injections did not lead to a higher probability of reaching an active joint count of zero or inactive disease. Methodological issues inherent to the use of observational data for estimating treatment effects may have biased the results, so we cannot state with certainty that early injections do not improve the disease course. Prospective studies addressing the limitations raised will be needed to settle the question.
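A minimal sketch of the kind of GEE analysis the abstract describes (binary outcome, repeated visits per patient, exchangeable working correlation), using statsmodels; the column names and synthetic data below are hypothetical stand-ins, not the study's actual variables:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Synthetic stand-in for the cohort: repeated visits per patient,
# a binary exposure (early injection) and a binary outcome.
rng = np.random.default_rng(0)
n_patients, n_visits = 310, 4
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n_patients), n_visits),
    "early_injection": np.repeat(rng.integers(0, 2, n_patients), n_visits),
    "inactive_disease": rng.integers(0, 2, n_patients * n_visits),
})

# GEE with a binomial family for the binary outcome and an exchangeable
# working correlation for the repeated visits; exp(coef) gives odds ratios.
model = smf.gee(
    "inactive_disease ~ early_injection",
    groups="patient_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(np.exp(result.params))
```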
Abstract:
We critically discuss relaxation experiments in magnetic systems that can be characterized in terms of an energy barrier distribution, showing that proper normalization of the relaxation data is needed whenever curves corresponding to different temperatures are to be compared. We show how these normalization factors can be obtained from experimental data by using the T ln(t/t₀) scaling method without making any assumptions about the nature of the energy barrier distribution. The validity of the procedure is tested using a ferrofluid of Fe₃O₄ particles.
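A rough sketch of how the T ln(t/t₀) scaling variable and the per-temperature normalization factors might be computed from measured relaxation curves; the attempt time t₀ and the least-squares matching on overlapping segments are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

T0 = 1e-9  # assumed attempt time t0 in seconds, typical for nanoparticles

def scaling_variable(t, T):
    """Scaling variable T*ln(t/t0) for a relaxation curve measured at
    temperature T over times t."""
    return T * np.log(t / T0)

def normalization_factors(curves):
    """Given relaxation curves as (t, T, M) triples sorted by temperature,
    return a multiplicative factor per curve chosen so that consecutive
    curves agree (in the least-squares sense) on the overlapping part of
    the T*ln(t/t0) master curve. Assumes consecutive curves overlap."""
    factors = [1.0]
    for (t1, T1, m1), (t2, T2, m2) in zip(curves[:-1], curves[1:]):
        x1, x2 = scaling_variable(t1, T1), scaling_variable(t2, T2)
        lo, hi = max(x1.min(), x2.min()), min(x1.max(), x2.max())
        s1, s2 = (x1 >= lo) & (x1 <= hi), (x2 >= lo) & (x2 <= hi)
        # interpolate curve 1 onto curve 2's overlap, then match scales
        m1_on_x2 = np.interp(x2[s2], x1[s1], m1[s1])
        scale = np.sum(m1_on_x2 * m2[s2]) / np.sum(m2[s2] ** 2)
        factors.append(factors[-1] * scale)
    return factors
```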
Abstract:
The increasing interconnection of information and communication systems leads to a further increase in complexity and thus also to a further increase in security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long ceased to offer protection against intrusion attempts into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyze information from network components and hosts in order to detect unusual behaviour and security violations automatically. While signature-based approaches can only detect already known attack patterns, anomaly-based IDS are also able to detect new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in optimally processing the enormous volumes of network data and in developing an adaptive detection model that operates in real time. To meet these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic queuing concept to process the continuously arriving network data, continuously assembles network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on an Enhanced Growing Hierarchical Self-Organizing Map (EGHSOM), a model of normal network behaviour (NNB), and an update model. Within OptiFilter, tcpdump and SNMP traps are used to continuously aggregate network packets and host events. These aggregated network packets and host events are further analyzed and transformed into connection vectors. To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is studied intensively and substantially extended. In this dissertation, several approaches are proposed and discussed: a classification-confidence margin threshold is defined to uncover unknown malicious connections; the stability of the growing topology is increased by novel approaches for initializing the weight vectors and by strengthening the winner neurons; and a self-adaptive procedure is introduced so that the model can be updated continuously. In addition, the main task of the NNB model is to further examine the unknown connections detected by the EGHSOM and to verify whether they are normal. However, network traffic changes constantly owing to the concept-drift phenomenon, which produces non-stationary network data in real time. This phenomenon is handled by the update model: the EGHSOM model detects new anomalies effectively, and the NNB model adapts optimally to changes in the network data. In the experimental evaluation, the framework showed promising results. In the first experiment, the framework was evaluated in offline mode. OptiFilter was evaluated with offline, synthetic, and realistic data, and the adaptive classifier was evaluated with 10-fold cross-validation to estimate its accuracy.
In the second experiment, the framework was installed on a 1 to 10 Gbit/s network link and evaluated online in real time. OptiFilter successfully transformed the enormous volume of network data into structured connection vectors, and the adaptive classifier classified them precisely. A comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all of them. This can be attributed to the following key points: the processing of the collected network data, the achievement of the best performance (e.g. overall accuracy), the detection of unknown connections, and the development of a real-time intrusion detection model.
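As an illustration of the classification-confidence margin idea, the sketch below flags a connection vector as unknown when its best-matching unit wins by only a small margin over units of other classes; the margin definition and threshold are assumptions, and the dissertation's EGHSOM criterion may differ:

```python
import numpy as np

def classify_with_margin(x, codebook, unit_labels, threshold=0.2):
    """Classify a connection vector x by its best-matching unit (BMU),
    or return 'unknown' when the relative margin between the BMU and
    the nearest unit of a different class is below the threshold.
    codebook: (n_units, n_features) weight vectors; unit_labels: the
    class assigned to each unit (at least two classes assumed)."""
    d = np.linalg.norm(codebook - x, axis=1)
    bmu = np.argmin(d)
    rival = np.min(d[unit_labels != unit_labels[bmu]])
    margin = (rival - d[bmu]) / (rival + d[bmu] + 1e-12)  # in [0, 1)
    return unit_labels[bmu] if margin >= threshold else "unknown"

# Tiny demonstration with a hypothetical 2-D codebook
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
labels = np.array(["normal", "normal", "attack"])
print(classify_with_margin(np.array([2.6, 2.6]), codebook, labels))
```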
Abstract:
This analysis was stimulated by the real data-analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at two hierarchical levels: 9 major levels (e.g. housing, food, utilities, etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there is a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general, as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether a household is teetotal will clearly depend on the household-level variables, so we need to be able to model this dependence. The other tricky question is that, with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be; for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually occur in situations where any non-zero expenditure is not small. While this analysis is based around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol) or because they occur only in random regions which may be missed in a sample (similar to the durables).
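A minimal sketch of one way to start on such conditional models: a logistic regression for the structural zero (is the household teetotal?) on household-level variables, followed by a log-ratio analysis of the sub-composition among spenders; all column names and the synthetic data are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the household data (all column names hypothetical).
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "income": rng.normal(30.0, 10.0, n),
    "n_adults": rng.integers(1, 4, n),
    "alcohol": rng.exponential(1.0, n) * rng.integers(0, 2, n),
    "food": rng.exponential(5.0, n) + 1,
    "housing": rng.exponential(8.0, n) + 1,
    "utilities": rng.exponential(2.0, n) + 1,
})

# Part 1: does the structural zero (teetotal household) depend on
# household-level variables?
teetotal = (df["alcohol"] == 0).astype(int)
X = sm.add_constant(df[["income", "n_adults"]])
zero_model = sm.Logit(teetotal, X).fit(disp=0)
print(zero_model.params)

# Part 2: among spenders, analyse the sub-composition excluding alcohol
# via additive log-ratios (food as reference); comparing this between
# groups probes sub-compositional independence.
spenders = df[df["alcohol"] > 0]
alr = np.log(spenders[["housing", "utilities"]].div(spenders["food"], axis=0))
```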
Abstract:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent (essential zeros) or because it is below the detection limit (rounded zeros). Because the second kind of zeros is usually understood as "a trace too small to measure", it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g., by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts, and thus the metric properties, should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003), where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is "natural" in the sense that it recovers the "true" composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, a substitution method for missing values in compositional data sets is introduced in the same paper.
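A minimal sketch of the multiplicative replacement described above (after Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn, 2003): zero parts receive a small imputed value and the non-zero parts are shrunk multiplicatively, so the closure constant, and hence the covariance structure of zero-free subcompositions, is preserved:

```python
import numpy as np

def multiplicative_replacement(x, delta, kappa=1.0):
    """Multiplicative replacement of rounded zeros: each zero part gets
    the small value delta, and every non-zero part is shrunk by the
    common factor (1 - sum of imputed values / kappa), so the total
    kappa and all zero-free subcompositions are preserved."""
    x = np.asarray(x, dtype=float)
    delta = np.broadcast_to(np.asarray(delta, dtype=float), x.shape)
    zeros = x == 0
    return np.where(zeros, delta, x * (1 - delta[zeros].sum() / kappa))

# Example: one rounded zero in a 4-part composition closed to 1
print(multiplicative_replacement([0.5, 0.3, 0.2, 0.0], delta=0.01))
# -> [0.495 0.297 0.198 0.01], still summing to 1
```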
Abstract:
Compositional data naturally arise from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper demonstrates work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open-source package that makes very powerful statistical facilities available at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example, that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed.
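The library itself is written in R; as a language-neutral illustration of the combined workflow it supports (a log-ratio transform followed by a standard multivariate method), here is a minimal Python sketch on synthetic compositions:

```python
import numpy as np
from sklearn.decomposition import PCA

def clr(x):
    """Centred log-ratio transform of compositions (one per row)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical data: rows = artefacts, columns = element concentrations,
# with zeros assumed already handled by a replacement strategy.
rng = np.random.default_rng(4)
compositions = rng.dirichlet(np.ones(8), size=50)
scores = PCA(n_components=2).fit_transform(clr(compositions))
print(scores[:3])
```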
Abstract:
There is hardly a case in exploration geology where the studied data do not include below-detection-limit and/or zero values, and since most geological data follow lognormal distributions, these "zero data" represent a mathematical challenge for interpretation. We need to start by recognizing that there are zero values in geology. For example, the amount of quartz in a foyaite (nepheline syenite) is zero, since quartz cannot coexist with nepheline. Another common essential zero is a north azimuth, although we can always replace that zero with the value 360°. These are known as "essential zeros", but what can we do with "rounded zeros" that result from values below the detection limit of the equipment? Amalgamation, e.g. adding Na2O and K2O as total alkalis, is one solution, but sometimes we need to differentiate between sodic and potassic alteration. Pre-classification into groups requires good knowledge of the distribution of the data and of the geochemical characteristics of the groups, which is not always available. Setting the zero values equal to the detection limit of the equipment used will generate spurious distributions, especially in ternary diagrams. The same occurs if we replace the zero values by a small amount using non-parametric or parametric techniques (imputation). The method we propose takes into consideration the well-known relationships between some elements. For example, in copper porphyry deposits there is always a good direct correlation between copper and molybdenum values, but while copper will always be above the detection limit, many of the molybdenum values will be "rounded zeros". We therefore take the lower quartile of the real molybdenum values, establish a regression equation with copper, and then estimate the "rounded" zero values of molybdenum from their corresponding copper values, as sketched below. The method can be applied to any type of data, provided we first establish their correlation dependency. One of the main advantages of this method is that we do not obtain a fixed value for the "rounded zeros", but one that depends on the value of the other variable. Key words: compositional data analysis, treatment of zeros, essential zeros, rounded zeros, correlation dependency
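A minimal sketch of the proposed recipe under stated assumptions: the regression is fitted on the lower quartile of the measured molybdenum values (those closest in magnitude to the censored ones), in log-log space to respect the lognormal distributions mentioned above; the synthetic demonstration data are hypothetical:

```python
import numpy as np

def impute_rounded_zeros(cu, mo, detection_limit):
    """Impute below-detection Mo values from Cu. Fits a regression on
    the lower quartile of the measured Mo values and predicts Mo for
    the censored samples from their Cu values. The log-log form is an
    assumption, chosen to match lognormally distributed grades."""
    cu, mo = np.asarray(cu, float), np.asarray(mo, float)
    measured = mo >= detection_limit
    q1 = np.quantile(mo[measured], 0.25)
    fit = measured & (mo <= q1)
    slope, intercept = np.polyfit(np.log(cu[fit]), np.log(mo[fit]), 1)
    out = mo.copy()
    out[~measured] = np.exp(intercept + slope * np.log(cu[~measured]))
    return out

# Synthetic demonstration with a correlated Cu-Mo pair
rng = np.random.default_rng(2)
cu = rng.lognormal(6.0, 0.5, 200)
mo_true = np.exp(0.8 * np.log(cu) - 3.0 + rng.normal(0, 0.2, 200))
dl = np.quantile(mo_true, 0.2)
mo = np.where(mo_true < dl, 0.0, mo_true)  # censored values recorded as 0
print(impute_rounded_zeros(cu, mo, detection_limit=dl)[:5])
```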
Abstract:
The quantitative estimation of sea surface temperatures (SST) from fossil assemblages is a fundamental issue in palaeoclimatic and palaeoceanographic investigations. The Modern Analogue Technique, a widely adopted method based on direct comparison of fossil assemblages with modern core-top samples, was revised with the aim of conforming it to compositional data analysis. The new CODAMAT method was developed by adopting the Aitchison metric as distance measure. Modern core-top datasets are characterised by a large number of zeros. Zeros were replaced by adopting a Bayesian approach based on posterior estimation of the parameter of the multinomial distribution. The number of modern analogues from which to reconstruct the SST was determined by means of a multiple approach, considering the proxies correlation matrix, the Standardized Residual Sum of Squares, and the Mean Squared Distance. The new CODAMAT method was applied to the planktonic foraminiferal assemblages of a core recovered in the Tyrrhenian Sea. Key words: Modern analogues, Aitchison distance, Proxies correlation matrix, Standardized Residual Sum of Squares
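A minimal sketch of two ingredients named above: the Aitchison distance (the Euclidean distance between centred log-ratio transforms) and one simple Bayesian-multinomial zero replacement (the Dirichlet posterior mean; the paper's exact prior may differ):

```python
import numpy as np

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between the centred
    log-ratio (clr) transforms of two zero-free compositions."""
    lx, ly = np.log(x), np.log(y)
    return np.linalg.norm((lx - lx.mean()) - (ly - ly.mean()))

def bayes_replace(counts, s=0.5):
    """One simple Bayesian-multinomial zero replacement: the posterior
    mean of the multinomial proportions under a Dirichlet(s, ..., s)
    prior, which leaves no zero in the closed composition."""
    counts = np.asarray(counts, dtype=float)
    return (counts + s) / (counts.sum() + s * counts.size)

# Hypothetical core-top counts with zeros, compared after replacement
x = bayes_replace(np.array([12, 7, 0, 1]))
y = bayes_replace(np.array([10, 9, 1, 0]))
print(aitchison_distance(x, y))
```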
Abstract:
Using a flexible chemical box model with full heterogeneous chemistry, intercepts of chemically modified Langley plots have been computed for the 5 years of zenith-sky NO2 data from Faraday in Antarctica (65°S). By using these intercepts as the effective amount in the reference spectrum, drifts in the zero of total vertical NO2 were much reduced. The error in the zero of total NO2 is ±0.03×10¹⁵ molec cm⁻² from one year to another. This error is small enough to determine trends in midsummer and any variability in denoxification between midwinters. The technique also suggests a more sensitive method for determining N2O5 from zenith-sky NO2 data.
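A minimal sketch of the Langley-plot idea under stated assumptions: measured differential slant columns are regressed against model-calculated slant columns, and the intercept estimates the NO2 amount in the reference spectrum (the sign convention and the synthetic numbers are illustrative only):

```python
import numpy as np

def langley_intercept(model_scd, measured_dscd):
    """Straight-line fit of measured differential slant columns against
    model-calculated slant columns; the intercept estimates the NO2
    amount in the reference spectrum (sign conventions vary)."""
    slope, intercept = np.polyfit(model_scd, measured_dscd, 1)
    return intercept

# Synthetic illustration: a reference-spectrum amount of 2e15 molec/cm^2
# shows up as an intercept of -2e15 in the fit.
rng = np.random.default_rng(3)
model = np.linspace(1e15, 8e15, 24)
measured = model - 2e15 + rng.normal(0, 5e13, 24)
print(langley_intercept(model, measured))  # ~ -2e15
```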
Abstract:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provides a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment-adjusted prevalence of scrapie in Great Britain applied multiple-list capture–recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlap between surveillance sources and, hence, the application of multiple-list capture–recapture models. Alternative methods, still within the capture–recapture methodology but relying on repeated entries in one single list, have been suggested for these situations. In this article, we apply one-list capture–recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. To do so, we develop a new diagnostic tool for indicating heterogeneity as well as a new understanding of the Zelterman and Chao lower-bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates; in the case studied here, the holding size and country of origin. Our results confirm the presence of substantial unobserved heterogeneity, supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appears to inform the model significantly.
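A minimal sketch of the two one-list estimators discussed, using their standard closed forms (n units observed, f1 seen exactly once, f2 exactly twice); the counts in the demonstration are hypothetical, not the database's:

```python
import numpy as np

def zelterman(n, f1, f2):
    """Zelterman estimator of total population size from a single list:
    n holdings observed at least once, f1 observed exactly once, f2
    exactly twice. Uses the locally truncated Poisson rate 2*f2/f1."""
    lam = 2.0 * f2 / f1
    return n / (1.0 - np.exp(-lam))

def chao_lower_bound(n, f1, f2):
    """Chao's lower bound for the total population size."""
    return n + f1 ** 2 / (2.0 * f2)

# Hypothetical notification counts, not the database's actual figures
print(zelterman(n=120, f1=80, f2=25))         # ~ 258
print(chao_lower_bound(n=120, f1=80, f2=25))  # = 248
```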
Abstract:
Fixed transaction costs that prohibit exchange engender bias in supply analysis owing to censoring of the sample observations. The associated bias in conventional regression procedures applied to censored data, and the construction of robust methods for mitigating that bias, have been preoccupations of applied economists since Tobin [Econometrica 26 (1958) 24]. This literature assumes that the true point of censoring in the data is zero; when this is not the case, a bias is imparted to parameter estimates of the censored regression model. We conjecture that this bias can be significant, affirm this through experiments, and suggest techniques for mitigating it using Bayesian procedures. The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model; they are easy to implement, work well in both small and large samples, and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread use of the zero-censored Tobit regression, and we investigate their consequences using data on milk-market participation in the Ethiopian highlands.
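A minimal maximum-likelihood sketch of a censored regression with an explicit censoring point c, which makes the misspecification concrete: fitting with c = 0 when the true censoring point is non-zero biases the estimates. The paper's Bayesian procedure works with the same likelihood via data augmentation; the data here are synthetic:

```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, y, X, c=0.0):
    """Negative log-likelihood of a regression censored from below at c.
    Fitting with c = 0 when the true censoring point differs is the
    misspecification discussed in the abstract."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    censored = y <= c
    ll = np.where(
        censored,
        stats.norm.logcdf((c - mu) / sigma),
        stats.norm.logpdf((y - mu) / sigma) - log_sigma,
    )
    return -ll.sum()

# Synthetic data censored at a non-zero point
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y_latent = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
c_true = 0.5
y = np.maximum(y_latent, c_true)

# Maximum likelihood with the correct censoring point
res = optimize.minimize(tobit_negloglik, x0=np.zeros(3),
                        args=(y, X, c_true), method="BFGS")
print(res.x[:2])  # intercept/slope estimates, close to (1.0, 2.0)
```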
Abstract:
The optimal and the zero-forcing beamformers are two commonly used algorithms in subspace-based blind beamforming. The optimal beamformer is regarded as the algorithm with the best output SINR; the zero-forcing algorithm emphasizes co-channel interference cancellation. This paper compares the performance of the two algorithms under practical conditions: the effect of finite data length and the presence of angle estimation errors. The investigation reveals that the zero-forcing algorithm can be more robust in practical environments than the optimal algorithm.
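A minimal sketch contrasting the two beamformers for a uniform linear array: the optimal (MVDR) weight maximizes output SINR given the interference-plus-noise covariance, while the zero-forcing weight places an exact null on the interferer via the pseudo-inverse of the steering matrix; the scenario is hypothetical:

```python
import numpy as np

def steering(theta_deg, n=8, d=0.5):
    """Steering vector of an n-element uniform linear array with
    element spacing d (in wavelengths)."""
    phase = 2 * np.pi * d * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * phase * np.arange(n))

# Hypothetical scenario: desired signal at 0 deg, interferer at 20 deg
a0, a1 = steering(0.0), steering(20.0)
A = np.column_stack([a0, a1])

# Optimal (MVDR) beamformer: w = R^{-1} a0 / (a0^H R^{-1} a0), with R
# the interference-plus-noise covariance (10x interferer + unit noise).
R = 10 * np.outer(a1, a1.conj()) + np.eye(8)
w_opt = np.linalg.solve(R, a0)
w_opt = w_opt / (a0.conj() @ w_opt)

# Zero-forcing beamformer: the rows of the pseudo-inverse of the steering
# matrix give unit gain on the desired signal and exact nulls elsewhere.
w_zf = np.linalg.pinv(A)[0].conj()

print(abs(w_opt.conj() @ a1), abs(w_zf.conj() @ a1))  # interferer response
```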
Abstract:
In this paper we discuss current work concerning appearance-based and CAD-based vision, two opposing vision strategies. CAD-based vision is geometry based, reliant on having complete object-centred models. Appearance-based vision builds view-dependent models from training images. Existing CAD-based vision systems that work with intensity images have all used one- and zero-dimensional features, for example lines, arcs, points and corners. We describe a system we have developed for combining these two strategies. Geometric models are extracted from a commercial CAD library of industry-standard parts. Surface appearance characteristics are then learnt automatically by observing actual object instances. This information is combined with geometric information and is used in hypothesis evaluation. This augmented description improves the system's robustness to texture, specularities and other artifacts which are hard to model with geometry alone, whilst maintaining the advantages of a geometric description.