129 resultados para Data Streams Distribution
Resumo:
Japanese encephalitis (JE) is the most common cause of viral encephalitis and an important public health concern in the Asia-Pacific region, particularly in China where 50% of global cases are notified. To explore the association between environmental factors and human JE cases and identify the high risk areas for JE transmission in China, we used annual notified data on JE cases at the center of administrative township and environmental variables with a pixel resolution of 1 km×1 km from 2005 to 2011 to construct models using ecological niche modeling (ENM) approaches based on maximum entropy. These models were then validated by overlaying reported human JE case localities from 2006 to 2012 onto each prediction map. ENMs had good discriminatory ability with the area under the curve (AUC) of the receiver operating curve (ROC) of 0.82-0.91, and low extrinsic omission rate of 5.44-7.42%. Resulting maps showed JE being presented extensively throughout southwestern and central China, with local spatial variations in probability influenced by minimum temperatures, human population density, mean temperatures, and elevation, with contribution of 17.94%-38.37%, 15.47%-21.82%, 3.86%-21.22%, and 12.05%-16.02%, respectively. Approximately 60% of JE cases occurred in predicted high risk areas, which covered less than 6% of areas in mainland China. Our findings will help inform optimal geographical allocation of the limited resources available for JE prevention and control in China, find hidden high-risk areas, and increase the effectiveness of public health interventions against JE transmission.
Resumo:
Large sized power transformers are important parts of the power supply chain. These very critical networks of engineering assets are an essential base of a nation’s energy resource infrastructure. This research identifies the key factors influencing transformer normal operating conditions and predicts the asset management lifespan. Engineering asset research has developed few lifespan forecasting methods combining real-time monitoring solutions for transformer maintenance and replacement. Utilizing the rich data source from a remote terminal unit (RTU) system for sensor-data driven analysis, this research develops an innovative real-time lifespan forecasting approach applying logistic regression based on the Weibull distribution. The methodology and the implementation prototype are verified using a data series from 161 kV transformers to evaluate the efficiency and accuracy for energy sector applications. The asset stakeholders and suppliers significantly benefit from the real-time power transformer lifespan evaluation for maintenance and replacement decision support.
Resumo:
Background Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. Methods We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Results Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Conclusions Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
Resumo:
Background Over the past decade, molecular imaging has played a key role in the progression of drug delivery platforms from concept to commercialisation. Of the molecular imaging techniques commonly utilised, positron emission tomography (PET) can yield a breadth of information not easily accessible by other methodologies and when combined with other complementary imaging modalities, is a powerful tool for pre- and clinical development of therapeutics. However, very little research has focussed on the information available from complimentary imaging modalities. This paper reports on the data-rich methodologies of contrast enhanced PET/CT and PET/MRI for probing efficacy of polymer drug delivery platforms. Results The information available from an ExiTron nano 6000 contrast enhanced PET/CT and a gadolinium (Gd) enhanced PET/MRI image of a 64Cu labeled HBP in the same mouse was qualitatively compared. Conclusions Gd contrast enhanced PET/MRI offers a powerful methodology for investigating the distribution of polymer drug delivery platforms in vivo and throughout a tumour volume. Furthermore, information about depth of penetration away from primary blood vessels can be gleaned, potentially leading to development of more efficacious delivery vehicles for clinical use.
Resumo:
Introduction Two symposia on “cardiovascular diseases and vulnerable plaques” Cardiovascular disease (CVD) is the leading cause of death worldwide. Huge effort has been made in many disciplines including medical imaging, computational modeling, bio- mechanics, bioengineering, medical devices, animal and clinical studies, population studies as well as genomic, molecular, cellular and organ-level studies seeking improved methods for early detection, diagnosis, prevention and treatment of these diseases [1-14]. However, the mechanisms governing the initiation, progression and the occurrence of final acute clinical CVD events are still poorly understood. A large number of victims of these dis- eases who are apparently healthy die suddenly without prior symptoms. Available screening and diagnostic methods are insufficient to identify the victims before the event occurs [8,9]. Most cardiovascular diseases are associated with vulnerable plaques. A grand challenge here is to develop new imaging techniques, predictive methods and patient screening tools to identify vulnerable plaques and patients who are more vulnerable to plaque rupture and associated clinical events such as stroke and heart attack, and recommend proper treatment plans to prevent those clinical events from happening. Articles in this special issue came from two symposia held recently focusing on “Cardio-vascular Diseases and Vulnerable Plaques: Data, Modeling, Predictions and Clinical Applications.” One was held at Worcester Polytechnic Institute (WPI), Worcester, MA, USA, July 13-14, 2014, right after the 7th World Congress of Biomechanics. This symposium was endorsed by the World Council of Biomechanics, and partially supported by a grant from NIH-National Institute of Biomedical Image and Bioengineering. The other was held at Southeast University (SEU), Nanjing, China, April 18-20, 2014.
Resumo:
Deriving an estimate of optimal fishing effort or even an approximate estimate is very valuable for managing fisheries with multiple target species. The most challenging task associated with this is allocating effort to individual species when only the total effort is recorded. Spatial information on the distribution of each species within a fishery can be used to justify the allocations, but often such information is not available. To determine the long-term overall effort required to achieve maximum sustainable yield (MSY) and maximum economic yield (MEY), we consider three methods for allocating effort: (i) optimal allocation, which optimally allocates effort among target species; (ii) fixed proportions, which chooses proportions based on past catch data; and (iii) economic allocation, which splits effort based on the expected catch value of each species. Determining the overall fishing effort required to achieve these management objectives is a maximizing problem subject to constraints due to economic and social considerations. We illustrated the approaches using a case study of the Moreton Bay Prawn Trawl Fishery in Queensland (Australia). The results were consistent across the three methods. Importantly, our analysis demonstrated the optimal total effort was very sensitive to daily fishing costs-the effort ranged from 9500-11 500 to 6000-7000, 4000 and 2500 boat-days, using daily cost estimates of $0, $500, $750, and $950, respectively. The zero daily cost corresponds to the MSY, while a daily cost of $750 most closely represents the actual present fishing cost. Given the recent debate on which costs should be factored into the analyses for deriving MEY, our findings highlight the importance of including an appropriate cost function for practical management advice. The approaches developed here could be applied to other multispecies fisheries where only aggregated fishing effort data are recorded, as the literature on this type of modelling is sparse.
Resumo:
Selection criteria and misspecification tests for the intra-cluster correlation structure (ICS) in longitudinal data analysis are considered. In particular, the asymptotical distribution of the correlation information criterion (CIC) is derived and a new method for selecting a working ICS is proposed by standardizing the selection criterion as the p-value. The CIC test is found to be powerful in detecting misspecification of the working ICS structures, while with respect to the working ICS selection, the standardized CIC test is also shown to have satisfactory performance. Some simulation studies and applications to two real longitudinal datasets are made to illustrate how these criteria and tests might be useful.
Resumo:
This paper considers the one-sample sign test for data obtained from general ranked set sampling when the number of observations for each rank are not necessarily the same, and proposes a weighted sign test because observations with different ranks are not identically distributed. The optimal weight for each observation is distribution free and only depends on its associated rank. It is shown analytically that (1) the weighted version always improves the Pitman efficiency for all distributions; and (2) the optimal design is to select the median from each ranked set.
Resumo:
We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, as compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is refered to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that in the presence of extravariation in the data, the confidence intervals have been substantially underestimated in previous models (Leslie-DeLury, Moran) and that the new model provides more reliable confidence intervals. The performance of these methods was also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
Resumo:
Robust methods are useful in making reliable statistical inferences when there are small deviations from the model assumptions. The widely used method of the generalized estimating equations can be "robustified" by replacing the standardized residuals with the M-residuals. If the Pearson residuals are assumed to be unbiased from zero, parameter estimators from the robust approach are asymptotically biased when error distributions are not symmetric. We propose a distribution-free method for correcting this bias. Our extensive numerical studies show that the proposed method can reduce the bias substantially. Examples are given for illustration.
Resumo:
We consider estimation of mortality rates and growth parameters from length-frequency data of a fish stock and derive the underlying length distribution of the population and the catch when there is individual variability in the von Bertalanffy growth parameter L-infinity. The model is flexible enough to accommodate 1) any recruitment pattern as a function of both time and length, 2) length-specific selectivity, and 3) varying fishing effort over time. The maximum likelihood method gives consistent estimates, provided the underlying distribution for individual variation in growth is correctly specified. Simulation results indicate that our method is reasonably robust to violations in the assumptions. The method is applied to tiger prawn data (Penaeus semisulcatus) to obtain estimates of natural and fishing mortality.
Resumo:
Background: Standard methods for quantifying IncuCyte ZOOM™ assays involve measurements that quantify how rapidly the initially-vacant area becomes re-colonised with cells as a function of time. Unfortunately, these measurements give no insight into the details of the cellular-level mechanisms acting to close the initially-vacant area. We provide an alternative method enabling us to quantify the role of cell motility and cell proliferation separately. To achieve this we calibrate standard data available from IncuCyte ZOOM™ images to the solution of the Fisher-Kolmogorov model. Results: The Fisher-Kolmogorov model is a reaction-diffusion equation that has been used to describe collective cell spreading driven by cell migration, characterised by a cell diffusivity, D, and carrying capacity limited proliferation with proliferation rate, λ, and carrying capacity density, K. By analysing temporal changes in cell density in several subregions located well-behind the initial position of the leading edge we estimate λ and K. Given these estimates, we then apply automatic leading edge detection algorithms to the images produced by the IncuCyte ZOOM™ assay and match this data with a numerical solution of the Fisher-Kolmogorov equation to provide an estimate of D. We demonstrate this method by applying it to interpret a suite of IncuCyte ZOOM™ assays using PC-3 prostate cancer cells and obtain estimates of D, λ and K. Comparing estimates of D, λ and K for a control assay with estimates of D, λ and K for assays where epidermal growth factor (EGF) is applied in varying concentrations confirms that EGF enhances the rate of scratch closure and that this stimulation is driven by an increase in D and λ, whereas K is relatively unaffected by EGF. Conclusions: Our approach for estimating D, λ and K from an IncuCyte ZOOM™ assay provides more detail about cellular-level behaviour than standard methods for analysing these assays. In particular, our approach can be used to quantify the balance of cell migration and cell proliferation and, as we demonstrate, allow us to quantify how the addition of growth factors affects these processes individually.
Resumo:
The quality of species distribution models (SDMs) relies to a large degree on the quality of the input data, from bioclimatic indices to environmental and habitat descriptors (Austin, 2002). Recent reviews of SDM techniques, have sought to optimize predictive performance e.g. Elith et al., 2006. In general SDMs employ one of three approaches to variable selection. The simplest approach relies on the expert to select the variables, as in environmental niche models Nix, 1986 or a generalized linear model without variable selection (Miller and Franklin, 2002). A second approach explicitly incorporates variable selection into model fitting, which allows examination of particular combinations of variables. Examples include generalized linear or additive models with variable selection (Hastie et al. 2002); or classification trees with complexity or model based pruning (Breiman et al., 1984, Zeileis, 2008). A third approach uses model averaging, to summarize the overall contribution of a variable, without considering particular combinations. Examples include neural networks, boosted or bagged regression trees and Maximum Entropy as compared in Elith et al. 2006. Typically, users of SDMs will either consider a small number of variable sets, via the first approach, or else supply all of the candidate variables (often numbering more than a hundred) to the second or third approaches. Bayesian SDMs exist, with several methods for eliciting and encoding priors on model parameters (see review in Low Choy et al. 2010). However few methods have been published for informative variable selection; one example is Bayesian trees (O’Leary 2008). Here we report an elicitation protocol that helps makes explicit a priori expert judgements on the quality of candidate variables. This protocol can be flexibly applied to any of the three approaches to variable selection, described above, Bayesian or otherwise. We demonstrate how this information can be obtained then used to guide variable selection in classical or machine learning SDMs, or to define priors within Bayesian SDMs.
Resumo:
Reputation systems are employed to measure the quality of items on the Web. Incorporating accurate reputation scores in recommender systems is useful to provide more accurate recommendations as recommenders are agnostic to reputation. The ratings aggregation process is a vital component of a reputation system. Reputation models available do not consider statistical data in the rating aggregation process. This limitation can reduce the accuracy of generated reputation scores. In this paper, we propose a new reputation model that considers previously ignored statistical data. We compare our proposed model against state-of the-art models using top-N recommender system experiment.
Resumo:
Compositional data analysis usually deals with relative information between parts where the total (abundances, mass, amount, etc.) is unknown or uninformative. This article addresses the question of what to do when the total is known and is of interest. Tools used in this case are reviewed and analysed, in particular the relationship between the positive orthant of D-dimensional real space, the product space of the real line times the D-part simplex, and their Euclidean space structures. The first alternative corresponds to data analysis taking logarithms on each component, and the second one to treat a log-transformed total jointly with a composition describing the distribution of component amounts. Real data about total abundances of phytoplankton in an Australian river motivated the present study and are used for illustration.