917 results for Statistical mixture-design optimization
Abstract:
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via the Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, the posterior parameter estimates and the allocation probabilities, given a sufficiently large sample size. The results reflect uncertainty in the final model and report, from a single run, the range of candidate models and their respective estimated probabilities. Label switching is resolved with a computationally lightweight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
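As a rough illustration of the overfitting idea (not the Zmix R package itself, which is MCMC-based), the following Python sketch uses scikit-learn's variational BayesianGaussianMixture with a deliberately large number of components and a small Dirichlet concentration prior, so that superfluous components receive weights near zero; the data and all settings are made up.

```python
# Variational analogue of the overfitted-mixture principle: fit too many
# components under a sparse Dirichlet prior and count the ones that survive.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Simulated data: a true 2-component univariate Gaussian mixture.
x = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(3.0, 1.0, 700)]).reshape(-1, 1)

# Overfit with 10 components; a small concentration prior pushes
# unused weights toward zero.
bgm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,
    max_iter=500,
    random_state=0,
).fit(x)

active = bgm.weights_ > 0.01   # components that kept appreciable mass
print("estimated number of components:", active.sum())
print("weights:", np.round(bgm.weights_[active], 3))
print("means:", np.round(bgm.means_[active].ravel(), 2))
```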
Abstract:
To facilitate marketing and export, the Australian macadamia industry requires accurate crop forecasts. Each year, two levels of crop predictions are produced for this industry. The first is an overall longer-term forecast based on tree census data of growers in the Australian Macadamia Society (AMS). This data set currently accounts for around 70% of total production, and is supplemented by our best estimates of non-AMS orchards. Given these total tree numbers, average yields per tree are needed to complete the long-term forecasts. Yields from regional variety trials were initially used, but were found to be consistently higher than the average yields that growers were obtaining. Hence, a statistical model was developed using growers' historical yields, also taken from the AMS database. This model accounted for the effects of tree age, variety, year, region and tree spacing, and explained 65% of the total variation in the yield per tree data. The second level of crop prediction is an annual climate adjustment of these overall long-term estimates, taking into account the expected effects on production of the previous year's climate. This adjustment is based on relative historical yields, measured as the percentage deviance between expected and actual production. The dominant climatic variables are observed temperature, evaporation, solar radiation and modelled water stress. Initially, a number of alternative statistical models showed good agreement within the historical data, with jackknife cross-validation R² values of 96% or better. However, forecasts varied quite widely between these alternative models. Exploratory multivariate analyses and nearest-neighbour methods were used to investigate these differences. For 2001-2003, the overall forecasts were in the right direction (when compared with the long-term expected values), but were over-estimates. In 2004 the forecast was well under the observed production, and in 2005 the revised models produced a forecast within 5.1% of the actual production. Over the first five years of forecasting, the absolute deviance for the climate-adjustment models averaged 10.1%, just outside the targeted objective of 10%.
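The climate-adjustment arithmetic described here (percentage deviance, and its application to the long-term forecast) is simple enough to state directly. All figures below are invented placeholders, not AMS data:

```python
# Toy illustration of the two quantities the forecasting system tracks:
# percentage deviance between expected and actual production, and the
# climate adjustment applied to the long-term forecast.
def percentage_deviance(expected: float, actual: float) -> float:
    """Percentage deviance of actual production from the expectation."""
    return 100.0 * (actual - expected) / expected

# Long-term forecast from the tree-census/yield model, in tonnes (made up).
long_term_forecast = 40_000.0

# Suppose the climate model predicts production 6% below the long-term
# expectation (a made-up value standing in for the fitted model output).
predicted_deviance = -6.0
adjusted_forecast = long_term_forecast * (1.0 + predicted_deviance / 100.0)
print(f"climate-adjusted forecast: {adjusted_forecast:,.0f} t")

# Forecast skill of the kind reported: mean absolute deviance over years.
actuals = [36_500.0, 41_200.0]    # made-up observed production
forecasts = [39_000.0, 37_800.0]  # made-up forecasts
mad = sum(abs(percentage_deviance(f, a))
          for f, a in zip(forecasts, actuals)) / len(actuals)
print(f"mean absolute deviance: {mad:.1f}%")
```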
Abstract:
Genetic mark–recapture requires efficient methods of uniquely identifying individuals. 'Shadows' (individuals with the same genotype at the selected loci) become more likely with increasing sample size, and bias harvest rate estimates. Finding loci is costly, but better loci reduce analysis costs and improve power. Optimal microsatellite panels minimize shadows, but panel design is a complex optimization process. locuseater and shadowboxer permit power and cost analysis of this process and automate some aspects, by simulating the entire experiment from panel design to harvest rate estimation.
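The cost of shadows can be gauged with the standard probability-of-identity calculation; the sketch below is not code from locuseater or shadowboxer, and the allele frequencies and sample size are invented. It shows why expected shadow pairs grow roughly quadratically with sample size:

```python
# Probability that two unrelated individuals share a multilocus genotype
# (probability of identity, PI), and the expected number of shadow pairs.
from math import comb

def pi_locus(p):
    """Per-locus probability of identity for unrelated individuals,
    using the standard formula 2*(sum p_i^2)^2 - sum p_i^4."""
    s2 = sum(q * q for q in p)
    s4 = sum(q ** 4 for q in p)
    return 2 * s2 * s2 - s4

loci = [
    [0.4, 0.3, 0.2, 0.1],    # hypothetical allele frequencies, locus 1
    [0.5, 0.25, 0.25],       # locus 2
    [0.35, 0.35, 0.2, 0.1],  # locus 3
]
pi_multi = 1.0
for p in loci:
    pi_multi *= pi_locus(p)  # loci assumed independent

n = 500  # sampled individuals (made up)
expected_shadow_pairs = comb(n, 2) * pi_multi
print(f"multilocus PI = {pi_multi:.2e}")
print(f"expected shadow pairs at n={n}: {expected_shadow_pairs:.2f}")
```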
Design and testing of stand-specific bucking instructions for use on modern cut-to-length harvesters
Abstract:
This study addresses three important issues in tree bucking optimization in the context of cut-to-length harvesting. (1) Would the fit between the log demand and log output distributions be better if the price and/or demand matrices controlling the bucking decisions on modern cut-to-length harvesters were adjusted to the unique conditions of each individual stand? (2) In what ways can we generate stand and product specific price and demand matrices? (3) What alternatives do we have to measure the fit between the log demand and log output distributions, and what would be an ideal goodness-of-fit measure? Three iterative search systems were developed for seeking stand-specific price and demand matrix sets: (1) a fuzzy logic control system for calibrating the price matrix of one log product for one stand at a time (the stand-level one-product approach); (2) a genetic algorithm system for adjusting the price matrices of one log product in parallel for several stands (the forest-level one-product approach); and (3) a genetic algorithm system for dividing the overall demand matrix of each of the several log products into stand-specific sub-demands simultaneously for several stands and products (the forest-level multi-product approach). The stem material used for testing the performance of the stand-specific price and demand matrices against that of the reference matrices consisted of 9 155 Norway spruce (Picea abies (L.) Karst.) sawlog stems gathered by harvesters from 15 mature spruce-dominated stands in southern Finland. The reference price and demand matrices were either direct copies or slightly modified versions of those used by two Finnish sawmilling companies. Two types of stand-specific bucking matrices were compiled for each log product: one from the harvester-collected stem profiles and the other from the pre-harvest inventory data. Four goodness-of-fit measures were analyzed for their appropriateness in determining the similarity between the log demand and log output distributions: (1) the apportionment degree (index), (2) the chi-square statistic, (3) the Laspeyres quantity index, and (4) the price-weighted apportionment degree. The study confirmed that any improvement in the fit between the log demand and log output distributions can only be realized at the expense of log volumes produced. Stand-level pre-control of price matrices was found to be advantageous, provided the control is done with perfect stem data. Forest-level pre-control of price matrices resulted in no improvement in the cumulative apportionment degree. Cutting stands under the control of stand-specific demand matrices yielded a better total fit between the demand and output matrices at the forest level than was obtained by cutting each stand with non-stand-specific reference matrices. The theoretical and experimental analyses suggest that none of the three alternative goodness-of-fit measures clearly outperforms the traditional apportionment degree measure.
Keywords: harvesting, tree bucking optimization, simulation, fuzzy control, genetic algorithms, goodness-of-fit
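The traditional apportionment degree retained here as the preferred goodness-of-fit measure is commonly computed as the sum of cellwise minima of the demand and output proportion matrices, expressed as a percentage (100% = perfect fit). A minimal sketch with invented matrices:

```python
# Apportionment degree between a log demand distribution and the realized
# log output distribution (both over length x top-diameter cells).
import numpy as np

def apportionment_degree(demand: np.ndarray, output: np.ndarray) -> float:
    d = demand / demand.sum()   # demand proportions
    o = output / output.sum()   # output proportions
    return 100.0 * np.minimum(d, o).sum()

# Made-up 3x3 matrices (rows: log lengths, cols: top diameters).
demand = np.array([[30, 20, 10],
                   [15, 10,  5],
                   [ 5,  3,  2]], dtype=float)
output = np.array([[25, 22, 12],
                   [18,  9,  4],
                   [ 6,  2,  2]], dtype=float)
print(f"apportionment degree: {apportionment_degree(demand, output):.1f}%")
```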
Abstract:
The past decade has brought a proliferation of statistical genetic (linkage) analysis techniques, incorporating new methodology and/or improvement of existing methodology in gene mapping, specifically targeted towards the localization of genes underlying complex disorders. Most of these techniques have been implemented in user-friendly programs and made freely available to the genetics community. Although certain packages may be more 'popular' than others, a common question asked by genetic researchers is 'which program is best for me?'. To help researchers answer this question, the following software review aims to summarize the main advantages and disadvantages of the popular GENEHUNTER package.
Abstract:
This study investigated within-person relationships between daily problem-solving demands, selection, optimization, and compensation (SOC) strategy use, job satisfaction, and fatigue at work. Based on conservation of resources theory, it was hypothesized that high SOC strategy use boosts the positive relationship between problem-solving demands and job satisfaction, and buffers the positive relationship between problem-solving demands and fatigue. Using a daily diary study design, data were collected from 64 administrative employees who completed a general questionnaire and two daily online questionnaires over four work days. Multilevel analyses showed that problem-solving demands were positively related to fatigue, but unrelated to job satisfaction. SOC strategy use was positively related to job satisfaction, but unrelated to fatigue. A buffering effect of high SOC strategy use on the demands-fatigue relationship was found, but no booster effect on the demands-satisfaction relationship. The results suggest that high SOC strategy use is a resource that protects employees from the negative effects of high problem-solving demands.
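A minimal sketch of the kind of multilevel test used for the buffering hypothesis, assuming statsmodels and invented daily diary data (variable names are hypothetical, not the authors'):

```python
# Random-intercept multilevel model: the demands x SOC interaction term
# carries the buffering hypothesis (a negative coefficient would indicate
# that high SOC use weakens the demands-fatigue relationship).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_emp, n_days = 64, 4
df = pd.DataFrame({
    "employee": np.repeat(np.arange(n_emp), n_days),
    "demands": rng.normal(size=n_emp * n_days),  # daily problem-solving demands
    "soc": rng.normal(size=n_emp * n_days),      # daily SOC strategy use
})
# Simulated fatigue with a built-in buffering (negative) interaction.
df["fatigue"] = (0.4 * df.demands - 0.25 * df.demands * df.soc
                 + rng.normal(scale=0.8, size=len(df)))

model = smf.mixedlm("fatigue ~ demands * soc", df, groups=df["employee"])
print(model.fit().summary())
```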
Abstract:
This study examines the application of digital ecosystems concepts to a biological ecosystem simulation problem. The problem involves the use of a digital ecosystem agent to optimize the accuracy of a second digital ecosystem agent, the biological ecosystem simulation. The study also incorporates social ecosystems, with a technological solution design subsystem communicating with a science subsystem and simulation software developer subsystem to determine key characteristics of the biological ecosystem simulation. The findings show similarities between the issues involved in digital ecosystem collaboration and those occurring when digital ecosystems interact with biological ecosystems. The results also suggest that even precise semantic descriptions and comprehensive ontologies may be insufficient to describe agents in enough detail for use within digital ecosystems, and a number of solutions to this problem are proposed.
Abstract:
Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two apply hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly to justify the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that the same reasoning also applies under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied to a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is also feasible in this case.
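The justification mentioned above, that maximum likelihood arises as a Gaussian approximation at the posterior mode under a flat prior, can be written out as the standard Laplace approximation (a textbook identity, not a result specific to this thesis):

```latex
% With a flat prior p(\theta) \propto 1, the posterior is proportional to
% the likelihood and the posterior mode coincides with the MLE \hat{\theta}:
\begin{align*}
p(\theta \mid y) &\propto p(y \mid \theta)\, p(\theta) \;\propto\; L(\theta; y),\\
p(\theta \mid y) &\approx \mathcal{N}\!\left(\hat{\theta},\; J(\hat{\theta})^{-1}\right),
\qquad
J(\hat{\theta}) = -\left.\frac{\partial^{2} \log L(\theta; y)}
{\partial \theta\, \partial \theta^{\top}}\right|_{\theta = \hat{\theta}},
\end{align*}
% so maximum likelihood point estimates with observed-information standard
% errors fall out as a special case of Bayesian inference.
```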
Abstract:
The built environment is a major contributor to the world’s carbon dioxide emissions, with a considerable amount of energy being consumed in buildings due to heating, ventilation and air-conditioning, space illumination, use of electrical appliances, etc., to facilitate various anthropogenic activities. The development of sustainable buildings seeks to ameliorate this situation mainly by reducing energy consumption. Sustainable building design, however, is a complicated process involving a large number of design variables, each with a range of feasible values. There are also multiple, often conflicting, objectives involved, such as life cycle costs and occupant satisfaction. One approach to dealing with this is through the use of optimization models. In this paper, a new multi-objective optimization model is developed for sustainable building design by considering the design objectives of cost and energy consumption minimization and occupant comfort level maximization. In a case study demonstration, it is shown that the model can derive a set of suitable design solutions in terms of life cycle cost, energy consumption and indoor environmental quality, so as to help the client and design team gain a better understanding of the design space and trade-off patterns between different design objectives. The model can be very useful in the conceptual design stages to determine appropriate operational settings to achieve optimal building performance in terms of minimizing energy consumption and maximizing occupant comfort level.
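The trade-off analysis such a model supports rests on Pareto dominance over the three objectives. A minimal dominance filter, with invented candidate designs, assuming cost and energy are minimized and comfort is maximized:

```python
# Filter candidate building designs to the Pareto (non-dominated) set over
# (life cycle cost, energy consumption, occupant comfort).

def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one. Objectives here are all-minimized tuples."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    # Negate comfort so every objective is minimized.
    objs = {name: (cost, energy, -comfort)
            for name, (cost, energy, comfort) in designs.items()}
    return [name for name, a in objs.items()
            if not any(dominates(b, a)
                       for other, b in objs.items() if other != name)]

designs = {  # name: (life cycle cost, annual energy in kWh, comfort score)
    "A": (1.00e6, 120_000, 0.74),
    "B": (1.15e6, 100_000, 0.81),
    "C": (1.10e6, 125_000, 0.70),  # worse than A in all three objectives
    "D": (1.30e6,  90_000, 0.85),
}
print("Pareto-optimal designs:", pareto_front(designs))  # A, B, D
```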
Abstract:
Early detection of (pre-)signs of ulceration on a diabetic foot is valuable for clinical practice. Hyperspectral imaging is a promising technique for the detection and classification of such (pre-)signs. However, the number of spectral bands should be limited to avoid overfitting, which is critical for pixel classification with hyperspectral image data. The goal was to design a detector/classifier based on spectral imaging (SI) with a small number of optical bandpass filters. The performance and stability of the design were also investigated. The selection of the bandpass filters boils down to a feature selection problem. A dataset was built, containing reflectance spectra of 227 skin spots from 64 patients, measured with a spectrometer. Each skin spot was annotated manually by clinicians as "healthy" or as a specific (pre-)sign of ulceration. Statistical analysis of the dataset showed that the number of required filters is between 3 and 7, depending on additional constraints on the filter set. The stability analysis revealed that shot noise was the most critical factor affecting the classification performance. It indicated that this impact could be avoided in future SI systems with a camera sensor whose saturation level is higher than 10⁶, or by post-image processing.
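Treating filter choice as feature selection can be sketched with a greedy wrapper such as scikit-learn's SequentialFeatureSelector (an assumed stand-in, not the authors' method), run on synthetic spectra of the same shape as the dataset described:

```python
# Greedy forward selection of a handful of spectral bands that best separate
# "healthy" from "(pre-)sign" spectra. Data are synthetic placeholders.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_spots, n_bands = 227, 50          # 227 skin spots; band count is made up
X = rng.normal(size=(n_spots, n_bands))
informative = [5, 17, 33]           # pretend these bands carry the signal
y = (X[:, informative].sum(axis=1)
     + 0.5 * rng.normal(size=n_spots) > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
# 5 bands sits inside the 3-7 range reported in the abstract.
sfs = SequentialFeatureSelector(clf, n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected bands:", np.flatnonzero(sfs.get_support()))
```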
Abstract:
Salinity gradient power is proposed as a source of renewable energy obtained when two solutions of different salinity are mixed. In particular, Pressure Retarded Osmosis (PRO) coupled with a Reverse Osmosis (RO) process has previously been suggested for power generation, using RO brine as the draw solution. However, integration of PRO with RO may have further value for increasing the extent of water recovery in a desalination process. Consequently, this study was designed to model the impact of various system parameters to better understand how to design and operate practical PRO-RO units. The impact of feed salinity and recovery rate for the RO process on the concentration of draw solution, feed pressure, and membrane area of the PRO process was evaluated. The PRO system was designed to operate at its maximum power density. Model results showed that the PRO power density generated intensified with increasing seawater salinity and RO recovery rate. For an RO process operating at 52% recovery rate and 35 g/L feed salinity, a maximum power density of 24 W/m² was achieved using a 4.5 M NaCl draw solution. When seawater salinity increased to 45 g/L and the RO recovery rate was 46%, the PRO power density increased to 28 W/m² using a 5 M NaCl draw solution. The PRO system was able to increase the recovery rate of the RO by up to 18%, depending on seawater salinity and RO recovery rate. This result suggests a potential advantage of coupling the PRO process with the RO system to increase the recovery rate of the desalination process and reduce brine discharge.
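The "maximum power density" operating point can be made explicit with the ideal PRO relations that are standard in this literature (neglecting concentration polarization; symbols are defined in the comments, not taken from this paper):

```latex
% Ideal PRO water flux and power density, and the operating point that
% maximizes power density:
\begin{align*}
J_w &= A\,(\Delta\pi - \Delta P), &
W &= J_w\,\Delta P = A\,(\Delta\pi - \Delta P)\,\Delta P,\\
\frac{\mathrm{d}W}{\mathrm{d}\Delta P} &= 0 \;\Rightarrow\; \Delta P = \frac{\Delta\pi}{2}, &
W_{\max} &= \frac{A\,\Delta\pi^{2}}{4},
\end{align*}
% where A is the membrane water permeability, \Delta\pi the osmotic pressure
% difference between draw and feed, and \Delta P the applied hydraulic
% pressure difference.
```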
Abstract:
In this paper, we consider non-linear transceiver designs for the multiuser multiple-input multiple-output (MIMO) downlink in the presence of imperfections in the channel state information at the transmitter (CSIT). The base station (BS) is equipped with multiple transmit antennas, and each user terminal is equipped with multiple receive antennas. The BS employs Tomlinson-Harashima precoding (THP) for inter-user interference pre-cancellation at the transmitter. We investigate robust THP transceiver designs based on the minimization of BS transmit power under mean square error (MSE) constraints, and on the balancing of MSE among users under a constraint on the total BS transmit power. We show that these design problems can be solved by iterative algorithms, wherein each iteration involves a pair of convex optimization problems. The robustness of the proposed algorithms to imperfections in CSIT is illustrated through simulations.
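The alternating structure described (each iteration solving convex subproblems) can be sketched for a much simpler stand-in problem: a single-user linear precoder with perfect CSIT, alternating a convex power-minimization step with a closed-form MMSE receive filter. This is an illustrative analogue only; it is not THP, not robust to CSIT errors, and all dimensions and values are made up:

```python
# Alternating optimization: (1) fixed receive filter G -> convex problem in
# the precoder F (minimize transmit power under an MSE constraint);
# (2) fixed F -> closed-form MMSE receive filter.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
nt, nr, ns = 4, 4, 2           # tx antennas, rx antennas, data streams
H = rng.normal(size=(nr, nt))  # one channel realization (perfect CSIT here)
sigma2, mse_target = 0.1, 0.05
I = np.eye(ns)

G = rng.normal(size=(ns, nr)) * 0.1   # initial receive filter
for _ in range(20):
    # Step 1: convex power minimization in F for fixed G.
    F = cp.Variable((nt, ns))
    mse = cp.sum_squares(G @ H @ F - I) + sigma2 * np.sum(G**2)
    cp.Problem(cp.Minimize(cp.sum_squares(F)), [mse <= mse_target]).solve()
    Fv = F.value
    # Step 2: MMSE receive filter for fixed F (closed form).
    HF = H @ Fv
    G = Fv.T @ H.T @ np.linalg.inv(HF @ HF.T + sigma2 * np.eye(nr))

print(f"transmit power after alternation: {np.sum(Fv**2):.3f}")
```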
Abstract:
A considerable amount of work has been dedicated to the development of analytical solutions for the flow of chemical contaminants through soils. Most of the analytical solutions for complex transport problems are closed-form series solutions, whose convergence depends on the eigenvalues obtained from a corresponding transcendental equation. The difficulty in obtaining exact solutions from analytical models thus encourages the use of numerical solutions for parameter estimation, even though the latter models are computationally expensive. In this paper, a combination of two swarm-intelligence-based algorithms is used for accurate estimation of design transport parameters from the closed-form analytical solutions. Estimation of eigenvalues from a transcendental equation is treated as a multimodal discontinuous function optimization problem, and the eigenvalues are estimated using an algorithm based on the glowworm swarm strategy. Parameter estimation in the inverse problem is handled using a standard PSO algorithm. Integration of these two algorithms enables accurate estimation of design parameters using closed-form analytical solutions. The present solver is applied to a real-world inverse problem in environmental engineering. The inverse model based on swarm intelligence techniques is validated and its accuracy in parameter estimation is shown. The proposed solver quickly estimates the design parameters with great precision.
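The abstract names a standard PSO for the inverse step; a minimal global-best PSO of that kind is sketched below on a toy forward model. The model function, bounds, and data are invented placeholders for the paper's closed-form series solution:

```python
# Minimal global-best particle swarm optimizer fitting two transport-like
# parameters (v, D) by least squares against noisy observations.
import numpy as np

rng = np.random.default_rng(0)

def model(params, t):
    """Toy stand-in for the analytical transport solution."""
    v, D = params
    return np.exp(-((1.0 - v * t) ** 2) / (4.0 * D * t + 1e-12))

t_obs = np.linspace(0.1, 2.0, 20)
true = np.array([0.8, 0.05])
c_obs = model(true, t_obs) + rng.normal(scale=0.01, size=t_obs.size)

def cost(p):
    return np.sum((model(p, t_obs) - c_obs) ** 2)

n_part, n_iter = 30, 200
lo, hi = np.array([0.0, 1e-4]), np.array([2.0, 0.5])  # search bounds
x = rng.uniform(lo, hi, size=(n_part, 2))
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), np.array([cost(p) for p in x])
gbest = pbest[pbest_f.argmin()].copy()
for _ in range(n_iter):
    r1, r2 = rng.uniform(size=(2, n_part, 1))
    # Inertia + cognitive + social terms (standard PSO update).
    v = 0.72 * v + 1.49 * r1 * (pbest - x) + 1.49 * r2 * (gbest - x)
    x = np.clip(x + v, lo, hi)
    f = np.array([cost(p) for p in x])
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

print("estimated (v, D):", np.round(gbest, 3), " true:", true)
```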
Abstract:
Many biological environments are crowded by macromolecules, organelles and cells which can impede the transport of other cells and molecules. Previous studies have sought to describe these effects using either random walk models or fractional order diffusion equations. Here we examine the transport of both a single agent and a population of agents through an environment containing obstacles of varying size and shape, whose relative densities are drawn from a specified distribution. Our simulation results for a single agent indicate that smaller obstacles are more effective at retarding transport than larger obstacles; these findings are consistent with our simulations of the collective motion of populations of agents. In an attempt to explore whether these kinds of stochastic random walk simulations can be described using a fractional order diffusion equation framework, we calibrate the solution of such a differential equation to our averaged agent density information. Our approach suggests that these kinds of commonly used differential equation models ought to be used with care since we are unable to match the solution of a fractional order diffusion equation to our data in a consistent fashion over a finite time period.
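A bare-bones version of this kind of simulation, assuming a 2D lattice walk with immobile obstacle sites and aborted blocked moves (all densities and sizes are made up), looks like the following; subdiffusion shows up as sublinear growth of the mean squared displacement:

```python
# Unbiased random walks on a 2D lattice crowded by obstacles; track mean
# squared displacement (MSD) and fit a crude anomalous exponent alpha,
# where MSD ~ t^alpha and alpha < 1 indicates subdiffusion.
import numpy as np

rng = np.random.default_rng(0)
L, density, n_walkers, n_steps = 200, 0.3, 500, 1000
obstacles = rng.random((L, L)) < density   # blocked lattice sites

steps = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
pos = np.full((n_walkers, 2), L // 2)      # all walkers start at the centre
obstacles[L // 2, L // 2] = False          # keep the start site free
start = pos.copy()

msd = np.zeros(n_steps)
for t in range(n_steps):
    trial = (pos + steps[rng.integers(4, size=n_walkers)]) % L
    free = ~obstacles[trial[:, 0], trial[:, 1]]
    pos[free] = trial[free]                # moves into obstacles are aborted
    d = pos - start                        # lattice large enough to avoid wrap
    msd[t] = np.mean(np.sum(d * d, axis=1))

t = np.arange(1, n_steps + 1)
alpha = np.polyfit(np.log(t[100:]), np.log(msd[100:] + 1e-12), 1)[0]
print(f"fitted anomalous exponent alpha = {alpha:.2f}")
```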
Abstract:
In this paper, we exploit the idea of decomposition to match buyers and sellers in an electronic exchange for trading large volumes of homogeneous goods, where the buyers and sellers specify marginal-decreasing piecewise constant price curves to capture volume discounts. Such exchanges are relevant for automated trading in many e-business applications. The problem of determining winners and Vickrey prices in such exchanges is known to have a worst-case complexity equal to that of as many as (1 + m + n) NP-hard problems, where m is the number of buyers and n is the number of sellers. Our method decomposes the overall exchange problem into two separate and simpler problems: 1) a forward auction and 2) a reverse auction, which turn out to be generalized knapsack problems. In the proposed approach, we first determine the quantity of units to be traded between the sellers and the buyers using fast heuristics developed by us. Next, we solve the forward auction and the reverse auction using fully polynomial time approximation schemes available in the literature. The proposed approach has worst-case polynomial time complexity, and our experimentation shows that it produces good quality solutions to the problem.
Note to Practitioners: In recent times, electronic marketplaces have provided an efficient way for businesses and consumers to trade goods and services. The use of innovative mechanisms and algorithms has made it possible to improve the efficiency of electronic marketplaces by enabling optimization of revenues for the marketplace and of utilities for the buyers and sellers. In this paper, we look at single-item, multiunit electronic exchanges. These are electronic marketplaces where buyers submit bids, and sellers submit asks, for multiple units of a single item. We allow buyers and sellers to specify volume discounts using suitable functions. Such exchanges are relevant for high-volume business-to-business trading of standard products, such as silicon wafers, very large-scale integrated chips, desktops, telecommunications equipment, commoditized goods, etc. The problem of determining winners and prices in such exchanges is known to involve solving many NP-hard problems. Our paper exploits the familiar idea of decomposition, uses certain algorithms from the literature, and develops two fast heuristics to solve the problem in a near-optimal way in worst-case polynomial time.
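The first, heuristic stage (determining the traded quantity) can be illustrated with a toy greedy matcher that pairs the highest remaining bid step with the lowest remaining ask step while the bid price exceeds the ask price. The step curves below are invented, and a real exchange would use the paper's knapsack-based machinery for winner determination and pricing:

```python
# Greedy quantity matching between piecewise-constant bid and ask curves.
import heapq

# (price per unit, units) steps; buyers' marginal prices decrease with
# volume, sellers' increase. All numbers are made up.
buyer_steps = [(10.0, 50), (9.0, 80), (7.5, 100)]
seller_steps = [(6.0, 60), (8.0, 90), (9.5, 120)]

bids = [(-p, q) for p, q in buyer_steps]  # max-heap via negated price
asks = list(seller_steps)                 # min-heap on price
heapq.heapify(bids)
heapq.heapify(asks)

traded = 0
while bids and asks:
    bp, bq = bids[0]
    ap, aq = asks[0]
    if -bp < ap:            # highest bid below lowest ask: no more surplus
        break
    q = min(bq, aq)         # trade the overlapping quantity
    traded += q
    heapq.heappop(bids)
    heapq.heappop(asks)
    if bq > q:
        heapq.heappush(bids, (bp, bq - q))  # return the unmatched remainder
    if aq > q:
        heapq.heappush(asks, (ap, aq - q))

print("total units traded:", traded)  # 130 for the curves above
```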