901 results for probabilistic roadmap
Abstract:
Ship seakeeping operability refers to the quantification of motion performance in waves relative to mission requirements. This is used to make decisions about preferred vessel designs, but it can also be used as a comprehensive assessment of the benefits of ship-motion-control systems. Traditionally, operability computation aggregates statistics of motion computed over the envelope of likely environmental conditions in order to determine a coefficient in the range from 0 to 1 called operability. When used for the assessment of motion-control systems, the increase in operability is taken as the key performance indicator. The operability coefficient is often given the interpretation of the percentage of time operable. This paper considers an alternative probabilistic approach to this traditional computation of operability. It characterises operability not as a number to which a frequency interpretation is attached, but as a hypothesis that a vessel will attain the desired performance in one mission considering the envelope of likely operational conditions. This enables the use of Bayesian theory to compute the probability that this hypothesis is true conditional on data from simulations. Thus, the metric considered is the probability of operability. This formulation not only adheres to recent developments in reliability and risk analysis, but also allows more accurate descriptions of ship-motion-control systems to be incorporated into the analysis, since the analysis is not limited to linear ship responses in the frequency domain. The paper also discusses an extension of the approach to the assessment of increased levels of autonomy for unmanned marine craft.
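If each simulated mission over the condition envelope is scored as pass/fail against the motion criteria, the Bayesian step the abstract describes reduces, in its simplest conjugate form, to a Beta-Binomial update. A minimal sketch follows; the prior and the simulation counts are illustrative assumptions, not values or the exact formulation from the paper.

```python
from scipy import stats

# Hypothetical simulation outcomes: each Monte Carlo mission over the
# envelope of likely conditions is scored as meeting the motion criteria
# (success) or not (failure). Counts below are illustrative only.
n_missions = 200
n_success = 173

# Conjugate Beta-Binomial update: Beta(a0, b0) prior on the probability
# that the "vessel is operable in one mission" hypothesis holds.
a0, b0 = 1.0, 1.0                      # uniform prior (an assumption)
posterior = stats.beta(a0 + n_success, b0 + (n_missions - n_success))

print(f"posterior mean P(operable) = {posterior.mean():.3f}")
lo, hi = posterior.interval(0.95)
print(f"95% credible interval      = ({lo:.3f}, {hi:.3f})")
```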
Abstract:
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem, which is then posed as a LASSO (l1-constrained fitting) problem and finally solved by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs is assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence for integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
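The paper solves the per-gene LASSO via a linear program; as a rough illustration of the same l1-constrained regression idea, the sketch below uses scikit-learn's coordinate-descent Lasso for per-gene neighborhood selection and symmetrizes the result into an undirected graph. The data, regularization strength, and network size are toy assumptions, not the paper's LP formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))     # 60 profiles x 500 genes (toy data)

p = X.shape[1]
adj = np.zeros((p, p), dtype=bool)

# Regress each gene on all others with an l1 penalty; nonzero
# coefficients propose undirected edges (neighborhood selection).
for j in range(p):
    others = np.delete(np.arange(p), j)
    beta = Lasso(alpha=0.1).fit(X[:, others], X[:, j]).coef_
    adj[j, others] = beta != 0

adj |= adj.T                           # symmetrize: undirected SLGN edges
print("edges:", adj.sum() // 2)
```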
Abstract:
The development of techniques for scaling up classifiers so that they can be applied to problems with large datasets of training examples is one of the objectives of data mining. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options for reducing dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection subject to a probabilistic length-preserving transformation is explored further as a computationally light preprocessing step. The experimental results obtained demonstrate the effectiveness of the proposed training process for handling high-dimensional large datasets.
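A minimal sketch of the preprocessing pipeline the abstract describes: a Gaussian random projection (approximately length-preserving with high probability, in the Johnson-Lindenstrauss sense) followed by AdaBoost. The dataset, target dimension, and ensemble size are illustrative assumptions, not the paper's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.random_projection import GaussianRandomProjection

# Toy high-dimensional data standing in for a large real dataset.
X, y = make_classification(n_samples=5000, n_features=2000,
                           n_informative=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Random projection: approximately length-preserving with high
# probability, and far cheaper than PCA on very wide data.
proj = GaussianRandomProjection(n_components=200, random_state=0)
X_tr_p = proj.fit_transform(X_tr)
X_te_p = proj.transform(X_te)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr_p, y_tr)
print("test accuracy:", clf.score(X_te_p, y_te))
```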
Abstract:
This study investigates the potential of a Relevance Vector Machine (RVM)-based approach to predict the ultimate capacity of laterally loaded piles in clay. The RVM is a sparse approximate Bayesian kernel method. It can be seen as a probabilistic version of the support vector machine. It provides much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. The RVM model outperforms the two other models based on the root-mean-square-error (RMSE) and mean-absolute-error (MAE) performance criteria. It also estimates the prediction variance. The results presented in this paper clearly highlight that the RVM is a robust tool for prediction of the ultimate capacity of laterally loaded piles in clay.
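The full RVM learns one precision hyperparameter per kernel basis and prunes most of them; as a simplified stand-in, the sketch below runs Bayesian linear regression over an RBF kernel basis with fixed hyperparameters, which is enough to show where the predictive mean and variance come from. All data and hyperparameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))            # toy inputs
y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(40)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Bayesian linear regression over kernel basis functions, with fixed
# weight precision `alpha` and noise precision `beta` (a true RVM would
# instead learn one alpha per basis and prune most of them away).
alpha, beta = 1e-2, 100.0
Phi = rbf(X, X)
S = np.linalg.inv(alpha * np.eye(len(X)) + beta * Phi.T @ Phi)
mu = beta * S @ Phi.T @ y                       # posterior weight mean

x_new = np.array([[0.5]])
phi = rbf(x_new, X)
pred_mean = (phi @ mu).item()
pred_var = 1.0 / beta + (phi @ S @ phi.T).item()  # predictive variance
print(f"prediction: {pred_mean:.3f} +/- {pred_var ** 0.5:.3f}")
```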
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As a practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation of previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, ones concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation for the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the studied THINK lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. In terms of the overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context by applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context, represented as a feature cluster, that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation in their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
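As a rough sketch of the central multivariate method, the snippet below fits a multinomial (polytomous) logistic regression over invented binary context features and the four THINK lexemes; the feature encoding is a placeholder for the dissertation's actual contextual categories, not its real corpus data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical encoded context features for each corpus sentence
# (e.g. one-hot indicators for argument types or verb-chain morphology).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 30)).astype(float)
lexemes = np.array(["ajatella", "miettiä", "pohtia", "harkita"])
y = lexemes[rng.integers(0, 4, size=1000)]

# Multinomial (polytomous) logistic regression: one coefficient vector
# per outcome lexeme, with probabilities summing to 1 over the four verbs.
model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X[:1])
for lex, p in zip(model.classes_, proba[0]):
    print(f"P({lex} | context) = {p:.3f}")
```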
Abstract:
Introduction: There is limited understanding of how young adults’ driving behaviour varies according to long-term substance involvement. It is possible that regular users of amphetamine-type stimulants (i.e. ecstasy (MDMA) and methamphetamine) may have a greater predisposition to engage in drink/drug driving compared to non-users. We compare offence rates, and self-reported drink/drug driving rates, for stimulant users and non-users in Queensland, and examine contributing factors.
Methods: The Natural History Study of Drug Use is a prospective longitudinal study using population screening to recruit a probabilistic sample of amphetamine-type stimulant (ATS) users and non-users aged 19-23 years. At the 4½-year follow-up, consent was obtained to extract data from participants’ Queensland driver records (ATS users: n=217, non-users: n=135). Prediction models of offence rates in stimulant users were developed, controlling for factors such as aggression and delinquency.
Results: Stimulant users were more likely than non-users to have had a drink-driving offence (8.7% vs. 0.8%, p < 0.001). Further, about 26% of ATS users and 14% of non-users self-reported driving under the influence of alcohol during the last 12 months. Among stimulant users, drink-driving was independently associated with last-month high-volume alcohol consumption (Incident Rate Ratio (IRR): 5.70, 95% CI: 2.24-14.52), depression (IRR: 1.28, 95% CI: 1.07-1.52), low income (IRR: 3.57, 95% CI: 1.12-11.38), and male gender (IRR: 5.40, 95% CI: 2.05-14.21).
Conclusions: Amphetamine-type stimulant use is associated with an increased long-term risk of drink-driving, due to a number of behavioural and social factors. Inter-sectoral approaches which target long-term behaviours may reduce offending rates.
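The incident rate ratios reported above suggest a Poisson-type count model; a hedged sketch of such a model with statsmodels is shown below, on invented data and covariates, purely to illustrate how IRRs and their confidence intervals fall out as exponentiated coefficients. It is not the study's actual model or data.

```python
import numpy as np
import statsmodels.api as sm

# Invented covariates standing in for the study's predictors.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(0, 2, n),   # high-volume alcohol use (yes/no)
    rng.normal(0, 1, n),     # depression score (standardized)
    rng.integers(0, 2, n),   # male gender
])
offences = rng.poisson(np.exp(-1.0 + 1.2 * X[:, 0] + 0.2 * X[:, 1] + 0.8 * X[:, 2]))

# Poisson regression; exponentiated coefficients are incident rate ratios.
model = sm.GLM(offences, sm.add_constant(X),
               family=sm.families.Poisson()).fit()
irr = np.exp(model.params[1:])          # IRR per covariate
ci = np.exp(model.conf_int()[1:])       # 95% CIs on the IRR scale
print(np.column_stack([irr, ci]))       # rows: alcohol, depression, gender
```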
Abstract:
Consumer risk assessment is a crucial step in the regulatory approval of pesticide use on food crops. Recently, an additional hurdle has been added to the formal consumer risk assessment process with the introduction of short-term intake or exposure assessment and a comparable short-term toxicity reference, the acute reference dose. Exposure to residues during one meal or over one day is important for short-term or acute intake. Exposure in the short term can be substantially higher than average because the consumption of a food on a single occasion can be very large compared with typical long-term or mean consumption, and the food may have a much larger residue than average. Furthermore, the residue level in a single unit of a fruit or vegetable may be higher by a factor (defined as the variability factor, which we have shown to be typically ×3 for the 97.5th percentile unit) than the average residue in the lot. Available marketplace data and supervised residue trial data are examined in an investigation of the variability of residues in units of fruit and vegetables. A method is described for estimating the 97.5th percentile value from sets of unit residue data. Variability appears to be generally independent of the pesticide, the crop, crop unit size and the residue level. The deposition of pesticide on the individual unit during application is probably the most significant factor. The diets used in the calculations ideally come from individual and household surveys with enough consumers of each specific food to determine large portion sizes. The diets should distinguish the different forms of a food consumed, e.g. canned, frozen or fresh, because the residue levels associated with the different forms may be quite different. Dietary intakes may be calculated by a deterministic method or a probabilistic method. In the deterministic method the intake is estimated with the assumptions of large portion consumption of a ‘high residue’ food (high residue in the sense that the pesticide was used at the highest recommended label rate, the crop was harvested at the smallest interval after treatment and the residue in the edible portion was the highest found in any of the supervised trials in line with these use conditions). The deterministic calculation also includes a variability factor for those foods consumed as units (e.g. apples, carrots) to allow for the elevated residue in some single units, which may not be seen in composited samples. In the probabilistic method the distribution of dietary consumption and the distribution of possible residues are combined in repeated probabilistic calculations to yield a distribution of possible residue intakes. Additional information, such as the percentage of the commodity treated and the combination of residues from multiple commodities, may be incorporated into probabilistic calculations. The IUPAC Advisory Committee on Crop Protection Chemistry has made 11 recommendations relating to acute dietary exposure.
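A worked sketch of the deterministic calculation described above: large portion consumption of the highest-residue food, scaled by the unit variability factor and compared against the acute reference dose. All numbers are invented for illustration; they are not values from the paper.

```python
def acute_intake_mg_per_kg(large_portion_g, highest_residue_mg_per_kg,
                           variability_factor, body_weight_kg):
    """Deterministic short-term intake for a unit crop (e.g. apples):
    a large portion eaten on one day, at the highest supervised-trial
    residue, scaled by the unit-to-unit variability factor."""
    residue_mg = (large_portion_g / 1000.0) * highest_residue_mg_per_kg
    return residue_mg * variability_factor / body_weight_kg

# Illustrative numbers only (not from the paper):
intake = acute_intake_mg_per_kg(large_portion_g=300,         # large portion
                                highest_residue_mg_per_kg=1.5,
                                variability_factor=3,         # ~97.5th pct unit
                                body_weight_kg=60)
acute_rfd = 0.05                        # hypothetical acute reference dose
print(f"intake = {intake:.4f} mg/kg bw ({100 * intake / acute_rfd:.0f}% of ARfD)")
```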
Abstract:
The Gascoyne-Murchison region of Western Australia experiences an arid to semi-arid climate with a highly variable temporal and spatial rainfall distribution. The region has around 39.2 million hectares available for pastoral lease and supports predominantly cattle and sheep grazing leases. In recent years a number of climate forecasting systems have been available offering rainfall probabilities with different lead times and a forecast period; however, the extent to which these systems are capable of fulfilling the requirements of the local pastoralists is still ambiguous. Issues range from ensuring forecasts are issued with sufficient lead time to enable key planning or decisions to be revoked or altered, to ensuring forecast language is simple and clear, to negate possible misunderstandings in interpretation. A climate research project sought to provide an objective method to determine which available forecasting systems had the greatest forecasting skill at times of the year relevant to local property management. To aid this climate research project, the study reported here was undertaken with the overall objective of exploring local pastoralists' climate information needs. We also explored how well they understand common climate forecast terms such as 'mean', 'median' and 'probability', and how they interpret and apply forecast information to decisions. Stratified, proportional random sampling was used to derive a representative sample based on rainfall-enterprise combinations. In order to provide more time for decision-making than existing operational forecasts that are issued with zero lead time, pastoralists requested that forecasts be issued for May-July and January-March with lead times counting down from 4 to 0 months. We found forecasts of between 20 and 50 mm break-of-season or follow-up rainfall were likely to influence decisions. Eighty percent of pastoralists demonstrated in a test question that they had a poor technical understanding of how to interpret the standard wording of a probabilistic median rainfall forecast. This is worthy of further research to investigate whether inappropriate management decisions are being made because the forecasts are being misunderstood. We found more than half the respondents regularly access and use weather and climate forecasts or outlook information from a range of sources, and almost three-quarters considered climate information or tools useful, with preferred methods for accessing this information being email, faxback service, the internet and the Department of Agriculture Western Australia's Pastoral Memo. Despite differences in enterprise types and rainfall seasonality across the region, we found seasonal climate forecasting needs were relatively consistent. It became clear that providing basic training and working with pastoralists to help them understand regional climatic drivers, climate terminology and jargon, and the best ways to apply the forecasts to enhance decision-making are important to improve their use of information. Consideration could also be given to engaging a range of producers to write the climate forecasts themselves in the language they use and understand, in consultation with the scientists who prepare the forecasts.
Abstract:
This paper is concerned with the calculation of the flame structure of one-dimensional laminar premixed flames using the technique of operator splitting. The technique utilizes an explicit method of solution with one-step Euler for chemistry and a novel probabilistic scheme for diffusion. The relationship between the diffusion phenomenon and the Gauss-Markoff process is exploited to obtain an unconditionally stable explicit difference scheme for diffusion. The method has been applied to (a) a model problem, (b) hydrazine decomposition, (c) a hydrogen-oxygen system with 28 reactions with the constant Dρ² approximation, and (d) a hydrogen-oxygen system (28 reactions) with the trace diffusion approximation. Certain interesting aspects of the behaviour of the solution with non-unity Lewis number are brought out in the case of the hydrazine flame. The results of computation in the most complex case are shown to compare very favourably with those of Warnatz, both in terms of accuracy as well as computational time, thus showing that explicit methods can be effective in flame computations. Computations using the Gear-Hindmarsh method for chemistry and the present approach for diffusion have also been carried out, and a comparison of the two methods is presented.
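A hedged toy sketch of the splitting idea on a scalar reaction-diffusion problem: an explicit Euler step for a model source term, then a diffusion step performed by convolution with the Gaussian kernel, the exact heat-equation propagator, which is stable for any time step. The grid, coefficients, and model reaction term are illustrative assumptions, not the paper's flame chemistry.

```python
import numpy as np

# Toy 1-D operator splitting: explicit Euler for a model "chemistry"
# source term, then diffusion advanced by convolving with the Gaussian
# kernel exp(-r^2 / (4 D dt)), the exact heat-equation propagator, so
# the diffusion step is stable for any dt (the Gauss-Markoff connection).
L, nx, D, dt = 1.0, 200, 1e-3, 5e-3
x = np.linspace(0.0, L, nx)
dx = x[1] - x[0]
u = np.exp(-((x - 0.5) ** 2) / 0.005)        # initial profile

half = 10                                    # kernel half-width in cells
r = np.arange(-half, half + 1) * dx
kernel = np.exp(-r**2 / (4.0 * D * dt))
kernel /= kernel.sum()                       # discrete normalization

for _ in range(100):
    u = u + dt * 50.0 * u * (1.0 - u)        # model reaction step (Euler)
    u = np.convolve(u, kernel, mode="same")  # probabilistic diffusion step

print("peak value after 100 steps:", float(u.max()))
```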
Abstract:
Climate variability and change are risk factors for climate-sensitive activities such as agriculture. Managing these risks requires "climate knowledge", i.e. a sound understanding of the causes and consequences of climate variability and knowledge of potential management options that are suitable in light of the climatic risks posed. Often such information about prognostic variables (e.g. yield, rainfall, run-off) is provided in probabilistic terms (e.g. via cumulative distribution functions, CDFs), whereby the quantitative assessment of these alternative management options is based on such CDFs. Sound statistical approaches are needed in order to assess whether differences between such CDFs are intrinsic features of system dynamics or chance events (i.e. quantifying evidence against an appropriate null hypothesis). Statistical procedures that rely on such a hypothesis-testing framework are referred to as "inferential statistics", in contrast to descriptive statistics (e.g. mean, median, variance of population samples, skill scores). Here we report on the extension of some of the existing inferential techniques to provide more relevant and adequate information for decision making under uncertainty.
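As one standard instance of the inferential framing described above, the sketch below applies the two-sample Kolmogorov-Smirnov test to two simulated yield distributions; the distributions are invented, and the paper's extended techniques may well differ from this off-the-shelf test.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated yield distributions under two management options.
rng = np.random.default_rng(0)
yield_a = rng.gamma(shape=4.0, scale=0.9, size=500)   # option A (t/ha)
yield_b = rng.gamma(shape=4.0, scale=1.0, size=500)   # option B (t/ha)

# Two-sample Kolmogorov-Smirnov test: is the difference between the two
# empirical CDFs larger than chance variation alone would produce?
res = stats.ks_2samp(yield_a, yield_b)
print(f"KS statistic = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```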
Abstract:
The problem of learning correct decision rules to minimize the probability of misclassification is a long-standing problem of supervised learning in pattern recognition. The problem of learning such optimal discriminant functions is considered for the class of problems where the statistical properties of the pattern classes are completely unknown. The problem is posed as a game with common payoff played by a team of mutually cooperating learning automata. This essentially results in a probabilistic search through the space of classifiers. The approach is inherently capable of learning discriminant functions that are also nonlinear in their parameters. A learning algorithm is presented for the team and its convergence is established. It is proved that the team can obtain the optimal classifier to an arbitrary degree of approximation. Simulation results with a few examples are presented in which the team learns the optimal classifier.
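A minimal sketch of the common-payoff game on toy 2-D data, assuming a linear discriminant with discretized parameters: each automaton picks one parameter value, the environment pays off when a randomly drawn sample is classified correctly, and all automata update with a linear reward-inaction (L_R-I) scheme. This is an illustrative simplification, not the paper's exact algorithm or convergence setup.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class data whose statistics are unknown to the team.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(+1, 1, (200, 2))])
y = np.array([-1] * 200 + [+1] * 200)

actions = np.linspace(-2, 2, 9)        # discretized parameter values
K = 3                                  # parameters of the discriminant: w1, w2, b
P = np.full((K, len(actions)), 1.0 / len(actions))  # action probabilities
lam = 0.02                             # L_R-I learning rate

for _ in range(20000):
    choice = [rng.choice(len(actions), p=P[k]) for k in range(K)]
    w1, w2, b = actions[choice[0]], actions[choice[1]], actions[choice[2]]
    i = rng.integers(len(y))
    correct = np.sign(w1 * X[i, 0] + w2 * X[i, 1] + b) == y[i]
    if correct:                        # common payoff: reward every automaton
        for k, a in enumerate(choice):
            P[k] *= (1 - lam)          # shrink all action probabilities...
            P[k, a] += lam             # ...and move mass toward the chosen one

# Most probable parameter per automaton after the probabilistic search:
print("w1, w2, b =", actions[P.argmax(axis=1)])
```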
Abstract:
We discuss the effect of fluctuations of the random potential in directions transverse to the current flow in a modified Migdal-Kadanoff approach to probabilistic scaling of conductance with size L in d-dimensional metallic systems. The conductance cumulants are finite and vary as L^(d−1−n) for n ≥ 2, i.e. conductance fluctuations are constant for d = 3. The mean conductance has a non-classical correction for d ≥ 2. The form of the higher cumulants is strongly influenced by the transverse potential fluctuations and may be compared with the results of perturbative diagrammatic approaches.
Abstract:
Many fisheries worldwide have adopted vessel monitoring systems (VMS) for compliance purposes. An added benefit of these systems is that they collect a large amount of data on vessel locations at very fine spatial and temporal scales. These data can provide a wealth of information for stock assessment, research, and management. However, since most VMS implementations record vessel location at set time intervals with no regard to vessel activity, some methodology is required to determine which data records correspond to fishing activity. This paper describes a probabilistic approach, based on hidden Markov models (HMMs), to determining vessel activity. An HMM provides a natural framework for the problem and, by definition, models the intrinsic temporal correlation of the data. The paper describes the general approach that was developed and presents an example of this approach applied to the Queensland trawl fishery off the coast of eastern Australia. Finally, a simulation experiment is presented that compares the misallocation rates of the HMM approach with those of other approaches.
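A minimal sketch of the HMM idea, assuming hmmlearn and a single speed-like feature in which trawling is slower than steaming; the data are invented, and the paper's model may use different features and emission distributions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Invented VMS-like speeds (knots): slow while trawling, fast while steaming.
rng = np.random.default_rng(0)
speeds = np.concatenate([rng.normal(3.0, 0.6, 300),    # fishing-like
                         rng.normal(9.0, 1.2, 200),    # steaming-like
                         rng.normal(3.0, 0.6, 300)])
X = speeds.reshape(-1, 1)

# Two hidden states with Gaussian emissions; the Markov chain encodes
# the temporal persistence of vessel activity between position pings.
model = GaussianHMM(n_components=2, covariance_type="diag",
                    n_iter=100, random_state=0)
model.fit(X)
states = model.predict(X)                        # Viterbi state sequence

fishing_state = np.argmin(model.means_.ravel())  # slower state = fishing
print("records flagged as fishing:", int((states == fishing_state).sum()))
```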
Abstract:
Relaxation labeling processes are a class of mechanisms that solve the problem of assigning labels to objects in a manner that is consistent with respect to some domain-specific constraints. We reformulate this using the model of a team of learning automata interacting with an environment or a high-level critic that gives noisy responses as to the consistency of a tentative labeling selected by the automata. This results in an iterative linear algorithm that is itself probabilistic. Using an explicit definition of consistency we give a complete analysis of this probabilistic relaxation process using weak convergence results for stochastic algorithms. Our model can accommodate a range of uncertainties in the compatibility functions. We prove a local convergence result and show that the point of convergence depends both on the initial labeling and the constraints. The algorithm is implementable in a highly parallel fashion.
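For contrast with the automata formulation analyzed above, a sketch of a classic relaxation-labeling update (in the Rosenfeld-Hummel-Zucker style) on toy compatibilities is shown below; the normalization and clipping choices are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, n_lab = 5, 3
# Compatibility r[i, l, j, m]: how well label l on object i agrees with
# label m on object j (toy values in [-1, 1]).
r = rng.uniform(-1, 1, (n_obj, n_lab, n_obj, n_lab))
p = rng.dirichlet(np.ones(n_lab), size=n_obj)   # initial label probabilities

for _ in range(50):
    # Support q[i, l] = sum over j, m of r[i, l, j, m] * p[j, m]
    q = np.einsum("iljm,jm->il", r, p)
    p = p * (1.0 + np.clip(q / n_obj, -0.99, None))  # keep factors positive
    p /= p.sum(axis=1, keepdims=True)                # renormalize each row

print("final labeling:", p.argmax(axis=1))
```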
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. The increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems that automatically monitor the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour-intensive as an on-farm method and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up at the University of Helsinki research farm Suitia. The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided in two parts, and 5,074 measurements from 37 cows were used to train the model. The model was then evaluated on its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
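A minimal PNN sketch in the Parzen-window sense: one Gaussian kernel per training pattern, summed within each class, with the test pattern assigned to the class of highest estimated density. The features below are invented stand-ins for the thesis's leg-load measurements, and the smoothing parameter is an assumption.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Minimal probabilistic neural network: one Gaussian Parzen kernel
    per training pattern, summed within each class (pattern/summation
    layers); the test pattern gets the class of largest density."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        d2 = ((X_train - x) ** 2).sum(axis=1)
        k = np.exp(-d2 / (2 * sigma**2))          # pattern-layer outputs
        scores = [k[y_train == c].mean() for c in classes]
        preds.append(classes[np.argmax(scores)])  # competition layer
    return np.array(preds)

# Invented stand-ins for per-milking features (e.g. mean leg-load shares).
rng = np.random.default_rng(0)
sound = rng.normal([0.25, 0.25, 0.25, 0.25], 0.03, (200, 4))
lame = rng.normal([0.32, 0.28, 0.22, 0.18], 0.03, (40, 4))
X = np.vstack([sound, lame])
y = np.array([0] * 200 + [1] * 40)               # 0 = sound, 1 = lame

pred = pnn_predict(X, y, X, sigma=0.05)
print("training accuracy:", (pred == y).mean())
```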