970 results for model complexity


Relevance:

60.00%

Publisher:

Abstract:

Modeling helps to understand and predict the outcome of complex systems. Inductive modeling methodologies are beneficial for modeling systems in which the uncertainties involved do not permit an accurate physical model to be obtained. However, inductive models such as artificial neural networks (ANNs) may suffer from drawbacks including over-fitting and the difficulty of understanding the model itself. This can result in user reluctance to accept the model or even complete rejection of the modeling results. It is therefore highly desirable to make such inductive models more comprehensible and to determine the model complexity automatically so as to avoid over-fitting. In this paper, we propose a novel type of ANN, the mixed transfer function artificial neural network (MTFANN), which aims to improve the complexity fitting and comprehensibility of the most popular type of ANN, the multilayer perceptron (MLP).
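
A minimal sketch of the general idea of mixing transfer functions within one hidden layer; the unit types, their split and all values below are illustrative assumptions, not the MTFANN architecture from the paper.

    # Illustrative only: a hidden layer whose units use different transfer
    # functions (tanh, logistic, linear), chosen here purely for demonstration.
    import numpy as np

    def mixed_hidden_layer(x, W, b, kinds):
        """Apply a different transfer function to each hidden unit.

        x     : input vector, shape (n_in,)
        W, b  : weights (n_hidden, n_in) and biases (n_hidden,)
        kinds : list of 'tanh' | 'logistic' | 'linear', one per hidden unit
        """
        a = W @ x + b
        out = np.empty_like(a)
        for i, kind in enumerate(kinds):
            if kind == 'tanh':
                out[i] = np.tanh(a[i])
            elif kind == 'logistic':
                out[i] = 1.0 / (1.0 + np.exp(-a[i]))
            else:  # 'linear'
                out[i] = a[i]
        return out

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    W = rng.normal(size=(6, 4))
    b = rng.normal(size=6)
    print(mixed_hidden_layer(x, W, b, ['tanh', 'tanh', 'logistic', 'logistic', 'linear', 'linear']))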

Relevance:

60.00%

Publisher:

Abstract:

We modify a selection of interactive modeling tools for use in a procedural modeling environment. These tools are selection, extrusion, subdivision and curve shaping. We create human models to demonstrate that these tools are appropriate for use on hierarchical objects. Our tools support the main benefits of procedural modeling, which are: the use of parameterisation to control and vary a model, varying levels of detail, increased model complexity, base shape independence and database amplification. We demonstrate scripts which provide each of these benefits.
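
A toy sketch of the parameterisation and database-amplification ideas: one short procedural script, swept over its parameters, yields a family of hierarchical models. The humanoid proportions and parameter names are made up for illustration and are not the scripts or tool set described in the paper.

    # One procedural script; varying its parameters amplifies it into many models.
    def build_humanoid(height=1.8, arm_ratio=0.44, subdivisions=1):
        torso = height * 0.30
        arm = height * arm_ratio
        leg = height * 0.48
        return {"root": {"length": torso, "subdiv": subdivisions,
                         "children": {
                             "left_arm":  {"length": arm, "subdiv": subdivisions},
                             "right_arm": {"length": arm, "subdiv": subdivisions},
                             "left_leg":  {"length": leg, "subdiv": subdivisions},
                             "right_leg": {"length": leg, "subdiv": subdivisions}}}}

    # Database amplification: a handful of parameter values -> many variants.
    models = [build_humanoid(h, a) for h in (1.6, 1.8, 2.0) for a in (0.40, 0.44, 0.48)]
    print(len(models), "models from one script")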

Relevance:

60.00%

Publisher:

Abstract:

A critical question in data mining is whether we can always trust what a data mining system discovers, unconditionally. The answer is obviously no. If not, when can we trust the discovery? What factors affect the reliability of the discovery, and how do they affect it? These are interesting questions to investigate. In this chapter we first provide a definition and measurements of reliability, and analyse the factors that affect it. We then examine the impact of model complexity, weak links, varying sample sizes and the ability of different learners on the reliability of graphical model discovery. The experimental results reveal that (1) the larger the sample size used for discovery, the higher the reliability obtained; (2) the stronger a graph link is, the easier it is to discover and thus the higher the reliability that can be achieved; and (3) the complexity of a graph also plays an important role: the more complex a graph, the more difficult it is to induce and the lower the resulting reliability. We also examined the performance differences among discovery algorithms, which reveals the impact of the discovery process. The experimental results show the superior reliability and robustness of the MML method over standard significance tests in the recovery of graph links with small samples and weak links.
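
A minimal sketch of the sample-size and link-strength effects on recovering a single graph link. A plain correlation significance test is used as a stand-in for the discovery step; the MML method from the chapter is not reproduced here, and the simulated data and thresholds are illustrative.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def recovery_rate(link_strength, n, trials=200, alpha=0.05):
        hits = 0
        for _ in range(trials):
            x = rng.normal(size=n)
            y = link_strength * x + rng.normal(size=n)   # y depends weakly or strongly on x
            if stats.pearsonr(x, y)[1] < alpha:          # link "discovered"
                hits += 1
        return hits / trials

    for strength in (0.1, 0.5):
        for n in (30, 300):
            print(f"strength={strength}, n={n}: recovered {recovery_rate(strength, n):.2f}")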

Relevance:

60.00%

Publisher:

Abstract:

A fundamental task in pervasive computing is reliable acquisition of contexts from sensor data. This is crucial to the operation of smart pervasive systems and services so that they can behave efficiently and appropriately in a given context. Simple forms of context can often be extracted directly from raw data. Equally, if not more, important are the hidden contexts and patterns buried inside the data, which are more challenging to discover. Most existing approaches borrow methods and techniques from machine learning, predominantly employing parametric unsupervised learning and clustering techniques. A severe drawback of these parametric methods is the requirement to specify the number of latent patterns in advance. In this paper, we explore the use of Bayesian nonparametric methods, a recent data modelling framework in machine learning, to infer latent patterns from sensor data acquired in a pervasive setting. Under this formalism, nonparametric prior distributions are used for the data generative process, allowing the number of latent patterns to be learned automatically and to grow with the data: as more data arrive, the model complexity can grow to explain new and unseen patterns. In particular, we make use of hierarchical Dirichlet processes (HDP) to infer atomic activities and interaction patterns from honest signals collected from sociometric badges. We show how data from these sensors can be represented and learned with HDP. We illustrate insights into atomic patterns learned by the model and use them to achieve high-performance clustering. We also demonstrate the framework on the popular Reality Mining dataset, illustrating the ability of the model to automatically infer typical social groups in this dataset. Finally, our framework is generic and applicable to a much wider range of problems in pervasive computing where one needs to infer high-level, latent patterns and contexts from sensor data.
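
A sketch of the "let the data choose the number of latent patterns" idea, using scikit-learn's Dirichlet-process Gaussian mixture as a simple stand-in; the paper's hierarchical Dirichlet process over sociometric-badge signals is not reproduced here, and the toy data below is invented.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(0)
    # Toy data drawn from three clusters; the model is not told that number.
    X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in (-2, 0, 3)])

    dpgmm = BayesianGaussianMixture(
        n_components=10,                                   # upper bound, not a fixed choice
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    ).fit(X)

    used = np.unique(dpgmm.predict(X))
    print("effective number of patterns:", len(used))      # typically close to 3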

Relevance:

60.00%

Publisher:

Abstract:

Locusts and grasshoppers cause considerable economic damage to agriculture worldwide. The Australian Plague Locust Commission uses multiple pesticides to control locusts in eastern Australia. Avian exposure to agricultural pesticides is of conservation concern, especially in the case of rare and threatened species. The aim of this study was to evaluate the probability of pesticide exposure of native avian species during operational locust control based on knowledge of species occurrence in areas and times of application. Using presence-absence data provided by the Birds Australia Atlas for 1998 to 2002, we developed a series of generalized linear models to predict avian occurrences on a monthly basis in 0.5-degree grid cells for 280 species over 2 million km2 in eastern Australia. We constructed species-specific models relating occupancy patterns to survey date and location, rainfall, and derived habitat preference. Model complexity depended on the number of observations available. Model output was the probability of occurrence for each species at times and locations of past locust control operations within the 5-year study period. Given the high spatiotemporal variability of locust control events, the variability in predicted bird species presence was high, with 108 of the total 280 species being included at least once in the top 20 predicted species for individual space-time events. The models were evaluated using field surveys collected between 2000 and 2005, at sites with and without locust outbreaks. Model strength varied among species. Some species were under- or over-predicted, as times and locations of interest typically did not correspond to those in the prediction data set, and certain species were likely attracted to locusts as a food source. Field surveys demonstrated the utility of the spatially explicit species lists derived from the models but also identified the presence of a number of previously unanticipated species. These results also emphasize the need for special consideration of rare and threatened species that are poorly predicted by presence-absence models. This modeling exercise was a useful a priori approach in species risk assessments to identify species present at times and locations of locust control applications, and to discover gaps in our knowledge and the need for further focused data collection.
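
A minimal sketch of a presence-absence GLM of the kind described (a binomial/logistic model of occurrence against survey timing and rainfall). The predictors and simulated survey data are illustrative placeholders, not the Atlas data or the fitted covariates from the study.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    month = rng.integers(1, 13, size=n)
    rain = rng.gamma(2.0, 20.0, size=n)                        # monthly rainfall, mm
    X = np.column_stack([np.sin(2 * np.pi * month / 12),       # seasonal terms
                         np.cos(2 * np.pi * month / 12),
                         rain])
    logit_p = -1.0 + 0.8 * X[:, 0] + 0.01 * X[:, 2]
    present = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))      # simulated presence-absence

    model = sm.GLM(present, sm.add_constant(X), family=sm.families.Binomial()).fit()
    print(model.params)                                        # probability-of-occurrence model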

Relevance:

60.00%

Publisher:

Abstract:

Hidden patterns and contexts play an important part in intelligent pervasive systems. Most existing works have focused on simple forms of context derived directly from raw signals. High-level constructs and patterns have been largely neglected or remain under-explored in pervasive computing, mainly due to the growing complexity over time and the lack of efficient, principled methods to extract them. Traditional parametric modeling approaches from machine learning find it difficult to discover new, unseen patterns and contexts arising from continuously growing data streams because of their train-then-predict paradigm. In this work, we propose to apply Bayesian nonparametric models as a systematic and rigorous paradigm to continuously learn hidden patterns and contexts from raw social signals, providing basic building blocks for context-aware applications. Bayesian nonparametric models allow the model complexity to grow with the data, fitting naturally to several problems encountered in pervasive computing. Under this framework, we use nonparametric prior distributions to model the data generative process, which helps in learning the number of latent patterns automatically, adapting to changes in the data and discovering never-seen-before patterns, contexts and activities. The proposed methods are agnostic to data types; here we demonstrate them on two types of signals: accelerometer activity data and Bluetooth proximity data. © 2014 IEEE.
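
A short sketch of the "model complexity grows with the data" property, seen through the Chinese restaurant process view of a Dirichlet process: new latent patterns keep appearing as more observations arrive. The concentration value is an arbitrary illustrative choice, not a setting from the paper.

    import numpy as np

    def crp_num_clusters(n_points, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        counts = []                                   # observations per pattern
        for _ in range(n_points):
            probs = np.array(counts + [alpha], dtype=float)
            probs /= probs.sum()
            k = rng.choice(len(probs), p=probs)
            if k == len(counts):
                counts.append(1)                      # a never-seen-before pattern
            else:
                counts[k] += 1
        return len(counts)

    for n in (100, 1000, 10000):
        print(n, "observations ->", crp_num_clusters(n), "latent patterns")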

Relevance:

60.00%

Publisher:

Abstract:

The third primary production algorithm round robin (PPARR3) compares output from 24 models that estimate depth-integrated primary production from satellite measurements of ocean color, as well as seven general circulation models (GCMs) coupled with ecosystem or biogeochemical models. Here we compare the global primary production fields corresponding to eight months of 1998 and 1999 as estimated from common input fields of photosynthetically available radiation (PAR), sea-surface temperature (SST), mixed-layer depth, and chlorophyll concentration. We also quantify the sensitivity of the ocean-color-based models to perturbations in their input variables. The pair-wise correlation between ocean-color models was used to cluster them into groups of related output, which reflect the regions and environmental conditions under which they respond differently. The groups do not follow model complexity with regard to wavelength or depth dependence, though they are related to the manner in which temperature is used to parameterize photosynthesis. Global average primary production varies by a factor of two between models. The models diverged the most for the Southern Ocean, SST below 10 °C, and chlorophyll concentrations exceeding 1 mg Chl m^-3. Based on the conditions under which the model results diverge most, we conclude that current ocean-color-based models are challenged by high-nutrient low-chlorophyll conditions and by extreme temperatures or chlorophyll concentrations. The GCM-based models predict primary production comparable to the ocean-color-based models: they estimate higher values in the Southern Ocean, at low SST, and in the equatorial band, while they estimate lower values in eutrophic regions (probably because the area of high chlorophyll concentrations is smaller in the GCMs). Further progress in primary production modeling requires improved understanding of the effect of temperature on photosynthesis and better parameterization of the maximum photosynthetic rate. (c) 2006 Elsevier Ltd. All rights reserved.
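
A sketch of the grouping step described above: cluster models by the pairwise correlation of their output fields. The five "model outputs" here are random toy vectors standing in for the PPARR3 primary-production fields, and the cluster count is arbitrary.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(3)
    base = rng.normal(size=200)
    outputs = np.array([base + rng.normal(scale=s, size=200) for s in (0.1, 0.2, 1.5, 1.6, 2.0)])

    corr = np.corrcoef(outputs)                 # pairwise correlation between model outputs
    dist = 1.0 - corr                           # correlation distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    groups = fcluster(Z, t=2, criterion="maxclust")
    print(groups)                               # models that behave similarly share a label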

Relevance:

60.00%

Publisher:

Abstract:

This study aims to compare and validate two soil-vegetation-atmosphere-transfer (SVAT) schemes: TERRA-ML and the Community Land Model (CLM). Both SVAT schemes are run in standalone mode (decoupled from an atmospheric model) and forced with meteorological in-situ measurements obtained at several tropical African sites. Model performance is quantified by comparing simulated sensible and latent heat fluxes with eddy-covariance measurements. Our analysis indicates that the Community Land Model corresponds more closely to the micrometeorological observations, reflecting the advantages of the higher model complexity and physical realism. Deficiencies in TERRA-ML are addressed and its performance is improved: (1) adjusting input data (root depth) to region-specific values (tropical evergreen forest) resolves dry-season underestimation of evapotranspiration; (2) adjusting the leaf area index and albedo (depending on hard-coded model constants) resolves overestimations of both latent and sensible heat fluxes; and (3) an unrealistic flux partitioning caused by overestimated superficial water contents is reduced by adjusting the hydraulic conductivity parameterization. CLM is by default more versatile in its global application on different vegetation types and climates. On the other hand, with its lower degree of complexity, TERRA-ML is much less computationally demanding, which leads to faster calculation times in a coupled climate simulation.
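
A minimal sketch of the kind of point-scale validation described above: compare simulated latent or sensible heat fluxes against eddy-covariance observations with simple bias and RMSE scores. The flux arrays are placeholders, not TERRA-ML or CLM output.

    import numpy as np

    def bias_rmse(simulated, observed):
        diff = simulated - observed
        return diff.mean(), np.sqrt((diff ** 2).mean())

    obs_latent = np.array([110.0, 140.0, 95.0, 180.0, 160.0])    # observed latent heat flux, W m^-2
    sim_latent = np.array([100.0, 150.0, 90.0, 170.0, 175.0])    # simulated latent heat flux, W m^-2
    print("latent heat flux bias, RMSE:", bias_rmse(sim_latent, obs_latent))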

Relevance:

60.00%

Publisher:

Abstract:

The aim of the thesis is to propose a Bayesian estimation through Markov chain Monte Carlo of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures, considering that the first one is widely used and represents a classical approach in multidimensional item response analysis, while the second one is able to reflect the complexity of real interactions between items and respondents. A simulation study is conducted to evaluate the parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive graded response models are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories. An application of the proposed models on response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.
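
A sketch of the graded response model at the core of the estimation problem: category probabilities for one item given an ability value, a discrimination parameter and ordered thresholds. The parameter values are illustrative, and the MCMC estimation itself is not reproduced here.

    import numpy as np

    def grm_category_probs(theta, discrimination, thresholds):
        """P(X = k) for k = 0..K under the graded response model."""
        thresholds = np.asarray(thresholds, dtype=float)
        # Cumulative probabilities P(X >= k) for k = 1..K, flanked by 1 and 0.
        cum = 1.0 / (1.0 + np.exp(-discrimination * (theta - thresholds)))
        cum = np.concatenate(([1.0], cum, [0.0]))
        return cum[:-1] - cum[1:]

    # Four response categories (three ordered thresholds); probabilities sum to 1.
    print(grm_category_probs(theta=0.5, discrimination=1.2, thresholds=[-1.0, 0.0, 1.5]))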

Relevance:

60.00%

Publisher:

Abstract:

Predicting failures in a distributed system based on previous events through logistic regression is a standard approach in the literature. This technique is not reliable, though, in two situations: in the prediction of rare events, which do not appear in a large enough proportion for the algorithm to capture them, and in environments with too many variables, where logistic regression tends to overfit, while manually selecting a subset of variables to create the model is error-prone. In this paper, we solve an industrial research case that presented this situation with a combination of elastic net logistic regression, a method that allows us to automatically select useful variables, a process of cross-validation on top of it, and the application of a rare-events prediction technique to reduce computation time. This process provides two layers of cross-validation that automatically obtain the optimal model complexity and the optimal model parameter values, while ensuring that even rare events will be correctly predicted with a small number of training instances. We tested this method against real industrial data, obtaining a total of 60 out of 80 possible models with a 90% average model accuracy.
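
A sketch of the pipeline's core idea: elastic-net logistic regression with built-in cross-validation over both the regularisation strength and the l1/l2 mix, plus class weighting as a simple stand-in for rare-event handling. The data is a random toy problem, not the industrial failure logs, and the grids and settings are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV

    rng = np.random.default_rng(4)
    X = rng.normal(size=(2000, 50))                                           # many candidate variables
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 2.5).astype(int)   # rare positives (~5%)

    clf = LogisticRegressionCV(
        Cs=10,                        # grid over regularisation strength
        l1_ratios=[0.1, 0.5, 0.9],    # grid over the elastic-net mix
        penalty="elasticnet",
        solver="saga",
        class_weight="balanced",      # crude compensation for the rare class
        cv=5,
        max_iter=5000,
    ).fit(X, y)

    print("selected variables:", int((clf.coef_ != 0).sum()), "of", X.shape[1])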

Relevance:

60.00%

Publisher:

Abstract:

An important aspect of Process Simulators for photovoltaics is prediction of defect evolution during device fabrication. Over the last twenty years, these tools have accelerated process optimization, and several Process Simulators for iron, a ubiquitous and deleterious impurity in silicon, have been developed. The diversity of these tools can make it difficult to build intuition about the physics governing iron behavior during processing. Thus, in one unified software environment and using self-consistent terminology, we combine and describe three of these Simulators. We vary structural defect distribution and iron precipitation equations to create eight distinct Models, which we then use to simulate different stages of processing. We find that the structural defect distribution influences the final interstitial iron concentration ([Fe-i]) more strongly than the iron precipitation equations. We identify two regimes of iron behavior: (1) diffusivity-limited, in which iron evolution is kinetically limited and bulk [Fe-i] predictions can vary by an order of magnitude or more, and (2) solubility-limited, in which iron evolution is near thermodynamic equilibrium and the Models yield similar results. This rigorous analysis provides new intuition that can inform Process Simulation, material, and process development, and it enables scientists and engineers to choose an appropriate level of Model complexity based on wafer type and quality, processing conditions, and available computation time.
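
A back-of-the-envelope sketch related to the two regimes described above: compare the iron diffusion length during an anneal with the wafer thickness as a crude proxy for whether iron redistribution is kinetically limited. The Arrhenius parameters are commonly cited literature values used purely for illustration, and the regime test is a simplification, not one of the paper's Models.

    import numpy as np

    K_B = 8.617e-5                                   # Boltzmann constant, eV/K

    def fe_diffusivity_cm2_s(temp_c, d0=1.3e-3, ea_ev=0.68):
        # Illustrative Arrhenius fit for interstitial Fe diffusivity in silicon.
        return d0 * np.exp(-ea_ev / (K_B * (temp_c + 273.15)))

    def diffusion_length_um(temp_c, time_s):
        return np.sqrt(fe_diffusivity_cm2_s(temp_c) * time_s) * 1e4   # cm -> um

    wafer_thickness_um = 180.0
    for temp_c, time_s in [(600, 60), (850, 600)]:
        length = diffusion_length_um(temp_c, time_s)
        regime = "diffusivity-limited" if length < wafer_thickness_um else "near-equilibrium"
        print(f"{temp_c} C for {time_s} s: diffusion length ~ {length:.0f} um -> {regime}")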

Relevance:

60.00%

Publisher:

Abstract:

We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.
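
A minimal sketch of the online idea: a Gaussian posterior over regression weights updated one observation at a time with Kalman-filter-style recursions. The noise level, prior scale and toy data are illustrative choices, not the paper's settings or priors.

    import numpy as np

    def online_bayes_regression(xs, ys, n_features, sigma2=0.1, prior_var=10.0):
        w = np.zeros(n_features)                 # posterior mean of the weights
        P = prior_var * np.eye(n_features)       # posterior covariance
        for x, y in zip(xs, ys):
            Px = P @ x
            k = Px / (sigma2 + x @ Px)           # Kalman gain
            w = w + k * (y - x @ w)              # mean update from the prediction error
            P = P - np.outer(k, Px)              # covariance update
        return w, P

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
    w_hat, _ = online_bayes_regression(X, y, n_features=3, sigma2=0.09)
    print(w_hat)                                 # approaches the generating weights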

Relevance:

60.00%

Publisher:

Abstract:

This thesis describes the Generative Topographic Mapping (GTM) --- a non-linear latent variable model, intended for modelling continuous, intrinsically low-dimensional probability distributions, embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map --- a widely established neural network model for unsupervised learning --- resolving many of its associated theoretical problems. An important potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The computational effort required grows exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different from that, the aim of maintaining an `interpretable' structure, suitable for visualizing data, may come into conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.
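
A sketch of the magnification-factor idea mentioned above: for a smooth mapping from a 2-D latent space into data space, the local stretching of distances is sqrt(det(J^T J)), where J is the Jacobian of the mapping. The mapping below is an arbitrary toy function, not a trained GTM.

    import numpy as np

    def toy_mapping(z):                              # latent (2,) -> data space (3,)
        return np.array([z[0], z[1], np.sin(z[0]) * np.cos(z[1])])

    def magnification_factor(f, z, eps=1e-6):
        z = np.asarray(z, dtype=float)
        # Jacobian by central finite differences, one column per latent dimension.
        J = np.column_stack([(f(z + eps * e) - f(z - eps * e)) / (2 * eps)
                             for e in np.eye(len(z))])
        return np.sqrt(np.linalg.det(J.T @ J))

    print(magnification_factor(toy_mapping, [0.0, 0.0]))   # ~sqrt(2) at the origin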

Relevance:

60.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 94A17, 62B10, 62F03.