953 resultados para model complexity


Relevância:

60.00% 60.00%

Publicador:

Resumo:

The aim of the thesis is to propose a Bayesian estimation through Markov chain Monte Carlo of multidimensional item response theory models for graded responses with complex structures and correlated traits. In particular, this work focuses on the multiunidimensional and the additive underlying latent structures, considering that the first one is widely used and represents a classical approach in multidimensional item response analysis, while the second one is able to reflect the complexity of real interactions between items and respondents. A simulation study is conducted to evaluate the parameter recovery for the proposed models under different conditions (sample size, test and subtest length, number of response categories, and correlation structure). The results show that the parameter recovery is particularly sensitive to the sample size, due to the model complexity and the high number of parameters to be estimated. For a sufficiently large sample size the parameters of the multiunidimensional and additive graded response models are well reproduced. The results are also affected by the trade-off between the number of items constituting the test and the number of item categories. An application of the proposed models on response data collected to investigate Romagna and San Marino residents' perceptions and attitudes towards the tourism industry is also presented.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Predicting failures in a distributed system based on previous events through logistic regression is a standard approach in literature. This technique is not reliable, though, in two situations: in the prediction of rare events, which do not appear in enough proportion for the algorithm to capture, and in environments where there are too many variables, as logistic regression tends to overfit on this situations; while manually selecting a subset of variables to create the model is error- prone. On this paper, we solve an industrial research case that presented this situation with a combination of elastic net logistic regression, a method that allows us to automatically select useful variables, a process of cross-validation on top of it and the application of a rare events prediction technique to reduce computation time. This process provides two layers of cross- validation that automatically obtain the optimal model complexity and the optimal mode l parameters values, while ensuring even rare events will be correctly predicted with a low amount of training instances. We tested this method against real industrial data, obtaining a total of 60 out of 80 possible models with a 90% average model accuracy.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

An important aspect of Process Simulators for photovoltaics is prediction of defect evolution during device fabrication. Over the last twenty years, these tools have accelerated process optimization, and several Process Simulators for iron, a ubiquitous and deleterious impurity in silicon, have been developed. The diversity of these tools can make it difficult to build intuition about the physics governing iron behavior during processing. Thus, in one unified software environment and using self-consistent terminology, we combine and describe three of these Simulators. We vary structural defect distribution and iron precipitation equations to create eight distinct Models, which we then use to simulate different stages of processing. We find that the structural defect distribution influences the final interstitial iron concentration ([Fe-i]) more strongly than the iron precipitation equations. We identify two regimes of iron behavior: (1) diffusivity-limited, in which iron evolution is kinetically limited and bulk [Fe-i] predictions can vary by an order of magnitude or more, and (2) solubility-limited, in which iron evolution is near thermodynamic equilibrium and the Models yield similar results. This rigorous analysis provides new intuition that can inform Process Simulation, material, and process development, and it enables scientists and engineers to choose an appropriate level of Model complexity based on wafer type and quality, processing conditions, and available computation time.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a Bayesian framework for regression problems, which covers areas which are usually dealt with by function approximation. An online learning algorithm is derived which solves regression problems with a Kalman filter. Its solution always improves with increasing model complexity, without the risk of over-fitting. In the infinite dimension limit it approaches the true Bayesian posterior. The issues of prior selection and over-fitting are also discussed, showing that some of the commonly held beliefs are misleading. The practical implementation is summarised. Simulations using 13 popular publicly available data sets are used to demonstrate the method and highlight important issues concerning the choice of priors.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis describes the Generative Topographic Mapping (GTM) --- a non-linear latent variable model, intended for modelling continuous, intrinsically low-dimensional probability distributions, embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map --- a widely established neural network model for unsupervised learning --- resolving many of its associated theoretical problems. An important, potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The computational effort required will grow exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different to that, theaim of maintaining an `interpretable' structure, suitable for visualizing data, may come in conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

2010 Mathematics Subject Classification: 94A17, 62B10, 62F03.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

3D geographic information system (GIS) is data and computation intensive in nature. Internet users are usually equipped with low-end personal computers and network connections of limited bandwidth. Data reduction and performance optimization techniques are of critical importance in quality of service (QoS) management for online 3D GIS. In this research, QoS management issues regarding distributed 3D GIS presentation were studied to develop 3D TerraFly, an interactive 3D GIS that supports high quality online terrain visualization and navigation. ^ To tackle the QoS management challenges, multi-resolution rendering model, adaptive level of detail (LOD) control and mesh simplification algorithms were proposed to effectively reduce the terrain model complexity. The rendering model is adaptively decomposed into sub-regions of up-to-three detail levels according to viewing distance and other dynamic quality measurements. The mesh simplification algorithm was designed as a hybrid algorithm that combines edge straightening and quad-tree compression to reduce the mesh complexity by removing geometrically redundant vertices. The main advantage of this mesh simplification algorithm is that grid mesh can be directly processed in parallel without triangulation overhead. Algorithms facilitating remote accessing and distributed processing of volumetric GIS data, such as data replication, directory service, request scheduling, predictive data retrieving and caching were also proposed. ^ A prototype of the proposed 3D TerraFly implemented in this research demonstrates the effectiveness of our proposed QoS management framework in handling interactive online 3D GIS. The system implementation details and future directions of this research are also addressed in this thesis. ^

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Light transmission was measured through intact, submerged periphyton communities on artificial seagrass leaves. The periphyton communities were representative of the communities on Thalassia testudinum in subtropical seagrass meadows. The periphyton communities sampled were adhered carbonate sediment, coralline algae, and mixed algal assemblages. Crustose or film-forming periphyton assemblages were best prepared for light transmission measurements using artificial leaves fouled on both sides, while measurements through three-dimensional filamentous algae required the periphyton to be removed from one side. For one-sided samples, light transmission could be measured as the difference between fouled and reference artificial leaf samples. For two-sided samples, the percent periphyton light transmission to the leaf surface was calculated as the square root of the fraction of incident light. Linear, exponential, and hyperbolic equations were evaluated as descriptors of the periphyton dry weight versus light transmission relationship. Hyperbolic and exponential decay models were superior to linear models and exhibited the best fits for the observed relationships. Differences between the coefficients of determination (r2) of hyperbolic and exponential decay models were statistically insignificant. Constraining these models for 100% light transmission at zero periphyton load did not result in any statistically significant loss in the explanatory capability of the models. In most all cases, increasing model complexity using three-parameter models rather than two-parameter models did not significantly increase the amount of variation explained. Constrained two-parameter hyperbolic or exponential decay models were judged best for describing the periphyton dry weight versus light transmission relationship. On T. testudinum in Florida Bay and the Florida Keys, significant differences were not observed in the light transmission characteristics of the varying periphyton communities at different study sites. Using pooled data from the study sites, the hyperbolic decay coefficient for periphyton light transmission was estimated to be 4.36 mg dry wt. cm−2. For exponential models, the exponential decay coefficient was estimated to be 0.16 cm2 mg dry wt.−1.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Applications are subject of a continuous evolution process with a profound impact on their underlining data model, hence requiring frequent updates in the applications' class structure and database structure as well. This twofold problem, schema evolution and instance adaptation, usually known as database evolution, is addressed in this thesis. Additionally, we address concurrency and error recovery problems with a novel meta-model and its aspect-oriented implementation. Modern object-oriented databases provide features that help programmers deal with object persistence, as well as all related problems such as database evolution, concurrency and error handling. In most systems there are transparent mechanisms to address these problems, nonetheless the database evolution problem still requires some human intervention, which consumes much of programmers' and database administrators' work effort. Earlier research works have demonstrated that aspect-oriented programming (AOP) techniques enable the development of flexible and pluggable systems. In these earlier works, the schema evolution and the instance adaptation problems were addressed as database management concerns. However, none of this research was focused on orthogonal persistent systems. We argue that AOP techniques are well suited to address these problems in orthogonal persistent systems. Regarding the concurrency and error recovery, earlier research showed that only syntactic obliviousness between the base program and aspects is possible. Our meta-model and framework follow an aspect-oriented approach focused on the object-oriented orthogonal persistent context. The proposed meta-model is characterized by its simplicity in order to achieve efficient and transparent database evolution mechanisms. Our meta-model supports multiple versions of a class structure by applying a class versioning strategy. Thus, enabling bidirectional application compatibility among versions of each class structure. That is to say, the database structure can be updated because earlier applications continue to work, as well as later applications that have only known the updated class structure. The specific characteristics of orthogonal persistent systems, as well as a metadata enrichment strategy within the application's source code, complete the inception of the meta-model and have motivated our research work. To test the feasibility of the approach, a prototype was developed. Our prototype is a framework that mediates the interaction between applications and the database, providing them with orthogonal persistence mechanisms. These mechanisms are introduced into applications as an {\it aspect} in the aspect-oriented sense. Objects do not require the extension of any super class, the implementation of an interface nor contain a particular annotation. Parametric type classes are also correctly handled by our framework. However, classes that belong to the programming environment must not be handled as versionable due to restrictions imposed by the Java Virtual Machine. Regarding concurrency support, the framework provides the applications with a multithreaded environment which supports database transactions and error recovery. The framework keeps applications oblivious to the database evolution problem, as well as persistence. Programmers can update the applications' class structure because the framework will produce a new version for it at the database metadata layer. Using our XML based pointcut/advice constructs, the framework's instance adaptation mechanism is extended, hence keeping the framework also oblivious to this problem. The potential developing gains provided by the prototype were benchmarked. In our case study, the results confirm that mechanisms' transparency has positive repercussions on the programmer's productivity, simplifying the entire evolution process at application and database levels. The meta-model itself also was benchmarked in terms of complexity and agility. Compared with other meta-models, it requires less meta-object modifications in each schema evolution step. Other types of tests were carried out in order to validate prototype and meta-model robustness. In order to perform these tests, we used an OO7 small size database due to its data model complexity. Since the developed prototype offers some features that were not observed in other known systems, performance benchmarks were not possible. However, the developed benchmark is now available to perform future performance comparisons with equivalent systems. In order to test our approach in a real world scenario, we developed a proof-of-concept application. This application was developed without any persistence mechanisms. Using our framework and minor changes applied to the application's source code, we added these mechanisms. Furthermore, we tested the application in a schema evolution scenario. This real world experience using our framework showed that applications remains oblivious to persistence and database evolution. In this case study, our framework proved to be a useful tool for programmers and database administrators. Performance issues and the single Java Virtual Machine concurrent model are the major limitations found in the framework.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mechanistic models used for prediction should be parsimonious, as models which are over-parameterised may have poor predictive performance. Determining whether a model is parsimonious requires comparisons with alternative model formulations with differing levels of complexity. However, creating alternative formulations for large mechanistic models is often problematic, and usually time-consuming. Consequently, few are ever investigated. In this paper, we present an approach which rapidly generates reduced model formulations by replacing a model’s variables with constants. These reduced alternatives can be compared to the original model, using data based model selection criteria, to assist in the identification of potentially unnecessary model complexity, and thereby inform reformulation of the model. To illustrate the approach, we present its application to a published radiocaesium plant-uptake model, which predicts uptake on the basis of soil characteristics (e.g. pH, organic matter content, clay content). A total of 1024 reduced model formulations were generated, and ranked according to five model selection criteria: Residual Sum of Squares (RSS), AICc, BIC, MDL and ICOMP. The lowest scores for RSS and AICc occurred for the same reduced model in which pH dependent model components were replaced. The lowest scores for BIC, MDL and ICOMP occurred for a further reduced model in which model components related to the distinction between adsorption on clay and organic surfaces were replaced. Both these reduced models had a lower RSS for the parameterisation dataset than the original model. As a test of their predictive performance, the original model and the two reduced models outlined above were used to predict an independent dataset. The reduced models have lower prediction sums of squares than the original model, suggesting that the latter may be overfitted. The approach presented has the potential to inform model development by rapidly creating a class of alternative model formulations, which can be compared.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Machine learning is widely adopted to decode multi-variate neural time series, including electroencephalographic (EEG) and single-cell recordings. Recent solutions based on deep learning (DL) outperformed traditional decoders by automatically extracting relevant discriminative features from raw or minimally pre-processed signals. Convolutional Neural Networks (CNNs) have been successfully applied to EEG and are the most common DL-based EEG decoders in the state-of-the-art (SOA). However, the current research is affected by some limitations. SOA CNNs for EEG decoding usually exploit deep and heavy structures with the risk of overfitting small datasets, and architectures are often defined empirically. Furthermore, CNNs are mainly validated by designing within-subject decoders. Crucially, the automatically learned features mainly remain unexplored; conversely, interpreting these features may be of great value to use decoders also as analysis tools, highlighting neural signatures underlying the different decoded brain or behavioral states in a data-driven way. Lastly, SOA DL-based algorithms used to decode single-cell recordings rely on more complex, slower to train and less interpretable networks than CNNs, and the use of CNNs with these signals has not been investigated. This PhD research addresses the previous limitations, with reference to P300 and motor decoding from EEG, and motor decoding from single-neuron activity. CNNs were designed light, compact, and interpretable. Moreover, multiple training strategies were adopted, including transfer learning, which could reduce training times promoting the application of CNNs in practice. Furthermore, CNN-based EEG analyses were proposed to study neural features in the spatial, temporal and frequency domains, and proved to better highlight and enhance relevant neural features related to P300 and motor states than canonical EEG analyses. Remarkably, these analyses could be used, in perspective, to design novel EEG biomarkers for neurological or neurodevelopmental disorders. Lastly, CNNs were developed to decode single-neuron activity, providing a better compromise between performance and model complexity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In recent decades, two prominent trends have influenced the data modeling field, namely network analysis and machine learning. This thesis explores the practical applications of these techniques within the domain of drug research, unveiling their multifaceted potential for advancing our comprehension of complex biological systems. The research undertaken during this PhD program is situated at the intersection of network theory, computational methods, and drug research. Across six projects presented herein, there is a gradual increase in model complexity. These projects traverse a diverse range of topics, with a specific emphasis on drug repurposing and safety in the context of neurological diseases. The aim of these projects is to leverage existing biomedical knowledge to develop innovative approaches that bolster drug research. The investigations have produced practical solutions, not only providing insights into the intricacies of biological systems, but also allowing the creation of valuable tools for their analysis. In short, the achievements are: • A novel computational algorithm to identify adverse events specific to fixed-dose drug combinations. • A web application that tracks the clinical drug research response to SARS-CoV-2. • A Python package for differential gene expression analysis and the identification of key regulatory "switch genes". • The identification of pivotal events causing drug-induced impulse control disorders linked to specific medications. • An automated pipeline for discovering potential drug repurposing opportunities. • The creation of a comprehensive knowledge graph and development of a graph machine learning model for predictions. Collectively, these projects illustrate diverse applications of data science and network-based methodologies, highlighting the profound impact they can have in supporting drug research activities.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Dynamical systems modeling tumor growth have been investigated to determine the dynamics between tumor and healthy cells. Recent theoretical investigations indicate that these interactions may lead to different dynamical outcomes, in particular to homoclinic chaos. In the present study, we analyze both topological and dynamical properties of a recently characterized chaotic attractor governing the dynamics of tumor cells interacting with healthy tissue cells and effector cells of the immune system. By using the theory of symbolic dynamics, we first characterize the topological entropy and the parameter space ordering of kneading sequences from one-dimensional iterated maps identified in the dynamics, focusing on the effects of inactivation interactions between both effector and tumor cells. The previous analyses are complemented with the computation of the spectrum of Lyapunov exponents, the fractal dimension and the predictability of the chaotic attractors. Our results show that the inactivation rate of effector cells by the tumor cells has an important effect on the dynamics of the system. The increase of effector cells inactivation involves an inverse Feigenbaum (i.e. period-halving bifurcation) scenario, which results in the stabilization of the dynamics and in an increase of dynamics predictability. Our analyses also reveal that, at low inactivation rates of effector cells, tumor cells undergo strong, chaotic fluctuations, with the dynamics being highly unpredictable. Our findings are discussed in the context of tumor cells potential viability.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Neuroblastoma (NB) is a neural crest-derived childhood tumor characterized by a remarkable phenotypic diversity, ranging from spontaneous regression to fatal metastatic disease. Although the cancer stem cell (CSC) model provides a trail to characterize the cells responsible for tumor onset, the NB tumor-initiating cell (TIC) has not been identified. In this study, the relevance of the CSC model in NB was investigated by taking advantage of typical functional stem cell characteristics. A predictive association was established between self-renewal, as assessed by serial sphere formation, and clinical aggressiveness in primary tumors. Moreover, cell subsets gradually selected during serial sphere culture harbored increased in vivo tumorigenicity, only highlighted in an orthotopic microenvironment. A microarray time course analysis of serial spheres passages from metastatic cells allowed us to specifically "profile" the NB stem cell-like phenotype and to identify CD133, ABC transporter, and WNT and NOTCH genes as spheres markers. On the basis of combined sphere markers expression, at least two distinct tumorigenic cell subpopulations were identified, also shown to preexist in primary NB. However, sphere markers-mediated cell sorting of parental tumor failed to recapitulate the TIC phenotype in the orthotopic model, highlighting the complexity of the CSC model. Our data support the NB stem-like cells as a dynamic and heterogeneous cell population strongly dependent on microenvironmental signals and add novel candidate genes as potential therapeutic targets in the control of high-risk NB.