945 results for DISTRIBUTION MODELS


Relevance: 30.00%

Abstract:

One of the important problems in machine learning is determining the complexity of the model to be learned. Too much complexity leads to overfitting, which amounts to finding structures that do not actually exist in the data, while too little complexity leads to underfitting, meaning that the expressiveness of the model is insufficient to capture all the structures present in the data. For some probabilistic models, model complexity translates into the introduction of one or more hidden variables whose role is to explain the generative process of the data. Various approaches exist for identifying the appropriate number of hidden variables in a model. This thesis focuses on Bayesian nonparametric methods for determining the number of hidden variables to use as well as their dimensionality. The popularization of Bayesian nonparametric statistics within the machine learning community is fairly recent. Their main appeal is that they offer highly flexible models whose complexity adjusts in proportion to the amount of available data. In recent years, research on Bayesian nonparametric learning methods has focused on three main aspects: the construction of new models, the development of inference algorithms, and applications. This thesis presents our contributions to these three research topics in the context of learning latent-variable models. First, we introduce the Pitman-Yor process mixture of Gaussians, a model for learning infinite mixtures of Gaussians. We also present an inference algorithm for discovering the hidden components of the model, which we evaluate on two concrete robotics applications. Our results show that the proposed approach outperforms classical learning approaches in both performance and flexibility. Second, we propose the extended cascading Indian buffet process, a model serving as a prior probability distribution over the space of directed acyclic graphs. In the context of Bayesian networks, this prior makes it possible to identify both the presence of hidden variables and the network structure among them. A Markov chain Monte Carlo inference algorithm is used for evaluation on structure identification and density estimation problems. Finally, we propose the Indian chefs process, a model more general than the extended cascading Indian buffet process for learning graphs and orders. The advantage of the new model is that it admits connections between observable variables and takes the order of the variables into account. We present a reversible-jump Markov chain Monte Carlo inference algorithm for jointly learning graphs and orders. Evaluation is carried out on density estimation and independence testing problems. This model is the first Bayesian nonparametric model capable of learning Bayesian networks with completely arbitrary structures.
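For a point of reference, the sketch below shows the truncated stick-breaking construction that underlies Pitman-Yor mixture models; the truncation level and hyperparameter values are illustrative only, and the thesis's inference algorithms are not reproduced here.

```python
import numpy as np

def pitman_yor_weights(n_atoms, d=0.5, alpha=1.0, rng=None):
    """Truncated stick-breaking construction of Pitman-Yor mixture weights.

    d: discount parameter in [0, 1); alpha > -d: concentration parameter.
    With d = 0 this reduces to the Dirichlet process stick-breaking.
    """
    rng = np.random.default_rng() if rng is None else rng
    # V_k ~ Beta(1 - d, alpha + k * d) for k = 1..n_atoms
    betas = rng.beta(1.0 - d, alpha + d * np.arange(1, n_atoms + 1))
    # w_k = V_k * prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

weights = pitman_yor_weights(50)
print(weights[:5], weights.sum())  # heavier tail than a Dirichlet process
```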

Relevance: 30.00%

Abstract:

The blast furnace is the world's main ironmaking production unit. It converts iron ore, together with coke and hot blast, into liquid iron (hot metal), which is used for steelmaking. The furnace acts as a counter-current reactor charged with layers of raw material of very different gas permeability. The arrangement of these layers, or burden distribution, is the most important factor influencing the gas flow conditions inside the furnace, which dictate the efficiency of the heat transfer and reduction processes. For proper control the furnace operators should know the overall conditions in the furnace and be able to predict how control actions affect its state. However, due to high temperatures and pressure, a hostile atmosphere, and mechanical wear, it is very difficult to measure internal variables. Instead, the operators have to rely extensively on measurements obtained at the boundaries of the furnace and make their decisions on the basis of heuristic rules and results from mathematical models. It is particularly difficult to understand the distribution of the burden materials because of the complex behavior of the particulate materials during charging. The aim of this doctoral thesis is to clarify some aspects of burden distribution and to develop tools that can aid the decision-making process in the control of the burden and gas distribution in the blast furnace. A relatively simple mathematical model was created for simulating the distribution of the burden material with a bell-less top charging system. The model is fast and can therefore be used by the operators to gain understanding of the formation of layers for different charging programs. The results were verified against findings from charging experiments using a small-scale charging rig in the laboratory. A basic gas flow model was developed which utilized the results of the burden distribution model to estimate the gas permeability of the upper part of the blast furnace. This combined formulation for gas and burden distribution made it possible to search for the combination of charging parameters that best achieves a target gas temperature distribution. As this mathematical task is discontinuous and non-differentiable, a genetic algorithm was applied to solve the optimization problem. It was demonstrated that the method was able to evolve optimal charging programs that fulfilled the target conditions. Even though the burden distribution model provides information about the layer structure, it neglects some effects which influence the results, such as mixed layer formation and coke collapse. A more accurate numerical method for studying particle mechanics, the Discrete Element Method (DEM), was therefore used to study some aspects of the charging process more closely. Model charging programs were simulated using DEM and compared with the results from small-scale experiments. The mixed layer was defined and its voidage estimated; the mixed layer was found to have about 12% less voidage than layers of the individual burden components. Finally, a model for predicting the extent of coke collapse when heavier pellets are charged over a layer of lighter coke particles was formulated based on slope stability theory, and was used to update the coke layer distribution after charging in the mathematical model. In designing this revision, results from DEM simulations and charging experiments for several charging programs were used. The findings from the coke collapse analysis can be used to design charging programs with more stable coke layers.
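The optimization step lends itself to a compact illustration. The sketch below shows a generic genetic algorithm of the kind the abstract describes, fitting charging parameters to a target gas-temperature profile; the surrogate simulator, the parameter count, and all GA settings are assumptions made for illustration, not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gas_temperature(params):
    # Hypothetical stand-in for the combined burden/gas-distribution model
    # described above: maps six normalized charging parameters to a radial
    # gas-temperature profile. Not the thesis's model.
    radii = np.linspace(0.0, 1.0, 10)
    return 100.0 + 200.0 * (params @ np.vander(radii, len(params)).T)

def fitness(params, target):
    # Negative squared deviation from the target temperature profile.
    return -np.sum((simulate_gas_temperature(params) - target) ** 2)

def evolve(target, pop_size=40, n_params=6, generations=200, mut_sd=0.05):
    pop = rng.uniform(0.0, 1.0, (pop_size, n_params))
    for _ in range(generations):
        scores = np.array([fitness(p, target) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]      # truncation selection
        mothers = parents[rng.integers(len(parents), size=pop_size)]
        fathers = parents[rng.integers(len(parents), size=pop_size)]
        cut = rng.integers(1, n_params, size=(pop_size, 1))      # one-point crossover
        pop = np.where(np.arange(n_params) < cut, mothers, fathers)
        pop = np.clip(pop + rng.normal(0.0, mut_sd, pop.shape), 0.0, 1.0)
    return pop[np.argmax([fitness(p, target) for p in pop])]

best = evolve(target=simulate_gas_temperature(np.full(6, 0.5)))
```

A GA suits this problem because the charging-program objective is discontinuous and non-differentiable, so gradient-based optimizers do not apply.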

Relevance: 30.00%

Abstract:

The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models, and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
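As a rough illustration of score-based structure search (the deterministic variant mentioned above), the sketch below hill-climbs over single-edge toggles under a user-supplied score; the toy score function, the variables, the omission of acyclicity checks, and the absence of context-specific parts are all simplifying assumptions.

```python
import itertools

def hill_climb(variables, score, max_iter=100):
    """Greedy deterministic search over edge sets using a structure score.

    `score(edges)` stands in for the Bayesian structure scores derived in
    the thesis (higher is better); acyclicity checks are omitted here.
    """
    edges, best = frozenset(), score(frozenset())
    for _ in range(max_iter):
        neighbors = (edges ^ {e} for e in itertools.permutations(variables, 2))
        cand = max(neighbors, key=score)  # best single edge toggle
        if score(cand) <= best:
            break                          # local optimum reached
        edges, best = cand, score(cand)
    return edges, best

# Toy usage: a score that rewards edges into variable "C", with a
# complexity penalty per edge (purely illustrative).
toy_score = lambda edges: sum(1 for (a, b) in edges if b == "C") - 0.1 * len(edges)
print(hill_climb(["A", "B", "C"], toy_score))
```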

Relevance: 30.00%

Abstract:

The study of forest fire activity, in its several aspects, is essential to understand the phenomenon and to prevent environmental public catastrophes. In this context, the analysis of the monthly number of fires over several years is one aspect to take into account in order to better understand this topic. The goal of this work is to analyze the monthly number of forest fires in the neighboring districts of Aveiro and Coimbra, Portugal, through dynamic factor models for bivariate count series. We use a Bayesian approach, through MCMC methods, to estimate the model parameters as well as the latent factor common to both series.
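To make the model class concrete, a generative sketch of a bivariate Poisson dynamic factor model appears below: one shared latent AR(1) factor drives the log-intensity of two count series. The AR(1) dynamics, loadings, and intercepts are illustrative assumptions; the paper estimates such quantities by MCMC rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_counts(T=120, phi=0.8, loadings=(0.9, 0.6), intercepts=(2.0, 1.5)):
    """Generative sketch: a latent AR(1) factor drives the log-intensity of
    two Poisson count series (e.g., monthly fire counts in two districts)."""
    factor = np.zeros(T)
    for t in range(1, T):
        factor[t] = phi * factor[t - 1] + rng.normal(0.0, 0.3)
    counts = np.column_stack([
        rng.poisson(np.exp(mu + lam * factor))
        for lam, mu in zip(loadings, intercepts)
    ])
    return counts, factor

counts, factor = simulate_counts()
```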

Relevance: 30.00%

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data and errors made by the extraction system, together with the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied in a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
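For intuition about why the inference objective is convex, the toy sketch below writes MAP inference in a hinge-loss Markov random field as a small convex program over soft truth values in [0, 1]; the two rule-like potentials, their weights, and the extraction score are invented for illustration and are not from the dissertation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy hinge-loss MRF: soft truth values y in [0, 1] for three facts,
# with rule-like potentials of the form max(0, linear(y))^2.
def objective(y):
    # rule 1: fact0 implies fact1  -> penalize y[0] - y[1] > 0
    # rule 2: facts 1 and 2 should not both hold -> penalize y[1] + y[2] - 1 > 0
    return (2.0 * max(0.0, y[0] - y[1]) ** 2
            + 1.0 * max(0.0, y[1] + y[2] - 1.0) ** 2
            + 0.5 * (y[0] - 0.9) ** 2)  # pull fact0 toward its extraction score

res = minimize(objective, x0=[0.5, 0.5, 0.5], bounds=[(0.0, 1.0)] * 3)
print(res.x)  # the convex objective has a unique optimum (the MAP state)
```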

Relevance: 30.00%

Abstract:

Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets from web pages using an analysis of the text around the image and of the image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images), which provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg's collection of 10 animal categories, on which we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples, whereas this text feature may change little, because the auxiliary dataset likely contains a similar picture: while the tags associated with images are noisy, they are more stable than appearance when viewing conditions change. The performance of this feature is tested on the PASCAL VOC 2006 and 2007 datasets. The feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. With more and more collected training data, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called the Stochastic Intersection Kernel Machine (SIKMA). This training method will be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier, and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in a sequence, so memory cost is no longer the bottleneck when processing large-scale datasets. This dissertation applies this approach to train classifiers for Flickr groups with many training examples per group. The resulting Flickr group prediction scores can be used to measure image similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity.
The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that our method achieves significant improvements for categories with few or even no positive examples.
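The tag-transfer feature has a simple form; the sketch below shows one plausible reading of it, using Euclidean distance over visual features and mean pooling of neighbor tags. The distance metric, pooling scheme, and dimensions are assumptions for illustration, not the dissertation's exact recipe.

```python
import numpy as np

def text_feature(query_vec, aux_vecs, aux_tags, vocab_size, k=5):
    """Text feature for an unannotated image: pool the tag indicator
    vectors of its k nearest visual neighbors in the auxiliary set."""
    dists = np.linalg.norm(aux_vecs - query_vec, axis=1)
    feature = np.zeros(vocab_size)
    for i in np.argsort(dists)[:k]:
        feature[aux_tags[i]] += 1.0   # add this neighbor's tag indicators
    return feature / k

# Toy usage with random visual features and a 10-tag vocabulary.
rng = np.random.default_rng(2)
aux_vecs = rng.normal(size=(100, 16))
aux_tags = [rng.integers(0, 10, size=3) for _ in range(100)]
print(text_feature(rng.normal(size=16), aux_vecs, aux_tags, vocab_size=10))
```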

Relevance: 30.00%

Abstract:

Denitrification is a microbially mediated process that converts nitrate (NO3-) to dinitrogen (N2) gas and has implications for soil fertility, climate change, and water quality. Using PCR, qPCR, and T-RFLP, I investigated the effects of environmental drivers and land management on the abundance and composition of denitrification functional genes. Environmental variables affecting gene abundance were soil type, soil depth, nitrogen concentrations, soil moisture, and pH, although each gene was unique in its spatial distribution and controlling factors. The inclusion of microbial variables, specifically genotype and gene abundance, improved denitrification models, highlighting the benefit of including microbial data in modeling denitrification. Along with some evidence of niche selection, I show that nirS is a good predictor of denitrification enzyme activity (DEA) and the N2O:N2 ratio, especially in alkaline and wetland soils. nirK was correlated with N2O production and became a stronger predictor of DEA in acidic soils, indicating that nirK and nirS are not ecologically redundant.

Relevance: 30.00%

Abstract:

Building and maintaining muscle is critical to quality of life for adults and the elderly. Physical activity and nutrition are important factors for long-term muscle health. In particular, dietary protein – including protein distribution and quality – is an under-appreciated determinant of muscle health for adults. The most unequivocal evidence for the benefit of optimal dietary protein at individual meals is derived from studies of weight management. During the catabolic condition of weight loss, higher protein diets attenuate loss of lean tissue and partition weight loss to body fat when compared with commonly recommended high-carbohydrate, low-protein diets. Muscle protein turnover is a continuous process in which proteins are degraded and replaced by newly synthesized proteins. Muscle growth occurs when protein synthesis exceeds protein degradation. Regulation of protein synthesis is complex, with multiple signals influencing this process. The mammalian target of rapamycin complex 1 (mTORC1) pathway has been identified as a particularly important regulator of protein synthesis, via stimulation of translation initiation. Key regulatory points of translation initiation controlled by mTORC1 include assembly of the eukaryotic initiation factor 4F (eIF4F) complex and phosphorylation of the 70 kilodalton ribosomal protein S6 kinase (S6K1). Assembly of the eIF4F initiation complex involves phosphorylation of the inhibitory eIF4E binding protein-1 (4E-BP1), which releases the initiation factor eIF4E and allows it to bind with eIF4G. Binding of eIF4E with eIF4G promotes preparation of the mRNA for binding to the 43S pre-initiation complex. Consumption of the amino acid leucine (Leu) is a key factor determining the anabolic response of muscle protein synthesis (MPS) and mTORC1 signaling to a meal. Research from this dissertation demonstrates that the peak activation of MPS following a complete meal is proportional to the Leu content of the meal and its ability to elevate plasma Leu. Leu has also been implicated as an inhibitor of muscle protein degradation (MPD). In particular, there is evidence suggesting that in muscle-wasting conditions Leu supplementation attenuates expression of the ubiquitin-proteasome pathway, which is the primary mode of intracellular protein degradation. However, this is untested in healthy, physiological feeding models. Therefore, an experiment was performed to determine whether feeding isonitrogenous protein sources with different Leu contents to healthy adult rats would differentially affect ubiquitin-proteasome (protein degradation) outcomes, and whether these outcomes are related to the meal responses of plasma Leu. Results showed that higher-Leu diets attenuated total proteasome content but had no effect on ubiquitin proteins. This research shows that dietary Leu determines postprandial muscle anabolism. In a parallel line of research, the effects of dietary Leu on changes in muscle mass over time were investigated. Animals consuming higher-Leu diets had larger gastrocnemius muscle weights; furthermore, gastrocnemius muscle weights were correlated with postprandial changes in MPS (r=0.471, P<0.01) and plasma Leu (r=0.400, P=0.01). These results show that the effect of Leu on ubiquitin-proteasome pathways is minimal in healthy adult rats consuming adequate diets. Thus, long-term changes in muscle mass observed in adult rats are likely due to differences in MPS rather than MPD. Factors determining the duration of Leu-stimulated MPS were further investigated. Despite continued elevations in plasma Leu and in associated translation initiation factors (e.g., S6K1 and 4E-BP1), MPS returned to basal levels ~3 hours after a meal. However, administration of additional nutrients in the form of carbohydrate, Leu, or both ~2 hours after a meal extended the elevation of MPS in a time- and dose-dependent manner. This effect led to the novel finding that decreases in translation elongation activity were associated with increases in the activity of AMP kinase, a key cellular energy sensor. This research shows that the Leu density of dietary protein determines anabolic signaling, thereby affecting cellular energetics and body composition.

Relevance: 30.00%

Abstract:

Despite recent advances in ocean observing arrays and satellite sensors, there remains great uncertainty in the large-scale spatial variations of upper ocean salinity on interannual to decadal timescales. Consonant with both broad-scale surface warming and the amplification of the global hydrological cycle, studies of observed global multidecadal salinity change have typically focused on the linear response to anthropogenic forcing, but not on salinity variations due to changes in static stability or on variability due to intrinsic ocean or internal climate processes. Here, we examine the static stability and spatiotemporal variability of upper ocean salinity across a hierarchy of models and reanalyses. In particular, we partition the variance into time bands via application of singular spectral analysis, considering sea surface salinity (SSS), the Brunt-Väisälä frequency (N2), and the ocean salinity stratification in terms of the stabilizing effect due to the haline part of N2 over the upper 500 m. We identify regions of significant coherent SSS variability, either intrinsic to the ocean or in response to the interannually varying atmosphere. Based on consistency across models (CMIP5 and forced experiments) and reanalyses, we identify the stabilizing role of salinity in the tropics, typically associated with heavy precipitation and barrier layer formation, and the destabilizing role of salinity in the subtropical regions where large-scale density compensation typically occurs.
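For reference, with a linearized equation of state the Brunt-Väisälä frequency splits into thermal and haline contributions, the latter being the "haline part of N2" considered above:

```latex
N^2 = -\frac{g}{\rho_0}\,\frac{\partial \rho}{\partial z}
    \approx \underbrace{g\,\alpha\,\frac{\partial T}{\partial z}}_{\text{thermal}}
    \;-\; \underbrace{g\,\beta\,\frac{\partial S}{\partial z}}_{\text{haline}}
```

Here z is positive upward, alpha is the thermal expansion coefficient, and beta is the haline contraction coefficient; a fresher surface layer (salinity decreasing upward) makes the haline term positive and hence stabilizing, as in the tropical regions discussed above.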

Relevance: 30.00%

Abstract:

The objective of this study was to estimate the spatial distribution of work accident risk in the informal work market in the urban zone of an industrialized city in southeast Brazil and to examine concomitant effects of age, gender, and type of occupation after controlling for spatial risk variation. The basic methodology adopted was that of a population-based case-control study with particular interest focused on the spatial location of work. Cases were all casual workers in the city suffering work accidents during a one-year period; controls were selected from the source population of casual laborers by systematic random sampling of urban homes. The spatial distribution of work accidents was estimated via a semiparametric generalized additive model with a nonparametric bidimensional spline of the geographical coordinates of cases and controls as the nonlinear spatial component, and including age, gender, and occupation as linear predictive variables in the parametric component. We analyzed 1,918 cases and 2,245 controls between 1/11/2003 and 31/10/2004 in Piracicaba, Brazil. Areas of significantly high and low accident risk were identified in relation to mean risk in the study region (p < 0.01). Work accident risk for informal workers varied significantly in the study area. Significant age, gender, and occupational group effects on accident risk were identified after correcting for this spatial variation. A good understanding of high-risk groups and high-risk regions underpins the formulation of hypotheses concerning accident causality and the development of effective public accident prevention policies.
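The model form translates readily into modern GAM software. Below is a rough sketch using the pygam package on synthetic placeholder data; the package choice, column layout, and data are assumptions for illustration, not the study's actual implementation.

```python
import numpy as np
from pygam import LogisticGAM, te, l, f

rng = np.random.default_rng(3)
n = 500  # synthetic stand-in for the assembled case-control records
X = np.column_stack([
    rng.uniform(0, 10, n),     # x coordinate of workplace
    rng.uniform(0, 10, n),     # y coordinate of workplace
    rng.integers(18, 65, n),   # age (linear term)
    rng.integers(0, 2, n),     # gender code (factor)
    rng.integers(0, 5, n),     # occupation group code (factor)
])
y = rng.integers(0, 2, n)      # 1 = case (work accident), 0 = control

# Tensor-product smooth over the coordinates gives the nonparametric
# spatial log-odds surface; the other predictors enter parametrically.
gam = LogisticGAM(te(0, 1) + l(2) + f(3) + f(4)).fit(X, y)
```

Elevated regions of the fitted spatial surface correspond to the high-risk areas the study maps, after adjusting for age, gender, and occupation.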

Relevance: 30.00%

Abstract:

This paper is concerned with SIR (susceptible-infected-removed) household epidemic models in which the infection response may be either mild or severe, with the type of response also affecting the infectiousness of an individual. Two different models are analysed. In the first model, the infection status of an individual is predetermined, perhaps due to partial immunity; in the second, the infection status of an individual depends on the infection status of its infector and on whether the individual was infected by a within- or between-household contact. The first scenario may be modelled using a multitype household epidemic model, and the second by a model we denote the infector-dependent-severity household epidemic model. Large-population results for the two models are derived, with the focus on the distribution of the total numbers of mild and severe cases in a typical household, of any given size, in the event that the epidemic becomes established. The aim of the paper is to investigate whether it is possible to determine which of the two underlying explanations is causing the varying response, given final-size household outbreak data containing mild and severe cases. We conduct numerical studies which show that, given data on sufficiently many households, it is generally possible to discriminate between the two models by comparing the Kullback-Leibler divergences between the two fitted models and these data.
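The discrimination step can be illustrated directly: given the empirical distribution of household outcomes and the outcome distributions implied by the two fitted models, compute the KL divergence of each fitted model from the data and favor the closer one. The probability vectors below are invented placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import entropy

# Probability vectors over the possible (mild, severe) outcome
# configurations in a household; values are illustrative only.
empirical = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
model_a   = np.array([0.48, 0.22, 0.16, 0.09, 0.05])
model_b   = np.array([0.40, 0.15, 0.20, 0.15, 0.10])

kl_a = entropy(empirical, model_a)   # KL(empirical || model A)
kl_b = entropy(empirical, model_b)   # KL(empirical || model B)
print("prefer model", "A" if kl_a < kl_b else "B")
```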

Relevance: 30.00%

Abstract:

Aim: To investigate the effect of implant-abutment angulation and crown material on the stress distribution of central incisors. The finite element method was used to simulate the clinical situation of a maxillary right central incisor restored with two different implant-abutment angulations, 15° and 25°, using two different crown materials (IPS E-Max CAD and zirconia). Methods: Two 3D finite element models were specially prepared for this research, simulating the two abutment angulations. A commercial engineering CAD/CAM package was used to model the crown, implant-abutment complex, and bone (cortical and spongy) in 3D. Linear static analysis was performed by applying a 178 N oblique load. The obtained results were compared with earlier experimental results. Results: The implant von Mises stress level changed negligibly with increasing abutment angulation. The abutment with higher angulation is mechanically weaker and is expected to fail at lower loading than the less angulated one. Similarly, the screw used with the 25° abutment is expected to fail at a load about one-third of the failure load of a similar screw used with the 15° abutment. Conclusions: Bone (cortical and spongy) is insensitive to crown material. Increasing abutment angulation from 15° to 25° increases stress on cortical bone by about 20% and reduces it by about 12% on spongy bone. Crown fracture resistance is dramatically reduced by increasing abutment angulation. The zirconia crown showed better performance than the E-Max one.

Relevance: 30.00%

Abstract:

Despite the extensive implementation of Superstreets on congested arterials, reliable methodologies for such designs remain unavailable. The purpose of this research is to fill this gap by offering reliable tools to assist traffic professionals in the design of Superstreets with and without signal control. The toolset developed in this thesis consists of three models. The first model is used to determine the minimum U-turn offset length for an unsignalized Superstreet, given the headway distribution of the arterial traffic flows and the distribution of critical gaps among drivers. The second model is designed to estimate the queue size and its variation on each critical link in a signalized Superstreet, based on the given signal plan and the range of observed volumes. Recognizing that the operational benefits of a Superstreet cannot be achieved without an effective signal plan, the third model produces a signal optimization method that can generate progression offsets for heavy arterial flows moving into and out of such an intersection design.
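The first model invites a small illustration. The Monte Carlo sketch below estimates the vehicle storage, and hence a rough lower bound on the U-turn offset length, needed at the crossover under a simple gap-acceptance rule. Every distributional choice here (exponential headways, normal critical gaps, one departure per accepted gap) is an assumption made for illustration, not the thesis's model.

```python
import numpy as np

rng = np.random.default_rng(4)

def storage_needed(arrivals_per_s, main_flow_per_s, gap_mean, gap_sd,
                   horizon_s=3600, veh_len_m=7.5):
    """Monte Carlo sketch of queueing at a Superstreet U-turn crossover.

    U-turning vehicles wait for a main-road headway exceeding their
    critical gap; the offset must store the worst queue observed.
    """
    t, queue, worst = 0.0, 0, 0
    next_arrival = rng.exponential(1.0 / arrivals_per_s)
    while t < horizon_s:
        headway = rng.exponential(1.0 / main_flow_per_s)  # main-road headway
        t += headway
        while next_arrival <= t:                          # vehicles join the queue
            queue += 1
            next_arrival += rng.exponential(1.0 / arrivals_per_s)
        worst = max(worst, queue)
        if queue and headway >= rng.normal(gap_mean, gap_sd):
            queue -= 1                                    # front vehicle departs
    return worst * veh_len_m                              # metres of storage

print(storage_needed(arrivals_per_s=0.05, main_flow_per_s=0.2,
                     gap_mean=6.0, gap_sd=1.0))
```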

Relevance: 30.00%

Abstract:

Pressure management (PM) is commonly used in water distribution systems (WDSs). In the last decade, a strategic objective in the field has been the development of new scientific and technical methods for its implementation. However, due to a lack of systematic analysis of the results obtained in practical cases, progress has not always been reflected in practical actions. To address this problem, this paper provides a comprehensive analysis of the most innovative issues related to PM. The methodology proposed is based on a case-study comparison of qualitative concepts drawing on published work from 140 sources. The results include a qualitative analysis covering four aspects: (1) the objectives pursued through PM; (2) types of regulation, including advanced control systems based on electronic controllers; (3) new methods for designing districts; and (4) the development of optimization models associated with PM. The evolution of these four aspects is examined and discussed. Conclusions regarding the current status of each factor are drawn, and proposals for future research are outlined.

Relevance: 30.00%

Abstract:

Doctorate in Management