900 results for "bayesian inference"


Relevance: 60.00%

Abstract:

We examined phylogenetic relationships among six species representing three subfamilies (Glirinae, Graphiurinae and Leithiinae) using sequences from three nuclear protein-coding genes (apolipoprotein B, APOB; interphotoreceptor retinoid-binding protein, IRBP; recombination-activating gene 1, RAG1). Phylogenetic trees reconstructed with maximum-parsimony (MP), maximum-likelihood (ML) and Bayesian-inference (BI) analyses strongly supported the monophyly of Glirinae (Glis and Glirulus) and Leithiinae (Dryomys, Eliomys and Muscardinus), although the branch supporting this relationship was very short, implying rapid diversification among the three subfamilies. Divergence times were estimated with ML (local clock model) and a Bayesian dating method, using fossil-based calibration points of 25 Myr (million years) ago for the divergence between Glis and Glirulus and 55 Myr ago for the split between the Gliridae and Sciuridae lineages. The results showed that each of the lineages Graphiurus, Glis, Glirulus and Muscardinus dates from the Late Oligocene to the Early Miocene, which mostly agrees with the fossil record. Given that a warm climate supporting glirid-favoured forests prevailed from Europe to Asia during this period, we consider that this warm environment triggered the prosperity of glirid species through rapid diversification. Glirulus japonicus is suggested to be a relict of this ancient diversification during the warm period.

Relevance: 60.00%

Abstract:

Context: To date, the testosterone/epitestosterone (T/E) ratio has been the main marker for detecting testosterone (T) misuse in athletes. As this marker can be influenced by a number of confounding factors, additional steroid profile parameters indicating T misuse can provide substantiating evidence of doping with endogenous steroids. The evaluation of a steroid profile is currently based on population statistics. Since large inter-individual variations exist, a paradigm shift towards subject-based references is under way in doping analysis. Objective: To propose new biomarkers for the detection of testosterone misuse in sports, using extensive steroid profiling and an adaptive model based on Bayesian inference. Subjects: Six healthy male volunteers were administered testosterone undecanoate. Population statistics were computed from the steroid profiles of 2014 male Caucasian athletes participating in official sport competitions. Design: An extended search for new biomarkers in a comprehensive steroid profile, combined with the Bayesian inference techniques used in the Athlete Biological Passport, yielded a selection of additional biomarkers that may improve the detection of testosterone misuse in sports. Results: Apart from T/E, four other steroid ratios (6α-OH-androstenedione/16α-OH-dehydroepiandrosterone, 4-OH-androstenedione/16α-OH-androstenedione, 7α-OH-testosterone/7β-OH-dehydroepiandrosterone and dihydrotestosterone/5β-androstane-3α,17β-diol) were identified as sensitive urinary biomarkers of T misuse. These new biomarkers were rated according to relative response, parameter stability, detection time and discriminative power. Conclusion: The newly selected biomarkers were found suitable for individual referencing within the concept of the Athlete Biological Passport. They showed improved detection time and discriminative power compared with the T/E ratio. Such biomarkers can support evidence of doping with small oral doses of testosterone.
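The adaptive, subject-based referencing described above can be illustrated with a much simpler model than the one used in the Athlete Biological Passport. The sketch below is an assumption-laden illustration, not the ABP software. It treats a log-transformed marker such as log(T/E) as normal, gives each athlete's own mean a population prior, and assumes known between- and within-athlete standard deviations. The function `adaptive_limits` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def adaptive_limits(log_values, prior_mean, prior_sd, within_sd, specificity=0.99):
    """Return (lower, upper) limits for an athlete's next log-scale marker value.

    Normal-normal conjugate model (a simplification, not the ABP's actual model):
    the athlete's own mean mu ~ N(prior_mean, prior_sd**2) across the population,
    and each test result x_i ~ N(mu, within_sd**2).
    """
    n = len(log_values)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / within_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(log_values) / within_sd**2)
    # The predictive spread of the next test adds the within-athlete noise.
    pred_sd = math.sqrt(post_var + within_sd**2)
    # Two-sided interval covering `specificity` of the predictive distribution.
    z = NormalDist().inv_cdf(0.5 + specificity / 2)
    return post_mean - z * pred_sd, post_mean + z * pred_sd


# Hypothetical example: no history (population-based limits) vs. ten prior tests.
print(adaptive_limits([], prior_mean=0.0, prior_sd=0.5, within_sd=0.2))
print(adaptive_limits([0.1] * 10, prior_mean=0.0, prior_sd=0.5, within_sd=0.2))
```

With no history, the limits reduce to population-based ones. As results accumulate, the interval narrows around the athlete's own values, which is the shift towards subject-based references described in the abstract.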

Relevance: 60.00%

Abstract:

Doping with natural steroids can be detected by evaluating the urinary concentrations and ratios of several endogenous steroids. Since these biomarkers of steroid doping are known to show large inter-individual variations, monitoring individual steroid profiles over time allows a switch from population-based to subject-based reference ranges for improved detection. In an Athlete Biological Passport (ABP), biomarker data are collated throughout the athlete's sporting career and individual thresholds are defined adaptively. So far, this approach has been validated on a limited number of markers of steroid doping, such as the testosterone (T) over epitestosterone (E) ratio to detect T misuse in athletes. Additional markers are required for other endogenous steroids such as dihydrotestosterone (DHT) and dehydroepiandrosterone (DHEA). By combining comprehensive steroid profiles composed of 24 steroid concentrations with Bayesian inference techniques for longitudinal profiling, biomarkers were selected for the detection of DHT and DHEA misuse. The biomarkers found were rated according to relative response, parameter stability, discriminative power and maximal detection time. This analysis revealed DHT/E, DHT/5β-androstane-3α,17β-diol and 5α-androstane-3α,17β-diol/5β-androstane-3α,17β-diol as the best biomarkers for DHT administration, and DHEA/E, 16α-hydroxydehydroepiandrosterone/E, 7β-hydroxydehydroepiandrosterone/E and 5β-androstane-3α,17β-diol/5α-androstane-3α,17β-diol for DHEA. The selected biomarkers were found suitable for individual referencing, and a drastic overall increase in sensitivity was obtained. The use of multiple markers, as formalized in an Athlete Steroidal Passport (ASP), can provide firm evidence of doping with endogenous steroids. Copyright © 2010 John Wiley & Sons, Ltd.

Relevance: 60.00%

Abstract:

BACKGROUND AND AIMS: Although it is well known that fire acts as a selective pressure shaping plant phenotypes, there are no quantitative estimates of the heritability of any trait related to plant persistence under recurrent fires, such as serotiny. In this study, we calculate the heritability of serotiny in Pinus halepensis and evaluate whether fire has left a selection signature on the level of serotiny among populations by comparing the genetic divergence of serotiny with the expected divergence of neutral molecular markers (QST-FST comparison). METHODS: We used a common garden of P. halepensis located in inland Spain and composed of 145 open-pollinated families from 29 provenances covering the entire natural range of P. halepensis in the Iberian Peninsula and Balearic Islands. Narrow-sense heritability (h²) and quantitative genetic differentiation among populations for serotiny (QST) were estimated by means of an 'animal model' fitted by Bayesian inference. To determine whether genetic differentiation for serotiny results from differential natural selection, QST estimates for serotiny were compared with FST estimates obtained from allozyme data. Finally, we tested whether levels of serotiny in the different provenances were related to different fire regimes, using summer rainfall as a proxy for the fire regime of each provenance. KEY RESULTS: Serotiny showed a significant narrow-sense heritability (h²) of 0.20 (credible interval 0.09-0.40). Quantitative genetic differentiation among provenances for serotiny (QST = 0.44) was significantly higher than expected under a neutral process (FST = 0.12), suggesting adaptive differentiation. A significant negative relationship was found between the serotiny level of trees in the common garden and the summer rainfall of their provenance sites. CONCLUSIONS: Serotiny is a heritable trait in P. halepensis, and selection acts on it, giving rise to contrasting serotiny levels among populations depending on the fire regime and supporting the role of fire in generating genetic divergence for adaptive traits.
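For readers unfamiliar with the QST-FST comparison, the usual variance-component definitions are sketched below. The abstract does not state its exact parameterisation. The half-sib assumption for open-pollinated families, under which the additive variance is taken as four times the among-family variance, is ours.

```latex
h^2 = \frac{\sigma^2_A}{\sigma^2_P}, \qquad
Q_{ST} = \frac{\sigma^2_{GB}}{\sigma^2_{GB} + 2\,\sigma^2_{A}}, \qquad
\sigma^2_A \approx 4\,\sigma^2_{\mathrm{fam}}
```

Here $\sigma^2_P$ is the total phenotypic variance, $\sigma^2_{GB}$ the between-provenance genetic variance and $\sigma^2_A$ the within-provenance additive genetic variance. Under neutrality $Q_{ST}$ is expected to match $F_{ST}$, so a value well above it (0.44 against 0.12 here) is read as a signature of divergent selection.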

Relevance: 60.00%

Abstract:

Background: Arundinarieae are a large tribe of temperate woody bamboos whose phylogenetic relationships are poorly understood because of limited taxon sampling and a lack of informative characters. Aims: This study assessed the phylogenetic relationships, origins and classification of Arundinarieae. Methods: DNA sequences (plastid trnL-F; nuclear ITS) from 41 woody bamboo taxa were analysed with parsimony and Bayesian inference. Divergence dates were estimated using a relaxed Bayesian clock. Results: Arundinarieae were monophyletic, but their molecular divergence was low compared with the tropical Bambuseae. Ancestors of the Arundinarieae lineage were estimated to have diverged from the other bamboos 23 (15-30) million years ago (Mya). However, the Arundinarieae radiation occurred 10 (6-16) Mya, compared with 18 (11-25) Mya for the tropical Bambuseae. Some groups could be defined within Arundinarieae, but they do not correspond to recognised subtribes such as Arundinariinae or Shibataeinae. Conclusions: Arundinarieae are a relatively ancient bambusoid lineage that underwent a rapid radiation in the late Miocene. The radiation coincides with the continental collision of the Indo-Australian and Eurasian Plates. Arundinarieae are distributed primarily in East Asia and from the Himalayas to northern Southeast Asia. It is unknown whether they were present in Asia long before their radiation, but we believe recent dispersal is the more likely scenario. Keywords: Arundinarieae; Bambuseae; internal transcribed spacer (ITS); molecular clock; phylogenetics; radiation; temperate bamboos; Thamnocalaminae; trnL-F

Relevance: 60.00%

Abstract:

Contrast enhancement is an image processing technique whose objective is to preprocess the image so that relevant information can be seen or further processed more reliably. These techniques are typically applied when the image itself, or the device used for image reproduction, provides poor visibility and distinguishability of the different regions of interest in the image. Most studies emphasise the visualization of image data, but this human-observer-biased goal often results in images that are not optimal for automated processing. The main contribution of this study is to express contrast enhancement as a mapping from N-channel image data to a 1-channel gray-level image, and to devise a projection method that results in an image with minimal error with respect to the correct contrast image. This projection, the minimum-error contrast image, possesses the optimal contrast between the regions of interest in the image. The method is based on estimating the probability density distributions of the region values, and it employs Bayesian inference to establish the minimum-error projection.
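The paper's projection is not reproduced here. As a hedged illustration of the underlying idea (estimate region densities, then use Bayes' rule to map N-channel pixels to one gray channel), the sketch below assumes two regions with Gaussian class-conditional densities fitted to labelled sample pixels. The gray level is the posterior probability of region A, and thresholding it at 0.5 gives the Bayes minimum-error decision under these assumptions. All function names and data are hypothetical.

```python
import numpy as np


def fit_gaussian(samples):
    """Mean vector and (slightly regularised) covariance of labelled pixels, one per row."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return mu, cov


def log_density(x, mu, cov):
    """Multivariate normal log-density for every row of x."""
    d = x - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", d, inv, d)  # squared Mahalanobis distances
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2 * np.pi))


def posterior_gray(image, samples_a, samples_b, prior_a=0.5):
    """Map an H x W x N image to H x W gray levels P(region A | pixel values)."""
    h, w, n = image.shape
    x = image.reshape(-1, n).astype(float)
    log_a = log_density(x, *fit_gaussian(samples_a)) + np.log(prior_a)
    log_b = log_density(x, *fit_gaussian(samples_b)) + np.log(1 - prior_a)
    # Two-class posterior via the logistic of the log-odds (clipped to avoid overflow).
    post = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -500, 500)))
    return post.reshape(h, w)


rng = np.random.default_rng(0)
samples_a = rng.normal([1.0, 1.0], 0.1, size=(200, 2))  # hypothetical labelled pixels
samples_b = rng.normal([0.0, 0.0], 0.1, size=(200, 2))
image = np.zeros((2, 2, 2))
image[:, 0, :] = 1.0  # left column resembles region A
gray = posterior_gray(image, samples_a, samples_b)
print(np.round(gray, 3))
```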

Relevance: 60.00%

Abstract:

This study offers a statistical analysis of the persistence of annual profits across a sample of firms from different European Union (EU) countries. To this end, a Bayesian dynamic model was used that breaks down the annual behaviour of those profits into a permanent structural component and a transitory component, while also distinguishing between general effects affecting the whole industry to which each firm belongs and specific effects affecting each firm in particular. This breakdown enables the relative importance of those fundamental components to be evaluated. The data analysed come from a sample of 23,293 firms in EU countries selected from the AMADEUS database. The period analysed ran from 1999 to 2007, and 21 sectors were analysed, chosen so that each country-sector combination contained enough firms for the industry effects to be estimated accurately enough to allow meaningful comparisons by sector and country. The analysis was conducted by sector and by country from a Bayesian perspective, which makes the study more flexible and realistic since the estimates obtained do not depend on asymptotic results. In general terms, the study finds that, although industry effects are significant, firm-specific effects are more important. That importance varies depending on the sector or country in which the firm operates. Firm effects account for more than 90% of total variation and display a significantly lower degree of persistence, with adjustment speeds of around 51.1%. However, this pattern is not homogeneous but depends on the sector and country analysed. Industry effects are of more marginal importance and significantly more persistent, with adjustment speeds of around 10%, and this degree of persistence is more homogeneous at both country and sector levels.
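The persistence and adjustment-speed figures above come from a richer hierarchical model with firm and industry components. As a minimal sketch under stated assumptions, take a single demeaned profit series following y_t = λ y_{t-1} + e_t with Gaussian errors of known variance and a conjugate normal prior on λ. The adjustment speed is then 1 - λ. The simulated series and all settings are hypothetical; λ = 0.489 merely mirrors the 51.1% adjustment speed reported for firm effects.

```python
import numpy as np


def posterior_persistence(y, prior_mean=0.0, prior_sd=1.0, noise_sd=1.0):
    """Posterior mean and sd of lam in y_t = lam * y_{t-1} + e_t, e_t ~ N(0, noise_sd**2).

    Conjugate normal prior lam ~ N(prior_mean, prior_sd**2); y is a demeaned series.
    """
    x, z = y[:-1], y[1:]
    precision = 1.0 / prior_sd**2 + (x @ x) / noise_sd**2
    mean = (prior_mean / prior_sd**2 + (x @ z) / noise_sd**2) / precision
    return mean, 1.0 / np.sqrt(precision)


rng = np.random.default_rng(1)
true_lam = 0.489  # hypothetical: adjustment speed 1 - 0.489 = 51.1 %
y = np.zeros(5000)
for t in range(1, y.size):
    y[t] = true_lam * y[t - 1] + rng.normal()
lam, sd = posterior_persistence(y)
print(f"persistence {lam:.3f} +/- {sd:.3f}, adjustment speed {1 - lam:.1%}")
```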

Relevance: 60.00%

Abstract:

Approximate models (proxies) can be employed to reduce the computational costs of estimating uncertainty. The price to pay is that the approximations introduced by the proxy model can lead to a biased estimation. To avoid this problem and ensure a reliable uncertainty quantification, we propose to combine functional data analysis and machine learning to build error models that allow us to obtain an accurate prediction of the exact response without solving the exact model for all realizations. We build the relationship between proxy and exact model on a learning set of geostatistical realizations for which both exact and approximate solvers are run. Functional principal components analysis (FPCA) is used to investigate the variability in the two sets of curves and reduce the dimensionality of the problem while maximizing the retained information. Once obtained, the error model can be used to predict the exact response of any realization on the basis of its proxy response alone. This methodology is purpose-oriented, as the error model is constructed directly for the quantity of interest rather than for the state of the system. Also, the dimensionality reduction performed by FPCA allows a diagnosis of the quality of the error model, assessing the informativeness of the learning set and the fidelity of the proxy to the exact model. The possibility of obtaining a prediction of the exact response for any newly generated realization suggests that the methodology can be used effectively beyond the context of uncertainty quantification, in particular for Bayesian inference and optimization.
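The error-model workflow (learning set of paired curves, dimensionality reduction, regression in the reduced space, prediction from the proxy alone) can be sketched as follows. This is a simplified stand-in, not the authors' implementation. It assumes curves discretised on a common grid, uses ordinary PCA via SVD in place of FPCA, and regresses exact-curve scores linearly on proxy-curve scores. The class name and the synthetic curves are hypothetical.

```python
import numpy as np


class FunctionalErrorModel:
    """Predict exact-model curves from proxy curves sampled on a common grid."""

    def __init__(self, n_components=3):
        self.k = n_components

    def fit(self, proxy, exact):
        self.mp, self.me = proxy.mean(axis=0), exact.mean(axis=0)
        # Rows of the SVD's right factor are the principal component curves.
        _, _, vp = np.linalg.svd(proxy - self.mp, full_matrices=False)
        _, _, ve = np.linalg.svd(exact - self.me, full_matrices=False)
        self.vp, self.ve = vp[: self.k], ve[: self.k]
        sp = (proxy - self.mp) @ self.vp.T  # proxy scores
        se = (exact - self.me) @ self.ve.T  # exact scores
        # Least-squares map (with intercept) from proxy scores to exact scores.
        design = np.column_stack([np.ones(len(sp)), sp])
        self.beta, *_ = np.linalg.lstsq(design, se, rcond=None)
        return self

    def predict(self, proxy):
        sp = (proxy - self.mp) @ self.vp.T
        se = np.column_stack([np.ones(len(sp)), sp]) @ self.beta
        return self.me + se @ self.ve  # back from scores to curves


t = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(2)
a, b = rng.uniform(0.5, 2.0, 30), rng.uniform(-1.0, 1.0, 30)
proxy = a[:, None] * np.sin(np.pi * t) + b[:, None] * t  # hypothetical proxy curves
exact = 1.2 * proxy + 0.1                                 # hypothetical exact curves
model = FunctionalErrorModel(n_components=2).fit(proxy, exact)
new_proxy = 1.5 * np.sin(np.pi * t)[None, :] + 0.3 * t[None, :]
prediction = model.predict(new_proxy)
```

In this toy case the proxy-to-exact relationship is exactly linear, so two components recover it. With real flow responses, the number of retained components and the residual scores would serve as the diagnostic mentioned in the abstract.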

Relevance: 60.00%

Abstract:

Our consumption of groundwater, in particular as drinking water and for irrigation, has considerably increased over the years, and groundwater is becoming an increasingly scarce and endangered resource. Nowadays, we are facing many problems ranging from water prospection to sustainable management and remediation of polluted aquifers. Independently of the hydrogeological problem, the main challenge remains dealing with the incomplete knowledge of the underground properties. Stochastic approaches have been developed to represent this uncertainty by considering multiple geological scenarios and generating a large number of realizations. The main limitation of this approach is the computational cost associated with performing complex flow simulations for each realization. In the first part of the thesis, we explore this issue in the context of uncertainty propagation, where an ensemble of geostatistical realizations is identified as representative of the subsurface uncertainty. To propagate this lack of knowledge to the quantity of interest (e.g., the concentration of pollutant in extracted water), it is necessary to evaluate the flow response of each realization.
Due to computational constraints, state-of-the-art methods use approximate flow simulations to identify a subset of realizations that represents the variability of the ensemble. The complex and computationally heavy flow model is then run for this subset, on which inference is based. Our objective is to increase the performance of this approach by using all of the available information, not solely the subset of exact responses. Two error models are proposed to correct the approximate responses following a machine learning approach. For the subset identified by a classical approach (here the distance kernel method), both the approximate and the exact responses are known. This information is used to construct an error model and correct the ensemble of approximate responses to predict the "expected" responses of the exact model. The proposed methodology makes use of all the available information without perceptible additional computational cost and leads to an increase in the accuracy and robustness of the uncertainty propagation. The strategy explored in the first chapter consists of learning the relationship between proxy and exact curves from a subset of realizations. In the second part of this thesis, the strategy is formalized in a rigorous mathematical framework by defining a regression model between functions. As this problem is ill-posed, it is necessary to reduce its dimensionality. The novelty of the work comes from the use of functional principal component analysis (FPCA), which not only performs the dimensionality reduction while maximizing the retained information, but also allows a diagnosis of the quality of the error model in the functional space. The proposed methodology is applied to a pollution problem involving a non-aqueous phase liquid. The error model allows a strong reduction of the computational cost while providing a good estimate of the uncertainty.
The individual correction of the proxy response by the error model leads to an excellent prediction of the exact response, opening the door to many applications. The concept of a functional error model is useful not only in the context of uncertainty propagation but also, and perhaps even more so, for Bayesian inference. Markov chain Monte Carlo (MCMC) algorithms are the most common choice to ensure that the generated realizations are sampled in accordance with the observations. However, this approach suffers from low acceptance rates in high-dimensional problems, resulting in a large number of wasted flow simulations. This led to the introduction of two-stage MCMC, where the computational cost is decreased by avoiding unnecessary simulations of the exact flow model, thanks to a preliminary evaluation of the proposal. In the third part of the thesis, a proxy is coupled to an error model to provide an approximate response for the two-stage MCMC set-up. We demonstrate an increase in acceptance rate by a factor of 1.5 to 3 with respect to one-stage MCMC results. An open question remains: how to choose the size of the learning set and identify the realizations that optimize the construction of the error model. This requires devising an iterative strategy to construct the error model, such that, as new flow simulations are performed, the error model is iteratively improved by incorporating the new information. This is discussed in the fourth part of the thesis, in which we apply this methodology to a problem of saline intrusion in a coastal aquifer.
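The two-stage MCMC idea can be sketched as delayed-acceptance Metropolis-Hastings. A cheap proxy posterior screens each proposal, and only proposals that survive are evaluated with the exact model. This is a minimal sketch assuming a symmetric Gaussian random-walk proposal and user-supplied log-posterior functions. In the thesis the proxy is a flow proxy plus error model; the hypothetical `log_post_proxy` argument stands in for it, and the Gaussian targets are toy examples.

```python
import numpy as np


def two_stage_mh(log_post_exact, log_post_proxy, x0, n_iter, step, rng):
    """Two-stage (delayed-acceptance) Metropolis-Hastings with a symmetric random walk.

    Stage 1 screens proposals with the cheap proxy; stage 2 corrects for the proxy
    so the chain still targets the exact posterior.
    """
    x = np.asarray(x0, dtype=float)
    lp_exact, lp_proxy = log_post_exact(x), log_post_proxy(x)
    chain, exact_calls, accepted = [], 0, 0
    for _ in range(n_iter):
        y = x + step * rng.standard_normal(x.shape)
        ly_proxy = log_post_proxy(y)
        # Stage 1: accept with probability min(1, proxy(y) / proxy(x)).
        if np.log(rng.uniform()) < ly_proxy - lp_proxy:
            ly_exact = log_post_exact(y)  # the expensive call
            exact_calls += 1
            # Stage 2: min(1, exact(y) * proxy(x) / (exact(x) * proxy(y))).
            if np.log(rng.uniform()) < (ly_exact - lp_exact) - (ly_proxy - lp_proxy):
                x, lp_exact, lp_proxy = y, ly_exact, ly_proxy
                accepted += 1
        chain.append(x.copy())
    return np.array(chain), exact_calls, accepted


rng = np.random.default_rng(3)
exact = lambda x: float(-0.5 * np.sum((x - 1.0) ** 2))        # target N(1, 1)
proxy = lambda x: float(-0.5 * np.sum((x - 0.9) ** 2) / 1.21)  # slightly biased proxy
chain, calls, acc = two_stage_mh(exact, proxy, [0.0], 20000, 1.0, rng)
print(chain[2000:].mean(), calls, acc)
```

Proposals rejected in stage 1 never reach the exact model, which is where the savings in flow simulations come from.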

Relevance: 60.00%

Abstract:

This thesis focuses on statistical analysis methods and proposes the use of Bayesian inference to extract the information contained in experimental data by estimating the parameters of an Ebola model. The model is a system of differential equations expressing the behavior and dynamics of Ebola. Two data sets (onset and death data) were used together to estimate the parameters, which had not been done in previous work (Chowell, 2004). To use both data sets, a new version of the model was built. The model parameters were estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. The parameter estimates were used to determine how well the model fits the data and how informative the estimates are about possible relationships between variables. The results show that the Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference cannot be performed analytically here, a Markov chain Monte Carlo approach was used to generate samples from the posterior distribution of the parameters. The samples were used to check the accuracy of the model and other characteristics of the target posteriors.
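The abstract does not give the thesis's model equations. The sketch below illustrates the workflow it describes: an ODE epidemic model, a likelihood for observed onset counts, MCMC samples from the posterior, and a basic reproduction number derived from the parameters. It assumes a plain SEIR model integrated by forward Euler, a Poisson likelihood for daily onsets, a flat prior on β > 0, and fixed, hypothetical values of σ and γ. For this SEIR model R0 = β/γ. It is not the author's model, and the data are synthetic.

```python
import numpy as np


def seir_onsets(beta, sigma, gamma, n_pop, i0, days, dt=0.25):
    """Daily new onsets (E -> I transitions) of an SEIR model, forward-Euler integration."""
    s, e, i = n_pop - i0, 0.0, float(i0)
    onsets = []
    for _ in range(days):
        day_total = 0.0
        for _ in range(int(round(1.0 / dt))):
            infections = beta * s * i / n_pop * dt
            new_onsets = sigma * e * dt
            removals = gamma * i * dt
            s, e, i = s - infections, e + infections - new_onsets, i + new_onsets - removals
            day_total += new_onsets
        onsets.append(day_total)
    return np.array(onsets)


def log_likelihood(beta, counts, **model):
    """Poisson log-likelihood of observed daily onset counts (constant terms dropped)."""
    rate = seir_onsets(beta, **model) + 1e-9
    return float(np.sum(counts * np.log(rate) - rate))


def sample_beta(counts, beta0, n_iter, step, rng, **model):
    """Random-walk Metropolis for beta with a flat prior on beta > 0."""
    beta, ll = beta0, log_likelihood(beta0, counts, **model)
    draws = []
    for _ in range(n_iter):
        proposal = beta + step * rng.standard_normal()
        if proposal > 0:  # proposals outside the prior support are rejected
            ll_prop = log_likelihood(proposal, counts, **model)
            if np.log(rng.uniform()) < ll_prop - ll:
                beta, ll = proposal, ll_prop
        draws.append(beta)
    return np.array(draws)


# Hypothetical settings: synthetic data from beta = 0.3, so R0 = 0.3 / 0.15 = 2.
model = dict(sigma=0.2, gamma=0.15, n_pop=1e5, i0=10, days=60)
rng = np.random.default_rng(4)
counts = rng.poisson(seir_onsets(0.3, **model))
draws = sample_beta(counts, 0.25, 1500, 0.003, rng, **model)[500:]
print(f"beta {draws.mean():.3f}, R0 = beta / gamma = {draws.mean() / model['gamma']:.2f}")
```

A second data stream, such as deaths, would add its own likelihood term to `log_likelihood`, which is one way to read the abstract's use of both onset and death data.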

Relevance: 60.00%

Abstract:

Stable carbon and nitrogen isotopes in the skin and bone of South American sea lions from Brazil and Uruguay were analysed to test the hypothesis that trophic overlap between the sexes is lower during the pre-breeding season than throughout the rest of the year. The isotopic values of skin and bone were used to infer the trophic relationships between the sexes during the pre-breeding period and year-round, respectively. Prey species were also analysed to establish the baseline needed to interpret the stable isotope ratios of skin and bone. Standard ellipse areas, estimated using Bayesian inference in the SIBER routine of the SIAR package in R, suggested that males and females used a wide diversity of foraging strategies throughout the year and that no differences existed between the sexes. However, the diversity of foraging strategies was greatly reduced during the pre-breeding period, with all individuals of each sex adopting similar strategies, but with the two sexes differing considerably in stable isotope values and the ellipse areas of males and females not overlapping at all. Nevertheless, the results revealed a general increase in the consumption of pelagic prey by both sexes during the pre-breeding period. The progressive crowding of individuals in the areas surrounding the breeding rookeries during the pre-breeding period could increase the local population density, which could explain the changes reported above.
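Standard ellipse areas have a simple geometric core. Assuming the standard definitions, the ellipse is that of the sample covariance matrix S, with SEA = π·sqrt(det S) and the small-sample correction SEAc = SEA·(n - 1)/(n - 2). A minimal sketch follows. SIBER's Bayesian estimate, as used in the study, applies the same area formula to posterior draws of the covariance matrix rather than to S alone. The function name and values are hypothetical.

```python
import numpy as np


def standard_ellipse_area(d13c, d15n):
    """Return (SEA, SEAc) for paired delta13C / delta15N values of one group.

    SEA = pi * sqrt(det(S)) for the sample covariance S, i.e. pi times the product
    of the ellipse semi-axes (square roots of the eigenvalues of S);
    SEAc applies the small-sample correction (n - 1) / (n - 2).
    """
    cov = np.cov(np.vstack([d13c, d15n]))
    n = len(d13c)
    sea = np.pi * np.sqrt(np.linalg.det(cov))
    return sea, sea * (n - 1) / (n - 2)


# Hypothetical centred isotope values for four individuals.
sea, seac = standard_ellipse_area([-1.0, 1.0, -1.0, 1.0], [-1.0, -1.0, 1.0, 1.0])
print(sea, seac)
```

Non-overlapping ellipses for males and females, as reported for the pre-breeding period, correspond to disjoint regions of the δ13C-δ15N plane drawn from each group's covariance.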

Relevance: 60.00%

Abstract:

In decision making under uncertainty, the goal is usually to minimize the expected cost. The minimization of cost under uncertainty is usually done by optimization. For simple models, the optimization can easily be done using deterministic methods. However, many practical models contain complex and varying parameters that cannot easily be taken into account by the usual deterministic optimization methods. It is therefore important to look for other methods that can provide insight into such models. The MCMC method is one of the practical methods that can be used to optimize stochastic models under uncertainty. It is a simulation-based method that provides a general methodology applicable to nonlinear and non-Gaussian state models. The MCMC method is very important for practical applications because it is a unified estimation procedure that simultaneously estimates both parameters and state variables: MCMC computes the distribution of the state variables and parameters given the data measurements. The MCMC method is faster in terms of computing time than other optimization methods. This thesis discusses the use of Markov chain Monte Carlo (MCMC) methods for the optimization of stochastic models under uncertainty. The thesis begins with a short discussion of Bayesian inference, MCMC and stochastic optimization methods. An example is then given of how MCMC can be applied to maximize production at minimum cost in a chemical reaction process. This method is observed to perform better in optimizing the given cost function, with very high certainty.
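The abstract does not detail its algorithm. One common way to use Markov chain sampling for optimization is simulated annealing: run Metropolis steps targeting exp(-cost/T) while lowering the temperature T, and keep the best point visited. The sketch below assumes a deterministic cost function and a geometric cooling schedule. The quadratic cost is a hypothetical stand-in for the chemical-process model, and `anneal` is an illustrative name.

```python
import numpy as np


def anneal(cost, x0, n_iter=5000, step=0.2, t0=1.0, t_end=1e-3, seed=0):
    """Minimise `cost` by Metropolis sampling of exp(-cost(x)/T) while T decreases."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = cost(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_iter):
        temp = t0 * (t_end / t0) ** (k / (n_iter - 1))  # geometric cooling schedule
        y = x + step * rng.standard_normal(x.shape)
        fy = cost(y)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if fy <= fx or rng.uniform() < np.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f


target = np.array([3.0, -1.0])  # hypothetical optimum of the stand-in cost
best_x, best_cost = anneal(lambda x: float(np.sum((x - target) ** 2)), [0.0, 0.0])
print(best_x, best_cost)
```

For a noisy (stochastic) cost, each evaluation would typically be replaced by an average over simulated scenarios, so that the chain targets the expected cost.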

Relevance: 60.00%

Abstract:

The aim of this work is to apply approximate Bayesian computation in combination with Markov chain Monte Carlo methods to estimate the parameters of tuberculosis transmission. The methods are applied to San Francisco data, and the results are compared with the outcomes of previous works. Moreover, a methodological idea aimed at reducing computational time is also described. Although this approach is shown to work appropriately, further analysis is needed to understand and test its behaviour in different cases. Suggestions for its further enhancement are described in the corresponding chapter.
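Approximate Bayesian computation replaces likelihood evaluations with simulations. The minimal ABC-MCMC sketch below assumes a scalar parameter, a symmetric random-walk proposal, the sample mean as summary statistic and a tolerance `eps`. The Poisson toy model is hypothetical and stands in for the tuberculosis transmission model.

```python
import numpy as np


def abc_mcmc(observed, simulate, log_prior, theta0, n_iter, step, eps, rng):
    """Likelihood-free ABC-MCMC for a scalar parameter with a symmetric random walk.

    theta0 should already produce simulations close to the data; otherwise the
    chain may never accept a move.
    """
    s_obs = np.mean(observed)  # summary statistic: the sample mean (a modelling choice)
    theta, chain = theta0, []
    for _ in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        # Metropolis test on the prior ratio first, so rejected proposals cost no simulation.
        if np.log(rng.uniform()) < log_prior(proposal) - log_prior(theta):
            s_sim = np.mean(simulate(proposal, len(observed), rng))
            if abs(s_sim - s_obs) <= eps:  # accept only if the simulation matches the data
                theta = proposal
        chain.append(theta)
    return np.array(chain)


# Hypothetical toy problem: Poisson counts with unknown rate, uniform prior on (0, 20).
rng = np.random.default_rng(5)
observed = rng.poisson(4.0, size=100)
log_prior = lambda lam: 0.0 if 0.0 < lam < 20.0 else -np.inf
simulate = lambda lam, n, rng: rng.poisson(lam, size=n)
chain = abc_mcmc(observed, simulate, log_prior, float(observed.mean()), 5000, 0.3, 0.1, rng)
print(chain[1000:].mean())
```

Checking the prior ratio before simulating is one simple way to cut computational time, since most of the cost in ABC lies in the simulations.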

Relevance: 60.00%

Abstract:

The two main objectives of Bayesian inference are to estimate parameters and states. In this thesis, we are interested in how this can be done in the framework of state-space models when there is a complete or partial lack of knowledge of the initial state of a continuous nonlinear dynamical system. In the literature, similar problems have been referred to as diffuse initialization problems. This is addressed first by extending previously developed diffuse initialization Kalman filtering techniques from discrete to continuous systems. Second, the parameters are estimated using MCMC methods with a likelihood function obtained from the diffuse filtering. These methods are applied to data collected during the 1995 Ebola outbreak in Kikwit, DRC, to estimate the parameters of the system.
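The exact diffuse initialization extended in the thesis is more involved. As a rough illustration, the sketch below assumes a discrete-time scalar local-level model. It uses the simpler "large initial variance" approximation (`kappa`) and drops the first, kappa-dominated term of the prediction-error log-likelihood. That log-likelihood is what an MCMC sampler over the noise variances (q, r) would evaluate. Names and values are hypothetical.

```python
import numpy as np


def local_level_filter(y, q, r, kappa=1e7):
    """Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t (w ~ N(0, q), v ~ N(0, r)).

    The unknown initial state gets a huge prior variance kappa (approximate diffuse
    initialisation); the first log-likelihood term, dominated by kappa, is dropped.
    """
    x, p = 0.0, kappa
    filtered, loglik = [], 0.0
    for t, obs in enumerate(y):
        if t > 0:
            p = p + q  # predict: random-walk state adds variance q
        f = p + r      # innovation variance
        v = obs - x    # innovation
        if t > 0:
            loglik += -0.5 * (np.log(2 * np.pi * f) + v * v / f)
        k = p / f      # Kalman gain
        x, p = x + k * v, (1 - k) * p
        filtered.append(x)
    return np.array(filtered), loglik


rng = np.random.default_rng(6)
state = np.cumsum(rng.normal(0.0, np.sqrt(0.1), 500))  # hypothetical random-walk state
y = state + rng.normal(0.0, 1.0, 500)                  # noisy observations
filtered, ll_true = local_level_filter(y, q=0.1, r=1.0)
_, ll_wrong = local_level_filter(y, q=10.0, r=1.0)
print(ll_true, ll_wrong)
```

Because the initial variance is huge, the first filtered state essentially equals the first observation, so no informative guess of the initial state is needed.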

Relevance: 60.00%

Abstract:

Many arthropods exhibit behaviours precursory to social life, including adult longevity, parental care, nest loyalty and mutual tolerance, yet there are few examples of social behaviour in this phylum. The small carpenter bees, genus Ceratina, provide important insights into the early stages of sociality. I described the biology and social behaviour of five facultatively social species that exhibit all of the preadaptations for successful group living, yet present ecological and behavioural characteristics that seemingly disfavour frequent colony formation. These species are socially polymorphic, with both solitary and social nests collected in sympatry. Social colonies consist of two adult females: one contributes both foraging and reproductive effort, and the other remains at the nest as a passive guard. Cooperative nesting provides no overt reproductive benefits over solitary nesting, although brood survival tends to be greater in social colonies. Three main theories explain cooperation among conspecifics: mutual benefit, kin selection and manipulation. Lifetime reproductive success calculations revealed that mutual benefit does not explain social behaviour in this group, as social colonies have lower per capita lifetime reproductive success than solitary nests. Genetic pedigrees constructed from allozyme data indicate that kin selection might contribute to the maintenance of social nesting, as social colonies consist of full sisters, and thus some indirect fitness benefits are inherently bestowed on subordinate females as a result of remaining to help their dominant sister. These data suggest that the origin of sociality in ceratinines involved mainly costs, and that the great ecological success of highly eusocial lineages occurred well after social origins.
Ecological constraints such as resource limitation, unfavourable weather conditions and parasite pressure have long been considered among the most important selective pressures for the evolution of sociality. I assessed the fitness consequences of these three ecological factors for the reproductive success of solitary and social colonies and found that nest sites were not limiting and that the frequency of social nesting was consistent across brood-rearing seasons. Local weather varied between seasons but was not correlated with reproductive success. Severe parasitism resulted in low reproductive success and total nest failure in solitary nests. Social colonies had higher reproductive success and were never extirpated by parasites. I suggest that social nesting represents a form of bet-hedging. The high frequency of solitary nests suggests that solitary nesting is the optimal strategy when parasite pressure is low. However, social colonies have a selective advantage over solitary nesting females during periods of extreme parasite pressure. Finally, the small carpenter bees are recorded from all continents except Antarctica. I constructed the first molecular phylogeny of ceratinine bees, based on four gene regions of selected species covering representatives from all continents and ecological regions. Maximum-parsimony and Bayesian-inference tree topologies and fossil dating support an African origin followed by an Old World invasion and a New World radiation. All known Old World ceratinines form social colonies, while New World species are largely solitary; thus geography and phylogenetic inertia are likely predictors of social evolution in this genus. This integrative approach not only describes the behaviour of several previously unknown or little-known Ceratina species, but also highlights that this is an important, though previously unrecognized, model for studying evolutionary transitions from solitary to social behaviour.