863 results for random regression model


Relevance:

90.00% 90.00%

Publisher:

Abstract:

Background. Over the past 30 years, the prevalence of overweight among children and adolescents has increased across the United States (Barlow et al., 2007; Ogden, Flegal, Carroll, & Johnson, 2002). Childhood obesity is linked with adverse physiological and psychological issues in youth and affects ethnic/minority populations at disproportionate rates (Barlow et al., 2007; Butte et al., 2006; Butte, Cai, Cole, Wilson, Fisher, Zakeri, Ellis, & Comuzzie, 2007). More importantly, overweight in children and youth tends to track into adulthood (McNaughton, Ball, Mishra, & Crawford, 2008; Ogden et al., 2002). Childhood obesity affects body systems such as the cardiovascular, respiratory, gastrointestinal, and endocrine systems, as well as emotional health (Barlow et al., 2007; Ogden et al., 2002). Several dietary factors have been associated with the development of obesity in children; however, these factors have not been fully elucidated, especially in ethnic/minority children. In particular, few studies have examined the effects of different meal patterns on the development of obesity in children. Purpose. The purpose of the study is to examine the relationships between daily proportions of energy consumed and energy derived from fat across breakfast, lunch, dinner, and snack, and obesity among Hispanic children and adolescents. Methods. A cross-sectional design was used to evaluate the relationship between dietary patterns and overweight status in Hispanic children and adolescents 4-19 years of age who participated in the Viva La Familia Study. The goal of the Viva La Familia Study was to evaluate genetic and environmental factors affecting childhood obesity and its co-morbidities in the Hispanic population (Butte et al., 2006, 2007).
The study enrolled 1030 Hispanic children and adolescents from 319 families and examined factors related to increased body weight by focusing on a multilevel analysis of extensive sociodemographic, genetic, metabolic, and behavioral data. Baseline dietary intakes of the children were collected using 24-hour recalls, and body mass index was calculated from measured height and weight and classified using the CDC standards. Dietary data were analyzed using a GEE population-averaged panel-data model with the family identifier as the cluster variable, to account for possible correlations among observations from the same family. A linear regression model was used to analyze associations of dietary patterns with possible covariates, and to examine the percentage of daily energy coming from breakfast, lunch, dinner, and snack while adjusting for age, sex, and BMI z-score. Random-effects logistic regression models were used to determine the relationship of the dietary variables with obesity status and to assess whether the percent energy intake (%EI) derived from fat at each meal (breakfast, lunch, dinner, and snacks) affected obesity. Results. Older children (age 4-19 years) consumed a higher percentage of energy at lunch and dinner and a lower percentage of energy from snacks than younger children. Age was significantly associated with percentage of total energy intake (%TEI) for lunch, as well as dinner, while no association was found by gender. Percent of energy consumed at dinner differed significantly by obesity status, with obese children consuming more energy at dinner (p = 0.03), but no associations were found between percent energy from fat and obesity for any meal. Conclusions. Information from this study can be used to develop interventions that target dietary intake patterns in obesity prevention programs for Hispanic children and adolescents.
In particular, intervention programs for children should target dietary patterns with energy intake that is spread throughout the day and earlier in the day. These results indicate that a longitudinal study should be used to further explore the relationship of dietary patterns and BMI in this and other populations (Dubois et al., 2008; Rodriquez & Moreno, 2006; Thompson et al., 2005; Wilson et al., in review, 2008).
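
The random-effects logistic regression mentioned in the Methods can be written out as a sketch. The abstract does not give the specification, so the form below is an assumption: a random intercept $u_j$ for family $j$, a single dietary exposure $x_{ij}$ (for example %EI from fat at one meal) for child $i$, and a covariate vector $\mathbf{c}_{ij}$ (age, sex). All symbols are illustrative.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1 \mid u_j)
  = \beta_0 + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{c}_{ij} + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

Here $Y_{ij} = 1$ denotes obese status. The shared $u_j$ induces correlation among children of the same family, which is the clustering the GEE model handles through its working correlation instead.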

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Traditional comparison of standardized mortality ratios (SMRs) can be misleading if the age-specific mortality ratios are not homogeneous. For this reason, a regression model has been developed which incorporates the mortality ratio as a function of age. This model is then applied to mortality data from an occupational cohort study. The nature of the occupational data necessitates the investigation of mortality ratios which increase with age. These occupational data are used primarily to illustrate and develop the statistical methodology. The age-specific mortality ratio (MR) for the covariates of interest can be written as $MR_{ij\ldots m} = \mu_{ij\ldots m}/\theta_{ij\ldots m} = r \cdot \exp(Z'_{ij\ldots m}\beta)$, where $\mu_{ij\ldots m}$ and $\theta_{ij\ldots m}$ denote the force of mortality in the study and chosen standard populations in the $ij\ldots m$th stratum, respectively, $r$ is the intercept, $Z_{ij\ldots m}$ is the vector of covariables associated with the $i$th age interval, and $\beta$ is a vector of regression coefficients associated with these covariables. A Newton-Raphson iterative procedure has been used for determining the maximum likelihood estimates of the regression coefficients. This model provides a statistical method for a logical and easily interpretable explanation of an occupational cohort mortality experience. Since it gives a reasonable fit to the mortality data, it can also be concluded that the model is fairly realistic. The traditional statistical method for the analysis of occupational cohort mortality data is to present a summary index such as the SMR under the assumption of constant (homogeneous) age-specific mortality ratios. Since the mortality ratios for occupational groups usually increase with age, the homogeneity assumption of the age-specific mortality ratios is often untenable.
The traditional method of comparing SMRs under the homogeneity assumption is a special case of this model, without age as a covariate. This model also provides a statistical technique to evaluate the relative risk between two SMRs or a dose-response relationship among several SMRs. The model presented has application in the medical, demographic and epidemiologic areas. The methods developed in this thesis are suitable for future analyses of mortality or morbidity data when the age-specific mortality/morbidity experience is a function of age or when an interaction effect between confounding variables needs to be evaluated.
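
The fitting step can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that observed deaths in each age stratum are Poisson with mean equal to the expected deaths under the standard population times $r \exp(\beta z)$, with a single scalar age covariate `z`. The function name, arguments, and data are hypothetical.

```python
import math

def fit_mortality_ratio(deaths, expected, z, iters=50, tol=1e-10):
    """Newton-Raphson ML fit of MR_i = r * exp(beta * z_i).

    Assumption (not from the source): deaths_i ~ Poisson(expected_i * MR_i).
    Returns the pair (r, beta).
    """
    a = math.log(sum(deaths) / sum(expected))  # start at log(SMR)
    b = 0.0
    for _ in range(iters):
        # Fitted deaths under the current parameters
        m = [e * math.exp(a + b * zi) for e, zi in zip(expected, z)]
        # Score vector for (a, b), where a = log r
        g0 = sum(d - mi for d, mi in zip(deaths, m))
        g1 = sum(zi * (d - mi) for d, mi, zi in zip(deaths, m, z))
        # Information matrix (negative Hessian), 2 x 2
        h00 = sum(m)
        h01 = sum(mi * zi for mi, zi in zip(m, z))
        h11 = sum(mi * zi * zi for mi, zi in zip(m, z))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system by Cramer's rule
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return math.exp(a), b

# Hypothetical strata: expected deaths and an age score per stratum
expected = [10.0, 20.0, 30.0, 40.0]
z = [0.0, 1.0, 2.0, 3.0]
deaths = [e * 1.2 * math.exp(0.3 * zi) for e, zi in zip(expected, z)]
r_hat, beta_hat = fit_mortality_ratio(deaths, expected, z)
```

Under the same Poisson assumption, dropping the covariate (fixing $\beta = 0$) makes the score equation for $r$ reduce to $\sum d_i = r \sum e_i$, so the estimate is the ordinary SMR, consistent with the special case described above.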

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In regression analysis, covariate measurement error occurs in many applications. The error-prone covariates are often referred to as latent variables. In this proposed study, we extended the study of Chan et al. (2008) on recovering the latent slope in a simple regression model to the multiple regression model. We presented an approach that applied the Monte Carlo method in the Bayesian framework to the parametric regression model with measurement error in an explanatory variable. The proposed estimator applied the conditional expectation of the latent slope given the observed outcome and surrogate variables in the multiple regression models. A simulation study was presented showing that the method produces an estimator that is efficient in the multiple regression model, especially when the measurement error variance of the surrogate variable is large.
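
To show why a latent slope needs recovering at all, the sketch below simulates classical measurement error and compares the slope fitted on the latent covariate with the naive slope fitted on its surrogate. It does not implement the Bayesian Monte Carlo estimator of the abstract; the data-generating values and all names are assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x in a simple regression with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n, beta, sigma_u = 20000, 2.0, 1.0

x = [rng.gauss(0.0, 1.0) for _ in range(n)]         # latent covariate, variance 1
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]      # surrogate = latent + error
y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]   # outcome depends on latent x

slope_latent = ols_slope(x, y)   # close to beta
slope_naive = ols_slope(w, y)    # attenuated toward zero

# Classical attenuation factor: var(x) / (var(x) + var(u))
reliability = 1.0 / (1.0 + sigma_u ** 2)
```

With these settings the naive slope should land near `beta * reliability`, about half the true slope, and the gap widens as the measurement error variance grows.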

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The tobacco-specific nitrosamine 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) is an obvious carcinogen for lung cancer. Since the cytokinesis-blocked micronucleus (CBMN) assay has been found to be extremely sensitive to NNK-induced genetic damage, it is a potentially important factor for predicting lung cancer risk. However, the association between lung cancer and NNK-induced genetic damage measured by the CBMN assay has not been rigorously examined. This research develops a methodology to model the chromosomal changes under NNK-induced genetic damage in a logistic regression framework in order to predict the occurrence of lung cancer. Since these chromosomal changes were usually not observed for very long because of laboratory cost and time, a resampling technique was applied to generate the Markov chain of normal and damaged cells for each individual. A joint likelihood between the resampled Markov chains and the logistic regression model, which includes the transition probabilities of this chain as covariates, was established. Maximum likelihood estimation was applied to carry out the statistical tests for comparison. The ability of this approach to increase discriminating power to predict lung cancer was compared to a baseline "non-genetic" model. Our method offered an option to understand the association between the dynamic cell information and lung cancer. Our study indicated that the extent of DNA damage/non-damage measured using the CBMN assay provides critical information that impacts public health studies of lung cancer risk. This novel statistical method could simultaneously estimate the process of DNA damage/non-damage and its relationship with lung cancer for each individual.
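
The covariates described above are transition probabilities of a two-state chain. A minimal sketch of how such probabilities could be estimated from one individual's observed sequence is given here; the 0/1 coding (0 = normal, 1 = damaged), the function name, and the example sequence are assumptions, and the resampling and joint-likelihood steps of the abstract are not reproduced.

```python
from collections import Counter

def transition_probabilities(states):
    """Maximum likelihood transition probabilities of a two-state Markov chain.

    `states` is a sequence of 0/1 codes; returns {(from, to): probability}.
    """
    counts = Counter(zip(states, states[1:]))  # counts of consecutive pairs
    probs = {}
    for a in (0, 1):
        total = counts[(a, 0)] + counts[(a, 1)]
        for b in (0, 1):
            # Undefined (NaN) if state `a` is never left in the observed data
            probs[(a, b)] = counts[(a, b)] / total if total else float("nan")
    return probs

# Hypothetical observed sequence for one individual
chain = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
p = transition_probabilities(chain)
```

Entries such as `p[(0, 1)]` (normal to damaged) could then enter a logistic regression as per-individual covariates.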

Relevance:

90.00% 90.00%

Publisher:

Abstract:

It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze the data. No one solution has been satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach contains the following features: (1) It is based on the conditional maximum likelihood procedure using a proportional hazard function described by Cox (1972), treating the age factor as the underlying hazard to estimate the parameters for the cohort and period factors. (2) The model is flexible, so that both the cohort and period factors can be treated as dummy or continuous variables, and parameter estimates can be obtained for numerous combinations of variables, as in a regression analysis. (3) The model is applicable even when the time periods are unequally spaced. Two specific models are considered to illustrate the new approach and are applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that old people have a much higher risk than young people. A log transformation of relative risk shows that the prostate cancer risk declined in recent cohorts for both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to those of the previous study by Holford (1983). The new approach offers a general method to analyze age-period-cohort data without using any arbitrary constraint in the model.
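
The identification problem stated in the first sentence can be made explicit with a short derivation. The notation is assumed for illustration: $a$ is age, $p$ is period, $c$ is cohort (date of birth), and a model that is linear in all three is used only to show the dependence.

```latex
p = a + c
\quad\Longrightarrow\quad
\alpha a + \pi p + \gamma c
  = (\alpha + \delta)\,a + (\pi - \delta)\,p + (\gamma + \delta)\,c
\quad \text{for every } \delta \in \mathbb{R},
```

since $\delta a - \delta p + \delta c = \delta\,(a + c - p) = 0$. Infinitely many coefficient triples therefore give the same fitted values, so the linear components of the three effects cannot be separated without some further structure.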

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Interannual environmental variability in Peru is dominated by the El Niño Southern Oscillation (ENSO). The most dramatic changes are associated with the warm El Niño (EN) phase (opposite the cold La Niña phase), which disrupts the normal coastal upwelling and affects the dynamics of many coastal marine and terrestrial resources. This study presents a trophic model for Sechura Bay, located at the northern extension of the Peruvian upwelling system, where ENSO-induced environmental variability is most extreme. Using an initial steady-state model for the year 1996, we explore the dynamics of the ecosystem through the year 2003 (including the strong EN of 1997/98 and the weaker EN of 2002/03). Based on support from the literature, we force the biomass of several non-trophically-mediated 'drivers' (e.g. Scallops, Benthic detritivores, Octopus, and Littoral fish) to observe whether the fit between historical changes and those simulated by the trophic model is improved. The results indicate that the Sechura Bay Ecosystem is a relatively inefficient system from a community energetics point of view, likely due to the periodic perturbations of ENSO. A combination of high system productivity and low trophic level target species of invertebrates (i.e. scallops) and fish (i.e. anchoveta) results in high catches and an efficient fishery. The importance of environmental drivers is suggested, given the relatively small improvements in the fit of the simulation with the addition of trophic drivers on remaining functional groups' dynamics. An additional multivariate regression model is presented for the scallop Argopecten purpuratus, which demonstrates a significant correlation of both spawning stock size and riverine discharge-mediated mortality with catch levels.
These results are discussed in the context of the appropriateness of trophodynamic modeling in relatively open systems, and how management strategies may be focused given the highly environmentally influenced marine resources of the region.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Research has been carried out on two-lane highways in the Madrid Region to propose an alternative model for the speed-flow relationship using regular loop data. The model differs in shape and, in some cases, in slope from that of the Highway Capacity Manual (HCM). A model is proposed for a mountainous-area road, a case for which the HCM does not explicitly provide a solution. The problem of a mountain road with high flows giving access to a popular recreational area is discussed, and some solutions are proposed. Up to 7 one-way sections of two-lane highways were selected, aiming to cover a significant number of different characteristics, in order to verify the proposed method across the different classes into which the Manual classifies highways. A large amount of data was used to formulate the model and to verify the basic variables of these types of roads. The counts were collected in the same way that the Madrid Region Highway Agency performs its counts. A total of 1,471 hours of data were collected, in periods of 5 minutes. The models were verified by means of specific statistical tests (R², Student's t, Durbin-Watson, ANOVA, etc.) and diagnostics of the model assumptions (normality, linearity, homoscedasticity and independence). The model proposed for this type of highway under base conditions can explain the different behaviors as traffic volumes increase, and follows an S-shaped polynomial multiple regression model of order 3. As secondary results of this research, the levels of service and the capacities of this road were measured with the 2000 HCM methodology, and the results are discussed. © 2011 Published by Elsevier Ltd.
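
The order-3 polynomial model can be written out as follows. The abstract does not give the fitted form, so the symbols are assumptions: $S$ is speed, $q$ is flow, and the $\beta_k$ are regression coefficients.

```latex
S = \beta_0 + \beta_1 q + \beta_2 q^2 + \beta_3 q^3 + \varepsilon,
\qquad
\frac{d^2 S}{dq^2} = 2\beta_2 + 6\beta_3 q = 0
\;\Longrightarrow\;
q^{*} = -\frac{\beta_2}{3\beta_3}
```

The curve is S-shaped over the observed flows when the inflection point $q^{*}$ falls inside that range, which is one way to check a fitted cubic against the shape described above.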

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Probabilistic modeling is the defining characteristic of estimation of distribution algorithms (EDAs) which determines their behavior and performance in optimization. Regularization is a well-known statistical technique used for obtaining an improved model by reducing the generalization error of estimation, especially in high-dimensional problems. ℓ1-regularization is a type of this technique with the appealing variable selection property which results in sparse model estimations. In this thesis, we study the use of regularization techniques for model learning in EDAs. Several methods for regularized model estimation in continuous domains based on a Gaussian distribution assumption are presented, and analyzed from different aspects when used for optimization in a high-dimensional setting, where the population size of EDA has a logarithmic scale with respect to the number of variables. The optimization results obtained for a number of continuous problems with an increasing number of variables show that the proposed EDA based on regularized model estimation performs a more robust optimization, and is able to achieve significantly better results for larger dimensions than other Gaussian-based EDAs. We also propose a method for learning a marginally factorized Gaussian Markov random field model using regularization techniques and a clustering algorithm. The experimental results show notable optimization performance on continuous additively decomposable problems when using this model estimation method. Our study also covers multi-objective optimization and we propose joint probabilistic modeling of variables and objectives in EDAs based on Bayesian networks, specifically models inspired from multi-dimensional Bayesian network classifiers. It is shown that with this approach to modeling, two new types of relationships are encoded in the estimated models in addition to the variable relationships captured in other EDAs: objective-variable and objective-objective relationships.
An extensive experimental study shows the effectiveness of this approach for multi- and many-objective optimization. With the proposed joint variable-objective modeling, in addition to the Pareto set approximation, the algorithm is also able to obtain an estimation of the multi-objective problem structure. Finally, the study of multi-objective optimization based on joint probabilistic modeling is extended to noisy domains, where the noise in objective values is represented by intervals. A new version of the Pareto dominance relation for ordering the solutions in these problems, namely α-degree Pareto dominance, is introduced and its properties are analyzed. We show that the ranking methods based on this dominance relation can result in competitive performance of EDAs with respect to the quality of the approximated Pareto sets. This dominance relation is then used together with a method for joint probabilistic modeling based on ℓ1-regularization for multi-objective feature subset selection in classification, where six different measures of accuracy are considered as objectives with interval values. The individual assessment of the proposed joint probabilistic modeling and solution ranking methods on datasets with small-medium dimensionality, when using two different Bayesian classifiers, shows that comparable or better Pareto sets of feature subsets are approximated in comparison to standard methods.
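
The variable selection property of ℓ1-regularization mentioned above can be seen in its simplest case. For the one-dimensional problem of minimizing $\tfrac{1}{2}(z - b)^2 + \lambda |b|$ over $b$, the solution is the soft-thresholding operator sketched below, which sets small inputs exactly to zero. This is a generic illustration; the function name and values are hypothetical, and the regularized Gaussian model estimators of the thesis are not reproduced.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * abs(b) over b, for lam >= 0."""
    if z > lam:
        return z - lam   # shrink positive values toward zero
    if z < -lam:
        return z + lam   # shrink negative values toward zero
    return 0.0           # values within [-lam, lam] become exactly zero

# Hypothetical unpenalized coefficient estimates
raw = [3.0, -0.5, 0.2, -3.0]
sparse = [soft_threshold(v, 1.0) for v in raw]
```

Coefficients whose magnitude is below the penalty level are removed from the model entirely, which is what produces sparse estimates.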

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The shift from an industrial society to a knowledge society that the globalized world is undergoing in the twenty-first century leads companies and organizations to develop sustainable competitive advantages based on their intangible assets, notably management systems in general and quality management systems (QMS) in particular. Organizations engaged in oil production are influenced by this trend. Oil is a natural resource with limited reserves whose production and consumption have grown steadily; it provides the largest share (35%) of the total energy consumed in the contemporary world, a contribution that will persist until 2035 according to the most conservative forecasts. It is therefore necessary to develop innovative production models that help improve the recovery factor and useful life of reservoirs while meeting the daily production and consumption requirements of demanding global markets. The aim of this research is to develop a quality management model and examine its effect on organizational performance through the mediating effect of the constructs internal customer satisfaction and knowledge management in oil production.
This explanatory, non-experimental, cross-sectional, ex post facto study was carried out in the Lake Maracaibo oil region in western Venezuela, which has been in production for more than 70 years and has mature reservoirs. The study population comprised 369 oil workers who took part in technical quality roundtables during May and July 2012, most of whom were being trained as QMS analysts, advisers and auditors. Simple random sampling was applied, yielding a sample of 252 individuals, who completed an ad hoc questionnaire validated through expert judgment and a pilot test. The research procedure followed a sequence comprising a theoretical model, based on a review of the state of the art; a factor model, based on factor analysis of the survey data; a linear regression model, built using simple and multiple linear regression; a path analysis model, produced with the Amos 20 SPSS software; and finally a computer model, built with the Vensim PLE v.6.2 simulator. The results indicate that the theoretical model was transformed into an empirical model in which the independent variable was the QMS, the mediating variable was the integration of the dimensions elimination of nonconformity, internal customer satisfaction and organizational learning (ENCSCIAO), and the response variable was the integration of the dimensions organizational performance and organizational learning (DOOA). The mediating effect of ENCSCIAO on the QMS-DOOA relationship was verified with a goodness of fit of 42.65%. In the multiple regression model, the determining variables were found to be elimination of nonconformity (ENC), acquired knowledge (CA) and spontaneous knowledge (CE), which was corroborated by the path analysis model.
The computer model was developed using approximate data from a typical production unit and generated four scenarios; the most favorable was the one in which the QMS and related variables were applied, reducing the production deviation, increasing the recovery factor and extending the useful life of the reservoir. It is concluded that applying the QMS and related constructs favors the performance and production of units exploiting mature oil reservoirs. The main contributions of the thesis are a management model for oil production in mature reservoirs based on QMS, and a concept of quality management associated with reducing the deviation of annual oil production, increasing the recovery factor and extending the useful life of the reservoir. Future lines of research are oriented toward applying the model in real, specific contexts to measure its impact and make the relevant adjustments.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This paper analyses the relationship between productive efficiency and online social networks (OSN) in Spanish telecommunications firms. A data envelopment analysis (DEA) is used and several indicators of business "social media" activities are incorporated. A super-efficiency analysis and bootstrapping techniques are performed to increase the model's robustness and accuracy. Then, a logistic regression model is applied to characterise factors and drivers of good performance in OSN. Results reveal the company's ability to absorb and utilise OSNs as a key factor in improving productive efficiency. This paper presents a model for assessing the strategic performance of the presence and activity in OSN.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Road accidents are a very relevant issue in many countries, and macroeconomic models are very frequently applied by academia and administrations to reduce their frequency and consequences. For the selection of the set of explanatory variables and of the response transformation parameter within the Bayesian framework, TIM and 3IM (two-input and three-input model) procedures are proposed. The procedure also uses the DIC and pseudo-R² goodness-of-fit criteria. The model to which the methodology is applied is a dynamic regression model with a Box-Cox transformation (BCT) for the explanatory variables and an autoregressive (AR) structure for the response. An initial set of 22 explanatory variables is identified, and the effects of these factors on the fatal accident frequency in Spain during 2000-2012 are estimated. The dependent variable is constructed considering the stochastic trend component.
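
For reference, the Box-Cox transformation named above has a simple closed form, sketched here for a single positive value. The function name is hypothetical, and how the transformation parameter is selected in the paper's Bayesian procedure is not shown.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam.

    (x**lam - 1) / lam for lam != 0, and log(x) in the limit lam -> 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

examples = [box_cox(4.0, 0.5), box_cox(3.0, 1.0), box_cox(1.0, 0.0)]
```

Setting `lam = 1` leaves the variable linear up to a shift, while `lam = 0` gives the log transform, so a single parameter spans the usual choices between the two.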

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We propose a general procedure for solving incomplete data estimation problems. The procedure can be used to find the maximum likelihood estimate or to solve estimating equations in difficult cases such as estimation with the censored or truncated regression model, the nonlinear structural measurement error model, and the random effects model. The procedure is based on the general principle of stochastic approximation and the Markov chain Monte-Carlo method. Applying the theory on adaptive algorithms, we derive conditions under which the proposed procedure converges. Simulation studies also indicate that the proposed procedure consistently converges to the maximum likelihood estimate for the structural measurement error logistic regression model.
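
The stochastic approximation principle behind the procedure can be illustrated with the basic Robbins-Monro recursion $\theta_{n+1} = \theta_n + \gamma_n H(\theta_n, X_{n+1})$. The sketch below uses the toy choice $H(\theta, x) = x - \theta$, which solves $E[X - \theta] = 0$, with step sizes $\gamma_n = 1/n$. In the abstract the draws would come from a Markov chain Monte Carlo sampler; here they are a fixed list, and all names are hypothetical.

```python
def robbins_monro_mean(samples, theta0=0.0):
    """Robbins-Monro iteration for the root of E[X - theta] = 0.

    With step size 1/n the iterate equals the running sample mean.
    """
    theta = theta0
    for n, x in enumerate(samples, start=1):
        gamma = 1.0 / n                       # decreasing step size
        theta = theta + gamma * (x - theta)   # noisy step toward the root
    return theta

estimate = robbins_monro_mean([2.0, 4.0, 9.0])
```

Replacing `x - theta` with a noisy score or estimating function, evaluated on simulated missing data, gives the general shape of an incomplete-data version of this recursion.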

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This research proposes a methodology to improve the individual prediction values computed by an existing regression model without having to change either its parameters or its architecture. In other words, we are interested in achieving more accurate results by adjusting the calculated regression prediction values, without modifying or rebuilding the original regression model. Our proposition is to adjust the regression prediction values using individual reliability estimates that indicate whether a single regression prediction is likely to produce an error considered critical by the user of the regression. The proposed method was tested in three sets of experiments using three different types of data. The first set of experiments worked with synthetically produced data, the second with cross-sectional data from the public data source UCI Machine Learning Repository, and the third with time series data from ISO-NE (Independent System Operator in New England). The experiments with synthetic data were performed to verify how the method behaves in controlled situations. In this case, the experiments showed the greatest improvement in predictions for the cleaner artificially produced datasets, with results progressively worsening as more random elements were added. The experiments with real data extracted from UCI and ISO-NE were done to investigate the applicability of the methodology in the real world. The proposed method was able to improve regression prediction values in about 95% of the experiments with real data.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

wgttest performs a test proposed by DuMouchel and Duncan (1983) to evaluate whether the weighted and unweighted estimates of a regression model are significantly different.
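
The idea of the test can be sketched outside Stata. As commonly described, the DuMouchel-Duncan approach augments the unweighted regression with the weight variable and its interactions with the regressors, then uses an F-test on the added terms. The code below follows that description for a single regressor; it is not the wgttest implementation, and all function names and simulated data are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the unweighted least-squares fit of y on X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def weight_test_f(x, y, w):
    """F statistic for adding w and w*x to the regression of y on (1, x)."""
    n = len(y)
    rss_restricted = rss([[1.0, xi] for xi in x], y)
    rss_full = rss([[1.0, xi, wi, wi * xi] for xi, wi in zip(x, w)], y)
    q, k_full = 2, 4  # number of added terms, parameters in the full model
    return ((rss_restricted - rss_full) / q) / (rss_full / (n - k_full))

# Simulated data (assumed for illustration only)
rng = random.Random(1)
n = 200
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
w = [rng.uniform(1.0, 3.0) for _ in range(n)]
noise = [rng.gauss(0.0, 0.5) for _ in range(n)]
y_null = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y_alt = [1.0 + 2.0 * xi + wi * xi + e for xi, wi, e in zip(x, w, noise)]

f_null = weight_test_f(x, y_null, w)  # weights unrelated to the outcome
f_alt = weight_test_f(x, y_alt, w)    # slope varies with the weight
```

A large F statistic, compared against an F distribution with `q` and `n - k_full` degrees of freedom, suggests that weighted and unweighted estimates would differ.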

Relevance:

90.00% 90.00%

Publisher:

Abstract:

BACKGROUND Respiratory tract infections and subsequent airway inflammation occur early in the life of infants with cystic fibrosis. However, detailed information about the microbial composition of the respiratory tract in infants with this disorder is scarce. We aimed to undertake longitudinal in-depth characterisation of the upper respiratory tract microbiota in infants with cystic fibrosis during the first year of life. METHODS We did this prospective cohort study at seven cystic fibrosis centres in Switzerland. Between Feb 1, 2011, and May 31, 2014, we enrolled 30 infants with a diagnosis of cystic fibrosis. Microbiota characterisation was done with 16S rRNA gene pyrosequencing and oligotyping of nasal swabs collected every 2 weeks from the infants with cystic fibrosis. We compared these data with data for an age-matched cohort of 47 healthy infants. We additionally investigated the effect of antibiotic treatment on the microbiota of infants with cystic fibrosis. Statistical methods included regression analyses with a multivariable multilevel linear model with random effects to correct for clustering on the individual level. FINDINGS We analysed 461 nasal swabs taken from the infants with cystic fibrosis; the cohort of healthy infants comprised 872 samples. The microbiota of infants with cystic fibrosis differed compositionally from that of healthy infants (p=0·001). This difference was also found in exclusively antibiotic-naive samples (p=0·001). The disordering was mainly, but not solely, due to an overall increase in the mean relative abundance of Staphylococcaceae in infants with cystic fibrosis compared with healthy infants (multivariable linear regression model stratified by age and adjusted for season; second month: coefficient 16·2 [95% CI 0·6-31·9]; p=0·04; third month: 17·9 [3·3-32·5]; p=0·02; fourth month: 21·1 [7·8-34·3]; p=0·002). Oligotyping analysis enabled differentiation between Staphylococcus aureus and coagulase-negative Staphylococci. 
Whereas the analysis showed a decrease in S aureus at and after antibiotic treatment, coagulase-negative Staphylococci increased. INTERPRETATION Our study describes compositional differences in the microbiota of infants with cystic fibrosis compared with healthy controls, and disordering of the microbiota on antibiotic administration. Besides S aureus, coagulase-negative Staphylococci also contributed to the disordering identified in these infants. These findings are clinically important in view of the crucial role that bacterial pathogens have in the disease progression of cystic fibrosis in early life. Our findings could be used to inform future studies of the effect of antibiotic treatment on the microbiota in infants with cystic fibrosis, and could assist in the prevention of early disease progression in infants with this disorder. FUNDING Swiss National Science Foundation, Fondation Botnar, the Swiss Society for Cystic Fibrosis, and the Swiss Lung Association Bern.