870 resultados para panel data modeling
Resumo:
Chiasma and crossover are two related biological processes of great importance in the understanding genetic variation. The study of these processes is straightforward in organisms where all products of meiosis are recovered and can be observed. This is not the case in mammals. Our understanding of these processes depends on our ability to model them. In this study I describe the biological processes that underline chiasma and crossover as well as the two main inference problems associated with these processes: i) in mammals we only recover one of the four products of meiosis and, ii) in general, we do not observe where the crossovers actually happen, but we find an interval containing type-2 censored information. NPML estimate was proposed and used in this work and used to compare chromosome length and chromosome expansion through the crosses.
A new analysis of hydrographic data in the Atlantic and its application to an inverse modeling study
Resumo:
The goal of this study is to provide a framework for future researchers to understand and use the FARSITE wildfire-forecasting model with data assimilation. Current wildfire models lack the ability to provide accurate prediction of fire front position faster than real-time. When FARSITE is coupled with a recursive ensemble filter, the data assimilation forecast method improves. The scope includes an explanation of the standalone FARSITE application, technical details on FARSITE integration with a parallel program coupler called OpenPALM, and a model demonstration of the FARSITE-Ensemble Kalman Filter software using the FireFlux I experiment by Craig Clements. The results show that the fire front forecast is improved with the proposed data-driven methodology than with the standalone FARSITE model.
Resumo:
Agroforestry has large potential for carbon (C) sequestration while providing many economical, social, and ecological benefits via its diversified products. Airborne lidar is considered as the most accurate technology for mapping aboveground biomass (AGB) over landscape levels. However, little research in the past has been done to study AGB of agroforestry systems using airborne lidar data. Focusing on an agroforestry system in the Brazilian Amazon, this study first predicted plot-level AGB using fixed-effects regression models that assumed the regression coefficients to be constants. The model prediction errors were then analyzed from the perspectives of tree DBH (diameter at breast height)?height relationships and plot-level wood density, which suggested the need for stratifying agroforestry fields to improve plot-level AGB modeling. We separated teak plantations from other agroforestry types and predicted AGB using mixed-effects models that can incorporate the variation of AGB-height relationship across agroforestry types. We found that, at the plot scale, mixed-effects models led to better model prediction performance (based on leave-one-out cross-validation) than the fixed-effects models, with the coefficient of determination (R2) increasing from 0.38 to 0.64. At the landscape level, the difference between AGB densities from the two types of models was ~10% on average and up to ~30% at the pixel level. This study suggested the importance of stratification based on tree AGB allometry and the utility of mixed-effects models in modeling and mapping AGB of agroforestry systems.
Resumo:
Spatio-temporal modelling is an area of increasing importance in which models and methods have often been developed to deal with specific applications. In this study, a spatio-temporal model was used to estimate daily rainfall data. Rainfall records from several weather stations, obtained from the Agritempo system for two climatic homogeneous zones, were used. Rainfall values obtained for two fixed dates (January 1 and May 1, 2012) using the spatio-temporal model were compared with the geostatisticals techniques of ordinary kriging and ordinary cokriging with altitude as auxiliary variable. The spatio-temporal model was more than 17% better at producing estimates of daily precipitation compared to kriging and cokriging in the first zone and more than 18% in the second zone. The spatio-temporal model proved to be a versatile technique, adapting to different seasons and dates.
Resumo:
Currently, the identification of two cryptic Iberian amphibians, Discoglossus galganoi Capula, Nascetti, Lanza, Bullini and Crespo, 1985 and Discoglossus jeanneae Busack, 1986, relies on molecular characterization. To provide a means to discern the distributions of these species, we used 385-base-pair sequences of the cytochrome b gene to identify 54 Spanish populations of Discoglossus. These data and a series of environmental variables were used to build up a logistic regression model capable of probabilistically designating a specimen of Discoglossus found in any Universal Transverse Mercator (UTM) grid cell of 10 km × 10 km to one of the two species. Western longitudes, wide river basins, and semipermeable (mainly siliceous) and sandstone substrates favored the presence of D. galganoi, while eastern longitudes, mountainous areas, severe floodings, and impermeable (mainly clay) or basic (limestone and gypsum) substrates favored D. jeanneae. Fifteen percent of the UTM cells were predicted to be shared by both species, whereas 51% were clearly in favor of D. galganoi and 34% were in favor of D. jeanneae, considering odds of 4:1. These results suggest that these two species have parapatric distributions and allow for preliminary identification of potential secondary contact areas. The method applied here can be generalized and used for other geographic problems posed by cryptic species.
Resumo:
The fast development of Information Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast the city's behavior, it is necessary the analysis of different kinds of data from the most varied dataset acquisition systems. The aim of this research activity in the framework of Data Science and Complex Systems Physics is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. Under this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in Italian cities' historical centers, which will worsen in the next future due to the continuous globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation means in the cities and improve multimodal mobility. We analyze the statistical properties of urban mobility of Venice, Rimini, and Bologna by using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trips reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties depending on transport means and user's kinds. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia Romagna Region and the pedestrians moving in Venice with software able to replicate in silico the demand for mobility and its dynamic.
Resumo:
The coastal ocean is a complex environment with extremely dynamic processes that require a high-resolution and cross-scale modeling approach in which all hydrodynamic fields and scales are considered integral parts of the overall system. In the last decade, unstructured-grid models have been used to advance in seamless modeling between scales. On the other hand, the data assimilation methodologies to improve the unstructured-grid models in the coastal seas have been developed only recently and need significant advancements. Here, we link the unstructured-grid ocean modeling to the variational data assimilation methods. In particular, we show results from the modeling system SANIFS based on SHYFEM fully-baroclinic unstructured-grid model interfaced with OceanVar, a state-of-art variational data assimilation scheme adopted for several systems based on a structured grid. OceanVar implements a 3DVar DA scheme. The combination of three linear operators models the background error covariance matrix. The vertical part is represented using multivariate EOFs for temperature, salinity, and sea level anomaly. The horizontal part is assumed to be Gaussian isotropic and is modeled using a first-order recursive filter algorithm designed for structured and regular grids. Here we introduced a novel recursive filter algorithm for unstructured grids. A local hydrostatic adjustment scheme models the rapidly evolving part of the background error covariance. We designed two data assimilation experiments using SANIFS implementation interfaced with OceanVar over the period 2017-2018, one with only temperature and salinity assimilation by Argo profiles and the second also including sea level anomaly. The results showed a successful implementation of the approach and the added value of the assimilation for the active tracer fields. While looking at the broad basin, no significant improvements are highlighted for the sea level, requiring future investigations. Furthermore, a Machine Learning methodology based on an LSTM network has been used to predict the model SST increments.
Resumo:
The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). The results show that multivariate approaches may (a) significantly enhance flood-prone areas delineation relative to a selected univariate one, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into continuous representation of FH.
Resumo:
American tegumentary leishmaniasis (ATL) is a disease transmitted to humans by the female sandflies of the genus Lutzomyia. Several factors are involved in the disease transmission cycle. In this work only rainfall and deforestation were considered to assess the variability in the incidence of ATL. In order to reach this goal, monthly recorded data of the incidence of ATL in Orán, Salta, Argentina, were used, in the period 1985-2007. The square root of the relative incidence of ATL and the corresponding variance were formulated as time series, and these data were smoothed by moving averages of 12 and 24 months, respectively. The same procedure was applied to the rainfall data. Typical months, which are April, August, and December, were found and allowed us to describe the dynamical behavior of ATL outbreaks. These results were tested at 95% confidence level. We concluded that the variability of rainfall would not be enough to justify the epidemic outbreaks of ATL in the period 1997-2000, but it consistently explains the situation observed in the years 2002 and 2004. Deforestation activities occurred in this region could explain epidemic peaks observed in both years and also during the entire time of observation except in 2005-2007.
Resumo:
In this work, all publicly-accessible published findings on Alicyclobacillus acidoterrestris heat resistance in fruit beverages as affected by temperature and pH were compiled. Then, study characteristics (protocols, fruit and variety, °Brix, pH, temperature, heating medium, culture medium, inactivation method, strains, etc.) were extracted from the primary studies, and some of them incorporated to a meta-analysis mixed-effects linear model based on the basic Bigelow equation describing the heat resistance parameters of this bacterium. The model estimated mean D* values (time needed for one log reduction at a temperature of 95 °C and a pH of 3.5) of Alicyclobacillus in beverages of different fruits, two different concentration types, with and without bacteriocins, and with and without clarification. The zT (temperature change needed to cause one log reduction in D-values) estimated by the meta-analysis model were compared to those ('observed' zT values) reported in the primary studies, and in all cases they were within the confidence intervals of the model. The model was capable of predicting the heat resistance parameters of Alicyclobacillus in fruit beverages beyond the types available in the meta-analytical data. It is expected that the compilation of the thermal resistance of Alicyclobacillus in fruit beverages, carried out in this study, will be of utility to food quality managers in the determination or validation of the lethality of their current heat treatment processes.
Resumo:
Below cloud scavenging processes have been investigated considering a numerical simulation, local atmospheric conditions and particulate matter (PM) concentrations, at different sites in Germany. The below cloud scavenging model has been coupled with bulk particulate matter counter TSI (Trust Portacounter dataset, consisting of the variability prediction of the particulate air concentrations during chosen rain events. The TSI samples and meteorological parameters were obtained during three winter Campaigns: at Deuselbach, March 1994, consisting in three different events; Sylt, April 1994 and; Freiburg, March 1995. The results show a good agreement between modeled and observed air concentrations, emphasizing the quality of the conceptual model used in the below cloud scavenging numerical modeling. The results between modeled and observed data have also presented high square Pearson coefficient correlations over 0.7 and significant, except the Freiburg Campaign event. The differences between numerical simulations and observed dataset are explained by the wind direction changes and, perhaps, the absence of advection mass terms inside the modeling. These results validate previous works based on the same conceptual model.
Resumo:
Background: Malaria is an important threat to travelers visiting endemic regions. The risk of acquiring malaria is complex and a number of factors including transmission intensity, duration of exposure, season of the year and use of chemoprophylaxis have to be taken into account estimating risk. Materials and methods: A mathematical model was developed to estimate the risk of non-immune individual acquiring falciparum malaria when traveling to the Amazon region of Brazil. The risk of malaria infection to travelers was calculated as a function of duration of exposure and season of arrival. Results: The results suggest significant variation of risk for non-immune travelers depending on arrival season, duration of the visit and transmission intensity. The calculated risk for visitors staying longer than 4 months during peak transmission was 0.5% per visit. Conclusions: Risk estimates based on mathematical modeling based on accurate data can be a valuable tool in assessing risk/benefits and cost/benefits when deciding on the value of interventions for travelers to malaria endemic regions.
Resumo:
Background: Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. Previously, these studies have used a low density of microsatellite markers, however, with the large number of single nucleotide polymorphism markers that are now available, it is possible to perform genome wide population genetic analyses in cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among eight cattle breeds sampled from Bos indicus and Bos taurus. Results: Two thousand six hundred and forty one single nucleotide polymorphisms ( SNPs) spanning all of the bovine autosomal genome were genotyped in Angus, Brahman, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black, Limousin and Nelore cattle. Population structure was examined using the linkage model in the program STRUCTURE and Fst estimates were used to construct a neighbor-joining tree to represent the phylogenetic relationship among these breeds. Conclusion: The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. The greatest level of genetic differentiation was detected between the Bos taurus and Bos indicus breeds. When the Bos indicus breeds were excluded from the analysis, genetic differences among beef versus dairy and European versus Asian breeds were detected among the Bos taurus breeds. Exploration of the number of SNP loci required to differentiate between breeds showed that for 100 SNP loci, individuals could only be correctly clustered into breeds 50% of the time, thus a large number of SNP markers are required to replace the 30 microsatellite markers that are currently commonly used in genetic diversity studies.
Resumo:
Melanoma is a highly aggressive and therapy resistant tumor for which the identification of specific markers and therapeutic targets is highly desirable. We describe here the development and use of a bioinformatic pipeline tool, made publicly available under the name of EST2TSE, for the in silico detection of candidate genes with tissue-specific expression. Using this tool we mined the human EST (Expressed Sequence Tag) database for sequences derived exclusively from melanoma. We found 29 UniGene clusters of multiple ESTs with the potential to predict novel genes with melanoma-specific expression. Using a diverse panel of human tissues and cell lines, we validated the expression of a subset of three previously uncharacterized genes (clusters Hs.295012, Hs.518391, and Hs.559350) to be highly restricted to melanoma/melanocytes and named them RMEL1, 2 and 3, respectively. Expression analysis in nevi, primary melanomas, and metastatic melanomas revealed RMEL1 as a novel melanocytic lineage-specific gene up-regulated during melanoma development. RMEL2 expression was restricted to melanoma tissues and glioblastoma. RMEL3 showed strong up-regulation in nevi and was lost in metastatic tumors. Interestingly, we found correlations of RMEL2 and RMEL3 expression with improved patient outcome, suggesting tumor and/or metastasis suppressor functions for these genes. The three genes are composed of multiple exons and map to 2q12.2, 1q25.3, and 5q11.2, respectively. They are well conserved throughout primates, but not other genomes, and were predicted as having no coding potential, although primate-conserved and human-specific short ORFs could be found. Hairpin RNA secondary structures were also predicted. Concluding, this work offers new melanoma-specific genes for future validation as prognostic markers or as targets for the development of therapeutic strategies to treat melanoma.