956 results for Models for count data
Abstract:
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of the counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects to model spatial correlation in a Poisson generalized linear mixed model (GLMM), a quasi-Poisson hierarchical generalized linear model (HGLM), a zero-inflated Poisson (ZIP) model, and a hurdle model. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we find that the quasi-Poisson HGLMs can outperform the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP models, and spatial hurdle models, and that underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. A spatial count response with an extremely high proportion of zeros and underdispersion can thus be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
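To make the modeling setup concrete, here is a minimal sketch (in Python rather than the authors' R code, with simulated data and no spatial CAR random effects, so all names and values are illustrative assumptions) of fitting a zero-inflated Poisson model to counts with an excess of zeros:

```python
# Minimal sketch: zero-inflated Poisson fit on simulated pellet-group-like
# counts (~60% structural zeros); the paper's spatial CAR random effects
# and quasi-Poisson HGLM machinery are omitted here.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                       # hypothetical habitat covariate
lam = np.exp(0.5 + 0.8 * x)                  # Poisson mean for the count part
counts = np.where(rng.random(n) < 0.6, 0, rng.poisson(lam))

X = sm.add_constant(x)
zip_model = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)))
result = zip_model.fit(disp=False)
print(result.summary())
```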
Abstract:
Understanding how aquatic species grow is fundamental in fisheries, because stock assessment often relies on growth-dependent statistical models. Length-frequency-based methods become important when more suitable data for growth-model estimation are either unavailable or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data, using a generalized von Bertalanffy growth model (VBGM) that allows time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month, with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing the variance to be estimated at any length. To optimize the likelihood, we use a minorization-maximization (MM) algorithm with a Nelder-Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC, Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
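The core idea can be sketched as follows (a simplified, hypothetical Python illustration: no covariates, fixed cohort ages, and plain Nelder-Mead in place of the paper's MM algorithm):

```python
# Minimal sketch: normal-mixture likelihood whose component means follow a
# von Bertalanffy growth curve and whose SDs are tied to mean length.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

ages = np.array([0.5, 1.5, 2.5])             # hypothetical cohort ages (years)

def neg_log_lik(params, lengths):
    linf, k, t0, cv = params                 # VBGM parameters + length CV
    means = linf * (1.0 - np.exp(-k * (ages - t0)))   # VBGM means
    sds = cv * means                         # variance as a function of mean
    weights = np.ones_like(ages) / ages.size
    dens = sum(w * norm.pdf(lengths, m, s)
               for w, m, s in zip(weights, means, sds))
    return -np.sum(np.log(dens + 1e-300))

rng = np.random.default_rng(1)
lengths = np.concatenate([rng.normal(m, 0.1 * m, 200)
                          for m in (60.0, 110.0, 140.0)])
fit = minimize(neg_log_lik, x0=[150.0, 0.8, 0.0, 0.1],
               args=(lengths,), method="Nelder-Mead")
print(fit.x)                                 # estimated (L_inf, K, t0, CV)
```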
Abstract:
There is a growing societal need to address the increasing prevalence of behavioral health issues, such as obesity, alcohol or drug use, and general lack of treatment adherence for a variety of health problems. The statistics, worldwide and in the USA, are daunting. Excessive alcohol use is the third leading preventable cause of death in the United States (with 79,000 deaths annually), and is responsible for a wide range of health and social problems. On the positive side, though, these behavioral health issues (and associated possible diseases) can often be prevented with relatively simple lifestyle changes, such as losing weight through diet and/or physical exercise, or learning how to reduce alcohol consumption. Medicine has therefore started to move toward finding ways of preventively promoting wellness, rather than solely treating already established illness.

Evidence-based, patient-centered Brief Motivational Interviewing (BMI) interventions have been found particularly effective in helping people find intrinsic motivation to change problem behaviors after short counseling sessions, and to maintain healthy lifestyles over the long term. A lack of locally available personnel well-trained in BMI, however, often limits access to successful interventions for people in need. To fill this accessibility gap, Computer-Based Interventions (CBIs) have started to emerge. The success of CBIs, however, critically relies on ensuring the engagement and retention of CBI users so that they remain motivated to use these systems and come back to use them over the long term as necessary.

Because of their text-only interfaces, current CBIs can express only limited empathy and rapport, which are the most important factors of health interventions. Fortunately, in the last decade, computer science research has progressed in the design of simulated human characters with anthropomorphic communicative abilities. Virtual characters interact using humans' innate communication modalities, such as facial expressions, body language, speech, and natural language understanding. By advancing research in Artificial Intelligence (AI), we can improve the ability of artificial agents to help us solve CBI problems.

To facilitate successful communication and social interaction between artificial agents and human partners, it is essential that aspects of human social behavior, especially empathy and rapport, be considered when designing human-computer interfaces. Hence, the goal of the present dissertation is to provide a computational model of rapport to enhance an artificial agent's social behavior, and to provide an experimental tool for the psychological theories shaping the model. Parts of this thesis were already published in [LYL+12, AYL12, AL13, ALYR13, LAYR13, YALR13, ALY14].
Abstract:
The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are manifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay exactly for the performance they are actually experiencing; administrators, on the other hand, will be able to maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment. Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Networks and Support Vector Machines, for accurately modeling the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing that employs the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.
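As a rough illustration of this kind of performance model (a hypothetical Python sketch with synthetic data, not the thesis's models), a Support Vector Machine can be trained to predict application latency from resource allocations:

```python
# Minimal sketch: SVM regression mapping (CPU, memory) allocations to a
# synthetic application latency; real models would use measured benchmarks.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
cpu = rng.uniform(0.5, 4.0, 300)             # allocated vCPUs
mem = rng.uniform(1.0, 16.0, 300)            # allocated memory (GB)
# synthetic "measured" response time: contention grows as resources shrink
latency = 50.0 / cpu + 20.0 / mem + rng.normal(0, 1.0, 300)

X = np.column_stack([cpu, mem])
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X, latency)
print(model.predict([[2.0, 8.0]]))           # predicted latency, 2 vCPU / 8 GB
```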
Abstract:
The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in the face of global changes. In field work carried out in an area of preserved Caatinga forest in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as Classification And Regression Trees (CART) and K-means clustering. The results were compared to the uncalibrated model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the uncalibrated approach accounted for only 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as the Caatinga.
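For a flavor of the data-mining side (a hypothetical Python sketch with synthetic data; the predictors, values, and relationships are illustrative assumptions, not the study's), a regression tree can map field measurements to a calibrated parameter such as Vcmax:

```python
# Minimal sketch: regression tree (CART) predicting a Vcmax-like parameter
# from hypothetical field predictors (light and leaf temperature).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
par = rng.uniform(0, 2000, 150)              # photosynthetic photon flux
tleaf = rng.uniform(20, 40, 150)             # leaf temperature (deg C)
vcmax = 30 + 0.01 * par - 0.05 * (tleaf - 30) ** 2 + rng.normal(0, 2, 150)

X = np.column_stack([par, tleaf])
tree = DecisionTreeRegressor(max_depth=3).fit(X, vcmax)
print(tree.predict([[1500, 30]]))            # calibrated Vcmax estimate
```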
Abstract:
Gastrointestinal stromal tumors (GIST) are the most common mesenchymal tumors of the gastrointestinal tract, arising from the interstitial cells of Cajal (ICCs) or their precursors. The vast majority of GISTs (75-85%) harbor KIT or PDGFRA mutations. A small percentage of GISTs (about 10-15%) do not harbor any of these driver mutations and have historically been called wild-type (WT). Among them, from 20% to 40% show loss of function of the succinate dehydrogenase complex (SDH) and are also defined as SDH-deficient GISTs. SDH-deficient GISTs display distinctive clinical and pathological features, and can be sporadic or associated with the Carney triad or Carney-Stratakis syndrome. These tumors arise most frequently in the stomach, with a predilection for the distal stomach and antrum, show multinodular growth, display an epithelioid histological phenotype, and frequently present lymphovascular invasion. Lymph node metastases and an indolent course are representative features of SDH-deficient GISTs. This subset of GISTs is known for the immunohistochemical loss of succinate dehydrogenase subunit B (SDHB), which signals the loss of function of the entire SDH complex. The overall aim of my PhD project consists of the comprehensive characterization of SDH-deficient GISTs. Throughout the project, clinical, molecular and cellular characterizations were performed using next-generation sequencing (NGS) technologies, which have the potential to allow the identification of molecular patterns useful for the diagnosis and development of novel treatments. Moreover, while there are many different cell lines and preclinical models of KIT/PDGFRA-mutant GIST, no reliable cell model of SDH-deficient GIST has currently been developed that could be used for studies on tumor evolution and in vitro assessments of drug response. Therefore, another aim of this project was to develop a preclinical model of SDH-deficient GIST using the novel technology of induced pluripotent stem cells (iPSCs).
Abstract:
Imaging technologies are widely used in application fields such as the natural sciences, engineering, medicine, and the life sciences. A broad class of imaging problems reduces to solving ill-posed inverse problems (IPs). Traditional strategies for solving these ill-posed IPs rely on variational regularization methods, which are based on the minimization of suitable energies and make use of knowledge about the image formation model (forward operator) and prior knowledge about the solution, but do not incorporate knowledge directly from data. On the other hand, more recent learned approaches can easily learn the intricate statistics of images from a large set of data, but do not have a systematic method for incorporating prior knowledge about the image formation model. The main purpose of this thesis is to discuss data-driven image reconstruction methods which combine the benefits of these two different reconstruction strategies for the solution of highly nonlinear ill-posed inverse problems. Mathematical formulations and numerical approaches for imaging IPs, including linear as well as strongly nonlinear problems, are described. More specifically, we address the Electrical Impedance Tomography (EIT) reconstruction problem by unrolling the regularized Gauss-Newton method and integrating the regularization learned by a data-adaptive neural network. Furthermore, we investigate the solution of nonlinear ill-posed IPs by introducing a deep-PnP framework that integrates a graph convolutional denoiser into the proximal Gauss-Newton method, with a practical application to EIT, a promising, recently introduced imaging technique. Efficient algorithms are then applied to the solution of the limited-electrodes problem in EIT, combining compressive sensing techniques and deep learning strategies. Finally, a transformer-based neural network architecture is adapted to restore the noisy solution of the Computed Tomography problem recovered using the filtered back-projection method.
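To fix ideas, a single regularized Gauss-Newton step for y = F(x) + noise has the form x_{k+1} = x_k + (J^T J + alpha I)^{-1} (J^T (y - F(x_k)) - alpha x_k). Here is a generic sketch (a hypothetical Python illustration with a toy forward operator, not the thesis code; in the unrolled, learned variant the Tikhonov term would be replaced by a network-learned regularizer):

```python
# Generic regularized Gauss-Newton iteration with a simple Tikhonov term
# (prior centered at zero) standing in for the learned regularizer.
import numpy as np

def gauss_newton_step(x, y, forward, jacobian, alpha=1e-3):
    J = jacobian(x)                          # Jacobian of F at x
    r = y - forward(x)                       # data residual
    H = J.T @ J + alpha * np.eye(x.size)     # regularized normal equations
    return x + np.linalg.solve(H, J.T @ r - alpha * x)

# Toy nonlinear forward operator and its Jacobian (illustrative only)
F = lambda x: np.array([x[0] ** 2 + x[1], x[1] ** 2])
Jf = lambda x: np.array([[2 * x[0], 1.0], [0.0, 2 * x[1]]])

y = F(np.array([2.0, 3.0]))                  # data from "true" parameters
x = np.array([1.0, 1.0])                     # initial guess
for _ in range(50):
    x = gauss_newton_step(x, y, F, Jf)
print(x)                                     # approaches [2, 3]
```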
Abstract:
In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms and, on the other, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3), we reconstructed the network of retweets among COVID-19-themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users' attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlighted the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we presented reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.
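As a reference point for the compartmental modeling in Chapter 2, here is a minimal, generic SIR sketch in Python (illustrative parameter values; the thesis's hospital-ward model has its own compartments and data-informed rates):

```python
# Minimal sketch: classic SIR compartmental dynamics integrated with scipy.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    S, I, R = y
    N = S + I + R
    return [-beta * S * I / N, beta * S * I / N - gamma * I, gamma * I]

sol = solve_ivp(sir, (0, 120), [990.0, 10.0, 0.0],
                args=(0.3, 0.1), dense_output=True)   # beta, gamma per day
print(sol.y[1].max())   # peak number of infected individuals
```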
Abstract:
Disconnectivity between Default Mode Network (DMN) nodes can cause clinical symptoms and cognitive deficits in Alzheimer's disease (AD). We aimed to examine the structural connectivity between DMN nodes, to verify the extent to which white matter disconnection affects cognitive performance. MRI data of 76 subjects (25 mild AD patients, 21 amnestic Mild Cognitive Impairment subjects and 30 controls) were acquired on a 3.0T scanner. ExploreDTI software (fractional anisotropy threshold = 0.25; angular threshold = 60°) was used to calculate axial, radial, and mean diffusivities, fractional anisotropy and streamline count. AD patients showed lower fractional anisotropy (P=0.01) and streamline count (P=0.029), and higher radial diffusivity (P=0.014) than controls in the cingulum. After correction for white matter atrophy, only fractional anisotropy and radial diffusivity remained significantly different between AD patients and controls (P=0.003 and P=0.05). In the parahippocampal bundle, AD patients had lower mean and radial diffusivities (P=0.048 and P=0.013) than controls, of which only radial diffusivity survived adjustment for white matter atrophy (P=0.05). Regression models revealed that cognitive performance is also accounted for by white matter microstructural values. Structural connectivity within the DMN is important for the execution of high-complexity tasks, probably due to its relevant role in the integration of the network.
Abstract:
The aim of this study was to comparatively assess dental arch width, in the canine and molar regions, by means of direct measurements on plaster models, photocopies and digitized images of the models. The sample consisted of 130 pairs of plaster models, photocopies and digitized images of the models, obtained from white patients (n = 65) of both genders with Class I and Class II Division 1 malocclusions, treated by standard Edgewise mechanics and extraction of the four first premolars. Maxillary and mandibular intercanine and intermolar widths were measured by a calibrated examiner, prior to and after orthodontic treatment, using the three modes of reproduction of the dental arches. Dispersion of the data relative to pre- and post-treatment intra-arch linear measurements (mm) was represented as box plots. The three measuring methods were compared by one-way repeated-measures ANOVA (α = 0.05). Initial/final mean values varied as follows: 33.94 to 34.29 mm / 34.49 to 34.66 mm (maxillary intercanine width); 26.23 to 26.26 mm / 26.77 to 26.84 mm (mandibular intercanine width); 49.55 to 49.66 mm / 47.28 to 47.45 mm (maxillary intermolar width); and 43.28 to 43.41 mm / 40.29 to 40.46 mm (mandibular intermolar width). There were no statistically significant differences between the mean dental arch widths estimated by the three methods, either prior to or after orthodontic treatment. It may be concluded that photocopies and digitized images of the plaster models provide reliable reproductions of the dental arches for obtaining transverse intra-arch measurements.
Abstract:
The purpose of this study was to develop and validate equations to estimate the aboveground phytomass of a 30-year-old plot of Atlantic Forest. In two plots of 100 m², a total of 82 trees were cut down at ground level. For each tree, height and diameter were measured. Leaves and woody material were separated in order to determine their fresh weights under field conditions. Samples of each fraction were oven-dried at 80 °C to constant weight to determine their dry weight. The tree data were divided into two random samples: one sample was used for the development of the regression equations, and the other for validation. The models were developed using simple linear regression analysis, where the dependent variable was dry mass (DW) and the independent variables were height (h), diameter (d) and d²h. The validation was carried out using the Pearson correlation coefficient, the paired Student's t-test and the standard error of the estimate. The best equations for estimating aboveground phytomass were: ln(DW) = -3.068 + 2.522 ln(d) (r² = 0.91; s_y/x = 0.67) and ln(DW) = -3.676 + 0.951 ln(d²h) (r² = 0.94; s_y/x = 0.56).
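As a usage note, the first equation back-transforms to DW = exp(-3.068) * d^2.522 (a naive back-transform; log-regression bias corrections are often applied but are not shown here). A worked example with a hypothetical diameter:

```python
# Worked example (hypothetical input) applying the first fitted equation:
# ln(DW) = -3.068 + 2.522 ln(d)  =>  DW = exp(-3.068) * d**2.522
import math

def dry_weight(d):
    """Estimated aboveground dry mass for diameter d (units as in the study)."""
    return math.exp(-3.068 + 2.522 * math.log(d))

print(round(dry_weight(10.0), 1))   # ~15.5 for d = 10
```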
Abstract:
Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been devoted to studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is associated with high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by an R-tree and the star-join aided by GiST indicate that the SB-index significantly reduces the elapsed time of query processing, by 25% up to 99%, for SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of increased data volume on performance. The increase did not impair the performance of the SB-index, which still greatly reduced the elapsed time of query processing. Performance tests also show that the SB-index is far more compact than the star-join, requiring at most 0.20% of its volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance by 80% to 91% on redundant GDW schemas.
Abstract:
In this work we report on a comparison of some theoretical models commonly used to fit the temperature dependence of the fundamental energy gap of semiconductor materials. In our investigation we used the theoretical models of Viña, Pässler-p and Pässler-ρ to fit several sets of experimental data, available in the literature, for the energy gap of GaAs in the temperature range from 12 to 974 K. By performing several fittings for different values of the upper limit of the analyzed temperature range (Tmax), we were able to follow in a systematic way the evolution of the fitting parameters up to the high-temperature limit, and to compare the zero-point values obtained from the different models, by extrapolating the linear dependence of the gaps at high T down to T = 0 K, with the value determined from the dependence of the gap on isotope mass. Using experimental data measured by absorption spectroscopy, we observed the nonlinear behavior of Eg(T) of GaAs for T > ΘD.
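For reference, the Viña (Bose-Einstein) expression is E(T) = E_B - a_B[1 + 2/(exp(Θ_B/T) - 1)]. A minimal fitting sketch in Python (synthetic, GaAs-like data; all parameter values are illustrative assumptions, not the paper's results):

```python
# Minimal sketch: least-squares fit of the Vina (Bose-Einstein) expression
#   E(T) = E_B - a_B * (1 + 2 / (exp(Theta/T) - 1))
# to a synthetic energy-gap-versus-temperature data set.
import numpy as np
from scipy.optimize import curve_fit

def vina(T, E_B, a_B, Theta):
    return E_B - a_B * (1.0 + 2.0 / (np.exp(Theta / T) - 1.0))

T = np.linspace(12.0, 300.0, 60)                      # temperatures (K)
rng = np.random.default_rng(4)
Eg = vina(T, 1.571, 0.057, 240.0) + rng.normal(0, 2e-4, T.size)  # gap (eV)

popt, pcov = curve_fit(vina, T, Eg, p0=[1.6, 0.05, 250.0])
print(popt)   # fitted (E_B, a_B, Theta_B)
```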