942 results for Data Driven Modeling


Relevance: 80.00%

Abstract:

In longitudinal data analysis, our primary interest is in the regression parameters for the marginal expectations of the longitudinal responses; the longitudinal correlation parameters are of secondary interest. The joint likelihood function for longitudinal data is challenging, particularly for correlated discrete outcome data. Marginal modeling approaches such as generalized estimating equations (GEEs) have received much attention in the context of longitudinal regression. These methods are based on estimates of the first two moments of the data and the working correlation structure, and confidence regions and hypothesis tests are based on asymptotic normality. The methods are sensitive to misspecification of the variance function and the working correlation structure. Because of such misspecifications, the estimates can be inefficient and inconsistent, and inference may give incorrect results. To overcome this problem, we propose an empirical likelihood (EL) procedure based on a set of estimating equations for the parameter of interest and discuss its characteristics and asymptotic properties. We also provide an algorithm based on EL principles for the estimation of the regression parameters and the construction of a confidence region for the parameter of interest. We extend our approach to variable selection for high-dimensional longitudinal data with many covariates. In this situation it is necessary to identify a submodel that adequately represents the data; including redundant variables may reduce the model's accuracy and efficiency for inference. We propose a penalized empirical likelihood (PEL) variable-selection procedure based on GEEs, in which variable selection and estimation of the coefficients are carried out simultaneously. We discuss its characteristics and asymptotic properties, and present an algorithm for optimizing PEL. Simulation studies show that when the model assumptions are correct, our method performs as well as existing methods, and when the model is misspecified, it has clear advantages. We have applied the method to two case examples.
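For orientation, the standard empirical-likelihood formulation for estimating equations (Qin and Lawless, 1994), on which procedures of this kind are built, can be sketched as follows; the abstract does not give the formulas, so the details below are the textbook version rather than the authors' exact construction:

```latex
% Empirical likelihood ratio for estimating functions g_i(\beta) = g(y_i, x_i; \beta)
% with E[g_i(\beta_0)] = 0 (e.g. GEE-type estimating functions):
\[
  \mathcal{R}(\beta) = \max\Bigl\{ \prod_{i=1}^{n} n\pi_i \;:\;
    \pi_i \ge 0,\ \textstyle\sum_{i=1}^{n} \pi_i = 1,\
    \textstyle\sum_{i=1}^{n} \pi_i\, g_i(\beta) = 0 \Bigr\}
\]
% Under regularity conditions, -2 \log \mathcal{R}(\beta_0) converges in
% distribution to \chi^2_q, which yields confidence regions for \beta without
% an explicit variance estimate. Penalized variants add a sparsity penalty
% (e.g. SCAD) to the log-EL criterion to perform variable selection.
```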

Relevance: 80.00%

Abstract:

This paper explores a data-driven approach called Sales Resource Management (SRM) that can provide real insight into sales management. The DSMT (Diagnosis, Strategy, Metrics and Tools) framework can be used to solve field sales management challenges. The paper focuses on the 6P's strategy of SRM and illustrates how to apply it to the CAPS (Concentration, Attrition, Performance and Spend) challenges. © 2010 IEEE.

Relevance: 80.00%

Abstract:

Research endeavors on spoken dialogue systems in the 1990s and 2000s led to the deployment of commercial spoken dialogue systems (SDS) in microdomains such as customer service automation, reservation/booking, and question answering. Recent research in SDS has focused on the development of applications in different domains (e.g. virtual counseling, personal coaches, social companions), which require more sophistication than the previous generation of commercial SDS. The focus of this research project is the delivery of behavior change interventions based on the brief intervention counseling style via spoken dialogue systems. Brief interventions (BIs) are evidence-based, short, well-structured, one-on-one counseling sessions. Many challenges are involved in delivering BIs to people in need, such as finding the time to administer them in busy doctors' offices, obtaining the extra training that helps staff become comfortable providing these interventions, and managing the cost of delivering the interventions. Fortunately, recent developments in spoken dialogue systems make it possible to build systems that can deliver brief interventions. The overall objective of this research is to develop a data-driven, adaptable dialogue system for brief interventions for problematic drinking behavior, based on reinforcement learning methods. The implications of this research project include, but are not limited to, assessing the feasibility of delivering structured brief health interventions with a data-driven spoken dialogue system. Furthermore, while the experimental system focuses on harmful alcohol drinking as a target behavior, the knowledge and experience produced may also lead to the implementation of similarly structured health interventions and assessments in domains other than alcohol (e.g. obesity, drug use, lack of exercise), using statistical machine learning approaches. Beyond the design of the dialogue system itself, the semantic and emotional meaning of user utterances has a high impact on the interaction. To perform domain-specific reasoning and recognize concepts in user utterances, a named-entity recognizer and an ontology were designed and evaluated. To understand affective information conveyed through text, lexicons and a sentiment analysis module were developed and tested.
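The abstract names reinforcement learning as the core method but gives no specifics. The minimal tabular Q-learning sketch below, with entirely hypothetical dialogue states, actions, and rewards, illustrates the kind of data-driven policy optimization involved:

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch for a dialogue policy. States, actions,
# and rewards are hypothetical stand-ins for the brief-intervention dialogue
# structure described in the abstract, not the project's actual design.

ACTIONS = ["ask_open_question", "reflect", "give_feedback", "summarize"]

q = defaultdict(float)            # Q[(state, action)] -> expected return
alpha, gamma, epsilon = 0.1, 0.95, 0.2

def choose_action(state):
    """Epsilon-greedy selection over dialogue acts."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning update after observing the user's response."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# One interaction turn would look like:
#   a = choose_action(s)            # system selects a dialogue act
#   r, s2 = ...                     # user response yields reward, next state
#   update(s, a, r, s2)
```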

Relevance: 80.00%

Abstract:

An investigation into karst hazard in southern Ontario has been undertaken with the intention of leading to the development of predictive karst models for this region. Such models are not currently feasible because of a lack of karst data, though this is not entirely due to a lack of karst features. Geophysical data were collected at Lake on the Mountain, Ontario as part of this karst investigation, in order to test the long-standing hypothesis that Lake on the Mountain was formed by a sinkhole collapse. Sub-bottom acoustic profiling was used to image the lake-bottom sediments and bedrock. Vertical bedrock features interpreted as solutionally enlarged fractures were taken as evidence for karst processes on the lake bottom. Additionally, the bedrock topography shows a narrower and more elongated basin than was previously identified, lying parallel to a mapped fault system in the area. This suggests that Lake on the Mountain formed over a fault zone, which also supports the sinkhole hypothesis, as a fault zone would provide groundwater pathways for karst dissolution to occur. Previous sediment cores suggest that Lake on the Mountain formed at some point during the Wisconsinan glaciation, with glacial meltwater and glacial loading as potential contributing factors to sinkhole development. A probabilistic karst model for the state of Kentucky, USA, has been generated using the Weights of Evidence method. This model is presented as an example of the predictive capabilities of this kind of data-driven modelling technique and of how such models could be applied to karst in Ontario. The model correctly classified 70% of the validation dataset while minimizing false positive identifications; this is moderately successful and could stand to be improved. Finally, improvements to the current karst model of southern Ontario are suggested, with the goals of increasing investigation into karst in Ontario and streamlining the reporting system for sinkholes, caves, and other karst features so as to improve the current Ontario karst database.
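The Weights of Evidence method mentioned above is not spelled out in the abstract; its standard form, for an evidence layer B and karst occurrence D, is:

```latex
% Weights of Evidence (standard formulation, shown for orientation).
% B = presence of an evidence layer, D = presence of a karst feature:
\[
  W^{+} = \ln\frac{P(B \mid D)}{P(B \mid \bar{D})}, \qquad
  W^{-} = \ln\frac{P(\bar{B} \mid D)}{P(\bar{B} \mid \bar{D})}
\]
% Assuming conditional independence of layers B_1, ..., B_k, the posterior
% log-odds of a karst feature is the prior log-odds plus the summed weights:
\[
  \operatorname{logit} P(D \mid B_1, \dots, B_k)
    = \operatorname{logit} P(D) + \sum_{j=1}^{k} W_j^{\pm}
\]
```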

Relevance: 80.00%

Abstract:

Quantitative methods can help us understand how underlying attributes contribute to movement patterns. Applying principal component analysis (PCA) to whole-body motion data may provide an objective, data-driven method to identify unique and statistically important movement patterns. Therefore, the primary purpose of this study was to determine whether athletes' movement patterns can be differentiated based on skill level or sport played using PCA. Motion capture data from 542 athletes performing three sport-screening movements (i.e. bird-dog, drop jump, T-balance) were analyzed with a PCA-based pattern recognition technique. Before analyzing the effects of skill level or sport on movement patterns, methodological considerations related to the motion-analysis reference coordinate system were assessed. All analyses were addressed as case studies. In the first case study, referencing motion data to a global (lab-based) coordinate system, compared to a local (segment-based) coordinate system, affected the ability to interpret important movement features. In the second case study, which assessed the interpretability of PCs when data were referenced to a stationary versus a moving segment-based coordinate system, PCs were more interpretable when data were referenced to a stationary coordinate system for both the bird-dog and T-balance tasks. Based on the findings of case studies 1 and 2, only stationary segment-based coordinate systems were used in case studies 3 and 4. During the bird-dog task, elite athletes had significantly lower scores than recreational athletes for principal component (PC) 1. For the T-balance movement, elite athletes had significantly lower scores than recreational athletes for PC 2. In both analyses the lower scores of elite athletes represented a greater range of motion. Finally, case study 4 examined differences in the movement patterns of athletes competing in different sports, and significant differences in technique were detected during the bird-dog task. Through these case studies, this thesis highlights the feasibility of applying PCA as a movement pattern recognition technique in athletes. Future research can build on this proof-of-principle work to develop robust quantitative methods to help us better understand how underlying attributes (e.g. height, sex, ability, injury history, training type) contribute to performance.
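As a sketch of the PCA-based pattern recognition approach described above (data shapes and names below are hypothetical placeholders, not the study's actual pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA

# Each row is one athlete's time-normalized whole-body trajectory, flattened
# to a single feature vector (markers x axes x time samples). Random data
# stands in for real motion capture here.

n_athletes = 542
n_features = 39 * 3 * 101   # e.g. 39 markers, 3 axes, 101 time frames
X = np.random.default_rng(0).normal(size=(n_athletes, n_features))

pca = PCA(n_components=10)
scores = pca.fit_transform(X)    # PC scores: one row per athlete

# Group comparisons (e.g. elite vs. recreational) are then run on the
# low-dimensional PC scores rather than on the raw trajectories.
print(pca.explained_variance_ratio_[:3], scores.shape)
```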

Relevance: 80.00%

Abstract:

This article suggests a new approach to teaching two grammatical structures, the pasiva refleja (reflexive passive) and the impersonal "se", in university Spanish-as-a-foreign-language (E/LE) classes. Specifically, it argues that both should be treated as passive constructions, based on a lexical-functional analysis of them grounded in contrastive linguistics. For E/LE instruction, a contrastive approach is recommended that targets both metalinguistic reflection and the student's competence in the L2. In particular, the use of linguistic corpora in class forms an integral part of the instruction: working with a corpus stimulates students' curiosity, exposes them to authentic language material, and fosters independent inductive reflection.

Relevance: 80.00%

Abstract:

Video games have become one of the largest entertainment industries, and their power to capture the attention of players worldwide soon prompted the idea of using games to improve education. However, these educational games, commonly referred to as serious games, face several challenges when brought into the classroom, ranging from pragmatic issues (e.g. high development cost) to deeper educational issues, including a lack of understanding of how students interact with the games and how the learning process actually occurs. This chapter explores the potential of data-driven approaches to improve the practical applicability of serious games. Existing work by the entertainment and learning industries helps to build a conceptual model of the tasks required to analyze player interactions in serious games (gaming learning analytics, or GLA). The chapter also describes the main ongoing initiatives to create reference GLA infrastructures and their connection to new emerging specifications from the educational technology field. Finally, it explores how data-driven GLA will help in the development of a new generation of more effective educational games and of new business models that will support their expansion. This raises additional ethical implications, which are discussed at the end of the chapter.
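The "emerging specifications" mentioned above are not named in the abstract; the Experience API (xAPI) is one widely adopted example. The Python snippet below sketches what a single game-telemetry statement might look like (identifiers and URLs are illustrative placeholders, not a real deployment):

```python
# A minimal game telemetry event in the style of an xAPI statement. All
# identifiers below are hypothetical.

statement = {
    "actor": {"account": {"name": "player-42",
                          "homePage": "https://example.org"}},
    "verb": {"id": "https://example.org/verbs/completed"},
    "object": {"id": "https://example.org/games/demo/level-3"},
    "result": {"score": {"scaled": 0.85}, "success": True,
               "duration": "PT4M12S"},
    "timestamp": "2024-01-01T12:00:00Z",
}

# A GLA pipeline would collect such statements in a learning record store
# and aggregate them into learning-analytics dashboards.
```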

Relevance: 80.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance: 80.00%

Abstract:

Microsecond-long molecular dynamics (MD) trajectories of biomolecular processes are now possible due to advances in computer technology. Soon, trajectories long enough to probe dynamics over many milliseconds will become available. Since these timescales match the physiological timescales over which many small proteins fold, all-atom MD simulations of protein folding are now becoming popular. To distill the features of such large folding trajectories, we must develop methods that can compress trajectory data to enable visualization and that lend themselves to further analysis, such as finding collective coordinates and reducing the dynamics. Conventionally, clustering has been the most popular MD trajectory analysis technique, followed by principal component analysis (PCA). Simple clustering as used in MD trajectory analysis suffers from serious drawbacks: (i) it is not data-driven, (ii) it is unstable to noise and to changes in cutoff parameters, and (iii) since it does not take into account interrelationships among data points, the separation of data into clusters can often be artificial. Usually, partitions generated by clustering techniques are validated visually, but such validation is not possible for MD trajectories of protein folding, as the underlying structural transitions are not well understood. Rigorous cluster validation techniques could be adapted, but it is more crucial to reduce the dimensions in which MD trajectories reside while still preserving their salient features. PCA has often been used for dimension reduction, and while it is computationally inexpensive, being a linear method it does not achieve good data compression. In this thesis, I propose a different method, a nonmetric multidimensional scaling (nMDS) technique, which achieves superior data compression by virtue of being nonlinear and also provides clear insight into the structural processes underlying MD trajectories. I illustrate the capabilities of nMDS by analyzing three complete villin headpiece folding trajectories and six norleucine (NLE) mutant folding trajectories simulated by Freddolino and Schulten [1]. Using these trajectories, I compare nMDS, PCA, and clustering to demonstrate the superiority of nMDS. The three villin headpiece trajectories showed great structural heterogeneity. Apart from a few trivial features, such as the early formation of secondary structure, no commonalities between trajectories were found; no units of residues or atoms were found moving in concert across the trajectories. A flipping transition, corresponding to the flipping of helix 1 relative to the plane formed by helices 2 and 3, was observed towards the end of the folding process in all trajectories, when nearly all native contacts had been formed. However, the transition occurred through a different series of steps in each trajectory, indicating that it may not be a common transition in villin folding. All trajectories showed competition between local structure formation/hydrophobic collapse and global structure formation. Our analysis of the NLE trajectories confirms the notion that a tight hydrophobic core inhibits correct 3-D rearrangement. Only one of the six NLE trajectories folded, and it showed no flipping transition; all the other trajectories became trapped in hydrophobically collapsed states. The NLE residues were found to be buried deep in the core, compared with the corresponding lysines in the villin headpiece, making the core tighter and harder to undo for 3-D rearrangement. Our results suggest that NLE may not be the fast folder that experiments suggest. The tightness of the hydrophobic core may be a very important factor in the folding of larger proteins. It is likely that chaperones such as GroEL act to undo the tight hydrophobic core of proteins after most secondary structure elements have formed, so that global rearrangement is easier. I conclude by presenting facts about chaperone-protein complexes and propose further directions for the study of protein folding.
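As a concrete illustration of the nMDS technique described above (using scikit-learn rather than the author's own implementation, and a random placeholder where a real pairwise-RMSD matrix between trajectory frames would go):

```python
import numpy as np
from sklearn.manifold import MDS

# Nonmetric MDS embeds frames so that the rank order of pairwise
# dissimilarities is preserved, which is what makes it nonlinear.

rng = np.random.default_rng(1)
n_frames = 200
d = rng.random((n_frames, n_frames))
rmsd = (d + d.T) / 2                 # symmetrize the placeholder matrix
np.fill_diagonal(rmsd, 0.0)          # zero self-distance

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           n_init=4, random_state=0)
embedding = nmds.fit_transform(rmsd)  # 2-D embedding of the trajectory
print(embedding.shape, nmds.stress_)  # stress measures rank-order distortion
```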

Relevance: 80.00%

Abstract:

This study positioned the federal No Child Left Behind (NCLB) Act of 2002 as a reified colonizing entity, inscribing its hegemonic authority upon the professional identity and work of school principals within their school communities of practice. Pressure on educators and students intensifies each year as the benchmark for Adequate Yearly Progress (AYP) under the NCLB policy is raised, resulting in standards-based reform, scripted curriculum and pedagogy, the absence of elective subjects, and a general lack of the autonomy critical to the work of teachers as they approach each unique class and student (Crocco & Costigan, 2007; Mabry & Margolis, 2006). Emphasis on high-stakes standardized testing as the indicator of student achievement (Popham, 2005) affects educators' professional identity through dramatic pedagogical and structural changes in schools (Day, Flores, & Viana, 2007). These dramatic changes to the ways our nation conducts schooling must be understood and thought about critically from school leaders' perspectives, as their professional identity is influenced by large-scale NCLB school reform. The author explored the impact No Child Left Behind reform had on the professional identity of fourteen veteran Illinois principals leading urban, small urban, suburban, and rural middle and elementary schools. Qualitative data were collected during semi-structured interviews and focus groups and analyzed using a dual theoretical framework of postcolonial and identity theories. Postcolonial theory provided a lens through which the author applied a metaphor of colonization to principals' experiences as colonized-colonizers in a time of school reform. Principal interview data illustrated many examples of NCLB as a colonizing authority having a significant impact on the professional identity of school leaders. This framework was used to interpret the data in a unique and alternative way, and it responds to the need to better understand the ways school leaders react to district-, state-, and national-level accountability policies (Sloan, 2000). Identity theory situated principals as professionals shaped by the communities of practice in which they lead. Principals' professional identity has become more data-driven as a result of NCLB, and their role as instructional leaders has intensified. The data showed that NCLB has changed the work and professional identity of principals in terms of the use of data, classroom instruction, Response to Intervention, and staffing changes. Although NCLB defines success in terms of meeting or exceeding the benchmark for Adequate Yearly Progress, principals view AYP as only one measure of their success. The need to meet the benchmark for AYP is a present reality that necessitates school-wide attention to reading and math achievement. At this time, principals leading affluent, relatively homogeneous schools typically experience less pressure and more power under NCLB, and their schools are more often labeled "successful" school communities. In contrast, principals leading schools with more heterogeneity experience more pressure and less power under NCLB, and their schools are more often labeled "failing" school communities. Implications from this study for practitioners and policymakers include a need to reexamine the intents and outcomes of the policy for all school communities, especially in terms of power and voice. Recommendations for policy reform include moving to a growth model with multi-year assessments that make sense for individual students, rather than a single standardized test score, as the measure of achievement. Overall, the study reveals enhancements and constraints NCLB policy has caused in a variety of school contexts, which have affected the professional identity of school leaders.

Relevance: 80.00%

Abstract:

Although tyrosine kinase inhibitors (TKIs) such as imatinib have transformed chronic myelogenous leukemia (CML) into a chronic condition, these therapies are not curative in the majority of cases. Most patients must continue TKI therapy indefinitely, a requirement that is expensive and compromises a patient's quality of life. While TKIs are known to reduce leukemic cells' proliferative capacity and to induce apoptosis, their effects on leukemic stem cells, the immune system, and the microenvironment are not fully understood. A more complete understanding of their global therapeutic effects would help us to identify any limitations of TKI monotherapy and to address these issues through novel combination therapies. Mathematical models are a complementary tool to experimental and clinical data that can provide valuable insights into the underlying mechanisms of TKI therapy. Previous modeling efforts have focused on CML patients who show biphasic and triphasic exponential declines in BCR-ABL ratio during therapy. However, our patient data indicate that many patients treated with TKIs show fluctuations in BCR-ABL ratio yet are able to achieve durable remissions. To investigate these fluctuations, we construct a mathematical model that integrates CML with a patient's autologous immune response to the disease. In our model, we define an immune window, an intermediate range of leukemic concentrations that leads to an effective immune response against CML. While small leukemic concentrations provide insufficient stimulus, large leukemic concentrations actively suppress a patient's immune system, thus limiting its ability to respond. Our patient data and modeling results suggest that at diagnosis, a patient's high leukemic concentration is able to suppress their immune system. TKI therapy drives the leukemic population into the immune window, allowing the patient's immune cells to expand and eventually mount an efficient response against the residual CML. This response drives the leukemic population below the immune window, causing the immune population to contract and allowing the leukemia to partially recover. The leukemia eventually reenters the immune window, stimulating a sequence of weaker immune responses as the two populations approach equilibrium. We hypothesize that a patient's autologous immune response to CML may explain the fluctuations in BCR-ABL ratio that are regularly seen during TKI therapy; these fluctuations may serve as a signature of a patient's individual immune response to CML. By applying our modeling framework to patient data, we are able to construct an immune profile that can then be used to propose patient-specific combination therapies aimed at further reducing a patient's leukemic burden. Our characterization of a patient's anti-leukemia immune response may be especially valuable in the study of drug resistance, treatment cessation, and combination therapy.
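The abstract does not give the model equations; the Python sketch below shows one hypothetical way an "immune window" can be encoded in a two-compartment leukemia-immune ODE system, purely to illustrate the modeling style (it is not the authors' published model, and all parameter values are invented):

```python
import numpy as np
from scipy.integrate import solve_ivp

# L = leukemic cell count, Z = immune effector count. The stimulation term
# grows with L at low concentrations but is suppressed at high L, producing
# the nonmonotone "immune window" behavior described in the abstract.

r, d_tki = 0.05, 0.08      # leukemic growth and TKI-induced death (per day)
k = 1e-3                   # immune kill rate per effector cell
s, d_z = 1.0, 0.05         # immune source and death rates
p, L_supp = 0.2, 1e5       # max stimulation rate; suppression scale

def rhs(t, y):
    L, Z = y
    stim = p * Z * L / (L + 1e3) / (1 + (L / L_supp) ** 2)
    return [(r - d_tki) * L - k * L * Z,   # TKI plus immune-mediated killing
            s + stim - d_z * Z]

sol = solve_ivp(rhs, (0, 2000), [1e6, 20.0], max_step=1.0)
L_end, Z_end = sol.y[:, -1]
print(f"leukemia: {L_end:.3g}, immune: {Z_end:.3g}")
```

As TKI therapy drives L down through the window, the stimulation term transiently expands Z, after which both populations relax in damped oscillations, which is the qualitative fluctuation pattern the abstract describes.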

Relevance: 80.00%

Abstract:

With the development of variable-data-driven digital presses - where each document printed is potentially unique - there is a need for pre-press optimization to identify material that is invariant from document to document. In this way, rasterisation can be confined solely to those areas which change between successive documents, thereby alleviating a potential performance bottleneck. Given a template document specified in terms of layout functions, where actual data is bound at the last possible moment before printing, we look at deriving and exploiting the invariant properties of layout functions from their formal specifications. We propose future work on generic extraction of invariance from such properties for certain classes of layout functions.
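As a rough illustration of the optimization described above (all names below are hypothetical), invariant layout regions can be rasterised once and cached, so that only variable regions are re-rasterised per document instance:

```python
from functools import lru_cache

# Sketch: a template is a list of (region, is_invariant) pairs; invariant
# regions are rasterised once and reused across every printed document.

def rasterise(region, data=None):
    """Placeholder for the expensive rasterisation step."""
    return f"raster({region}, {data})"

@lru_cache(maxsize=None)
def rasterise_invariant(region):
    return rasterise(region)          # computed once, cached thereafter

def render_document(template, record):
    pages = []
    for region, invariant in template:
        if invariant:
            pages.append(rasterise_invariant(region))
        else:
            pages.append(rasterise(region, record[region]))
    return pages

# Example: render_document([("masthead", True), ("address", False)],
#                          {"address": "221B Baker St"})
```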

Relevance: 80.00%

Abstract:

This work is aimed at understanding and unifying information on epidemiological modelling methods and how those methods relate to public policy addressing human health, specifically in the context of infectious disease prevention, pandemic planning, and health behaviour change. The thesis employs multiple qualitative and quantitative methods and is presented as a manuscript of several individual, data-driven projects combined in a narrative arc. The first chapter introduces the scope and complexity of this interdisciplinary undertaking, describing several topical intersections of importance. The second chapter begins the presentation of original data and describes in detail two exercises in computational epidemiological modelling pertinent to pandemic influenza planning and policy; the next chapter presents additional original data on how the public's confidence in modelling methodology may affect their planned health behaviour change as recommended in public health policy. The final data-driven chapter describes how health policymakers use modelling methods and scientific evidence to inform and construct health policies for the prevention of infectious diseases, and the thesis concludes with a narrative chapter that evaluates the breadth of these data and recommends strategies for the optimal use of modelling methodologies when informing public health policy in applied public health scenarios.
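The thesis's specific models are not described in the abstract; the canonical SIR model below (with illustrative parameters) exemplifies the class of computational epidemiological models at issue in pandemic influenza planning:

```python
from scipy.integrate import solve_ivp

# Minimal SIR compartmental model. beta = transmission rate, gamma =
# recovery rate, N = population size; values are illustrative only.

beta, gamma, N = 0.3, 0.1, 1_000_000

def sir(t, y):
    S, I, R = y
    new_inf = beta * S * I / N
    return [-new_inf, new_inf - gamma * I, gamma * I]

sol = solve_ivp(sir, (0, 365), [N - 10, 10, 0], max_step=1.0)
print(f"peak infections: {sol.y[1].max():.0f}")
```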

Relevance: 80.00%

Abstract:

Denitrification is a microbially mediated process that converts nitrate (NO3-) to dinitrogen (N2) gas and has implications for soil fertility, climate change, and water quality. Using PCR, qPCR, and T-RFLP, I investigated the effects of environmental drivers and land management on the abundance and composition of denitrification functional genes. Environmental variables affecting gene abundance were soil type, soil depth, nitrogen concentrations, soil moisture, and pH, although each gene was unique in its spatial distribution and controlling factors. The inclusion of microbial variables, specifically genotype and gene abundance, improved denitrification models, highlighting the benefit of including microbial data when modeling denitrification. Along with some evidence of niche selection, I show that nirS is a good predictor of denitrification enzyme activity (DEA) and of the N2O:N2 ratio, especially in alkaline and wetland soils. nirK was correlated with N2O production and became a stronger predictor of DEA in acidic soils, indicating that nirK and nirS are not ecologically redundant.
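As a sketch of the modeling claim above, that adding gene-abundance predictors improves a denitrification model, here is a minimal regression comparison on synthetic placeholder data (real inputs would be measured soil variables and qPCR-derived log gene copies):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Compare a model of denitrification enzyme activity (DEA) built from
# environmental variables alone against one that also includes nirS
# abundance. All data below are random stand-ins for field measurements.

rng = np.random.default_rng(2)
n = 120
env = rng.normal(size=(n, 3))             # e.g. moisture, pH, nitrate
nirs = rng.normal(size=(n, 1))            # log nirS copies (qPCR)
dea = env @ [0.5, -0.2, 0.3] + 0.8 * nirs[:, 0] + rng.normal(0, 0.5, n)

base = LinearRegression().fit(env, dea)
full = LinearRegression().fit(np.hstack([env, nirs]), dea)

# The gene-abundance term raises explained variance (R^2) in this toy setup.
print(base.score(env, dea), full.score(np.hstack([env, nirs]), dea))
```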
