908 results for predictive analytics
Abstract:
In recent years, through several research projects, Universidad EAFIT has been developing a proposal that seeks to define the technological components that should make up an ecosystem of educational applications, in order to leverage the adoption of the ubiquity model in higher education institutions -- Through its research group for development and innovation in Information and Communication Technologies (GIDITIC), it has selected the first components of the ecosystem in thesis work from earlier research [1, 2] -- Additionally, work carried out by the local government of the Alcaldía de Medellín within its Medellín Ciudad Inteligente project [3] also selected some of the components needed to implement its portal -- Both initiatives agree on including an activity-logging component known as a "Learning Record Store" (LRS) -- Given this background, the aim is to implement an LRS that meets the objectives of the University's project, following standards that ensure interoperability with the other components of the educational application ecosystem
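The activity-recording standard commonly implemented by an LRS is the Experience API (xAPI), whose statements are actor-verb-object JSON documents. As a minimal illustration of the kind of interoperability the project aims for (the endpoint URL and credentials below are placeholders, not artifacts of the EAFIT project), a statement can be posted to an LRS like this:

```python
import requests

# Hypothetical LRS endpoint and credentials (placeholders for illustration).
LRS_URL = "https://lrs.example.edu/xapi/statements"
AUTH = ("lrs_user", "lrs_password")

# A minimal xAPI statement: actor, verb, and object, per the xAPI specification.
statement = {
    "actor": {"mbox": "mailto:student@example.edu", "name": "Example Student"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {
        "id": "https://courses.example.edu/activities/module-1",
        "definition": {"name": {"en-US": "Module 1"}},
    },
}

response = requests.post(
    LRS_URL,
    json=statement,
    auth=AUTH,
    headers={"X-Experience-API-Version": "1.0.3"},
)
response.raise_for_status()  # a conformant LRS returns the stored statement id(s)
print(response.json())
```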
Abstract:
In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while the structural changes to the graphs as well as the continuous stream of information produced by the entities in these graphs make them dynamic in nature. Examples include social networks where users post status updates, images, videos, etc.; phone call networks where nodes may send text messages or place phone calls; road traffic networks where the traffic behavior of the road segments changes constantly, and so on. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real-time. However, a majority of the work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on the analysis tasks, and the challenges in building efficient systems that can support such tasks at a large scale. In this dissertation, I design a unified streaming graph data management framework and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate and whether to do eager or lazy replication, in order to minimize network communication and support low-latency querying. In the second part, I study parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhoods. I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries and also allows partial pre-computation of the aggregates to minimize query latencies and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, where subgraphs can be specified using both graph structure and activity conditions on the nodes. The query specification tasks in my system are expressed using a set of active structural primitives, which allows the query evaluator to use a set of novel optimization techniques, thereby achieving high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation using large-scale real and synthetic datasets.
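As a rough sketch of the neighborhood-driven aggregation idea described above (a toy stand-in, not the dissertation's overlay system), the following Python fragment maintains per-node partial aggregates that are updated as events stream in, so a node's neighborhood aggregate is an O(1) read rather than a neighborhood scan:

```python
from collections import defaultdict

class StreamingNeighborhoodCounts:
    """Incrementally count, for every node, the events generated in its
    1-hop neighborhood (a toy analogue of a pre-compiled aggregation overlay)."""

    def __init__(self):
        self.neighbors = defaultdict(set)  # node -> set of neighbors
        self.agg = defaultdict(int)        # node -> neighborhood event count

    def add_edge(self, u, v):
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def on_event(self, node):
        # Push the event's contribution into all neighbors' partial aggregates.
        for nbr in self.neighbors[node]:
            self.agg[nbr] += 1

    def query(self, node):
        return self.agg[node]

g = StreamingNeighborhoodCounts()
g.add_edge("a", "b")
g.add_edge("a", "c")
g.on_event("b")      # "b" posts a status update
g.on_event("c")
print(g.query("a"))  # -> 2 events in a's neighborhood
```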
Abstract:
Leafy greens are an essential part of a healthy diet. Because of their health benefits, production and consumption of leafy greens have increased considerably in the U.S. in the last few decades. However, leafy greens have also been associated with a large number of foodborne disease outbreaks in recent years. The overall goal of this dissertation was to use the current knowledge of predictive models and available data to understand the growth, survival, and death of enteric pathogens in leafy greens at pre- and post-harvest levels. Temperature plays a major role in the growth and death of bacteria in foods. A growth-death model was developed for Salmonella and Listeria monocytogenes in leafy greens under the varying temperature conditions typically encountered in the supply chain. The developed growth-death models were validated using experimental dynamic time-temperature profiles available in the literature. Furthermore, these growth-death models for Salmonella and Listeria monocytogenes and a similar model for E. coli O157:H7 were used to predict the growth of these pathogens in leafy greens during transportation without temperature control. Refrigeration of leafy greens serves to extend their shelf life and mitigate bacterial growth, but at the same time, storage of foods at lower temperatures increases the storage cost. Nonlinear programming was used to optimize the storage temperature of leafy greens in the supply chain while minimizing the storage cost and maintaining the desired levels of sensory quality and microbial safety. Most of the outbreaks associated with consumption of leafy greens contaminated with E. coli O157:H7 in the U.S. have occurred during July-November. A dynamic system model consisting of subsystems and inputs (soil, irrigation, cattle, wildlife, and rainfall) simulating a farm in a major leafy-greens-producing area in California was developed. The model was simulated incorporating the events of planting, irrigation, harvesting, ground preparation for the new crop, contamination of soil and plants, and survival of E. coli O157:H7. The predictions of this system model are in agreement with the seasonality of outbreaks. This dissertation thus utilized growth, survival, and death models of enteric pathogens in leafy greens across production and the supply chain.
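To make the mechanics concrete, a secondary model can map temperature to growth rate, and the rate can be integrated over a recorded time-temperature profile. The sketch below uses a Ratkowsky-type square-root model with made-up parameters and profile; it illustrates the approach only, not the dissertation's fitted models:

```python
import numpy as np

# Illustrative Ratkowsky square-root model: sqrt(rate) = B * (T - T0).
B, T0 = 0.03, 2.0  # placeholder parameters (units: 1/sqrt(h), degC)

def growth_rate(temp_c):
    """Specific growth rate (log10 CFU/h) at a given temperature."""
    return (B * max(temp_c - T0, 0.0)) ** 2

# Toy supply-chain profile: 12 h refrigerated, then 12 h of temperature abuse.
times = np.arange(0, 24.5, 0.5)              # h
temps = np.where(times < 12, 4.0, 15.0)      # degC

log_n = 2.0                                  # initial load, log10 CFU/g
for i in range(1, len(times)):
    dt = times[i] - times[i - 1]
    log_n += growth_rate(temps[i]) * dt      # forward Euler step

print(f"Predicted level after 24 h: {log_n:.2f} log10 CFU/g")
```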
Abstract:
Several predictive models of epidemics of arthropod-vectored plant viruses have been studied in an attempt to bring understanding to the complex but specific relationships within the three-cornered pathosystem (virus, vector, and host plant), as well as its interactions with the environment. A large body of studies focuses mainly on weather-based models as a management tool for monitoring pests and diseases, with very few incorporating the contribution of the vector's life processes to the disease dynamics, an essential aspect when mitigating virus incidence in a crop stand. In this study, we hypothesized that the multiplication and spread of tomato spotted wilt virus (TSWV) in a crop stand is strongly related to its influence on Frankliniella occidentalis preferential behaviour and life expectancy. Model dynamics of important aspects of disease development within TSWV-F. occidentalis-host plant interactions were developed, focusing on F. occidentalis' life processes as influenced by TSWV. The results show that the influence of TSWV on F. occidentalis preferential behaviour leads to an estimated increase in the relative acquisition rate of the virus, and up to a 33% increase in the transmission rate to healthy plants. Increased life expectancy, which relates to improved fitness, is also dependent on the virus-induced preferential behaviour, consequently promoting multiplication and spread of the virus in a crop stand. The development of vector-based models could further help elucidate the role of tri-trophic interactions in agricultural disease systems. Using the model to examine the components of the disease process could also deepen our understanding of how specific epidemiological characteristics interact to cause disease in crops. With this level of understanding, we can develop more precise control strategies for both the virus and the vector.
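A compartment-style sketch of such a vector-based model (all rates below are invented for illustration and are not the study's fitted values) couples healthy/infected plants with non-viruliferous/viruliferous thrips, with a preference factor boosting transmission to healthy plants:

```python
import numpy as np
from scipy.integrate import odeint

beta_pv = 0.02  # transmission: viruliferous thrips -> healthy plant
beta_vp = 0.01  # acquisition: infected plant -> non-viruliferous thrips
pref = 1.33     # virus-induced preference factor (cf. the ~33% increase above)
mu_v = 0.05     # thrips turnover rate

def model(y, t):
    S, I, X, Z = y  # healthy plants, infected plants, clean thrips, viruliferous thrips
    infection = beta_pv * pref * S * Z
    acquisition = beta_vp * I * X
    return [
        -infection,               # dS/dt
        infection,                # dI/dt
        -acquisition + mu_v * Z,  # dX/dt (dying vectors replaced by clean births)
        acquisition - mu_v * Z,   # dZ/dt
    ]

t = np.linspace(0, 60, 200)  # days
sol = odeint(model, [100.0, 1.0, 50.0, 0.0], t)
print(f"Infected plants after 60 days: {sol[-1, 1]:.1f}")
```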
Abstract:
As usage metrics attain an increasingly central role in library system assessment and analysis, librarians tasked with system selection, implementation, and support are driven to identify metric approaches that simultaneously require less technical complexity and offer greater data granularity. Such approaches allow systems librarians to present evidence-based claims about platform usage behaviors while reducing the resources necessary to collect that information, representing a novel approach to real-time user analysis as well as a dual benefit in active and preventative cost reduction. As part of the DSpace implementation for the MD SOAR initiative, the Consortial Library Application Support (CLAS) division has begun a test implementation of the Google Tag Manager analytics system in an attempt to collect custom analytical dimensions that track author- and university-specific download behaviors. Building on the work of Conrad, CLAS seeks to demonstrate that the GTM approach to custom analytics provides granular metadata-based usage statistics in an approach that will prove extensible for additional statistical gathering in the future. This poster discusses the methodology used to develop these custom tag approaches, the benefits of using the GTM model, and the risks and benefits associated with further implementation.
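Downstream of collection, such custom dimensions make per-author and per-university summaries a one-line aggregation. A hypothetical sketch (the column names below are assumptions about the export format, not CLAS's actual schema):

```python
import pandas as pd

# Hypothetical export of download events carrying GTM custom dimensions.
events = pd.DataFrame({
    "event": ["download"] * 5,
    "author": ["Smith", "Smith", "Jones", "Lee", "Jones"],
    "university": ["UMBC", "UMBC", "Towson", "UMBC", "Towson"],
})

# Downloads per author and per university.
print(events.groupby("author").size().sort_values(ascending=False))
print(events.groupby("university").size())
```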
Abstract:
Objective: The aim of the study is to examine the distribution of an integrated covariate and its association with blood pressure (BP) among children in Anhui province, China, and to assess the predictive value of the integrated covariate for childhood hypertension. Methods: A total of 2,828 subjects (1,588 male and 1,240 female) aged 7-17 years participated in this study. Height, weight, waistline, hipline, and BP of all subjects were measured; obesity and overweight were defined by an international standard specifying the measurement, the reference population, and the age- and sex-specific cut-off points. High BP status was defined as systolic blood pressure (SBP) and/or diastolic blood pressure (DBP) > 95th percentile for age and gender. Results: Our results revealed that the prevalence of childhood hypertension was 11.03%, and the SBP and DBP of the obesity group were significantly higher than those of the normal group. Anthropometric obesity indices such as body mass index (BMI) were positively correlated with SBP and DBP. The integrated covariate performed better than any single covariate in the receiver-operating characteristic (ROC) curve analysis; the cut-off value, sensitivity, and specificity of the integrated covariate were 0.112, 0.577, and 0.683, respectively. Conclusion: The integrated covariate is a simple and effective anthropometric index for identifying childhood hypertension.
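For reference, a cut-off with this kind of sensitivity/specificity trade-off is typically chosen by maximizing Youden's J (sensitivity + specificity - 1) along the ROC curve. A sketch with simulated data (not the study's measurements):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Simulated integrated covariate: hypertensive children score higher on average.
y = rng.integers(0, 2, size=1000)
score = rng.normal(loc=0.10 + 0.03 * y, scale=0.02)

fpr, tpr, thresholds = roc_curve(y, score)
j = tpr - fpr                    # Youden's J at each candidate threshold
best = int(np.argmax(j))

print(f"AUC         = {roc_auc_score(y, score):.3f}")
print(f"cut-off     = {thresholds[best]:.3f}")
print(f"sensitivity = {tpr[best]:.3f}")
print(f"specificity = {1 - fpr[best]:.3f}")
```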
Abstract:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies, which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts.
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
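The mechanical core of high-volume hypothesis testing is to run one test per candidate metric across the two cohorts and correct for multiple comparisons before surfacing results. A minimal sketch of that loop (toy metrics and data, not HVHT's full taxonomy or implementation):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# Two cohorts, several per-record metrics (e.g., event counts, durations, gaps).
metrics = {
    "event_count":  (rng.poisson(10, 200), rng.poisson(12, 200)),
    "duration_min": (rng.exponential(30, 200), rng.exponential(31, 200)),
    "gap_hours":    (rng.normal(5, 1, 200), rng.normal(5, 1, 200)),
}

names, pvals = [], []
for name, (a, b) in metrics.items():
    # Mann-Whitney U avoids normality assumptions for skewed metrics.
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    names.append(name)
    pvals.append(p)

# Benjamini-Hochberg FDR correction across every test that was run.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for name, p, r in zip(names, p_adj, reject):
    print(f"{name:>12}: adjusted p = {p:.4f}{' *' if r else ''}")
```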
Abstract:
Supply chains are ubiquitous in any commercial delivery system. The exchange of goods and services, from different supply points to distinct destinations scattered across a given geographical area, requires the management of stocks and vehicle fleets in order to minimize costs while maintaining good quality of service. Even if the operating conditions remain constant over a given time horizon, managing a supply chain is a very complex task. Its complexity increases exponentially with both the number of network nodes and the dynamical operational changes. Moreover, the management system must be adaptive in order to cope easily with disturbances such as machinery and vehicle breakdowns or changes in demand. This work proposes the use of a model predictive control paradigm to tackle these issues. The simulation results obtained suggest that this strategy facilitates easy task rescheduling in case of disturbances or anticipated changes in operating conditions. © Springer International Publishing Switzerland 2017
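A toy receding-horizon sketch of the idea (a single-node inventory problem, not the paper's supply chain model): at each step the controller optimizes order quantities over a short horizon against forecast demand, applies only the first decision, and re-optimizes at the next step, which is what lets it absorb disturbances such as demand changes:

```python
import numpy as np
from scipy.optimize import minimize

HORIZON, TARGET = 5, 100.0        # look-ahead steps, desired stock level
HOLD_COST, ORDER_COST = 1.0, 0.2  # illustrative cost weights

def plan(stock, demand_forecast):
    """Choose orders over the horizon minimizing holding + ordering cost."""
    def cost(orders):
        s, total = stock, 0.0
        for u, d in zip(orders, demand_forecast):
            s = s + u - d                                     # stock balance
            total += HOLD_COST * (s - TARGET) ** 2 + ORDER_COST * u ** 2
        return total
    res = minimize(cost, x0=np.full(HORIZON, 10.0), bounds=[(0, None)] * HORIZON)
    return res.x

stock = 80.0
demands = [20, 25, 18, 30, 22, 19, 24, 28]         # realized demand
for t in range(len(demands) - HORIZON):
    forecast = demands[t:t + HORIZON]              # perfect forecast assumed here
    order = plan(stock, forecast)[0]               # apply only the first move
    stock = stock + order - demands[t]             # the true system evolves
    print(f"t={t}: order {order:5.1f}, stock {stock:6.1f}")
```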
Abstract:
We investigate key characteristics of Ca²⁺ puffs in deterministic and stochastic frameworks that all incorporate the cellular morphology of IP₃ receptor channel clusters. In a first step, we numerically study Ca²⁺ liberation in a three-dimensional representation of a cluster environment with reaction-diffusion dynamics in both the cytosol and the lumen. These simulations reveal that Ca²⁺ concentrations at a releasing cluster range from 80 µM to 170 µM and equilibrate almost instantaneously on the time scale of the release duration. These highly elevated Ca²⁺ concentrations eliminate Ca²⁺ oscillations in a deterministic model of an IP₃R channel cluster at physiological parameter values, as revealed by a linear stability analysis. The reason lies in the saturation of all feedback processes in the IP₃R gating dynamics, so that only fluctuations can restore the experimentally observed Ca²⁺ oscillations. In this spirit, we derive master equations that allow us to analytically quantify the onset of Ca²⁺ puffs and hence the stochastic time scale of intracellular Ca²⁺ dynamics. Moving up the spatial scale, we suggest formulating cellular dynamics in terms of waiting-time distribution functions. This approach prevents the state-space explosion that is typical of descriptions of cellular dynamics based on channel states, and still retains information on molecular fluctuations. We illustrate this method by studying global Ca²⁺ oscillations.
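The waiting-time picture can be made concrete with a Gillespie-type simulation. The toy birth-death scheme below (illustrative rates, not the paper's IP₃R gating model) draws exponential waiting times between channel transitions and records the latency until a puff threshold of open channels is reached:

```python
import numpy as np

rng = np.random.default_rng(2)
N_CHANNELS, PUFF_THRESHOLD = 10, 3
K_OPEN, K_CLOSE, ALPHA = 0.1, 1.0, 2.0  # toy rates; ALPHA mimics Ca2+ feedback

def puff_latency():
    """One stochastic run: time until PUFF_THRESHOLD channels are open."""
    t, n_open = 0.0, 0
    while n_open < PUFF_THRESHOLD:
        # Opening accelerates with open channels (Ca2+-induced Ca2+ release);
        # closing is proportional to the open pool.
        r_open = K_OPEN * (N_CHANNELS - n_open) * (1 + ALPHA * n_open)
        r_close = K_CLOSE * n_open
        total = r_open + r_close
        t += rng.exponential(1.0 / total)   # exponential waiting time
        n_open += 1 if rng.random() < r_open / total else -1
    return t

latencies = [puff_latency() for _ in range(1000)]
print(f"mean puff latency: {np.mean(latencies):.2f} (arbitrary time units)")
```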
Abstract:
This study investigated the rate of human papillomavirus (HPV) persistence, associated risk factors, and predictors of cytological alteration outcomes in a cohort of human immunodeficiency virus-infected pregnant women over an 18-month period. HPV was typed through L1 gene sequencing in cervical smears collected during gestation and at 12 months after delivery. Outcomes were defined as nonpersistence (clearance of HPV in the second sample), re-infection (detection of different HPV types in the two samples), and type-specific HPV persistence (the same HPV type found in both samples). An unfavourable cytological outcome was recorded when the second exam showed progression to squamous intraepithelial lesion or high-grade squamous intraepithelial lesion. Ninety patients were studied. HPV DNA persistence occurred in 50% of the cases, comprising type-specific persistence (30%) or re-infection (20%). A low CD4+ T-cell count at entry was a risk factor for type-specific persistence, re-infection, and overall HPV DNA persistence. The odds ratio (OR) was almost three times higher in the type-specific group than in the re-infection group (OR = 2.8; 95% confidence interval: 0.43-22.79). Our findings show that bona fide (type-specific) HPV persistence is a stronger predictor of the development of cytological abnormalities, highlighting the need for HPV typing, as opposed to HPV DNA testing alone, in the clinical setting.
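For reference, an odds ratio and its Wald confidence interval come straight from a 2x2 table, as in the sketch below. The counts are invented so that the point estimate matches the reported OR of 2.8; the abstract does not give the actual table, so the resulting interval differs from the study's:

```python
import math

# Hypothetical 2x2 table: rows = persistence group, columns = progression yes/no.
a, b = 12, 15  # type-specific: progressed / did not progress
c, d = 4, 14   # re-infection:  progressed / did not progress

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)  # Wald standard error of log(OR)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")  # OR = 2.80
```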
Abstract:
Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three-level logistic regression model for clinical mastitis in dairy cattle: cross validation, a prediction using the full posterior predictive distribution, and two “mixed” predictive methods that incorporate higher-level random effects simulated from the underlying model distribution. Cross validation is considered a gold-standard method but is computationally intensive, and thus a comparison is made between posterior predictive assessments and cross validation. The analyses revealed that the mixed prediction methods produced results close to cross validation, whilst the full posterior predictive assessment gave predictions that were over-optimistic (closer to the observed disease rates) compared with cross validation. A mixed prediction method that simulated random effects from both higher levels was best at identifying the outlying level-two (farm-year) units of interest. It is concluded that this mixed prediction method, simulating random effects from both higher levels, is straightforward and may be of value in model criticism of multilevel logistic regression, a technique commonly used for animal health data with a hierarchical structure.
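The distinguishing step of a "mixed" prediction is that, for the unit being assessed, the higher-level random effects are drawn afresh from their fitted distribution rather than reused from the fit. A schematic two-level sketch (simulated arrays stand in for real MCMC posterior draws):

```python
import numpy as np

rng = np.random.default_rng(3)
n_draws, n_farms = 2000, 50

# Stand-ins for posterior draws from a fitted multilevel logistic model.
beta0 = rng.normal(-2.0, 0.1, n_draws)             # intercept draws
sigma_farm = rng.normal(0.8, 0.05, n_draws)        # farm-level SD draws
u_hat = rng.normal(0.0, 0.8, (n_draws, n_farms))   # estimated farm effects

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Full posterior prediction: reuse each farm's estimated random effect
# (tends to be over-optimistic, tracking the observed rates).
p_full = inv_logit(beta0[:, None] + u_hat)

# Mixed prediction: simulate new farm effects from the fitted distribution,
# assessing each farm as if it were a fresh unit from the model.
u_new = rng.standard_normal((n_draws, n_farms)) * sigma_farm[:, None]
p_mixed = inv_logit(beta0[:, None] + u_new)

print("mean predicted risk (full): ", round(float(p_full.mean()), 3))
print("mean predicted risk (mixed):", round(float(p_mixed.mean()), 3))
```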
Abstract:
Master's dissertation, Electronics and Telecommunications Engineering, Faculdade de Ciências e Tecnologia, Universidade do Algarve, 2015
Abstract:
Genomic selection (GS) has been used to compute genomic estimated breeding values (GEBV) of individuals; however, it has only been applied to animals and major plant crops due to high costs. Moreover, in some crops breeding and selection are performed at the family level. We aimed to study the implementation of genome-wide family selection (GWFS) in two loblolly pine (Pinus taeda L.) populations: i) the breeding population CCLONES, composed of 63 families (5-20 individuals per family), phenotyped for four traits (stem diameter, stem rust susceptibility, tree stiffness, and lignin content) and genotyped using an Illumina Infinium assay with 4,740 polymorphic SNPs, and ii) a simulated population that reproduced the same pedigree as CCLONES, with 5,000 polymorphic loci and two traits (one oligogenic and one polygenic). In both populations, phenotypic and genotypic data were pooled at the family level in silico: phenotypes were averaged across replicates for all individuals, and allele frequency was computed for each SNP. Marker effects were estimated at the individual (GEBV) and family (GEFV) levels with Bayes-B using the package BGLR in R, and models were validated using 10-fold cross-validation. Predictive ability, computed by correlating phenotypes with GEBV and GEFV, was always higher for GEFV in both populations, even after standardizing GEFV predictions to be comparable to GEBV. The results reveal great potential for using GWFS in breeding programs that select families, such as those for most outbreeding forage species. The significant drop in genotyping costs, as only one sample per family is needed, would allow the application of GWFS in minor crops.
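A schematic of the pooling-and-prediction step is below; ridge regression stands in for the Bayes-B model the study fits with BGLR, and the data are simulated, so this sketches the workflow rather than reproducing the paper's analysis:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
n_fam, n_per_fam, n_snp = 60, 10, 500

# Simulated 0/1/2 genotypes for individuals grouped into families.
geno = rng.integers(0, 3, (n_fam * n_per_fam, n_snp)).astype(float)
effects = rng.normal(0, 0.1, n_snp)
pheno = geno @ effects + rng.normal(0, 1.0, n_fam * n_per_fam)
family = np.repeat(np.arange(n_fam), n_per_fam)

# Pool to the family level: allele frequency per SNP, mean phenotype.
fam_geno = np.array([geno[family == f].mean(axis=0) / 2 for f in range(n_fam)])
fam_pheno = np.array([pheno[family == f].mean() for f in range(n_fam)])

# Ridge regression as a stand-in for Bayes-B; 10-fold cross-validation.
pred = cross_val_predict(Ridge(alpha=10.0), fam_geno, fam_pheno, cv=10)
ability = np.corrcoef(pred, fam_pheno)[0, 1]
print(f"family-level predictive ability: {ability:.2f}")
```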
Abstract:
Recently, the automotive market's interest in hybrid vehicles has increased due to more restrictive pollutant emission legislation and to the necessity of decreasing fossil fuel consumption, since such a solution allows a consistent improvement of the vehicle's global efficiency. The term hybridization refers to the energy flow in the powertrain of a vehicle: a standard vehicle usually has only one energy source and one energy tank, whereas a hybrid vehicle has at least two energy sources. In most cases, the prime mover is an internal combustion engine (ICE), while the auxiliary energy source can be mechanical, electrical, pneumatic, or hydraulic. The control unit of a hybrid vehicle is expected to use the ICE in high-efficiency operating zones and to shut it down when convenient, while using the electric motor-generator (EMG) at partial loads and as a fast torque response during transients. However, the battery state of charge may limit such a strategy. That is why, in most cases, energy management strategies are based on state-of-charge (SOC) control. Several studies have been conducted on this topic and many different approaches have been illustrated. The purpose of this dissertation is to develop an online (usable on-board) control strategy in which the operating modes are defined using an instantaneous optimization method that minimizes the equivalent fuel consumption of a hybrid electric vehicle. The equivalent fuel consumption is calculated by taking into account the total energy used by the hybrid powertrain during the propulsion phases. The first section presents the characteristics of hybrid vehicles. The second chapter describes the global model, with a particular focus on the energy management strategies usable for supervisory control of such a powertrain. The third chapter shows the performance of the implemented controller on an NEDC cycle compared with that obtained with the original control strategy.
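The instantaneous optimization at the core of such a strategy is often called an equivalent consumption minimization strategy (ECMS): at each instant, search over the power split and pick the one minimizing fuel power plus equivalence-weighted electrical power. The sketch below uses placeholder maps and constants, not the dissertation's vehicle model:

```python
import numpy as np

LHV = 43e6        # lower heating value of fuel, J/kg
S_EQ = 2.5        # equivalence factor weighting electrical energy use
P_DEMAND = 20e3   # driver power request at this instant, W
P_EMG_MAX = 15e3  # electric machine power limit, W

def engine_fuel_power(p_ice):
    """Toy fuel map: efficiency improves with load (placeholder, not a real map)."""
    if p_ice <= 0:
        return 0.0
    eff = 0.2 + 0.15 * min(p_ice / 50e3, 1.0)
    return p_ice / eff  # chemical power drawn from the tank, W

# Grid search over the electric machine's share of the power request;
# negative values mean the ICE also recharges the battery.
candidates = np.linspace(-P_EMG_MAX, P_EMG_MAX, 301)
best = min(candidates,
           key=lambda p_emg: engine_fuel_power(P_DEMAND - p_emg) + S_EQ * p_emg)

equiv_power = engine_fuel_power(P_DEMAND - best) + S_EQ * best
print(f"EMG power: {best/1e3:.1f} kW, ICE power: {(P_DEMAND - best)/1e3:.1f} kW")
print(f"equivalent fuel rate: {equiv_power / LHV * 1000:.3f} g/s")
```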