6 results for Google Analytics
at Duke University
Abstract:
Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, and hardware provisioning and configuration settings, such as what type of machines to acquire and how many of them. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests the best bidding strategies for them. Cumulon explores two alternative approaches toward supporting such markets, with different trade-offs between system and optimization complexity. An experimental study demonstrates the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space.
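The provisioning decision described above (which machine type, and how many, under time and cost requirements) can be illustrated with a minimal sketch. This is not Cumulon's optimizer: the cost model (an Amdahl-style serial fraction), the machine names, prices and speeds are all assumptions made up for illustration, and risk tolerance and bidding are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MachineType:
    name: str              # hypothetical instance label
    price_per_hour: float  # dollars per machine-hour (made-up value)
    speed: float           # work units one machine processes per hour


def estimate_hours(work: float, serial_fraction: float,
                   machine: MachineType, count: int) -> float:
    """Assumed model: a serial part that does not speed up, plus a
    parallel part that is divided evenly across `count` machines."""
    single_machine_hours = work / machine.speed
    return single_machine_hours * (
        serial_fraction + (1.0 - serial_fraction) / count
    )


def cheapest_plan(work: float, serial_fraction: float, deadline_hours: float,
                  machine_types: List[MachineType],
                  max_count: int = 64) -> Optional[Tuple[float, str, int, float]]:
    """Enumerate (type, count) pairs and return the cheapest one that
    meets the deadline as (cost, type name, count, hours), or None."""
    best = None
    for machine in machine_types:
        for count in range(1, max_count + 1):
            hours = estimate_hours(work, serial_fraction, machine, count)
            if hours > deadline_hours:
                continue  # violates the time requirement
            cost = count * machine.price_per_hour * hours
            if best is None or cost < best[0]:
                best = (cost, machine.name, count, hours)
    return best


if __name__ == "__main__":
    catalog = [MachineType("small", 0.10, 1.0), MachineType("large", 0.50, 4.0)]
    print(cheapest_plan(100.0, 0.05, 10.0, catalog))
```

Exhaustive enumeration is only workable because this toy plan space is tiny; the abstract describes a far larger space that includes implementation alternatives and execution parameters.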
Abstract:
BACKGROUND: Writing plays a central role in the communication of scientific ideas and is therefore a key aspect of researcher education, ultimately determining the success and long-term sustainability of researchers' careers. Despite the growing popularity of e-learning, we are not aware of any existing study comparing on-line vs. traditional classroom-based methods for teaching scientific writing. METHODS: Forty-eight participants with medical, nursing and physiotherapy backgrounds from the US and Brazil were randomly assigned to two groups (n = 24 per group): an on-line writing workshop group (on-line group), in which participants used virtual communication, Google Docs and standard writing templates, and a standard writing guidance training group (standard group), in which participants received standard instruction without the aid of virtual communication and writing templates. Two outcomes were evaluated: manuscript quality, assessed using scores on the Six subgroup analysis scale as the primary outcome measure, and participant satisfaction, assessed with a Likert scale. To control for observer variability, inter-observer reliability was assessed using Fleiss's kappa. A post-hoc analysis comparing rates of communication between mentors and participants was performed. Nonparametric tests were used to assess intervention efficacy. RESULTS: Excellent inter-observer reliability among three reviewers was found, with an Intraclass Correlation Coefficient (ICC) agreement = 0.931882 and ICC consistency = 0.932485. The on-line group had better overall manuscript quality (p = 0.0017, SSQSavg score 75.3 +/- 14.21, ranging from 37 to 94) compared to the standard group (47.27 +/- 14.64, ranging from 20 to 72). Participant satisfaction was higher in the on-line group (4.3 +/- 0.73) compared to the standard group (3.09 +/- 1.11) (p = 0.001). The standard group also had fewer communication events compared to the on-line group (0.91 +/- 0.81 vs. 2.05 +/- 1.23; p = 0.0219).
CONCLUSION: Our protocol for on-line scientific writing instruction is better than standard face-to-face instruction in terms of writing quality and student satisfaction. Future studies should evaluate the protocol's efficacy in larger longitudinal cohorts involving participants from different languages.
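For readers unfamiliar with the inter-observer statistic named in the methods, the following is a minimal sketch of the standard Fleiss's kappa formula. The input layout (one row per rated item, one column per category, each cell the number of raters choosing that category) and the example counts are assumptions for illustration, not data from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to
    category j. Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    n_categories = len(counts[0])
    total_ratings = n_items * n_raters

    # Share of all ratings that fell into each category.
    category_share = [
        sum(row[j] for row in counts) / total_ratings
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    item_agreement = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(item_agreement) / n_items
    expected = sum(p * p for p in category_share)  # agreement by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Three hypothetical raters, two items, two categories.
    print(fleiss_kappa([[3, 0], [0, 3]]))  # complete agreement
    print(fleiss_kappa([[2, 1], [1, 2]]))  # agreement below chance level
```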
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.
This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.
In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, these models also provide a probabilistic estimate of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce the enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use it as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.
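The decompose-then-aggregate forecasting idea mentioned in this abstract can be sketched in a heavily simplified form. The abstract does not specify the thesis's models, so everything here is an assumption: the series is split into only two components (an overall level and one periodic component), each is "predicted" by carrying it forward, and the forecast is their sum. The function names and the example series are hypothetical.

```python
from typing import List, Sequence, Tuple


def decompose(series: Sequence[float], period: int) -> Tuple[float, List[float]]:
    """Split a series into an overall level and one periodic component.

    The periodic component holds, for each phase of the period, the
    average deviation from the level. Requires len(series) >= period.
    """
    level = sum(series) / len(series)
    seasonal = []
    for phase in range(period):
        values = series[phase::period]  # every observation at this phase
        seasonal.append(sum(values) / len(values) - level)
    return level, seasonal


def forecast(series: Sequence[float], period: int, horizon: int) -> List[float]:
    """Predict each component separately, then aggregate by summing."""
    level, seasonal = decompose(series, period)
    start = len(series)  # index of the first future time step
    return [
        level + seasonal[(start + step) % period]
        for step in range(horizon)
    ]


if __name__ == "__main__":
    history = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]  # made-up series, period 3
    print(forecast(history, period=3, horizon=3))
```

A fuller treatment along the lines of the abstract would use several nested periods and fit a separate univariate or multivariate model to each component instead of carrying values forward.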
Abstract:
Emergency departments are challenging research settings, where truly informed consent can be difficult to obtain. A deeper understanding of emergency medical patients' opinions about research is needed. We conducted a systematic review and meta-summary of quantitative and qualitative studies on which values, attitudes, or beliefs of emergent medical research participants influence research participation. We included studies of adults that investigated opinions toward emergency medicine research participation. We excluded studies focused on the association between demographics or consent document features and participation and those focused on non-emergency research. In August 2011, we searched the following databases: MEDLINE, EMBASE, Google Scholar, Scirus, PsycINFO, AgeLine and Global Health. Titles, abstracts and then full manuscripts were independently evaluated by two reviewers. Disagreements were resolved by consensus and adjudicated by a third author. Studies were evaluated for bias using standardised scores. We report themes associated with participation or refusal. Our initial search produced over 1800 articles. A total of 44 articles were extracted for full-manuscript analysis, and 14 were retained based on our eligibility criteria. Among factors favouring participation, altruism and personal health benefit had the highest frequency. Mistrust of researchers, feeling like a 'guinea pig' and risk were leading factors favouring refusal. Many studies noted limitations of informed consent processes in emergent conditions. We conclude that highlighting the benefits to the participant and society, mitigating risk and increasing public trust may increase research participation in emergency medical research. New methods for conducting informed consent in such studies are needed.
Abstract:
MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. RESULTS: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
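The general idea of stability selection referred to above (refit a selector on many random subsamples and keep only variables chosen in most of them) can be sketched as follows. This is not the authors' method: the base selector here is a plain top-k ranking by absolute Pearson correlation instead of component-wise gradient boosting, the outcome is uncensored, and the permutation step for false-discovery control is omitted. All names, defaults and the synthetic data are assumptions.

```python
import random
from typing import List, Sequence


def abs_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def top_k_by_correlation(rows: Sequence[Sequence[float]],
                         y: Sequence[float], k: int) -> List[int]:
    """Stand-in base selector: indices of the k best-correlated features."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def stability_selection(rows: Sequence[Sequence[float]], y: Sequence[float],
                        k: int, n_subsamples: int = 50,
                        threshold: float = 0.8, seed: int = 0) -> List[int]:
    """Keep features selected in at least `threshold` of the subsamples."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    hits = [0] * n_features
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        sub_rows = [rows[i] for i in idx]
        sub_y = [y[i] for i in idx]
        for j in top_k_by_correlation(sub_rows, sub_y, k):
            hits[j] += 1
    return [j for j in range(n_features) if hits[j] / n_subsamples >= threshold]


if __name__ == "__main__":
    data_rng = random.Random(1)
    rows = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
    y = [r[0] for r in rows]  # synthetic outcome driven by feature 0 only
    print(stability_selection(rows, y, k=1))
```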
Abstract:
BACKGROUND: In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. DEVELOPMENT AND TESTING OF THE ONTOLOGY: Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. RESULTS AND SIGNIFICANCE: Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not.
Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
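To make the notion of a computable muscle definition and a competency question concrete, here is a toy sketch. It does not use the MFMO's actual terms, identifiers or format: the three entries are simplified textbook anatomy, and the `Muscle` class and `find_muscles` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Muscle:
    name: str
    attachments: FrozenSet[str]  # structures the muscle attaches to
    innervation: str             # nerve supplying the muscle


# Simplified illustrative entries, not taken from the MFMO.
MUSCLES = [
    Muscle("masseter", frozenset({"zygomatic arch", "mandible"}), "trigeminal nerve"),
    Muscle("temporalis", frozenset({"temporal fossa", "mandible"}), "trigeminal nerve"),
    Muscle("genioglossus", frozenset({"mandible", "tongue"}), "hypoglossal nerve"),
]


def find_muscles(muscles: Sequence[Muscle],
                 attaches_to: Optional[str] = None,
                 innervated_by: Optional[str] = None) -> List[str]:
    """Answer a competency-style question such as
    'which muscles attach to X and are innervated by Y?'."""
    matches = []
    for muscle in muscles:
        if attaches_to is not None and attaches_to not in muscle.attachments:
            continue
        if innervated_by is not None and muscle.innervation != innervated_by:
            continue
        matches.append(muscle.name)
    return matches


if __name__ == "__main__":
    print(find_muscles(MUSCLES, attaches_to="mandible",
                       innervated_by="trigeminal nerve"))
```

A keyword search over article text cannot answer such a question unless the article happens to use the same wording, which is the gap the abstract attributes to PubMed and Google Scholar.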