991 results for digitization, statistics, Google Analytics


Relevance: 30.00%

Publisher:

Abstract:

Vols. for 1913- published also in Nebraska State Board of Agriculture. Annual report, 1913- .

Relevance: 30.00%

Publisher:

Abstract:

Title varies slightly.

Relevance: 30.00%

Publisher:

Abstract:

Encyclopaedia Slavica Sanctorum (eslavsanct.net) is designed as a complex, heterogeneous multimedia product. It is part of the project Encyclopaedia Slavica Sanctorum: Saints and Holy Places in Bulgaria (in electronic and Gutenberg versions). By 2013, its web-based platform for online management and presentation of structured digital content had been prepared and numerous materials had been entered. The platform is built on the server-side technologies PHP and MySQL, with HTML, JavaScript and CSS on the client side. Searches in the e-ESS can be made by twelve different parameters, or combinations of them, such as saints' or feasts' names, type of sainthood, types of texts dedicated to the saints, dates of saints' commemorations, and several others. Both guests and registered users can search the e-ESS, but the latter have access to much more information, including the publications of original sources. The e-platform also collects statistics on what has been searched and read. The software used for content and access analysis is the BI tool QlikView. As an analysis services provider, it is connected to the e-ESS object repository and tracking services through a previously created data warehouse. The data warehouse is updated automatically, yielding a real-time analytics solution. The paper discusses some statistics on the use of the e-ESS: the activities of editors, users and guests, the types of searches, and the most often viewed objects, such as the date of January 1 and the article on St. Basil the Great, which is one of the richest encyclopaedia articles and includes both metadata and published original sources, drawn from medieval Slavonic manuscripts as well as from popular culture records.
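
As a hedged illustration of the multi-parameter search described above, the sketch below combines optional filters into a single parameterised query. It is written in Python with a hypothetical saints table and column names, whereas the actual e-ESS platform is built on PHP and MySQL.

```python
# Minimal sketch: combine optional search parameters (or any combination of them)
# into one parameterised SQL query. The saints table and its columns are hypothetical.
import sqlite3

def build_search(filters):
    """Turn a dict of optional filters into a parameterised WHERE clause."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT * FROM saints WHERE {where}", params

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE saints (name TEXT, feast_date TEXT, "
                 "sainthood_type TEXT, text_type TEXT)")
    sql, params = build_search({"sainthood_type": "martyr", "feast_date": "01-01"})
    print(sql, params)
    print(conn.execute(sql, params).fetchall())
```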

Relevance: 30.00%

Publisher:

Abstract:

This paper presents a practical approach for digitizing city landmarks based on free and limited Web resources. The digital replicas are then placed on the Web using popular services, like Google Earth, and are accessible to a huge user base. The method is easily applicable and quite valuable to organizations with limited funding.
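
As a hedged illustration of the publishing step, the sketch below writes a minimal KML file that anchors a 3D landmark model (a COLLADA .dae file) at given coordinates so it can be viewed in Google Earth; the model file, landmark name and coordinates are placeholders, not values from the paper.

```python
# Sketch: generate a minimal KML file that places a 3D landmark model in Google Earth.
# The model path, name and coordinates below are hypothetical placeholders.
KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name>{name}</name>
    <Model>
      <Location>
        <longitude>{lon}</longitude>
        <latitude>{lat}</latitude>
        <altitude>0</altitude>
      </Location>
      <Link><href>{model_href}</href></Link>
    </Model>
  </Placemark>
</kml>
"""

def write_landmark_kml(path, name, lon, lat, model_href):
    with open(path, "w", encoding="utf-8") as f:
        f.write(KML_TEMPLATE.format(name=name, lon=lon, lat=lat, model_href=model_href))

write_landmark_kml("landmark.kml", "City Hall (example)", 23.3219, 42.6977, "landmark.dae")
```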

Relevance: 30.00%

Publisher:

Abstract:

This paper presents the Accurate Google Cloud Simulator (AGOCS), a novel high-fidelity Cloud workload simulator based on parsing real workload traces, which can be conveniently used on a desktop machine for day-to-day research. Our simulation is based on real-world workload traces from a Google cluster with 12.5K nodes, covering a period of one calendar month. The framework is able to reveal very precise and detailed parameters of the executed jobs, tasks and nodes, as well as to provide actual resource usage statistics. The system has been implemented in the Scala language with a focus on parallel execution and an easy-to-extend design. The paper presents the detailed structural framework of AGOCS and discusses our main design decisions, whilst also suggesting alternative and possibly performance-enhancing future approaches. The framework is available via an open-source GitHub repository.
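
As a hedged illustration of the trace-parsing step such a simulator builds on, the sketch below aggregates per-job CPU and memory requests from a task_events file in the published Google cluster-trace CSV layout. The column indices and the file name are assumptions to be adjusted for the trace version actually used, and this is not code from the AGOCS framework (which is written in Scala).

```python
# Sketch: parse a Google cluster-trace style task_events CSV and aggregate
# per-job resource requests. Column indices follow the 2011 public trace layout
# (treat them as assumptions and adjust for the trace actually in use).
import csv
from collections import defaultdict

TIMESTAMP, JOB_ID, TASK_INDEX, EVENT_TYPE = 0, 2, 3, 5
CPU_REQUEST, MEM_REQUEST = 9, 10

def aggregate_requests(path):
    cpu = defaultdict(float)
    mem = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or row[EVENT_TYPE] != "0":   # keep SUBMIT events only
                continue
            job = row[JOB_ID]
            cpu[job] += float(row[CPU_REQUEST] or 0.0)
            mem[job] += float(row[MEM_REQUEST] or 0.0)
    return cpu, mem

if __name__ == "__main__":
    cpu, mem = aggregate_requests("task_events-part-00000.csv")  # hypothetical file name
    for job in sorted(cpu, key=cpu.get, reverse=True)[:5]:
        print(job, round(cpu[job], 3), round(mem[job], 3))
```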

Relevance: 30.00%

Publisher:

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to the electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach to comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
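
Where the abstract describes running many statistical tests across cohort-comparison metrics, a minimal sketch of the general idea follows. It illustrates high-volume hypothesis testing in spirit only, not the dissertation's actual HVHT framework; the event names and the chi-squared plus Benjamini-Hochberg choices are assumptions.

```python
# Hedged sketch: compare the frequency of every event type between two cohorts of
# event sequences with a chi-squared test, then control the false-discovery rate
# across the whole batch of tests.
from collections import Counter
from scipy.stats import chi2_contingency
from statsmodels.stats.multitest import multipletests

def event_frequency_tests(cohort_a, cohort_b):
    """cohort_a, cohort_b: lists of event sequences (lists of event-type strings)."""
    count_a = Counter(e for seq in cohort_a for e in seq)
    count_b = Counter(e for seq in cohort_b for e in seq)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    results = []
    for event in sorted(set(count_a) | set(count_b)):
        a, b = count_a[event], count_b[event]
        table = [[a, total_a - a], [b, total_b - b]]
        _, p, _, _ = chi2_contingency(table)
        results.append((event, p))
    # Benjamini-Hochberg correction across all tests run in this batch.
    rejected, p_adj, _, _ = multipletests([p for _, p in results], method="fdr_bh")
    return [(ev, p, pa, rej) for (ev, p), pa, rej in zip(results, p_adj, rejected)]

if __name__ == "__main__":
    a = [["admit", "lab", "discharge"], ["admit", "icu", "discharge"]]
    b = [["admit", "discharge"], ["admit", "lab", "lab", "discharge"]]
    for row in event_frequency_tests(a, b):
        print(row)
```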

Relevance: 20.00%

Publisher:

Abstract:

Search engines have forever changed the way people access and discover knowledge, allowing information about almost any subject to be quickly and easily retrieved within seconds. As increasingly more material becomes available electronically, the influence of search engines on our lives will continue to grow. This presents the problem of how to find what information is contained in each search engine, what bias a search engine may have, and how to select the best search engine for a particular information need. This research introduces a new method, search engine content analysis, to solve the above problem. Search engine content analysis is a new development of collection selection, a traditional information retrieval field that deals with general information repositories. Current research in collection selection relies on full access to the collection or on estimates of collection size, and collection descriptions are often represented as term occurrence statistics. An automatic ontology learning method is developed for search engine content analysis, which trains an ontology with world knowledge of hundreds of different subjects in a multilevel taxonomy. This ontology is then mined to find important classification rules, and these rules are used to perform an extensive analysis of the content of the largest general-purpose Internet search engines in use today. Instead of representing collections as a set of terms, as commonly occurs in collection selection, they are represented as a set of subjects, leading to a more robust representation of information and a reduction in synonymy. The ontology-based method was compared with ReDDE (Relevant Document Distribution Estimation, a method for resource selection), the current state-of-the-art collection selection method, which relies on collection size estimation; using the standard R-value metric, the results were encouraging. The method was also used to analyse the content of the most popular search engines in use today, including Google and Yahoo, as well as several specialist search engines such as PubMed and that of the U.S. Department of Agriculture. In conclusion, this research shows that the ontology-based method mitigates the need for collection size estimation.
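
A minimal sketch of the subject-based collection description idea follows. The tiny keyword-to-subject rules, sampled snippets and engine names are hypothetical stand-ins for the trained multilevel ontology and query-based samples described in the abstract.

```python
# Hedged sketch: describe each search engine's collection as a distribution over
# subjects (rather than terms) and rank engines for a query's subject.
from collections import Counter

SUBJECT_KEYWORDS = {          # hypothetical classification rules
    "medicine": {"patient", "clinical", "disease"},
    "agriculture": {"crop", "soil", "harvest"},
    "computing": {"algorithm", "software", "network"},
}

def classify(snippet):
    words = set(snippet.lower().split())
    scores = {s: len(words & kw) for s, kw in SUBJECT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def subject_profile(sampled_snippets):
    """Proportion of sampled documents per subject for one engine."""
    counts = Counter(filter(None, (classify(s) for s in sampled_snippets)))
    total = sum(counts.values()) or 1
    return {subject: n / total for subject, n in counts.items()}

def rank_engines(profiles, query_subject):
    """Order engines by how much of their sampled content matches the query subject."""
    return sorted(profiles, key=lambda e: profiles[e].get(query_subject, 0.0), reverse=True)

engines = {   # hypothetical sampled snippets per engine
    "engine_a": ["clinical trial of a new disease treatment", "crop rotation and soil health"],
    "engine_b": ["software network algorithm design", "patient monitoring software"],
}
profiles = {name: subject_profile(snips) for name, snips in engines.items()}
print(rank_engines(profiles, "medicine"))
```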

Relevance: 20.00%

Publisher:

Abstract:

The Google Online Marketing Challenge is an ongoing collaboration between Google and academics to give students experiential learning. The Challenge gives student teams US$200 in AdWords, Google's flagship advertising product, to develop online marketing campaigns for actual businesses. The end result is an engaging in-class exercise that provides students and professors with an exciting and pedagogically rigorous competition. Results from surveys at the end of the Challenge reveal positive appraisals from the three main constituents (students, businesses, and professors); general agreement between students and instructors regarding learning outcomes; and a few points of difference between students and instructors. In addition to describing the Challenge and its outcomes, this article reviews the post-participation questionnaires and subsequent datasets. The questionnaires and results are publicly available, and this article invites educators to mine the datasets, share their results, and offer suggestions for future iterations of the Challenge.

Relevance: 20.00%

Publisher:

Abstract:

Introduction: The Google Online Marketing Challenge is a global competition in which student teams run advertising campaigns for small and medium-sized businesses (SMEs) using AdWords, Google's text-based advertisements. In 2008, its inaugural year, over 8,000 students and 300 instructors from 47 countries, representing over 200 schools, participated. The Challenge ran in undergraduate and graduate classes in disciplines such as marketing, tourism, advertising, communication and information systems. Combining advertising and education, the Challenge gives students hands-on experience in the increasingly important field of online marketing, engages them with local businesses and motivates them through the thrill of a global competition. Student teams receive US$200 in credits for AdWords, Google's premier advertising product, which offers cost-per-click advertisements. The teams then recruit and work with a local business to devise an effective online marketing campaign. Students first outline a strategy, run a series of campaigns, and provide their business with recommendations to improve its online marketing. Teams submit two written reports for judging by 14 academics in eight countries. In addition, Google AdWords experts judge teams on campaign statistics such as success metrics and account management. Rather than a marketing simulation against a computer or hypothetical marketing plans for hypothetical businesses, the Challenge has student teams develop and manage real online advertising campaigns for their clients and compete against peers globally.

Relevance: 20.00%

Publisher:

Abstract:

Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse, symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and for the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics, as the sparse structure can be exploited, typically through the use of the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically require the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions for both of these problems are investigated. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a 'square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕ_n = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of novel results are presented in this thesis. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and give conditions under which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and for approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matérn random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
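
To make the central computation concrete, here is a minimal sketch of the plain Lanczos approximation to f(A)b with f(t) = t^(-1/2), the quantity used when sampling x = A^(-1/2)z from a GMRF. The Krylov dimension m and the toy precision matrix are illustrative assumptions; the shift-and-invert, extended Krylov, rational and restarted variants discussed in the thesis are not shown.

```python
# Hedged sketch: unrestarted Lanczos approximation of A^{-1/2} b for a sparse SPD A,
# using f(A) b ≈ ||b|| V_m f(T_m) e_1 with f applied via the eigendecomposition of T_m.
import numpy as np
import scipy.sparse as sp

def lanczos_inv_sqrt(A, b, m=30):
    """Approximate A^{-1/2} b using an m-dimensional Krylov subspace."""
    n = b.size
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] == 0:            # invariant subspace found; truncate
                m = j + 1
                V, alpha, beta = V[:, :m], alpha[:m], beta[:m - 1]
                break
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    f_T_e1 = evecs @ (evals ** -0.5 * evecs[0, :])
    return np.linalg.norm(b) * (V @ f_T_e1)

# Toy example: a 1D Laplacian-like sparse SPD matrix standing in for a precision matrix.
n = 200
A = sp.diags([-1, 2.5, -1], [-1, 0, 1], shape=(n, n), format="csr")
z = np.random.default_rng(0).standard_normal(n)
x = lanczos_inv_sqrt(A, z, m=40)    # approximate sample with precision A
print(x[:5])
```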
