940 results for visitor information, network services, data collecting, data analysis, statistics, locating
Abstract:
"December 1984."
Abstract:
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an ε-approximate summary can be maintained so that, given a quantile query (φ, ε), the data item at rank ⌈φN⌉ may be approximately obtained within rank error precision εN over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different φ and ε poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times clusters are reprocessed and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.
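As a rough illustration of the (φ, ε) query semantics above (not the clustering algorithms proposed in the paper), the following Python sketch stores a stream naively and answers a quantile query; the class name and parameters are invented for illustration, and a real ε-approximate summary would retain far fewer items.

```python
# Minimal sketch of (phi, eps) quantile-query semantics, not the paper's
# clustering algorithm: any returned item whose rank lies within eps*N of
# ceil(phi*N) is an acceptable answer.
import math
import random


class NaiveQuantileStore:
    """Keeps every item; a real eps-approximate summary stores far fewer."""

    def __init__(self):
        self.items = []

    def insert(self, x):
        self.items.append(x)

    def query(self, phi, eps):
        data = sorted(self.items)
        n = len(data)
        target_rank = math.ceil(phi * n)                 # requested rank
        lo = max(1, math.floor(target_rank - eps * n))   # acceptable range
        hi = min(n, math.ceil(target_rank + eps * n))
        # Here we return the exact item; an approximate summary may return
        # any item whose rank falls in [lo, hi].
        return data[target_rank - 1], (lo, hi)


store = NaiveQuantileStore()
for _ in range(10_000):
    store.insert(random.random())

median, acceptable_ranks = store.query(phi=0.5, eps=0.01)
print(median, acceptable_ranks)
```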
Abstract:
Background and purpose: Survey data quality is a combination of the representativeness of the sample, the accuracy and precision of measurements, and data processing and management, with several subcomponents in each. The purpose of this paper is to show how, in the final risk factor surveys of the WHO MONICA Project, information on data quality was obtained, quantified, and used in the analysis. Methods and results: In the WHO MONICA (Multinational MONItoring of trends and determinants in CArdiovascular disease) Project, information about the data quality components was documented in retrospective quality assessment reports. On the basis of the documented information and the survey data, the quality of each data component was assessed and summarized using quality scores. The quality scores were used in sensitivity testing of the results, both by excluding populations with low quality scores and by weighting the data by their quality scores. Conclusions: Detailed documentation of all survey procedures with standardized protocols, training, and quality control are steps towards optimizing data quality. Quantifying data quality is a further step. The methods used in the WHO MONICA Project could be adopted to improve quality in other health surveys.
Abstract:
Collaborative Filtering is one of the most popular recommendation algorithms. Most Collaborative Filtering algorithms work with a static set of data. This paper introduces a novel approach to providing recommendations using Collaborative Filtering when user ratings are received over an incoming data stream. In an incoming stream, massive amounts of data arrive rapidly, making it impossible to save all the records for later analysis. By dynamically building a decision tree for every item as data arrive, the incoming data stream is used effectively, although an inevitable trade-off between accuracy and the amount of memory used is introduced. By adding a simple personalization step using a hierarchy of the items, it is possible to improve the ratings predicted by each decision tree and generate recommendations in real time. Empirical studies with the dynamically built decision trees show that the personalization step improves the overall prediction accuracy.
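As a loose sketch of the idea of keeping one decision tree per item and updating it as ratings stream in, the snippet below buffers a bounded window of ratings per item and refits a scikit-learn tree; the buffer-and-refit strategy, the feature encoding and all parameters are illustrative assumptions rather than the paper's incremental tree-building method.

```python
# Simplified sketch: one decision tree per item, refit from a bounded buffer
# as ratings arrive. The paper builds the trees incrementally; buffering and
# refitting is an illustrative simplification, not its method.
from collections import defaultdict, deque

import numpy as np
from sklearn.tree import DecisionTreeRegressor

BUFFER_SIZE = 1000      # bounds memory per item (assumed parameter)

buffers = defaultdict(lambda: deque(maxlen=BUFFER_SIZE))
trees = {}


def observe(item_id, user_features, rating):
    """Consume one (item, user, rating) event from the stream."""
    buffers[item_id].append((user_features, rating))
    X = np.array([f for f, _ in buffers[item_id]])
    y = np.array([r for _, r in buffers[item_id]])
    trees[item_id] = DecisionTreeRegressor(max_depth=5).fit(X, y)


def predict(item_id, user_features, default=3.0):
    tree = trees.get(item_id)
    if tree is None:
        return default                      # unseen item: fall back
    return float(tree.predict(np.array([user_features]))[0])


# Toy usage with random user feature vectors and ratings.
rng = np.random.default_rng(0)
for _ in range(200):
    observe(item_id=42, user_features=rng.random(4), rating=rng.integers(1, 6))
print(predict(42, rng.random(4)))
```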
Abstract:
The data available during the drug discovery process is vast in amount and diverse in nature. To gain useful information from such data, an effective visualisation tool is required. To provide better visualisation facilities to domain experts (screening scientists, biologists, chemists, etc.), we developed software based on recently developed principled visualisation algorithms such as Generative Topographic Mapping (GTM) and Hierarchical Generative Topographic Mapping (HGTM). The software also supports conventional visualisation techniques such as Principal Component Analysis, NeuroScale, PhiVis, and Locally Linear Embedding (LLE). It also provides global and local regression facilities and supports regression algorithms such as the Multilayer Perceptron (MLP), Radial Basis Function network (RBF), Generalised Linear Models (GLM), Mixture of Experts (MoE), and the newly developed Guided Mixture of Experts (GME). This user manual gives an overview of the purpose of the software tool, highlights some of the issues to consider when creating a new model, and provides information about how to install and use the tool. The manual does not require readers to be familiar with the algorithms the software implements; basic computing skills are enough to operate it.
Abstract:
Fibre-to-the-premises (FTTP) has long been regarded as the ultimate solution to satisfy the demand for broadband access for the foreseeable future, offering distance-independent data rates within the access network reach. However, currently deployed FTTP networks have in most cases only replaced the transmission medium, without improving the overall architecture, resulting in deployments that are only cost-efficient in densely populated areas (effectively increasing the digital divide). In addition, the large potential increase in access capacity cannot be matched by a similar increase in core capacity at competitive cost, effectively moving the bottleneck from the access to the core. DISCUS is a European Integrated Project that, building on optical-centric solutions such as Long-Reach Passive Optical access and a flat optical core, aims to deliver a cost-effective architecture for ubiquitous broadband services. One of the key features of the project is its end-to-end approach, which promises to deliver a complete network design and a conclusive analysis of its economic viability. © 2013 IEEE.
Abstract:
Diabetes patients may suffer from an unhealthy life, long-term treatment, and chronic complications. Decreasing the hospitalization rate is a crucial problem for health care centers. This study combines the bagging method with decision tree base classifiers and cost-sensitive analysis to classify diabetes patients. Real patient data collected from a regional hospital in Thailand were analyzed. The relevant factors were selected and used to construct base classifier decision tree models to classify diabetes and non-diabetes patients. The bagging method was then applied to improve accuracy. Finally, asymmetric classification cost matrices were used to provide additional alternative models for diabetes data analysis.
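A minimal sketch of the pipeline described above, using bagged decision trees and an asymmetric cost matrix applied at decision time; the synthetic data, cost values and hyperparameters are placeholders, not those of the study.

```python
# Sketch: bagged decision trees plus an asymmetric misclassification-cost
# matrix applied at decision time. Data and cost values are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 50 bootstrap-trained decision trees.
model = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                          n_estimators=50, random_state=0)
model.fit(X_tr, y_tr)

# cost[i, j] = cost of predicting class j when the true class is i
# (missing a diabetic patient is assumed costlier than a false alarm).
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])

proba = model.predict_proba(X_te)          # shape (n_samples, 2)
expected_cost = proba @ cost               # expected cost of each decision
y_pred = expected_cost.argmin(axis=1)      # pick the cheaper decision
print((y_pred == y_te).mean())
```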
Abstract:
Twitter is the biggest social network in the world, and every day millions of tweets are posted and discussed, expressing various views and opinions. A large variety of research activities have been conducted to study how these opinions can be clustered and analyzed, so that some tendencies can be uncovered. Due to the inherent weaknesses of tweets - very short texts and very informal styles of writing - it is rather hard to analyze tweet data with good performance and accuracy. In this paper, we attack the problem from another angle, using a two-layer structure to analyze the Twitter data: LDA combined with topic map modelling. The experimental results demonstrate that this approach represents progress in Twitter data analysis. However, more experiments with this method are needed to ensure that accurate analytic results can be maintained.
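For concreteness, a short scikit-learn sketch of the LDA layer only (the topic-map layer is omitted); the toy tweets and the number of topics are assumptions.

```python
# Sketch of the LDA layer only (the topic-map layer is not shown).
# Toy tweets and topic count are illustrative assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "love the new phone battery lasts all day",
    "terrible service at the airport again",
    "battery drains too fast on this phone",
    "great flight crew friendly service",
]

# Tweets are short, so keep the vocabulary small and drop stop words.
vectorizer = CountVectorizer(stop_words="english", max_features=500)
X = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)            # per-tweet topic proportions

terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```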
Abstract:
Recommendation systems aim to help users make decisions more efficiently. The most widely used method in recommendation systems is collaborative filtering, in which a critical step is to analyze a user's preferences and recommend products or services based on a similarity analysis with other users' ratings. However, collaborative filtering is less effective when facing the "cold start" problem, i.e., when few ratings are available for products or services. To tackle this problem, we propose an improved method that combines collaborative filtering and data classification. We use hotel recommendation data to test the proposed method. The accuracy of the recommendation is determined by the rankings. Evaluations of the accuracy of Top-3 and Top-10 recommendation lists are conducted using 10-fold cross-validation and ROC curves. The results show that, under the cold start condition, the Top-3 hotel recommendation list produced by the combined method outperforms the Top-10 list in most cases.
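A compact sketch of the collaborative-filtering half of such a system, producing a Top-3 list from user-based cosine similarity; the toy rating matrix is invented, and the classification component used for the cold-start case is not reproduced here.

```python
# Sketch of user-based collaborative filtering producing a Top-3 list.
# Toy data only; the paper's classification step is not shown.
import numpy as np

# rows = users, columns = hotels, 0 = not yet rated (toy data)
R = np.array([[5, 4, 0, 0, 1, 0],
              [4, 5, 1, 0, 0, 2],
              [0, 1, 5, 4, 0, 5],
              [1, 0, 4, 5, 3, 4]], dtype=float)


def top_k_for(user, R, k=3):
    # cosine similarity between the target user and all users
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1.0
    sims = (R @ R[user]) / (norms * norms[user])
    sims[user] = 0.0                          # ignore self-similarity
    # predicted score = similarity-weighted average of other users' ratings
    scores = (sims[:, None] * R).sum(axis=0) / (sims.sum() + 1e-9)
    scores[R[user] > 0] = -np.inf             # do not re-recommend rated hotels
    return np.argsort(scores)[::-1][:k]


print(top_k_for(user=0, R=R, k=3))            # indices of the Top-3 hotels
```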
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
Presentation: Research on the Practicum and externships has a long history and involves important aspects for analysis. For example, the recent changes taking place in university degrees allot more credits to the Practicum course in all degrees, and Company-University collaboration has exposed the need to study new learning environments. The rise of ICT practices such as ePortfolios, which require technological solutions and methods supported by experimentation, study and research, calls for particular examination given the dynamic momentum of technological innovation. Tutoring the Practicum and externships requires remote monitoring and communication using ePortfolios, and competence-based assessment and students' requirement to provide evidence of learning call for the best tutoring methods available with ePortfolios. Among the elements of ePortfolios, eRubrics emerge as a tool for design, communication and competence assessment. This project aims to consolidate a line of research on eRubrics, already undertaken by another project -I+D+i [EDU2010-15432]- in order to expand the network of researchers and Centres of Excellence in Spain and other countries: Harvard University in the USA, the University of Cologne in Germany, the University of Colima in Mexico, the Federal University of Paraná and the University of Santa Catarina in Brazil, and Stockholm University in Sweden(1). This new project [EDU2013-41974-P](2) examines the impact of eRubrics on tutoring and on assessing the Practicum course and externships. Through technology, distance tutoring grants an extra dimension to human communication. New forms of teaching with technological mediation are on the rise and are highly valuable, not only for formal education but especially in both the public and private sectors of non-formal education, such as occupational training, education for the unemployed and public servant training.
Objectives: Obj. 1. To analyse models of technology used in assessing learning in the Practicum of all degrees at Spanish Faculties of Education. Obj. 2. To study models of learning assessment mediated by eRubrics in the Practicum. Obj. 3. To analyse communication through eRubrics between students and their tutors at university and practice centres, focusing on students' understanding of the competences and evidence to be assessed in the Practicum. Obj. 4. To design assessment services and products, in order to federate companies and practice centres with training institutions. Among many other features, CoRubric(3) has the following functions: 1. The possibility to assess people, products or services by using rubrics. 2. Ipsative assessment. 3. Designing fully flexible rubrics. 4. Drafting reports and exporting results from eRubrics in a project. 5. Dialogue between students and teachers about the evaluation and the application of the criteria.
Methodology, Methods, Research Instruments or Sources Used: The project will use techniques to collect and analyse data from two methodological approaches. 1. In order to meet the first objective, we propose an initial exploratory descriptive study (Buendía Eisman, Colás Bravo & Hernández Pina, 1998), which involves conducting interviews with Practicum coordinators from all educational degrees across Spain, as well as analysing the contents of the teaching guides used in all educational degrees across Spain. 55 academic coordinators from about 10 faculties of education at public universities in Spain (20%) were interviewed, and 376 course guides from 36 public institutions in Spain (72%) were analysed. 2. In order to satisfy the second objective, 7 universities were selected to implement the project's two instruments, aimed at practice-centre tutors and faculty tutors. All data-collection instruments were validated by experts using the Delphi method. The selection of experts considered three aspects: years of professional experience, number and quality of publications in the field (Practicum, Educational Technology and Teacher Training), and self-rating of their knowledge. The resulting data were used to calculate the Coefficient of Competence (Kcomp) (Martínez, Zúñiga, Sala & Meléndez, 2012); results in all cases showed an average above 0.09 points. The two instruments for the first objective were validated during the first half of the 2014-15 academic year and data were collected during the second half; the instruments for the second objective were validated during the first half of the 2015-16 academic year, with data collection in the second half. The set of four instruments (two for each of objectives 1 and 2) share the same dimensions across all sources (coordinators, course guides, practice-centre tutors and faculty tutors): a. Institution-Organisation, b. Nature of internships, c. Relationship between agents, d. Practicum management, e. Assessment, f. Technological support, g. Training, and h. Assessment ethics.
Conclusions, Expected Outcomes or Findings: The first results respond to Objective 1, where we draw different conclusions for each of the six dimensions. In the case of the internal regulations governing the organisation and structure of the Practicum, we note that the most traditional degrees (Elementary and Primary) share common internal rules, in particular development methodology and criteria, in contrast to other degrees (Pedagogy and Social Education). It is also true that the practice centres in the latter cases are very different from each other and can be a public institution, a school, a company, a museum, etc. The final report (56.34%) and daily activity logs (43.67%) are the items most frequently demanded of students in all degrees, followed by lesson plans (28.18%), portfolios (19.72%), didactic units (26.7%) and others (32.4%). The technological support used has mainly been the university's platform (47.89%) and email (57.75%), followed by other services and tools (9.86%) and rubric platforms (1.41%). The assessment criteria are divided between formal aspects (12.38%), written expression (12.38%), treatment of the subject (14.45%), methodological rigour of the work (10.32%), and level of argument, clarity and relevance of conclusions (10.32%). In general terms, we could say that there is a trend towards, and an ongoing debate between, formative assessment and accreditation-oriented assessment. There has not yet been sufficient time to study further and compare the other dimensions and sources of information. We hope to provide more analysis and conclusions by the conference date.
Abstract:
MEGAGEO - Moving megaliths in the Neolithic is a project that aims to determine the provenance of the lithic materials used in the construction of tombs. A multidisciplinary approach is taken, with researchers from the several fields of knowledge involved. This work presents a spatial data warehouse developed specifically for this project, which comprises information from national archaeological databases, geographic and geological information, and new geochemical and petrographic data obtained during the project. The use of the spatial data warehouse proved to be essential in the data analysis phase of the project. The Redondo Area is presented as a case study for the application of the spatial data warehouse to analyze the relations between the geochemistry, the geology and the tombs in this area.
Abstract:
In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms; on the other hand, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3) we reconstructed the network of retweets among COVID-19 themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlighted the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.
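As a small illustration of the kind of analysis described for Chapter 3, the sketch below builds a retweet graph with networkx, ranks users by PageRank centrality and extracts modularity-based communities; the edge list is a toy example, not the thesis data.

```python
# Sketch of a Chapter-3 style analysis: retweet graph, centrality,
# community detection. The edge list is a toy example, not the thesis data.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# (retweeter, original author) pairs
retweets = [("alice", "who_official"), ("bob", "who_official"),
            ("carol", "politician_x"), ("dave", "politician_x"),
            ("alice", "scientist_y"), ("erin", "scientist_y"),
            ("bob", "scientist_y")]

G = nx.DiGraph()
G.add_edges_from(retweets)

# Users whose tweets attract the most retweets score highest.
centrality = nx.pagerank(G)
print(sorted(centrality, key=centrality.get, reverse=True)[:3])

# Modularity-based communities on the undirected projection of the graph.
communities = greedy_modularity_communities(G.to_undirected())
for i, community in enumerate(communities):
    print(i, sorted(community))
```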
Abstract:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
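For concreteness, a small sketch of maximum-likelihood fitting of a modified Weibull model with right-censored data follows; the parameterisation S(t) = exp(-a t^b e^(λt)), the simulated data and the optimiser choice are assumptions, and the paper's regression structure and influence diagnostics are not reproduced.

```python
# Sketch: maximum-likelihood fit of a modified Weibull model with right
# censoring. Assumed parameterisation: S(t) = exp(-a * t**b * exp(lam * t)),
# hazard h(t) = a * (b + lam * t) * t**(b - 1) * exp(lam * t).
# The paper's regression structure and diagnostics are not shown.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# toy data: exponential lifetimes with administrative censoring at t = 2
t_true = rng.exponential(scale=1.0, size=500)
censored = t_true > 2.0
t = np.where(censored, 2.0, t_true)
delta = (~censored).astype(float)        # 1 = observed failure, 0 = censored


def neg_log_lik(params):
    a, b, lam = np.exp(params)           # optimise on the log scale (> 0)
    log_h = np.log(a) + np.log(b + lam * t) + (b - 1) * np.log(t) + lam * t
    log_S = -a * t**b * np.exp(lam * t)
    # censored observations contribute log S(t) only; failures add log h(t)
    return -np.sum(delta * log_h + log_S)


res = minimize(neg_log_lik, x0=np.zeros(3), method="Nelder-Mead")
a_hat, b_hat, lam_hat = np.exp(res.x)
print(a_hat, b_hat, lam_hat)
```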