Digital Library

917 results for Web Log Data

A survey of data mining techniques for social media analysis

Relevance: 30.00%

Abstract:

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, which makes computational analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of social networks over the decades, from the historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

Capturing and sharing our collective expertise on climate data: the CHARMe project

Relevance: 30.00%

Abstract:

For users of climate services, the ability to quickly determine the datasets that best fit one's needs would be invaluable. The volume, variety and complexity of climate data make this judgment difficult. The ambition of CHARMe ("Characterization of metadata to enable high-quality climate services") is to give a wider interdisciplinary community access to a range of supporting information, such as journal articles, technical reports or feedback on previous applications of the data. The capture and discovery of this "commentary" information, often created by data users rather than data providers, and currently not linked to the data themselves, has not been significantly addressed previously. CHARMe applies the principles of Linked Data and open web standards to associate, record, search and publish user-derived annotations in a way that can be read both by users and automated systems. Tools have been developed within the CHARMe project that enable annotation capability for data delivery systems already in wide use for discovering climate data. In addition, the project has developed advanced tools for exploring data and commentary in innovative ways, including an interactive data explorer and comparator ("CHARMe Maps") and a tool for correlating climate time series with external "significant events" (e.g. instrument failures or large volcanic eruptions) that affect the data quality. Although the project focuses on climate science, the concepts are general and could be applied to other fields. All CHARMe system software is open-source, released under a liberal licence, permitting future projects to re-use the source code as they wish.

Effects of a web-based personalized intervention on physical activity in European adults: a randomized controlled trial

Relevance: 30.00%

Abstract:

Background: The high prevalence of physical inactivity worldwide calls for innovative and more effective ways to promote physical activity (PA). There are limited objective data on the effectiveness of Web-based personalized feedback on increasing PA in adults. Objective: It is hypothesized that providing personalized advice based on PA measured objectively alongside diet, phenotype, or genotype information would lead to larger and more sustained changes in PA, compared with nonpersonalized advice. Methods: A total of 1607 adults in seven European countries were randomized to either a control group (nonpersonalized advice, Level 0, L0) or to one of three personalized groups receiving personalized advice via the Internet based on current PA plus diet (Level 1, L1), PA plus diet and phenotype (Level 2, L2), or PA plus diet, phenotype, and genotype (Level 3, L3). PA was measured for 6 months using triaxial accelerometers, and self-reported using the Baecke questionnaire. Outcomes were objective and self-reported PA after 3 and 6 months. Results: While 1270 participants (85.81% of 1480 actual starters) completed the 6-month trial, 1233 (83.31%) self-reported PA at both baseline and month 6, but only 730 (49.32%) had sufficient objective PA data at both time points. For the total cohort after 6 months, a greater improvement in self-reported total PA (P=.02) and PA during leisure (nonsport) (P=.03) was observed in personalized groups compared with the control group. For individuals advised to increase PA, we also observed greater improvements in those two self-reported indices (P=.006 and P=.008, respectively) with increased personalization of the advice (L2 and L3 vs L1). However, there were no significant differences in accelerometer results between personalized and control groups, and no significant effect of adding phenotypic or genotypic information to the tailored feedback at month 3 or 6. 
After 6 months, there were small but significant improvements in the objectively measured physical activity level (P<.05), moderate PA (P<.01), and sedentary time (P<.001) for individuals advised to increase PA, but these changes were similar across all groups. Conclusions: Different levels of personalization produced similar small changes in objective PA. We found no evidence that personalized advice is more effective than conventional “one size fits all” guidelines to promote changes in PA in our Web-based intervention when PA was measured objectively. Based on self-reports, PA increased to a greater extent with more personalized advice. Thus, it is crucial to measure PA objectively in any PA intervention study.

Photometric redshifts and k-corrections for the Sloan Digital Sky Survey Data Release 7

Relevance: 30.00%

Abstract:

We present a catalogue of galaxy photometric redshifts and k-corrections for the Sloan Digital Sky Survey Data Release 7 (SDSS-DR7), available on the World Wide Web. The photometric redshifts were estimated with an artificial neural network using five ugriz bands, concentration indices and Petrosian radii in the g and r bands. We have explored our redshift estimates with different training sets, thus concluding that the best choice for improving redshift accuracy comprises the main galaxy sample (MGS), the luminous red galaxies and the galaxies of active galactic nuclei covering the redshift range 0 < z < 0.3. For the MGS, the photometric redshift estimates agree with the spectroscopic values within rms = 0.0227. The distribution of photometric redshifts derived in the range 0 < z(phot) < 0.6 agrees well with the model predictions. k-corrections were derived by calibration of the k-correct_v4.2 code results for the MGS with the reference-frame (z = 0.1) (g - r) colours. We adopt a linear dependence of k-corrections on redshift and (g - r) colours that provide suitable distributions of luminosity and colours for galaxies up to redshift z(phot) = 0.6 comparable to the results in the literature. Thus, our k-correction estimate procedure is a powerful, low computational time algorithm capable of reproducing suitable results that can be used for testing galaxy properties at intermediate redshifts using the large SDSS data base.
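The rms figure quoted in this abstract can be read as the root-mean-square difference between photometric and spectroscopic redshifts. The following is a minimal sketch of that statistic only, not the authors' pipeline; the function name `rms_scatter` and the sample values are hypothetical and are not taken from the catalogue.

```python
import math


def rms_scatter(z_phot, z_spec):
    """Root-mean-square difference between two equal-length redshift lists."""
    if len(z_phot) != len(z_spec) or not z_phot:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - s) ** 2 for p, s in zip(z_phot, z_spec)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical values for illustration only (not catalogue data).
z_spec = [0.05, 0.10, 0.15, 0.20]
z_phot = [0.07, 0.08, 0.17, 0.18]
print(round(rms_scatter(z_phot, z_spec), 4))
```

A smaller value means the photometric estimates track the spectroscopic ones more closely.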

The perception of accessibility in Web development by academy, industry and government: a survey of the Brazilian scenario

Relevance: 30.00%

Abstract:

Accessibility has become a serious issue to be considered by various sectors of society. However, what are the differences between the perception of accessibility by academy, government and industry? In this paper, we present an analysis of this issue based on a large survey carried out with 613 participants involved with Web development, from all of the 27 Brazilian states. The paper presents results from the data analysis for each sector, along with statistical tests regarding the main issues related to each of the sectors, such as government and law, industry and techniques, and academy and education. Concern about accessibility law is poor even amongst people from the government sector. The analyses have also pointed out that the academy has not been addressing accessibility training adequately. Knowledge of proper techniques for producing accessible content is better in industry than in the other sectors, but still limited. Stronger investment in training and in promoting awareness of the law may be pointed to as the most important tools to support a more effective policy on Web accessibility in Brazil.

On estimation and influence diagnostics for log-Birnbaum-Saunders Student-t regression models: Full Bayesian analysis

Relevance: 30.00%

Abstract:

The purpose of this paper is to develop a Bayesian approach for log-Birnbaum-Saunders Student-t regression models under right-censored survival data. Markov chain Monte Carlo (MCMC) methods are used to develop a Bayesian procedure for the considered model. In order to attenuate the influence of the outlying observations on the parameter estimates, we present in this paper Birnbaum-Saunders models in which a Student-t distribution is assumed to explain the cumulative damage. Also, some discussions on the model selection to compare the fitted models are given and case deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback-Leibler divergence. The developed procedures are illustrated with a real data set. (C) 2010 Elsevier B.V. All rights reserved.

Generalized log-gamma regression models with cure fraction

Relevance: 30.00%

Abstract:

In this paper, the generalized log-gamma regression model is modified to allow the possibility that long-term survivors may be present in the data. This modification leads to a generalized log-gamma regression model with a cure rate, encompassing, as special cases, the log-exponential, log-Weibull and log-normal regression models with a cure rate typically used to model such data. The models attempt to simultaneously estimate the effects of explanatory variables on the timing acceleration/deceleration of a given event and the surviving fraction, that is, the proportion of the population for which the event never occurs. The normal curvatures of local influence are derived under some usual perturbation schemes and two martingale-type residuals are proposed to assess departures from the generalized log-gamma error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed.
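The idea of a surviving (cured) fraction can be illustrated with the standard mixture cure formulation, in which population survival is the cure fraction plus the survival of the susceptible group. This is a sketch under assumptions: the paper's own parametrisation may differ, only the log-normal special case named in the abstract is used, and the function names and parameter values are hypothetical.

```python
import math


def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((ln t - mu) / sigma) for t > 0."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def population_survival(t, cure_fraction, mu, sigma):
    """Mixture cure model: cured units never fail, the rest follow S(t)."""
    susceptible = lognormal_survival(t, mu, sigma)
    return cure_fraction + (1.0 - cure_fraction) * susceptible


# Hypothetical parameters: 30% of the population never experiences the event.
for t in (1.0, 10.0, 1e6):
    print(t, round(population_survival(t, 0.3, mu=0.0, sigma=1.0), 4))
```

As t grows, the population survival curve levels off at the cure fraction instead of falling to zero, which is the feature that separates cure-rate models from ordinary survival regression.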

Upgrading a TCABR data analysis and acquisition system for remote participation using Java, XML, RCP and modern client/server communication/authentication

Relevance: 30.00%

Abstract:

The TCABR data analysis and acquisition system has been upgraded to support a joint research programme using remote participation technologies. The architecture of the new system uses the Java language as its programming environment. Since application parameters and hardware in a joint experiment are complex, with a large variability of components, requirements and specification solutions need to be flexible and modular, independent of operating system and computer architecture. To describe and organize the information on all the components and the connections among them, systems are developed using the Extensible Markup Language (XML) technology. The communication between clients and servers uses remote procedure calls (RPC) based on XML (RPC-XML technology). The integration of Java, XML and RPC-XML technologies makes it easy to develop a standard data and communication access layer between users and laboratories using common software libraries and a Web application. The libraries allow data retrieval using the same methods for all user laboratories in the joint collaboration, and the Web application provides simple graphical user interface (GUI) access. The TCABR tokamak team, in collaboration with the IPFN (Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Universidade Tecnica de Lisboa), is implementing these remote participation technologies. The first version was tested at the Joint Experiment on TCABR (TCABRJE), a Host Laboratory Experiment, organized in cooperation with the IAEA (International Atomic Energy Agency) in the framework of the IAEA Coordinated Research Project (CRP) on "Joint Research Using Small Tokamaks". (C) 2010 Elsevier B.V. All rights reserved.
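To make the XML-based remote procedure call idea concrete, the sketch below encodes and decodes one call with the XML-RPC support in Python's standard library. It assumes the abstract's RPC technology corresponds to the XML-RPC protocol; the real system is written in Java, and the method name `get_signal` and its arguments are hypothetical, not the TCABR API.

```python
import xmlrpc.client

# Hypothetical remote call: ask for one signal from one shot (names are assumptions).
shot_number = 12345
signal_name = "plasma_current"

# Serialise the call into the XML document a client would POST to the server.
request_xml = xmlrpc.client.dumps((shot_number, signal_name), methodname="get_signal")
print(request_xml)

# A server performs the reverse step to recover the method name and arguments.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Because the request is plain XML over HTTP, clients and servers need not share an operating system or programming language, which matches the portability requirement stated in the abstract.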

Deviance residuals in generalised log-gamma regression models with censored observations

Relevance: 30.00%

Abstract:

In this article, we compare three residuals based on the deviance component in generalised log-gamma regression models with censored observations. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. For all cases studied, the empirical distributions of the proposed residuals are in general symmetric around zero, but only a martingale-type residual presented negligible kurtosis for the majority of the cases studied. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for the martingale-type residual in generalised log-gamma regression models with censored data. A lifetime data set is analysed under log-gamma regression models and a model checking based on the martingale-type residual is performed.

Comparing diagnostic tests with missing data

Relevance: 30.00%

Abstract:

When missing data occur in studies designed to compare the accuracy of diagnostic tests, a common, though naive, practice is to base the comparison of sensitivity, specificity, as well as of positive and negative predictive values on some subset of the data that fits into methods implemented in standard statistical packages. Such methods are usually valid only under the strong missing completely at random (MCAR) assumption and may generate biased and less precise estimates. We review some models that use the dependence structure of the completely observed cases to incorporate the information of the partially categorized observations into the analysis and show how they may be fitted via a two-stage hybrid process involving maximum likelihood in the first stage and weighted least squares in the second. We indicate how computational subroutines written in R may be used to fit the proposed models and illustrate the different analysis strategies with observational data collected to compare the accuracy of three distinct non-invasive diagnostic methods for endometriosis. The results indicate that even when the MCAR assumption is plausible, the naive partial analyses should be avoided.
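For reference, the four accuracy measures named in this abstract can be computed from a complete 2x2 table of test result against true disease status. This sketch covers only that complete-case calculation, not the two-stage models reviewed in the paper (whose routines are written in R); the function name and the counts are hypothetical.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Accuracy measures from a 2x2 table of test result vs. true status."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased subjects who test positive
        "specificity": tn / (tn + fp),  # healthy subjects who test negative
        "ppv": tp / (tp + fp),          # positive tests that are truly diseased
        "npv": tn / (tn + fn),          # negative tests that are truly healthy
    }


# Hypothetical counts for illustration only.
for name, value in diagnostic_measures(tp=40, fn=10, fp=5, tn=45).items():
    print(f"{name}: {value:.3f}")
```

Applying this calculation only to subjects with every test observed is the naive partial analysis the abstract warns against.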

Missing data mechanisms and their implications on the analysis of categorical data

Relevance: 30.00%

Abstract:

We review some issues related to the implications of different missing data mechanisms on statistical inference for contingency tables and consider simulation studies to compare the results obtained under such models to those where the units with missing data are disregarded. We confirm that although, in general, analyses under the correct missing at random and missing completely at random models are more efficient even for small sample sizes, there are exceptions where they may not improve the results obtained by ignoring the partially classified data. We show that under the missing not at random (MNAR) model, estimates on the boundary of the parameter space as well as lack of identifiability of the parameters of saturated models may be associated with undesirable asymptotic properties of maximum likelihood estimators and likelihood ratio tests; even in standard cases the bias of the estimators may be low only for very large samples. We also show that the probability of a boundary solution obtained under the correct MNAR model may be large even for large samples and that, consequently, we may not always conclude that a MNAR model is misspecified because the estimate is on the boundary of the parameter space.

The log-bimodal-skew-normal model. A geochemical application

Relevance: 30.00%

Abstract:

The main objective of this paper is to study a logarithm extension of the bimodal skew normal model introduced by Elal-Olivero et al. [1]. The model can then be seen as an alternative to the log-normal model typically used for fitting positive data. We study some basic properties such as the distribution function and moments, and discuss maximum likelihood for parameter estimation. We report results of an application to a real data set related to nickel concentration in soil samples. Model fitting comparison with several alternative models indicates that the model proposed presents the best fit and so it can be quite useful in real applications for chemical data on substance concentration. Copyright (C) 2011 John Wiley & Sons, Ltd.

A web based decision support system for status assessment in advanced parkinson

Relevance: 30.00%

Abstract:

The purpose of this work is to develop a web based decision support system, based on fuzzy logic, to assess the motor state of Parkinson patients from their performance in on-screen motor tests in a test battery on a hand computer. A set of well defined rules, based on an expert’s knowledge, was made to diagnose the current state of the patient. At the end of a period, an overall score is calculated which represents the overall state of the patient during the period. Acceptability of the rules is based on the absolute difference between the patient’s own assessment of his condition and the diagnosed state. Any inconsistency can be tracked by being highlighted as an alert in the system. Graphical presentation of data aims at enhanced analysis of the patient’s state and performance monitoring by the clinic staff. In general, the system is beneficial for the clinic staff, patients, project managers and researchers.
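Two ingredients mentioned in this abstract can be sketched generically: a fuzzy membership function and an alert on the absolute difference between self-assessment and diagnosed state. This is not the system's actual rule base; the triangular shape, the score scale and the tolerance of 1.0 are all assumptions made for illustration.

```python
def triangular(x, left, peak, right):
    """Degree of membership (0 to 1) in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def needs_alert(self_assessment, diagnosed_state, tolerance=1.0):
    """Flag an inconsistency when the two scores differ by more than tolerance."""
    return abs(self_assessment - diagnosed_state) > tolerance


# Hypothetical test score of 2.5 on a 0-10 scale, fuzzy set centred on 5.
print(triangular(2.5, left=0.0, peak=5.0, right=10.0))
print(needs_alert(self_assessment=2.0, diagnosed_state=4.5))
```

In a full fuzzy system, several such membership degrees would be combined by expert rules before a single state is reported.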

Integration av webbaserat analysprogram och kartprogram: Integration of web based analysis-software and map-software

Relevance: 30.00%

Abstract:

As GIS (geographic information systems) become more common and more user-friendly, WM-data has seen that customers would be interested in linking information from their operations to a map. This makes it easier to take in how the information is geographically distributed over an area, for example in order to arrange more efficient transport. WM-data, for which this work was carried out, intends to produce a prototype that can be shown to customers and other stakeholders to demonstrate that this is feasible by integrating existing systems. In this work the prototype was developed with a focus on the forest industry and its inventories. The existing programs to be integrated are both web-based and run in a web browser. The analysis program is called Insikt and is developed by the company Trimma; the map program is called GIMS and is WM-data's own program. It should be possible to analyse data and create a report in Insikt. The report is then sent to GIMS, where the information is drawn on the map at the location to which each piece of information belongs. It should also be possible to select one or more areas on the map and send them to Insikt in order to analyse information from only the selected areas. A prototype with the desired functionality was produced during the work, but some work remains before it is a sellable product. The prototype was shown to a number of interested parties, who found it interesting and believe it could be used extensively in many areas.

Advanced Building Energy Data Visualization

Relevance: 30.00%

Abstract:

Advanced Building Energy Data Visualization is a way to detect performance problems in commercial buildings. Sensors placed in a building collect data on, for example, air temperature and electrical power, which can then be processed in data visualization software. This software generates visual diagrams so that the building manager or building operator can see if, for example, the power consumption is too high. A first step, before sensors are installed in a building, towards seeing how much energy the building consumes can be to use a benchmarking tool. A number of benchmarking tools are available for free on the Internet. Each tool has a slightly different approach, but they all show how much energy a building consumes compared with other similar buildings. In this study a new web design for the benchmarking tool CalARCH has been developed. CalARCH is developed at the Berkeley Lab in Berkeley, California, USA. CalARCH uses data collected only from buildings in California, and is only for comparing buildings in California with other similar buildings in the state. Five different versions of the web site were made. A web survey was then conducted to determine which version would be best for CalARCH. The results showed that Version 5 and Version 3 were the best. A new version was then made, based on these two versions. This study was carried out at the Lawrence Berkeley Laboratory.
