942 results for Blog datasets
Abstract:
The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective question answering, specifically Opinion Question Answering (OQA). We evaluate the performance of an OQA system with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes, with the purpose of improving system performance, on two distinct blog datasets. The improvements obtained for the different combinations of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy for dealing with opinionated questions.
Abstract:
This paper discusses the effectiveness of blogs for reflective learning in design education. Students in two animation units were asked to keep an online reflective journal via a blog. Students were encouraged to post their weekly outcomes and project development process to their blogs and share them with other students. A survey was undertaken to evaluate their learning experience, and one of the key outcomes indicates that interaction design for social networking is significantly important to blog-based learning design.
Abstract:
Association rule mining has made many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn increasing attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often a huge number of rules can be extracted from a dataset, but many of them are redundant with respect to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solving this problem. In this paper, we first propose a definition of redundancy; we then propose a concise representation, called the Reliable basis, for representing non-redundant association rules, covering both exact rules and approximate rules. An important contribution of this paper is that we propose using the certainty factor as the criterion for measuring the strength of the discovered association rules. With this criterion, we can determine the boundary between redundancy and non-redundancy, ensuring that as many redundant rules as possible are eliminated without reducing the inference capacity of, or the belief in, the remaining extracted non-redundant rules. We prove that redundancy elimination based on the proposed Reliable basis does not reduce the belief in the extracted rules. We also prove that all association rules can be deduced from the Reliable basis. Therefore, the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules.
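The abstract does not state its formula for the certainty factor. As a minimal sketch, assuming the form commonly used in the association-rule literature, where CF(X→Y) compares the rule's confidence with the prior support of Y, the measure can be computed over a list of transactions as shown below. The function names and example transactions are illustrative only and are not the paper's implementation.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: Sequence[Itemset]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: Sequence[Itemset]) -> float:
    """conf(X -> Y) = supp(X u Y) / supp(X); 0 when X never occurs."""
    supp_x = support(x, transactions)
    return support(x | y, transactions) / supp_x if supp_x else 0.0


def certainty_factor(x: Itemset, y: Itemset, transactions: Sequence[Itemset]) -> float:
    """Certainty factor of X -> Y, in [-1, 1] (assumed common definition).

    Positive: X raises belief in Y above its prior supp(Y);
    negative: X lowers it; 0: confidence equals the prior.
    """
    conf = confidence(x, y, transactions)
    supp_y = support(y, transactions)
    if conf > supp_y:
        # conf <= 1, so supp_y < 1 here and the denominator is positive
        return (conf - supp_y) / (1.0 - supp_y)
    if conf < supp_y:
        # supp_y > conf >= 0, so the denominator is positive
        return (conf - supp_y) / supp_y
    return 0.0
```

With the four transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, the rule butter → bread gets CF = 1.0, while bread → milk gets CF = -1/9: its confidence (2/3) is slightly below the overall support of milk (0.75).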
Abstract:
Scientists need to transfer semantically similar queries across multiple heterogeneous linked datasets. These queries may require data from different locations and the results are not simple to combine due to differences between datasets. A query model was developed to make it simple to distribute queries across different datasets using RDF as the result format. The query model, based on the concept of publicly recognised namespaces for parts of each scientific dataset, was implemented with a configuration that includes a large number of current biological and chemical datasets. The configuration is flexible, providing the ability to transparently use both private and public datasets in any query. A prototype implementation of the model was used to resolve queries for the Bio2RDF website, including both Bio2RDF datasets and other datasets that do not follow the Bio2RDF URI conventions.
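The following sketch illustrates namespace-based routing in the spirit of the query model. It assumes identifiers of the form `http://bio2rdf.org/namespace:identifier` and a hypothetical configuration that maps each namespace to one or more public or private SPARQL endpoints; the endpoint URLs and function names are placeholders, not the prototype's actual configuration or API.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical configuration: namespace -> endpoints holding data for it.
# Public and private endpoints can be mixed, so callers need not know which is which.
PROVIDERS: Dict[str, List[str]] = {
    "geneid": [
        "https://public.example.org/sparql",
        "https://private.example.org/sparql",
    ],
    "chebi": ["https://chem.example.org/sparql"],
}


def parse_identifier(uri: str) -> Tuple[str, str]:
    """Split a URI whose last path segment is 'namespace:identifier'."""
    segment = urlparse(uri).path.rsplit("/", 1)[-1]
    namespace, sep, identifier = segment.partition(":")
    if not sep or not namespace or not identifier:
        raise ValueError(f"not a namespace:identifier URI: {uri!r}")
    return namespace, identifier


def route(uri: str, providers: Dict[str, List[str]] = PROVIDERS) -> List[str]:
    """Return every endpoint configured for the URI's namespace (empty if none)."""
    namespace, _ = parse_identifier(uri)
    return list(providers.get(namespace, []))
```

For example, `route("http://bio2rdf.org/geneid:4157")` returns both `geneid` endpoints; the same query would then be sent to each, and the RDF results merged.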
Abstract:
In today's electronic world, vast amounts of knowledge are stored within many datasets and databases. Often the default format of this data means that the knowledge within is not immediately accessible, but rather has to be mined and extracted. This requires automated tools, and they need to be effective and efficient. Association rule mining is one approach to obtaining knowledge stored within datasets / databases; it yields frequent patterns and association rules, with varying levels of strength, between the items / attributes of a dataset. However, this is also association rule mining's downside: the number of rules that can be found is usually very large. In order to use the association rules (and the knowledge within) effectively, the number of rules needs to be kept manageable, so a method to reduce the number of association rules is necessary. However, we do not want to lose knowledge through this process. Thus the idea of non-redundant association rule mining was born. A second issue with association rule mining is determining which rules are interesting. The standard approach has been to use support and confidence, but these measures have their limitations. Approaches that use information about the dataset's structure to measure association rules are limited, but could yield useful association rules if tapped. Finally, while it is important to be able to get interesting association rules from a dataset in a manageable size, it is equally important to be able to apply them in a practical way, where the knowledge they contain can be taken advantage of. Association rules show items / attributes that appear together frequently. Recommendation systems also look at patterns and items / attributes that occur together frequently in order to make a recommendation to a person. It should therefore be possible to bring the two together. In this thesis we look at these three issues and propose approaches to address them. 
For discovering non-redundant rules, we propose enhanced approaches to rule mining in multi-level datasets that allow hierarchically redundant association rules to be identified and removed without information loss. For discovering interesting association rules based on the dataset's structure, we propose three measures for use in multi-level datasets. Lastly, we propose and demonstrate an approach that allows association rules to be used practically and effectively in a recommender system, while at the same time improving the recommender system's performance. This becomes especially evident when looking at the user cold-start problem for a recommender system. In fact, our proposal helps to solve this serious problem facing recommender systems.
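The thesis does not detail its recommender approach here. One generic way to let association rules drive recommendations, sketched below under that assumption, is to match a user's known items against each rule's antecedent and rank the unseen consequent items by rule confidence. Because only a few known items are needed to trigger a rule, this kind of matching can still produce suggestions for users with little history. The `Rule` type and `recommend` function are hypothetical names, and this is not presented as the thesis's method.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Rule:
    """A mined association rule antecedent -> consequent (hypothetical type)."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def recommend(user_items: FrozenSet[str], rules: List[Rule], k: int = 3) -> List[str]:
    """Return up to k items the user lacks, ranked by the best confidence
    of any rule whose antecedent the user's items already contain."""
    scores: Dict[str, float] = {}
    for rule in rules:
        if rule.antecedent <= user_items:
            for item in rule.consequent - user_items:
                scores[item] = max(scores.get(item, 0.0), rule.confidence)
    # Highest confidence first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

A user who owns only item `a` is matched by every rule with antecedent `{a}`. Rules that need items the user lacks are skipped rather than partially scored.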
Abstract:
Recent studies on automatic new topic identification in Web search engine user sessions demonstrated that neural networks are successful in automatic new topic identification. However, most of these studies applied their new topic identification algorithms to data logs from a single search engine. In this study, we investigate whether the application of neural networks for automatic new topic identification is more successful on some search engines than on others. Sample data logs from the Norwegian search engine FAST (currently owned by Overture) and from Excite are used in this study. The findings suggest that query logs with more topic shifts tend to provide more successful results on shift-based performance measures, whereas logs with more topic continuations tend to provide better results on continuation-based performance measures.
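The abstract does not define its shift-based and continuation-based measures. Assuming each query transition in a session is labelled either a topic shift or a topic continuation, one natural reading is class-wise precision and recall, computed once for the shift label and once for the continuation label. A minimal sketch under that assumption, with hypothetical label names:

```python
from typing import Dict, Sequence

SHIFT = "shift"
CONTINUATION = "continuation"


def precision_recall(actual: Sequence[str], predicted: Sequence[str], label: str) -> Dict[str, float]:
    """Precision and recall for one label over aligned query transitions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    # Transitions where both the gold label and the prediction equal `label`
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    actual_pos = sum(1 for a in actual if a == label)
    return {
        "precision": true_pos / predicted_pos if predicted_pos else 0.0,
        "recall": true_pos / actual_pos if actual_pos else 0.0,
    }
```

Calling the function with `SHIFT` gives shift-based scores, and calling it with `CONTINUATION` on the same labels gives continuation-based scores.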
Abstract:
Blogs and other online platforms for personal writing such as LiveJournal have been of interest to researchers across the social sciences and humanities for a decade now. Although growth in the uptake of blogging has stalled somewhat since the heyday of blogs in the early 2000s, blogging continues to be a major genre of Internet-based communication. Indeed, while mass participation has moved on to Facebook, Twitter, and other more recent communication phenomena, what has been left behind by the wave of mass adoption is a slightly smaller but all the more solidly established blogosphere of engaged and committed participants. Blogs are now an accepted part of institutional, group, and personal communications strategies (Bruns and Jacobs, 2006); in style and substance, they are situated between the more static information provided by conventional Websites and Webpages and the continuous newsfeeds provided through Facebook and Twitter updates. Blogs provide a vehicle for authors (and their commenters) to think through given topics in the space of a few hundred to a few thousand words – expanding, perhaps, on shorter tweets, and possibly leading to the publication of more fully formed texts elsewhere. They are also a very flexible medium: they readily provide the functionality to include images, audio, video, and other additional materials – as well as the fundamental tool of blogging, the hyperlink itself. Indeed, the role of the link in blogs and blog posts should not be underestimated. 
Whatever the genre and topic that individual bloggers engage in, for the most part blogging is used to provide timely updates and commentary – and it is typical for such material to link to relevant posts by other bloggers and to the present author's own previous posts, as well as to background material that provides readers with further information about the blogger's current topic and to news stories and articles that the blogger found interesting or worthy of critique. Especially where bloggers are part of a larger community of authors sharing similar interests or views (such communities are often indicated by yet another type of link: blogrolls, often placed in a sidebar on the blog site, which list the blogger's friends or favourites), the reciprocal writing and linking of posts often constitutes an asynchronous, distributed conversation that unfolds over the course of days, weeks, and months. Research into blogs is therefore interesting for a variety of reasons. For one, a qualitative analysis of one or several blogs can reveal the cognitive and communicative processes through which individual bloggers define their online identity, position themselves in relation to fellow bloggers, frame particular themes, topics and stories, and engage with one another's points of view. It may also shed light on how such processes differ across communities of interest, perhaps in correlation with the different societal framing and valorisation of specific areas of interest, with the socioeconomic backgrounds of individual bloggers, or with other external or internal factors. Such qualitative research now looks back on a decade-long history (for key collections, see Gurak, et al., 2004; Bruns and Jacobs, 2006; also see Walker Rettberg, 2008) and has recently also turned to investigating how blogging practices differ across cultures (Russell and Echchaibi, 2009). 
Other studies have also investigated the practices and motivations of bloggers in specific countries from a sociological perspective, through large-scale surveys (e.g. Schmidt, 2009). Blogs have also been directly employed within both K-12 and higher education, across many disciplines, as tools for reflexive learning and discussion (Burgess, 2006).
Abstract:
The availability of new media as a universal communication tool has an impact on the power of the general public to comment on a variety of issues. This paper examines this increase in consumer power with respect to bloggers. The research context is controversial advertising, and specifically Tourism Australia’s “Where the bloody hell are you?” campaign. By utilising Denegri-Knott’s (2006) four on-line power strategies, a content analysis of weblogs reveals that consumers are distributing information, opinion and even banned advertising material, thereby forming power hubs of like-minded people, with the potential to become online pressure groups. The consequences and implications of this augmented power on regulators, advertisers and bloggers are explored. The findings contribute to the understanding of blogs as a new communication platform and bloggers as a new demographic of activists in the process of advertising.
Abstract:
Humanitarian entrants remain invisible in existing population datasets, and this has significant implications for health care and health policy. We suggest adding 'year of arrival' to population datasets, which would enable the combination of 'country of birth' and 'year of arrival' to be used as a proxy for refugee status.
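As a minimal sketch of how the proposed proxy could be applied, assume a hypothetical lookup of humanitarian intake periods by country of birth. The country names and year ranges below are placeholders, not real data. A record is flagged when its year of arrival falls within an intake window for its country of birth:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PersonRecord:
    country_of_birth: str
    year_of_arrival: Optional[int]  # None when the dataset lacks the field


# Hypothetical placeholder windows (first_year, last_year) of humanitarian
# intake per country of birth; real values would come from settlement data.
INTAKE_WINDOWS: Dict[str, List[Tuple[int, int]]] = {
    "Country A": [(1999, 2005)],
    "Country B": [(1991, 1994), (2010, 2015)],
}


def likely_humanitarian_entrant(
    record: PersonRecord,
    windows: Dict[str, List[Tuple[int, int]]] = INTAKE_WINDOWS,
) -> bool:
    """Proxy flag: arrival year falls inside an intake window for the birth country."""
    if record.year_of_arrival is None:
        return False  # without year of arrival the proxy cannot be applied
    return any(
        first <= record.year_of_arrival <= last
        for first, last in windows.get(record.country_of_birth, [])
    )
```

The `None` branch reflects the abstract's point that country of birth alone cannot separate humanitarian entrants from other migrants.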
Abstract:
Although topic detection and tracking techniques have made great progress, most researchers pay little attention to the following two aspects. First, the construction of a topic model does not take the characteristics of different topics into consideration. Second, the factors that determine the formation and development of hot topics are not analyzed further. In order to correctly extract news blog hot topics, the paper views the above problems from a new perspective based on the W2T (Wisdom Web of Things) methodology, in which the characteristics of blog users, the context of topic propagation and information granularity are investigated in a unified way. The motivations and features of blog users are first analyzed to understand the characteristics of news blog topics. Then the context of topic propagation is decomposed into the blog community, the topic network and the opinion network. Some important factors, such as user behavior patterns, opinion leaders and network opinion, are identified to track the development trends of news blog topics. Moreover, a blog hot topic detection algorithm is proposed, in which news blog hot topics are identified by measuring duration, topic novelty, users' degree of attention and topic growth. Experimental results show that the proposed method is feasible and effective. These results are also useful for further study of the formation mechanism of opinion leaders in the blogspace.
Abstract:
Researchers in the social sciences and humanities have been interested in blogs, online diaries and online journals for a decade now. Even though the growth rate of the blogosphere has stagnated since the heyday of blogging in the 2000s, blogs remain one of the most significant genres of Internet-based communication. Indeed, after the mass migration to Facebook, Twitter and other more recently emerged means of communication, what remains is a somewhat smaller but all the more firmly established blogosphere of engaged and committed participants. Blogs are now accepted as part of institutional, personal and group communication strategies. In style and content, they lie between the more static information on conventional websites and the constantly updated Facebook and Twitter newsfeeds. Blogs enable their authors (and their commenters) to think through particular topics in the space of a few hundred to a few thousand words, to expand on shorter posts, and possibly to publish more fully developed texts elsewhere. They are also a very flexible medium: images, audio, video and other materials can be embedded effortlessly, and so, of course, can the fundamental instrument of blogging, the hyperlink.
Abstract:
Within the QUT Business School (QUTBS), researchers across economics, finance and accounting depend on data-driven research. They analyze historic and global financial data across a range of instruments to understand the relationships and effects between them as they respond to news and events in their region. Scholars and Higher Degree Research (HDR) students in turn seek out universities that offer these particular datasets to further their research. This involves downloading and manipulating large datasets, often with a focus on depth of detail, frequency and long-tail historical data. Because this is stock exchange data with potential commercial value, access licenses tend to be very expensive. This poster reports the following findings:
• The library has a part to play in freeing researchers from the burden of negotiating subscriptions, fundraising and managing the legal requirements around licensing and access.
• The role of the library is to communicate the nature and potential of these complex resources across the university to disciplines as diverse as Mathematics, Health, Information Systems and Creative Industries.
• The service has demonstrated clear, concrete support for research by QUT Library and built relationships with faculty. It has made data available to all researchers and attracted new HDR students. The aim is to reach the threshold of research outputs required to submit under FOR Code 1502 (Banking, Finance and Investment) for ERA 2015.
• It is difficult to identify which subset of a dataset will be obtained, given somewhat vague price tiers.
• Data integrity is variable because it is limited by the way the data are collected; this occasionally raises issues for researchers (Cook, Campbell, & Kelly, 2012).
• Improved library understanding of the content of our products and the nature of finance-based research is a necessary part of the service.
Abstract:
In this paper, we provide an overview of the Social Event Detection (SED) task that is part of the MediaEval Benchmark for Multimedia Evaluation 2013. This task requires participants to discover social events and organize the related media items in event-specific clusters within a collection of Web multimedia. Social events are events that are planned by people, attended by people and for which the social multimedia are also captured by people. We describe the challenges, datasets, and the evaluation methodology.
Abstract:
This paper presents large, accurately calibrated and time-synchronised datasets, gathered outdoors in controlled environmental conditions, using an unmanned ground vehicle (UGV), equipped with a wide variety of sensors. It discusses how the data collection process was designed, the conditions in which these datasets have been gathered, and some possible outcomes of their exploitation, in particular for the evaluation of performance of sensors and perception algorithms for UGVs.