942 results for Blog datasets


Relevance:

10.00% 10.00%

Publisher:

Abstract:

This research aims to enhance the quality of tag-based item recommendation systems by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called tag assignment, should be appropriately modelled in order to create the user profiles [1]. Secondly, the semantics behind the tags should be considered properly, as the flexibility in their design can cause semantic problems such as synonymy and polysemy [2]. This research proposes to address these two challenges for building a tag-based item recommendation system by employing tensor modeling as the multi-dimensional user profile approach, and the topic model as the semantic analysis approach. The first objective is to optimize the tensor model reconstruction and to improve the model performance in generating quality recommendations. A novel Tensor-based Recommendation using Probabilistic Ranking (TRPR) method [3] has been developed. Results show this method to be scalable for large datasets and to outperform the benchmark methods in terms of accuracy. The memory-efficient loop implements the n-mode block-striped (matrix) product for tensor reconstruction as an approximation of the initial tensor. The probabilistic ranking calculates the probability of users selecting candidate items using their tag preference list, based on the entries generated from the reconstructed tensor. The second objective is to analyse the tag semantics and utilize the outcome in building the tensor model. This research proposes to investigate the problem using a topic model approach to preserve the nature of tags as the “social vocabulary” [4]. For the tag assignment data, topics can be generated from the occurrences of tags given for an item.
However, there is only a limited number of tags available to represent items as collections of topics, since an item might have been tagged with only a few tags. Consequently, the generated topics might not be able to represent the items appropriately. Furthermore, given that each tag can belong to any topic with various probability scores, the occurrence of tags cannot simply be mapped by the topics to build the tensor model. A standard weighting technique will not appropriately calculate the value of tagging activity since it will define the context of an item using a tag instead of a topic.
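The n-mode product named in the abstract above can be illustrated with a minimal sketch. The plain-Python function below multiplies a small user × item × tag tensor by a matrix along one mode. It is the textbook n-mode product, not the memory-efficient block-striped variant used by TRPR, and the function name `mode_n_product` and the toy data are assumptions made for illustration only.

```python
def mode_n_product(tensor, matrix, mode):
    """Multiply a 3-way tensor (nested lists) by a matrix along `mode` (0, 1 or 2).

    Entry-wise: result[..., j, ...] = sum_k tensor[..., k, ...] * matrix[j][k],
    where j and k index the dimension selected by `mode`.
    """
    dims = [len(tensor), len(tensor[0]), len(tensor[0][0])]
    if any(len(row) != dims[mode] for row in matrix):
        raise ValueError("matrix columns must match the tensor size along `mode`")
    out_dims = list(dims)
    out_dims[mode] = len(matrix)
    result = [[[0.0] * out_dims[2] for _ in range(out_dims[1])]
              for _ in range(out_dims[0])]
    for a in range(out_dims[0]):
        for b in range(out_dims[1]):
            for c in range(out_dims[2]):
                index = [a, b, c]
                j = index[mode]          # output position along the chosen mode
                total = 0.0
                for k in range(dims[mode]):
                    index[mode] = k      # walk along the chosen mode of the input
                    total += tensor[index[0]][index[1]][index[2]] * matrix[j][k]
                result[a][b][c] = total
    return result


# Hypothetical 2 users x 2 items x 2 tags tensor of tag-assignment counts.
toy_tensor = [[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]]
# A 1 x 2 matrix that sums over the user mode.
summed_over_users = mode_n_product(toy_tensor, [[1, 1]], mode=0)
print(summed_over_users)
```

In a tensor-factorization setting the matrices would be the factor matrices of each mode; a block-striped implementation would compute the same entries in chunks to bound memory use.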

Abstract:

The latest case of a popular YouTube blogger being sued for using music by other artists in her videos without permission raises the question of who really benefits from the re-use of music. In a claim filed this month, the electronic dance music label Ultra Records allege that beauty blogger Michelle Phan’s videos infringe their copyrights in nearly 50 cases. Phan is a self-made internet star who began posting makeup and self-help tutorials on YouTube in 2007. She has more than 6.7 million subscribers on her YouTube channel and has made a career from the associated advertising and endorsement revenue, book deal and even her own line of makeup.

Abstract:

This thesis presents an association rule mining approach, association hierarchy mining (AHM). Unlike traditional two-step bottom-up rule mining, AHM adopts a one-step top-down rule mining strategy to improve the efficiency and effectiveness of mining association rules from datasets. The thesis also presents a novel approach to evaluating the quality of knowledge discovered by AHM, which focuses on the information difference between the discovered knowledge and the original datasets. Experiments performed on a real application, characterizing network traffic behaviour, have shown that AHM achieves encouraging performance.

Abstract:

In this paper we propose the hybrid use of illuminant invariant and RGB images to perform image classification of urban scenes despite challenging variation in lighting conditions. Coping with lighting change (and the shadows thereby invoked) is a non-negotiable requirement for long-term autonomy using vision. One aspect of this is the ability to reliably classify scene components in the presence of marked and often sudden changes in lighting, which is the focus of this paper. Posed with the task of classifying all parts in a scene from a full colour image, we propose that lighting invariant transforms can reduce the variability of the scene, resulting in a more reliable classification. We leverage the ideas of “data transfer” for classification, beginning with full colour images for obtaining candidate scene-level matches using global image descriptors. This is commonly followed by superpixel-level matching with local features. However, we show that if the RGB images are subjected to an illuminant invariant transform before computing the superpixel-level features, classification is significantly more robust to scene illumination effects. The approach is evaluated using three datasets. The first is our own dataset and the second is the KITTI dataset, using manually generated ground truth for quantitative analysis. We qualitatively evaluate the method on a third custom dataset over a 750 m trajectory.
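The abstract above does not state which illuminant invariant transform is used. As a hedged illustration only, the sketch below assumes a one-channel log-chromaticity form, log(G) - alpha*log(B) - (1 - alpha)*log(R), in which `alpha` is a hypothetical camera-dependent parameter and `illuminant_invariant` is an illustrative name. The example checks just one property: because the log weights sum to zero, scaling all three channels by the same brightness factor leaves the value unchanged.

```python
from math import log


def illuminant_invariant(red, green, blue, alpha=0.48):
    """Map one RGB pixel (positive channel values) to a single invariant value.

    Assumed form: log(G) - alpha*log(B) - (1 - alpha)*log(R).
    `alpha` is a hypothetical, camera-dependent parameter, not a value
    taken from the paper.
    """
    if min(red, green, blue) <= 0:
        raise ValueError("channel values must be positive")
    return log(green) - alpha * log(blue) - (1.0 - alpha) * log(red)


# The same surface seen at full brightness and at half brightness (in shadow).
sunlit = illuminant_invariant(120.0, 200.0, 80.0)
shaded = illuminant_invariant(60.0, 100.0, 40.0)
print(abs(sunlit - shaded))  # zero up to floating-point rounding
```

Invariance to changes in illuminant colour, as opposed to overall brightness, depends on choosing `alpha` to match the camera's spectral response, which this sketch does not attempt.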

Abstract:

1. Marine ecosystems provide critically important goods and services to society, and hence their accelerated degradation underpins an urgent need to take rapid, ambitious and informed decisions regarding their conservation and management.
2. The capacity, however, to generate the detailed field data required to inform conservation planning at appropriate scales is limited by time and resource consuming methods for collecting and analysing field data at the large scales required.
3. The ‘Catlin Seaview Survey’, described here, introduces a novel framework for large-scale monitoring of coral reefs using high-definition underwater imagery collected using customized underwater vehicles in combination with computer vision and machine learning. This enables quantitative and geo-referenced outputs of coral reef features such as habitat types, benthic composition, and structural complexity (rugosity) to be generated across multiple kilometre-scale transects with a spatial resolution ranging from 2 to 6 m².
4. The novel application of technology described here has enormous potential to contribute to our understanding of coral reefs and associated impacts by underpinning management decisions with kilometre-scale measurements of reef health.
5. Imagery datasets from an initial survey of 500 km of seascape are freely available through an online tool called the Catlin Global Reef Record. Outputs from the image analysis using the technologies described here will be updated on the online repository as work progresses on each dataset.
6. Case studies illustrate the utility of outputs as well as their potential to link to information from remote sensing. The potential implications of the innovative technologies on marine resource management and conservation are also discussed, along with the accuracy and efficiency of the methodologies deployed.

Abstract:

Extracting frequent subtrees from tree-structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree-structure-guided, scheme-based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm, based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on real datasets compare the efficiency of BOSTER against two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.
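To show why a canonical form settles the isomorphism question for unordered trees, here is a minimal sketch. It uses a simple sorted-string encoding, which is not BOCF (the abstract does not define BOCF); the tuple representation `(label, children)` and the names `canonical_form` and `are_isomorphic` are assumptions made for illustration. Labels are assumed to contain no parentheses.

```python
def canonical_form(tree):
    """Encode a rooted labelled unordered tree as a string.

    A tree is a tuple (label, list_of_child_trees). Children are encoded
    recursively and then sorted, so any ordering of siblings yields the
    same string.
    """
    label, children = tree
    encoded_children = sorted(canonical_form(child) for child in children)
    return label + "(" + "".join(encoded_children) + ")"


def are_isomorphic(first, second):
    """Two unordered trees are isomorphic exactly when their encodings match."""
    return canonical_form(first) == canonical_form(second)


# The same tree written with its siblings in two different orders.
tree_one = ("a", [("b", []), ("c", [("d", [])])])
tree_two = ("a", [("c", [("d", [])]), ("b", [])])
# A tree with the same labels but a different shape.
tree_three = ("a", [("b", [("d", [])]), ("c", [])])

print(canonical_form(tree_one))
print(are_isomorphic(tree_one, tree_two), are_isomorphic(tree_one, tree_three))
```

A miner that stores each candidate subtree under its canonical string counts every unordered pattern once, however its siblings happen to be ordered in the data.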

Abstract:

This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree-structure-guided, scheme-based enumeration approach is defined using BOCF for systematically enumerating only the valid subtrees. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. Extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.

Abstract:

The collapse of the centralistic New Order regime changed the social and political constellation in Indonesia. Women's issues that were taboo during the New Order era began to be given space for open discussion. Terms such as women's empowerment, women in parliament, quotas for women, women's leadership, and women's independence began to be heard frequently. This does not mean that the clamour of liberating voices dominates the field, because efforts to imprison women's bodies are also flourishing in the reform era. Today, attempts to control women's bodies are carried out by civilians cloaked in religion. See, for example, how on television the Islamic Defenders Front (Front Pembela Islam, FPI) is quite openly recorded raiding nightlife venues and conducting vice raids on student boarding houses. It is not uncommon for civilian groups acting in the name of particular religious organizations to make statements before journalists denouncing several female dangdut artists or other female celebrities known for dressing “sexily”.

Abstract:

Recently, attempts to improve decision making in species management have focussed on uncertainties associated with modelling temporal fluctuations in populations. Reducing model uncertainty is challenging; while larger samples improve estimation of species trajectories and reduce statistical errors, they typically amplify variability in observed trajectories. In particular, traditional modelling approaches aimed at estimating population trajectories usually do not account well for nonlinearities and uncertainties associated with multi-scale observations characteristic of large spatio-temporal surveys. We present a Bayesian semi-parametric hierarchical model for simultaneously quantifying uncertainties associated with model structure and parameters, and scale-specific variability over time. We estimate uncertainty across a four-tiered spatial hierarchy of coral cover from the Great Barrier Reef. Coral variability is well described; however, our results show that, in the absence of additional model specifications, conclusions regarding coral trajectories become highly uncertain when considering multiple reefs, suggesting that management should focus more at the scale of individual reefs. The approach presented facilitates the description and estimation of population trajectories and associated uncertainties when variability cannot be attributed to specific causes and origins. We argue that our model can unlock value contained in large-scale datasets, provide guidance for understanding sources of uncertainty, and support better informed decision making.

Abstract:

A tag-based item recommendation method generates an ordered list of items likely to interest a particular user, using the user's past tagging behaviour. However, users' tagging behaviour varies across tagging systems. A potential problem in generating quality recommendations is how to build user profiles that interpret user behaviour so that it can be used effectively in recommendation models. Generally, recommendation methods are made to work with specific types of user profiles, and may not work well with different datasets. In this paper, we investigate several tagging data interpretation and representation schemes that can lead to building an effective user profile. We discuss the various benefits a scheme brings to a recommendation method by highlighting the representative features of user tagging behaviours on a specific dataset. Empirical analysis shows that each interpretation scheme forms a distinct data representation which eventually affects the recommendation result. Results on various datasets show that an interpretation scheme should be selected based on the dominant usage in the tagging data (i.e. either a higher number of tags or a higher number of items present). The usage represents the characteristic of user tagging behaviour in the system. The results also demonstrate how the scheme is able to address the cold-start user problem.

Abstract:

Monitoring the environment with acoustic sensors is an effective method for understanding changes in ecosystems. Through extensive monitoring, large-scale, ecologically relevant datasets can be produced that can inform environmental policy. The collection of acoustic sensor data is a solved problem; the current challenge is the management and analysis of raw audio data to produce useful datasets for ecologists. This paper presents the applied research we use to analyze big acoustic datasets. Its core contribution is the presentation of practical large-scale acoustic data analysis methodologies. We describe details of the data workflows we use to provide both citizen scientists and researchers with practical access to large volumes of ecoacoustic data. Finally, we propose a work-in-progress large-scale architecture for analysis driven by a hybrid cloud-and-local production-grade website.

Abstract:

Low voltage distribution networks feature a high degree of load unbalance, and the addition of rooftop photovoltaics is driving further unbalance in the network. Single-phase consumers are distributed across the phases, but even if the consumer distribution was well balanced when the network was constructed, changes will occur over time. Distribution transformer losses are increased by unbalanced loadings. The estimation of transformer losses is a necessary part of the routine upgrading and replacement of transformers, and the identification of the phase connections of households allows a precise estimation of the phase loadings and total transformer loss. This paper presents a new technique and preliminary test results for a method of automatically identifying the phase of each customer by correlating voltage information from the utility's transformer system with voltage information from customer smart meters. The techniques are novel as they are purely based upon a time series of electrical voltage measurements taken at the household and at the distribution transformer. Experimental results using a combination of electrical power and current from real smart meter datasets demonstrate the performance of our techniques.

Abstract:

A new technique is presented for automatically identifying the phase connection of domestic customers. Voltage information from a reference three-phase house is correlated with voltage information from other customer electricity meters on the same network to determine the highest-probability phase connection. The technique is based purely upon a time series of electrical voltage measurements taken by the household smart meters, and no additional equipment is required. The method is demonstrated using real smart meter datasets to correctly identify the phase connections of 75 consumers on a low voltage distribution feeder.
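The correlation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain Pearson correlation as the similarity measure, time-aligned samples of equal length, and made-up voltage readings; the names `pearson` and `identify_phase` are hypothetical.

```python
from math import sqrt


def pearson(first, second):
    """Pearson correlation coefficient of two equally long numeric series."""
    count = len(first)
    mean_first = sum(first) / count
    mean_second = sum(second) / count
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / sqrt(spread_first * spread_second)


def identify_phase(reference_phases, customer_voltage):
    """Return the reference phase whose voltage profile best matches the customer."""
    scores = {phase: pearson(series, customer_voltage)
              for phase, series in reference_phases.items()}
    best_phase = max(scores, key=scores.get)
    return best_phase, scores


# Made-up voltage samples (volts) from a three-phase reference meter.
reference = {
    "A": [240.1, 239.5, 241.0, 238.7, 240.4],
    "B": [239.0, 240.8, 238.9, 241.2, 239.6],
    "C": [241.5, 241.0, 239.2, 239.9, 240.7],
}
# A single-phase customer whose voltage tracks phase A with a small drop.
customer = [238.7, 238.0, 239.4, 237.3, 238.8]

phase, all_scores = identify_phase(reference, customer)
print(phase, all_scores)
```

Real meter data would need far longer series and clock alignment between meters before the correlations become reliable.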

Abstract:

This paper presents an online, unsupervised training algorithm enabling vision-based place recognition across a wide range of changing environmental conditions such as those caused by weather, seasons, and day-night cycles. The technique applies principal component analysis to distinguish between aspects of a location’s appearance that are condition-dependent and those that are condition-invariant. Removing the dimensions associated with environmental conditions produces condition-invariant images that can be used by appearance-based place recognition methods. This approach has a unique benefit – it requires training images from only one type of environmental condition, unlike existing data-driven methods that require training images with labelled frame correspondences from two or more environmental conditions. The method is applied to two benchmark variable condition datasets. Performance is equivalent or superior to the current state of the art despite the lesser training requirements, and is demonstrated to generalise to previously unseen locations.
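The idea of removing condition-dependent dimensions can be sketched on toy data. The abstract does not say which principal components are treated as condition-dependent, so the sketch below assumes it is the leading one, finds it by power iteration, and projects it out. The function names and the three-pixel "images" are hypothetical, and a practical version would use a linear algebra library and possibly remove several components.

```python
from math import sqrt


def center(rows):
    """Subtract the per-pixel mean from every image (row)."""
    count = len(rows)
    means = [sum(column) / count for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def leading_direction(centered, iterations=100):
    """Approximate the first principal direction by power iteration.

    Starts from a uniform vector, so it assumes the leading direction is
    not orthogonal to that starting vector.
    """
    dim = len(centered[0])
    direction = [1.0 / sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, direction)) for row in centered]
        updated = [sum(score * row[j] for score, row in zip(scores, centered))
                   for j in range(dim)]
        norm = sqrt(sum(value * value for value in updated))
        if norm == 0.0:
            break
        direction = [value / norm for value in updated]
    return direction


def remove_direction(centered, direction):
    """Project every image onto the subspace orthogonal to `direction`."""
    cleaned = []
    for row in centered:
        score = sum(a * b for a, b in zip(row, direction))
        cleaned.append([a - score * d for a, d in zip(row, direction)])
    return cleaned


# Four toy "images" of three pixels whose main variation is overall brightness.
images = [[10.0, 12.0, 9.0], [20.0, 21.0, 20.0],
          [30.0, 31.0, 28.0], [40.0, 42.0, 39.0]]
centered = center(images)
direction = leading_direction(centered)
cleaned = remove_direction(centered, direction)
print(direction)
print(cleaned)
```

In this toy set the leading direction is close to uniform brightness, so the cleaned images keep only the small pixel-to-pixel differences that remain once brightness is removed.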