4 results for datamining

in Queensland University of Technology - ePrints Archive


Relevance:

10.00%

Publisher:

Abstract:

Network crawling and visualisation tools and other datamining systems are now advanced enough to provide significant new impulses to the study of cultural activity on the Web. A growing range of studies focus on communicative processes in the blogosphere, including Adamic & Glance’s 2005 map of political allegiances during the 2004 U.S. presidential election and Kelly & Etling’s 2008 study of blogging practices in Iran. Significant shortcomings remain in the application of such tools and methodologies to the study of blogging; these relate both to how the content of blogs is analysed and to how the network maps resulting from such studies are understood. Our project highlights and addresses such shortcomings.

Relevance:

10.00%

Publisher:

Abstract:

Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and expectations of a quicker time to market. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. The growing self-organizing map (GSOM) is established as an efficient unsupervised datamining algorithm. In this study, some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include the introduction of a clustering quality measure, which evaluates how well the programme separates good and faulty products, and a filtering index, which reduces noise in the dataset. Results show that the proposed method can effectively differentiate good and faulty products. It will help engineers construct a knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.
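
The abstract does not define its clustering quality measure. As a rough illustration of what such a measure could look like, the sketch below computes cluster purity, a common stand-in and not necessarily the measure used in the study. The function name `cluster_purity` and the `"good"`/`"faulty"` labels are hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, labels):
    """Fraction of items whose label matches the majority label of their cluster.

    Assumption: purity is used here as a generic quality measure; the
    study's own measure is not specified in the abstract.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("at least one item is required")

    # Count how often each label occurs inside each cluster.
    counts_by_cluster = {}
    for cluster, label in zip(cluster_ids, labels):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1

    # Items agreeing with their cluster's majority label count as well separated.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(labels)


# Hypothetical example: two clusters of three products each.
clusters = [0, 0, 0, 1, 1, 1]
quality = ["good", "good", "faulty", "faulty", "faulty", "faulty"]
print(cluster_purity(clusters, quality))
```

A purity of 1.0 would mean every cluster contains only good or only faulty products; lower values indicate mixed clusters.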

Relevance:

10.00%

Publisher:

Abstract:

Our paper approaches Twitter through the lens of “platform politics” (Gillespie, 2010), focusing in particular on controversies around user data access, ownership, and control. We characterise different actors in the Twitter data ecosystem (private and institutional end users of Twitter, commercial data resellers such as Gnip and DataSift, data scientists, and finally Twitter, Inc. itself) and describe their conflicting interests. We furthermore study Twitter’s Terms of Service and application programming interface (API) as material instantiations of regulatory instruments used by the platform provider, and argue for greater promotion of data rights and literacy to strengthen the position of end users.

Relevance:

10.00%

Publisher:

Abstract:

We examine some variations of standard probability designs that preferentially sample sites based on how easy they are to access. Preferential sampling designs deliver unbiased estimates of the mean and sampling variance and ease the burden of data collection, but at what cost to design efficiency? Preferential sampling has the potential to either increase or decrease sampling variance, depending on the application. We carry out a simulation study to gauge its effect when sampling Soil Organic Carbon (SOC) values in a large agricultural region in south-eastern Australia. Preferential sampling in this region can reduce the distance to travel by up to 16%. Our study is based on a dataset of predicted SOC values produced from a datamining exercise. We consider three designs and two ways to determine ease of access. The overall conclusion is that sampling performance deteriorates as the strength of preferential sampling increases, because the regions of high SOC are harder to access, so our designs inadvertently target regions of low SOC. The good news, however, is that Generalised Random Tessellation Stratification (GRTS) sampling designs are not as badly affected as others, and GRTS remains an efficient design compared to competitors.
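
The trade-off described in this abstract can be illustrated with a minimal sketch. It assumes Poisson sampling with a Horvitz-Thompson estimator, a deliberately simple design swapped in for illustration and not one of the three designs in the study (it is not GRTS). The toy population, the `strength` parameter, and all function names are hypothetical.

```python
import random


def inclusion_probs(ease, n, strength):
    """Inclusion probabilities proportional to ease**strength, summing to n.

    strength = 0 gives equal-probability sampling; larger values favour
    easy-to-access sites more strongly (an assumed parameterisation).
    """
    weights = [e ** strength for e in ease]
    total = sum(weights)
    probs = [n * w / total for w in weights]
    if max(probs) > 1.0:
        raise ValueError("an inclusion probability exceeds 1; reduce n or strength")
    return probs


def poisson_sample(probs, rng):
    """Include each site independently with its own probability."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def ht_mean(values, probs, sample):
    """Horvitz-Thompson estimate of the population mean."""
    return sum(values[i] / probs[i] for i in sample) / len(values)


def ht_variance_poisson(values, probs):
    """Exact variance of the Horvitz-Thompson mean under Poisson sampling."""
    n_sites = len(values)
    return sum((1 - p) / p * y * y for y, p in zip(values, probs)) / n_sites ** 2


# Toy population (an assumption): sites with higher SOC are harder to access.
N_SITES = 100
soc = [1.0 + 0.1 * i for i in range(N_SITES)]
ease = [1.0 / (1.0 + 0.05 * i) for i in range(N_SITES)]

variances = {}
for strength in (0, 1, 2):
    probs = inclusion_probs(ease, 10, strength)
    variances[strength] = ht_variance_poisson(soc, probs)
    print(strength, variances[strength])

# One simulated draw at strength 1, using a seeded generator.
rng = random.Random(0)
probs = inclusion_probs(ease, 10, 1)
print(ht_mean(soc, probs, poisson_sample(probs, rng)))
```

In this toy population the estimator stays unbiased at every strength, but its variance is larger at strengths 1 and 2 than under equal-probability sampling, because the sites with high values receive low inclusion probabilities. That mirrors the direction of the effect reported in the abstract, not its magnitude.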