765 results for Sentiment Analysis, Opinion Mining, Twitter
Abstract:
Big Data and predictive analytics have received significant attention from the media and academic literature throughout the past few years, and it is likely that these emerging technologies will materially impact the mining sector. This short communication argues, however, that these technological forces will probably unfold differently in the mining industry than they have in many other sectors because of significant differences in the marginal cost of data capture and storage. To this end, we offer a brief overview of what Big Data and predictive analytics are, and explain how they are bringing about changes in a broad range of sectors. We discuss the “N=all” approach to data collection being promoted by many consultants and technology vendors in the marketplace but, by considering the economic and technical realities of data acquisition and storage, we then explain why an “n ≪ all” data collection strategy probably makes more sense for the mining sector. Finally, towards shaping the industry’s policies with regard to technology-related investments in this area, we conclude by putting forward a conceptual model for leveraging Big Data tools and analytical techniques that is a more appropriate fit for the mining sector.
Abstract:
There is considerable debate about the effects that the inclusion of men in nursing has on the quality of patient care and the profession itself. Whilst nursing is seen as a predominantly female-oriented career, it is often forgotten that the patron saint of nursing is actually a man – St Camillus de Lellis, a 16th-century Italian monk. However, political and religious evolution meant that the contemporary male figure within the nursing fraternity slowly gave way to women as men became more engaged with careers more befitting their social standing, such as medicine, the church or the military. Surprisingly, opinion about whether men are suitable within the profession continues to be divided. Men enter the profession for a multitude of reasons, yet barriers, whether emotional, verbal or sexual, are still present. However, nursing is attractive because the variety of work enables an easy transition between specialties, and the scope for career advancement is exciting both clinically and academically, especially with the recent inception of nurse practitioner and nurse consultant roles.
Abstract:
The 2008 US election has been heralded as the first presidential election of the social media era, but took place at a time when social media were still in a state of comparative infancy; so much so that the most important platform was not Facebook or Twitter, but the purpose-built campaign site my.barackobama.com, which became the central vehicle for the most successful electoral fundraising campaign in American history. By 2012, the social media landscape had changed: Facebook and, to a somewhat lesser extent, Twitter are now well-established as the leading social media platforms in the United States, and were used extensively by the campaign organisations of both candidates. As third-party spaces controlled by independent commercial entities, however, their use necessarily differs from that of home-grown, party-controlled sites: from the point of view of the platform itself, a @BarackObama or @MittRomney is technically no different from any other account, except for the very high follower count and an exceptional volume of @mentions. In spite of the significant social media experience which Democrat and Republican campaign strategists had already accumulated during the 2008 campaign, therefore, the translation of such experience to the use of Facebook and Twitter in their 2012 incarnations still required a substantial amount of new work, experimentation, and evaluation. This chapter examines the Twitter strategies of the leading accounts operated by both campaign headquarters: the ‘personal’ candidate accounts @BarackObama and @MittRomney as well as @JoeBiden and @PaulRyanVP, and the campaign accounts @Obama2012 and @TeamRomney. 
Drawing on datasets which capture all tweets from and at these accounts during the final months of the campaign (from early September 2012 to the immediate aftermath of the election night), we reconstruct the campaigns’ approaches to using Twitter for electioneering from the quantitative and qualitative patterns of their activities, and explore the resonance which these accounts have found with the wider Twitter userbase. A particular focus of our investigation in this context will be on the tweeting styles of these accounts: the mixture of original messages, @replies, and retweets, and the level and nature of engagement with everyday Twitter followers. We will examine whether the accounts chose to respond (by @replying) to the messages of support or criticism which were directed at them, whether they retweeted any such messages (and whether there was any preferential retweeting of influential or – alternatively – demonstratively ordinary users), and/or whether they were used mainly to broadcast and disseminate prepared campaign messages. Our analysis will highlight any significant differences between the accounts we examine, trace changes in style over the course of the final campaign months, and correlate such stylistic differences with the respective electoral positioning of the candidates. Further, we examine the use of these accounts during moments of heightened attention (such as the presidential and vice-presidential debates, or in the context of controversies such as that caused by the publication of the Romney “47%” video; additional case studies may emerge over the remainder of the campaign) to explore how they were used to present or defend key talking points, and exploit or avert damage from campaign gaffes. 
A complementary analysis of the messages directed at the campaign accounts (in the form of @replies or retweets) will also provide further evidence for the extent to which these talking points were picked up and disseminated by the wider Twitter population. Finally, we also explore the use of external materials (links to articles, images, videos, and other content on the campaign sites themselves, in the mainstream media, or on other platforms) by the campaign accounts, and the resonance which these materials had with the wider follower base of these accounts. This provides an indication of the integration of Twitter into the overall campaigning process, by highlighting how the platform was used as a means of encouraging the viral spread of campaign propaganda (such as advertising materials) or of directing user attention towards favourable media coverage. By building on comprehensive, large datasets of Twitter activity (as of early October, our combined datasets comprise some 3.8 million tweets) which we process and analyse using custom-designed social media analytics tools, and by using our initial quantitative analysis to guide further qualitative evaluation of Twitter activity around these campaign accounts, we are able to provide an in-depth picture of the use of Twitter in political campaigning during the 2012 US election which will provide detailed new insights into social media use in contemporary elections. This analysis will then also be able to serve as a touchstone for the analysis of social media use in subsequent elections, in the USA as well as in other developed nations where Twitter and other social media platforms are utilised in electioneering.
Abstract:
The issue of the usefulness of different prosopis species versus their status as weeds is a matter of hot debate around the world. The tree Prosopis juliflora had until 2000 been proclaimed weedy in its native range in South America and elsewhere in the dry tropics. P. juliflora or mesquite has a 90-year history in Sudan. During the early 1990s popular opinion in central Sudan and the Sudanese Government had begun to consider prosopis a noxious weed and a problematic tree species due to its aggressive ability to invade farmlands and pastures, especially in and around irrigated agricultural lands. As a consequence, prosopis was officially declared an invasive alien species also in Sudan, and in 1995 a presidential decree for its eradication was issued. Using a total economic valuation (TEV) approach, this study analysed the impacts of prosopis on local livelihoods in two contrasting irrigated agricultural schemes. Primarily a problem-based approach was used in which the derivation of non-market values was captured using ecological economic tools. In the New Halfa Irrigation Scheme in Kassala State, four separate household surveys were conducted due to diversity between the respective population groups. The main aim here was to study the magnitude of environmental economic benefits and costs derived from the invasion of prosopis in a large agricultural irrigation scheme on clay soil. Another study site, the Gandato Irrigation Scheme in River Nile State, represented impacts from prosopis that an irrigation scheme was confronted with on sandy soil in the arid and semi-arid ecozones along the main River Nile. The two cases showed distinctly different effects of prosopis, but both indicated that the benefits exceeded the costs. The valuation on clay soil in New Halfa identified a benefit/cost ratio of 2.1, while this indicator equalled 46 on the sandy soils of Gandato. The valuation results were site-specific and based on local market prices.
The most important beneficial impacts of prosopis on local livelihoods were derived from free-grazing forage for livestock, environmental conservation of the native vegetation, wood and non-wood forest products, as well as shelterbelt effects. The main social costs from prosopis were derived from weeding and clearing it from farm lands and from canalsides, from thorn injuries to humans and livestock, as well as from repair expenses for vehicle tyre punctures. Of the population groups, the tenants faced most of the detrimental impacts, while the landless population groups (originating from western and eastern Sudan) as well as the nomads were highly dependent on this tree resource. For the Gandato site, the monetized benefit-cost ratio of 46 still excluded several additional beneficial impacts of prosopis in the area that were difficult to quantify and monetize credibly. In River Nile State the beneficial impact could thus be seen as completely outweighing the costs of prosopis. The results can contribute to the formulation of national and local forest and agricultural policies related to prosopis in Sudan and can also be used in other countries faced with similar impacts caused by this tree.
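The benefit/cost ratios reported above (2.1 for New Halfa, 46 for Gandato) are aggregations of monetized impact items. A minimal sketch of how such a TEV-style ratio might be computed; all category names and monetary values below are hypothetical placeholders, not figures from the study:

```python
# Sketch of a total economic valuation (TEV) benefit/cost ratio.
# All categories and values are hypothetical, not data from the Sudanese study.

def benefit_cost_ratio(benefits, costs):
    """Ratio of total monetized benefits to total monetized costs."""
    return sum(benefits.values()) / sum(costs.values())

benefits = {"forage": 120.0, "fuelwood": 80.0, "shelterbelt": 10.0}
costs = {"weeding": 60.0, "thorn_injuries": 25.0, "tyre_repairs": 15.0}

print(round(benefit_cost_ratio(benefits, costs), 2))  # 2.1
```

A ratio above 1 indicates that the monetized benefits outweigh the costs, as reported for both study sites.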
Abstract:
Predicting which species are likely to cause serious impacts in the future is crucial for targeting management efforts, but the characteristics of such species remain largely unconfirmed. We use data and expert opinion on tropical and subtropical grasses naturalised in Australia since European settlement to identify naturalised and high-impact species and subsequently to test whether high-impact species are predictable. High-impact species for the three main affected sectors (environment, pastoral and agriculture) were determined by assessing evidence against pre-defined criteria. Twenty-one of the 155 naturalised species (14%) were classified as high-impact, including four that affected more than one sector. High-impact species were more likely to have faster spread rates (regions invaded per decade) and to be semi-aquatic. Spread rate was best explained by whether species had been actively spread (as pasture), and time since naturalisation, but may not be explanatory as it was tightly correlated with range size and incidence rate. Giving more weight to minimising the chance of overlooking high-impact species, a priority for biosecurity, meant a wider range of predictors was required to identify high-impact species, and the predictive power of the models was reduced. By-sector analysis of predictors of high-impact species was limited by their relative rarity, but showed sector differences, including in the universal predictors (spread rate and habitat) and in life history. Furthermore, species causing high impact to agriculture have changed in the past 10 years with changes in farming practice, highlighting the importance of context in determining impact. A rationale for invasion ecology is to improve the prediction of and response to future threats.
Although our study identifies some universal predictors, it suggests improved prediction will require a far greater emphasis on impact rather than invasiveness, and will need to account for the individual circumstances of affected sectors and the relative rarity of high-impact species.
Abstract:
In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen's kappa, which is significantly higher than values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less sensitive to overfitting as it uses only 205 classification features, around 100 times fewer than in similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insight into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of providing further characterization of the cognitive presence coding scheme.
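The Cohen's kappa of 0.63 reported above corrects raw agreement between the automatic coder and the human coding for agreement expected by chance. A minimal sketch of the computation; the phase labels below are hypothetical illustrations, not the study's transcripts:

```python
from collections import Counter

# Sketch of Cohen's kappa: chance-corrected agreement between two coders.
# The cognitive-presence phase labels below are hypothetical examples.

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["triggering", "exploration", "exploration", "integration", "resolution"]
auto  = ["triggering", "exploration", "integration", "integration", "resolution"]
print(round(cohens_kappa(human, auto), 3))
```

Kappa of 0 means no better than chance and 1 means perfect agreement, so the reported 0.63 indicates substantial agreement with human coders.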
Abstract:
Digital elevation models (DEMs) have been an important topic in geography and surveying sciences for decades due to their geomorphological importance as the reference surface for gravitation-driven material flow, as well as the wide range of uses and applications. When a DEM is used in terrain analysis, for example in automatic drainage basin delineation, errors of the model accumulate in the analysis results. Investigation of this phenomenon is known as error propagation analysis, which has a direct influence on the decision-making process based on interpretations and applications of terrain analysis. Additionally, it may have an indirect influence on data acquisition and DEM generation. The focus of the thesis was on fine toposcale DEMs, which are typically represented in a 5–50 m grid and used at the application scale of 1:10 000–1:50 000. The thesis presents a three-step framework for investigating error propagation in DEM-based terrain analysis. The framework includes methods for visualising the morphological gross errors of DEMs, exploring the statistical and spatial characteristics of the DEM error, making analytical and simulation-based error propagation analyses and interpreting the error propagation analysis results. The DEM error model was built using geostatistical methods. The results show that appropriate and exhaustive reporting of various aspects of fine toposcale DEM error is a complex task. This is due to the high number of outliers in the error distribution and morphological gross errors, which are detectable with the presented visualisation methods. In addition, the use of a global characterisation of DEM error is a gross generalisation of reality due to the small extent of the areas in which the assumption of stationarity is not violated. This was shown using an exhaustive high-quality reference DEM based on airborne laser scanning and local semivariogram analysis.
The error propagation analysis revealed that, as expected, an increase in the DEM vertical error will increase the error in surface derivatives. However, contrary to expectations, the spatial autocorrelation of the model appears to have varying effects on the error propagation analysis depending on the application. The use of a spatially uncorrelated DEM error model has been considered a 'worst-case scenario', but this opinion is now challenged because none of the DEM derivatives investigated in the study had maximum variation with spatially uncorrelated random error. Significant performance improvement was achieved in simulation-based error propagation analysis by applying process convolution in generating realisations of the DEM error model. In addition, a typology of uncertainty in drainage basin delineations is presented.
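The simulation-based (Monte Carlo) approach described above repeatedly perturbs the DEM with realisations of an error model and observes the spread of the resulting derivative. A minimal sketch of the idea on a hypothetical 1-D elevation profile with spatially uncorrelated vertical error, using slope as the derivative; the elevations, grid spacing and noise levels are illustrative only, not the thesis's data:

```python
import random
import statistics

# Monte Carlo error propagation sketch: add random vertical error to an
# elevation profile many times and measure the spread of a derived quantity.
# Profile values (m), 5 m spacing and error sigmas are hypothetical.

def slope(profile, spacing=5.0):
    """Finite-difference slope along a 1-D elevation profile."""
    return [(profile[i + 1] - profile[i]) / spacing
            for i in range(len(profile) - 1)]

def propagate(profile, sigma, n_realisations=500, seed=42):
    """Std. dev. of the first slope value under uncorrelated vertical error."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_realisations):
        noisy = [z + rng.gauss(0.0, sigma) for z in profile]
        samples.append(slope(noisy)[0])
    return statistics.stdev(samples)

profile = [100.0, 102.0, 105.0, 104.0, 101.0]
print(propagate(profile, sigma=0.5))
```

Larger vertical error produces a proportionally larger spread in the derivative; a spatially correlated error model (as built geostatistically in the thesis) would instead generate correlated perturbations across neighbouring cells.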
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
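The optimal piecewise-constant segmentation mentioned above is classically computed by dynamic programming: choose k segment boundaries so that describing each segment by its mean minimises the total squared error. A minimal illustrative sketch, not the thesis's own code:

```python
# Optimal piecewise-constant segmentation by dynamic programming:
# split a sequence into k segments, each described by its mean, minimising
# the total squared error. Illustrative sketch only.

def segment(seq, k):
    n = len(seq)
    # Prefix sums of values and squares allow O(1) segment-cost queries.
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):  # squared error of describing seq[i:j] by its mean
        m = j - i
        mean = (s[j] - s[i]) / m
        return (s2[j] - s2[i]) - m * mean * mean

    INF = float("inf")
    err = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    err[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = err[seg - 1][i] + cost(i, j)
                if c < err[seg][j]:
                    err[seg][j] = c
                    cut[seg][j] = i
    # Recover the segment end positions by backtracking through the cuts.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        bounds.append(j)
        j = cut[seg][j]
    return list(reversed(bounds)), err[k][n]

bounds, sse = segment([1, 1, 1, 5, 5, 5, 9, 9], 3)
print(bounds, sse)  # [3, 6, 8] 0.0
```

This exact algorithm runs in O(k·n²) time; the thesis's extensions (unimodal and basis segmentation) add constraints on the segment descriptions on top of this basic framework.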
Abstract:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of the human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and an analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving access to its data. It also contributed to the creation of several new tools for microarray data manipulation and the establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. Data comparability, and the minimisation of the systematic measurement errors that are characteristic of each laboratory in this large cross-laboratory integrated dataset, were ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of the global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose-built sample ontology.
A preface and motivation to the construction and analysis of a global map of human gene expression is given by an analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporates an indirect comparison of statistical methods for finding differentially expressed genes and points to the need to study gene expression on a global level.
Abstract:
We investigate the extent and nature of the use of Twitter for financial reporting by ASX-listed companies. We consider 199 financial-information-related tweets from 14 ASX-listed companies’ Twitter accounts. A thematic analysis of these tweets shows that ‘Earnings’ and ‘Operational Performance’ are the most discussed financial reporting themes. Further, a comparison across industry sectors reveals that listed companies from various industries show different usage patterns of financial reporting on Twitter. The examination of tweet sentiments also indicates a reporting bias within these tweets, as listed companies are more willing to disclose positive financial reporting tweets.
Abstract:
With the development of wearable and mobile computing technology, more and more people have started using sleep-tracking tools to collect personal sleep data on a daily basis, aiming to understand and improve their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analysing those data in combination with contextual factors. Hence, many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis of personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in the sleep of different people, and that SleepExplorer could help users discover factors that are most relevant to their personal sleep.
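Association rule mining, as used by SleepExplorer, finds frequently co-occurring sets of daily factors and outcomes and extracts rules that hold with high confidence. A minimal support/confidence sketch; the factor names and daily records below are hypothetical illustrations, not the system's data:

```python
from itertools import combinations

# Minimal association rule mining sketch: find itemsets above a support
# threshold, then extract rules above a confidence threshold.
# Factor names and daily records are hypothetical.

def mine_rules(transactions, min_support=0.5, min_confidence=0.7):
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = set().union(*transactions)
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in map(frozenset, combinations(items, size)):
            if support(itemset) < min_support:
                continue
            for r in range(1, size):
                for antecedent in map(frozenset, combinations(itemset, r)):
                    confidence = support(itemset) / support(antecedent)
                    if confidence >= min_confidence:
                        rules.append((set(antecedent),
                                      set(itemset - antecedent), confidence))
    return rules

days = [
    {"exercised", "good_sleep"},
    {"exercised", "good_sleep"},
    {"late_caffeine", "poor_sleep"},
    {"exercised", "good_sleep"},
]
for antecedent, consequent, confidence in mine_rules(days):
    print(antecedent, "=>", consequent, round(confidence, 2))
```

Production systems use smarter candidate generation (e.g. Apriori pruning), but the support/confidence logic is the same: here the rule {exercised} => {good_sleep} emerges with full confidence from the toy records.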
Abstract:
Data mining involves the nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where the parameters of population size, the number of points of crossover and the mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method, showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high-yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual classification data mining problems. A Michigan-style classifier was used to build the classifier, and the system was tested with machine learning databases including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. The performance of our algorithm is better than that of comparable approaches.
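A minimal single-population genetic algorithm sketch (tournament selection, one-point crossover, bitwise mutation) on a toy objective; SAMGA additionally adapts population size, crossover points and mutation rate per population and migrates individuals between populations, none of which is reproduced here. All parameter values are illustrative:

```python
import random

# Minimal genetic algorithm sketch: selection, one-point crossover, mutation.
# Parameters and the "OneMax" objective are illustrative, not SAMGA itself.

def genetic_algorithm(fitness, n_bits=16, pop_size=30,
                      generations=60, mutation_rate=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)          # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            point = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:point] + p2[point:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Toy objective: maximise the number of 1-bits ("OneMax").
best = genetic_algorithm(sum)
print(sum(best))
```

The self-adaptive element of SAMGA would tune `pop_size` and `mutation_rate` during the run instead of fixing them up front.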
Abstract:
Social media play a prominent role in mediating issues of public concern, not only providing the stage on which public debates play out but also shaping their topics and dynamics. Building on and extending existing approaches to both issue mapping and social media analysis, this article explores ways of accounting for popular media practices and the special case of ‘born digital’ sociocultural controversies. We present a case study of the GamerGate controversy with a particular focus on a spike in activity associated with a 2015 Law and Order: SVU episode about gender-based violence and harassment in games culture that was widely interpreted as being based on events associated with GamerGate. The case highlights the importance and challenges of accounting for the cultural dynamics of digital media within and across platforms.
Abstract:
This paper addresses the challenges of flood mapping using multispectral images. Quantitative flood mapping is critical for flood damage assessment and management. Remote sensing images obtained from various satellite or airborne sensors provide valuable data for this application, from which information on the extent of a flood can be extracted. However, the great challenge involved in the data interpretation is to achieve more reliable flood extent mapping, including both the fully inundated areas and the 'wet' areas where trees and houses are partly covered by water. This is a typical combined pure pixel and mixed pixel problem. In this paper, a recently developed extended Support Vector Machine method for spectral unmixing has been applied to generate an integrated map showing both pure pixels (fully inundated areas) and mixed pixels (trees and houses partly covered by water). The outputs were compared with the conventional mean-based linear spectral mixture model, and better performance was demonstrated with a subset of Landsat ETM+ data recorded at the Daly River Basin, NT, Australia, on 3rd March, 2008, after a flood event.
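The linear spectral mixture model used as the baseline above treats a mixed pixel's spectrum as a fractional combination of endmember spectra. A minimal two-endmember sketch with the sum-to-one constraint substituted in; the "water" and "vegetation" two-band reflectances are hypothetical, not Landsat ETM+ values:

```python
# Linear spectral mixture model sketch for one mixed pixel:
#   pixel ≈ f * e1 + (1 - f) * e2
# Solve for the fraction f of endmember e1 by least squares.
# Endmember reflectances below are hypothetical two-band examples.

def unmix_two_endmembers(pixel, e1, e2):
    """Least-squares fraction of e1 in the pixel, clamped to [0, 1]."""
    num = sum((p - b) * (a - b) for p, a, b in zip(pixel, e1, e2))
    den = sum((a - b) * (a - b) for a, b in zip(e1, e2))
    f = num / den
    return max(0.0, min(1.0, f))  # clamp to the physically meaningful range

water = [0.05, 0.02]
vegetation = [0.45, 0.30]
mixed_pixel = [0.25, 0.16]  # a pixel where trees are partly covered by water
f_water = unmix_two_endmembers(mixed_pixel, water, vegetation)
print(f_water)  # fraction of the pixel occupied by water
```

A pure pixel yields a fraction of 0 or 1, while a partly flooded pixel yields an intermediate fraction; the paper's extended SVM method replaces this mean-based least-squares step.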
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and the use of appropriate data mining techniques on the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVMs are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated on the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
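The spectrum kernel mentioned above compares two sequences by the inner product of their k-mer (length-k contiguous subsequence) count vectors, which an SVM can then use without ever materialising the full feature space. A minimal sketch on hypothetical system-call traces; the call names are illustrative, not taken from the intrusion dataset:

```python
from collections import Counter

# k-spectrum kernel sketch: the kernel value of two traces is the inner
# product of their k-mer count vectors. Trace contents are hypothetical.

def spectrum_kernel(seq_a, seq_b, k=2):
    def kmer_counts(seq):
        # Count every length-k window of consecutive system calls.
        return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    ca, cb = kmer_counts(seq_a), kmer_counts(seq_b)
    # Inner product over k-mers shared by both traces.
    return sum(ca[m] * cb[m] for m in ca)

trace1 = ["open", "read", "read", "close"]
trace2 = ["open", "read", "close", "close"]
print(spectrum_kernel(trace1, trace2, k=2))  # 2
```

Here the traces share the 2-mers ("open", "read") and ("read", "close"), giving a kernel value of 2; an SVM trained with this kernel never needs the exponential-size explicit k-mer feature vectors.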