765 results for Sentiment Analysis, Opinion Mining, Twitter
Abstract:
In this paper, we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of a tree. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives RMSEs of 0.56 and 0.71 for peaks and valleys, respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1 to 10, were obtained for intelligibility and naturalness, respectively.
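The abstract names the two mechanisms but not their internals, so the following is a minimal sketch, not the authors' implementation: a skeletal tree holds symbolic peak/valley labels, and a fuzzy step with triangular membership functions and centroid defuzzification assigns each node a numeric pitch value. The labels, fuzzy sets, and 80-240 Hz range are all invented for illustration.

```python
# Minimal sketch: skeletal tree of symbolic peaks/valleys, filled in by a
# fuzzy step that turns each symbolic label into a numeric pitch value.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "peak" or "valley"
    label: str                     # symbolic prominence: "low" / "mid" / "high"
    value: float = 0.0             # numeric value filled in by the fuzzy step
    children: list = field(default_factory=list)

def triangular(x, a, b, c):
    """Triangular fuzzy membership function on points a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def defuzzify(label, lo=80.0, hi=240.0, steps=100):
    """Centroid defuzzification of a label over a hypothetical pitch range (Hz)."""
    sets = {"low": (lo, lo, 140.0), "mid": (110.0, 160.0, 210.0),
            "high": (180.0, hi, hi)}          # invented fuzzy sets
    a, b, c = sets[label]
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    num = sum(x * triangular(x, a, b, c) for x in xs)
    den = sum(triangular(x, a, b, c) for x in xs)
    return num / den if den else (lo + hi) / 2

def fill_tree(node):
    """Walk the skeletal tree and assign a numeric value to every node."""
    node.value = defuzzify(node.label)
    for child in node.children:
        fill_tree(child)

root = Node("peak", "high", children=[Node("valley", "low"), Node("peak", "mid")])
fill_tree(root)
print(round(root.value, 1), [round(c.value, 1) for c in root.children])
```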
Abstract:
In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, each of which serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on a microblogging service, Twitter, and a community question-answering (cQA) site, Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of the proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers, where an author's research expertise area is treated as a social role. We present a novel application that detects users' research interests through topical keyword labeling based on the results of the multi-role model. The evaluation results show the feasibility and effectiveness of our model.
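To make the generative story concrete, here is a minimal sketch (not the paper's regularized model, which additionally encodes user interactions as regularization factors): a user mixes over roles, each role owns a distribution over latent topics, and each topic owns a distribution over words. All distributions below are made up.

```python
# Minimal sketch of the role -> topic -> word generative story.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["breaking", "share", "retweet", "question", "answer", "thanks"]

roles = ["originator", "propagator"]          # roles for the Twitter instantiation
user_role_mix = np.array([0.7, 0.3])          # a user's mixture over roles
role_topic = np.array([[0.8, 0.2],            # P(topic | role), one row per role
                       [0.1, 0.9]])
topic_word = np.array([[0.5, 0.2, 0.05, 0.1, 0.1, 0.05],   # P(word | topic)
                       [0.05, 0.1, 0.5, 0.15, 0.1, 0.1]])

def generate_post(n_words=5):
    """Generate one post: sample a role, then a topic, then the words."""
    role = rng.choice(len(roles), p=user_role_mix)
    topic = rng.choice(role_topic.shape[1], p=role_topic[role])
    words = rng.choice(vocab, size=n_words, p=topic_word[topic])
    return roles[role], list(words)

for _ in range(3):
    print(generate_post())
```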
Abstract:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method: the agglomerative hierarchical clustering algorithm. It shows how a data mining method such as clustering can be applied to the analysis of stocks traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior among them. The problem is solved with the aid of XLMiner™, a data mining add-in for Microsoft Excel.
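A minimal sketch of the named technique, using scipy rather than the paper's XLMiner workflow: agglomerative hierarchical clustering of stocks by the similarity of their daily return series. The tickers and returns are synthetic.

```python
# Agglomerative hierarchical clustering of stocks on synthetic return series.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
tickers = ["A", "B", "C", "D", "E", "F"]
# Synthetic returns: A/B/C share one common factor, D/E/F another.
factor1, factor2 = rng.normal(0, 0.02, 250), rng.normal(0, 0.02, 250)
returns = np.vstack([factor1 + rng.normal(0, 0.01, 250) for _ in range(3)] +
                    [factor2 + rng.normal(0, 0.01, 250) for _ in range(3)])

# Distance = 1 - correlation, so co-moving stocks end up close together.
dist = pdist(returns, metric="correlation")
tree = linkage(dist, method="average")        # agglomerative, average linkage
labels = fcluster(tree, t=2, criterion="maxclust")
for ticker, lab in zip(tickers, labels):
    print(ticker, "-> cluster", lab)
```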
Abstract:
Purpose - The purpose of this paper is to assess high-dimensional visualisation, combined with pattern matching, as an approach to observing dynamic changes in the ways people tweet about science topics. Design/methodology/approach - The high-dimensional visualisation approach was applied to three scientific topics to test its effectiveness for longitudinal analysis of message framing on Twitter over two disjoint periods in time. The paper uses coding frames to drive categorisation and visual analytics of tweets discussing the science topics. Findings - The findings point to the potential of this mixed-methods approach, as it offers sufficiently high sensitivity to recognise and support the analysis of non-trending as well as trending topics on Twitter. Research limitations/implications - Three topics are studied and these illustrate a range of frames, but the results may not be representative of all scientific topics. Social implications - Funding bodies increasingly encourage scientists to participate in public engagement. As social media provides an avenue actively utilised for public communication, understanding the nature of the dialogue on this medium is important for the scientific community and the public at large. Originality/value - This study differs from standard approaches to the analysis of microblog data, which tend to focus on machine-driven analysis of large-scale datasets. It provides evidence that this approach enables practical and effective analysis of the content of mid-size to large collections of microposts.
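The coding-frame step can be illustrated with a small sketch; the frames and keywords below are invented and stand in for the paper's actual coding frames.

```python
# Keyword-driven assignment of tweets to framing categories, the kind of
# categorisation step the abstract combines with visual analytics.
frames = {
    "risk":     {"danger", "threat", "warning"},
    "progress": {"breakthrough", "discovery", "cure"},
    "policy":   {"funding", "regulation", "government"},
}

def code_tweet(text):
    """Return every frame whose keyword set intersects the tweet's tokens."""
    tokens = set(text.lower().split())
    return [name for name, kws in frames.items() if tokens & kws] or ["uncoded"]

tweets = ["New breakthrough could lead to a cure",
          "Government funding for the project was cut",
          "Nice weather today"]
for t in tweets:
    print(code_tweet(t), "<-", t)
```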
Abstract:
In the nonparametric framework of Data Envelopment Analysis (DEA), the statistical properties of its estimators have been investigated, and only asymptotic results are available. For DEA estimators, results of practical use have been proved only for the case of one input and one output. However, in real-world problems the production process is usually described by many variables. In this paper, a machine learning approach to variable aggregation based on Canonical Correlation Analysis (CCA) is presented. The approach is applied to the efficiency estimation of all the farms on Terceira Island in the Azorean archipelago.
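A minimal sketch of the idea under strong simplifying assumptions (sklearn's CCA, synthetic farm data, and a crude shift-to-positive normalization of the canonical variates): CCA collapses several inputs and outputs into one aggregate of each, after which single-input, single-output DEA efficiency under constant returns to scale reduces to a ratio against the best performer.

```python
# CCA-based variable aggregation followed by a ratio-form DEA efficiency.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
n = 30
X = rng.uniform(1, 10, size=(n, 3))            # three inputs per farm
Y = X @ np.array([[0.5], [0.3], [0.2]]) + rng.normal(0, 0.2, (n, 1))
Y = np.hstack([Y, Y * rng.uniform(0.8, 1.2, (n, 1))])   # two correlated outputs

cca = CCA(n_components=1)
x_agg, y_agg = cca.fit_transform(X, Y)          # one aggregate input, one output
# Crude normalization for illustration only: DEA needs positive quantities.
x_agg = x_agg.ravel() - x_agg.min() + 1.0
y_agg = y_agg.ravel() - y_agg.min() + 1.0

ratio = y_agg / x_agg
efficiency = ratio / ratio.max()                # 1.0 marks the efficient farm(s)
print("most efficient unit:", int(efficiency.argmax()),
      "mean efficiency:", round(float(efficiency.mean()), 3))
```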
Abstract:
Social media has become an effective channel for communicating both trends and public opinion on current events. However, the automatic topic classification of social media content poses various challenges. Topic classification is a common technique used for automatically capturing themes that emerge from social media streams. However, such techniques are sensitive to the evolution of topics when new event-dependent vocabularies start to emerge (e.g., Crimea becoming relevant to War Conflict during the Ukraine crisis in 2014). Therefore, traditional supervised classification methods, which rely on labelled data, can rapidly become outdated. In this paper we propose a novel transfer learning approach to address the classification of new data when the only available labelled data belong to a previous epoch. This approach relies on the incorporation of knowledge from DBpedia graphs. Our findings show promising results in understanding how features age, and how semantic features can support the evolution of topic classifiers.
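A minimal sketch of the transfer idea, not the paper's pipeline: volatile event-specific words are replaced by more stable semantic types (a hand-made dictionary below stands in for a real DBpedia lookup), so a classifier trained on an old epoch can generalize to vocabulary that emerges later.

```python
# Semantic feature substitution so an old-epoch classifier survives new vocabulary.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical semantic lookup; a real system would query DBpedia instead.
semantic_type = {"donbas": "dbo:Place", "crimea": "dbo:Place",
                 "kalashnikov": "dbo:Weapon", "mortar": "dbo:Weapon",
                 "guitar": "dbo:Instrument", "festival": "dbo:Event"}

def semantify(text):
    return " ".join(semantic_type.get(w, w) for w in text.lower().split())

old_epoch = ["fighting near donbas with kalashnikov fire",
             "mortar shelling reported near donbas",
             "guitar sets open the summer festival",
             "festival crowd cheers the guitar solo"]
labels = ["war", "war", "music", "music"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit([semantify(t) for t in old_epoch], labels)

# "crimea" never appears in training, but it maps to the same semantic type.
print(model.predict([semantify("clashes spread toward crimea")]))
```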
Abstract:
Software product line modeling aims at capturing a set of software products in an economic yet meaningful way. We introduce a class of variability models that capture the sharing between the software artifacts forming the products of a software product line (SPL) in a hierarchical fashion, in terms of commonalities and orthogonalities. Such models are useful when analyzing and verifying all products of an SPL, since they provide a scheme for divide-and-conquer-style decomposition of the analysis or verification problem at hand. We define an abstract class of SPLs for which variability models can be constructed that are optimal w.r.t. the chosen representation of sharing. We show how the constructed models can be fed into a previously developed algorithmic technique for compositional verification of control-flow temporal safety properties, so that the properties to be verified are iteratively decomposed into simpler ones over orthogonal parts of the SPL, and are not re-verified over the shared parts. We provide tool support for our technique, and evaluate our tool on a small but realistic SPL of cash desks.
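A minimal sketch of what such a hierarchical variability model might look like as a data structure (this is illustrative, not the paper's formalism): each node holds the artifacts shared by all products below it, so an analysis can be run once per node rather than re-run over the shared parts of every product.

```python
# Hierarchical sharing between the products of a (toy) software product line.
from dataclasses import dataclass, field

@dataclass
class VNode:
    shared: set                     # artifacts common to every product below
    children: list = field(default_factory=list)

def products(node, inherited=frozenset()):
    """Enumerate products: each leaf's artifacts plus everything inherited."""
    acc = inherited | node.shared
    if not node.children:
        yield acc
        return
    for child in node.children:
        yield from products(child, acc)

def analysis_units(node):
    """Count analysis units: one per node, so shared parts are analyzed once."""
    return 1 + sum(analysis_units(c) for c in node.children)

spl = VNode({"core", "logging"}, [
    VNode({"card_payment"}, [VNode({"pin_pad"}), VNode({"contactless"})]),
    VNode({"cash_payment"}),
])
prods = [sorted(p) for p in products(spl)]
print(prods)
print("analysis units:", analysis_units(spl), "for", len(prods), "products")
```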
Abstract:
This study focuses on empirical investigations and seeks implications by utilizing three different methodologies to test various aspects of trader behavior. The first methodology utilizes Prospect Theory to determine trader behavior during periods of extreme wealth contraction. Second, a threshold model is formulated to examine the sentiment variable; third, the contagion effect and trader behavior are studied. The connection between consumers' sense of financial well-being, or sentiment, and stock market performance has been studied at length. However, without data on actual versus experimental performance, implications based on this relationship are meaningless. The empirical agenda included examining a proprietary file of daily trader activities over a five-year period. Overall, during periods of extreme wealth-altering conditions, traders "satisfice" rather than choose the "best" alternative. A trader's degree of loss aversion depends on his/her prior investment performance. A model that explains the behavior of traders during periods of turmoil is developed; Prospect Theory and the data file informed its design. Additional research included testing a model that permitted the data to signal the crisis through a threshold model. The third empirical study investigated the existence of contagion caused by declining global wealth effects, using evidence from the mining industry in Canada. Contagion, where a financial crisis begins locally and subsequently spreads elsewhere, has been studied in terms of correlations among similar regions. The results provide support for Prospect Theory in two of the three empirical studies. The dissertation emphasizes the need for specifying precise, testable models of investors' expectations by providing tools to identify paradoxical behavior patterns. True enhancements in this field must include empirical research utilizing reliable data sources to mitigate data mining problems and allow researchers to distinguish between expectations-based and risk-based explanations of behavior. Through this type of research, it may be possible to systematically exploit "irrational" market behavior.
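For readers unfamiliar with the machinery, a short sketch of the standard Prospect Theory value and probability-weighting functions follows, using the commonly cited Tversky-Kahneman (1992) parameter estimates; the dissertation's own models are not reproduced here.

```python
# Kahneman-Tversky value and probability-weighting functions.
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """S-shaped value function: risk-averse over gains, loss-averse over losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def weight(p, gamma=0.61):
    """Probability weighting: overweights small p, underweights large p."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

def prospect_value(outcomes):
    """A gamble's prospect value: weighted probabilities times subjective values."""
    return sum(weight(p) * value(x) for x, p in outcomes)

# Same expected value, but framing as a loss makes the gamble far less attractive.
print(prospect_value([(100, 0.5), (0, 0.5)]))     # possible gain
print(prospect_value([(-100, 0.5), (0, 0.5)]))    # possible loss
```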
Abstract:
An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make available a large amount of transit performance data that is valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. A model-based clustering method is employed to investigate the spatial and temporal patterns of transit travel speeds, which can easily be translated into travel times. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated using analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using factors identified as having significant effects on speed for different time-of-day periods. Transit performance was found to vary across seasons and time-of-day periods, and the geographic location of a transit route segment also plays a role in this variation. The results of this research indicate that advanced data mining techniques have good potential to provide automated methods for assisting transit agencies in service planning, scheduling, and operations control.
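A minimal sketch of the model-based clustering step, not the study's GIS pipeline: a Gaussian mixture model groups synthetic AVL speed observations by hour of day, recovering peak versus off-peak service periods.

```python
# Model-based clustering of (hour, speed) observations with a Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
hours = rng.integers(6, 22, size=400)                     # observation hour
peak = ((hours >= 7) & (hours <= 9)) | ((hours >= 16) & (hours <= 18))
speeds = np.where(peak, rng.normal(18, 2, 400), rng.normal(30, 3, 400))

X = np.column_stack([hours, speeds])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
for k in range(2):
    sel = labels == k
    print(f"cluster {k}: mean speed {speeds[sel].mean():.1f}, "
          f"{sel.sum()} observations")
```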
Abstract:
The nation's freeway systems are becoming increasingly congested, and a major contributor to congestion is traffic incidents. Traffic incidents are non-recurring events, such as accidents or stranded vehicles, that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic away from incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors; determining and understanding these factors can help in identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. This dissertation attempts to improve on these previous efforts by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid decision making about traffic diversion in the event of an ongoing incident. Multiple data mining techniques were applied and evaluated: multiple linear regression and a decision-tree-based method were used to develop the offline models, while a rule-based method and a tree algorithm called M5P were used for the online models. The results show that the models can in general achieve high prediction accuracy, falling within acceptable intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.
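A minimal sketch of one of the model families named above, on synthetic incident records (a scikit-learn decision-tree regressor is used here; M5P itself is a Weka algorithm): a few plausible contributing factors drive a synthetic duration, and the tree is evaluated on held-out incidents.

```python
# Decision-tree regression of incident duration on synthetic contributing factors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n = 500
lanes_blocked = rng.integers(0, 4, n)
severity = rng.integers(1, 4, n)             # 1 = minor ... 3 = major
peak_hour = rng.integers(0, 2, n)
# Synthetic ground truth: duration in minutes, driven by the factors plus noise.
duration = (15 + 12 * lanes_blocked + 20 * severity + 10 * peak_hour
            + rng.normal(0, 8, n))

X = np.column_stack([lanes_blocked, severity, peak_hour])
X_tr, X_te, y_tr, y_te = train_test_split(X, duration, random_state=0)
model = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
err = np.abs(model.predict(X_te) - y_te).mean()
print(f"mean absolute error: {err:.1f} minutes")
```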
Abstract:
With advances in science and technology, computing and business intelligence (BI) systems are steadily becoming more complex, with an increasing variety of heterogeneous software and hardware components. They are thus becoming progressively more difficult to monitor, manage, and maintain. Traditional approaches to system management have largely relied on domain experts through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This is widely acknowledged to be a cumbersome, labor-intensive, and error-prone process that also struggles to keep up with rapidly changing environments. In addition, many traditional business systems deliver primarily pre-defined historic metrics for long-term strategic or mid-term tactical analysis, and lack the flexibility to support evolving metrics or data collection for real-time operational analysis. There is thus a pressing need for automatic and efficient approaches to monitoring and managing complex computing and BI systems. To realize the goal of autonomic management and enable self-management capabilities, we propose to mine the historical log data generated by computing and BI systems and automatically extract actionable patterns from it. This dissertation focuses on the development of data mining techniques to extract actionable patterns from various types of log data in computing and BI systems. Four key problems are studied: log data categorization and event summarization, leading indicator identification, pattern prioritization by exploring link structures, and tensor modeling of three-way log data. Case studies and comprehensive experiments on real application scenarios and datasets are conducted to show the effectiveness of the proposed approaches.
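The three-way view of log data can be made concrete with a small sketch (illustrative only, not the dissertation's tensor model): raw log records are counted into a time-window by event-type by source tensor, and an SVD of one unfolding surfaces the dominant event-type pattern.

```python
# Build a (time window x event type x source) count tensor from log records.
import numpy as np

# (hour, event_type, source) triples standing in for parsed log records.
records = [(0, "disk_full", "db1"), (0, "timeout", "web1"),
           (1, "disk_full", "db1"), (1, "disk_full", "db2"),
           (2, "timeout", "web1"), (2, "restart", "web1")]

hours = sorted({r[0] for r in records})
types = sorted({r[1] for r in records})
srcs = sorted({r[2] for r in records})
T = np.zeros((len(hours), len(types), len(srcs)))
for h, t, s in records:
    T[hours.index(h), types.index(t), srcs.index(s)] += 1

# Mode-2 unfolding: one row per event type, columns are (hour, source) pairs.
unfold = T.transpose(1, 0, 2).reshape(len(types), -1)
_, _, vt = np.linalg.svd(unfold.T, full_matrices=False)
dominant = np.abs(vt[0])                      # weight of each type in the top pattern
print({t: round(float(w), 2) for t, w in zip(types, dominant)})
```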
Abstract:
The most important factor affecting the decision-making process in finance is risk, which is usually measured by variance (total risk) or systematic risk (beta). Since an investor's sentiment (whether he or she is an optimist or a pessimist) plays a very important role in the choice of the beta measure, any decision made for the same asset within the same time horizon will differ across individuals. In other words, neither homogeneity of beliefs nor the rational expectations prevalent in the market will hold, due to behavioral traits. This dissertation consists of three essays. In the first essay, "Investor Sentiment and Intrinsic Stock Prices", a new technical trading strategy is developed using a firm-specific individual sentiment measure. This behavior-based trading strategy forecasts a range within which a stock price moves in a particular period and can be used for stock trading. Results indicate that sample firms trade within a range and give signals as to when to buy or sell. The second essay, "Managerial Sentiment and the Value of the Firm", examines the effect of managerial sentiment on the project selection process under the net present value criterion, and also the effect of managerial sentiment on the value of the firm. The analysis shows that high-sentiment and low-sentiment managers obtain different values for the same firm before and after the acceptance of a project, and that managerial sentiment induces changes in the cost of capital and the weighted average cost of capital. The last essay, "Investor Sentiment and Optimal Portfolio Selection", analyzes how investor sentiment affects the nature and composition of the optimal portfolio as well as portfolio performance. Results suggest that investor sentiment completely changes the portfolio composition; that is, a high-sentiment investor will choose a completely different set of assets than a low-sentiment investor. The results indicate the practical applicability of a behavioral-model-based technical indicator for stock trading. Additional insights include the valuation of firms with a behavioral component and the importance of distinguishing portfolio performance based on sentiment factors.
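In the spirit of the first essay's range-forecasting idea, here is a minimal sketch, not the essay's actual model: a firm-specific sentiment score modulates a band around a moving average, and price crossings of the band generate buy and sell signals. The price path and sentiment series are synthetic.

```python
# Sentiment-modulated trading band around a moving average.
import numpy as np

rng = np.random.default_rng(5)
price = 100 + np.cumsum(rng.normal(0, 1, 120))        # synthetic price path
sentiment = np.clip(rng.normal(0, 0.5, 120), -1, 1)   # -1 pessimist .. +1 optimist

window = 20
signals = []
for t in range(window, len(price)):
    ma = price[t - window:t].mean()
    # Higher sentiment narrows the band, so trades trigger sooner.
    band = price[t - window:t].std() * (1.5 - 0.5 * sentiment[t])
    if price[t] < ma - band:
        signals.append((t, "buy"))
    elif price[t] > ma + band:
        signals.append((t, "sell"))

print(signals[:5])
```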
Abstract:
The United States has been increasingly concerned with the transnational threat posed by infectious diseases. Effective policy implementation to contain the spread of these diseases requires active engagement and support of the American public. To influence American public opinion and enlist support for related domestic and foreign policies, both domestic agencies and international organizations have framed infectious diseases as security threats, human rights disasters, economic risks, and as medical dangers. This study investigates whether American attitudes and opinions about infectious diseases are influenced by how the issue is framed. It also asks which issue frame has been most influential in shaping public opinion about global infectious diseases when people are exposed to multiple frames. The impact of media frames on public perception of infectious diseases is examined through content analysis of newspaper reports. Stories on SARS, avian flu, and HIV/AIDS were sampled from coverage in The New York Times and The Washington Post between 1999 and 2007. Surveys of public opinion on infectious diseases in the same time period were also drawn from databases like Health Poll Search and iPoll. Statistical analysis tests the relationship between media framing of diseases and changes in public opinion. Results indicate that no one frame was persuasive across all diseases. The economic frame had a significant effect on public opinion about SARS, as did the biomedical frame in the case of avian flu. Both the security and human rights frames affected opinion and increased public support for policies intended to prevent or treat HIV/AIDS. The findings also address the debate on the role and importance of domestic public opinion as a factor in domestic and foreign policy decisions of governments in an increasingly interconnected world. The public is able to make reasonable evaluations of the frames and the domestic and foreign policy issues emphasized in the frames.
Abstract:
The arrival of Cuba’s Information Technology (IT) and Communications Minister Ramiro Valdés in Venezuela in the Spring of 2010 to serve as a ‘consultant’ to the Venezuelan government awakened a new reality in that country. Beset by deep economic troubles, escalating crime, a murder rate that has doubled since Chávez took over in 1999, and an opposition movement led by university students and other activists who use the Internet as their primary weapon, Venezuela has resorted to Cuba for help. In a country where traditional media outlets have largely been censored or are government-controlled, the Internet and its online social networks have become the place to obtain, as well as disseminate, unfiltered information. As such, Internet growth and the use of its social networks have skyrocketed in Venezuela, giving it one of Latin America’s highest rates of Web use. Because the Internet is increasingly used to spark political debate among Venezuelans and to publish information that differs from the official government line, Chávez has embarked on an initiative to bring the Internet to the poor and others who would otherwise not have access, by establishing government-sponsored Internet Info Centers throughout the country to disseminate information to his followers. With the help of Cuban advisors, who for years have been a part of Venezuela’s defense, education, and health care initiatives, Chávez has apparently taken to adapting Cuba’s methodology for the control of information. He has also begun to take steps toward controlling the type of information flowing through the country’s online social networks, considering the implementation of a government-controlled single Internet access point in Venezuela. Simultaneously, in adapting to Venezuela’s Internet reality, Chávez has engaged online by creating his own Twitter account in an attempt to influence public opinion, primarily among those who browse the Web. With a rapidly growing following that may soon reach one million, Chávez claims to have set up his own online trench from which to wage a cyberspace battle.
Abstract:
Many systems and applications continuously produce events. These events are used to record the status of the system and to trace its behaviors. By examining these events, system administrators can check for potential problems. If the temporal dynamics of the systems are further investigated, underlying patterns can be discovered, and the uncovered knowledge can be leveraged to predict future system behaviors or to mitigate potential risks. Moreover, system administrators can utilize the temporal patterns to set up event management rules that make the system more intelligent. With the popularity of data mining techniques in recent years, these events have gradually become more and more useful. Despite recent advances in data mining techniques, their application to system event mining is still at a rudimentary stage. Most works still focus on episode mining or frequent pattern discovery. These methods are unable to provide a brief yet comprehensible summary that reveals the valuable information from a high-level perspective, and they provide little actionable knowledge to help system administrators better manage the systems. To make better use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered helpful for system management: (1) provide concise yet comprehensive summaries of the running status of the systems; (2) make the systems more intelligent and autonomous; (3) effectively detect abnormal system behaviors. Owing to the richness of the event logs, all of these directions can be pursued in a data-driven manner; in this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. This dissertation mainly focuses on the foregoing directions, leveraging temporal mining techniques to facilitate system management. More specifically, three concrete topics are discussed: event summarization, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, experimental evaluations are also presented to demonstrate the effectiveness and efficiency of the corresponding solutions.
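A minimal sketch of the streaming anomaly detection direction, not the dissertation's algorithm: an online rolling z-score over a metric stream flags events that deviate sharply from recent behavior.

```python
# Streaming anomaly detection via a rolling z-score over a trailing window.
from collections import deque
import math
import random

def stream_anomalies(stream, window=30, threshold=3.0):
    """Yield (index, value) for points more than `threshold` standard
    deviations from the mean of the trailing window."""
    recent = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            if var > 0 and abs(x - mean) / math.sqrt(var) > threshold:
                yield i, x
        recent.append(x)

random.seed(6)
metrics = [random.gauss(50, 2) for _ in range(200)]
metrics[120] = 90          # an injected spike, e.g. a resource-demand burst
print(list(stream_anomalies(metrics)))
```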