996 results for Social Analytics


Relevance:

30.00%

Abstract:

In today's fast-paced and interconnected digital world, the data generated by an increasing number of applications is being modeled as dynamic graphs. The graph structure encodes relationships among data items, while structural changes to the graphs and the continuous stream of information produced by their entities make them dynamic. Examples include social networks, where users post status updates, images, videos, etc.; phone call networks, where nodes may send text messages or place phone calls; and road traffic networks, where the traffic behavior of road segments changes constantly. There is tremendous value in storing, managing, and analyzing such dynamic graphs and deriving meaningful insights in real time. However, most work in graph analytics assumes a static setting, and there is a lack of systematic study of the various dynamic scenarios, the complexity they impose on analysis tasks, and the challenges of building efficient systems that support such tasks at large scale. In this dissertation, I design a unified streaming graph data management framework and develop prototype systems to support increasingly complex tasks on dynamic graphs. In the first part, I focus on the management and querying of distributed graph data. I develop a hybrid replication policy that monitors the read-write frequencies of the nodes to decide dynamically what data to replicate and whether to replicate eagerly or lazily, in order to minimize network communication and support low-latency querying. In the second part, I study the parallel execution of continuous neighborhood-driven aggregates, where each node aggregates the information generated in its neighborhood. 
I build my system around the notion of an aggregation overlay graph, a pre-compiled data structure that enables sharing of partial aggregates across different queries and allows partial pre-computation of the aggregates to minimize query latency and increase throughput. Finally, I extend the framework to support continuous detection and analysis of activity-based subgraphs, which can be specified using both graph structure and activity conditions on the nodes. Query specifications in my system are expressed using a set of active structural primitives, which allows the query evaluator to apply a set of novel optimization techniques and thereby achieve high throughput. Overall, in this dissertation, I define and investigate a set of novel tasks on dynamic graphs, design scalable optimization techniques, build prototype systems, and show the effectiveness of the proposed techniques through extensive evaluation on large-scale real and synthetic datasets.
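The abstract does not spell out the replication rule, so the following is only a hypothetical sketch of the idea it describes: per-node read and write counters drive a choice between no replication, lazy replication, and eager replication. The class name `ReplicationAdvisor`, the read/write-ratio rule, and both threshold values are assumptions for illustration, not the dissertation's policy.

```python
from collections import Counter


class ReplicationAdvisor:
    """Hypothetical frequency-based replication advisor (illustration only).

    Counts remote reads of each node per partition and writes to each node,
    then picks a replication mode from the read/write ratio.
    """

    def __init__(self, lazy_threshold: float = 1.0, eager_threshold: float = 4.0):
        # Assumed thresholds on reads-per-write; not taken from the dissertation.
        self.lazy_threshold = lazy_threshold
        self.eager_threshold = eager_threshold
        self.remote_reads = Counter()  # (node, partition) -> reads issued from that partition
        self.writes = Counter()        # node -> writes applied at the node's home partition

    def record_read(self, node: str, partition: str) -> None:
        self.remote_reads[(node, partition)] += 1

    def record_write(self, node: str) -> None:
        self.writes[node] += 1

    def decide(self, node: str, partition: str) -> str:
        """Return 'eager', 'lazy', or 'none' for a replica of node on partition."""
        reads = self.remote_reads[(node, partition)]  # Counter returns 0 for unseen keys
        writes = self.writes[node]
        ratio = reads / max(writes, 1)  # avoid division by zero for never-written nodes
        if ratio >= self.eager_threshold:
            return "eager"  # read-heavy: push every write to the replica immediately
        if ratio >= self.lazy_threshold:
            return "lazy"   # mixed: keep a replica but defer propagating writes to it
        return "none"       # write-heavy or rarely read: forward reads to the home partition


advisor = ReplicationAdvisor()
for _ in range(10):
    advisor.record_read("u", "p2")
advisor.record_write("u")
advisor.record_write("u")
print(advisor.decide("u", "p2"))  # eager (10 reads / 2 writes = 5.0)
print(advisor.decide("u", "p3"))  # none (no reads from p3)
```

Under this rule, a node that a remote partition reads far more often than it is written justifies paying one message per write to keep the replica current, while a rarely read node is cheaper to serve by forwarding reads on demand.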

Relevance:

30.00%

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved. To further uncover why these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways in which single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes of the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize either a purely statistical or a purely visual approach. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. 
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies that demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying these metrics and parsing their results. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research on comparing two or more groups of temporal event sequences, making traditional machine learning and data mining techniques open to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
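The abstract does not describe how HVHT decides which of many simultaneous test results count as significant. As a point of reference only, the sketch below implements the standard Benjamini-Hochberg procedure, a common way to control the false discovery rate when many hypotheses are tested at once; it is not presented as the dissertation's HVHT method, and the metric names and p-values in the usage lines are made up for illustration.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[int]:
    """Benjamini-Hochberg step-up procedure.

    Returns the indices of the hypotheses rejected while controlling the
    false discovery rate at level alpha.
    """
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Step-up: reject every hypothesis ranked at or below k, even one whose own
    # p-value exceeded its individual rank threshold.
    return sorted(order[:cutoff])


# Hypothetical p-values from comparing two cohorts on three metrics.
p = {"event order": 0.02, "event frequency": 0.03, "event duration": 0.04}
names = list(p)
rejected = benjamini_hochberg([p[n] for n in names], alpha=0.05)
print([names[i] for i in rejected])  # ['event order', 'event frequency', 'event duration']
```

In this example a Bonferroni cutoff of 0.05 / 3 would reject none of the three metrics, whereas the step-up rule rejects all of them, which illustrates why the choice of correction matters when hundreds of metrics are tested together.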

Relevance:

30.00%

Abstract:

Prior research shows that electronic word of mouth (eWOM) wields considerable influence over consumer behavior. However, as the volume and variety of eWOM grow, firms face challenges in analyzing and responding to this information. In this dissertation, I argue that to meet the new challenges and opportunities posed by the expansion of eWOM and to measure its impact on firms and consumers more accurately, we need to revisit our methodologies for extracting insights from eWOM. This dissertation consists of three essays that further our understanding of the value of social media analytics, especially with respect to eWOM. In the first essay, I use machine learning techniques to extract semantic structure from online reviews. These semantic dimensions describe the experiences of consumers in the service industry more accurately than traditional numerical variables. To demonstrate the value of these dimensions, I show that they can be used to substantially improve the accuracy of econometric models of firm survival. In the second essay, I explore the effects on eWOM of online deals, such as those offered by Groupon, whose value to both consumers and merchants is controversial. Through a combination of Bayesian econometric models and controlled lab experiments, I examine the conditions under which online deals affect online reviews and provide strategies to mitigate the potential negative eWOM effects resulting from online deals. In the third essay, I focus on how eWOM can be incorporated into efforts to reduce foodborne illness, a major public health concern. I demonstrate how machine learning techniques can be used to monitor hygiene in restaurants through crowd-sourced online reviews. By leveraging a dictionary specifically crafted for this purpose, I identify instances of moral hazard within the hygiene inspection scheme used in New York City. 
To the extent that online reviews provide some visibility into the hygiene practices of restaurants, I show how losses from information asymmetry may be partially mitigated in this context. Taken together, this dissertation contributes by revisiting and refining the use of eWOM in the service sector through a combination of machine learning and econometric methodologies.

Relevance:

30.00%

Abstract:

From a future history of 2025: Continuous development is common for build/test (continuous integration) and operations (devOps). This trend continues through the lifecycle, into what we call 'devUsage': continuous usage validation. In addition to ensuring that systems meet user needs, organisations continuously validate their legal and ethical use. The rise of end-user programming and multi-sided platforms exacerbates validation challenges. A separate trend is the specialisation of software engineering for technical domains, including data analytics. This domain has specific validation challenges: we must validate not only the accuracy of statistical models but also whether they have illegal or unethical biases. Usage needs addressed by machine learning are sometimes not specifiable in the traditional sense, and statistical models are often 'black boxes'. We describe future research to investigate solutions to these devUsage challenges for data analytics systems. We will adapt risk management and governance frameworks previously used for software product qualities, use social network communities to gather input from aligned stakeholder groups, and perform cross-validation using autonomic experimentation, cyber-physical data streams, and online discursive feedback.

Relevance:

30.00%

Abstract:

This research examines the use of social media by organisations for communication with stakeholders during a crisis and provides a theoretical framework for guiding organisations in this area.

Relevance:

30.00%

Abstract:

This thesis advances applied machine learning, sentiment analysis, and psycholinguistic analysis of social media for health analytics. In particular, it treats social media as a gigantic 'sensor' that provides information about mental health communities and related topics.

Relevance:

20.00%

Abstract:

Knowing when to compete and when to cooperate to maximize opportunities for equal access to activities and materials in groups is critical to children's social and cognitive development. The present study examined the individual (gender, social competence) and contextual (gender context) factors that may determine why some children are more successful than others. One hundred and fifty-six children (mean age = 6.5 years) were divided into 39 groups of four and videotaped while engaged in a task that required them to cooperate in order to view cartoons. In every group, the children were unfamiliar with one another. Groups varied in gender composition (all girls, all boys, or mixed-sex) and social competence (high vs. low). Interaction effects between group composition and gender were found. Girls were most successful at gaining viewing time in same-sex groups and least successful in mixed-sex groups. Conversely, boys were least successful in same-sex groups and most successful in mixed-sex groups. Similar results were found at the group level of analysis; however, the way in which the resources were distributed differed as a function of group type. Same-sex girl groups were inequitable but efficient, whereas same-sex boy groups were more equitable than mixed groups but less efficient than same-sex girl groups. Social competence did not influence children's behavior. The findings from the present study highlight the effect of gender context on cooperation and competition and the relevance of adopting an unfamiliar-peer paradigm when investigating children's social behavior.