14 results for Impala, Hadoop, Big Data, HDFS, Social Business Intelligence, SBI, cloudera
in Digital Commons at Florida International University
Abstract:
With advances in science and technology, computing and business intelligence (BI) systems are steadily becoming more complex with an increasing variety of heterogeneous software and hardware components. They are thus becoming progressively more difficult to monitor, manage and maintain. Traditional approaches to system management have largely relied on domain experts through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This process is widely acknowledged to be cumbersome, labor intensive, and error prone, and it struggles to keep up with rapidly changing environments. In addition, many traditional business systems deliver primarily pre-defined historic metrics for long-term strategic or mid-term tactical analysis, and lack the flexibility to support evolving metrics or data collection for real-time operational analysis. There is thus a pressing need for automatic and efficient approaches to monitor and manage complex computing and BI systems. To realize the goal of autonomic management and enable self-management capabilities, we propose to mine historical log data generated by computing and BI systems, and automatically extract actionable patterns from this data. This dissertation focuses on the development of different data mining techniques to extract actionable patterns from various types of log data in computing and BI systems. Four key problems are studied: log data categorization and event summarization, leading indicator identification, pattern prioritization by exploring the link structures, and a tensor model for three-way log data. Case studies and comprehensive experiments on real application scenarios and datasets are conducted to show the effectiveness of our proposed approaches.
Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications that let people easily access and manage multimedia data in various areas. In response to this demand, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. 
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Abstract:
This dissertation addresses how the cultural dimensions of individualism and collectivism affect the attributions people make for unethical behavior at work. The moderating effect of ethnicity is also examined by considering two culturally diverse groups: Hispanics and Anglos. The sample for this study is a group of business graduate students from two universities in the Southeast. A 20-minute survey was distributed to master's degree students in their classrooms and later returned to the researcher. Individualism and collectivism were operationalized by a set of attitude items, while unethical work behavior was introduced in the form of hypothetical descriptions or scenarios. Data analysis employed multiple group confirmatory factor analysis for both independent and dependent variables, and subsequently multiple group LISREL models, in order to test predictions. Results confirmed the expected link between cultural variables and attribution responses, although the role of independent variables shifted due to the moderating effect of ethnicity and to the nuances of each particular situation.
Abstract:
This study explored individual difference factors to help explain the discrepancy that has been found to exist between self and other ratings in prior research. Particularly, personality characteristics of the self-rater were researched in the current study as a potential antecedent for self-other rating agreement. Self, peer, and supervisor ratings were provided for global performance as well as five competencies specific to the organization being examined. Four rating tendency categories, over-raters, under-raters, in-agreement (good), and in-agreement (poor), established in research by Atwater and Yammarino were used as the basis of the current research. The sample for rating comparisons within the current study consisted of 283 self and supervisor dyads and 275 for self and peer dyads from a large financial organization. Measures included a custom multi-rater performance instrument and the personality survey instrument, ASSESS, which measures 20 specific personality characteristics. MANCOVAs were then performed on this data to examine if specific personality characteristics significantly distinguished the four rating tendency groups. Examination of all personality dimensions and overall performance uncovered significant findings among rating groups for self-supervisor rating comparisons but not for self-peer rating comparisons. Examination of specific personality dimensions for self-supervisory ratings group comparisons and overall performance showed Detail Interest to be an important characteristic among the hypothesized variables. For self-supervisor rating comparisons and specific competencies, support was found for the hypothesized personality dimension of Fact-based Thinking which distinguished the four rating groups for the competency, Builds Relationships. 
For both self-supervisor and self-peer rating comparisons, the competencies Builds Relationships and Leads in a Learning Environment were found to have significant relationships with several personality characteristics; however, these relationships were not consistent with the hypotheses in the current study. Several unhypothesized personality dimensions were also found to distinguish rating groups for both self-supervisor and self-peer comparisons on overall performance and various competencies. Results of the current study hold implications for the training and development sessions that occur after a 360-degree evaluation process. In particular, it is suggested that feedback sessions may be designed according to particular rating tendencies to maximize the interpretation, acceptance and use of evaluation information.
Abstract:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in the last several decades has mainly focused on traditional types of text such as emails, news and academic literature, and several critical issues for text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating set based summarization framework and a learning-to-rank based summarization framework. The dominating set based summarization framework can be applied to different types of summarization problems, while the learning-to-rank based summarization framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
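The dominating-set idea mentioned in this abstract can be illustrated with a rough sketch. It assumes a greedy approximation over a sentence-similarity graph, where an edge means two sentences are similar enough that one can stand in for the other. The graph, the sentence ids, and the function name `greedy_dominating_set` are illustrative assumptions and are not taken from the dissertation, whose actual algorithm may differ.

```python
def greedy_dominating_set(graph):
    """Greedy approximation of a minimum dominating set.

    graph: dict mapping each node to the set of its neighbors (undirected).
    Returns a list of nodes such that every node is either chosen
    or adjacent to a chosen node.
    """
    uncovered = set(graph)
    chosen = []
    while uncovered:
        # Pick the node that covers the most still-uncovered nodes
        # (itself plus its neighbors); ties go to the smallest id.
        best = max(sorted(graph),
                   key=lambda n: len(({n} | graph[n]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | graph[best]
    return chosen


# Hypothetical similarity graph over five sentences (assumption for illustration).
sentences = {
    "s1": {"s2", "s3"},
    "s2": {"s1"},
    "s3": {"s1"},
    "s4": {"s5"},
    "s5": {"s4"},
}
summary = greedy_dominating_set(sentences)
print(summary)
```

The selected sentences form the summary: each remaining sentence is similar to at least one selected sentence, so little distinct information is left out.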
Abstract:
During the past decade, there has been a dramatic increase by postsecondary institutions in providing academic programs and course offerings in a multitude of formats and venues (Biemiller, 2009; Kucsera & Zimmaro, 2010; Lang, 2009; Mangan, 2008). Strategies pertaining to reapportionment of course-delivery seat time have been a major facet of these institutional initiatives; most notably, within many open-door 2-year colleges. Often, these enrollment-management decisions are driven by the desire to increase market-share, optimize the usage of finite facility capacity, and contain costs, especially during these economically turbulent times. So, while enrollments have surged to the point where nearly one in three 18-to-24 year-old U.S. undergraduates are community college students (Pew Research Center, 2009), graduation rates, on average, still remain distressingly low (Complete College America, 2011). Among the learning-theory constructs related to seat-time reapportionment efforts is the cognitive phenomenon commonly referred to as the spacing effect, the degree to which learning is enhanced by a series of shorter, separated sessions as opposed to fewer, more massed episodes. This ex post facto study explored whether seat time in a postsecondary developmental-level algebra course is significantly related to: course success; course-enrollment persistence; and, longitudinally, the time to successfully complete a general-education-level mathematics course. Hierarchical logistic regression and discrete-time survival analysis were used to perform a multi-level, multivariable analysis of a student cohort (N = 3,284) enrolled at a large, multi-campus, urban community college. The subjects were retrospectively tracked over a 2-year longitudinal period. The study found that students in long seat-time classes tended to withdraw earlier and more often than did their peers in short seat-time classes (p < .05). 
Additionally, a model comprising nine statistically significant covariates (all with p-values less than .01) was constructed. However, no longitudinal seat-time group differences were detected, nor was there sufficient statistical evidence to conclude that seat time was predictive of developmental-level course success. A principal aim of this study was to demonstrate to educational leaders, researchers, and institutional-research/business-intelligence professionals the advantages and computational practicability of survival analysis, an underused but more powerful way to investigate changes in students over time.
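For readers unfamiliar with discrete-time survival analysis, the following minimal sketch shows the basic life-table computation it builds on: the hazard in each period is the share of students still at risk who experience the event in that period, and the survival probability is the running product of one minus the hazard. The function name `life_table` and the six-student data set are hypothetical and do not come from the study, which fitted regression models with covariates rather than a plain life table.

```python
def life_table(durations, observed, n_periods):
    """Discrete-time hazard and survival estimates.

    durations[i]: period (1-based) in which student i had the event
                  or was last observed.
    observed[i]:  True if the event occurred in that period,
                  False if the student was censored.
    """
    hazards, survival = [], []
    s = 1.0
    for t in range(1, n_periods + 1):
        # Students still under observation at the start of period t.
        at_risk = sum(1 for d in durations if d >= t)
        # Students who experienced the event during period t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        h = events / at_risk if at_risk else 0.0
        s *= 1.0 - h
        hazards.append(h)
        survival.append(s)
    return hazards, survival


# Hypothetical cohort of six students followed for three terms.
durations = [1, 2, 2, 3, 3, 3]
observed = [True, True, False, True, False, False]
hazards, survival = life_table(durations, observed, n_periods=3)
print(hazards, survival)
```

Censored students count toward the risk set up to the period in which they were last seen, which is what lets this approach use incomplete records that a simple completion rate would have to discard.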
Abstract:
A model was tested to examine relationships among leadership behaviors, team diversity, and team process measures with team performance and satisfaction at both the team and leader-member levels of analysis. Relationships between leadership behavior and team demographic and cognitive diversity were hypothesized to have both direct effects on organizational outcomes as well as indirect effects through team processes. Leader-member differences were investigated to determine the effects of leader-member diversity on leader-member exchange quality, individual effectiveness and satisfaction. Leadership had little direct effect on team performance, but several strong positive indirect effects through team processes. Demographic Diversity had no impact on team processes, directly impacted only one performance measure, and moderated the leadership to team process relationship. Cognitive Diversity had a number of direct and indirect effects on team performance, the net effects uniformly positive, and did not moderate the leadership to team process relationship. In sum, the team model suggests a complex combination of leadership behaviors positively impacting team processes, demographic diversity having little impact on team process or performance, cognitive diversity having a positive net impact, and team processes having mixed effects on team outcomes. At the leader-member level, leadership behaviors were a strong predictor of Leader-Member Exchange (LMX) quality. Leader-member demographic and cognitive dissimilarity were each predictors of LMX quality, but failed to moderate the leader behavior to LMX quality relationship. LMX quality was strongly and positively related to self-reported effectiveness and satisfaction. The study makes several contributions to the literature. First, it explicitly links leadership and team diversity. Second, demographic and cognitive diversity are conceptualized as distinct and multi-faceted constructs. 
Third, a methodology for creating an index of categorical demographic and interval cognitive measures is provided so that diversity can be measured in a holistic conjoint fashion. Fourth, the study simultaneously investigates the impact of diversity at the team and leader-member levels of analysis. Fifth, insights into the moderating impact of different forms of team diversity on the leadership to team process relationship are provided. Sixth, this study incorporates a wide range of objective and independent measures to provide a 360-degree assessment of team performance.
Abstract:
In this dissertation, I examine both theoretically and empirically the relationship between stock prices and income distribution using an endogenous growth model with social status impatience. The theoretical part looks into how status impatience and current economic status jointly determine time preference, savings, future economic status, stock prices, growth and wealth distribution in the steady state. This work builds on Burgstaller and Karayalcin (1996). More specifically, I look at (i) the effects of the distribution of status impatience levels on the distribution of steady state assets, incomes and consumption and (ii) the effects of changes in relative levels of status impatience on stock prices. From (i) and (ii), I derive the correlation between stock prices, incomes and asset distribution. Also, the analysis of the stock market is undertaken in the presence of adjustment costs to investments. The empirical chapter looks at (i) the correlation between income inequality and long run economic growth on the one hand and (ii) the correlation between stock market prices and income inequality on the other. The role of stock prices and social status is examined to better understand the forces that enable a country to grow over time and to determine why output per capita varies across countries. The data are from Summers and Heston (1988), Barro and Wolf (1989), Alesina and Rodrik (1994), Global Financial Database (1997) and the World Bank. Data for social status are collected through a primary sample survey on the internet. Twenty-five developed and developing countries are included in the sample. The model developed in this study was specified as a system of simultaneous equations, in which per capita growth rate and income inequality were endogenous variables. Additionally, stock price index and social status measures were also incorporated. The results indicate that income inequality is inversely related to economic growth. 
In addition, an increase in income inequality arising from higher stock prices constrains growth. Moreover, where social status is determined by income levels, it influences long run growth. These results therefore support the findings of Persson and Tabellini (1994) and Alesina and Rodrik (1994).
Abstract:
China's emergence as an economic powerhouse has often been portrayed as threatening to America's economic strength and to its very identity as "the global hegemon." The media's alarmist response to an economic competitor is familiar to those who remember US-Japanese relations in the 1980s. In order to better understand the basis of American threat perception, this study explores the independent and interactive impact of three variables (perceptions of the Other's capabilities, perceptions of the Other as a threat versus as an opportunity, and perceptions of the Other's political culture) on attitudes toward two different economic competitors (Japan 1977-1995 and China 1985-2011). Utilizing four methods (historical process tracing, public polling data analysis, social scientific experimentation, and content analysis), this study demonstrates that increases in the Other's economic capabilities have a much smaller impact on attitudes than is commonly believed. It further shows that while perceptions of threat/opportunity played a significant role in shaping attitudinal response toward Japan, perceptions of political culture are the most important factor driving attitudes toward China today. This study contributes to a better understanding of how states react to threats and construct negative images of their economic rivals. It also helps to explain the current Sino-American relationship and enables better predictions as to its potential future course. Finally, these findings contribute to cultural explanations of the democratic peace phenomenon and provide a boundary condition (political culture) for the liberal proposition that opportunity ameliorates conflict in the economic realm.
Abstract:
This symposium describes a multi-dimensional strategy to examine fidelity of implementation in an authentic school district context. An existing large-district peer mentoring program provides an example. The presentation will address development of a logic model to articulate a theory of change; collaborative creation of a data set aligned with essential concepts and research questions; identification of independent, dependent, and covariate variables; issues related to use of big data that include conditioning and transformation of data prior to analysis; operationalization of a strategy to capture fidelity of implementation data from all stakeholders; and ways in which fidelity indicators might be used.
Abstract:
Purpose: Most individuals do not perceive a need for substance use treatment despite meeting diagnostic criteria for substance use disorders, and they are least likely to pursue treatment voluntarily. There are also those who perceive a need for treatment and yet do not pursue it. This study aimed to understand which factors increase the likelihood of perceiving a need for treatment among individuals who meet diagnostic criteria for substance use disorders, in the hope of supporting more targeted efforts for gender-specific treatment recruitment and retention. Using Andersen and Newman's (1973/2005) model of individual determinants of healthcare utilization, the central hypothesis of the study was that gender moderates the relationship between substance use problem severity and perceived treatment need, so that women with increasing problems due to their use of substances are more likely than men to perceive a need for treatment. Additional predisposing and enabling factors from Andersen and Newman's (1973/2005) model were included in the study to understand their impact on perceived need. Method: The study was a secondary data analysis of the 2010 National Survey on Drug Use and Health (NSDUH) using logistic regression. The weighted sample consisted of a total of 20,077,235 American household residents (the unweighted sample was 5,484 participants). Results of the logistic regression were verified using Relogit software for rare events logistic regression because perceived treatment need is a rare event (King & Zeng, 2001a; 2001b). Results: The moderating effect of female gender was not found. Conversely, men were significantly more likely than women to perceive a need for treatment as substance use problem severity increased. 
The study also found that a number of factors, such as race, ethnicity, socioeconomic status, age, marital status, education, co-occurring mental health disorders, and prior treatment history, affected the likelihood of perceiving a need for treatment differently for men and women. Conclusion: Perceived treatment need among individuals who meet criteria for substance use disorders is rare, but identifying factors associated with an increased likelihood of perceiving a need for treatment can help the development of gender-appropriate outreach and recruitment for social work treatment, and of public health messages.