6 results for Big Data graphs
in Digital Commons at Florida International University
Abstract:
Graph-structured databases are widely prevalent, and the problem of effective search and retrieval from such graphs has been receiving much attention recently. For example, the Web can be naturally viewed as a graph. Likewise, a relational database can be viewed as a graph where tuples are modeled as vertices connected via foreign-key relationships. Keyword search querying has emerged as one of the most effective paradigms for information discovery, especially over HTML documents in the World Wide Web. One of the key advantages of keyword search querying is its simplicity: users do not have to learn a complex query language, and can issue queries without any prior knowledge about the structure of the underlying data. The purpose of this dissertation was to develop techniques for user-friendly, high-quality, and efficient searching of graph-structured databases. Several ranked search methods on data graphs have been studied in recent years. Given a top-k keyword search query on a graph and some ranking criteria, a keyword proximity search finds the top-k answers, where each answer is a substructure of the graph that contains all query keywords and illustrates the relationships between the keywords present in the graph. We applied keyword proximity search on the web and the page graph of web documents to find top-k answers that satisfy the user’s information need and increase user satisfaction. Another effective ranking mechanism applied on data graphs is the authority-flow-based ranking mechanism. Given a top-k keyword search query on a graph, an authority-flow-based search finds the top-k answers, where each answer is a node in the graph ranked according to its relevance and importance to the query. We developed techniques that improved authority-flow-based searches on data graphs by creating a framework to explain and reformulate them, taking into consideration user preferences and feedback.
We also applied the proposed graph search techniques for Information Discovery over biological databases. Our algorithms were experimentally evaluated for performance and quality. The quality of our method was compared to current approaches by using user surveys.
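As a rough illustration of the authority-flow idea described in this abstract, the sketch below ranks nodes with a personalized-PageRank-style iteration in which authority starts at the nodes containing the query keywords (the "base set") and flows along edges. This is an assumption-laden stand-in, not the dissertation's algorithm: the function name `authority_flow_rank`, the damping factor, the iteration count, and the uniform edge weights are all hypothetical choices made for the example.

```python
def authority_flow_rank(edges, base_set, damping=0.85, iterations=50, k=3):
    """Return the top-k nodes and all scores for a keyword base set.

    edges:    iterable of (source, target) pairs of a directed data graph
    base_set: nodes that contain the query keywords (assumed given)
    """
    base = set(base_set)
    nodes = sorted({n for edge in edges for n in edge} | base)
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Random-jump vector: all restart probability sits on the base set.
    jump = {n: (1.0 / len(base) if n in base else 0.0) for n in nodes}
    score = dict(jump)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * jump[n] for n in nodes}
        for src in nodes:
            if out_links[src]:
                # Split this node's authority evenly over its out-edges.
                share = damping * score[src] / len(out_links[src])
                for dst in out_links[src]:
                    nxt[dst] += share
            else:
                # Dangling node: hand its authority back to the base set.
                for n in base:
                    nxt[n] += damping * score[src] / len(base)
        score = nxt

    ranked = sorted(nodes, key=lambda n: (-score[n], n))[:k]
    return ranked, score


if __name__ == "__main__":
    # Hypothetical four-node graph; "a" is the only node matching the query.
    graph = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
    top, scores = authority_flow_rank(graph, {"a"})
    print(top)
```

Nodes that the base set cannot reach receive no authority, which is the intuition behind ranking by "relevance and importance to the query" rather than by global importance alone.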
Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications in various areas so that people can easily access and manage multimedia data. To meet this demand, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis.
In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Abstract:
This symposium describes a multi-dimensional strategy to examine fidelity of implementation in an authentic school district context. An existing large-district peer mentoring program provides an example. The presentation will address development of a logic model to articulate a theory of change; collaborative creation of a data set aligned with essential concepts and research questions; identification of independent, dependent, and covariate variables; issues related to use of big data that include conditioning and transformation of data prior to analysis; operationalization of a strategy to capture fidelity of implementation data from all stakeholders; and ways in which fidelity indicators might be used.
Abstract:
In the analysis - Recreational Food Service Is Big Business - by Gary Horvath, President, Recreational Foodservice Division, Service America Corporation, and Mickey Warner, Associate Professor, School of Hospitality Management at Florida International University, Horvath and Warner initially state: “Recreational food service is very different from routine food service management. The authors review the market and the management planning and challenges that create that difference.” Recreational food is loosely defined by the authors as food for special events. These can be one-time events, repeated events that are not on a fixed schedule [e.g., concerts], weekly events such as football, baseball, or basketball games, or other similar venues. Concessions are a large part of these fan-based settings. “An anticipated 101,000 fans at a per capita spending of $5-6 [were expected]. A typical concessions menu of hot dogs, popcorn, soda, beer, snacks, novelty foods, candy, and tobacco products comprises this market segment,” say Horvath and Warner in reference to the Super Bowl XXI football championship game, held at the Rose Bowl stadium in Pasadena, California, on January 25, 1987. Some of the article is based upon that event. These food service efforts focus on the individual fan, but extend to the corporate-organizational level as well. The authors stress that catering is definitely a part of this equation. The monies spent and earned are phenomenal. “Special events of this type attract numerous corporate catering opportunities for companies entertaining VIP guest lists,” the authors inform. “Hospitality tents usually consist of a pregame cocktail party and buffet and a post-game celebration with musical entertainment held in lavishly decorated tents erected at the site.
In this case a total of 5,000 covers, at a price of $200 each, for 12-15 separate parties were anticipated.” Horvath and Warner also note that novelties and souvenirs make up an essential part of the recreational food service market. “Novelties and souvenirs are a primary market and source of revenue for every stadium food service operator,” say Horvath and Warner. “Per capita spending is the measurement used by the industry to evaluate sales potential per attendee at an event,” say the authors. Of course, with the solid revenue figures involved as well as the number of people anticipated for such events, planning is crucial, say Horvath and Warner. Training of staff, purchasing and supply, money and banking, facility access, and equipment are a few of the elements to be negotiated. Through both graphs and text, Horvath and Warner provide a fairly detailed outline of what a six-step event plan consists of.
Abstract:
The purpose of this study was to determine the degree to which the Big-Five personality taxonomy, as represented by the Minnesota Multiphasic Personality Inventory (MMPI), California Psychological Inventory (CPI), and Inwald Personality Inventory (IPI) scales, predicted a variety of police officer job performance criteria. Data were collected archivally for 270 sworn police officers from a large Southeastern municipality. Predictive data consisted of scores on the MMPI, CPI, and IPI scales as grouped in terms of the Big-Five factors. The overall score on the Wonderlic was included in order to assess criterion variance accounted for by cognitive ability. Additionally, a psychologist's overall rating of predicted job fit was utilized to assess the variance accounted for by a psychological interview. Criterion data consisted of supervisory ratings of overall job performance, State Examination scores, police academy grades, and termination. Based on the literature, it was hypothesized that officers who are higher on Extroversion, Conscientiousness, Agreeableness, Openness to Experience, and lower on Neuroticism, otherwise known as the Big-Five factors, would outperform their peers across a variety of job performance criteria. Additionally, it was hypothesized that police officers who are higher in cognitive ability and masculinity, and lower in mania would also outperform their counterparts. Results indicated that many of the Big-Five factors, namely, Neuroticism, Conscientiousness, Agreeableness, and Openness to Experience, were predictive of several of the job performance criteria. Such findings imply that the Big-Five is a useful predictor of police officer job performance. Study limitations and implications for future research are discussed.
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become increasingly popular. Spatial data indexing, search, analysis, visualization, and the resource management of such services are becoming increasingly important for delivering the Quality of Service that users expect. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-source Online Indexing and Querying System for Big Geospatial Data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user’s data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze the spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that can efficiently support many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology to customize map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using the resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly comprises techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict workload demands (18.91% more accurate) and efficiently allocate resources to meet the QoS target, improving QoS by 26.19% and reducing resource usage by 20.83% compared to traditional peak-load-based resource allocation.
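The Top-k Spatial Boolean Queries mentioned in this abstract can be pictured with a small sketch: among objects whose keyword sets satisfy a Boolean condition, return the k closest to a query point. The code below is only a brute-force illustration under stated assumptions and does not reflect how sksOpen is implemented; the abstract describes an indexing engine, whereas this example scans every object. The function name `top_k_spatial_boolean`, the tuple layout of the objects, the AND/NOT form of the predicate, and the use of planar Euclidean distance are all hypothetical.

```python
import heapq
import math


def top_k_spatial_boolean(objects, query_point, required, excluded=(), k=3):
    """Return names of the k nearest objects that pass the keyword filter.

    objects:     iterable of (name, (x, y), keywords) tuples (assumed layout)
    query_point: (x, y) location of the query
    required:    keywords that must all be present (Boolean AND)
    excluded:    keywords that must all be absent (Boolean NOT)
    """
    required, excluded = set(required), set(excluded)
    qx, qy = query_point

    # Linear scan: keep (distance, name) for objects passing the predicate.
    matches = (
        (math.hypot(x - qx, y - qy), name)
        for name, (x, y), keywords in objects
        if required <= set(keywords) and not (excluded & set(keywords))
    )
    # nsmallest orders by distance, then by name when distances tie.
    return [name for _, name in heapq.nsmallest(k, matches)]


if __name__ == "__main__":
    # Hypothetical points of interest on a flat plane.
    places = [
        ("cafe_a", (0.0, 1.0), {"cafe", "wifi"}),
        ("cafe_b", (0.0, 2.0), {"cafe"}),
        ("cafe_c", (3.0, 4.0), {"cafe", "wifi", "smoking"}),
        ("bar_d", (0.0, 0.5), {"bar", "wifi"}),
    ]
    print(top_k_spatial_boolean(places, (0.0, 0.0), {"cafe", "wifi"}, {"smoking"}))
```

A real geospatial engine would replace the linear scan with a spatial-keyword index and planar distance with a geodesic measure; the sketch is meant only to make the query semantics concrete.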