3 results for Relational data

in DRUM (Digital Repository at the University of Maryland)


Relevance:

30.00%

Publisher:

Abstract:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully.
I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud.
The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models restrict the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them into distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
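
To make the neighborhood-centric idea in the abstract above concrete, here is a minimal single-machine sketch in Python. It is not NSCALE's API or implementation. The function names (`k_hop_neighborhood`, `induced_subgraph`, `run_on_neighborhoods`) and the toy graph are hypothetical, chosen only to illustrate a user program that receives a whole k-hop subgraph rather than the state of one vertex.

```python
from collections import deque


def k_hop_neighborhood(adj, center, k):
    """Return the set of vertices within k hops of `center`, found by BFS."""
    seen = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def induced_subgraph(adj, vertices):
    """Keep only the edges whose two endpoints both lie in `vertices`."""
    return {v: {u for u in adj.get(v, ()) if u in vertices} for v in vertices}


def count_edges(subgraph):
    """Example user program: number of undirected edges in a subgraph."""
    return sum(len(nbrs) for nbrs in subgraph.values()) // 2


def run_on_neighborhoods(adj, centers, k, user_fn):
    """Apply `user_fn` to the k-hop subgraph around each chosen center."""
    return {
        c: user_fn(induced_subgraph(adj, k_hop_neighborhood(adj, c, k)))
        for c in centers
    }


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Edge count of the 1-hop ego network around "a" and around "d".
result = run_on_neighborhoods(adj, ["a", "d"], 1, count_edges)
print(result)  # {'a': 3, 'd': 2}
```

The user program (`count_edges`) sees the full induced subgraph, which is the kind of access the abstract says vertex-centric frameworks do not offer directly. A distributed system would additionally have to decide where each extracted subgraph is placed in memory, which this sketch does not attempt.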

Relevance:

30.00%

Publisher:

Abstract:

Relational reasoning, or the ability to identify meaningful patterns within any stream of information, is a fundamental cognitive ability associated with academic success across a variety of domains of learning and levels of schooling. However, the measurement of this construct has been historically problematic. For example, while the construct is typically described as multidimensional—including the identification of multiple types of higher-order patterns—it is most often measured in terms of a single type of pattern: analogy. For that reason, the Test of Relational Reasoning (TORR) was conceived and developed to include three other types of patterns that appear to be meaningful in the educational context: anomaly, antinomy, and antithesis. Moreover, as a way to focus on fluid relational reasoning ability, the TORR was developed to include, except for the directions, entirely visuo-spatial stimuli, which were designed to be as novel as possible for the participant. By focusing on fluid intellectual processing, the TORR was also developed to be fairly administered to undergraduate students—regardless of the particular gender, language, and ethnic groups they belong to. However, although some psychometric investigations of the TORR have been conducted, its actual fairness across those demographic groups has yet to be empirically demonstrated. Therefore, a systematic investigation of differential-item-functioning (DIF) across demographic groups on TORR items was conducted. A large (N = 1,379) sample, representative of the University of Maryland on key demographic variables, was collected, and the resulting data was analyzed using a multi-group, multidimensional item-response theory model comparison procedure. Using this procedure, no significant DIF was found on any of the TORR items across any of the demographic groups of interest. 
This null finding is interpreted as evidence of the cultural-fairness of the TORR, and potential test-development choices that may have contributed to that cultural-fairness are discussed. For example, the choice to make the TORR an untimed measure, to use novel stimuli, and to avoid stereotype threat in test administration, may have contributed to its cultural-fairness. Future steps for psychometric research on the TORR, and substantive research utilizing the TORR, are also presented and discussed.

Relevance:

30.00%

Publisher:

Abstract:

Relation-inferred self-efficacy (RISE), a relatively new concept, is defined as a target individual’s beliefs about how an observer, often a relationship partner, perceives the target’s ability to perform certain actions successfully. Along with self-efficacy (i.e., one’s beliefs about his or her own ability) and other-efficacy (i.e., one’s beliefs about his or her partner’s ability), RISE makes up a three-part system of interrelated efficacy beliefs known as the relational efficacy model (Lent & Lopez, 2002). Previous research has shown this model to be helpful in understanding how relational dyads, including coach-athlete, advisor-advisee, and romantic partners, contribute to the development of self-efficacy beliefs. The clinical supervision dyad (i.e., supervisor-supervisee) is another context in which relational efficacy beliefs may play an important role. This study investigated the relationship between counseling self-efficacy, RISE, and other-efficacy within the context of clinical supervision. Specifically, it examined whether supervisees’ perceptions of how their supervisor sees their counseling ability (RISE) were related to how supervisees see their own counseling ability (counseling self-efficacy), and what moderates this relationship. The study also sought to discover the degree to which RISE mediated the relationship between supervisor working alliance and counseling self-efficacy. Data were collected from 240 graduate students who were currently enrolled in counseling-related fields, working with at least one client, and receiving regular supervision. Results demonstrated that years of experience and RISE predicted counseling self-efficacy and that the relationship between RISE and counseling self-efficacy was, as expected, moderated by other-efficacy. Contrary to expectations, however, counseling experience and level of client difficulty did not moderate the relationship between RISE and counseling self-efficacy.
These findings suggest that the relationship between RISE and counseling self-efficacy was stronger when supervisees saw their supervisors as capable therapists. Furthermore, RISE was found to fully mediate the relationship between supervisor working alliance and counseling self-efficacy. Future research directions and implications for training and supervision are discussed.