987 results for data replication


Relevance:

30.00%

Publisher:

Abstract:

The key to the correct application of ANOVA is careful experimental design and matching the correct analysis to that design. The following points should therefore be considered before designing any experiment:

1. In a single-factor design, ensure that the factor is identified as a 'fixed' or 'random' effect factor.
2. In more complex designs with more than one factor, there may be a mixture of fixed and random effect factors present, so ensure that each factor is clearly identified.
3. Where replicates can be grouped or blocked, the advantages of a randomised blocks design should be considered. There should be evidence, however, that blocking can sufficiently reduce the error variation to counter the loss of DF compared with a fully randomised design.
4. Where different treatments are applied sequentially to a patient, the advantages of a three-way design in which the different orders of the treatments are included as an 'effect' should be considered.
5. Combining different factors to make a more efficient experiment and to measure possible factor interactions should always be considered.
6. The effect of 'internal replication' should be taken into account in a factorial design when deciding the number of replications to be used. Where possible, each error term of the ANOVA should have at least 15 DF.
7. Consider carefully whether a particular factorial design can be considered a split-plot or a repeated-measures design. If such a design is appropriate, consider how to continue the analysis, bearing in mind the problem of using post hoc tests in this situation.
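As a brief illustration of points 1 and 3 above (not taken from the paper), the following is a minimal sketch of a single fixed-effect factor analysed in a randomised blocks design using Python's statsmodels; the data frame and its values are invented.

```python
# Minimal sketch of a randomised blocks ANOVA: one fixed treatment factor,
# with blocks included to absorb error variation (illustrative data only).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "response":  [12.1, 13.4, 11.8, 14.0, 12.9, 13.7, 11.5, 13.0, 12.2],
    "treatment": ["A", "B", "C"] * 3,
    "block":     ["b1"] * 3 + ["b2"] * 3 + ["b3"] * 3,
})

# Blocks enter as an additional term, so their variation is removed from the
# error term at the cost of block degrees of freedom (point 3 above).
model = ols("response ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```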

Relevance:

30.00%

Publisher:

Abstract:

This study re-examines the afterimage paradigm, which claims to show that a minority produces conversion in a task involving afterimage judgements (more private influence than public influence), as opposed to the mere compliance produced by a majority. Subsequent failures to replicate this finding have suggested that the changes in the afterimages could be attributed to increased attention due to an ambiguous stimulus coupled with subject suspiciousness. This study attempted to replicate the original experiment but with an unambiguous stimulus in order to remove potential biases. The results showed shifts in afterimages consistent with the increased-attention hypothesis for both a minority and a majority, and these were unaffected by the level of suspiciousness reported by the subjects. Additional data show that no shifts were found in a no-influence control condition, indicating that the shifts were related to exposure to a deviant source and not to response repetition.

Relevance:

30.00%

Publisher:

Abstract:

Since the introduction of the Net Promoter concept there has been a lively and ongoing debate among academics and practitioners about the performance of the Net Promoter Score (NPS), in comparison to other customer metrics such as customer satisfaction, in predicting company growth rates. We report results from a study using data from customers and firms in the Netherlands on the relationship of different satisfaction and loyalty metrics, as well as the NPS, with sales revenue growth, gross margins and net operating cash flows. We find that all metrics perform equally well in predicting current gross margins and current sales revenue growth, and equally poorly in predicting future sales growth and gross margins as well as current and future net cash flows. The NPS is neither superior nor inferior to other metrics. Taken together, our study suggests that the predictive capability of customer metrics, such as the NPS, for future company growth rates is limited. © 2013 Elsevier B.V.
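As a hedged sketch (not the authors' analysis or data), a comparison of this kind boils down to regressing a growth measure on each customer metric and comparing in-sample and cross-validated fit; everything below is synthetic and illustrative.

```python
# Toy comparison of customer metrics as predictors of a growth measure.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
metrics = {
    "nps":          rng.normal(20, 15, n),    # invented Net Promoter Scores
    "satisfaction": rng.normal(7.5, 1.0, n),  # invented satisfaction ratings
}
growth = 0.02 * metrics["nps"] + rng.normal(0, 1.0, n)  # synthetic sales growth

for name, x in metrics.items():
    X = x.reshape(-1, 1)
    r2_fit = LinearRegression().fit(X, growth).score(X, growth)
    r2_cv = cross_val_score(LinearRegression(), X, growth, cv=5, scoring="r2").mean()
    print(f"{name:>12}: in-sample R2={r2_fit:.2f}, cross-validated R2={r2_cv:.2f}")
```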

Relevance:

30.00%

Publisher:

Abstract:

Consideration of the influence of test technique and data analysis method is important for data comparison and design purposes. The paper highlights the effects of replication interval, crack growth rate averaging and curve-fitting procedures on crack growth rate results for a Ni-base alloy. It is shown that an upper-bound crack growth rate line is not appropriate for use in fatigue design, and that the derivative of a quadratic fit to the crack length versus cycles (a vs. N) data looks promising. However, this type of averaging, or curve fitting, is not useful in developing an understanding of microstructure/crack tip interactions. For this purpose, simple replica-to-replica growth rate calculations are preferable. © 1988.
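As a hedged illustration of the two rate calculations mentioned above (the numbers are invented, not the paper's data), the derivative-of-a-quadratic-fit estimate and the simple replica-to-replica estimate can be computed as follows:

```python
# Two crack growth rate (da/dN) estimates from replication data:
# (1) derivative of a quadratic fit to a vs N, (2) replica-to-replica secants.
import numpy as np

N = np.array([0, 10_000, 20_000, 30_000, 40_000, 50_000], dtype=float)  # cycles
a = np.array([0.10, 0.13, 0.17, 0.23, 0.31, 0.42])                      # crack length, mm

# (1) Smoothed rate: fit a = c2*N**2 + c1*N + c0, then da/dN = 2*c2*N + c1
c2, c1, c0 = np.polyfit(N, a, 2)
dadN_fit = 2 * c2 * N + c1

# (2) Point-to-point rate between successive replicas
dadN_replica = np.diff(a) / np.diff(N)
N_mid = (N[:-1] + N[1:]) / 2

print("quadratic-fit da/dN:      ", dadN_fit)
print("replica-to-replica da/dN: ", dadN_replica, "at mid-cycles", N_mid)
```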

Relevance:

30.00%

Publisher:

Abstract:

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls into within the bigness taxonomy. Large p, small n data sets, for instance, require a different set of tools from the large n, small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. Indeed, it is important to emphasize right away that the so-called no-free-lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
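As a minimal, hedged sketch of the point that the bigness category drives the choice of tools (a toy rule, not a recommendation from the paper): regularized regression for large p, small n data, and a scalable stochastic learner for large n, small p, using scikit-learn with synthetic data.

```python
# Toy rule: pick a tool by the (n, p) "bigness" category of the data set.
import numpy as np
from sklearn.linear_model import LassoCV, SGDRegressor

def pick_model(n, p):
    if p > n:
        # More features than samples: penalization/regularization (here, lasso)
        return LassoCV(cv=5)
    # Many samples, few features: a scalable stochastic-gradient learner
    return SGDRegressor(max_iter=1000, tol=1e-3)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 500))                                  # large p, small n
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = pick_model(*X.shape)
model.fit(X, y)
print(type(model).__name__, "non-zero coefficients:", int(np.sum(model.coef_ != 0)))
```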

Relevance:

30.00%

Publisher:

Abstract:

In New Zealand and Australia, the BRACElet project has been investigating students' acquisition of programming skills in introductory programming courses. The project has explored students' skills in basic syntax, tracing code, understanding code, and writing code, seeking to establish the relationships between these skills. This ITiCSE working group report presents the most recent step in the BRACElet project, which includes replication of earlier analysis using a far broader pool of naturally occurring data, refinement of the SOLO taxonomy in code-explaining questions, extension of the taxonomy to code-writing questions, extension of some earlier studies on students' 'doodling' while answering exam questions, and exploration of a further theoretical basis for work that until now has been primarily empirical.

Relevance:

30.00%

Publisher:

Abstract:

Currently the data storage industry is facing huge challenges with respect to the conventional method of recording data known as longitudinal magnetic recording. This technology is fast approaching a fundamental physical limit, known as the superparamagnetic limit. A unique way of deferring the superparamagnetic limit is the patterning of magnetic media. This method exploits the use of lithography tools to predetermine the areal density. The nanofabrication schemes employed to pattern the magnetic material include Focused Ion Beam (FIB), E-beam Lithography (EBL), UV-Optical Lithography (UVL), Self-assembled Media Synthesis and Nanoimprint Lithography (NIL). Although there are many challenges to manufacturing patterned media, the large potential gains offered in terms of areal density make it one of the most promising new technologies on the horizon for future hard disk drives. Thus, this dissertation contributes to the development of future alternative data storage devices, and to deferring the superparamagnetic limit, by designing and characterizing patterned magnetic media using a novel nanoimprint replication process called "Step and Flash Imprint Lithography" (SFIL). As opposed to hot embossing and other processes that require elevated temperature or pressure, SFIL can be performed at low pressure and room temperature. The initial experiments consisted of process-flow design for the patterned structures on sputtered Ni-Fe thin films, the main one being a defectivity analysis of the SFIL process, conducted by fabricating devices of varying feature sizes (50 nm to 1 μm), inspecting them optically, and testing them electrically. Once the SFIL process was optimized, a number of Ni-Fe coated wafers were imprinted with a template bearing the patterned topography. A minimum feature size of 40 nm was obtained with varying pitch (1:1, 1:1.5, 1:2, and 1:3). The characterization steps involved extensive SEM study at each processing step as well as Atomic Force Microscopy (AFM) and Magnetic Force Microscopy (MFM) analysis.

Relevance:

30.00%

Publisher:

Abstract:

The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have an extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rate, and short generation time. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods that analyze serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from which viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another using a process called recombination. A genealogy involving recombination is best described by a network structure. A more general approach was implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more thorough evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data and it was shown that the results represented within an evolutionary network structure can be interpreted in biologically meaningful ways.
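The following is a toy sketch, not the Sliding MinPD tool or its algorithm: it only illustrates the basic idea of respecting sampling times by linking each sequence to its minimum-distance relative among earlier time points; the sequences are invented.

```python
# Toy ancestor linking for serially-sampled sequences: each sequence is linked
# to its closest (Hamming distance) relative from an earlier sampling time.
samples = {  # (time point, name) -> aligned sequence (all invented)
    ("t1", "s1"): "ACGTACGTAC",
    ("t1", "s2"): "ACGTACGTTC",
    ("t2", "s3"): "ACGAACGTTC",
    ("t2", "s4"): "ACGTACCTAC",
}

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

links = {}
for (tp, name), seq in samples.items():
    earlier = [(key, s) for key, s in samples.items() if key[0] < tp]
    if earlier:
        (anc_time, anc_name), _ = min(earlier, key=lambda kv: hamming(kv[1], seq))
        links[name] = anc_name

print(links)  # e.g. {'s3': 's2', 's4': 's1'}
```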

Relevance:

30.00%

Publisher:

Abstract:

Acknowledgements We thank Brian Roberts and Mike Harris for responding to our questions regarding their paper; Zoltan Dienes for advice on Bayes factors; Denise Fischer, Melanie Römer, Ioana Stanciu, Aleksandra Romanczuk, Stefano Uccelli, Nuria Martos Sánchez, and Rosa María Beño Ruiz de la Sierra for help collecting data; Eva Viviani for managing data collection in Parma. We thank Maurizio Gentilucci for letting us use his lab, and the Centro Intradipartimentale Mente e Cervello (CIMeC), University of Trento, and especially Francesco Pavani for lending us his motion tracking equipment. We thank Rachel Foster for proofreading. KKK was supported by a Ph.D. scholarship as part of a grant to VHF within the International Graduate Research Training Group on Cross-Modal Interaction in Natural and Artificial Cognitive Systems (CINACS; DFG IKG-1247) and TS by a grant (DFG – SCHE 735/3-1), both from the German Research Council.

Relevance:

30.00%

Publisher:

Abstract:

There is a rich history of social science research centering on racial inequalities that continue to be observed across various markets (e.g., labor, housing, and credit markets) and social milieus. Existing research on racial discrimination in consumer markets, however, is relatively scarce, and that which has been done has disproportionately focused on consumers as the victims of race-based mistreatment. As such, we know relatively little about how consumers contribute to inequalities in their roles as perpetrators of racial discrimination. In response, in this paper we elaborate on a line of research that is still in its infancy and yet is ripe with opportunities to advance the literature on consumer racial discrimination and racial earnings inequities among tip-dependent employees in the United States. Specifically, we analyze data derived from a large exit survey of restaurant consumers (n=378) in an attempt to replicate, extend, and further explore the recently documented effect of service providers' race on restaurant consumers' tipping decisions. Our results indicate that both White and Black restaurant customers discriminate against Black servers by tipping them less than their White coworkers. Importantly, we find no evidence that this Black tip penalty is the result of interracial differences in the service skills possessed by Black and White servers. We conclude by delineating directions for future research in this neglected but salient area of study.

Relevance:

30.00%

Publisher:

Abstract:

In today's big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (the Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due to the several advantages provided by the Cloud, such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments.

In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system that utilizes workload-aware data placement and replication to minimize the number of distributed transactions, and that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data and during query execution at runtime.

In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples, providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources, thereby enabling a substantial reduction in the cost incurred during such analytics.

Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading them onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over large-scale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques, which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
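As a hedged aside on the sampling-based progressive analytics described above (this is not the NOW! system, just the underlying idea): an aggregate is estimated over nested samples of increasing size, so early, approximate results arrive long before the full scan; the data below are synthetic.

```python
# Progressive estimation of a mean over nested samples of increasing size.
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=3.0, size=1_000_000)  # stand-in for a large column

shuffled = rng.permutation(data)
for frac in (0.001, 0.01, 0.1, 1.0):
    k = int(len(shuffled) * frac)
    sample = shuffled[:k]                                  # progressive (nested) sample
    est = sample.mean()
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(k)    # rough 95% error bound
    print(f"sample={frac:>6.1%}  estimate={est:.3f} ± {half_width:.3f}")
```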

Relevance:

30.00%

Publisher:

Abstract:

Since 2005, harmonized catch assessment surveys (CASs) have been implemented on Lake Victoria in the three riparian countries, Uganda, Kenya, and Tanzania, to monitor the commercial fish stocks and provide advice for their management. The regionally harmonized standard operating procedures (SOPs) for CASs have not been wholly followed due to logistical difficulties, yet the new approaches adopted have not been documented. This study investigated the alternative approaches used to estimate fish catches on the lake, with the aim of determining the most reliable one for providing management advice, and also the effect of the current sampling routine on the precision of the catch estimates provided. The study found the currently used lake-wide approach less reliable and more biased in providing catch estimates than the district-based approach. Noticeable differences were detected in catch estimates between different months of the year. The study recommends that future analyses of CAS data collected on the lake follow the district-based approach. Future CASs should also account for seasonal variations in the sampling design by providing for replication of sampling. The SOPs need updating to document the procedures that deviate from the original sampling design.
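The abstract does not spell out the two estimation approaches, so the following is only a hedged toy contrast (all numbers invented) of why a lake-wide expansion and a district-based (stratified) expansion can give different catch totals:

```python
# Toy contrast: lake-wide vs district-based expansion of sampled catch rates.
districts = {
    # district: (sampled mean catch per boat-day, kg; total boat-days of effort)
    "A": (25.0, 4_000),
    "B": (60.0, 1_500),
    "C": (15.0, 6_000),
}

# District-based: expand each district's sampled rate by its own effort.
district_total = sum(rate * effort for rate, effort in districts.values())

# Lake-wide: apply one pooled (unweighted) mean rate to the total effort.
mean_rate = sum(rate for rate, _ in districts.values()) / len(districts)
total_effort = sum(effort for _, effort in districts.values())
lakewide_total = mean_rate * total_effort

print(f"district-based total: {district_total:,.0f} kg")   # 280,000 kg
print(f"lake-wide total:      {lakewide_total:,.0f} kg")   # ~383,333 kg
```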

Relevance:

30.00%

Publisher:

Abstract:

Hoekstra et al. (Psychonomic Bulletin & Review, 2014, 21:1157–1164) surveyed the interpretation of confidence intervals (CIs) by first-year students, master students, and researchers with six items expressing misinterpretations of CIs. They asked respondents to answer all items, computed the number of items endorsed, and concluded that misinterpretation of CIs is robust across groups. Their design may have produced this outcome artifactually for reasons that we describe. This paper first discusses the two interpretations of CIs and, hence, why misinterpretation cannot be inferred from endorsement of some of the items. Next, a re-analysis of Hoekstra et al.'s data reveals some puzzling differences between first-year and master students that demand further investigation. For that purpose, we designed a replication study with an extended questionnaire including two additional items that express correct interpretations of CIs (to compare endorsement of correct vs. nominally incorrect interpretations) and we asked master students to indicate which items they would have omitted had they had the option (to distinguish deliberate from uninformed endorsement caused by the forced-response format). Results showed that incognizant first-year students endorsed correct and nominally incorrect items identically, revealing that the two item types are not differentially attractive superficially; in contrast, master students were distinctively more prone to endorsing correct items when their uninformed responses were removed, although they admitted to nescience more often than might have been expected. Implications for teaching practices are discussed.
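As a hedged aside (not part of the study): the textbook frequentist interpretation of a 95% CI refers to the long-run coverage of the interval-generating procedure, not to the probability that one particular interval contains the parameter, and this is easy to see by simulation.

```python
# Coverage simulation: ~95% of 95% confidence intervals contain the true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_mean, sigma, n, reps = 10.0, 2.0, 30, 10_000
t_crit = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sigma, n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    covered += (m - t_crit * se) <= true_mean <= (m + t_crit * se)

print(f"empirical coverage: {covered / reps:.3f}")   # close to 0.95
```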

Relevance:

30.00%

Publisher:

Abstract:

Replication of eukaryotic chromosomes initiates at multiple sites called replication origins. Replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where several complementary studies have mapped their locations genome-wide. We have collated these datasets, taking account of the resolution of each study, to generate a single list of distinct origin sites. OriDB provides a web-based catalogue of these confirmed and predicted S. cerevisiae DNA replication origin sites. Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising seven pages. These pages provide, in text and graphical formats, the following information: genomic location and chromosome context of the origin site; time of origin replication; DNA sequence of proposed or experimentally confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA duplex destabilization or SIDD); and phylogenetic conservation of sequence elements. In addition, OriDB encourages community submission of additional information for each origin site through a User Notes facility. Origin sites are linked to several external resources, including the Saccharomyces Genome Database (SGD) and relevant publications at PubMed. Finally, a Chromosome Viewer utility allows users to interactively generate graphical representations of DNA replication data genome-wide. OriDB is available at www.oridb.org.
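Purely as an illustration of the per-origin information listed above, here is a hypothetical Python data structure; the field names are invented and are not OriDB's actual schema or API.

```python
# Hypothetical record mirroring the per-origin information OriDB describes.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class OriginRecord:
    chromosome: str                       # genomic location / chromosome context
    start: int
    end: int
    status: str                           # "confirmed" or "proposed"
    replication_time: Optional[float] = None   # time of origin replication
    sequence: Optional[str] = None        # proposed/confirmed origin elements
    sidd_energy: Optional[float] = None   # stress-induced duplex destabilization
    conservation: List[str] = field(default_factory=list)  # phylogenetic conservation
    user_notes: List[str] = field(default_factory=list)    # community submissions

# Invented coordinates, for illustration only.
example = OriginRecord(chromosome="III", start=100_000, end=100_500, status="confirmed")
print(example.status, example.chromosome, example.start, example.end)
```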

Relevance:

30.00%

Publisher:

Abstract:

This paper addresses the problem of implementing an asynchronous replication scheme in a utility-based computing environment. The problem needs special attention because most existing replication schemes for this kind of computing system implicitly support synchronous replication and/or consider only read-only jobs. Therefore, we propose an intelligent framework that supports an effective resource selection scheme by taking into account the components that affect performance, such as the resource/data freshness of the replicated system in such an environment. We exploit an Update Ordering (UO) approach and reconcile these components in designing the framework. Important issues such as job propagation delay and job propagation rules are specifically addressed. Our experiments show that the proposed framework can serve as a platform for an effective resource selection scheme and achieves good system performance compared with existing algorithms.
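As a hedged toy sketch (not the authors' UO-based framework): a freshness-aware resource selection rule might trade off a replica's data staleness against its job propagation delay, for example as a weighted score; all names and numbers below are illustrative.

```python
# Toy freshness-aware replica selection: balance staleness vs propagation delay.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    pending_updates: int          # updates not yet applied (lower = fresher)
    propagation_delay_ms: float   # estimated job propagation delay

def select_replica(replicas, staleness_weight=1.0, delay_weight=0.1):
    # Lower score is better; the weights are illustrative tuning knobs.
    def score(r):
        return (staleness_weight * r.pending_updates
                + delay_weight * r.propagation_delay_ms)
    return min(replicas, key=score)

replicas = [
    Replica("r1", pending_updates=0,  propagation_delay_ms=120.0),
    Replica("r2", pending_updates=15, propagation_delay_ms=20.0),
    Replica("r3", pending_updates=3,  propagation_delay_ms=45.0),
]
print(select_replica(replicas).name)  # "r3" under these weights
```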