973 resultados para Data replication
Resumo:
Currently the data storage industry is facing huge challenges with respect to the conventional method of recording data known as longitudinal magnetic recording. This technology is fast approaching a fundamental physical limit, known as the superparamagnetic limit. A unique way of deferring the superparamagnetic limit incorporates the patterning of magnetic media. This method exploits the use of lithography tools to predetermine the areal density. Various nanofabrication schemes are employed to pattern the magnetic material are Focus Ion Beam (FIB), E-beam Lithography (EBL), UV-Optical Lithography (UVL), Self-assembled Media Synthesis and Nanoimprint Lithography (NIL). Although there are many challenges to manufacturing patterned media, the large potential gains offered in terms of areal density make it one of the most promising new technologies on the horizon for future hard disk drives. Thus, this dissertation contributes to the development of future alternative data storage devices and deferring the superparamagnetic limit by designing and characterizing patterned magnetic media using a novel nanoimprint replication process called "Step and Flash Imprint lithography". As opposed to hot embossing and other high temperature-low pressure processes, SFIL can be performed at low pressure and room temperature. Initial experiments carried out, consisted of process flow design for the patterned structures on sputtered Ni-Fe thin films. The main one being the defectivity analysis for the SFIL process conducted by fabricating and testing devices of varying feature sizes (50 nm to 1 μm) and inspecting them optically as well as testing them electrically. Once the SFIL process was optimized, a number of Ni-Fe coated wafers were imprinted with a template having the patterned topography. A minimum feature size of 40 nm was obtained with varying pitch (1:1, 1:1.5, 1:2, and 1:3). The Characterization steps involved extensive SEM study at each processing step as well as Atomic Force Microscopy (AFM) and Magnetic Force Microscopy (MFM) analysis.
Resumo:
The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have an extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rate, and short generation time. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods that analyze serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from which viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another using a process called recombination. ^ A genealogy involving recombination is best described by a network structure. A more general approach was implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more through evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data and it was shown that the results represented within an evolutionary network structure can be interpreted in biologically meaningful ways. ^
Resumo:
Acknowledgements We thank Brian Roberts and Mike Harris for responding to our questions regarding their paper; Zoltan Dienes for advice on Bayes factors; Denise Fischer, Melanie Römer, Ioana Stanciu, Aleksandra Romanczuk, Stefano Uccelli, Nuria Martos Sánchez, and Rosa María Beño Ruiz de la Sierra for help collecting data; Eva Viviani for managing data collection in Parma. We thank Maurizio Gentilucci for letting us use his lab, and the Centro Intradipartimentale Mente e Cervello (CIMeC), University of Trento, and especially Francesco Pavani for lending us his motion tracking equipment. We thank Rachel Foster for proofreading. KKK was supported by a Ph.D. scholarship as part of a grant to VHF within the International Graduate Research Training Group on Cross-Modal Interaction in Natural and Artificial Cognitive Systems (CINACS; DFG IKG-1247) and TS by a grant (DFG – SCHE 735/3-1); both from the German Research Council.
Resumo:
There is a rich history of social science research centering on racial inequalities that continue to be observed across various markets (e.g., labor, housing, and credit markets) and social milieus. Existing research on racial discrimination in consumer markets, however, is relatively scarce and that which has been done has disproportionately focused on consumers as the victims of race-based mistreatment. As such, we know relatively little about how consumers contribute to inequalities in their roles as perpetrators of racial discrimination. In response, in this paper we elaborate on a line of research that is only in its’ infancy stages of development and yet is ripe with opportunities to advance the literature on consumer racial discrimination and racial earnings inequities among tip dependent employees in the United States. Specifically, we analyze data derived from a large exit survey of restaurant consumers (n=378) in an attempt to replicate, extend, and further explore the recently documented effect of service providers’ race on restaurant consumers’ tipping decisions. Our results indicate that both White and Black restaurant customers discriminate against Black servers by tipping them less than their White coworkers. Importantly, we find no evidence that this Black tip penalty is the result of interracial differences in service skills possessed by Black and White servers. We conclude by delineating directions for future research in this neglected but salient area study.
Resumo:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Resumo:
Since 2005, harmonized catch assessment surveys (CASs) have been implemented on Lake Victoria in the three riparian countries Uganda, Kenya, and Tanzania to monitor the commercial fish stocks and provide their management advice. The regionally harmonized standard operating procedures for CASs have not been wholly followed due to logistical difficulties. Yet the new approaches adopted have not been documented. This study investigated the alternative approaches used to estimate fish catches on the lake with the aim of determining the most reliable one for providing management advice and also the effect of current sampling routine on the precision of catch estimates provided. The study found the currently used lake-wide approach less reliable and more biased in providing catch estimates compared to the district based approach. Noticeable differences were detected in catch estimates between different months of the year. The study recommends future analyses of CAS data collected on the lake to follow the district based approach. Future CASs should also consider seasonal variations in the sampling design by providing for replication of sampling. The SOPs need updating to document the procedures that deviate from the original sampling design.
Resumo:
Hoekstra et al. (Psychonomic Bulletin & Review, 2014, 21:1157–1164) surveyed the interpretation of confidence intervals (CIs) by first-year students, master students, and researchers with six items expressing misinterpretations of CIs. They asked respondents to answer all items, computed the number of items endorsed, and concluded that misinterpretation of CIs is robust across groups. Their design may have produced this outcome artifactually for reasons that we describe. This paper discusses first the two interpretations of CIs and, hence, why misinterpretation cannot be inferred from endorsement of some of the items. Next, a re-analysis of Hoekstra et al.’s data reveals some puzzling differences between first-year and master students that demand further investigation. For that purpose, we designed a replication study with an extended questionnaire including two additional items that express correct interpretations of CIs (to compare endorsement of correct vs. nominally incorrect interpretations) and we asked master students to indicate which items they would have omitted had they had the option (to distinguish deliberate from uninformed endorsement caused by the forced-response format). Results showed that incognizant first-year students endorsed correct and nominally incorrect items identically, revealing that the two item types are not differentially attractive superficially; in contrast, master students were distinctively more prone to endorsing correct items when their uninformed responses were removed, although they admitted to nescience more often that might have been expected. Implications for teaching practices are discussed.
Resumo:
Replication of eukaryotic chromosomes initiates at multiple sites called replication origins. Replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where several complementary studies have mapped their locations genome-wide. We have collated these datasets, taking account of the resolution of each study, to generate a single list of distinct origin sites. OriDB provides a web-based catalogue of these confirmed and predicted S.cerevisiae DNA replication origin sites. Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising seven pages. These pages provide, in text and graphical formats, the following information: genomic location and chromosome context of the origin site; time of origin replication; DNA sequence of proposed or experimentally confirmed origin elements; free energy required to open the DNA duplex (stress-induced DNA duplex destabilization or SIDD); and phylogenetic conservation of sequence elements. In addition, OriDB encourages community submission of additional information for each origin site through a User Notes facility. Origin sites are linked to several external resources, including the Saccharomyces Genome Database (SGD) and relevant publications at PubMed. Finally, a Chromosome Viewer utility allows users to interactively generate graphical representations of DNA replication data genome-wide. OriDB is available at www.oridb.org.
Resumo:
Background: The role of common, low to intermediate risk alleles in breast cancer need to be examined due to their relatively high prevalence. Among many cellular pathways, replication has a pivotal role in cell division and frequently targeted during carcinogenesis. Replication is governed by a host of genes involved in a number of different pathways. This study investigates the effects of replication-gene variants in relation to breast cancer and how this relationship is affected by ethnicity, menopausal status and breast tumour subtype. Methods: Data from a case-control study with 997 incident breast cancer cases and 1,050 age frequency matched controls in Vancouver, British Columbia and Kingston, Ontario were used. Unconditional logistic regression was used to calculate odds ratios between 45 replication gene variants and breast cancer risk, assuming an additive genetic model adjusted for age and centre, presented for Europeans and East Asians separately. Polytomous logistic regression was used to assess odds ratios between each SNP and four breast cancer subtypes defined by hormone receptor status among Europeans. All analyses were stratified by menopausal status. The Benjamini–Hochberg false discovery rate (FDR) was used to address multiple comparisons. Results: Among Europeans, the SNPs in FGFR2, TOX3 and 11q13 loci were associated with breast cancer after controlling for multiple comparisons. Test of heterogeneity showed the SNPs rs1045185, rs4973768, rs672888, rs1219648, rs2420946 among Europeans and rs889312 among East Asians conferred differential risk across the tumour subtypes. Conclusions: Specific SNPs in replication genes were associated with breast cancer, and the risk level differed by tumour subtype defined by ER/PR/Her2 status and ethnicity.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Resumo:
The article seeks to investigate patterns of performance and relationships between grip strength, gait speed and self-rated health, and investigate the relationships between them, considering the variables of gender, age and family income. This was conducted in a probabilistic sample of community-dwelling elderly aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-related health was assessed using a 5-point scale. The males and the younger elderly individuals scored significantly higher on grip strength and gait speed than the female and oldest did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Resumo:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. To correlate cephalometric data with the apnea-hypopnea sleep index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men, and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.
Resumo:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
Resumo:
To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.
Resumo:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.