956 results for scalable coding


Relevance:

20.00%

Publisher:

Abstract:

Collecting data via a questionnaire and analyzing them while preserving respondents’ privacy may increase the number of respondents and the truthfulness of their responses. It may also reduce the systematic differences between respondents and non-respondents. In this paper, we propose a privacy-preserving method for collecting and analyzing survey responses using secure multi-party computation (SMC). The method is secure under the semi-honest adversarial model. The proposed method computes a wide variety of statistics. Total and stratified statistical counts are computed using the secure protocols developed in this paper. Then, additional statistics, such as a contingency table, a chi-square test, an odds ratio, and logistic regression, are computed within the R statistical environment using the statistical counts as building blocks. The method was evaluated on a questionnaire dataset of 3,158 respondents sampled for a medical study and simulated questionnaire datasets of up to 50,000 respondents. The computation time of the statistical analyses scales linearly with the number of respondents. The results show that the method is efficient and scalable for practical use. It can also be used for other applications in which categorical data are collected.
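
To illustrate how the secure counts act as building blocks for the downstream statistics, here is a minimal sketch (in Python rather than R, and with hypothetical plain-text counts standing in for the outputs of the secure counting protocols, which are not reproduced here) that assembles a 2x2 contingency table and derives a chi-square test and an odds ratio from it.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical stratified counts, standing in for the outputs of the secure
# counting protocols (the SMC layer itself is not reproduced in this sketch).
counts = {("exposed", "case"): 120, ("exposed", "control"): 380,
          ("unexposed", "case"): 60, ("unexposed", "control"): 640}

table = np.array([[counts[("exposed", "case")], counts[("exposed", "control")]],
                  [counts[("unexposed", "case")], counts[("unexposed", "control")]]])

chi2, p, dof, expected = chi2_contingency(table)   # chi-square test on the 2x2 table
odds_ratio = (table[0, 0] * table[1, 1]) / (table[0, 1] * table[1, 0])
print(f"chi2={chi2:.2f}, p={p:.4f}, OR={odds_ratio:.2f}")
```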

Relevance:

20.00%

Publisher:

Abstract:

Graph analytics is an important and computationally demanding class of data analytics. It is essential to balance scalability, ease of use and high performance in large-scale graph analytics. As such, it is necessary to hide the complexity of parallelism, data distribution and memory locality behind an abstract interface. The aim of this work is to build a NUMA-aware, scalable graph analytics framework that does not demand significant parallel programming experience.
The realization of such a system faces two key problems:
(i) how to develop a scale-free parallel programming framework that scales efficiently across NUMA domains; (ii) how to efficiently apply graph partitioning in order to create separate and largely independent work items that can be distributed among threads.
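
As a rough illustration of problem (ii), the sketch below (a toy adjacency-list graph and a naive contiguous-range partitioner, not the framework described above) shows how partitioning turns the graph into independent work items, each of which a thread pinned to one NUMA domain could process without touching the other partitions' vertices.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy graph: adjacency lists keyed by vertex id.
graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0], 4: [0, 5], 5: [4]}

def partition(vertices, parts):
    """Split the vertex set into contiguous ranges, one independent work item per part."""
    vs = sorted(vertices)
    step = -(-len(vs) // parts)                 # ceiling division
    return [vs[i:i + step] for i in range(0, len(vs), step)]

def degree_sum(part):
    # Per-partition work item: reads only the adjacency lists of the vertices it owns.
    return sum(len(graph[v]) for v in part)

parts = partition(graph.keys(), parts=2)
with ThreadPoolExecutor(max_workers=2) as pool:  # one worker per partition / NUMA domain
    results = list(pool.map(degree_sum, parts))
print(results)
```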

Relevance:

20.00%

Publisher:

Abstract:

Genome-wide association studies (GWAS) have identified several risk variants for late-onset Alzheimer’s disease (LOAD) [1, 2]. These common variants have replicable but small effects on LOAD risk and generally do not have obvious functional effects. Low-frequency coding variants, not detected by GWAS, are predicted to include functional variants with larger effects on risk. To identify low-frequency coding variants with large effects on LOAD risk, we carried out whole-exome sequencing (WES) in 14 large LOAD families and follow-up analyses of the candidate variants in several large LOAD case–control data sets. A rare variant in PLD3 (phospholipase D3; Val232Met) segregated with disease status in two independent families and doubled risk for Alzheimer’s disease in seven independent case–control series with a total of more than 11,000 cases and controls of European descent. Gene-based burden analyses in 4,387 cases and controls of European descent and 302 African American cases and controls, with complete sequence data for PLD3, reveal that several variants in this gene increase risk for Alzheimer’s disease in both populations. PLD3 is highly expressed in brain regions that are vulnerable to Alzheimer’s disease pathology, including hippocampus and cortex, and is expressed at significantly lower levels in neurons from Alzheimer’s disease brains compared to control brains. Overexpression of PLD3 leads to a significant decrease in intracellular amyloid-β precursor protein (APP) and extracellular Aβ42 and Aβ40 (the 42- and 40-residue isoforms of the amyloid-β peptide), and knockdown of PLD3 leads to a significant increase in extracellular Aβ42 and Aβ40. Together, our genetic and functional data indicate that carriers of PLD3 coding variants have a twofold increased risk for LOAD and that PLD3 influences APP processing. This study provides an example of how densely affected families may help to identify rare variants with large effects on risk for disease or other complex traits.

Relevance:

20.00%

Publisher:

Abstract:

Variability management is one of the major challenges in software product line adoption, since it needs to be efficiently managed at various levels of the software product line development process (e.g., requirement analysis, design, implementation, etc.). One of the main challenges within variability management is the handling and effective visualization of large-scale (industry-size) models, which, in many projects, can reach the order of thousands, along with the dependency relationships that exist among them. These have raised many concerns regarding the scalability of current variability management tools and techniques and their lack of industrial adoption. To address the scalability issues, this work employed a combination of quantitative and qualitative research methods to identify the reasons behind the limited scalability of existing variability management tools and techniques. In addition to producing a comprehensive catalogue of existing tools, the outcome from this stage helped to understand the major limitations of existing tools. Based on the findings, a novel approach was created for managing variability that employed two main principles for supporting scalability. First, the separation-of-concerns principle was employed by creating multiple views of variability models to alleviate information overload. Second, hyperbolic trees were used to visualise models (compared to the Euclidean space trees traditionally used). The result was an approach that can represent models encompassing hundreds of variability points and complex relationships. These concepts were demonstrated by implementing them in an existing variability management tool and using it to model a real-life product line with over a thousand variability points. Finally, in order to assess the work, an evaluation framework was designed based on various established usability assessment best practices and standards. The framework was then used with several case studies to benchmark the performance of this work against other existing tools.
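
As a minimal sketch of the first principle, the snippet below uses a hypothetical four-feature model and a `view` helper (neither taken from the tool itself) to show how a concern-specific view keeps only the variability points tagged with that concern plus their dependency closure, which is what keeps any single diagram small.

```python
# Hypothetical feature model: each variability point has a concern tag and dependencies.
model = {
    "payment":        {"concern": "billing",  "requires": ["security"]},
    "invoice_export": {"concern": "billing",  "requires": ["payment"]},
    "security":       {"concern": "platform", "requires": []},
    "dark_mode":      {"concern": "ui",       "requires": []},
}

def view(model, concern):
    """Separation of concerns: keep one concern's features plus their dependency closure."""
    selected = {f for f, meta in model.items() if meta["concern"] == concern}
    frontier = list(selected)
    while frontier:
        for dep in model[frontier.pop()]["requires"]:
            if dep not in selected:
                selected.add(dep)
                frontier.append(dep)
    return {f: model[f] for f in selected}

print(sorted(view(model, "billing")))   # -> ['invoice_export', 'payment', 'security']
```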

Relevance:

20.00%

Publisher:

Abstract:

As the efficiency of parallel software increases it is becoming common to measure near-linear speedup for many applications. For a problem of size N on P processors, with the software running at O(N/P), the performance restrictions due to file I/O systems and mesh decomposition running at O(N) become increasingly apparent, especially for large P. For distributed memory parallel systems an additional limit to scalability results from the finite memory size available for I/O scatter/gather operations. Simple strategies developed to address the scalability of scatter/gather operations for unstructured mesh based applications have been extended to provide scalable mesh decomposition through the development of a parallel graph partitioning code, JOSTLE [8]. The focus of this work is directed towards the development of generic strategies that can be incorporated into the Computer Aided Parallelisation Tools (CAPTools) project.
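
A small sketch of the memory issue behind scatter/gather, using hypothetical names (`field`, `owner`, `CHUNK`) rather than anything from CAPTools or JOSTLE: the global array is streamed in fixed-size chunks so that distributing it to partitions never requires an O(N) buffer in one place.

```python
import numpy as np

# Hypothetical set-up: a global field over N mesh cells, scattered to P partitions
# in fixed-size chunks so no single O(N) scatter/gather buffer is ever required.
N, P, CHUNK = 1_000_000, 8, 65_536
field = np.random.default_rng(0).random(N)
owner = np.arange(N) % P                      # stand-in for a JOSTLE-style partition map

local = [[] for _ in range(P)]
for start in range(0, N, CHUNK):              # stream the data in O(CHUNK) pieces
    sl = slice(start, min(start + CHUNK, N))
    for p in range(P):
        mask = owner[sl] == p
        local[p].append(field[sl][mask])      # each partition keeps only its own cells

local = [np.concatenate(parts) for parts in local]
assert sum(len(a) for a in local) == N
```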

Relevance:

20.00%

Publisher:

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, two significant obstacles to knowledge graph construction are the unreliability of the extracted information, due to noise and ambiguity in the underlying data and errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows inference over large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
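
To make the convex-inference point concrete, here is a toy sketch (not the dissertation's actual KGI implementation) with two candidate facts, one extraction rule per fact and one mutual-exclusion constraint, all expressed as squared hinge-loss potentials over soft truth values in [0, 1] and minimized by projected gradient descent.

```python
import numpy as np

# Hypothetical toy data: two candidate facts with extractor confidences.
# Rule 1 (per fact): an extraction with confidence c softly implies the fact -> max(0, c - x)^2
# Rule 2 (ontological): the two facts are mutually exclusive -> max(0, x1 + x2 - 1)^2
conf = np.array([0.9, 0.7])      # extractor confidence for fact 1 and fact 2
x = np.full(2, 0.5)              # soft truth values in [0, 1]

def grad(x):
    g = -2.0 * np.maximum(0.0, conf - x)          # gradient of the extraction potentials
    g += 2.0 * max(0.0, x[0] + x[1] - 1.0)        # gradient of the exclusion potential
    return g

for _ in range(500):             # projected gradient descent on the convex objective
    x = np.clip(x - 0.05 * grad(x), 0.0, 1.0)

print(x)   # the better-supported fact ends up with the higher truth value
```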

Relevance:

20.00%

Publisher:

Abstract:

Cellular senescence is a stable arrest of cell proliferation induced by several factors such as activated oncogenes, oxidative stress and shortening of telomeres. Senescence acts as a tumour suppression mechanism to halt the progression of cancer. However, senescence may also impact negatively upon tissue regeneration, thus contributing to the effects of ageing. The eukaryotic genome is controlled by various modes of transcriptional and translational regulation. Focus has therefore centred on the role of long non-coding RNAs (lncRNAs) in regulating the genome. Accordingly, understanding how lncRNAs function to regulate the senescent genome is integral to improving our knowledge and understanding of tumour suppression and ageing. Within this study, I set out to investigate the expression of lncRNAs within models of senescence. Through a custom expression array, I have shown that expression of multiple different lncRNAs is up-regulated and down-regulated in IMR90 replicative senescent fibroblasts and oncogene-induced senescent melanocytes. LncRNA expression was determined to be specific to stable senescence-associated cell arrest and predominantly within the nucleus of senescent cells. In order to examine the function of lncRNA expression in senescence, I selected lncRNA transcript ENST0000430998 (lncRNA_98) to focus my investigations upon. LncRNA_98 was robustly upregulated within multiple models of senescence and efficiently depleted using anti-sense oligonucleotide technology. Characterisation and unbiased RNA-sequencing of lncRNA_98-deficient senescent cells highlighted a list of genes that are regulated by lncRNA_98 expression in senescent cells and may regulate aspects of the senescence program. Specifically, the formation of SAHF was impeded upon depletion of lncRNA_98 expression and levels of total pRB protein expression severely decreased. Validation and recapitulation of the consequences of pRB depletion were confirmed through lncRNA_98 knock-out cells generated using CRISPR technology. Surprisingly, inhibition of ATM kinase functions permitted the restoration of pRB protein levels within lncRNA_98-deficient cells. I propose that lncRNA_98 antagonizes the ability of ATM kinase to downregulate pRB expression at a post-transcriptional level, thereby potentiating senescence. Furthermore, lncRNA expression was detected within fibroblasts of old individuals and visualised within senescent melanocytes in human benign nevi, a barrier to melanoma progression. Conversely, mining of 337 TCGA primary melanoma data sets highlighted that the lncRNA_98 gene and its expression were lost from a significant proportion of melanoma samples, consistent with lncRNA_98 having a tumour suppressor function. The data presented in this study illustrate that lncRNA_98 expression has a regulatory role over pRB expression in senescence and may regulate aspects of tumourigenesis and ageing.

Relevance:

20.00%

Publisher:

Abstract:

The number of connected devices collecting and distributing real-world information through various systems is expected to soar in the coming years. As the number of such connected devices grows, it becomes increasingly difficult to store and share all these new sources of information. Several context representation schemes try to standardize this information, but none of them has been widely adopted. In previous work we addressed this challenge; however, our solution had two drawbacks: poor semantic extraction and limited scalability. In this paper we discuss ways to efficiently deal with the diversity of representation schemes and propose a novel d-dimension organization model. Our evaluation shows that the d-dimension model improves scalability and semantic extraction.

Relevance:

20.00%

Publisher:

Abstract:

Purpose – The purpose of this research is to show how the self-archiving of journal papers is a major step towards providing open access to research. However, copyright transfer agreements (CTAs) that are signed by an author prior to publication often indicate whether, and in what form, self-archiving is allowed. The SHERPA/RoMEO database enables easy access to publishers' policies in this area and uses a colour-coding scheme to classify publishers according to their self-archiving status. The database is currently being redeveloped and renamed the Copyright Knowledge Bank. However, it will still assign a colour to individual publishers indicating whether pre-prints can be self-archived (yellow), post-prints can be self-archived (blue), both pre-print and post-print can be archived (green) or neither (white). The nature of CTAs means that these decisions are rarely as straightforward as they may seem, and this paper describes the thinking and considerations that were used in assigning these colours in the light of the underlying principles and definitions of open access. Approach – Detailed analysis of a large number of CTAs led to the development of a controlled vocabulary of terms, which was carefully analysed to determine how these terms equate to the definition and “spirit” of open access. Findings – The paper reports on how conditions outlined by publishers in their CTAs, such as how or where a paper can be self-archived, affect the assignment of a self-archiving colour to the publisher. Value – The colour assignment is widely used by authors and repository administrators in determining whether academic papers can be self-archived. This paper provides a starting-point for further discussion and development of publisher classification in the open access environment.
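
As an aside, the basic colour rule stated above reduces to a small decision function; the sketch below (a toy illustration, not the Copyright Knowledge Bank's actual logic, which weighs many more CTA conditions) captures just that mapping.

```python
def romeo_colour(preprint_allowed: bool, postprint_allowed: bool) -> str:
    """Colour assignment as described above: green = both archivable,
    yellow = pre-print only, blue = post-print only, white = neither."""
    if preprint_allowed and postprint_allowed:
        return "green"
    if preprint_allowed:
        return "yellow"
    if postprint_allowed:
        return "blue"
    return "white"

assert romeo_colour(True, True) == "green"
assert romeo_colour(True, False) == "yellow"
assert romeo_colour(False, True) == "blue"
assert romeo_colour(False, False) == "white"
```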

Relevance:

20.00%

Publisher:

Abstract:

Our research has shown that schedules can be built mimicking a human scheduler by using a set of rules that involve domain knowledge. This chapter presents a Bayesian Optimization Algorithm (BOA) for the nurse scheduling problem that chooses suitable scheduling rules from such a set for each nurse’s assignment. Based on the idea of using probabilistic models, the BOA builds a Bayesian network over the set of promising solutions and samples this network to generate new candidate solutions. Computational results from 52 real data instances demonstrate the success of this approach. It is also suggested that the learning mechanism in the proposed algorithm may be suitable for other scheduling problems.
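
A simplified sketch of the select-model-sample loop, with hypothetical rule and shift counts, and with independent per-assignment marginals standing in for the Bayesian network that the BOA actually learns:

```python
import numpy as np

rng = np.random.default_rng(0)
RULES, SHIFTS = 4, 10            # hypothetical: 4 scheduling rules, 10 nurse assignments

def cost(sol):
    # Stand-in objective: prefer rule 0 early in the schedule and rule 3 late (illustrative only).
    return np.sum(np.abs(sol - np.linspace(0, RULES - 1, SHIFTS)))

pop = rng.integers(0, RULES, size=(50, SHIFTS))
for _ in range(30):
    elite = pop[np.argsort([cost(s) for s in pop])[:15]]      # promising solutions
    # Estimate a probability model of which rule to apply at each assignment
    # (the real BOA learns a Bayesian network; this sketch uses independent marginals).
    probs = np.stack([(elite == r).mean(axis=0) for r in range(RULES)]) + 1e-3
    probs /= probs.sum(axis=0)
    # Sample new candidate solutions from the model.
    pop = np.array([[rng.choice(RULES, p=probs[:, j]) for j in range(SHIFTS)]
                    for _ in range(50)])

best = min(pop, key=cost)
```

The real algorithm replaces the independent marginals with a learned Bayesian network, which can capture dependencies between assignments.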

Relevance:

20.00%

Publisher:

Abstract:

Poster presented at: 12th EUROPEAN SOFC & SOE FORUM 2016, 5–8 July 2016, KKL Lucerne, Switzerland

Relevance:

20.00%

Publisher:

Abstract:

Since its identification in the 1990s, the RNA interference (RNAi) pathway has proven extremely useful in elucidating the function of proteins in the context of cells and even whole organisms. In particular, this sequence-specific and powerful loss-of-function approach has greatly simplified the study of the role of host cell factors implicated in the life cycle of viruses. Here, we detail the RNAi method we have developed and used to specifically knock down the expression of ezrin, an actin binding protein that was identified by yeast two-hybrid screening to interact with the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) spike (S) protein. This method was used to study the role of ezrin, specifically during the entry stage of SARS-CoV infection.

Relevance:

20.00%

Publisher:

Abstract:

With the proliferation of new mobile devices and applications, the demand for ubiquitous wireless services has increased dramatically in recent years. The explosive growth in wireless traffic requires wireless networks to be scalable so that they can be efficiently extended to meet wireless communication demands. In a wireless network, the interference power typically grows with the number of devices in the absence of coordination among them. On the other hand, large-scale coordination is always difficult due to the low-bandwidth and high-latency interfaces between access points (APs) in traditional wireless networks. To address this challenge, the cloud radio access network (C-RAN) has been proposed, where a pool of base band units (BBUs) is connected to the distributed remote radio heads (RRHs) via high-bandwidth and low-latency links (i.e., the front-haul) and is responsible for all the baseband processing. But insufficient front-haul link capacity may limit the scale of C-RAN and prevent it from fully utilizing the benefits made possible by centralized baseband processing. As a result, the front-haul link capacity becomes a bottleneck in the scalability of C-RAN. In this dissertation, we explore the scalable C-RAN in an effort to tackle this challenge. In the first aspect of this dissertation, we investigate the scalability issues in existing wireless networks and propose a novel time-reversal (TR) based scalable wireless network in which the interference power is naturally mitigated by the focusing effects of TR communications, without coordination among APs or terminal devices (TDs). Due to this nice feature, it is shown that the system can be easily extended to serve more TDs. Motivated by the nice properties of TR communications in providing scalable wireless networking solutions, in the second aspect of this dissertation we apply TR based communications to the C-RAN and discover the TR tunneling effects, which alleviate the traffic load in the front-haul links caused by the growing number of TDs. We further design waveforming schemes to optimize the downlink and uplink transmissions in the TR based C-RAN, which are shown to improve the downlink and uplink transmission accuracy. Consequently, the traffic load in the front-haul links is further alleviated by reducing the re-transmissions caused by transmission errors. Moreover, inspired by the TR-based C-RAN, we propose a compressive quantization scheme that applies to the uplink of multi-antenna C-RAN so that more antennas can be utilized with the limited front-haul capacity, providing rich spatial diversity such that massive numbers of TDs can be served more efficiently.
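
A small numerical sketch of the focusing effect that the TR argument relies on (synthetic channels only, not the dissertation's system model): the prefilter is the flipped, conjugated channel impulse response, so the effective channel to the intended TD shows a strong peak, while an independent channel to another TD does not.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 30                                   # number of multipath taps
h = (rng.normal(size=L) + 1j * rng.normal(size=L)) * np.exp(-0.1 * np.arange(L))
g = np.conj(h[::-1]) / np.linalg.norm(h) # time-reversal prefilter: flipped, conjugated CIR

eff = np.convolve(g, h)                  # effective channel seen by the intended TD
peak = np.abs(eff).max()
sidelobe = np.sort(np.abs(eff))[-2]
print(f"focusing peak {peak:.2f}, largest sidelobe {sidelobe:.2f}")

# An unintended TD with an independent channel sees no focusing peak.
h_other = (rng.normal(size=L) + 1j * rng.normal(size=L)) * np.exp(-0.1 * np.arange(L))
leak = np.abs(np.convolve(g, h_other)).max()
print(f"peak at unintended TD {leak:.2f}")
```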

Relevance:

20.00%

Publisher:

Abstract:

Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are popular for deducing the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer’s memory (this volume changes based on the computer used or available, but there is always a data set that is too large for any computer). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which reduces both computational complexity and space complexity significantly. The proposed stKFCM only requires O(n²) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n² elements of the full N × N (where N >> n) kernel matrix need to be calculated at each time step, thus reducing both the time spent producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with relatively small n, can provide clustering performance as accurate as kernel fuzzy c-means run on the entire data set, while achieving a significant speedup.
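
For reference, a compact sketch of plain kernel fuzzy c-means on a single data chunk, the building block that stKFCM extends to the streaming setting (this is not the stKFCM algorithm itself; the RBF kernel and the parameter values are illustrative).

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gaussian (RBF) kernel matrix for one data chunk.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_fcm(K, c=3, m=2.0, iters=50, seed=0):
    """Kernel fuzzy c-means on a single chunk, given its n x n kernel matrix K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                               # fuzzified memberships, n x c
        s = W.sum(axis=0)                        # per-cluster normalisers
        # Squared distances to the implicit feature-space centres, from K only.
        cross = K @ W / s                        # <phi(x_i), v_k>
        centre = np.einsum('jk,jl,lk->k', W, K, W) / s**2   # ||v_k||^2
        d2 = np.clip(np.diag(K)[:, None] - 2 * cross + centre, 1e-12, None)
        U = (1.0 / d2) ** (1.0 / (m - 1))        # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return U

X = np.random.default_rng(1).normal(size=(200, 2))   # one chunk of n = 200 points
U = kernel_fcm(rbf_kernel(X, gamma=0.5), c=3)
labels = U.argmax(axis=1)
```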