114 results for scalable
Abstract:
In the past few years, there has been a steady increase in the attention given to green initiatives for data centers. While various energy-aware measures have been developed for data centers, the complementary requirement of improving the performance efficiency of application assignment has yet to be met. For instance, many energy-aware measures applied to data centers maintain a trade-off between energy consumption and Quality of Service (QoS). To address this problem, this paper presents a novel concept of profiling to facilitate offline optimization for a deterministic assignment of applications to virtual machines. A profile-based model is then established for obtaining near-optimal allocations of applications to virtual machines with consideration of three major objectives: energy cost, CPU utilization efficiency and application completion time. From this model, a scalable, profile-based matching algorithm is developed to solve the model. The assignment efficiency of our algorithm is then compared with that of the Hungarian algorithm, which yields the optimal solution but does not scale well.
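As a rough illustration of the kind of assignment problem described above, the sketch below compares an optimal Hungarian assignment against a simple greedy matcher. The cost matrix, the single-scalar cost, and the greedy rule are assumptions for illustration only; they are not the paper's profile model or algorithm.

```python
# Sketch: optimal (Hungarian) assignment vs. a greedy heuristic stand-in
# for a scalable matcher. Costs are random and purely illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_apps, n_vms = 200, 200
# Hypothetical per-assignment cost, imagined as a weighted mix of energy,
# CPU utilization efficiency and completion time (weights assumed).
cost = rng.random((n_apps, n_vms))

# Optimal but roughly O(n^3): the Hungarian algorithm.
rows, cols = linear_sum_assignment(cost)
optimal_cost = cost[rows, cols].sum()

# Greedy stand-in: each application takes the cheapest still-free VM.
free = set(range(n_vms))
greedy_cost = 0.0
for app in range(n_apps):
    vm = min(free, key=lambda v: cost[app, v])
    free.remove(vm)
    greedy_cost += cost[app, vm]

print(f"Hungarian: {optimal_cost:.2f}, greedy: {greedy_cost:.2f}")
```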
Abstract:
Acoustic sensors allow scientists to scale environmental monitoring over large spatiotemporal scales. The faunal vocalisations captured by these sensors can answer ecological questions; however, identifying these vocalisations within recorded audio is difficult: automatic recognition is currently intractable, and manual recognition is slow and error prone. In this paper, a semi-automated approach to call recognition is presented. An automated decision support tool is tested that assists users in the manual annotation process. The respective strengths of human and computer analysis are used to complement one another. The tool recommends the species of an unknown vocalisation and thereby minimises the need to memorise a large corpus of vocalisations. In the case of a folksonomic tagging system, recommending species tags also minimises the proliferation of redundant tag categories. We describe two algorithms: (1) a “naïve” decision support tool (16%–64% sensitivity) with efficiency of O(n), which becomes unscalable as more data are added, and (2) a scalable alternative with 48% sensitivity and an efficiency of O(log n). The improved algorithm was also tested in an HTML-based annotation prototype. The result of this work is a decision support tool for annotating faunal acoustic events that may be utilised by other bioacoustics projects.
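The abstract contrasts an O(n) scan with an O(log n) alternative without giving either algorithm; the sketch below shows that general contrast with a linear scan versus a tree-based nearest-neighbour lookup over synthetic acoustic feature vectors. The features, labels and distance measure are assumptions, not the paper's method.

```python
# Sketch: linear-scan (O(n)) vs. spatial-index (average O(log n)) species
# suggestion over synthetic feature vectors.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
features = rng.random((10_000, 8))          # known vocalisation features
species = rng.integers(0, 50, size=10_000)  # their species labels

query = rng.random(8)                        # unknown vocalisation

# Naive approach: scan every known call.
naive_idx = np.argmin(np.linalg.norm(features - query, axis=1))

# Scalable alternative: pre-built k-d tree with logarithmic-time queries.
tree = cKDTree(features)
_, tree_idx = tree.query(query)

assert naive_idx == tree_idx                 # same answer, cheaper lookup
print("suggested species:", species[tree_idx])
```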
Abstract:
Monitoring the environment with acoustic sensors is an effective method for understanding changes in ecosystems. Through extensive monitoring, large-scale, ecologically relevant datasets can be produced that can inform environmental policy. The collection of acoustic sensor data is a solved problem; the current challenge is the management and analysis of raw audio data to produce useful datasets for ecologists. This paper presents the applied research we use to analyze big acoustic datasets. Its core contribution is the presentation of practical large-scale acoustic data analysis methodologies. We describe details of the data workflows we use to provide both citizen scientists and researchers with practical access to large volumes of ecoacoustic data. Finally, we propose a work-in-progress large-scale analysis architecture driven by a hybrid cloud-and-local, production-grade website.
Abstract:
Malaria is a global health problem; an effective vaccine is urgently needed. Due to the relative poverty and lack of infrastructure in malaria-endemic areas, DNA-based vaccines that are stable at ambient temperatures and easy to formulate have great potential. While attention has been focused mainly on antigen selection, vector design and efficacy assessment, the development of a rapid and commercially viable process to manufacture DNA is generally overlooked. We report here a continuous purification technique employing an optimized stationary adsorbent to allow high vaccine recovery, low processing time and, hence, high productivity. A 40.0 mL monolithic stationary phase was synthesized and functionalized with amino groups from 2-chloro-N,N-diethylethylamine hydrochloride for anion-exchange isolation of a plasmid DNA (pDNA) that encodes a malaria vaccine candidate, VR1020-PyMSP4/5. Physical characterization of the monolithic polymer showed a macroporous material with a modal pore diameter of 750 nm. The final vaccine product isolated after 3 min elution was homogeneous supercoiled plasmid with gDNA, RNA and protein levels in keeping with clinical regulatory standards. Toxicological studies of pVR1020-PyMSP4/5 showed a minimum endotoxin level of 0.28 EU/mg pDNA. This cost-effective technique is cGMP compatible and highly scalable for the production of DNA-based vaccines in commercial quantities, should such vaccines prove effective against malaria. © 2008 American Institute of Chemical Engineers.
Abstract:
A novel method has been developed to synthesize mesoporous silica spheres using commercial silica colloids (SNOWTEX) as precursors and electrolytes (ammonium nitrate and sodium chloride) as destabilizers. Crosslinked polyacrylamide hydrogel was used as a temporary barrier to obtain dispersible spherical mesoporous silica particles. The influences of synthesis conditions, including solution composition and calcination temperature, on the formation of the mesoporous silica particles were systematically investigated. The structure and morphology of the mesoporous silica particles were characterized via scanning electron microscopy (SEM) and the N₂ sorption technique. Mesoporous silica particles with diameters ranging from 0.5 to 1.6 μm were produced, whilst the BET surface area was in the range of 31–123 m² g⁻¹. Their pore size could be adjusted from 14.1 to 28.8 nm by increasing the starting particle diameter from 20–30 nm up to 70–100 nm. A simple and cost-effective method is reported that should open up new opportunities for the synthesis of scalable host materials with controllable structures.
Abstract:
Systems-level identification and analysis of cellular circuits in the brain will require the development of whole-brain imaging with single-cell resolution. To this end, we performed comprehensive chemical screening to develop a whole-brain clearing and imaging method, termed CUBIC (clear, unobstructed brain imaging cocktails and computational analysis). CUBIC is a simple and efficient method involving the immersion of brain samples in chemical mixtures containing aminoalcohols, which enables rapid whole-brain imaging with single-photon excitation microscopy. CUBIC is applicable to multicolor imaging of fluorescent proteins or immunostained samples in adult brains and is scalable from a primate brain to subcellular structures. We also developed a whole-brain cell-nuclear counterstaining protocol and a computational image analysis pipeline that, together with CUBIC reagents, enable the visualization and quantification of neural activities induced by environmental stimulation. CUBIC enables time-course expression profiling of whole adult brains with single-cell resolution.
Abstract:
This paper addresses the development of trust in the use of Open Data through the incorporation of appropriate authentication and integrity parameters, for use by end-user Open Data application developers, in an architecture for trustworthy Open Data Services. The advantage of this architecture is that it is far more scalable and is not another certificate-based hierarchy with the attendant problems of certificate revocation management. With the use of a Public File, if a key is compromised it is a simple matter for the single responsible entity to replace the key pair with a new one and re-perform the data file signing process. Under this proposed architecture, the Open Data environment does not interfere with the internal security schemes that might be employed by the entity. However, the architecture incorporates, when needed, parameters from the entity, e.g. the person who authorized publishing as Open Data, at the time that datasets are created or added.
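The abstract implies a sign-and-verify flow with the public key exposed via a Public File. The sketch below illustrates that general flow with Ed25519 from the `cryptography` package; the key scheme, the dict standing in for the Public File, and the entity name are assumptions, not the paper's specification.

```python
# Sketch of a sign/verify flow for Open Data files: the publishing entity
# signs each dataset and exposes its public key via a "Public File";
# consumers verify downloads against that key. Details are illustrative.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Publisher side: generate a key pair and sign a dataset.
private_key = Ed25519PrivateKey.generate()
public_file = {"open-data-entity": private_key.public_key()}  # stand-in Public File

dataset = b"station,reading\nA,42\nB,17\n"
signature = private_key.sign(dataset)

# Consumer side: fetch the public key from the Public File and verify.
try:
    public_file["open-data-entity"].verify(signature, dataset)
    print("dataset verified")
except InvalidSignature:
    print("dataset rejected")

# If the key is compromised, the entity replaces the Public File entry with
# a new public key and re-signs its published data files.
```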
Abstract:
Discounted Cumulative Gain (DCG) is a well-known ranking evaluation measure for models built from data with multiple relevance grades. By treating the tagging data used in recommendation systems as an ordinal relevance set of {negative, null, positive}, we propose to build a DCG-based recommendation model. We present an efficient and novel learning-to-rank method that optimizes DCG for a recommendation model using this tagging-data interpretation scheme. Evaluating the proposed method on real-world datasets, we demonstrate that it is scalable and outperforms the benchmark methods by generating a quality top-N item recommendation list.
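For reference, the snippet below computes standard DCG (and nDCG) over a ranked list whose items carry tag-derived grades. The grade mapping negative=0, null=1, positive=2 and the exponential gain function are assumptions; the paper's exact scheme is not given in the abstract.

```python
# Sketch: DCG / nDCG for a ranked list of relevance grades.
import math

def dcg(grades):
    """Discounted Cumulative Gain: gain (2^g - 1) discounted by log2(rank+1)."""
    return sum((2 ** g - 1) / math.log2(rank + 2)   # rank 0 -> log2(2) = 1
               for rank, g in enumerate(grades))

# Top-N list produced by a recommender, best first.
ranked_grades = [2, 2, 1, 0, 2, 1]   # positive, positive, null, negative, ...
ideal_grades = sorted(ranked_grades, reverse=True)

ndcg = dcg(ranked_grades) / dcg(ideal_grades)
print(f"DCG = {dcg(ranked_grades):.3f}, nDCG = {ndcg:.3f}")
```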
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes: ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages, respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections, where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing cluster quality where categorical labeling is unavailable and infeasible.
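The EM-tree algorithm itself is beyond a short sketch, but the general recipe of clustering compact, hashed document representations can be illustrated with off-the-shelf tools. The snippet below is a simplified stand-in only; the bit-vector signatures and tree-structured clustering of the paper are not reproduced.

```python
# Sketch: clustering hashed document representations with streaming k-means,
# as a simplified stand-in for large-scale web-page clustering.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.cluster import MiniBatchKMeans

docs = [
    "web scale clustering of crawled pages",
    "spam classification for external validation",
    "ad hoc search relevance judgments",
    "compressed document representations on one machine",
] * 250  # toy corpus; the real collections hold hundreds of millions of pages

# Fixed-width hashed features keep memory bounded regardless of vocabulary size.
X = HashingVectorizer(n_features=2**12, alternate_sign=False).fit_transform(docs)

# Mini-batch k-means processes the collection in streamed batches.
km = MiniBatchKMeans(n_clusters=4, random_state=0, n_init=3).fit(X)
print("cluster sizes:", [int((km.labels_ == c).sum()) for c in range(4)])
```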
Abstract:
This thesis introduces a new way of using prior information in a spatial model and develops scalable algorithms for fitting this model to large imaging datasets. These methods are employed for image-guided radiation therapy and satellite based classification of land use and water quality. This study has utilized a pre-computation step to achieve a hundredfold improvement in the elapsed runtime for model fitting. This makes it much more feasible to apply these models to real-world problems, and enables full Bayesian inference for images with a million or more pixels.
Abstract:
Embedded many-core architectures contain dozens to hundreds of CPU cores that are connected via a highly scalable NoC interconnect. Our multiprocessor system-on-chip CoreVAMPSoC combines the advantages of tightly coupled bus-based communication with the scalability of NoC approaches by adding a CPU cluster as an additional level of hierarchy. In this work, we analyze different cluster interconnect implementations with 8 to 32 CPUs and compare them in terms of resource requirements and performance to hierarchical NoC approaches. Using 28 nm FD-SOI technology, the area requirement for 32 CPUs and an AXI crossbar is 5.59 mm², including 23.61% for the interconnect, at a clock frequency of 830 MHz. In comparison, a hierarchical MPSoC with 4 CPU clusters and 8 CPUs per cluster requires only 4.83 mm², including 11.61% for the interconnect. To evaluate the performance, we use a compiler for streaming applications to map programs to the different MPSoC configurations. We use this approach for a design-space exploration to find the most efficient architecture and partitioning for an application.
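A quick back-of-envelope check of the interconnect areas implied by the quoted figures, under the assumption that the percentages refer to the total MPSoC area:

```python
# Interconnect area implied by the quoted totals and percentages.
flat_total, flat_share = 5.59, 0.2361      # 32 CPUs on one AXI crossbar
hier_total, hier_share = 4.83, 0.1161      # 4 clusters x 8 CPUs

flat_interconnect = flat_total * flat_share   # ~1.32 mm^2
hier_interconnect = hier_total * hier_share   # ~0.56 mm^2

print(f"flat crossbar interconnect: {flat_interconnect:.2f} mm^2")
print(f"hierarchical interconnect:  {hier_interconnect:.2f} mm^2")
```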
Abstract:
The only effective and scalable way to regulate the actions of people on the internet is through online intermediaries. These are the institutions that facilitate communication: internet service providers, search engines, content hosts, and social networks. Governments, private firms, and civil society organisations are increasingly seeking to influence these intermediaries to take more responsibility to prevent or respond to IP infringements. Around the world, intermediaries are increasingly subject to a variety of obligations to help enforce IP rights, ranging from informal social and governmental pressure, to industry codes and private negotiated agreements, to formal legislative schemes. This paper provides an overview of this emerging shift in regulatory approaches, away from legal liability and towards increased responsibilities for intermediaries. This shift straddles two different potential futures: an optimistic set of more effective, more efficient mechanisms for regulating user behaviour, and a dystopian vision of rule by algorithm and private power, without the legitimising influence of the rule of law.
Abstract:
Organisations are constantly seeking new ways to improve operational efficiencies. This study investigates a novel way to identify potential efficiency gains in business operations by observing how they were carried out in the past and then exploring better ways of executing them, taking into account trade-offs between time, cost and resource utilisation. This paper demonstrates how these trade-offs can be incorporated in the assessment of alternative process execution scenarios by making use of a cost environment. A number of optimisation techniques are proposed to explore and assess alternative execution scenarios; the objective function is represented by a cost structure that captures different process dimensions. An experimental evaluation is conducted to analyse the performance and scalability of the optimisation techniques: integer linear programming (ILP), hill climbing, tabu search, and our earlier proposed hybrid genetic algorithm. The findings demonstrate that the hybrid genetic algorithm is scalable and performs better than the other techniques. Moreover, we argue that the use of ILP is unrealistic in this setup and cannot handle complex cost functions such as the ones we propose. Finally, we show how cost-related insights can be gained from improved execution scenarios and how these can be utilised to put forward recommendations for reducing process-related cost and overhead within organisations.
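The abstract names several search techniques over a cost function; the sketch below shows the simplest of them, hill climbing, over hypothetical execution scenarios. The scenario encoding (task-to-resource assignment), the durations, rates and weights are illustrative assumptions, not the paper's cost environment.

```python
# Sketch: hill climbing over execution scenarios with a toy cost function
# combining processing time and monetary cost.
import random

random.seed(0)
N_TASKS, N_RESOURCES = 12, 4
DURATION = [[random.uniform(1, 5) for _ in range(N_RESOURCES)] for _ in range(N_TASKS)]
RATE = [10, 14, 8, 20]   # hourly cost per resource (assumed)

def cost(assignment, w_time=1.0, w_cost=0.5):
    """Weighted sum of total processing time and monetary cost."""
    time = sum(DURATION[t][r] for t, r in enumerate(assignment))
    money = sum(DURATION[t][r] * RATE[r] for t, r in enumerate(assignment))
    return w_time * time + w_cost * money

current = [random.randrange(N_RESOURCES) for _ in range(N_TASKS)]
improved = True
while improved:                       # keep moving to a cheaper neighbour
    improved = False
    for t in range(N_TASKS):
        for r in range(N_RESOURCES):
            candidate = current[:t] + [r] + current[t + 1:]
            if cost(candidate) < cost(current):
                current, improved = candidate, True

print("best scenario:", current, "cost:", round(cost(current), 2))
```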
Abstract:
Experience shows that developing business applications based on text analysis normally requires considerable time and expertise in computational linguistics. Several approaches to integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach that would enable building scalable and flexible text analysis applications in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experience in building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components, and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text-analytics-enabled business application that addresses a concrete business scenario. © 2010 IEEE.
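To make the reusable-component idea concrete, the sketch below chains interchangeable text-processing components behind one shared interface. The component names, stages and the lexicon standing in for a knowledge resource are assumptions, not the architecture proposed in the paper.

```python
# Sketch: composable text-processing services sharing one interface.
from typing import Protocol

class TextService(Protocol):
    def process(self, doc: dict) -> dict: ...

class Tokenizer:
    def process(self, doc: dict) -> dict:
        doc["tokens"] = doc["text"].lower().split()
        return doc

class KeywordTagger:
    def __init__(self, lexicon: set[str]):
        self.lexicon = lexicon            # stand-in for a knowledge resource
    def process(self, doc: dict) -> dict:
        doc["tags"] = [t for t in doc["tokens"] if t in self.lexicon]
        return doc

def run_pipeline(doc: dict, services: list[TextService]) -> dict:
    for service in services:             # components are freely reorderable/reusable
        doc = service.process(doc)
    return doc

result = run_pipeline({"text": "Invoice overdue: escalate to billing"},
                      [Tokenizer(), KeywordTagger({"invoice", "billing"})])
print(result["tags"])
```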
Abstract:
The requirement of distributed computing of all-to-all comparison (ATAC) problems in heterogeneous systems is increasingly important in various domains. Though Hadoop-based solutions are widely used, they are inefficient for the ATAC pattern, which is fundamentally different from the MapReduce pattern for which Hadoop is designed. They exhibit poor data locality and unbalanced allocation of comparison tasks, particularly in heterogeneous systems. This results in massive data movement at runtime and ineffective utilization of computing resources, significantly affecting the overall computing performance. To address these problems, a scalable and efficient data and task distribution strategy is presented in this paper for processing large-scale ATAC problems in heterogeneous systems. It not only saves storage space but also achieves load balancing and good data locality for all comparison tasks. Experiments with bioinformatics examples show that about 89% of the ideal performance capacity of the multiple machines was achieved using the approach presented in this paper.
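To illustrate the shape of the problem, the sketch below generates the all-to-all comparison tasks for a small dataset and spreads them over heterogeneous workers in proportion to their capacity. The capacity-weighted split is an illustrative stand-in, not the paper's distribution strategy, and the worker names and speeds are assumptions.

```python
# Sketch: all-to-all comparison task generation and capacity-weighted
# distribution across heterogeneous workers.
from itertools import combinations

data_items = [f"seq{i}" for i in range(10)]            # e.g. genome sequences
tasks = list(combinations(data_items, 2))              # n*(n-1)/2 comparisons

workers = {"fast-node": 4, "mid-node": 2, "slow-node": 1}   # relative speeds
total = sum(workers.values())

assignment, start = {}, 0
for name, capacity in workers.items():
    share = round(len(tasks) * capacity / total)        # proportional slice
    assignment[name] = tasks[start:start + share]
    start += share
assignment[name].extend(tasks[start:])                   # remainder to last worker

for name, chunk in assignment.items():
    print(f"{name}: {len(chunk)} comparison tasks")
```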