51 resultados para downloading of data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Burst timing synchronisation is maintained in a digital data decoder during multiple burst reception in a TDMA system. The data within a multiple burst are streamed into memory storage and data corresponding to a first burst in the series of bursts are selected on the basis of a current timing estimate derived from a synchronisation burst. Selections of data corresponding to other bursts in the series of bursts are modified in accordance with updated timing estimates derived from previously processed bursts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

 In the last decade, a vast number of land surface schemes has been designed for use in global climate models, atmospheric weather prediction, mesoscale numerical models, ecological models, and models of global changes. Since land surface schemes are designed for different purposes they have various levels of complexity in the treatment of bare soil processes, vegetation, and soil water movement. This paper is a contribution to a little group of papers dealing with intercomparison of differently designed and oriented land surface schemes. For that purpose we have chosen three schemes for classification: i) global climate models, BATS (Dickinson et al., 1986; Dickinson et al., 1992); ii) mesoscale and ecological models, LEAF (Lee, 1992) and iii) mesoscale models, LAPS (Mihailović, 1996; Mihailović and Kallos, 1997; Mihailović et al., 1999) according to the Shao et al. (1995) classification. These schemes were compared using surface fluxes and leaf temperature outputs obtained by time integrations of data sets derived from the micrometeorological measurements above a maize field at an experimental site in De Sinderhoeve (The Netherlands) for 18 August, 8 September, and 4 October 1988. Finally, comparison of the schemes was supported applying a simple statistical analysis on the surface flux outputs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

 In the last decade, a vast number of land surface schemes has been designed for use in global climate models, atmospheric weather prediction, mesoscale numerical models, ecological models, and models of global changes. Since land surface schemes are designed for different purposes they have various levels of complexity in the treatment of bare soil processes, vegetation, and soil water movement. This paper is a contribution to a little group of papers dealing with intercomparison of differently designed and oriented land surface schemes. For that purpose we have chosen three schemes for classification: i) global climate models, BATS (Dickinson et al., 1986; Dickinson et al., 1992); ii) mesoscale and ecological models, LEAF (Lee, 1992) and iii) mesoscale models, LAPS (Mihailović, 1996; Mihailović and Kallos, 1997; Mihailović et al., 1999) according to the Shao et al. (1995) classification. These schemes were compared using surface fluxes and leaf temperature outputs obtained by time integrations of data sets derived from the micrometeorological measurements above a maize field at an experimental site in De Sinderhoeve (The Netherlands) for 18 August, 8 September, and 4 October 1988. Finally, comparison of the schemes was supported applying a simple statistical analysis on the surface flux outputs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Progress in functional neuroimaging of the brain increasingly relies on the integration of data from complementary imaging modalities in order to improve spatiotemporal resolution and interpretability. However, the usefulness of merely statistical combinations is limited, since neural signal sources differ between modalities and are related non-trivially. We demonstrate here that a mean field model of brain activity can simultaneously predict EEG and fMRI BOLD with proper signal generation and expression. Simulations are shown using a realistic head model based on structural MRI, which includes both dense short-range background connectivity and long-range specific connectivity between brain regions. The distribution of modeled neural masses is comparable to the spatial resolution of fMRI BOLD, and the temporal resolution of the modeled dynamics, importantly including activity conduction, matches the fastest known EEG phenomena. The creation of a cortical mean field model with anatomically sound geometry, extensive connectivity, and proper signal expression is an important first step towards the model-based integration of multimodal neuroimages.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Considerable progress has taken place in numerical weather prediction over the last decade. It has been possible to extend predictive skills in the extra-tropics of the Northern Hemisphere during the winter from less than five days to seven days. Similar improvements, albeit on a lower level, have taken place in the Southern Hemisphere. Another example of improvement in the forecasts is the prediction of intense synoptic phenomena such as cyclogenesis which on the whole is quite successful with the most advanced operational models (Bengtsson (1989), Gadd and Kruze (1988)). A careful examination shows that there are no single causes for the improvements in predictive skill, but instead they are due to several different factors encompassing the forecasting system as a whole (Bengtsson, 1985). In this paper we will focus our attention on the role of data-assimilation and the effect it may have on reducing the initial error and hence improving the forecast. The first part of the paper contains a theoretical discussion on error growth in simple data assimilation systems, following Leith (1983). In the second part we will apply the result on actual forecast data from ECMWF. The potential for further forecast improvements within the framework of the present observing system in the two hemispheres will be discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrate that intrarray variability is small (only around 2 of the mean log signal), while interarray variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6 of mean). The common practise of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflect the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identifies an underlying structure which reflect some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and collected by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming normalising and visualising transcriptomic array data from Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non biological) intra- and interarray variability, and for a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further more extensive studies of the systems biology of eukaryotic cells.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Palaeodata in synthesis form are needed as benchmarks for the Palaeoclimate Modelling Intercomparison Project (PMIP). Advances since the last synthesis of terrestrial palaeodata from the last glacial maximum (LGM) call for a new evaluation, especially of data from the tropics. Here pollen, plant-macrofossil, lake-level, noble gas (from groundwater) and δ18O (from speleothems) data are compiled for 18±2 ka (14C), 32 °N–33 °S. The reliability of the data was evaluated using explicit criteria and some types of data were re-analysed using consistent methods in order to derive a set of mutually consistent palaeoclimate estimates of mean temperature of the coldest month (MTCO), mean annual temperature (MAT), plant available moisture (PAM) and runoff (P-E). Cold-month temperature (MAT) anomalies from plant data range from −1 to −2 K near sea level in Indonesia and the S Pacific, through −6 to −8 K at many high-elevation sites to −8 to −15 K in S China and the SE USA. MAT anomalies from groundwater or speleothems seem more uniform (−4 to −6 K), but the data are as yet sparse; a clear divergence between MAT and cold-month estimates from the same region is seen only in the SE USA, where cold-air advection is expected to have enhanced cooling in winter. Regression of all cold-month anomalies against site elevation yielded an estimated average cooling of −2.5 to −3 K at modern sea level, increasing to ≈−6 K by 3000 m. However, Neotropical sites showed larger than the average sea-level cooling (−5 to −6 K) and a non-significant elevation effect, whereas W and S Pacific sites showed much less sea-level cooling (−1 K) and a stronger elevation effect. These findings support the inference that tropical sea-surface temperatures (SSTs) were lower than the CLIMAP estimates, but they limit the plausible average tropical sea-surface cooling, and they support the existence of CLIMAP-like geographic patterns in SST anomalies. Trends of PAM and lake levels indicate wet LGM conditions in the W USA, and at the highest elevations, with generally dry conditions elsewhere. These results suggest a colder-than-present ocean surface producing a weaker hydrological cycle, more arid continents, and arguably steeper-than-present terrestrial lapse rates. Such linkages are supported by recent observations on freezing-level height and tropical SSTs; moreover, simulations of “greenhouse” and LGM climates point to several possible feedback processes by which low-level temperature anomalies might be amplified aloft.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams are required to be processed on small ubiquitous devices like smartphones and sensor devices. Mobile and ubiquitous data mining target these applications with tailored techniques and approaches addressing scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed. This survey will cover both categories. Mining mobile and ubiquitous data require algorithms with the ability to monitor and adapt the working conditions to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed in the Collaborative Data Stream Mining, where agents share knowledge to learn adaptive accurate models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose of this study was to develop an understanding of the current state of scientific data sharing that stakeholders could use to develop and implement effective data sharing strategies and policies. The study developed a conceptual model to describe the process of data sharing, and the drivers, barriers, and enablers that determine stakeholder engagement. The conceptual model was used as a framework to structure discussions and interviews with key members of all stakeholder groups. Analysis of data obtained from interviewees identified a number of themes that highlight key requirements for the development of a mature data sharing culture.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The term 'big data' has recently emerged to describe a range of technological and commercial trends enabling the storage and analysis of huge amounts of customer data, such as that generated by social networks and mobile devices. Much of the commercial promise of big data is in the ability to generate valuable insights from collecting new types and volumes of data in ways that were not previously economically viable. At the same time a number of questions have been raised about the implications for individual privacy. This paper explores key perspectives underlying the emergence of big data, and considers both the opportunities and ethical challenges raised for market research.