952 results for Dynamic data set visualization
Abstract:
Open Research Data - A step-by-step guide through the research data lifecycle: data set creation, big data vs. the long tail, metadata, data centres/data repositories, open access for data, data sharing, data citation and publication.
Abstract:
Big data comes in various types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category of the bigness taxonomy it falls into. Large-p, small-n data sets, for instance, require a different set of tools from the large-n, small-p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. Indeed, it is important to emphasize right away that the so-called no-free-lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
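To make the large-p, small-n case concrete, the following sketch (illustrative only, not code from the paper) combines standardization with an L1 penalty, one instance of the regularization/penalization/selection tools listed above, on a synthetic data set with 2000 predictors and 100 observations.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 100, 2000                       # n << p: the "large p, small n" regime
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 3.0                        # only 10 truly informative predictors
y = X @ beta + rng.normal(size=n)

# Standardization followed by an L1 penalty (regularization + selection).
model = make_pipeline(StandardScaler(), LassoCV(cv=5))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f}")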
Abstract:
Most current 3D landscape visualisation systems either use bespoke hardware solutions or offer a limited amount of interaction and detail when used in real-time mode. We are developing a modular, data-driven 3D visualisation system that can be readily customised to specific requirements. By utilising the latest software engineering methods and bringing a dynamic, data-driven approach to geo-spatial data visualisation, we will deliver an unparalleled level of customisation in near-photorealistic, real-time 3D landscape visualisation. In this paper we present the system framework and describe how it employs data-driven techniques. In particular, we discuss how data-driven approaches are applied to the spatiotemporal management aspect of the application framework and describe the advantages these convey.
Abstract:
The seminal multiple-view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis (MVS) methodology. The somewhat small size and variability of these data sets, however, limit their scope and the conclusions that can be drawn from them. To facilitate further development within MVS, we here present a new and varied data set consisting of 80 scenes, seen from 49 or 64 accurate camera positions. This is accompanied by accurate structured-light scans for reference and evaluation. In addition, all images are taken under seven different lighting conditions. As a benchmark, and to validate the use of our data set for obtaining reasonable and statistically significant findings about MVS, we have applied the three state-of-the-art MVS algorithms by Campbell et al., Furukawa et al., and Tola et al. to the data set. To do this we have extended the evaluation protocol from the Middlebury evaluation, necessitated by the more complex geometry of some of our scenes. The data set and accompanying evaluation framework are made freely available online. Based on this evaluation, we are able to observe several characteristics of state-of-the-art MVS, e.g. that there is a trade-off between the quality of the reconstructed 3D points (accuracy) and how much of an object's surface is captured (completeness). Also, several issues that we hypothesized would challenge MVS, such as specularities and changing lighting conditions, did not pose serious problems. Our study finds that the two most pressing issues for MVS are lack of texture and meshing (forming 3D points into closed triangulated surfaces).
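As a rough illustration of the accuracy/completeness trade-off mentioned above (this is not the released evaluation framework), the sketch below measures accuracy as the nearest-neighbour distance from each reconstructed point to a reference scan, and completeness as the distance from each reference point to the reconstruction.

import numpy as np
from scipy.spatial import cKDTree

def accuracy_completeness(recon_pts, ref_pts):
    """accuracy: distances from reconstructed points to the reference scan;
       completeness: distances from reference points to the reconstruction."""
    d_acc = cKDTree(ref_pts).query(recon_pts)[0]    # reconstruction -> reference
    d_comp = cKDTree(recon_pts).query(ref_pts)[0]   # reference -> reconstruction
    return np.median(d_acc), np.median(d_comp)      # medians are robust to outliers

# Toy usage with random point clouds (a real evaluation uses the scanned geometry).
rng = np.random.default_rng(1)
acc, comp = accuracy_completeness(rng.random((5000, 3)), rng.random((8000, 3)))
print(f"accuracy ~ {acc:.4f}, completeness ~ {comp:.4f}")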
Abstract:
Price impact functions show the relative price change caused by an order of a given value. Knowledge of the price impact function helps market participants forecast the price impact of the orders they will submit in the future, estimate the extra trading cost arising from price changes, and design an optimal trading algorithm. The method we have developed allows market participants to determine a virtual price impact function simply and quickly, without knowledge of the full order book: we present the relationship between the price impact function and liquidity measures, and show how a price impact function can be estimated from the time series of the Budapest Liquidity Measure (BLM). The methodology is illustrated on the time series of the OTP share, with a virtual price impact function estimated from the share's BLM data for the period 1 January 2007 to 3 June 2011. In the empirical analysis we examine the evolution of the price impact function over time and its basic statistical properties, which gives a picture of the past behaviour of the transaction costs that arise in the absence of liquidity. This information can, for example, help traders in dynamic portfolio optimization.
Abstract:
Airborne Light Detection and Ranging (LIDAR) technology has become the primary method to derive high-resolution Digital Terrain Models (DTMs), which are essential for studying Earth's surface processes, such as flooding and landslides. The critical step in generating a DTM is to separate ground and non-ground measurements in a voluminous LIDAR point dataset using a filter, because the DTM is created by interpolating ground points. As one of the widely used filtering methods, the progressive morphological (PM) filter has the advantages of classifying LIDAR data at the point level, linear computational complexity, and preservation of the geometric shapes of terrain features. The filter works well in urban settings with gentle slopes and a mixture of vegetation and buildings. However, the PM filter often removes ground measurements incorrectly in topographically high areas, along with large non-ground objects, because it uses a constant slope threshold, resulting in "cut-off" errors. A novel cluster analysis method was developed in this study and incorporated into the PM filter to prevent the removal of ground measurements at topographic highs. Furthermore, to obtain optimal filtering results for areas with undulating terrain, a trend analysis method was developed to adaptively estimate the slope-related thresholds of the PM filter based on changes in topographic slope and the characteristics of non-terrain objects. A comparison of the PM and generalized adaptive PM (GAPM) filters for selected study areas indicates that the GAPM filter preserves most of the "cut-off" points incorrectly removed by the PM filter. The application of the GAPM filter to seven ISPRS benchmark datasets shows that the GAPM filter reduces the filtering error by 20% on average, compared with the method used by the popular commercial software TerraScan. The combination of the cluster method, adaptive trend analysis, and the PM filter allows users without much experience in processing LIDAR data to effectively and efficiently identify ground measurements for complex terrain in a large LIDAR data set. The GAPM filter is highly automatic and requires little human input; therefore, it can significantly reduce the effort of manually processing voluminous LIDAR measurements.
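A heavily simplified, grid-based sketch of the progressive morphological filtering idea follows (illustrative only; the cluster analysis and adaptive trend analysis of the GAPM filter are omitted, and the window sizes and slope threshold are assumed values).

import numpy as np
from scipy.ndimage import grey_opening

def pm_filter(z, cell=1.0, windows=(3, 9, 21), slope=0.3, dh0=0.3, dh_max=2.5):
    """Return a boolean mask of grid cells classified as ground."""
    ground = np.ones_like(z, dtype=bool)
    surface = z.copy()
    for w in windows:
        opened = grey_opening(surface, size=(w, w))        # morphological opening
        # Elevation-difference threshold grows with window size and terrain slope.
        dh = min(slope * (w - 1) * cell + dh0, dh_max)
        ground &= (surface - opened) <= dh                 # keep cells close to the opened surface
        surface = opened
    return ground

z = np.random.default_rng(2).random((200, 200)) * 2.0      # toy minimum-elevation grid
print("ground fraction:", pm_filter(z).mean())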
Abstract:
The purpose of this project was to evaluate the use of remote sensing 1) to detect and map Everglades wetland plant communities at different scales; and 2) to compare map products delineated and resampled at various scales, with the intent to quantify and describe the quantitative and qualitative differences between such products. We evaluated data provided by DigitalGlobe's WorldView-2 (WV2) sensor, with a spatial resolution of 2 m, and data from Landsat's Thematic Mapper and Enhanced Thematic Mapper (TM and ETM+) sensors, with a spatial resolution of 30 m. We were also interested in the comparability and scalability of products derived from these data sources. The adequacy of each data set for mapping wetland plant communities was evaluated using two metrics: 1) model-based accuracy estimates of the classification procedures; and 2) design-based post-classification accuracy estimates of the derived maps.
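For the design-based post-classification accuracy estimates, a minimal sketch of the usual error-matrix computation is shown below; the class labels are hypothetical placeholders, not the project's actual legend.

from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

# Hypothetical reference (field) and map labels for a handful of validation sites.
reference = ["sawgrass", "cattail", "sawgrass", "open water", "cattail", "sawgrass"]
mapped    = ["sawgrass", "sawgrass", "sawgrass", "open water", "cattail", "cattail"]

print(confusion_matrix(reference, mapped, labels=["sawgrass", "cattail", "open water"]))
print("overall accuracy:", accuracy_score(reference, mapped))
print("kappa:", cohen_kappa_score(reference, mapped))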
Abstract:
We provide a compilation of downward fluxes (total mass, POC, PON, BSiO2, CaCO3, PIC and lithogenic/terrigenous fluxes) from over 6000 sediment trap measurements distributed in the Atlantic Ocean, from 30 degrees North to 49 degrees South, covering the period 1982-2011. Data from the Mediterranean Sea are also included. Data were compiled from different sources: data repositories (BCO-DMO, PANGAEA), time-series sites (BATS, CARIACO), published scientific papers and/or personal communications from PIs. All sources are specified in the data set. Data from the World Ocean Atlas 2009 were extracted to provide each flux observation with contextual environmental data, such as temperature, salinity, oxygen (concentration, AOU and percentage saturation), nitrate, phosphate and silicate.
Abstract:
The Corvio sandstone is a ~20 m thick unit (Corvio Formation) that appears in the top section of the Frontada Formation (Campoó Group; Lower Cretaceous), located in northern Spain on the southern margin of the Basque-Cantabrian Basin. Up to 228 plugs were cored from four 0.3 x 0.2 x 0.5 m blocks of Corvio sandstone to perform a comprehensive characterization of the physical, mineralogical, geomechanical, geophysical and hydrodynamic properties of this geological formation, and an anisotropic assessment of the most relevant parameters. Here we present the first data set, obtained on 53 plugs, which covers (i) basic physical and chemical properties, including density, porosity, specific surface area and elemental analysis (XRF, CHNS); (ii) the curves obtained during unconfined and confined strength tests, the tensile strengths, the calculated static elastic moduli and the characteristic stress levels describing the brittle behaviour of the rock; (iii) P- and S-wave velocities (and dynamic elastic moduli), their respective attenuation factors Qp and Qs, and electrical resistivity over a wide range of confining stresses; and (iv) permeability and tracer transport tests. Furthermore, the geophysical, permeability and transport tests were additionally performed along the three main orthogonal directions of the original blocks, in order to complete a preliminary anisotropic assessment of the Corvio sandstone.
Abstract:
Reconstructing past modes of ocean circulation is an essential task in paleoclimatology and paleoceanography. To this end, we combine two sedimentary proxies, Nd isotopes (epsilon-Nd) and the 231Pa/230Th ratio, neither of which is directly involved in the global carbon cycle; they allow the reconstruction of water-mass provenance and provide information about the past strength of overturning circulation, respectively. In this study, combined 231Pa/230Th and epsilon-Nd down-core profiles from six Atlantic Ocean sediment cores are presented. The data set is complemented by the two combined data sets available in the literature. From this we derive a comprehensive picture of the spatial and temporal patterns and dynamic changes of the Atlantic Meridional Overturning Circulation (AMOC) over the past ~25 ka. Our results provide evidence for a consistent pattern of glacial/stadial advances of Southern Sourced Water, along with a northward circulation mode, for all cores in the deeper (>3000 m) Atlantic. Results from shallower core sites support an active overturning cell of shoaled Northern Sourced Water during the LGM and the subsequent deglaciation. Furthermore, we report evidence for a short-lived period of intensified AMOC in the early Holocene.
Abstract:
We consider a class of initial data sets (Σ, h, K) for the Einstein constraint equations which we define to be generalized Brill (GB) data. This class of data is simply connected, U(1)²-invariant, maximal, and four-dimensional with two asymptotic ends. We study the properties of GB data and in particular the topology of Σ. GB initial data sets have applications in geometric inequalities in general relativity. We construct a mass functional M for GB initial data sets and we show: (i) the mass of any GB data set is greater than or equal to M; (ii) M is a non-negative functional for a broad subclass of GB data; (iii) M evaluates to the ADM mass for reduced t−φ^i symmetric data sets; (iv) the critical points of M are stationary U(1)²-invariant vacuum solutions to the Einstein equations. We then use this mass functional to prove two geometric inequalities: (1) a positive mass theorem for a subclass of GB initial data which includes Myers-Perry black holes; (2) a class of local mass-angular momenta inequalities for U(1)²-invariant black holes. Finally, we construct a one-parameter family of initial data sets which can be seen as small deformations of the extreme Myers-Perry black hole that preserve the horizon geometry and angular momenta but have strictly greater energy.
Abstract:
This data set contains seasonal forecasts of sea surface temperature and Arctic sea ice extent from state-of-the-art climate models, along with observational references used to evaluate those forecasts. Common skill scores like the correlation between modelled and observed time series are also reported.
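A minimal sketch of the kind of skill score reported, assuming toy forecast and observed values rather than the actual data set:

import numpy as np

forecast = np.array([5.1, 4.8, 4.5, 4.9, 4.2, 4.0])   # hypothetical sea-ice extents, 10^6 km^2
observed = np.array([5.3, 4.6, 4.4, 5.0, 4.1, 3.8])

corr = np.corrcoef(forecast, observed)[0, 1]           # correlation skill score
rmse = np.sqrt(np.mean((forecast - observed) ** 2))    # error for reference
print(f"correlation skill: {corr:.2f}, RMSE: {rmse:.2f}")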
Abstract:
Brain-computer interfaces (BCIs) have the potential to restore communication or control abilities in individuals with severe neuromuscular limitations, such as those with amyotrophic lateral sclerosis (ALS). The role of a BCI is to extract and decode relevant information that conveys a user's intent directly from brain electrophysiological signals and translate this information into executable commands to control external devices. However, the BCI decision-making process is error-prone due to noisy electrophysiological data, representing the classic problem of efficiently transmitting and receiving information via a noisy communication channel.
This research focuses on P300-based BCIs, which rely predominantly on event-related potentials (ERPs) that are elicited as a function of a user's uncertainty regarding stimulus events, in either an acoustic or a visual oddball recognition task. The P300-based BCI system enables users to communicate messages from a set of choices by selecting a target character or icon that conveys a desired intent or action. P300-based BCIs have been widely researched as a communication alternative, especially for individuals with ALS, who represent a target BCI user population. For the P300-based BCI, repeated data measurements are required to enhance the low signal-to-noise ratio of the elicited ERPs embedded in electroencephalography (EEG) data, in order to improve the accuracy of the target character estimation process. As a result, BCIs are relatively slow compared with other commercial assistive communication devices, which limits BCI adoption by the target user population. The goal of this research is to develop algorithms that take into account the physical limitations of the target BCI population to improve the efficiency of ERP-based spellers for real-world communication.
In this work, it is hypothesised that building adaptive capabilities into the BCI framework can potentially give the BCI system the flexibility to improve performance by adjusting system parameters in response to changing user inputs. The research in this work addresses three potential areas for improvement within the P300 speller framework: information optimisation, target character estimation and error correction. The visual interface and its operation control the method by which the ERPs are elicited through the presentation of stimulus events. The parameters of the stimulus presentation paradigm can be modified to modulate and enhance the elicited ERPs. A new stimulus presentation paradigm is developed in order to maximise the information content that is presented to the user by tuning stimulus paradigm parameters to positively affect performance. Internally, the BCI system determines the amount of data to collect and the method by which these data are processed to estimate the user's target character. Algorithms that exploit language information are developed to enhance the target character estimation process and to correct erroneous BCI selections. In addition, a new model-based method to predict BCI performance is developed, an approach which is independent of stimulus presentation paradigm and accounts for dynamic data collection. The studies presented in this work provide evidence that the proposed methods for incorporating adaptive strategies in the three areas have the potential to significantly improve BCI communication rates, and the proposed method for predicting BCI performance provides a reliable means to pre-assess BCI performance without extensive online testing.
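As a hedged illustration of two of these ingredients, dynamic data collection and probabilistic target character estimation, the sketch below accumulates noisy classifier scores over flashes under a hypothetical language-model prior and stops once one character's posterior probability exceeds a threshold; it is not the thesis's algorithm, and all parameters and distributions are assumed.

import numpy as np

chars = list("ABCD")                       # toy alphabet
prior = np.array([0.4, 0.3, 0.2, 0.1])     # hypothetical language-model prior
posterior = prior.copy()
rng = np.random.default_rng(3)
target = 2                                 # assume "C" is the attended character

for flash in range(100):
    flashed = rng.integers(len(chars))                      # which character flashed
    score = rng.normal(1.0 if flashed == target else 0.0)   # noisy ERP classifier score
    # Likelihood of the score under "character c is the target", for each c:
    # target flashes ~ N(1, 1), non-target flashes ~ N(0, 1) (assumed model).
    like = np.where(np.arange(len(chars)) == flashed,
                    np.exp(-0.5 * (score - 1.0) ** 2),
                    np.exp(-0.5 * score ** 2))
    posterior = posterior * like
    posterior /= posterior.sum()
    if posterior.max() > 0.95:                              # dynamic stopping rule
        break

print(f"selected '{chars[posterior.argmax()]}' after {flash + 1} flashes")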
Abstract:
Recent research into resting-state functional magnetic resonance imaging (fMRI) has shown that the brain is very active during rest. This thesis work utilizes blood oxygenation level dependent (BOLD) signals to investigate the spatial and temporal functional network information found within resting-state data, and aims to assess the feasibility of extracting functional connectivity networks using different methods, as well as the dynamic variability within some of those methods. Furthermore, this work examines whether valid networks can be produced using a sparsely sampled subset of the original data.
In this work we utilize four main methods: independent component analysis (ICA), principal component analysis (PCA), correlation, and a point-processing technique. Each method comes with unique assumptions, as well as strengths and limitations, for exploring how the resting-state components interact in space and time.
Correlation is perhaps the simplest technique. Using it, resting-state patterns can be identified based on how similar each voxel's time profile is to a seed region's time profile. However, this method requires a seed region and can only identify one resting-state network at a time. This simple correlation technique is able to reproduce the resting-state network using data from a single subject's scan session as well as from 16 subjects.
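A minimal sketch of such a seed-based correlation map, using toy data in place of preprocessed BOLD volumes (the seed ROI here is simply the first 50 synthetic voxels):

import numpy as np

rng = np.random.default_rng(4)
n_timepoints, n_voxels = 240, 5000
bold = rng.normal(size=(n_timepoints, n_voxels))        # time x voxels (toy data)
seed = bold[:, :50].mean(axis=1)                        # mean signal of a "seed" ROI

# Pearson correlation of the seed with every voxel, computed in one pass.
bold_z = (bold - bold.mean(0)) / bold.std(0)
seed_z = (seed - seed.mean()) / seed.std()
corr_map = bold_z.T @ seed_z / n_timepoints             # one r value per voxel
print("max |r|:", np.abs(corr_map).max())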
Independent component analysis, the second technique, has established software implementations. ICA can extract multiple components from a data set in a single analysis. The disadvantage is that the resting-state networks it produces are all independent of each other, which assumes that the spatial pattern of functional connectivity is the same across all time points. ICA successfully reproduces resting-state connectivity patterns for both a single subject and a 16-subject concatenated data set.
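For the decomposition step itself, a short sketch with scikit-learn's FastICA on a toy time-by-voxel matrix (established neuroimaging packages would normally be used instead):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(5)
bold = rng.normal(size=(240, 5000))          # time x voxels (toy data)

ica = FastICA(n_components=20, random_state=0, max_iter=500)
time_courses = ica.fit_transform(bold)       # (time, components)
spatial_maps = ica.components_               # (components, voxels) -> candidate "networks"
print(time_courses.shape, spatial_maps.shape)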
Using principal component analysis, the dimensionality of the data is reduced to find the directions in which the variance of the data is largest. This method uses the same basic matrix operations as ICA, with a few important differences that are outlined later in this text. Using this method, different functional connectivity patterns are sometimes identifiable, but with a large amount of noise and variability.
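An analogous sketch with PCA, where components are ordered by explained variance rather than statistical independence:

import numpy as np
from sklearn.decomposition import PCA

bold = np.random.default_rng(6).normal(size=(240, 5000))   # time x voxels (toy data)
pca = PCA(n_components=20)
time_courses = pca.fit_transform(bold)                      # (time, components)
spatial_maps = pca.components_                              # (components, voxels)
print(pca.explained_variance_ratio_[:5])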
To begin to investigate the dynamics of functional connectivity, the correlation technique is used to compare the first and second halves of a scan session. Minor differences are discernible between the correlation results of the two halves. Further, a sliding-window technique is implemented to study the correlation coefficients for different window sizes over time. From this technique it is apparent that the correlation with the seed region is not static throughout the scan.
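A sketch of the sliding-window idea on a toy seed and voxel time course, with assumed window and step sizes:

import numpy as np

rng = np.random.default_rng(7)
seed = rng.normal(size=240)
voxel = 0.5 * seed + rng.normal(size=240)      # toy voxel partly coupled to the seed

window, step = 60, 10                          # in time points (assumed values)
dynamic_r = [np.corrcoef(seed[t:t + window], voxel[t:t + window])[0, 1]
             for t in range(0, len(seed) - window + 1, step)]
print(np.round(dynamic_r, 2))                  # correlation is not static over the scan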
The last method introduced, a point-process method, is one of the more novel techniques because it does not require analysis of the continuous time series. Here, network information is extracted based on brief occurrences of high- or low-amplitude signals within a seed region. Because point processing uses fewer time points from the data, the statistical power of the results is lower, and there are larger variations in DMN patterns between subjects. In addition to improved computational efficiency, the benefit of using a point-process method is that the patterns produced for different seed regions do not have to be independent of one another.
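A sketch of the point-process idea: keep only the time points at which the seed signal exceeds a high-amplitude threshold and average the whole-brain pattern over those frames (toy data, assumed threshold):

import numpy as np

rng = np.random.default_rng(8)
bold = rng.normal(size=(240, 5000))            # time x voxels (toy data)
seed = bold[:, :50].mean(axis=1)

z = (seed - seed.mean()) / seed.std()
events = np.where(z > 1.0)[0]                  # suprathreshold "events" only
pattern = bold[events].mean(axis=0)            # mean map over the selected frames
print(f"{len(events)} of {len(seed)} time points used; map shape {pattern.shape}")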
This work compares four distinct methods of identifying functional connectivity patterns. ICA is a technique currently used by many scientists studying functional connectivity patterns. The PCA technique is not optimal for the level of noise and the distribution of these data sets. The correlation technique is simple and obtains good results; however, a seed region is needed and the method assumes that the DMN regions are correlated throughout the entire scan. Looking at the more dynamic aspects of correlation, changing patterns of correlation were evident. The point-process method produces promising results, identifying functional connectivity networks using only low- and high-amplitude BOLD signals.
Abstract:
To provide biological insights into transcriptional regulation, a couple of groups have recently presented models relating the transcription factors (TFs) bound at promoter DNA to the downstream gene's mean transcript level or transcript production rate over time. However, transcript production is dynamic, responding to changes in TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA-binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at the same genomic location. Additionally, elements other than TFs regulate transcription. Within the core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for the TSS, and RNAPII initiating transcription. Moreover, it has been proposed that, downstream of the TSS, nucleosomes resist RNAPII elongation.
Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework in the yeast S. cerevisiae for two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; and b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that can predict dynamic transcript production rates from DNA sequences alone: on the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r2 = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, including the best team, as well as a model combining the best team's k-mer-based sequence features with another paper's biologically mechanistic features, in terms of all scoring metrics.
Moreover, our framework shows its capability of identifying generalizable features by interpreting the highly predictive models, and thereby provides support for associated hypothesized mechanisms of transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, release of RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region are more than the TATA box and the nucleosome-free region, suggesting the existence of still-unidentified TAF-dependent and cofactor-dependent core promoter elements in the yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing the regulatory roles of the +1 and -1 nucleosomes in transcription.
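As a hedged sketch of the general approach rather than the paper's actual model, the following represents each promoter by k-mer counts, fits a sparse (L1-penalized) linear model, and reports the Pearson correlation and r2 metrics mentioned above on held-out synthetic data.

import numpy as np
from itertools import product
from scipy.stats import pearsonr
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def kmer_counts(seq, k=3):
    """Count occurrences of every DNA k-mer in a sequence."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return np.array([seq.count(m) for m in kmers])

rng = np.random.default_rng(9)
seqs = ["".join(rng.choice(list("ACGT"), size=200)) for _ in range(300)]   # toy promoters
X = np.array([kmer_counts(s) for s in seqs])
y = X[:, 5] * 0.8 + rng.normal(size=len(seqs))      # synthetic "production rate" target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LassoCV(cv=5).fit(X_tr, y_tr)               # sparse linear model
pred = model.predict(X_te)
print(f"Cp = {pearsonr(y_te, pred)[0]:.3f}, r2 = {r2_score(y_te, pred):.3f}")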