957 results for Imbalanced datasets


Relevance: 20.00%

Abstract:

The results of high-quality nonlinear pulse compression of gain-switched laser diode pulses using a two-cascade compression scheme are presented. The scheme incorporates a dispersive delay line and a nonlinear pulse compressor based on a dispersion-imbalanced fiber loop mirror (DILM). It is demonstrated that the DILM can also be used for pulse compression with a compression ratio of 10 or higher.

Relevance: 20.00%

Abstract:

The nonlinear filtering of a 10 Gb/s data stream in a dispersion-imbalanced fibre loop mirror has been demonstrated over a wide spectral range of 28 nm. A relative extinction ratio of -30 dB for the cw background has been achieved across the whole spectral range.
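To put the -30 dB figure in linear terms: a ratio in decibels converts to a power ratio via 10^(dB/10), so -30 dB corresponds to the cw background being suppressed to one thousandth of its unfiltered level. This assumes the extinction ratio is defined on optical power, which the abstract does not state explicitly; the helper names below are illustrative.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert a ratio expressed in decibels to a linear power ratio."""
    return 10 ** (db / 10)


def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(ratio)


# The -30 dB relative extinction ratio quoted above, as a linear power ratio
print(f"{db_to_power_ratio(-30):.3g}")  # 0.001
```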

Relevance: 20.00%

Abstract:

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.
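The abstract says the per-dataset DMA mixture models are coupled through agreement parameters. A minimal sketch of that coupling for two datasets, assuming the form given in the full MDI paper rather than in this abstract: the prior probability that one item is allocated to cluster c1 in dataset 1 and c2 in dataset 2 is proportional to pi1[c1] * pi2[c2] * (1 + phi12 * [c1 == c2]), where pi1 and pi2 are mixture weights and phi12 >= 0 is the agreement parameter. The function name, weights and phi values are hypothetical.

```python
import numpy as np


def joint_allocation_prior(pi1: np.ndarray, pi2: np.ndarray, phi12: float) -> np.ndarray:
    """Normalized prior over (c1, c2) cluster pairs for one item in two datasets.

    Assumed MDI-style coupling:
        p(c1, c2) is proportional to pi1[c1] * pi2[c2] * (1 + phi12 * [c1 == c2])
    """
    if phi12 < 0:
        raise ValueError("phi12 must be non-negative")
    unnormalized = np.outer(pi1, pi2)        # independent product of mixture weights
    n = min(len(pi1), len(pi2))
    idx = np.arange(n)
    unnormalized[idx, idx] *= 1.0 + phi12    # up-weight allocations that agree
    return unnormalized / unnormalized.sum()


pi = np.full(3, 1 / 3)  # hypothetical uniform weights over 3 clusters
for phi in (0.0, 5.0):
    p = joint_allocation_prior(pi, pi, phi)
    print(f"phi12={phi}: P(c1 == c2) = {np.trace(p):.3f}")
# phi12=0.0: P(c1 == c2) = 0.333
# phi12=5.0: P(c1 == c2) = 0.750
```

With phi12 = 0 the two allocations are independent; larger phi12 makes it more likely that an item falls in the same cluster in both datasets, which is how shared structure is expressed.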

Relevance: 20.00%

Abstract:

Long-term biological time-series in the oceans are relatively rare. Using the two longest of these, we show how the potential policy value of such ecological time-series increases through space and time. We also explore the co-evolution of these oceanic biological time-series with changing marine management drivers. Lessons learnt from reviewing these sequences of observations provide valuable context for the continuation of existing time-series and perspective for the initiation of new time-series in response to rapid global change. The concluding sections call for a more integrated approach to marine observation systems and highlight the future role of ocean observations in adaptive marine management.

Relevance: 20.00%

Abstract:

The use of in situ measurements is essential in the validation and evaluation of the algorithms that provide coastal water quality data products from ocean colour satellite remote sensing. Over the past decade, various types of ocean colour algorithms have been developed to deal with the optical complexity of coastal waters. Yet a comprehensive intercomparison is lacking because of the limited availability of quality-checked in situ databases. The CoastColour Round Robin (CCRR) project, funded by the European Space Agency (ESA), was designed to bring together three reference data sets and to use these to test algorithms and assess their accuracy for retrieving water quality parameters. This paper provides a detailed description of these reference data sets, which include Medium Resolution Imaging Spectrometer (MERIS) level 2 match-ups, in situ reflectance measurements, and synthetic data generated by a radiative transfer model (HydroLight). These data sets, representing mainly coastal waters, are available from doi:10.1594/PANGAEA.841950. The data sets mainly consist of 6484 marine reflectance spectra (either multispectral or hyperspectral) associated with various geometrical conditions (sensor viewing and solar angles), sky conditions and water constituents: total suspended matter (TSM) and chlorophyll a (CHL) concentrations, and the absorption of coloured dissolved organic matter (CDOM). Inherent optical properties are also provided in the simulated data sets (5000 simulations) and from 3054 match-up locations. The distributions of reflectance at selected MERIS bands and band ratios, and of CHL and TSM as a function of reflectance, are compared across the three data sets. Match-up and in situ sites where deviations occur are identified. The distributions of the three reflectance data sets are also compared to the simulated and in situ reflectances used previously by the International Ocean Colour Coordinating Group (IOCCG, 2006) for algorithm testing, showing that the CCRR data clearly extend coverage to more turbid waters.

Relevance: 20.00%

Abstract:

The effect of using different datasets in the modelling of the Ni-like Gd x-ray laser (XRL) is examined with the 1.5D hydro-atomic code EHYBRID. Two atomic datasets, including energy levels and radiative and collisional excitation rates, are used as input data for the code. The behaviour of the XRL is found to differ somewhat from what might be expected from a superficial examination of the atomic data. The similarities in the gain profiles at low densities are found to have encouraging implications for our attempts to model XRLs.

Relevance: 20.00%

Abstract:

Spectral signal intensities, especially in 'real-world' applications where sample presentation is not standardized because of uncontrolled variables or factors, commonly require additional spectral processing to normalize signal intensity effectively. In this study, we demonstrate the complexity of choosing a normalization routine in the presence of multiple spectrally distinct constituents by probing a dataset of Raman spectra. In the Raman spectra of these complex biological samples, variation in absolute signal intensity (90.1% of the total variance) swamps the variation in useful signals (9.4% of the total variance), degrading the dataset's diagnostic and evaluative potential.
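The abstract does not name the normalization routines that were compared. As a hedged illustration of the general idea, the sketch below applies three common choices (total-area, vector/L2 and standard normal variate, SNV) to a hypothetical synthetic spectrum and to a copy scaled by 3x, and checks that each removes a purely multiplicative intensity difference. The band positions and widths are invented for illustration; real biological samples also differ in baseline and composition, which is where choosing a routine becomes difficult.

```python
import numpy as np


def area_normalize(spectrum: np.ndarray) -> np.ndarray:
    """Scale so that the summed intensity (area with unit spacing) equals 1."""
    return spectrum / spectrum.sum()


def vector_normalize(spectrum: np.ndarray) -> np.ndarray:
    """Scale to unit Euclidean (L2) norm."""
    return spectrum / np.linalg.norm(spectrum)


def snv(spectrum: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre to zero mean, scale to unit standard deviation."""
    return (spectrum - spectrum.mean()) / spectrum.std()


# Hypothetical spectrum: two Gaussian bands on a flat baseline
shift = np.linspace(400, 1800, 701)  # illustrative Raman shift axis, cm^-1
base = (0.1
        + np.exp(-((shift - 1004) / 8) ** 2)
        + 0.5 * np.exp(-((shift - 1450) / 15) ** 2))
brighter = 3.0 * base  # same sample measured with 3x overall intensity

for name, norm in [("area", area_normalize), ("vector", vector_normalize), ("SNV", snv)]:
    max_diff = np.max(np.abs(norm(base) - norm(brighter)))
    print(f"{name:>6}: max difference after normalization = {max_diff:.2e}")
```

All three cancel a global scale factor; they differ in how they respond to offsets and to changes in the relative amounts of constituents, so they are not interchangeable on mixed samples.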

Relevance: 20.00%

Abstract:

The problem of learning from imbalanced data is of critical importance in a large number of application domains and can be a bottleneck for the performance of conventional learning methods that assume a balanced data distribution. The class imbalance problem refers to the situation where one class massively outnumbers the other. If imbalanced data are used directly, the imbalance between the majority and minority classes can bias machine learning models and produce unreliable outcomes. Interest in this research area has been increasing, and a number of algorithms have been developed. However, independent evaluation of these algorithms is limited. This paper evaluates the performance of five representative data sampling methods that deal with class imbalance, namely SMOTE, ADASYN, BorderlineSMOTE, SMOTETomek and RUSBoost. A comparative study is conducted, and the performance of each method is critically analysed in terms of assessment metrics. © 2013 Springer-Verlag.
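Of the five methods evaluated, SMOTE is the core technique that ADASYN, BorderlineSMOTE and SMOTETomek build on: it creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. Below is a minimal NumPy sketch of that interpolation step only, not the code used in the paper; the function name, parameters and toy data are hypothetical. In practice a maintained library such as imbalanced-learn, which provides `SMOTE`, `ADASYN`, `BorderlineSMOTE`, `SMOTETomek` and `RUSBoostClassifier`, is the more typical choice.

```python
import numpy as np


def smote(minority: np.ndarray, n_synthetic: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate synthetic minority samples by SMOTE-style interpolation.

    minority: array of shape (n_samples, n_features) holding the minority class only.
    """
    rng = np.random.default_rng(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class
    diffs = minority[:, None, :] - minority[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # indices of k nearest minority neighbours

    base_idx = rng.integers(0, n, size=n_synthetic)                      # random seed samples
    nn_idx = neighbours[base_idx, rng.integers(0, k, size=n_synthetic)]  # one random neighbour each
    gap = rng.random((n_synthetic, 1))                                   # interpolation factor in [0, 1)
    return minority[base_idx] + gap * (minority[nn_idx] - minority[base_idx])


# Toy example: 95 majority vs 5 minority points in 2-D (hypothetical data)
rng = np.random.default_rng(42)
majority = rng.normal(0.0, 1.0, size=(95, 2))
minority = rng.normal(3.0, 0.5, size=(5, 2))
synthetic = smote(minority, n_synthetic=90, k=3)
print(len(majority), len(minority) + len(synthetic))  # 95 95
```

Because each synthetic point lies on the segment between two real minority points, SMOTE densifies the minority region rather than duplicating samples; the variants differ mainly in which seed samples they favour (ADASYN, BorderlineSMOTE) or in cleaning overlapping pairs afterwards (SMOTETomek).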