982 resultados para datasets


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The effect of differing the datasets used in the modelling of the Ni-like Gd x-ray laser (XRL) is examined through the 1.50 hydro-atomic code, EHYBRID. Two atomic datasets, including energy levels and radiative and collisional excitation rates, are used as input data for the code. It is found that the behaviour of the XRL is somewhat different than might be expected from superficial examination of the atomic data. The similarities in the gain profiles at low densities are found to have encouraging implications. in our attempts to model XRLs.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Spectral signal intensities, especially in 'real-world' applications with nonstandardized sample presentation due to uncontrolled variables/factors, commonly require additional spectral processing to normalize signal intensity in an effective way. In this study, we have demonstrated the complexity of choosing a normalization routine in the presence of multiple spectrally distinct constituents by probing a dataset of Raman spectra. Variation in absolute signal intensity (90.1% of total variance) of the Raman spectra of these complex biological samples swamps the variation in useful signals (9.4% of total variance), degrading its diagnostic and evaluative potential.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

BACKGROUND: While the discovery of new drugs is a complex, lengthy and costly process, identifying new uses for existing drugs is a cost-effective approach to therapeutic discovery. Connectivity mapping integrates gene expression profiling with advanced algorithms to connect genes, diseases and small molecule compounds and has been applied in a large number of studies to identify potential drugs, particularly to facilitate drug repurposing. Colorectal cancer (CRC) is a commonly diagnosed cancer with high mortality rates, presenting a worldwide health problem. With the advancement of high throughput omics technologies, a number of large scale gene expression profiling studies have been conducted on CRCs, providing multiple datasets in gene expression data repositories. In this work, we systematically apply gene expression connectivity mapping to multiple CRC datasets to identify candidate therapeutics to this disease.

RESULTS: We developed a robust method to compile a combined gene signature for colorectal cancer across multiple datasets. Connectivity mapping analysis with this signature of 148 genes identified 10 candidate compounds, including irinotecan and etoposide, which are chemotherapy drugs currently used to treat CRCs. These results indicate that we have discovered high quality connections between the CRC disease state and the candidate compounds, and that the gene signature we created may be used as a potential therapeutic target in treating the disease. The method we proposed is highly effective in generating quality gene signature through multiple datasets; the publication of the combined CRC gene signature and the list of candidate compounds from this work will benefit both cancer and systems biology research communities for further development and investigations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This review paper discusses the use of Tellus and Tellus Border soil and stream geochemistry data to investigate the relationship between medical data and naturally occurring background levels of potentially toxic elements (PTEs) such as heavy metals in soils and water. The research hypothesis is that long-term low level oral exposure of PTEs via soil and water may result in cumulative exposures that may act as risk factors for progressive diseases including cancer and chronic kidney disease. A number of public policy implications for regional human health risk assessments, public health policy and education are also explored alongside the argument for better integration of multiple data sets to enhance ongoing medical and social research. This work presents a partnership between the School of Geography, Archaeology and Palaeoecology, Northern Ireland Cancer Registry, Queen’s University Belfast, and the nephrology (kidney medicine) research group.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clusters of text documents output by clustering algorithms are often hard to interpret. We describe motivating real-world scenarios that necessitate reconfigurability and high interpretability of clusters and outline the problem of generating clusterings with interpretable and reconfigurable cluster models. We develop two clustering algorithms toward the outlined goal of building interpretable and reconfigurable cluster models. They generate clusters with associated rules that are composed of conditions on word occurrences or nonoccurrences. The proposed approaches vary in the complexity of the format of the rules; RGC employs disjunctions and conjunctions in rule generation whereas RGC-D rules are simple disjunctions of conditions signifying presence of various words. In both the cases, each cluster is comprised of precisely the set of documents that satisfy the corresponding rule. Rules of the latter kind are easy to interpret, whereas the former leads to more accurate clustering. We show that our approaches outperform the unsupervised decision tree approach for rule-generating clustering and also an approach we provide for generating interpretable models for general clusterings, both by significant margins. We empirically show that the purity and f-measure losses to achieve interpretability can be as little as 3 and 5%, respectively using the algorithms presented herein.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This part contains geomorphological, hydrological and other information concerning the desktop research of the River Tyne catchment area.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Linux commands that are generally useful for analyzing data; it is very easy to reduce phenomena such as links, nodes, URLs or downloads, to multiply repeating identifiers and then sorting and counting appearances.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Data from four recent reanalysis projects [ECMWF, NCEP-NCAR, NCEP - Department of Energy ( DOE), NASA] have been diagnosed at the scale of synoptic weather systems using an objective feature tracking method. The tracking statistics indicate that, overall, the reanalyses correspond very well in the Northern Hemisphere (NH) lower troposphere, although differences for the spatial distribution of mean intensities show that the ECMWF reanalysis is systematically stronger in the main storm track regions but weaker around major orographic features. A direct comparison of the track ensembles indicates a number of systems with a broad range of intensities that compare well among the reanalyses. In addition, a number of small-scale weak systems are found that have no correspondence among the reanalyses or that only correspond upon relaxing the matching criteria, indicating possible differences in location and/or temporal coherence. These are distributed throughout the storm tracks, particularly in the regions known for small-scale activity, such as secondary development regions and the Mediterranean. For the Southern Hemisphere (SH), agreement is found to be generally less consistent in the lower troposphere with significant differences in both track density and mean intensity. The systems that correspond between the various reanalyses are considerably reduced and those that do not match span a broad range of storm intensities. Relaxing the matching criteria indicates that there is a larger degree of uncertainty in both the location of systems and their intensities compared with the NH. At upper-tropospheric levels, significant differences in the level of activity occur between the ECMWF reanalysis and the other reanalyses in both the NH and SH winters. This occurs due to a lack of coherence in the apparent propagation of the systems in ERA15 and appears most acute above 500 hPa. This is probably due to the use of optimal interpolation data assimilation in ERA15. Also shown are results based on using the same techniques to diagnose the tropical easterly wave activity. Results indicate that the wave activity is sensitive not only to the resolution and assimilation methods used but also to the model formulation.