19 results for data complexity
Abstract:
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable for revealing systems-related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data, provide links to software implementations and tools, and also address the general problem of multiple hypothesis testing. Further, we provide recommendations for the selection of such analysis methods.
Abstract:
Support vector machine (SVM) is a powerful technique for data classification. Despite its good theoretical foundations and high classification accuracy, the standard SVM is not suitable for classifying large data sets, because its training complexity depends strongly on the size of the data set. This paper presents a novel SVM classification approach for large data sets that uses minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the cluster centers are used for a first-stage SVM classification. We then use the clusters whose centers are support vectors, or those clusters that contain more than one class, to perform a second-stage SVM classification. At this stage most data are removed. Several experimental results show that the proposed approach achieves good classification accuracy compared with the classic SVM, while training is significantly faster than with several other SVM classifiers.
Abstract:
The present report investigates the role of formate species as potential reaction intermediates for the WGS reaction (CO + H2O -> CO2 + H2) over a Pt-CeO2 catalyst. A combination of operando techniques, i.e., in situ diffuse reflectance FT-IR (DRIFT) spectroscopy and mass spectrometry (MS) during steady-state isotopic transient kinetic analysis (SSITKA), was used to relate the exchange of the reaction product CO2 to that of surface formate species. The data presented here suggest that a switchover from a non-formate to a formate-based mechanism could take place over a very narrow temperature range (as low as 60 K) over our Pt-CeO2 catalyst. This observation clearly stresses the need to avoid extrapolating conclusions to the case of results obtained under even slightly different experimental conditions. The occurrence of a low-temperature mechanism, possibly redox or Mars van Krevelen-like, that deactivates above 473 K because of ceria over-reduction is suggested as a possible explanation for the switchover, similarly to the case of the CO-NO reaction over Cu, Pd and Rh-CeZrOx (see Kaspar and co-workers [1-3]). (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
This paper develops a typology of strategic options for small firms in the furniture industry and examines the extent to which firms are re-engineering their strategies in response to profit performance. Empirical analysis is based on data from 39 firms with between 10 and 100 employees in the Irish furniture industry. Three main results emerge from the analysis. First, firms in the Irish furniture industry predominantly adopt “simple” business development strategies. Secondly, in terms of profit performance, we find no evidence that simple strategies unambiguously outperform more complex approaches. Instead, the success of both simple and complex business strategies is directly related to the strength of firms’ resource base. Finally, systematic differences were found in firms’ ability or willingness to re-engineer their strategies in the light of their profit performance.
Abstract:
Spectral signal intensities, especially in 'real-world' applications with nonstandardized sample presentation due to uncontrolled variables/factors, commonly require additional spectral processing to normalize signal intensity in an effective way. In this study, we have demonstrated the complexity of choosing a normalization routine in the presence of multiple spectrally distinct constituents by probing a dataset of Raman spectra. Variation in absolute signal intensity (90.1% of total variance) of the Raman spectra of these complex biological samples swamps the variation in useful signals (9.4% of total variance), degrading its diagnostic and evaluative potential.
Abstract:
How best to predict the effects of perturbations to ecological communities has been a long-standing goal for both applied and basic ecology. This quest has recently been revived by new empirical data, new analysis methods, and increased computing speed, with the promise that ecologically important insights may be obtainable from a limited knowledge of community interactions. We use empirically based and simulated networks of varying size and connectance to assess two limitations to predicting perturbation responses in multispecies communities: (1) the inaccuracy by which species interaction strengths are empirically quantified and (2) the indeterminacy of species responses due to indirect effects associated with network size and structure. We find that even modest levels of species richness and connectance (~25 pairwise interactions) impose high requirements for interaction strength estimates because system indeterminacy rapidly overwhelms predictive insights. Nevertheless, even poorly estimated interaction strengths provide greater average predictive certainty than an approach that uses only the sign of each interaction. Our simulations provide guidance in dealing with the trade-offs involved in maximizing the utility of network approaches for predicting dynamics in multispecies communities.
Abstract:
The rejoining kinetics of double-stranded DNA fragments, along with measurements of residual damage after postirradiation incubation, are often used as indicators of the biological relevance of the damage induced by ionizing radiation of different qualities. Although it is widely accepted that high-LET radiation-induced double-strand breaks (DSBs) tend to rejoin with kinetics slower than low-LET radiation-induced DSBs, possibly due to the complexity of the DSB itself, the nature of a slowly rejoining DSB-containing DNA lesion remains unknown. Using an approach that combines pulsed-field gel electrophoresis (PFGE) of fragmented DNA from human skin fibroblasts and a recently developed Monte Carlo simulation of radiation-induced DNA breakage and rejoining kinetics, we have tested the role of DSB-containing DNA lesions in the 8-kbp-5.7-Mbp fragment size range in determining the DSB rejoining kinetics. It is found that with low-LET X rays or high-LET alpha particles, DSB rejoining kinetics data obtained with PFGE can be computer-simulated assuming that DSB rejoining kinetics does not depend on spacing of breaks along the chromosomes. After analysis of DNA fragmentation profiles, the rejoining kinetics of X-ray-induced DSBs could be fitted by two components: a fast component with a half-life of 0.9 +/- 0.5 h and a slow component with a half-life of 16 +/- 9 h. For alpha particles, a fast component with a half-life of 0.7 +/- 0.4 h and a slow component with a half-life of 12 +/- 5 h, along with a residual fraction of unrepaired breaks accounting for 8% of the initial damage, were observed. In summary, it is shown that genomic proximity of breaks along a chromosome does not determine the rejoining kinetics, so the slowly rejoining breaks induced with higher frequencies after exposure to high-LET radiation (0.37 +/- 0.12) relative to low-LET radiation (0.22 +/- 0.07) can be explained on the basis of lesion complexity at the nanometer scale, known as locally multiply damaged sites.
(c) 2005 by Radiation Research Society.
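The two-component fit described in the abstract above can be illustrated with a short sketch. It assumes, as an interpretation rather than something the abstract states, that the quoted 0.22 and 0.37 are the slow-component fractions of the initial breaks and that the remainder (after any residual fraction) belongs to the fast component. The function name and signature are hypothetical.

```python
import math

def unrejoined_fraction(t_hours, slow_fraction, fast_half_life, slow_half_life,
                        residual=0.0):
    """Fraction of initial DSBs still unrejoined after t_hours.

    Two exponential components plus an optional unrepaired residual.
    Assumption: fast fraction = 1 - slow_fraction - residual.
    """
    fast_fraction = 1.0 - slow_fraction - residual
    decay = lambda half_life: math.exp(-math.log(2.0) * t_hours / half_life)
    return (fast_fraction * decay(fast_half_life)
            + slow_fraction * decay(slow_half_life)
            + residual)

# Central values quoted in the abstract (uncertainties ignored).
xray_after_4h = unrejoined_fraction(4.0, slow_fraction=0.22,
                                    fast_half_life=0.9, slow_half_life=16.0)
alpha_after_4h = unrejoined_fraction(4.0, slow_fraction=0.37,
                                     fast_half_life=0.7, slow_half_life=12.0,
                                     residual=0.08)
```

Under these assumptions the alpha-particle curve stays above the X-ray curve at late times, because both the slow fraction and the residual are larger.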
Abstract:
The influence of predation in structuring ecological communities can be informed by examining the shape and magnitude of the functional response of predators towards prey. We derived functional responses of the ubiquitous intertidal amphipod Echinogammarus marinus towards one of its preferred prey species, the isopod Jaera nordmanni. First, we examined the form of the functional response where prey were replaced following consumption, as compared to the usual experimental design where prey density in each replicate is allowed to deplete. E. marinus exhibited Type II functional responses, i.e. inversely density-dependent predation of J. nordmanni that increased linearly with prey availability at low densities, but decreased with further prey supply. In both prey replacement and non-replacement experiments, handling times and maximum feeding rates were similar. The non-replacement design underestimated attack rates compared to when prey were replaced. We then compared the use of Holling’s disc equation (assuming constant prey density) with the more appropriate Rogers’ random predator equation (accounting for prey depletion) using the prey non-replacement data. Rogers’ equation returned significantly greater attack rates but lower maximum feeding rates, indicating that model choice has significant implications for parameter estimates. We then manipulated habitat complexity and found significantly reduced predation by the amphipod in complex as opposed to simple habitat structure. Further, the functional response changed from a Type II in simple habitats to a sigmoidal, density-dependent Type III response in complex habitats, which may impart stability on the predator−prey interaction. Enhanced habitat complexity returned significantly lower attack rates, higher handling times and lower maximum feeding rates. These findings illustrate the sensitivity of the functional response to variations in prey supply, model selection and habitat complexity and, further, that E. marinus could potentially determine the local exclusion and persistence of prey through habitat-mediated changes in its predatory functional responses.
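The difference between the two models named in the abstract above can be sketched in a few lines. Holling's disc equation gives the number of prey eaten in closed form, while Rogers' random predator equation is implicit in the number eaten and is solved numerically here by bisection. The parameter names (a for attack rate, h for handling time, t for experiment duration) and the example values are illustrative assumptions, not figures from the study.

```python
import math

def holling_type2(n0, a, h, t):
    """Holling's disc equation: prey eaten when prey density stays constant."""
    return a * n0 * t / (1.0 + a * h * n0)

def rogers_random_predator(n0, a, h, t, iterations=200):
    """Rogers' random predator equation: Ne = N0 * (1 - exp(a * (Ne*h - t))).

    Solved for Ne by bisection on [0, N0]; the residual is strictly
    increasing in Ne, so the root in that interval is unique.
    """
    def residual(ne):
        return ne - n0 * (1.0 - math.exp(a * (ne * h - t)))

    lo, hi = 0.0, float(n0)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: 10 prey, attack rate 1.0, handling time 0.1, duration 1.0.
eaten_constant_density = holling_type2(10, a=1.0, h=0.1, t=1.0)
eaten_with_depletion = rogers_random_predator(10, a=1.0, h=0.1, t=1.0)
```

With identical parameters the depletion model predicts fewer prey eaten. This is consistent with the abstract's observation that fitting depletion data with Rogers' equation yields a higher attack rate than fitting the same data with the disc equation.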
Abstract:
To enable reliable data transfer in next generation Multiple-Input Multiple-Output (MIMO) communication systems, terminals must be able to react to fluctuating channel conditions by having flexible modulation schemes and antenna configurations. This creates a challenging real-time implementation problem: to provide the high performance required of cutting edge MIMO standards, such as 802.11n, with the flexibility for this behavioural variability. FPGA softcore processors offer a solution to this problem, and in this paper we show how heterogeneous SISD/SIMD/MIMD architectures can enable programmable multicore architectures on FPGA with performance and cost similar to those of traditional dedicated circuit-based architectures. Applying this approach to a 4×4 16-QAM Fixed-Complexity Sphere Decoder (FSD) detector, we present the first soft-processor based solution for real-time 802.11n MIMO.
Abstract:
Acidity peaks in Greenland ice cores have been used as critical reference horizons for synchronizing ice core records, aiding the construction of a single Greenland Ice Core Chronology (GICC05) for the Holocene. Guided by GICC05, we examined sub-sections of three Greenland cores in the search for tephra from specific eruptions that might facilitate the linkage of ice core records, the dating of prehistoric tephras and the understanding of the eruptions. Here we report the identification of 14 horizons with tephra particles, including 11 that have not previously been reported from the North Atlantic region and that have the potential to be valuable isochrons. The positions of tephras whose major element data are consistent with ash from the Katmai AD 1912 and Öraefajökull AD 1362 eruptions confirm the annually resolved ice core chronology for the last 700 years. We provide a more refined date for the so-called “AD860B” tephra, a widespread isochron found across NW Europe, and present new evidence relating to the 17th century BC Thera/Aniakchak debate that shows N. American eruptions likely contributed to the acid signals at this time. Our results emphasize the variable spatial and temporal distributions of volcanic products in Greenland ice that call for a more cautious approach in the attribution of acid signals to specific eruptive events.
Abstract:
The relationships among organisms and their surroundings can be of immense complexity. To describe and understand an ecosystem as a tangled bank, multiple ways of interaction and their effects have to be considered, such as predation, competition, mutualism and facilitation. Understanding the resulting interaction networks is a challenge in changing environments, e.g. to predict knock-on effects of invasive species and to understand how climate change impacts biodiversity. The elucidation of complex ecological systems with their interactions will benefit enormously from the development of new machine learning tools that aim to infer the structure of interaction networks from field data. In the present study, we propose a novel Bayesian regression and multiple changepoint model (BRAM) for reconstructing species interaction networks from observed species distributions. The model has been devised to allow robust inference in the presence of spatial autocorrelation and distributional heterogeneity. We have evaluated the model on simulated data that combines a trophic niche model with a stochastic population model on a 2-dimensional lattice, and we have compared the performance of our model with L1-penalized sparse regression (LASSO) and non-linear Bayesian networks with the BDe scoring scheme. In addition, we have applied our method to plant ground coverage data from the western shore of the Outer Hebrides with the objective to infer the ecological interactions. (C) 2012 Elsevier B.V. All rights reserved.
Abstract:
In many applications in applied statistics researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index we find that the four variables that are used in its construction have dimension three and the index loses information.
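As an illustration of the estimator named in the abstract above, the sketch below computes a Takens-style estimate of the correlation dimension from pairwise distances: the negative reciprocal of the mean of ln(r / r0) over all pairs closer than a cutoff r0. The function name, the cutoff and the synthetic test data are assumptions for demonstration only and are not taken from the paper.

```python
import math
import random

def takens_dimension(points, r0):
    """Takens estimate of the correlation dimension.

    Uses every pair of points whose distance r satisfies 0 < r < r0 and
    returns -1 / mean(ln(r / r0)).
    """
    log_ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if 0.0 < r < r0:
                log_ratios.append(math.log(r / r0))
    if not log_ratios:
        raise ValueError("no pairs closer than r0")
    return -len(log_ratios) / sum(log_ratios)

# Synthetic check: three recorded variables that all move along one line
# should have dimension close to 1, even though the data are 3-D.
rng = random.Random(0)
line_in_3d = [(t, t, t) for t in (rng.random() for _ in range(400))]
line_dimension = takens_dimension(line_in_3d, r0=0.1)
```

An estimate well below the number of recorded variables suggests that compressing them into fewer measures loses little, whereas an estimate near the full count is the situation the abstract warns about.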
Abstract:
In this paper, a low-complexity system for spectral analysis of heart rate variability (HRV) is presented. The main idea of the proposed approach is the implementation of the Fast-Lomb periodogram, a ubiquitous tool in spectral analysis, using a wavelet-based Fast Fourier transform. Interestingly, we show that the proposed approach enables the classification of processed data into more and less significant based on their contribution to output quality. Based on this classification, a percentage of the less-significant data is pruned, leading to a significant reduction of algorithmic complexity with minimal quality degradation. Indeed, our results indicate that the proposed system can achieve up to a 45% reduction in the number of computations with only 4.9% average error in the output quality compared to a conventional FFT-based HRV system.
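For orientation, the sketch below implements the classical (direct) Lomb periodogram for unevenly sampled data, which is the quantity the Fast-Lomb method approximates. It is not the Fast-Lomb algorithm, it does not use the wavelet-based FFT, and it includes no pruning; the function name and the synthetic signal are assumptions for illustration.

```python
import math
import random

def lomb_periodogram(times, values, freqs_hz):
    """Normalized Lomb periodogram evaluated directly at each frequency."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    deviations = [v - mean for v in values]
    power = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2.0 * w * t) for t in times),
                         sum(math.cos(2.0 * w * t) for t in times)) / (2.0 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        cos_part = sum(d * c for d, c in zip(deviations, cos_t)) ** 2 / sum(c * c for c in cos_t)
        sin_part = sum(d * s for d, s in zip(deviations, sin_t)) ** 2 / sum(s * s for s in sin_t)
        power.append((cos_part + sin_part) / (2.0 * variance))
    return power

# Synthetic, unevenly sampled 0.1 Hz oscillation standing in for an HRV series.
rng = random.Random(0)
sample_times = sorted(rng.uniform(0.0, 100.0) for _ in range(120))
sample_values = [math.sin(2.0 * math.pi * 0.1 * t) for t in sample_times]
frequencies = [k / 100.0 for k in range(1, 51)]
spectrum = lomb_periodogram(sample_times, sample_values, frequencies)
peak_hz = frequencies[spectrum.index(max(spectrum))]
```

The direct evaluation costs on the order of (number of samples) × (number of frequencies) operations, which is the kind of cost that FFT-based formulations such as Fast-Lomb aim to reduce.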
Abstract:
Cloud data centres are critical business infrastructures and the fastest growing service providers. Detecting anomalies in Cloud data centre operation is vital. Given the vast complexity of the data centre system software stack, applications and workloads, anomaly detection is a challenging endeavour. Current tools for detecting anomalies often use machine learning techniques, application instance behaviours or system metrics distribution, which are complex to implement in Cloud computing environments as they require training, access to application-level data and complex processing. This paper presents LADT, a lightweight anomaly detection tool for Cloud data centres that uses rigorous correlation of system metrics, implemented by an efficient correlation algorithm without need for training or complex infrastructure setup. LADT is based on the hypothesis that, in an anomaly-free system, metrics from data centre host nodes and virtual machines (VMs) are strongly correlated. An anomaly is detected whenever correlation drops below a threshold value. We demonstrate and evaluate LADT using a Cloud environment, where it shows that the hosting node I/O operations per second (IOPS) are strongly correlated with the aggregated virtual machine IOPS, but this correlation vanishes when an application stresses the disk, indicating a node-level anomaly.
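The detection rule described in the abstract above (flag an anomaly when the correlation between host and aggregated VM metrics drops below a threshold) can be sketched as follows. This is not LADT's implementation: the Pearson coefficient, the non-overlapping windows, the window length, the 0.8 threshold and the sample numbers are all assumptions chosen for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant (assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)

def detect_anomalies(host_iops, vm_iops_series, window, threshold):
    """Return start indices of windows where host/VM correlation is below threshold."""
    # Sum the per-VM samples at each time step to get the aggregated VM IOPS.
    aggregated = [sum(sample) for sample in zip(*vm_iops_series)]
    flagged = []
    for start in range(0, len(host_iops) - window + 1, window):
        r = pearson(host_iops[start:start + window],
                    aggregated[start:start + window])
        if r < threshold:
            flagged.append(start)
    return flagged

# Hypothetical samples: the host tracks its two VMs for five samples, then diverges.
vm_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
vm_b = [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
host = [3, 6, 9, 12, 15, 15, 3, 12, 6, 9]
anomalous_windows = detect_anomalies(host, [vm_a, vm_b], window=5, threshold=0.8)
```

No training phase is needed in this scheme: the only tunable inputs are the window length and the threshold.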
Abstract:
The increasing complexity and scale of cloud computing environments due to widespread data centre heterogeneity make measurement-based evaluations highly difficult to achieve. The use of simulation tools to support decision making in cloud computing environments is therefore an increasing trend. However, the data required to model cloud computing environments with an appropriate degree of accuracy is typically large, very difficult to collect without some form of automation, often not available in a suitable format, and time-consuming to gather manually. In this research, an automated method for cloud computing topology definition, data collection and model creation activities is presented, within the context of a suite of tools that have been developed and integrated to support these activities.