882 results for Ensemble of classifiers
Abstract:
In the present work we consider two aspects of the deposition of metal clusters on an electrode surface. The formation of such clusters with the tip of a scanning tunneling microscope is simulated by atom dynamics. Subsequently the stability of these clusters is investigated by Monte Carlo simulations in a grand-canonical ensemble. In particular, the following systems were considered explicitly: Pd clusters on Au(111), Cu on Au(111), Ag on Au(111), Pb on Au(111) and Cu on Ag(111). The analysis of the results obtained for the different systems leads to the conclusion that optimal systems for nanostructuring are those where the metals participating have similar cohesive energies and negative heats of alloy formation. In this respect, the system Cu-Pd(111) is predicted as a good candidate for the formation of stable clusters. (c) 2005 Elsevier B.V. All rights reserved.
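The abstract does not spell out the Monte Carlo details. As a minimal sketch, assuming a lattice-gas picture in which a trial move adds or removes one atom at a surface site, the standard Metropolis acceptance rule in the grand-canonical ensemble could be written as below. The function name, the units, and the argument names are illustrative assumptions and are not taken from the paper.

```python
import math

def gc_acceptance(delta_e, delta_n, mu, kt):
    """Metropolis acceptance probability for a grand-canonical lattice move.

    delta_e: change in configurational energy caused by the move
    delta_n: change in particle number (+1 insertion, -1 removal, 0 displacement)
    mu:      chemical potential of the deposited metal
    kt:      thermal energy k_B * T, in the same units as delta_e and mu
    """
    exponent = -(delta_e - mu * delta_n) / kt
    # Moves that do not raise (E - mu * N) are always accepted.
    if exponent >= 0:
        return 1.0
    return math.exp(exponent)
```

In such a scheme a cluster would be judged stable at a given chemical potential if removal moves are rarely accepted over a long run.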
Abstract:
A study was performed to determine if targeted metabolic profiling of cattle sera could be used to establish a predictive tool for identifying hormone misuse in cattle. Metabolites were assayed in heifers (n = 5) treated with nortestosterone decanoate (0.85 mg/kg body weight), untreated heifers (n = 5), steers (n = 5) treated with oestradiol benzoate (0.15 mg/kg body weight) and untreated steers (n = 5). Treatments were administered on days 0, 14, and 28 throughout a 42 day study period. Two support vector machines (SVMs) were trained, respectively, from heifer and steer data to identify hormone-treated animals. Performance of both SVM classifiers was evaluated by sensitivity and specificity of treatment prediction. The SVM trained on steer data achieved 97.33% sensitivity and 93.85% specificity while the one on heifer data achieved 94.67% sensitivity and 87.69% specificity. Solutions of SVM classifiers were further exploited to determine those days when classification accuracy of the SVM was most reliable. For heifers and steers, days 17-35 were determined to be the most selective. In summary, bioinformatics applied to targeted metabolic profiles generated from standard clinical chemistry analyses has yielded an accurate, inexpensive, high-throughput test for predicting steroid abuse in cattle.
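For readers unfamiliar with the two performance measures, the following minimal sketch shows how sensitivity and specificity are obtained from confusion-matrix counts. The counts are made up for illustration and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: treated animals correctly flagged as treated
    fn: treated animals missed by the classifier
    tn: untreated animals correctly left unflagged
    fp: untreated animals wrongly flagged as treated
    """
    sensitivity = tp / (tp + fn)  # fraction of treated samples detected
    specificity = tn / (tn + fp)  # fraction of untreated samples cleared
    return sensitivity, specificity

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```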
Abstract:
Logistic regression and Gaussian mixture model (GMM) classifiers have been trained to estimate the probability of acute myocardial infarction (AMI) in patients based upon the concentrations of a panel of cardiac markers. The panel consists of two new markers, fatty acid binding protein (FABP) and glycogen phosphorylase BB (GPBB), in addition to the traditional cardiac troponin I (cTnI), creatine kinase MB (CKMB) and myoglobin. The effect of using principal component analysis (PCA) and Fisher discriminant analysis (FDA) to preprocess the marker concentrations was also investigated. The need for classifiers to give an accurate estimate of the probability of AMI is argued and three categories of performance measure are described, namely discriminatory ability, sharpness, and reliability. Numerical performance measures for each category are given and applied. The optimum classifier, based solely upon the samples taken on admission, was the logistic regression classifier using FDA preprocessing. This gave an accuracy of 0.85 (95% confidence interval: 0.78-0.91) and a normalised Brier score of 0.89. When samples at both admission and a further time, 1-6 h later, were included, the performance increased significantly, showing that logistic regression classifiers can indeed use the information from the five cardiac markers to accurately and reliably estimate the probability of AMI. © Springer-Verlag London Limited 2008.
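The abstract reports a normalised Brier score but does not say which normalisation was used. As a hedged illustration of the underlying quantity only, the sketch below computes the plain (unnormalised) Brier score, the mean squared difference between predicted probabilities and observed 0/1 outcomes, for which lower values are better. The function name and inputs are assumptions made for this example.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)
```

A classifier that always predicts 0.5 scores 0.25 on any data set, which is a convenient reference point when judging how informative the predicted probabilities are.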
Abstract:
Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, due to the difficulty of establishing reliable methods based on observational data, there is so far only incomplete knowledge about the possibilities and limitations of such inference methods in this context.
Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods that allow us to assess the inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of inferred regulatory networks from expression data. Further, as an application, we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets regarding molecular information processing.
Abstract:
Background: The evaluation of the complexity of an observed object is an old but outstanding problem. In this paper we take up this problem by introducing a measure called statistic complexity.
Abstract:
The work ROTATING BRAINS / BEATING HEART was specifically developed for the opening performance of the 2010 DRHA conference. The conference’s theme ‘Sensual Technologies: Collaborative Practices of Interdisciplinarity’ explored collaborative relationships between the body and sensual/sensing technologies across various disciplines, looking to new approaches offered by various emerging fields and practices that incorporate new and existing technologies. The conference had a specific focus on SecondLife with roundtable events and discussions, led by performance artist Stelarc, as well as international participation via SecondLife.
The collaboration between Stelarc, the Avatar Orchestra Metaverse (AOM) and myself as the DRHA2010 conference program chair was a unique occurrence for this conference.
Abstract:
Background
Inferring gene regulatory networks from large-scale expression data is an important problem that has received much attention in recent years. These networks have the potential to provide insights into causal molecular interactions of biological processes. Hence, from a methodological point of view, reliable estimation methods based on observational data are needed to approach this problem practically.
Results
In this paper, we introduce a novel gene regulatory network inference (GRNI) algorithm, called C3NET. We compare C3NET with four well known methods, ARACNE, CLR, MRNET and RN, conducting in-depth numerical ensemble simulations, and also demonstrate for biological expression data from E. coli that C3NET performs consistently better than the best known GRNI methods in the literature. In addition, it also has a low computational complexity. Since C3NET is based on estimates of mutual information values in conjunction with a maximization step, our numerical investigations demonstrate that our inference algorithm exploits causal structural information in the data efficiently.
Conclusions
For systems biology to succeed in the long run, it is of crucial importance to establish methods that extract large-scale gene networks from high-throughput data that reflect the underlying causal interactions among genes or gene products. Our method can contribute to this endeavor by demonstrating that an inference algorithm with a neat design permits not only a more intuitive and possibly biological interpretation of its working mechanism but can also yield superior results.
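The abstract characterises C3NET only as mutual information estimates combined with a maximization step. One possible reading of that idea, sketched here under stated assumptions and not as the authors' implementation, is to keep for each gene only the edge to the partner with the largest mutual information, provided the value exceeds a cut-off. The function name, the precomputed matrix `mi`, and the `threshold` parameter are hypothetical.

```python
def max_mi_edges(mi, threshold=0.0):
    """Select, for every gene, the edge with the largest mutual information.

    mi:        symmetric matrix (list of lists) of pairwise MI estimates
    threshold: MI values at or below this cut-off are treated as non-significant
    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(mi)
    edges = set()
    for i in range(n):
        best_j, best_value = None, threshold
        for j in range(n):
            if j != i and mi[i][j] > best_value:
                best_j, best_value = j, mi[i][j]
        if best_j is not None:
            # Store the edge in a canonical order so (i, j) and (j, i) coincide.
            edges.add((min(i, best_j), max(i, best_j)))
    return edges
```

Because each gene contributes at most one edge, a network built this way has at most as many edges as genes, which fits the low computational complexity mentioned in the Results.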
Abstract:
Wind power generation differs from conventional thermal generation due to the stochastic nature of wind. Thus wind power forecasting plays a key role in dealing with the challenges of balancing supply and demand in any electricity system, given the uncertainty associated with the wind farm power output. Accurate wind power forecasting reduces the need for additional balancing energy and reserve power to integrate wind power. Wind power forecasting tools enable better dispatch, scheduling and unit commitment of thermal generators, hydro plant and energy storage plant and more competitive market trading as wind power ramps up and down on the grid. This paper presents an in-depth review of the current methods and advances in wind power forecasting and prediction. Firstly, numerical wind prediction methods from global to local scales, ensemble forecasting, upscaling and downscaling processes are discussed. Next, the statistical and machine learning approaches are detailed. Then the techniques used for benchmarking and uncertainty analysis of forecasts are overviewed, and the performance of various approaches over different forecast time horizons is examined. Finally, current research activities, challenges and potential future developments are appraised. (C) 2011 Elsevier Ltd. All rights reserved.
Abstract:
The concentration of organic acids in anaerobic digesters is one of the most critical parameters for monitoring and advanced control of anaerobic digestion processes. Thus, a reliable online-measurement system is absolutely necessary. A novel approach to obtaining these measurements indirectly and online using UV/vis spectroscopic probes, in conjunction with powerful pattern recognition methods, is presented in this paper. A UV/vis spectroscopic probe from S::CAN is used in combination with a custom-built dilution system to monitor the absorption of fully fermented sludge over a spectrum from 200 to 750 nm. Advanced pattern recognition methods are then used to map the non-linear relationship between measured absorption spectra and laboratory measurements of organic acid concentrations. Linear discriminant analysis, generalized discriminant analysis (GerDA), support vector machines (SVM), relevance vector machines, random forest and neural networks are investigated for this purpose and their performance compared. To validate the approach, online measurements have been taken at a full-scale 1.3-MW industrial biogas plant. Results show that whereas some of the methods considered do not yield satisfactory results, accurate prediction of organic acid concentration ranges can be obtained with both GerDA and SVM-based classifiers, with classification rates in excess of 87% achieved on test data.
Abstract:
Support vector machines (SVMs), though accurate, are not preferred in applications requiring high classification speed or when deployed in systems of limited computational resources, due to the large number of support vectors involved in the model. To overcome this problem we have devised a primal SVM method with the following properties: (1) it solves for the SVM representation without the need to invoke the representer theorem, (2) forward and backward selections are combined to approach the final globally optimal solution, and (3) a criterion is introduced for identification of support vectors leading to a much reduced support vector set. In addition to introducing this method the paper analyzes the complexity of the algorithm and presents test results on three public benchmark problems and a human activity recognition application. These applications demonstrate the effectiveness and efficiency of the proposed algorithm.
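The speed concern raised at the start of this abstract follows from the form of a kernel SVM decision function, f(x) = sum_i alpha_i y_i K(x_i, x) + b: every prediction needs one kernel evaluation per support vector. The sketch below illustrates that cost with an RBF kernel. The kernel choice, the function names, and the parameters are assumptions for illustration, and this is not the primal method proposed in the paper.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, coefficients, bias, gamma=1.0):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + bias.

    coefficients[i] stands for alpha_i * y_i of the i-th support vector.
    The loop runs once per support vector, so prediction time grows
    linearly with the size of the support vector set.
    """
    total = bias
    for sv, coef in zip(support_vectors, coefficients):
        total += coef * rbf_kernel(x, sv, gamma)
    return total
```

Halving the number of support vectors roughly halves the per-sample prediction cost, which is why a much reduced support vector set matters on devices with limited computational resources.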
Abstract:
Background: Ineffective risk stratification can delay diagnosis of serious disease in patients with hematuria. We applied a systems biology approach to analyze clinical, demographic and biomarker measurements (n = 29) collected from 157 hematuric patients: 80 urothelial cancer (UC) and 77 controls with confounding pathologies.
Methods: On the basis of biomarkers, we conducted agglomerative hierarchical clustering to identify patient and biomarker clusters. We then explored the relationship between the patient clusters and clinical characteristics using Chi-square analyses. We determined classification errors and areas under the receiver operating curve of Random Forest Classifiers (RFC) for patient subpopulations using the biomarker clusters to reduce the dimensionality of the data.
Results: Agglomerative clustering identified five patient clusters and seven biomarker clusters. Final diagnosis categories were non-randomly distributed across the five patient clusters. In addition, two of the patient clusters were enriched with patients with ‘low cancer-risk’ characteristics. The biomarkers which contributed to the diagnostic classifiers for these two patient clusters were similar. In contrast, three of the patient clusters were significantly enriched with patients harboring ‘high cancer-risk’ characteristics including proteinuria, aggressive pathological stage and grade, and malignant cytology. Patients in these three clusters included controls, that is, patients with other serious disease and patients with cancers other than UC. Biomarkers which contributed to the diagnostic classifiers for the largest ‘high cancer-risk’ cluster were different than those contributing to the classifiers for the ‘low cancer-risk’ clusters. Biomarkers which contributed to subpopulations that were split according to smoking status, gender and medication were different.
Conclusions: The systems biology approach applied in this study allowed the hematuric patients to cluster naturally on the basis of the heterogeneity within their biomarker data, into five distinct risk subpopulations. Our findings highlight an approach with the promise to unlock the potential of biomarkers. This will be especially valuable in the field of diagnostic bladder cancer where biomarkers are urgently required. Clinicians could interpret risk classification scores in the context of clinical parameters at the time of triage. This could reduce cystoscopies and enable priority diagnosis of aggressive diseases, leading to improved patient outcomes at reduced costs. © 2013 Emmert-Streib et al; licensee BioMed Central Ltd.
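The Methods mention Chi-square analyses relating patient clusters to clinical characteristics. As a minimal sketch of that kind of test, the function below computes the Pearson chi-square statistic for a contingency table of counts (rows could be patient clusters, columns a clinical category). The function name is hypothetical, every row and column is assumed to have a non-zero total, and no counts from the study are used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.

    table: list of rows, each a list of observed counts.
    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

The statistic is compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom. In practice a library routine such as `scipy.stats.chi2_contingency` would typically be used to obtain the p-value.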
Abstract:
We present an analysis of comet activity based on the Spitzer Space Telescope component of the Survey of the Ensemble Physical Properties of Cometary Nuclei. We show that the survey is well suited to measuring the activity of Jupiter-family comets at 3-7 AU from the Sun. Dust was detected in 33 of 89 targets (37 ± 6%), and we conclude that 21 comets (24 ± 5%) have morphologies that suggest ongoing or recent cometary activity. Our dust detections are sensitivity limited; therefore, our measured activity rate is necessarily a lower limit. All comets with small perihelion distances (q < 1.8 AU) are inactive in our survey, and the active comets in our sample are strongly biased to post-perihelion epochs. We introduce the quantity ɛfρ, intended to be a thermal emission counterpart to the often reported Afρ, and find that the comets with large perihelion distances likely have greater dust production rates than other comets in our survey at 3-7 AU from the Sun, indicating a bias in the discovered Jupiter-family comet population. By examining the orbital history of our survey sample, we suggest that comets perturbed to smaller perihelion distances in the past 150 yr are more likely to be active, but more study on this effect is needed.
Abstract:
We present results from SEPPCoN, an on-going Survey of the Ensemble Physical Properties of Cometary Nuclei. In this report we discuss mid-infrared measurements of the thermal emission from 89 nuclei of Jupiter-family comets (JFCs). All data were obtained in 2006 and 2007 using imaging capabilities of the Spitzer Space Telescope. The comets were typically 4-5 AU from the Sun when observed and most showed only a point-source with little or no extended emission from dust. For those comets showing dust, we used image processing to photometrically extract the nuclei. For all 89 comets, we present new effective radii, and for 57 comets we present beaming parameters. Thus our survey provides the largest compilation of radiometrically-derived physical properties of nuclei to date. We have six main conclusions: (a) The average beaming parameter of the JFC population is 1.03 ± 0.11, consistent with unity; coupled with the large distance of the nuclei from the Sun, this indicates that most nuclei have Tempel 1-like thermal inertia. Only two of the 57 nuclei had outlying values (in a statistical sense) of infrared beaming. (b) The known JFC population is not complete even at 3 km radius, and even for comets that approach to ~2 AU from the Sun and so ought to be more discoverable. Several recently-discovered comets in our survey have small perihelia and large (above ~2 km) radii. (c) With our radii, we derive an independent estimate of the JFC nuclear cumulative size distribution (CSD), and we find that it has a power-law slope of around -1.9, with the exact value depending on the bounds in radius. (d) This power-law is close to that derived by others from visible-wavelength observations that assume a fixed geometric albedo, suggesting that there is no strong dependence of geometric albedo with radius. (e) The observed CSD shows a hint of structure with an excess of comets with radii 3-6 km. (f) Our CSD is consistent with the idea that the intrinsic size distribution of the JFC population is not a simple power-law and lacks many sub-kilometer objects.
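To make conclusion (c) concrete: a cumulative size distribution with power-law slope of about -1.9 means the number of nuclei larger than radius r scales as r to the power -1.9 within the fitted range. The sketch below evaluates that scaling relative to a reference radius. The function name and the reference radius are assumptions for illustration, and, as conclusions (e) and (f) caution, the real distribution is not a simple power law at all radii.

```python
def relative_cumulative_count(radius_km, reference_radius_km=1.0, slope=-1.9):
    """Ratio N(>radius) / N(>reference radius) for a power-law CSD.

    slope is the power-law index of the cumulative distribution; the
    default of -1.9 is the approximate value quoted in the abstract.
    """
    return (radius_km / reference_radius_km) ** slope

# Under this idealised scaling, doubling the radius leaves roughly 27% as many nuclei.
ratio = relative_cumulative_count(2.0)
```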
Abstract:
Mobile malware has continued to grow at an alarming rate despite on-going mitigation efforts. This has been much more prevalent on Android because it is an open platform that is rapidly overtaking other competing platforms in the mobile smart devices market. Recently, a new generation of Android malware families has emerged with advanced evasion capabilities which make them much more difficult to detect using conventional methods. This paper proposes and investigates a parallel machine-learning-based classification approach for early detection of Android malware. Using real malware samples and benign applications, a composite classification model is developed from a parallel combination of heterogeneous classifiers. The empirical evaluation of the model under different combination schemes demonstrates its efficacy and potential to improve detection accuracy. More importantly, by utilizing several classifiers with diverse characteristics, their strengths can be harnessed not only for enhanced Android malware detection but also quicker white box analysis by means of the more interpretable constituent classifiers.
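The abstract does not name the combination schemes that were evaluated. As a hedged illustration of what combining parallel classifiers can look like, the sketch below shows two common schemes, majority voting over predicted labels and averaging of predicted probabilities. The function names, the label strings, and the 0.5 threshold are assumptions made for this example.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label predicted by most classifiers.

    Ties are resolved in favour of the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]

def average_probability(malware_probabilities, threshold=0.5):
    """Average each classifier's malware probability and apply a threshold."""
    mean = sum(malware_probabilities) / len(malware_probabilities)
    return "malware" if mean >= threshold else "benign"
```

The two schemes can disagree: one very confident classifier can outweigh two mildly confident ones under probability averaging, while majority voting ignores confidence altogether.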