15 results for variable selection

at QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance: 100.00%

Abstract:

In many situations, the number of data points is fixed, and the asymptotic convergence results of popular model selection tools may not be useful. A new algorithm for model selection, RIVAL (removing irrelevant variables amidst Lasso iterations), is presented and shown to be particularly effective for a large but fixed number of data points. The algorithm is motivated by an application in nuclear material detection where all unknown parameters are required to be non-negative. Thus, the positive Lasso and its variants are analyzed. Then, RIVAL is proposed and shown to have some desirable properties, namely that the number of data points needed for convergence is smaller than for existing methods.
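The non-negativity constraint central to this setting can be illustrated with a small sketch. The following is a minimal coordinate-descent solver for the positive (non-negative) Lasso, not the RIVAL algorithm itself; the toy data and tuning values are assumptions for illustration.

```python
import numpy as np

def positive_lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the positive (non-negative) Lasso:
    minimise 0.5*||y - X w||^2 + lam * sum(w)  subject to  w >= 0.
    A minimal sketch of the constrained estimator analysed in the
    paper, not the RIVAL iteration itself."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            # partial residual with coordinate j removed
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # one-sided soft threshold: negative updates are clipped to 0
            w[j] = max(rho - lam, 0.0) / col_sq[j]
    return w

# toy problem: only variables 0 and 3 are truly active
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, 0.0, 2.0, 0.0]) + 0.01 * rng.normal(size=100)
w_hat = positive_lasso_cd(X, y, lam=1.0)
```

Because updates are clipped at zero rather than soft-thresholded on both sides, every estimated coefficient is non-negative, matching the physical constraint of the detection application.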

Relevance: 100.00%

Abstract:

In this paper, we consider the variable selection problem for a nonlinear non-parametric system. Two approaches are proposed: a top-down approach and a bottom-up approach. The top-down algorithm selects a variable by detecting whether the corresponding partial derivative is zero or not at the point of interest. The algorithm is shown to have not only parameter convergence but also set convergence. This is critical because the variable selection problem is binary: a variable is either selected or not. The bottom-up approach is based on forward/backward stepwise selection, which is designed to work when the data length is limited. Both approaches determine the most important variables locally and allow the unknown non-parametric nonlinear system to have different local dimensions at different points of interest. Further, two potential applications along with numerical simulations are provided to illustrate the usefulness of the proposed algorithms.
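The top-down idea of testing whether a partial derivative vanishes at the point of interest can be sketched as follows. A kernel-weighted local linear fit stands in for the paper's estimator, and the cut-off `thresh` is a hypothetical rule replacing the paper's convergence-based test.

```python
import numpy as np

def local_derivative_select(X, y, x0, bandwidth=0.3, thresh=0.3):
    """Select variables at the point of interest x0 by testing whether
    the estimated partial derivative there is (numerically) non-zero.
    Illustrative sketch: a Gaussian-kernel local linear fit stands in
    for the paper's estimator, and `thresh` is a hypothetical cut-off."""
    # Gaussian kernel weights centred at x0
    w = np.exp(-((X - x0) ** 2).sum(axis=1) / (2 * bandwidth ** 2))
    # local linear design: intercept + deviations from x0
    Z = np.hstack([np.ones((len(X), 1)), X - x0])
    Zw = Z * w[:, None]
    beta, *_ = np.linalg.lstsq(Zw.T @ Z, Zw.T @ y, rcond=None)
    slopes = beta[1:]  # estimated partial derivatives at x0
    return slopes, np.abs(slopes) > thresh

# f depends on x1 and x2 only, so x3 should be dropped at any x0
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.5, size=(3000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
slopes, keep = local_derivative_select(X, y, x0=np.array([0.5, 1.0, 0.0]))
```

Note that the selection is local: at a different x0 the derivatives, and hence the selected set, may differ, which is exactly the "different local dimensions" property described above.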

Relevance: 100.00%

Abstract:

This paper considers a problem of identification for a high-dimensional nonlinear non-parametric system when only a limited data set is available. Algorithms are proposed that exploit the relationship between the input variables and the output, and further the inter-dependence of the input variables, so that the importance of each input variable can be established. A key to these algorithms is the non-parametric two-stage input selection algorithm.

Relevance: 70.00%

Abstract:

This paper presents a feature selection method for data classification, which combines a model-based variable selection technique and a fast two-stage subset selection algorithm. The relationship between a specified (and complete) set of candidate features and the class label is modelled using a non-linear full regression model which is linear-in-the-parameters. The performance of a sub-model, measured by the sum of squared errors (SSE), is used to score the informativeness of the subset of features involved in the sub-model. The two-stage subset selection algorithm converges to a solution sub-model in which the SSE is locally minimised. The features involved in the solution sub-model are selected as inputs to support vector machines (SVMs) for classification. The memory requirement of this algorithm is independent of the number of training patterns. This property makes the method suitable for applications executed on mobile devices where physical RAM is very limited. An application was developed for activity recognition, which implements the proposed feature selection algorithm and an SVM training procedure. Experiments are carried out with the application running on a PDA for human activity recognition using accelerometer data. A comparison with an information-gain-based feature selection method demonstrates the effectiveness and efficiency of the proposed algorithm.
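The SSE-scored subset search can be sketched with a greedy forward pass. This shows only the first stage on a plain linear model; the paper's second-stage refinement (swapping features until the SSE is locally minimised) and the linear-in-the-parameters basis expansion are omitted, and the toy data are assumptions.

```python
import numpy as np

def forward_sse_select(X, y, k):
    """Greedy forward selection: at each step add the candidate feature
    whose sub-model gives the smallest sum of squared errors (SSE).
    Sketch of the first stage only; the second-stage local refinement
    described in the paper is omitted."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        sse = {}
        for j in remaining:
            cols = X[:, selected + [j]]
            beta, *_ = np.linalg.lstsq(cols, y, rcond=None)
            sse[j] = ((y - cols @ beta) ** 2).sum()
        best = min(sse, key=sse.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# features 0 and 4 carry the signal; the rest are noise
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] + 3.0 * X[:, 4] + 0.1 * rng.normal(size=200)
chosen = forward_sse_select(X, y, k=2)
```

In the paper's pipeline, the selected feature subset would then be passed as the input representation to an SVM classifier.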

Relevance: 60.00%

Abstract:

Semiconductor fabrication involves several sequential processing steps, with the result that critical production variables are often affected by a superposition of effects over multiple steps. In this paper a Virtual Metrology (VM) system for early stage measurement of such variables is presented; the VM system seeks to express the contribution to the output variability that is due to a defined observable part of the production line. The outputs of the proposed system may be used for process monitoring and control purposes. A second contribution of this work is the introduction of Elastic Nets, a regularization and variable selection technique for the modelling of highly-correlated datasets, as a technique for the development of VM models. Elastic Nets and the proposed VM system are illustrated using real data from a multi-stage etch process used in the fabrication of disk drive read/write heads. © 2013 IEEE.
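The elastic net regulariser referred to above can be sketched in a few lines. This is a minimal (naive) coordinate-descent version, with toy data assumed for illustration; the point is the behaviour on highly correlated predictors, which plain Lasso handles poorly.

```python
import numpy as np

def elastic_net_cd(X, y, lam1, lam2, n_iter=500):
    """Coordinate descent for the (naive) elastic net:
    0.5*||y - X w||^2 + lam1*||w||_1 + 0.5*lam2*||w||^2.
    A minimal sketch: unlike the plain Lasso, the l2 term keeps the
    solution stable when predictors are highly correlated, the
    property exploited for the VM models."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rho = X[:, j] @ (y - X @ w + X[:, j] * w[j])
            # soft threshold by lam1, then shrink by the ridge term lam2
            w[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
    return w

# two nearly identical predictors (as in multi-step process data)
rng = np.random.default_rng(3)
z = rng.normal(size=200)
X = np.column_stack([z, z + 0.01 * rng.normal(size=200),
                     rng.normal(size=200)])
y = 2.0 * z + 0.05 * rng.normal(size=200)
w_hat = elastic_net_cd(X, y, lam1=0.5, lam2=50.0)
```

The two correlated columns receive similar coefficients instead of one being arbitrarily dropped, which is the grouping effect that motivates Elastic Nets for multi-stage process data.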

Relevance: 30.00%

Abstract:

The least-mean-fourth (LMF) algorithm is known for its fast convergence and lower steady state error, especially in sub-Gaussian noise environments. Recent work on normalised versions of the LMF algorithm has further enhanced its stability and performance in both Gaussian and sub-Gaussian noise environments. For example, the recently developed normalised LMF (XE-NLMF) algorithm is normalised by the mixed signal and error powers, and weighted by a fixed mixed-power parameter. Unfortunately, this algorithm depends on the selection of this mixing parameter. In this work, a time-varying mixed-power parameter technique is introduced to overcome this dependency. A convergence analysis, transient analysis, and steady-state behaviour of the proposed algorithm are derived and verified through simulations. An enhancement in performance is obtained through the use of this technique in two different scenarios. Moreover, the tracking analysis of the proposed algorithm is carried out in the presence of two sources of nonstationarities: (1) carrier frequency offset between transmitter and receiver and (2) random variations in the environment. Close agreement between analysis and simulation results is obtained. The results show that, unlike in the stationary case, the steady-state excess mean-square error is not a monotonically increasing function of the step size. (c) 2007 Elsevier B.V. All rights reserved.
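The mixed-power normalisation can be sketched as follows. This is an illustrative assumption, not the paper's exact XE-NLMF recursion: the denominator mixes signal and error powers as lam*||u||^2 + (1-lam)*e^2, and the time-varying mixing parameter is modelled here by a simple hypothetical ramp rather than the paper's adaptation rule.

```python
import numpy as np

def nlmf_time_varying(x, d, mu=0.1, M=4, eps=1e-6, lam0=0.5):
    """Normalised least-mean-fourth (LMF) adaptive filter with a
    mixed-power denominator and a time-varying mixing parameter lam.
    Illustrative sketch only: the exact XE-NLMF recursion and the
    paper's rule for adapting lam differ; here lam is simply ramped
    toward the signal-power term as adaptation proceeds."""
    w = np.zeros(M)
    lam = lam0
    errors = []
    for k in range(M - 1, len(x)):
        u = x[k - M + 1:k + 1][::-1]        # tapped delay line
        e = d[k] - w @ u                    # a-priori error
        denom = lam * (u @ u) + (1.0 - lam) * e ** 2 + eps
        w = w + mu * (e ** 3) * u / denom   # LMF update, mixed-power normalised
        lam = min(0.99, lam + 0.001)        # hypothetical ramp of the mixing term
        errors.append(e)
    return w, np.array(errors)

# identify a 4-tap FIR channel in sub-Gaussian (uniform) noise
rng = np.random.default_rng(4)
w_true = np.array([0.5, -0.3, 0.2, 0.1])
x = rng.normal(size=8000)
d = np.convolve(x, w_true, mode="full")[:len(x)] \
    + rng.uniform(-0.05, 0.05, size=len(x))
w_hat, errs = nlmf_time_varying(x, d)
```

Normalising by the error power as well as the signal power bounds the effective step size when the error is large, which is what stabilises LMF-type updates.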

Relevance: 30.00%

Abstract:

The structure and properties of the diffuse interstellar medium (ISM) on small scales, sub-au to 1 pc, are poorly understood. We compare interstellar absorption-lines, observed towards a selection of O- and B-type stars at two or more epochs, to search for variations over time caused by the transverse motion of each star combined with changes in the structure in the foreground ISM. Two sets of data were used: 83 VLT-UVES spectra with approximately 6 yr between epochs and 21 McDonald Observatory 2.7 m telescope echelle spectra with 6 - 20 yr between epochs, over a range of scales from 0 - 360 au. The interstellar absorption-lines observed at the two epochs were subtracted and searched for any residuals due to changes in the foreground ISM. Of the 104 sightlines investigated with typically five or more components in Na I D, possible temporal variation was identified in five UVES spectra (six components), in Ca II, Ca I and/or Na I absorption-lines. The variations detected range from 7% to a factor of 3.6 in column density. No variation was found in any other interstellar species. Most sightlines show no variation, with 3σ upper limits to changes of the order 0.1 - 0.3 dex in Ca II and Na I. The variations observed imply that fine-scale structure is present in the ISM but, at the resolution available in this study, is not very common at visible wavelengths. A determination of the electron densities and lower limits to the total number density of a sample of the sightlines implies that there is no striking difference between these parameters in sightlines with, and sightlines without, varying components.

Relevance: 30.00%

Abstract:

Background: Selection bias in HIV prevalence estimates occurs if non-participation in testing is correlated with HIV status. Longitudinal data suggest that individuals who know or suspect they are HIV positive are less likely to participate in testing in HIV surveys, in which case methods to correct for missing data which are based on imputation and observed characteristics will produce biased results. Methods: The identity of the HIV survey interviewer is typically associated with HIV testing participation, but is unlikely to be correlated with HIV status. Interviewer identity can thus be used as a selection variable allowing estimation of Heckman-type selection models. These models produce asymptotically unbiased HIV prevalence estimates, even when non-participation is correlated with unobserved characteristics, such as knowledge of HIV status. We introduce a new random effects method to these selection models which overcomes non-convergence caused by collinearity, small sample bias, and incorrect inference in existing approaches. Our method is easy to implement in standard statistical software, and allows the construction of bootstrapped standard errors which adjust for the fact that the relationship between testing and HIV status is uncertain and needs to be estimated. Results: Using nationally representative data from the Demographic and Health Surveys, we illustrate our approach with new point estimates and confidence intervals (CI) for HIV prevalence among men in Ghana (2003) and Zambia (2007). In Ghana, we find little evidence of selection bias as our selection model gives an HIV prevalence estimate of 1.4% (95% CI 1.2%–1.6%), compared to 1.6% among those with a valid HIV test. In Zambia, our selection model gives an HIV prevalence estimate of 16.3% (95% CI 11.0%–18.4%), compared to 12.1% among those with a valid HIV test. Therefore, those who decline to test in Zambia are found to be more likely to be HIV positive.
Conclusions: Our approach corrects for selection bias in HIV prevalence estimates, is possible to implement even when HIV prevalence or non-participation is very high or very low, and provides a practical solution to account for both sampling and parameter uncertainty in the estimation of confidence intervals. The wide confidence intervals estimated in an example with high HIV prevalence indicate that it is difficult to correct statistically for the bias that may occur when a large proportion of people refuse to test.
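The logic of using interviewer identity as a selection variable can be sketched with the classic Heckman two-step estimator. This is an illustrative stand-in only: the paper's model has a binary outcome and interviewer random effects, and is fitted by maximum likelihood, whereas the sketch below uses a continuous outcome and a generic excluded instrument in place of interviewer dummies.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def heckman_two_step(y, X, Z, observed):
    """Classic Heckman two-step estimator (illustrative stand-in for
    the paper's random-effects ML selection model). Z holds the
    selection covariates; in the HIV application these would include
    interviewer identity, which predicts test participation but not
    HIV status."""
    # Step 1: probit for participation, fitted by maximum likelihood
    def nll(g):
        p = norm.cdf(Z @ g).clip(1e-12, 1 - 1e-12)
        return -(observed * np.log(p)
                 + (1 - observed) * np.log(1 - p)).sum()
    g_hat = minimize(nll, np.zeros(Z.shape[1]), method="BFGS").x
    # Step 2: outcome regression on the selected sample, augmented with
    # the inverse Mills ratio to absorb the selection effect
    zi = Z[observed == 1] @ g_hat
    mills = norm.pdf(zi) / norm.cdf(zi)
    Xo = np.column_stack([X[observed == 1], mills])
    beta_hat, *_ = np.linalg.lstsq(Xo, y[observed == 1], rcond=None)
    return g_hat, beta_hat

# simulate selection on unobservables: corr(u, e) = 0.6
rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=n)
instr = rng.normal(size=n)  # excluded instrument (interviewer role)
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, instr])
observed = (Z @ np.array([0.5, 0.3, 1.0]) + u > 0).astype(float)
y = X @ np.array([1.0, 2.0]) + e
g_hat, beta_hat = heckman_two_step(y, X, Z, observed)
```

The coefficient on the inverse Mills ratio recovers the correlation between the participation and outcome errors; when it is non-zero, estimates based only on the observed sample are biased, which is the situation the paper addresses.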

Relevance: 30.00%

Abstract:

Background: Heckman-type selection models have been used to control HIV prevalence estimates for selection bias when participation in HIV testing and HIV status are associated after controlling for observed variables. These models typically rely on the strong assumption that the error terms in the participation and the outcome equations that comprise the model are distributed as bivariate normal.
Methods: We introduce a novel approach for relaxing the bivariate normality assumption in selection models using copula functions. We apply this method to estimating HIV prevalence and new confidence intervals (CI) in the 2007 Zambia Demographic and Health Survey (DHS) by using interviewer identity as the selection variable that predicts participation (consent to test) but not the outcome (HIV status).
Results: We show in a simulation study that selection models can generate biased results when the bivariate normality assumption is violated. In the 2007 Zambia DHS, HIV prevalence estimates are similar irrespective of the structure of the association assumed between participation and outcome. For men, we estimate a population HIV prevalence of 21% (95% CI = 16%–25%) compared with 12% (11%–13%) among those who consented to be tested; for women, the corresponding figures are 19% (13%–24%) and 16% (15%–17%).
Conclusions: Copula approaches to Heckman-type selection models are a useful addition to the methodological toolkit of HIV epidemiology and of epidemiology in general. We develop the use of this approach to systematically evaluate the robustness of HIV prevalence estimates based on selection models, both empirically and in a simulation study.
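The building block of the copula approach is specifying the dependence between the participation and outcome errors directly, rather than fixing it to bivariate normal. As a small illustration (not the paper's estimator), the sketch below samples from a Clayton copula by the conditional-inversion method; its Kendall tau is theta/(theta+2).

```python
import numpy as np
from scipy.stats import kendalltau

def sample_clayton(n, theta, rng):
    """Draw n dependent uniform pairs from a Clayton copula by
    inverting the conditional distribution C(u2 | u1). Illustrates
    replacing the bivariate-normal dependence of a Heckman model
    with an alternative copula."""
    u1 = rng.uniform(size=n)
    v = rng.uniform(size=n)
    # conditional inverse of the Clayton copula
    u2 = (u1 ** (-theta) * (v ** (-theta / (1.0 + theta)) - 1.0)
          + 1.0) ** (-1.0 / theta)
    return u1, u2

rng = np.random.default_rng(6)
u1, u2 = sample_clayton(20000, theta=2.0, rng=rng)
tau, _ = kendalltau(u1, u2)  # theoretical value: 2/(2+2) = 0.5
```

Transforming each uniform margin through an inverse CDF yields joint errors with the chosen dependence, which is how a simulation study can probe what happens when the bivariate normality assumption fails.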

Relevance: 30.00%

Abstract:

Background: Real-time quantitative PCR (qPCR) is a highly sensitive and specific method which is used extensively for determining gene expression profiles in a variety of cell and tissue types. In order to obtain accurate and reliable gene expression quantification, qPCR data are generally normalised against so-called reference or housekeeping genes. Ideally, reference genes should have abundant and stable transcript levels under the experimental conditions employed. However, reference genes are often selected rather arbitrarily, and indeed some have been shown to have variable expression under a variety of in vitro experimental conditions.
Objective: The objective of the current study was to investigate reference gene expression in human periodontal ligament (PDL) cells in response to treatment with lipopolysaccharide (LPS).
Method: Primary human PDL cells were grown in Dulbecco's Modified Eagle Medium with L-glutamine supplemented with 10% fetal bovine serum, 100 IU/ml penicillin and 100 µg/ml streptomycin. RNA was isolated using the RNeasy Mini Kit (Qiagen) and reverse transcribed using the QuantiTect Reverse Transcription Kit (Qiagen). The expression of a total of 19 reference genes was studied in the presence and absence of LPS treatment using the Roche Reference Gene Panel. Data were analysed using the NormFinder and BestKeeper validation programs.
Results: Treatment of human PDL cells with LPS resulted in changes in expression of several commonly used reference genes, including GAPDH. On the other hand, the reference genes β-actin, G6PDH and 18S were identified as stable following LPS treatment.
Conclusion: Many of the reference genes studied were robust to LPS treatment (up to 100 ng/ml). However, several commonly employed reference genes, including GAPDH, varied with LPS treatment, suggesting they would not be ideal candidates for normalisation in qPCR gene expression studies.
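The kind of stability screening described here can be sketched with the simple criterion used by BestKeeper-style tools: rank candidate genes by the standard deviation of their Cq values across samples (lower SD = more stable). The gene names and Cq values below are synthetic illustrations, not the study's data.

```python
import numpy as np

def rank_reference_genes(cq, gene_names):
    """Rank candidate reference genes by the standard deviation of
    their Cq values across samples, the simple stability criterion
    used in BestKeeper-style screening (lower SD = more stable).
    `cq` is an array of shape (samples, genes)."""
    sd = cq.std(axis=0, ddof=1)
    order = np.argsort(sd)
    return [(gene_names[i], round(float(sd[i]), 3)) for i in order]

# synthetic Cq values: ACTB/G6PDH stable, GAPDH shifted by a mock treatment
rng = np.random.default_rng(7)
actb = 17.0 + 0.1 * rng.normal(size=12)
g6pdh = 22.0 + 0.15 * rng.normal(size=12)
gapdh = np.concatenate([19.0 + 0.1 * rng.normal(size=6),
                        20.5 + 0.1 * rng.normal(size=6)])  # LPS-like shift
ranking = rank_reference_genes(np.column_stack([actb, g6pdh, gapdh]),
                               ["ACTB", "G6PDH", "GAPDH"])
```

A treatment-induced shift inflates a gene's overall SD, pushing it to the bottom of the ranking, which mirrors how a treatment-responsive gene such as GAPDH would be flagged as unsuitable for normalisation.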

Relevance: 30.00%

Abstract:

We performed an immunogenetic analysis of 345 IGHV-IGHD-IGHJ rearrangements from 337 cases with primary splenic small B-cell lymphomas of marginal-zone origin. Three immunoglobulin (IG) heavy variable (IGHV) genes accounted for 45.8% of the cases (IGHV1-2, 24.9%; IGHV4-34, 12.8%; IGHV3-23, 8.1%). Particularly for the IGHV1-2 gene, strong biases were evident regarding utilization of different alleles, with 79/86 rearrangements (92%) using allele *04. Among cases more stringently classified as splenic marginal-zone lymphoma (SMZL) thanks to the availability of splenic histopathological specimens, the frequency of IGHV1-2*04 peaked at 31%. The IGHV1-2*04 rearrangements carried significantly longer complementarity-determining region-3 (CDR3) than all other cases and showed biased IGHD gene usage, leading to CDR3s with common motifs. The great majority of analyzed rearrangements (299/345, 86.7%) carried IGHV genes with some impact of somatic hypermutation, from minimal to pronounced. Noticeably, 75/79 (95%) IGHV1-2*04 rearrangements were mutated; however, they mostly (56/75 cases; 74.6%) carried few mutations (97-99.9% germline identity) of conservative nature and restricted distribution. These distinctive features of the IG receptors indicate selection by (super)antigenic element(s) in the pathogenesis of SMZL. Furthermore, they raise the possibility that certain SMZL subtypes could derive from progenitor populations adapted to particular antigenic challenges through selection of VH domain specificities, in particular the IGHV1-2*04 allele.