337 results for Outliers


Abstract:

The paper presents a new copula-based method for measuring dependence between random variables. Our approach extends the Maximum Mean Discrepancy to the copula of the joint distribution. We prove that this approach has several advantageous properties. Like Shannon mutual information, the proposed dependence measure is invariant to any strictly increasing transformation of the marginal variables. This is important in many applications, for example in feature selection. The estimator is consistent, robust to outliers, and uses rank statistics only. We derive upper bounds on the convergence rate and also propose independence tests. We illustrate the theoretical contributions through a series of experiments in feature selection and low-dimensional embedding of distributions.
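
The rank-only, transformation-invariant behaviour described above can be illustrated with a small sketch. It assumes the measure is the squared MMD between the empirical copula (normalized ranks rank/n, no ties) and the independence copula U[0,1]^d under a Gaussian kernel of width `sigma`. The names `copula_transform` and `copula_mmd2`, the default `sigma=0.2`, and the choice to compute the uniform expectations in closed form rather than by sampling are all assumptions of this illustration, not the authors' estimator.

```python
import math


def copula_transform(column):
    """Map one marginal sample to normalized ranks rank/n (assumes no ties)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r / n
    return ranks


def copula_mmd2(columns, sigma=0.2):
    """Squared MMD between the empirical copula of the joint sample and the
    independence copula U[0,1]^d, using a Gaussian product kernel.
    Expectations under U[0,1]^d are computed in closed form, not by sampling."""
    d = len(columns)
    z = list(zip(*(copula_transform(c) for c in columns)))  # n points in (0, 1]^d
    n = len(z)
    a = 1.0 / (sigma * math.sqrt(2.0))

    def k(p, q):
        return math.exp(-sum((pk - qk) ** 2 for pk, qk in zip(p, q)) / (2 * sigma ** 2))

    def mean_k_uniform_1d(x):
        # Integral over u in [0, 1] of exp(-(x - u)^2 / (2 sigma^2)).
        return sigma * math.sqrt(math.pi / 2) * (math.erf((1 - x) * a) + math.erf(x * a))

    # Double integral over [0, 1]^2 of exp(-(u - v)^2 / (2 sigma^2)).
    c1 = 2 * sigma * math.sqrt(math.pi / 2) * (
        math.erf(a) + (math.exp(-a * a) - 1) / (a * math.sqrt(math.pi)))

    term_zz = sum(k(p, q) for p in z for q in z) / n ** 2
    term_zu = sum(math.prod(mean_k_uniform_1d(x) for x in p) for p in z) / n
    term_uu = c1 ** d  # the product kernel factorizes over dimensions
    return term_zz - 2 * term_zu + term_uu
```

Because only ranks enter the computation, replacing `x` by `exp(x / 10)` (or any other strictly increasing map) leaves the value unchanged, while a perfectly monotone pair scores much higher than a well-spread permutation.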

Abstract:

DNA microarrays provide a huge amount of data and therefore require dimensionality reduction methods to extract meaningful biological information. Independent Component Analysis (ICA) has been proposed by several authors as a promising means to this end. Unfortunately, experimental data are usually of poor quality because of noise, outliers and a lack of samples. Robustness to these hurdles will thus be a key feature of an ICA algorithm. This paper identifies a robust contrast function and proposes a new ICA algorithm. © 2007 IEEE.

Abstract:

Semisupervised dimensionality reduction has been attracting much attention because it not only uses labeled and unlabeled data simultaneously but also handles out-of-sample data well. This paper proposes an effective approach to semisupervised dimensionality reduction through label propagation and label regression. Unlike previous efforts, the new approach propagates label information from labeled to unlabeled data through a well-designed random-walk mechanism, in which outliers are effectively detected and the resulting virtual labels of the unlabeled data can be encoded in a weighted regression model. These virtual labels are then regressed with a linear model to compute the projection matrix for dimensionality reduction. As a result, when the manifold or cluster assumption holds for the data, the labels of the labeled data can be correctly propagated to the unlabeled data, and the proposed approach therefore uses the labeled and unlabeled data more effectively than previous work. Experiments on several databases demonstrate the advantage of the new approach.
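
The propagation step described above can be pictured with a minimal sketch of standard clamped label propagation: Gaussian affinities define random-walk transition probabilities, unlabeled points repeatedly take the transition-weighted average of their neighbours' label distributions, and labeled points are reset to their known labels. This is the generic technique, not the paper's outlier-detecting random walk or its label-regression step; `label_propagation`, `sigma` and `iters` are hypothetical names and defaults chosen for illustration.

```python
import math


def label_propagation(points, labels, sigma=1.0, iters=200):
    """Clamped label propagation. labels[i] is a class id for labeled points
    and None for unlabeled ones; returns a predicted class for every point."""
    n = len(points)
    classes = sorted({c for c in labels if c is not None})
    # Gaussian affinities (no self-loops). Assumes every point has at least one
    # neighbour close enough that its row sum does not underflow to zero.
    w = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    row_sums = [sum(r) for r in w]
    # Random-walk transition matrix P = D^-1 W.
    p = [[w[i][j] / row_sums[i] for j in range(n)] for i in range(n)]

    def clamp(i):
        return [1.0 if labels[i] == c else 0.0 for c in classes]

    # Unlabeled points start from a uniform label distribution.
    f = [clamp(i) if labels[i] is not None else [1.0 / len(classes)] * len(classes)
         for i in range(n)]
    for _ in range(iters):
        # One step of F <- P F, then re-clamp the labeled rows.
        f = [clamp(i) if labels[i] is not None else
             [sum(p[i][j] * f[j][k] for j in range(n)) for k in range(len(classes))]
             for i in range(n)]
    return [classes[max(range(len(classes)), key=lambda k: f[i][k])] for i in range(n)]
```

With two well-separated clusters and a single labeled point in each, every unlabeled point inherits the label of its own cluster, which is the behaviour the manifold or cluster assumption predicts.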

Abstract:

Tensor analysis plays an important role in modern image and vision computing problems. Most existing tensor analysis approaches are based on the Frobenius norm, which makes them sensitive to outliers. In this paper, we propose L1-norm-based tensor analysis (TPCA-L1), which is robust to outliers. Experimental results on face and other datasets demonstrate the advantages of the proposed approach.

Abstract:

In this paper, we first present a simple but effective L1-norm-based two-dimensional principal component analysis (2DPCA). The traditional L2-norm-based least-squares criterion is sensitive to outliers, whereas the proposed L1-norm 2DPCA is robust to them. Experimental results demonstrate its advantages.
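
To make the L1-norm idea concrete, here is a sketch that assumes the L1 2DPCA objective has the form max over unit w of sum_i ||A_i w||_1 over (mean-centered) images A_i. Since ||A w||_1 is the sum of |a_j . w| over the rows a_j, the rows can be pooled, and the greedy sign-flipping fixed-point iteration known from Kwak's PCA-L1 can be applied to find one projection axis. This is an illustrative swap-in under that assumption, not the authors' code; `l1_2dpca_first_axis` and its start heuristic are hypothetical.

```python
import math


def l1_2dpca_first_axis(images, iters=100):
    """Greedy sign-flipping iteration (in the style of PCA-L1) that seeks a unit
    vector w locally maximizing sum_i ||A_i w||_1. Assumes centered images."""
    # ||A w||_1 = sum_j |a_j . w| over the rows a_j of A, so rows can be pooled.
    rows = [row for img in images for row in img]
    d = len(rows[0])

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    # Heuristic start: the row with the largest L2 norm (assumed nonzero).
    w = normalize(max(rows, key=lambda r: sum(x * x for x in r)))
    for _ in range(iters):
        # Polarity of each row with respect to the current axis (ties count as +1).
        signs = [1.0 if sum(a * b for a, b in zip(r, w)) >= 0 else -1.0 for r in rows]
        # New axis: normalized signed sum of rows; this never decreases the objective.
        new_w = normalize([sum(s * r[k] for s, r in zip(signs, rows)) for k in range(d)])
        if new_w == w:  # signs stopped changing, so a fixed point was reached
            break
        w = new_w
    return w
```

Each squared residual in an L2 criterion grows quadratically, so a single large row can dominate the fitted axis; with absolute values its pull grows only linearly. Further axes would typically be found by deflating the rows and repeating.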

Abstract:

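Generally speaking, outliers are data points that lie far from the other data points, yet they may well contain extremely important information. A new outlier fuzzy kernel clustering algorithm is proposed to discover the outliers in a sample set. The original data space is mapped into a feature space through a Mercer kernel, and each vector in the feature space is assigned a dynamic weight. Building on the classical FCM fuzzy clustering algorithm, a new clustering objective function in the feature space is obtained. By optimizing this objective function, the weight of each data point is finally obtained, and the outliers in the sample set are identified according to the magnitude of their weights. Simulation results demonstrate the feasibility and effectiveness of the proposed outlier fuzzy kernel clustering algorithm.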
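
The Mercer-kernel mapping mentioned in this abstract never requires the feature vectors to be formed explicitly: distances in feature space can be written entirely in terms of kernel values. The sketch below uses that kernel trick to score each point by its squared feature-space distance to the feature-space mean. It is a deliberately simpler swap-in technique, not the paper's weighted fuzzy kernel clustering objective; the Gaussian kernel, `sigma=1.0`, and the names `rbf` and `feature_space_distances` are assumptions for illustration.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian (Mercer) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def feature_space_distances(points, sigma=1.0):
    """Squared distance from each phi(x_i) to the feature-space mean m, via the
    kernel trick (phi itself is never computed):
    ||phi(x_i) - m||^2 = k(x_i, x_i) - (2/n) sum_j k(x_i, x_j) + (1/n^2) sum_jl k(x_j, x_l)."""
    n = len(points)
    K = [[rbf(p, q, sigma) for q in points] for p in points]
    mean_all = sum(map(sum, K)) / n ** 2
    return [K[i][i] - 2 * sum(K[i]) / n + mean_all for i in range(n)]
```

Points whose score is far above the rest can then be flagged as outlier candidates, for example by a threshold; in the paper the analogous decision is made from the optimized per-point weights instead.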

Abstract:

Given $n$ noisy observations $g_i$ of the same quantity $f$, it is common practice to estimate $f$ by minimizing the function $\sum_{i=1}^{n} (g_i - f)^2$. From a statistical point of view, this corresponds to computing the maximum likelihood estimate under the assumption of Gaussian noise. However, it is well known that this choice leads to results that are very sensitive to the presence of outliers in the data. For this reason, it has been proposed to minimize functions of the form $\sum_{i=1}^{n} V(g_i - f)$, where $V$ is a function that increases less rapidly than the square. Several choices of $V$ have been proposed and successfully used to obtain "robust" estimates. In this paper we show that, for a class of functions $V$, using these robust estimators corresponds to assuming that the data are corrupted by Gaussian noise whose variance fluctuates according to some given probability distribution, which uniquely determines the shape of $V$.
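
As one concrete instance of a function V that grows more slowly than the square, the sketch below uses Huber's V (quadratic for small residuals, linear beyond a threshold c) and minimizes the sum of V(g_i - f) by iteratively reweighted least squares. Huber's choice, the constant c = 1.345, the unit noise scale, and the name `huber_location` are assumptions for illustration; the abstract does not say which V the paper studies.

```python
import statistics


def huber_location(g, c=1.345, iters=100, tol=1e-12):
    """Estimate f by minimizing sum_i V(g_i - f) with Huber's V (quadratic for
    |r| <= c, linear beyond) via iteratively reweighted least squares.
    Assumes unit noise scale; c = 1.345 is a conventional tuning constant."""
    f = statistics.median(g)  # robust starting point
    for _ in range(iters):
        # IRLS weights psi(r)/r: 1 inside [-c, c], c/|r| outside,
        # so large residuals are downweighted instead of squared.
        w = [1.0 if abs(x - f) <= c else c / abs(x - f) for x in g]
        f_new = sum(wi * x for wi, x in zip(w, g)) / sum(w)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f
```

For the observations `[0.9, 1.0, 1.1, 1.2, 0.8, 50.0]`, the least-squares estimate (the mean) is pulled above 9 by the single outlier. The Huber estimate settles at 1 + c/5 ≈ 1.269, because the outlier's influence is capped at c.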

Abstract:

Liu, Yonghuai. Improving ICP with Easy Implementation for Free Form Surface Matching. Pattern Recognition, vol. 37, no. 2, pp. 211-226, 2004.

Abstract:

Liu, Yonghuai. Automatic 3d free form shape matching using the graduated assignment algorithm. Pattern Recognition, vol. 38, no. 10, pp. 1615-1631, 2005.

Abstract:

STUDY QUESTION. Are significant abnormalities in outward (K+) conductance and resting membrane potential (Vm) present in the spermatozoa of patients undertaking IVF and ICSI and if so, what is their functional effect on fertilization success? SUMMARY ANSWER. Negligible outward conductance (≈5% of patients) or an enhanced inward conductance (≈4% of patients), both of which caused depolarization of Vm, were associated with a low rate of fertilization following IVF. WHAT IS KNOWN ALREADY. Sperm-specific potassium channel knockout mice are infertile with defects in sperm function, suggesting that these channels are essential for fertility. These observations suggest that malfunction of K+ channels in human spermatozoa might contribute significantly to the occurrence of subfertility in men. However, remarkably little is known of the nature of K+ channels in human spermatozoa or the incidence and functional consequences of K+ channel defects. STUDY DESIGN, SIZE AND DURATION. Spermatozoa were obtained from healthy volunteer research donors and subfertile IVF and ICSI patients attending a hospital assisted reproductive techniques clinic between May 2013 and December 2015. In total, 40 IVF patients, 41 ICSI patients and 26 normozoospermic donors took part in the study. PARTICIPANTS/MATERIALS, SETTING, METHODS. Samples were examined using electrophysiology (whole-cell patch clamping). Where abnormal electrophysiological characteristics were identified, spermatozoa were further examined for Ca2+ influx induced by progesterone and penetration into viscous media if sufficient sample was available. Full exome sequencing was performed to specifically evaluate potassium calcium-activated channel subfamily M α 1 (KCNMA1), potassium calcium-activated channel subfamily U member 1 (KCNU1) and leucine-rich repeat containing 52 (LRRC52) genes and others associated with K+ signalling. 
In IVF patients, comparison with fertilization rates was done to assess the functional significance of the electrophysiological abnormalities. MAIN RESULTS AND THE ROLE OF CHANCE. Patch clamp electrophysiology was used to assess outward (K+) conductance and resting membrane potential (Vm) and signalling/motility assays were used to assess functional characteristics of sperm from IVF and ICSI patient samples. The mean Vm and outward membrane conductance in sperm from IVF and ICSI patients were not significantly different from those of control (donor) sperm prepared under the same conditions, but variation between individuals was significantly greater (P < 0.02) with a large number of outliers (>25%). In particular, in ≈10% of patients (7/81), we observed either a negligible outward conductance (4 patients) or an enhanced inward current (3 patients), both of which caused depolarization of Vm. Analysis of clinical data from the IVF patients showed significant association of depolarized Vm (≥0 mV) with low fertilization rate (P = 0.012). Spermatozoa with electrophysiological abnormalities (conductance and Vm) responded normally to progesterone with elevation of [Ca2+]i and penetration of viscous medium, indicating retention of cation channel of sperm (CatSper) channel function. LIMITATIONS, REASONS FOR CAUTION. For practical, technical, ethical and logistical reasons, we could not obtain sufficient additional semen samples from men with conductance abnormalities to establish the cause of the conductance defects. Full exome sequencing was only available in two men with conductance defects. WIDER IMPLICATIONS OF THE FINDINGS. These data add significantly to the understanding of the role of ion channels in human sperm function and its impact on male fertility.
Impaired potassium channel conductance (Gm) and/or Vm regulation is both common and complex in human spermatozoa and importantly is associated with impaired fertilization capacity when the Vm of cells is completely depolarized.

Abstract:

BACKGROUND: In a time-course microarray experiment, the expression level of each gene is observed across a number of time points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is to identify genes whose trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data. RESULTS: In this paper, we propose a robust testing method for identifying genes whose expression time profiles depend on a factor. Furthermore, we propose a multiple testing procedure to adjust for multiplicity. CONCLUSIONS: Through an extensive simulation study, we illustrate the performance of our method. Finally, we report the results of applying our method to a case study and discuss potential extensions.

Abstract:

Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate and tackle practical fact-checking tasks (reverse-engineering vague claims and countering questionable claims) as computational problems. Within the QRS-based framework, we go one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data, i.e., raising good questions in the first place. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Besides the general-purpose high-quality claim-finding problem, lead-finding can be tailored toward specific claim quality measures, also defined within the QRS framework. An example of uniqueness-based lead-finding is presented for "one-of-the-few" claims, resulting in interpretable high-quality claims and an adjustable mechanism for ranking objects, e.g., NBA players, based on what claims can be made for them. Finally, we study the use of visualization as a powerful way of conveying the results of a large number of claims. An efficient two-stage sampling algorithm is proposed for generating the input of a 2D scatter plot with heatmap; it evaluates a limited amount of data while preserving the two essential visual features, namely outliers and clusters. For all the problems, we present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
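
The insight that a claim can be probed by perturbing its parameters is easy to illustrate with a toy example. The sketch below treats a hypothetical claim, "the average of a series over the window [start, end] is at least a threshold", as a parameterized query. It then reports the fraction of nearby windows for which the conclusion still holds. The names `claim_holds` and `robustness`, the window-shift perturbation, and `radius=2` are all illustrative assumptions, not the paper's QRS model or its algorithms.

```python
from statistics import mean


def claim_holds(series, start, end, threshold):
    """Hypothetical parameterized claim: mean(series[start..end]) >= threshold."""
    return mean(series[start:end + 1]) >= threshold


def robustness(series, start, end, threshold, radius=2):
    """Fraction of valid perturbed windows (start and end each shifted by up to
    `radius`) for which the claim still holds. A low value suggests the stated
    window may have been cherry-picked."""
    n = len(series)
    outcomes = []
    for ds in range(-radius, radius + 1):
        for de in range(-radius, radius + 1):
            s, e = start + ds, end + de
            if 0 <= s <= e < n:  # keep only non-empty windows inside the data
                outcomes.append(claim_holds(series, s, e, threshold))
    return sum(outcomes) / len(outcomes)
```

For the series `[3, 3, 3, 3, 9, 3, 3, 3]`, the claim "the average over window [4, 4] is at least 8" survives in only 1 of 15 perturbed windows, which flags it as fragile. A claim about a flat series survives every perturbation.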

Abstract:

With increasing recognition of the roles RNA molecules and RNA/protein complexes play in an unexpected variety of biological processes, understanding of RNA structure-function relationships is of high current importance. To make clean biological interpretations from three-dimensional structures, it is imperative to have high-quality, accurate RNA crystal structures available, and the community has thoroughly embraced that goal. However, due to the many degrees of freedom inherent in RNA structure (especially for the backbone), it is a significant challenge to succeed in building accurate experimental models for RNA structures. This chapter describes the tools and techniques our research group and our collaborators have developed over the years to help RNA structural biologists both evaluate and achieve better accuracy. Expert analysis of large, high-resolution, quality-conscious RNA datasets provides the fundamental information that enables automated methods for robust and efficient error diagnosis in validating RNA structures at all resolutions. The even more crucial goal of correcting the diagnosed outliers has steadily developed toward highly effective, computationally based techniques. Automation enables solving complex issues in large RNA structures, but cannot circumvent the need for thoughtful examination of local details, and so we also provide some guidance for interpreting and acting on the results of current structure validation for RNA.

Abstract:

Agglomerative cluster analyses encompass many techniques, which have been widely used in various fields of science. In biology, and specifically ecology, datasets are generally highly variable and may contain outliers, which make it more difficult to identify the number of clusters. Here we present a new criterion to statistically determine the optimal level of partition in a classification tree. The robustness of the criterion is tested against perturbed data (outliers) using an observation or variable with randomly generated values. The technique, called Random Simulation Test (RST), is tested on (1) the well-known Iris dataset [Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7, 179–188], (2) simulated data with predetermined numbers of clusters following Milligan and Cooper [Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179] and finally (3) applied to real copepod community data previously analyzed in Beaugrand et al. [Beaugrand, G., Ibanez, F., Lindley, J.A., Reid, P.C., 2002. Diversity of calanoid copepods in the North Atlantic and adjacent seas: species associations and biogeography. Mar. Ecol. Prog. Ser. 232, 179–195]. The technique is compared to several standard techniques. RST generally performed better than existing algorithms on simulated data and proved to be especially efficient with highly variable datasets.

Abstract:

This paper presents a novel approach based on evolutionary agents for epipolar geometry estimation. In contrast to conventional nonlinear optimization methods, the proposed technique uses each agent to represent a minimal subset for computing the fundamental matrix, and treats the data set of correspondences as a 1D cellular environment in which the agents inhabit and evolve. The agents exhibit evolutionary behaviors and evolve autonomously in a vast solution space to reach the optimal (or a near-optimal) result. Three techniques are then proposed to improve the search ability and computational efficiency of the original agents. Subset templates enable agents to collaborate more efficiently with each other and to inherit accurate information from the whole agent set. The competitive evolutionary agent (CEA) and the finite multiple evolutionary agent (FMEA) apply better evolutionary strategies or decision rules and focus on different aspects of the evolutionary process. Experimental results with both synthetic data and real images show that the proposed agent-based approaches outperform other typical methods in accuracy and speed, and are more robust to noise and outliers.