5 resultados para extremal polynomials

em Duke University


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Extremal quantile index is a concept that the quantile index will drift to zero (or one)

as the sample size increases. The three chapters of my dissertation consists of three

applications of this concept in three distinct econometric problems. In Chapter 2, I

use the concept of extremal quantile index to derive new asymptotic properties and

inference method for quantile treatment effect estimators when the quantile index

of interest is close to zero. In Chapter 3, I rely on the concept of extremal quantile

index to achieve identification at infinity of the sample selection models and propose

a new inference method. Last, in Chapter 4, I use the concept of extremal quantile

index to define an asymptotic trimming scheme which can be used to control the

convergence rate of the estimator of the intercept of binary response models.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present a theory of hypoellipticity and unique ergodicity for semilinear parabolic stochastic PDEs with "polynomial" nonlinearities and additive noise, considered as abstract evolution equations in some Hilbert space. It is shown that if Hörmander's bracket condition holds at every point of this Hilbert space, then a lower bound on the Malliavin covariance operatorμt can be obtained. Informally, this bound can be read as "Fix any finite-dimensional projection on a subspace of sufficiently regular functions. Then the eigenfunctions of μt with small eigenvalues have only a very small component in the image of Π." We also show how to use a priori bounds on the solutions to the equation to obtain good control on the dependency of the bounds on the Malliavin matrix on the initial condition. These bounds are sufficient in many cases to obtain the asymptotic strong Feller property introduced in [HM06]. One of the main novel technical tools is an almost sure bound from below on the size of "Wiener polynomials," where the coefficients are possibly non-adapted stochastic processes satisfying a Lips chitz condition. By exploiting the polynomial structure of the equations, this result can be used to replace Norris' lemma, which is unavailable in the present context. We conclude by showing that the two-dimensional stochastic Navier-Stokes equations and a large class of reaction-diffusion equations fit the framework of our theory.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Histopathology is the clinical standard for tissue diagnosis. However, histopathology has several limitations including that it requires tissue processing, which can take 30 minutes or more, and requires a highly trained pathologist to diagnose the tissue. Additionally, the diagnosis is qualitative, and the lack of quantitation leads to possible observer-specific diagnosis. Taken together, it is difficult to diagnose tissue at the point of care using histopathology.

Several clinical situations could benefit from more rapid and automated histological processing, which could reduce the time and the number of steps required between obtaining a fresh tissue specimen and rendering a diagnosis. For example, there is need for rapid detection of residual cancer on the surface of tumor resection specimens during excisional surgeries, which is known as intraoperative tumor margin assessment. Additionally, rapid assessment of biopsy specimens at the point-of-care could enable clinicians to confirm that a suspicious lesion is successfully sampled, thus preventing an unnecessary repeat biopsy procedure. Rapid and low cost histological processing could also be potentially useful in settings lacking the human resources and equipment necessary to perform standard histologic assessment. Lastly, automated interpretation of tissue samples could potentially reduce inter-observer error, particularly in the diagnosis of borderline lesions.

To address these needs, high quality microscopic images of the tissue must be obtained in rapid timeframes, in order for a pathologic assessment to be useful for guiding the intervention. Optical microscopy is a powerful technique to obtain high-resolution images of tissue morphology in real-time at the point of care, without the need for tissue processing. In particular, a number of groups have combined fluorescence microscopy with vital fluorescent stains to visualize micro-anatomical features of thick (i.e. unsectioned or unprocessed) tissue. However, robust methods for segmentation and quantitative analysis of heterogeneous images are essential to enable automated diagnosis. Thus, the goal of this work was to obtain high resolution imaging of tissue morphology through employing fluorescence microscopy and vital fluorescent stains and to develop a quantitative strategy to segment and quantify tissue features in heterogeneous images, such as nuclei and the surrounding stroma, which will enable automated diagnosis of thick tissues.

To achieve these goals, three specific aims were proposed. The first aim was to develop an image processing method that can differentiate nuclei from background tissue heterogeneity and enable automated diagnosis of thick tissue at the point of care. A computational technique called sparse component analysis (SCA) was adapted to isolate features of interest, such as nuclei, from the background. SCA has been used previously in the image processing community for image compression, enhancement, and restoration, but has never been applied to separate distinct tissue types in a heterogeneous image. In combination with a high resolution fluorescence microendoscope (HRME) and a contrast agent acriflavine, the utility of this technique was demonstrated through imaging preclinical sarcoma tumor margins. Acriflavine localizes to the nuclei of cells where it reversibly associates with RNA and DNA. Additionally, acriflavine shows some affinity for collagen and muscle. SCA was adapted to isolate acriflavine positive features or APFs (which correspond to RNA and DNA) from background tissue heterogeneity. The circle transform (CT) was applied to the SCA output to quantify the size and density of overlapping APFs. The sensitivity of the SCA+CT approach to variations in APF size, density and background heterogeneity was demonstrated through simulations. Specifically, SCA+CT achieved the lowest errors for higher contrast ratios and larger APF sizes. When applied to tissue images of excised sarcoma margins, SCA+CT correctly isolated APFs and showed consistently increased density in tumor and tumor + muscle images compared to images containing muscle. Next, variables were quantified from images of resected primary sarcomas and used to optimize a multivariate model. The sensitivity and specificity for differentiating positive from negative ex vivo resected tumor margins was 82% and 75%. The utility of this approach was further tested by imaging the in vivo tumor cavities from 34 mice after resection of a sarcoma with local recurrence as a bench mark. When applied prospectively to images from the tumor cavity, the sensitivity and specificity for differentiating local recurrence was 78% and 82%. The results indicate that SCA+CT can accurately delineate APFs in heterogeneous tissue, which is essential to enable automated and rapid surveillance of tissue pathology.

Two primary challenges were identified in the work in aim 1. First, while SCA can be used to isolate features, such as APFs, from heterogeneous images, its performance is limited by the contrast between APFs and the background. Second, while it is feasible to create mosaics by scanning a sarcoma tumor bed in a mouse, which is on the order of 3-7 mm in any one dimension, it is not feasible to evaluate an entire human surgical margin. Thus, improvements to the microscopic imaging system were made to (1) improve image contrast through rejecting out-of-focus background fluorescence and to (2) increase the field of view (FOV) while maintaining the sub-cellular resolution needed for delineation of nuclei. To address these challenges, a technique called structured illumination microscopy (SIM) was employed in which the entire FOV is illuminated with a defined spatial pattern rather than scanning a focal spot, such as in confocal microscopy.

Thus, the second aim was to improve image contrast and increase the FOV through employing wide-field, non-contact structured illumination microscopy and optimize the segmentation algorithm for new imaging modality. Both image contrast and FOV were increased through the development of a wide-field fluorescence SIM system. Clear improvement in image contrast was seen in structured illumination images compared to uniform illumination images. Additionally, the FOV is over 13X larger than the fluorescence microendoscope used in aim 1. Initial segmentation results of SIM images revealed that SCA is unable to segment large numbers of APFs in the tumor images. Because the FOV of the SIM system is over 13X larger than the FOV of the fluorescence microendoscope, dense collections of APFs commonly seen in tumor images could no longer be sparsely represented, and the fundamental sparsity assumption associated with SCA was no longer met. Thus, an algorithm called maximally stable extremal regions (MSER) was investigated as an alternative approach for APF segmentation in SIM images. MSER was able to accurately segment large numbers of APFs in SIM images of tumor tissue. In addition to optimizing MSER for SIM image segmentation, an optimal frequency of the illumination pattern used in SIM was carefully selected because the image signal to noise ratio (SNR) is dependent on the grid frequency. A grid frequency of 31.7 mm-1 led to the highest SNR and lowest percent error associated with MSER segmentation.

Once MSER was optimized for SIM image segmentation and the optimal grid frequency was selected, a quantitative model was developed to diagnose mouse sarcoma tumor margins that were imaged ex vivo with SIM. Tumor margins were stained with acridine orange (AO) in aim 2 because AO was found to stain the sarcoma tissue more brightly than acriflavine. Both acriflavine and AO are intravital dyes, which have been shown to stain nuclei, skeletal muscle, and collagenous stroma. A tissue-type classification model was developed to differentiate localized regions (75x75 µm) of tumor from skeletal muscle and adipose tissue based on the MSER segmentation output. Specifically, a logistic regression model was used to classify each localized region. The logistic regression model yielded an output in terms of probability (0-100%) that tumor was located within each 75x75 µm region. The model performance was tested using a receiver operator characteristic (ROC) curve analysis that revealed 77% sensitivity and 81% specificity. For margin classification, the whole margin image was divided into localized regions and this tissue-type classification model was applied. In a subset of 6 margins (3 negative, 3 positive), it was shown that with a tumor probability threshold of 50%, 8% of all regions from negative margins exceeded this threshold, while over 17% of all regions exceeded the threshold in the positive margins. Thus, 8% of regions in negative margins were considered false positives. These false positive regions are likely due to the high density of APFs present in normal tissues, which clearly demonstrates a challenge in implementing this automatic algorithm based on AO staining alone.

Thus, the third aim was to improve the specificity of the diagnostic model through leveraging other sources of contrast. Modifications were made to the SIM system to enable fluorescence imaging at a variety of wavelengths. Specifically, the SIM system was modified to enabling imaging of red fluorescent protein (RFP) expressing sarcomas, which were used to delineate the location of tumor cells within each image. Initial analysis of AO stained panels confirmed that there was room for improvement in tumor detection, particularly in regards to false positive regions that were negative for RFP. One approach for improving the specificity of the diagnostic model was to investigate using a fluorophore that was more specific to staining tumor. Specifically, tetracycline was selected because it appeared to specifically stain freshly excised tumor tissue in a matter of minutes, and was non-toxic and stable in solution. Results indicated that tetracycline staining has promise for increasing the specificity of tumor detection in SIM images of a preclinical sarcoma model and further investigation is warranted.

In conclusion, this work presents the development of a combination of tools that is capable of automated segmentation and quantification of micro-anatomical images of thick tissue. When compared to the fluorescence microendoscope, wide-field multispectral fluorescence SIM imaging provided improved image contrast, a larger FOV with comparable resolution, and the ability to image a variety of fluorophores. MSER was an appropriate and rapid approach to segment dense collections of APFs from wide-field SIM images. Variables that reflect the morphology of the tissue, such as the density, size, and shape of nuclei and nucleoli, can be used to automatically diagnose SIM images. The clinical utility of SIM imaging and MSER segmentation to detect microscopic residual disease has been demonstrated by imaging excised preclinical sarcoma margins. Ultimately, this work demonstrates that fluorescence imaging of tissue micro-anatomy combined with a specialized algorithm for delineation and quantification of features is a means for rapid, non-destructive and automated detection of microscopic disease, which could improve cancer management in a variety of clinical scenarios.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Intraoperative assessment of surgical margins is critical to ensuring residual tumor does not remain in a patient. Previously, we developed a fluorescence structured illumination microscope (SIM) system with a single-shot field of view (FOV) of 2.1 × 1.6 mm (3.4 mm2) and sub-cellular resolution (4.4 μm). The goal of this study was to test the utility of this technology for the detection of residual disease in a genetically engineered mouse model of sarcoma. Primary soft tissue sarcomas were generated in the hindlimb and after the tumor was surgically removed, the relevant margin was stained with acridine orange (AO), a vital stain that brightly stains cell nuclei and fibrous tissues. The tissues were imaged with the SIM system with the primary goal of visualizing fluorescent features from tumor nuclei. Given the heterogeneity of the background tissue (presence of adipose tissue and muscle), an algorithm known as maximally stable extremal regions (MSER) was optimized and applied to the images to specifically segment nuclear features. A logistic regression model was used to classify a tissue site as positive or negative by calculating area fraction and shape of the segmented features that were present and the resulting receiver operator curve (ROC) was generated by varying the probability threshold. Based on the ROC curves, the model was able to classify tumor and normal tissue with 77% sensitivity and 81% specificity (Youden's index). For an unbiased measure of the model performance, it was applied to a separate validation dataset that resulted in 73% sensitivity and 80% specificity. When this approach was applied to representative whole margins, for a tumor probability threshold of 50%, only 1.2% of all regions from the negative margin exceeded this threshold, while over 14.8% of all regions from the positive margin exceeded this threshold.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.

Latent class models for the joint distribution of multivariate categorical, such as the PARAFAC decomposition, data play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis--Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis--Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo. The Markov Chain Monte Carlo method is the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov Chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.

Data augmentation Gibbs samplers are arguably the most popular class of algorithm for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.