854 resultados para data gathering algorithm


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Advanced Very High Resolution Radiometer (AVHRR) carried on board the National Oceanic and Atmospheric Administration (NOAA) and the Meteorological Operational Satellite (MetOp) polar orbiting satellites is the only instrument offering more than 25 years of satellite data to analyse aerosols on a daily basis. The present study assessed a modified AVHRR aerosol optical depth τa retrieval over land for Europe. The algorithm might also be applied to other parts of the world with similar surface characteristics like Europe, only the aerosol properties would have to be adapted to a new region. The initial approach used a relationship between Sun photometer measurements from the Aerosol Robotic Network (AERONET) and the satellite data to post-process the retrieved τa. Herein a quasi-stand-alone procedure, which is more suitable for the pre-AERONET era, is presented. In addition, the estimation of surface reflectance, the aerosol model, and other processing steps have been adapted. The method's cross-platform applicability was tested by validating τa from NOAA-17 and NOAA-18 AVHRR at 15 AERONET sites in Central Europe (40.5° N–50° N, 0° E–17° E) from August 2005 to December 2007. Furthermore, the accuracy of the AVHRR retrieval was related to products from two newer instruments, the Medium Resolution Imaging Spectrometer (MERIS) on board the Environmental Satellite (ENVISAT) and the Moderate Resolution Imaging Spectroradiometer (MODIS) on board Aqua/Terra. Considering the linear correlation coefficient R, the AVHRR results were similar to those of MERIS with even lower root mean square error RMSE. Not surprisingly, MODIS, with its high spectral coverage, gave the highest R and lowest RMSE. Regarding monthly averaged τa, the results were ambiguous. Focusing on small-scale structures, R was reduced for all sensors, whereas the RMSE solely for MERIS substantially increased. Regarding larger areas like Central Europe, the error statistics were similar to the individual match-ups. This was mainly explained with sampling issues. With the successful validation of AVHRR we are now able to concentrate on our large data archive dating back to 1985. This is a unique opportunity for both climate and air pollution studies over land surfaces.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main objective of this paper is to discuss various aspects of implementing a specific intrusion-detection scheme on a micro-computer system using fixed-point arithmetic. The proposed scheme is suitable for detecting intruder stimuli which are in the form of transient signals. It consists of two stages: an adaptive digital predictor and an adaptive threshold detection algorithm. Experimental results involving data acquired via field experiments are also included.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Despite numerous studies about nitrogen-cycling in forest ecosystems, many uncertainties remain, especially regarding the longer-term nitrogen accumulation. To contribute to filling this gap, the dynamic process-based model TRACE, with the ability to simulate 15N tracer redistribution in forest ecosystems was used to study N cycling processes in a mountain spruce forest of the northern edge of the Alps in Switzerland (Alptal, SZ). Most modeling analyses of N-cycling and C-N interactions have very limited ability to determine whether the process interactions are captured correctly. Because the interactions in such a system are complex, it is possible to get the whole-system C and N cycling right in a model without really knowing if the way the model combines fine-scale interactions to derive whole-system cycling is correct. With the possibility to simulate 15N tracer redistribution in ecosystem compartments, TRACE features a very powerful tool for the validation of fine-scale processes captured by the model. We first adapted the model to the new site (Alptal, Switzerland; long-term low-dose N-amendment experiment) by including a new algorithm for preferential water flow and by parameterizing of differences in drivers such as climate, N deposition and initial site conditions. After the calibration of key rates such as NPP and SOM turnover, we simulated patterns of 15N redistribution to compare against 15N field observations from a large-scale labeling experiment. The comparison of 15N field data with the modeled redistribution of the tracer in the soil horizons and vegetation compartments shows that the majority of fine-scale processes are captured satisfactorily. Particularly, the model is able to reproduce the fact that the largest part of the N deposition is immobilized in the soil. The discrepancies of 15N recovery in the LF and M soil horizon can be explained by the application method of the tracer and by the retention of the applied tracer by the well developed moss layer, which is not considered in the model. Discrepancies in the dynamics of foliage and litterfall 15N recovery were also observed and are related to the longevity of the needles in our mountain forest. As a next step, we will use the final Alptal version of the model to calculate the effects of climate change (temperature, CO2) and N deposition on ecosystem C sequestration in this regionally representative Norway spruce (Picea abies) stand.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Optical coherence tomography (OCT) is a well-established image modality in ophthalmology and used daily in the clinic. Automatic evaluation of such datasets requires an accurate segmentation of the retinal cell layers. However, due to the naturally low signal to noise ratio and the resulting bad image quality, this task remains challenging. We propose an automatic graph-based multi-surface segmentation algorithm that internally uses soft constraints to add prior information from a learned model. This improves the accuracy of the segmentation and increase the robustness to noise. Furthermore, we show that the graph size can be greatly reduced by applying a smart segmentation scheme. This allows the segmentation to be computed in seconds instead of minutes, without deteriorating the segmentation accuracy, making it ideal for a clinical setup. An extensive evaluation on 20 OCT datasets of healthy eyes was performed and showed a mean unsigned segmentation error of 3.05 ±0.54 μm over all datasets when compared to the average observer, which is lower than the inter-observer variability. Similar performance was measured for the task of drusen segmentation, demonstrating the usefulness of using soft constraints as a tool to deal with pathologies.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Independent component analysis (ICA) or seed based approaches (SBA) in functional magnetic resonance imaging blood oxygenation level dependent (BOLD) data became widely applied tools to identify functionally connected, large scale brain networks. Differences between task conditions as well as specific alterations of the networks in patients as compared to healthy controls were reported. However, BOLD lacks the possibility of quantifying absolute network metabolic activity, which is of particular interest in the case of pathological alterations. In contrast, arterial spin labeling (ASL) techniques allow quantifying absolute cerebral blood flow (CBF) in rest and in task-related conditions. In this study, we explored the ability of identifying networks in ASL data using ICA and to quantify network activity in terms of absolute CBF values. Moreover, we compared the results to SBA and performed a test-retest analysis. Twelve healthy young subjects performed a fingertapping block-design experiment. During the task pseudo-continuous ASL was measured. After CBF quantification the individual datasets were concatenated and subjected to the ICA algorithm. ICA proved capable to identify the somato-motor and the default mode network. Moreover, absolute network CBF within the separate networks during either condition could be quantified. We could demonstrate that using ICA and SBA functional connectivity analysis is feasible and robust in ASL-CBF data. CBF functional connectivity is a novel approach that opens a new strategy to evaluate differences of network activity in terms of absolute network CBF and thus allows quantifying inter-individual differences in the resting state and task-related activations and deactivations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We, then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons, (1) simplicity, and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of the SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete data log likelihood, and its gradient or hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of "signature" protein profiles specific to each pathologic state (e.g., normal vs. cancer) or differential profiles between experimental conditions (e.g., treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data analytic strategy for discovering protein biomarkers based on such high-dimensional mass-spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data analytic strategy takes properties of the SELDI mass-spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After these pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In natural history studies of chronic disease, it is of interest to understand the evolution of key variables that measure aspects of disease progression. This is particularly true for immunological variables in persons infected with the Human Immunodeficiency Virus (HIV). The natural timescale for such studies is time since infection. However, most data available for analysis arise from prevalent cohorts, where the date of infection is unknown for most or all individuals. As a result, standard curve fitting algorithms are not immediately applicable. Here we propose two methods to circumvent this difficulty. The first uses repeated measurement data to provide information not only on the level of the variable of interest, but also on its rate of change, while the second uses an estimate of the expected time since infection. Both methods are based on the principal curves algorithm of Hastie and Stuetzle, and are applied to data from a prevalent cohort of HIV-infected homosexual men, giving estimates of the average pattern of CD4+ lymphocyte decline. These methods are applicable to natural history studies using data from prevalent cohorts where the time of disease origin is uncertain, provided certain ancillary information is available from external sources.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We consider nonparametric missing data models for which the censoring mechanism satisfies coarsening at random and which allow complete observations on the variable X of interest. W show that beyond some empirical process conditions the only essential condition for efficiency of an NPMLE of the distribution of X is that the regions associated with incomplete observations on X contain enough complete observations. This is heuristically explained by describing the EM-algorithm. We provide identifiably of the self-consistency equation and efficiency of the NPMLE in order to make this statement rigorous. The usual kind of differentiability conditions in the proof are avoided by using an identity which holds for the NPMLE of linear parameters in convex models. We provide a bivariate censoring application in which the condition and hence the NPMLE fails, but where other estimators, not based on the NPMLE principle, are highly inefficient. It is shown how to slightly reduce the data so that the conditions hold for the reduced data. The conditions are verified for the univariate censoring, double censored, and Ibragimov-Has'minski models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we focus on the model for two types of tumors. Tumor development can be described by four types of death rates and four tumor transition rates. We present a general semi-parametric model to estimate the tumor transition rates based on data from survival/sacrifice experiments. In the model, we make a proportional assumption of tumor transition rates on a common parametric function but no assumption of the death rates from any states. We derived the likelihood function of the data observed in such an experiment, and an EM algorithm that simplified estimating procedures. This article extends work on semi-parametric models for one type of tumor (see Portier and Dinse and Dinse) to two types of tumors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a linear spline model is assumed on the baseline hazard. We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerative distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicate for the unobserved covariates increases efficiency and reduces bias. We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Multiple outcomes data are commonly used to characterize treatment effects in medical research, for instance, multiple symptoms to characterize potential remission of a psychiatric disorder. Often either a global, i.e. symptom-invariant, treatment effect is evaluated. Such a treatment effect may over generalize the effect across the outcomes. On the other hand individual treatment effects, varying across all outcomes, are complicated to interpret, and their estimation may lose precision relative to a global summary. An effective compromise to summarize the treatment effect may be through patterns of the treatment effects, i.e. "differentiated effects." In this paper we propose a two-category model to differentiate treatment effects into two groups. A model fitting algorithm and simulation study are presented, and several methods are developed to analyze heterogeneity presenting in the treatment effects. The method is illustrated using an analysis of schizophrenia symptom data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade’s worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show RNA-seq data demonstrates unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find GC-content has a strong sample specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here we describe statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization (CQN) algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content, and quantile normalization to correct for global distortions.