856 resultados para regression algorithm
Resumo:
Motivation: The immunogenicity of peptides depends on their ability to bind to MHC molecules. MHC binding affinity prediction methods can save significant amounts of experimental work. The class II MHC binding site is open at both ends, making epitope prediction difficult because of the multiple binding ability of long peptides. Results: An iterative self-consistent partial least squares (PLS)-based additive method was applied to a set of 66 pep- tides no longer than 16 amino acids, binding to DRB1*0401. A regression equation containing the quantitative contributions of the amino acids at each of the nine positions was generated. Its predictability was tested using two external test sets which gave r pred =0.593 and r pred=0.655, respectively. Furthermore, it was benchmarked using 25 known T-cell epitopes restricted by DRB1*0401 and we compared our results with four other online predictive methods. The additive method showed the best result finding 24 of the 25 T-cell epitopes. Availability: Peptides used in the study are available from http://www.jenner.ac.uk/JenPep. The PLS method is available commercially in the SYBYL molecular modelling software package. The final model for affinity prediction of peptides binding to DRB1*0401 molecule is available at http://www.jenner.ac.uk/MHCPred. Models developed for DRB1*0101 and DRB1*0701 also are available in MHC- Pred
Resumo:
General Regression Neuro-Fuzzy Network, which combines the properties of conventional General Regression Neural Network and Adaptive Network-based Fuzzy Inference System is proposed in this work. This network relates to so-called “memory-based networks”, which is adjusted by one-pass learning algorithm.
Resumo:
2010 Mathematics Subject Classification: 68T50,62H30,62J05.
Resumo:
Knowledge-based radiation treatment is an emerging concept in radiotherapy. It
mainly refers to the technique that can guide or automate treatment planning in
clinic by learning from prior knowledge. Dierent models are developed to realize
it, one of which is proposed by Yuan et al. at Duke for lung IMRT planning. This
model can automatically determine both beam conguration and optimization ob-
jectives with non-coplanar beams based on patient-specic anatomical information.
Although plans automatically generated by this model demonstrate equivalent or
better dosimetric quality compared to clinical approved plans, its validity and gener-
ality are limited due to the empirical assignment to a coecient called angle spread
constraint dened in the beam eciency index used for beam ranking. To eliminate
these limitations, a systematic study on this coecient is needed to acquire evidences
for its optimal value.
To achieve this purpose, eleven lung cancer patients with complex tumor shape
with non-coplanar beams adopted in clinical approved plans were retrospectively
studied in the frame of the automatic lung IMRT treatment algorithm. The primary
and boost plans used in three patients were treated as dierent cases due to the
dierent target size and shape. A total of 14 lung cases, thus, were re-planned using
the knowledge-based automatic lung IMRT planning algorithm by varying angle
spread constraint from 0 to 1 with increment of 0.2. A modied beam angle eciency
index used for navigate the beam selection was adopted. Great eorts were made to assure the quality of plans associated to every angle spread constraint as good
as possible. Important dosimetric parameters for PTV and OARs, quantitatively
re
ecting the plan quality, were extracted from the DVHs and analyzed as a function
of angle spread constraint for each case. Comparisons of these parameters between
clinical plans and model-based plans were evaluated by two-sampled Students t-tests,
and regression analysis on a composite index built on the percentage errors between
dosimetric parameters in the model-based plans and those in the clinical plans as a
function of angle spread constraint was performed.
Results show that model-based plans generally have equivalent or better quality
than clinical approved plans, qualitatively and quantitatively. All dosimetric param-
eters except those for lungs in the automatically generated plans are statistically
better or comparable to those in the clinical plans. On average, more than 15% re-
duction on conformity index and homogeneity index for PTV and V40, V60 for heart
while an 8% and 3% increase on V5, V20 for lungs, respectively, are observed. The
intra-plan comparison among model-based plans demonstrates that plan quality does
not change much with angle spread constraint larger than 0.4. Further examination
on the variation curve of the composite index as a function of angle spread constraint
shows that 0.6 is the optimal value that can result in statistically the best achievable
plans.
Resumo:
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.
Resumo:
In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of multi-output regression. This paper provides a survey on state-of-the-art multi-output regression methods, that are categorized as problem transformation and algorithm adaptation methods. In addition, we present the mostly used performance evaluation measures, publicly available data sets for multi-output regression real-world problems, as well as open-source software frameworks.
Resumo:
The cerebral cortex presents self-similarity in a proper interval of spatial scales, a property typical of natural objects exhibiting fractal geometry. Its complexity therefore can be characterized by the value of its fractal dimension (FD). In the computation of this metric, it has usually been employed a frequentist approach to probability, with point estimator methods yielding only the optimal values of the FD. In our study, we aimed at retrieving a more complete evaluation of the FD by utilizing a Bayesian model for the linear regression analysis of the box-counting algorithm. We used T1-weighted MRI data of 86 healthy subjects (age 44.2 ± 17.1 years, mean ± standard deviation, 48% males) in order to gain insights into the confidence of our measure and investigate the relationship between mean Bayesian FD and age. Our approach yielded a stronger and significant (P < .001) correlation between mean Bayesian FD and age as compared to the previous implementation. Thus, our results make us suppose that the Bayesian FD is a more truthful estimation for the fractal dimension of the cerebral cortex compared to the frequentist FD.
Resumo:
Split-plot design (SPD) and near-infrared chemical imaging were used to study the homogeneity of the drug paracetamol loaded in films and prepared from mixtures of the biocompatible polymers hydroxypropyl methylcellulose, polyvinylpyrrolidone, and polyethyleneglycol. The study was split into two parts: a partial least-squares (PLS) model was developed for a pixel-to-pixel quantification of the drug loaded into films. Afterwards, a SPD was developed to study the influence of the polymeric composition of films and the two process conditions related to their preparation (percentage of the drug in the formulations and curing temperature) on the homogeneity of the drug dispersed in the polymeric matrix. Chemical images of each formulation of the SPD were obtained by pixel-to-pixel predictions of the drug using the PLS model of the first part, and macropixel analyses were performed for each image to obtain the y-responses (homogeneity parameter). The design was modeled using PLS regression, allowing only the most relevant factors to remain in the final model. The interpretation of the SPD was enhanced by utilizing the orthogonal PLS algorithm, where the y-orthogonal variations in the design were separated from the y-correlated variation.
Resumo:
Lipidic mixtures present a particular phase change profile highly affected by their unique crystalline structure. However, classical solid-liquid equilibrium (SLE) thermodynamic modeling approaches, which assume the solid phase to be a pure component, sometimes fail in the correct description of the phase behavior. In addition, their inability increases with the complexity of the system. To overcome some of these problems, this study describes a new procedure to depict the SLE of fatty binary mixtures presenting solid solutions, namely the Crystal-T algorithm. Considering the non-ideality of both liquid and solid phases, this algorithm is aimed at the determination of the temperature in which the first and last crystal of the mixture melts. The evaluation is focused on experimental data measured and reported in this work for systems composed of triacylglycerols and fatty alcohols. The liquidus and solidus lines of the SLE phase diagrams were described by using excess Gibbs energy based equations, and the group contribution UNIFAC model for the calculation of the activity coefficients of both liquid and solid phases. Very low deviations of theoretical and experimental data evidenced the strength of the algorithm, contributing to the enlargement of the scope of the SLE modeling.
Resumo:
El Niño South Oscillation (ENSO) is one climatic phenomenon related to the inter-annual variability of global meteorological patterns influencing sea surface temperature and rainfall variability. It influences human health indirectly through extreme temperature and moisture conditions that may accelerate the spread of some vector-borne viral diseases, like dengue fever (DF). This work examines the spatial distribution of association between ENSO and DF in the countries of the Americas during 1995-2004, which includes the 1997-1998 El Niño, one of the most important climatic events of 20(th) century. Data regarding the South Oscillation index (SOI), indicating El Niño-La Niña activity, were obtained from Australian Bureau of Meteorology. The annual DF incidence (AIy) by country was computed using Pan-American Health Association data. SOI and AIy values were standardised as deviations from the mean and plotted in bars-line graphics. The regression coefficient values between SOI and AIy (rSOI,AI) were calculated and spatially interpolated by an inverse distance weighted algorithm. The results indicate that among the five years registering high number of cases (1998, 2002, 2001, 2003 and 1997), four had El Niño activity. In the southern hemisphere, the annual spatial weighted mean centre of epidemics moved southward, from 6° 31' S in 1995 to 21° 12' S in 1999 and the rSOI,AI values were negative in Cuba, Belize, Guyana and Costa Rica, indicating a synchrony between higher DF incidence rates and a higher El Niño activity. The rSOI,AI map allows visualisation of a graded surface with higher values of ENSO-DF associations for Mexico, Central America, northern Caribbean islands and the extreme north-northwest of South America.
Resumo:
A method using the ring-oven technique for pre-concentration in filter paper discs and near infrared hyperspectral imaging is proposed to identify four detergent and dispersant additives, and to determine their concentration in gasoline. Different approaches were used to select the best image data processing in order to gather the relevant spectral information. This was attained by selecting the pixels of the region of interest (ROI), using a pre-calculated threshold value of the PCA scores arranged as histograms, to select the spectra set; summing up the selected spectra to achieve representativeness; and compensating for the superimposed filter paper spectral information, also supported by scores histograms for each individual sample. The best classification model was achieved using linear discriminant analysis and genetic algorithm (LDA/GA), whose correct classification rate in the external validation set was 92%. Previous classification of the type of additive present in the gasoline is necessary to define the PLS model required for its quantitative determination. Considering that two of the additives studied present high spectral similarity, a PLS regression model was constructed to predict their content in gasoline, while two additional models were used for the remaining additives. The results for the external validation of these regression models showed a mean percentage error of prediction varying from 5 to 15%.
Resumo:
Conventional reflectance spectroscopy (NIRS) and hyperspectral imaging (HI) in the near-infrared region (1000-2500 nm) are evaluated and compared, using, as the case study, the determination of relevant properties related to the quality of natural rubber. Mooney viscosity (MV) and plasticity indices (PI) (PI0 - original plasticity, PI30 - plasticity after accelerated aging, and PRI - the plasticity retention index after accelerated aging) of rubber were determined using multivariate regression models. Two hundred and eighty six samples of rubber were measured using conventional and hyperspectral near-infrared imaging reflectance instruments in the range of 1000-2500 nm. The sample set was split into regression (n = 191) and external validation (n = 95) sub-sets. Three instruments were employed for data acquisition: a line scanning hyperspectral camera and two conventional FT-NIR spectrometers. Sample heterogeneity was evaluated using hyperspectral images obtained with a resolution of 150 × 150 μm and principal component analysis. The probed sample area (5 cm(2); 24,000 pixels) to achieve representativeness was found to be equivalent to the average of 6 spectra for a 1 cm diameter probing circular window of one FT-NIR instrument. The other spectrophotometer can probe the whole sample in only one measurement. The results show that the rubber properties can be determined with very similar accuracy and precision by Partial Least Square (PLS) regression models regardless of whether HI-NIR or conventional FT-NIR produce the spectral datasets. The best Root Mean Square Errors of Prediction (RMSEPs) of external validation for MV, PI0, PI30, and PRI were 4.3, 1.8, 3.4, and 5.3%, respectively. Though the quantitative results provided by the three instruments can be considered equivalent, the hyperspectral imaging instrument presents a number of advantages, being about 6 times faster than conventional bulk spectrometers, producing robust spectral data by ensuring sample representativeness, and minimizing the effect of the presence of contaminants.
Resumo:
PURPOSE: To compare the Full Threshold (FT) and SITA Standard (SS) strategies in glaucomatous patients undergoing automated perimetry for the first time. METHODS: Thirty-one glaucomatous patients who had never undergone perimetry underwent automated perimetry (Humphrey, program 30-2) with both FT and SS on the same day, with an interval of at least 15 minutes. The order of the examination was randomized, and only one eye per patient was analyzed. Three analyses were performed: a) all the examinations, regardless of the order of application; b) only the first examinations; c) only the second examinations. In order to calculate the sensitivity of both strategies, the following criteria were used to define abnormality: glaucoma hemifield test (GHT) outside normal limits, pattern standard deviation (PSD) <5%, or a cluster of 3 adjacent points with p<5% at the pattern deviation probability plot. RESULTS: When the results of all examinations were analyzed regardless of the order in which they were performed, the number of depressed points with p<0.5% in the pattern deviation probability map was significantly greater with SS (p=0.037), and the sensitivities were 87.1% for SS and 77.4% for FT (p=0.506). When only the first examinations were compared, there were no statistically significant differences regarding the number of depressed points, but the sensitivity of SS (100%) was significantly greater than that obtained with FT (70.6%) (p=0.048). When only the second examinations were compared, there were no statistically significant differences regarding the number of depressed points, and the sensitivities of SS (76.5%) and FT (85.7%) (p=0.664). CONCLUSION: SS may have a higher sensitivity than FT in glaucomatous patients undergoing automated perimetry for the first time. However, this difference tends to disappear in subsequent examinations.
Resumo:
Objective To evaluate the occurrence of severe obstetric complications associated with antepartum and intrapartum hemorrhage among women from the Brazilian Network for Surveillance of Severe Maternal Morbidity.Design Multicenter cross-sectional study.Setting Twenty-seven obstetric referral units in Brazil between July 2009 and June 2010.Population A total of 9555 women categorized as having obstetric complications.Methods The occurrence of potentially life-threatening conditions, maternal near miss and maternal deaths associated with antepartum and intrapartum hemorrhage was evaluated. Sociodemographic and obstetric characteristics and the use of criteria for management of severe bleeding were also assessed in these women.Main outcome measures The prevalence ratios with their respective 95% confidence intervals adjusted for the cluster effect of the design, and multiple logistic regression analysis were performed to identify factors independently associated with the occurrence of severe maternal outcome.Results Antepartum and intrapartum hemorrhage occurred in only 8% (767) of women experiencing any type of obstetric complication. However, it was responsible for 18.2% (140) of maternal near miss and 10% (14) of maternal death cases. On multivariate analysis, maternal age and previous cesarean section were shown to be independently associated with an increased risk of severe maternal outcome (near miss or death).Conclusion Severe maternal outcome due to antepartum and intrapartum hemorrhage was highly prevalent among Brazilian women. Certain risk factors, maternal age and previous cesarean delivery in particular, were associated with the occurrence of bleeding.
Resumo:
The troglobitic armored catfish, Ancistrus cryptophthalmus (Loricariidae, Ancistrinae) is known from four caves in the São Domingos karst area, upper rio Tocantins basin, Central Brazil. These populations differ in general body shape and degree of reduction of eyes and of pigmentation. The small Passa Três population (around 1,000 individuals) presents the most reduced eyes, which are not externally visible in adults. A small group of Passa Três catfish, one male and three females, reproduced spontaneously thrice in laboratory, at the end of summertime in 2000, 2003 and 2004. Herein we describe the reproductive behavior during the 2003 event, as well as the early development of the 2003 and 2004 offsprings, with focus on body growth and ontogenetic regression of eyes. The parental care by the male, which includes defense of the rock shelter where the egg clutch is laid, cleaning and oxygenation of eggs, is typical of many loricariids. On the other hand, the slow development, including delayed eye degeneration, low body growth rates and high estimated longevity (15 years or more) are characteristic of precocial, or K-selected, life cycles. In the absence of comparable data for close epigean relatives (Ancistrus spp.), it is not possible to establish whether these features are an autapomorphic specialization of the troglobitic A. cryptophthalmus or a plesiomorphic trait already present in the epigean ancestor, possibly favoring the adoption of the life in the food-poor cave environment. We briefly discuss the current hypotheses on eye regression in troglobitic vertebrates.