124 results for Sample algorithms
at Université de Lausanne, Switzerland
Abstract:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
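The comparison of resampling estimates described above can be sketched roughly as follows. This is a minimal illustration, not the MAQC-II pipeline: the synthetic data, the `separation_score` filter, the nearest-centroid classifier, and the out-of-bag form of the bootstrap are all assumptions standing in for the feature-selection methods, classifiers, and resampling schemes that the abstract does not specify. The one point it is meant to show is that feature selection is repeated inside every resampling round, so held-out samples never influence which features are chosen.

```python
import random
from statistics import mean, pstdev

def make_data(n=40, n_feat=20, informative=3, shift=3.0, seed=1):
    """Synthetic two-class data (hypothetical): only the first features carry signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) + (shift * lab if j < informative else 0.0)
          for j in range(n_feat)] for lab in y]
    return X, y

def separation_score(values, labels):
    """Univariate score: absolute difference of class means over overall spread."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(mean(a) - mean(b)) / (pstdev(values) or 1.0)

def fit(X, y, k):
    """Select the k top-scoring features, then store one centroid per class."""
    n_feat = len(X[0])
    scores = [separation_score([row[j] for row in X], y) for j in range(n_feat)]
    top = sorted(range(n_feat), key=lambda j: -scores[j])[:k]
    cent = {c: [mean(row[j] for row, lab in zip(X, y) if lab == c) for j in top]
            for c in (0, 1)}
    return top, cent

def predict(model, row):
    top, cent = model
    dist = {c: sum((row[j] - m) ** 2 for j, m in zip(top, cent[c])) for c in cent}
    return min(dist, key=dist.get)

def accuracy(model, X, y):
    return mean(1.0 if predict(model, r) == lab else 0.0 for r, lab in zip(X, y))

def cross_val(X, y, k, folds=5):
    """Stratified k-fold estimate; selection and fitting use training folds only."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    accs = []
    for f in range(folds):
        test = set(order[f::folds])
        train = [i for i in range(len(y)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train], k)
        accs.append(accuracy(model, [X[i] for i in test], [y[i] for i in test]))
    return mean(accs)

def bootstrap(X, y, k, rounds=20, seed=0):
    """Out-of-bag bootstrap estimate: train on a resample, test on the left-out cases."""
    rng = random.Random(seed)
    n, accs = len(y), []
    while len(accs) < rounds:
        train = [rng.randrange(n) for _ in range(n)]
        chosen = set(train)
        out = [i for i in range(n) if i not in chosen]
        if len({y[i] for i in train}) < 2 or not out:
            continue  # skip degenerate resamples
        model = fit([X[i] for i in train], [y[i] for i in train], k)
        accs.append(accuracy(model, [X[i] for i in out], [y[i] for i in out]))
    return mean(accs)
```

Calling `cross_val(X, y, 3)` and `bootstrap(X, y, 3)` on the output of `make_data()` gives two training-set estimates that could then be compared with accuracy on a separate validation set, which is the comparison the abstract reports.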
Abstract:
BACKGROUND/OBJECTIVES: (1) To cross-validate tetra- (4-BIA) and octopolar (8-BIA) bioelectrical impedance analysis vs dual-energy X-ray absorptiometry (DXA) for the assessment of total and appendicular body composition and (2) to evaluate the accuracy of external 4-BIA algorithms for the prediction of total body composition, in a representative sample of Swiss children. SUBJECTS/METHODS: A representative sample of 333 Swiss children aged 6-13 years from the Kinder-Sportstudie (KISS) (ISRCTN15360785). Whole-body fat-free mass (FFM) and appendicular lean tissue mass were measured with DXA. Body resistance (R) was measured at 50 kHz with 4-BIA and segmental body resistance at 5, 50, 250 and 500 kHz with 8-BIA. The resistance index (RI) was calculated as height²/R. Selection of predictors (gender, age, weight, RI4 and RI8) for BIA algorithms was performed using bootstrapped stepwise linear regression on 1000 samples. We calculated 95% confidence intervals (CI) of regression coefficients and measures of model fit using bootstrap analysis. Limits of agreement were used as measures of interchangeability of BIA with DXA. RESULTS: 8-BIA was more accurate than 4-BIA for the assessment of FFM (root mean square error (RMSE) = 0.90 kg (95% CI 0.82-0.98) vs 1.12 kg (1.01-1.24); limits of agreement 1.80 to -1.80 kg vs 2.24 to -2.24 kg). 8-BIA also gave accurate estimates of appendicular body composition, with RMSE ≤ 0.10 kg for arms and ≤ 0.24 kg for legs. All external 4-BIA algorithms performed poorly with substantial negative proportional bias (r ≥ 0.48, P < 0.001). CONCLUSIONS: In a representative sample of young Swiss children (1) 8-BIA was superior to 4-BIA for the prediction of FFM, (2) external 4-BIA algorithms gave biased predictions of FFM and (3) 8-BIA was an accurate predictor of segmental body composition.
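The resistance index and the agreement measures reported above can be illustrated with a short sketch. The function names and the height unit (cm) are assumptions made for the example. The limits of agreement are computed here as ±2 × RMSE, which is only inferred from the reported figures (0.90 kg gives ±1.80 kg, 1.12 kg gives ±2.24 kg) and is not stated as the study's definition.

```python
import math

def resistance_index(height_cm, resistance_ohm):
    """Resistance index RI = height^2 / R (height unit assumed to be cm)."""
    return height_cm ** 2 / resistance_ohm

def rmse(predicted, reference):
    """Root mean square error between BIA predictions and DXA reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

def limits_of_agreement(rmse_value):
    """Assumed form: symmetric limits at plus/minus two RMSE."""
    return (-2 * rmse_value, 2 * rmse_value)
```

For example, `limits_of_agreement(0.90)` returns limits of about -1.80 and 1.80 kg, matching the 8-BIA figures quoted in the results.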
Abstract:
To distinguish dysfunctional gait, clinicians require reference values of gait parameters for each population. This study provided normative values for widely used parameters in more than 1400 able-bodied adults over the age of 65. We also measured the foot clearance parameters (i.e., height of the foot above ground during swing phase) that are crucial to understanding the complex relationship between gait and falls as well as obstacle negotiation strategies. We used a shoe-worn inertial sensor on each foot and previously validated algorithms to extract the gait parameters during 20 m walking trials in a corridor at a self-selected pace. We investigated the difference in gait parameters between male and female participants, taking the effects of age and height into account. In addition, we examined the relationship between the clearance parameters and gait speed. The sample size and breadth of gait parameters provided in this study offer a unique reference resource for researchers.
Abstract:
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
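The evaluation design described above (training on reduced samples, scoring with AUC against independent presence-absence data) can be sketched as follows. This is an illustrative outline only: `subsample` and `auc` are hypothetical helper names, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any specific software used in the study.

```python
import random

def subsample(records, n, rng):
    """Draw n training records without replacement (e.g. n = 100, 30 or 10)."""
    return rng.sample(records, n)

def auc(presence_scores, absence_scores):
    """Probability that a presence site scores higher than an absence site.

    Ties count as half a win, which equals the area under the ROC curve.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))
```

An AUC of 0.5 corresponds to predictions no better than random ranking, and 1.0 to perfect separation of presence and absence sites.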
Abstract:
Little is known about the opinions, beliefs and behavior of Swiss physicians regarding physical activity (PA) promotion in a primary care setting. A qualitative study was performed with semi-structured interviews. We purposively recruited and interviewed 16 physicians in the French-speaking part of Switzerland. Their statements and ideas regarding the promotion of PA in a primary care setting were transcribed and synthesized from the tape-recorded interviews.
Inversion effect of "old" vs "new" faces, face-like objects, and objects in a healthy student sample
Abstract:
We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years. In particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is learned automatically from data, providing the optimum mixture of short- and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide an efficient means to model local anomalies that may typically arise at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
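One generic way to let a kernel method respond to several spatial scales at once is to combine Gaussian kernels with different bandwidths. The sketch below shows that idea only. The abstract does not give the paper's formulation, so the weighted-sum form, the function names, and the default bandwidths and weights are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two coordinate tuples at bandwidth sigma."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def multiscale_kernel(x, z, sigmas=(1.0, 10.0), weights=(0.5, 0.5)):
    """Weighted sum of Gaussian kernels: a short-scale and a large-scale component.

    Non-negative weights keep the sum a valid (positive semi-definite) kernel.
    """
    return sum(w * gaussian_kernel(x, z, s) for w, s in zip(weights, sigmas))
```

With these defaults, two points 5 units apart are almost uncorrelated under the short-scale component but still strongly correlated under the large-scale one, so the combined kernel can represent both a local anomaly and a regional trend.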
Abstract:
The aim of this study is to examine the factor structure and internal consistency of the TAS-20 in a sample of adolescents (n = 264), and to describe the distribution of alexithymic characteristics in this sample. The three-factor structure of the TAS-20 was confirmed by our confirmatory factor analysis. Internal consistency, measured with Cronbach's alpha, is acceptable for the first factor (difficulty identifying feelings (DIF)), good for the second (difficulty describing feelings (DDF)), but weak for the third factor (externally oriented thinking (EOT)). The results of an ANOVA show a linear trend indicating that, as age increases, the level of alexithymia (TAS-20 total score), difficulty identifying feelings, and externally oriented thinking all decrease. Regarding the prevalence of alexithymia, 38.5% of adolescents under 16 are considered alexithymic, compared with 30.1% of 16- to 17-year-olds and 22% of those over 17. Our study therefore indicates that the TAS-20 is an adequate instrument for assessing alexithymia in adolescence, while suggesting some caution given the developmental nature of this period.
Abstract:
To assess the associations between alcohol consumption and cytokine levels (interleukin-1β, IL-1β; interleukin-6, IL-6; and tumor necrosis factor-α, TNF-α) in a Caucasian population. Population sample of 2884 men and 3201 women aged 35-75. Alcohol consumption was categorized as nondrinkers, low (1-6 drinks/week), moderate (7-13/week) and high (14+/week). No difference in IL-1β levels was found between alcohol consumption categories. Low and moderate alcohol consumption was associated with lower IL-6 levels: median (interquartile range) 1.47 (0.70-3.51), 1.41 (0.70-3.32), 1.42 (0.66-3.19) and 1.70 (0.83-4.39) pg/ml for nondrinkers, low, moderate and high drinkers, respectively, p<0.01, but this association was no longer significant after multivariate adjustment. Compared to nondrinkers, moderate drinkers had the lowest odds (odds ratio = 0.86 (0.71-1.03)) of being in the highest quartile of IL-6, with a significant (p<0.05) quadratic trend. Low and moderate alcohol consumption was also associated with lower TNF-α levels: 2.92 (1.79-4.63), 2.83 (1.84-4.48), 2.82 (1.76-4.34) and 3.15 (1.91-4.73) pg/ml for nondrinkers, low, moderate and high drinkers, respectively, p<0.02, and this difference remained borderline significant (p=0.06) after multivariate adjustment. Moderate drinkers had lower odds (0.81 (0.68-0.98)) of being in the highest quartile of TNF-α. No specific alcoholic beverage (wine, beer or spirits) effect was found. Moderate alcohol consumption is associated with lower levels of IL-6 and (to a lesser degree) of TNF-α, irrespective of the type of alcohol consumed. No association was found between IL-1β levels and alcohol consumption.
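The consumption categories defined above translate directly into a small lookup. The function name is hypothetical, and the sketch assumes whole drinks per week, since the abstract does not say how fractional values between 0 and 1 were handled.

```python
def alcohol_category(drinks_per_week):
    """Map weekly drinks to the study's categories (integer counts assumed)."""
    if drinks_per_week < 0:
        raise ValueError("drinks_per_week must be non-negative")
    if drinks_per_week == 0:
        return "nondrinker"
    if drinks_per_week <= 6:
        return "low"
    if drinks_per_week <= 13:
        return "moderate"
    return "high"
```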
Abstract:
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing a suitable architecture are provided for new and/or inexperienced users.
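The ranking step described above can be sketched for the posterior probability-based family. The two heuristics shown, smallest margin between the two most probable classes and highest entropy, are common choices in the active learning literature. The function names are hypothetical, and the abstract does not say which exact heuristics the paper tests.

```python
import math

def margin(probs):
    """Gap between the two largest class posteriors (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def entropy(probs):
    """Shannon entropy of the class posterior (large entropy = uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(posteriors, batch_size, heuristic="margin"):
    """Return indices of the pixels the user should label next."""
    if heuristic == "margin":
        key = lambda i: margin(posteriors[i])
    elif heuristic == "entropy":
        key = lambda i: -entropy(posteriors[i])
    else:
        raise ValueError("unknown heuristic: " + heuristic)
    return sorted(range(len(posteriors)), key=key)[:batch_size]
```

In an active learning loop, the selected pixels would be labeled by the user, added to the training set, and the classifier retrained before the next ranking round.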