839 results for Polynomial Classifier


Relevance:

10.00%

Abstract:

Error-correcting codes and matroids have been widely used in the study of ordinary secret sharing schemes. In this paper, the connections between codes, matroids, and a special class of secret sharing schemes, namely, multiplicative linear secret sharing schemes (LSSSs), are studied. Such schemes are known to enable multiparty computation protocols secure against general (nonthreshold) adversaries. Two open problems related to the complexity of multiplicative LSSSs are considered in this paper. The first one deals with strongly multiplicative LSSSs. As opposed to the case of multiplicative LSSSs, it is not known whether there is an efficient method to transform an LSSS into a strongly multiplicative LSSS for the same access structure with a polynomial increase of the complexity. A property of strongly multiplicative LSSSs that could be useful in solving this problem is proved. Namely, using a suitable generalization of the well-known Berlekamp–Welch decoder, it is shown that all strongly multiplicative LSSSs enable efficient reconstruction of a shared secret in the presence of malicious faults. The second one is to characterize the access structures of ideal multiplicative LSSSs. Specifically, the considered open problem is to determine whether all self-dual vector space access structures are in this situation. By the aforementioned connection, this in fact constitutes an open problem about matroid theory, since it can be restated in terms of representability of identically self-dual matroids by self-dual codes. A new concept is introduced, the flat-partition, that provides a useful classification of identically self-dual matroids. Uniform identically self-dual matroids, which are known to be representable by self-dual codes, form one of the classes. It is proved that this property also holds for the family of matroids that, in a natural way, is the next class in the above classification: the identically self-dual bipartite matroids.

Relevance:

10.00%

Abstract:

The properties and cosmological importance of a class of non-topological solitons, Q-balls, are studied. Aspects of Q-ball solutions and Q-ball cosmology discussed in the literature are reviewed. Q-balls are particularly considered in the Minimal Supersymmetric Standard Model with supersymmetry broken by a hidden sector mechanism mediated by either gravity or gauge interactions. Q-ball profiles, charge-energy relations and evaporation rates for realistic Q-ball profiles are calculated for general polynomial potentials and for the gravity mediated scenario. In all of the cases, the evaporation rates are found to increase with decreasing charge. Q-ball collisions are studied by numerical means in the two supersymmetry breaking scenarios. It is noted that the collision processes can be divided into three types: fusion, charge transfer and elastic scattering. Cross-sections are calculated for the different types of processes in the different scenarios. The formation of Q-balls from the fragmentation of the Affleck-Dine condensate is studied by numerical and analytical means. The charge distribution is found to depend strongly on the initial energy-charge ratio of the condensate. The final state is typically noted to consist of Q- and anti-Q-balls in a state of maximum entropy. By studying the relaxation of excited Q-balls the rate at which excess energy can be emitted is calculated in the gravity mediated scenario. The Q-ball is also found to withstand excess energy well without significant charge loss. The possible cosmological consequences of these Q-ball properties are discussed.

Relevance:

10.00%

Abstract:

Objective: To assess hemoglobin (Hb) levels during pregnancy before and after the fortification of flours with iron. Method: A cross-sectional study with data from 12,119 pregnant women attending public prenatal care services in five macro-regions of Brazil. The sample was divided into two groups: before fortification (delivery before June 2004) and after fortification (last menstruation after June 2005). Hb curves were compared with national and international references. Polynomial regression models were built, with a significance level of 5%. Results: Although Hb levels were higher in all gestational months after fortification, the polynomial regression did not show a fortification effect (p=0.3). Curves in the two groups were above the references in the first trimester, followed by a decrease and stabilization at the end of pregnancy. Conclusion: Although the fortification effect was not confirmed, the study describes the variation of Hb levels during pregnancy, which is important for care practice and the evaluation of public policies.


Relevance:

10.00%

Abstract:

Summary for the general public: The development of type II diabetes and obesity is caused by the interaction between susceptibility genes and environmental factors, in particular a calorie-rich diet and insufficient physical activity. To assess the role of diet in the absence of genetic heterogeneity, we fed a genetically pure mouse strain an extremely high-fat diet. This diet led to the emergence of different phenotypes among these mice: diabetes and obesity (ObD), diabetes without obesity (LD), or neither diabetes nor obesity (LnD). We hypothesized that these different adaptations to the nutritional stress induced by the high-fat diet were due to the establishment of different genetic programs in the main organs involved in maintaining energy balance. To test this hypothesis, we developed a DNA microarray containing approximately 700 metabolic genes. By making it possible to measure the expression of many genes simultaneously, this microarray allowed us to establish the gene expression profiles characteristic of each group of mice fed the high-fat diet, in the liver and in skeletal muscle. The data obtained from these expression profiles showed that marked changes in expression occurred in the liver and muscle between the different groups of mice fed the high-fat diet. Overall, these changes suggest that the onset of type II diabetes and obesity induced by a high-fat diet is associated with increased lipid synthesis by the liver and an increased flux of lipids from the liver to the periphery (skeletal muscle). In a second step, these gene expression profiles were used to select a subset of genes sufficiently discriminating to distinguish between the different phenotypes.
This subset of genes allowed us to build a phenotypic classifier able to predict the phenotype of the mice with relatively high accuracy. In the future, such gene-expression-based "predictors" could serve as tools for diagnosing metabolism-related diseases. Summary: The aetiology of obesity and type II diabetes is multifactorial, involving both genetic and environmental factors, such as calorie-rich diets or lack of exercise. Genetically homogeneous C57BL/6J mice fed a high-fat diet (HFD) for up to nine months develop differential adaptation, becoming either obese and diabetic (ObD) or remaining lean in the presence (LD) or absence (LnD) of diabetes development. Each phenotype is associated with diverse metabolic alterations, which may result from diverse molecular adaptations of key organs involved in the control of energy homeostasis. In this study, we evaluated whether specific patterns of gene expression could be associated with each phenotype of HFD mice in the liver and the skeletal muscles. To do this, we constructed a metabolic cDNA microarray containing approximately 700 cDNAs representing genes involved in the main metabolic pathways of energy homeostasis. Our data indicate that the development of diet-induced obesity and type II diabetes is linked to some defects in lipid metabolism, involving a preserved hepatic lipogenesis and increased levels of very low density lipoproteins (VLDL). In skeletal muscles, an increase in fatty acid uptake, as suggested by the increased expression of lipoprotein lipase, would contribute to the increased level of insulin resistance observed in the ObD mice. Conversely, both groups of lean mice showed a reduced expression of lipogenic genes, particularly stearoyl-CoA desaturase 1 (Scd-1), a gene linked to sensitivity to diet-induced obesity.
Secondly, we identified a subset of genes from the expression profiles that classified the different groups of mice with relative accuracy. Such classifiers may be used in the future as diagnostic tools for each metabolic state in each tissue. Abstract: Development of a metabolic DNA microarray and application to the study of a mouse model of obesity and type II diabetes. The aetiology of obesity and type II diabetes is multifactorial, involving both genetic and environmental factors, such as calorie-rich diets or a lack of physical exercise. Genetically homogeneous C57BL/6J mice fed an extremely high-fat diet (HFD) for 9 months develop a differential metabolic adaptation, either becoming obese and diabetic (ObD) or remaining lean in the presence (LD) or absence (LnD) of diabetes. Each phenotype is associated with various metabolic alterations, which could result from various molecular adaptations of the organs involved in the control of energy homeostasis. In this study, we evaluated whether gene expression profiles in the liver and skeletal muscle could be associated with each of the HFD mouse phenotypes. To this end, we developed a metabolic DNA microarray containing approximately 700 cDNAs representing genes involved in the various metabolic pathways of energy homeostasis. Our data indicate that the development of obesity and type II diabetes induced by a high-fat diet is associated with certain defects in lipid metabolism, involving preserved hepatic lipogenesis and increased levels of very low density lipoproteins (VLDL). In skeletal muscle, an increase in fatty acid uptake, suggested by the increased expression of lipoprotein lipase, would help explain the more pronounced insulin resistance observed in ObD mice.
In contrast, the lean mice showed a marked reduction in the expression of lipogenic genes, in particular stearoyl-CoA desaturase 1 (Scd-1), a gene associated with susceptibility to diet-induced obesity. In a second step, we identified a subset of genes from the expression profiles that classify the different groups of mice with relatively high accuracy. Such classifiers could be used in the future as tools for diagnosing the metabolic state of a given tissue.

Relevance:

10.00%

Abstract:

Summary: Since the beginning of the 20th century the world population has been confronted with the human immunodeficiency virus 1 (HIV-1). This virus mutates particularly fast and can thus evade and adapt to its human host. Our closest evolutionary relatives, the non-human primates, are less susceptible to HIV-1. In a broader sense, primates are differentially susceptible to various retroviruses. Species specificity may be due to genetic differences among primates. In the present study we applied evolutionary and comparative genetic techniques to characterize the evolutionary pattern of host cellular determinants of HIV-1 pathogenesis. The study of the evolution of genes coding for proteins participating in the restriction or pathogenesis of HIV-1 may help in understanding the genetic basis of modern human susceptibility to infection. To perform comparative genetic analysis, we assembled a collection of primate DNA and RNA to allow the generation of de novo sequences of gene orthologs. More recently, the release to the public domain of two new complete primate genomes (Bornean orangutan and common marmoset), in addition to the three previously available genomes (human, chimpanzee and rhesus monkey), has helped scale up the evolutionary and comparative genome analysis. Sequence analysis used phylogenetic and statistical methods for detecting molecular adaptation. We identified different selective pressures acting on host proteins involved in HIV-1 pathogenesis. Proteins with HIV-1 restriction properties in non-human primates were under strong positive selection, in particular in regions of interaction with viral proteins. These regions carried key residues for the antiviral activity. Proteins of the innate immunity presented an evolutionary pattern of conservation (purifying selection), but with signals of relaxed constraint when compared with the average profile of purifying selection of the primate genomes.
Large-scale analysis resulted in patterns of evolutionary pressures according to molecular function, biological process and cellular distribution. The data generated by the various analyses served to guide the ancestral reconstruction of TRIM5α, a potent antiviral host factor. The resurrected TRIM5α from the common ancestor of Old World monkeys was effective against HIV-1, and the more recent resurrected hominoid variants were more effective against other retroviruses. Thus, as the result of trade-offs in the ability to restrict different retroviruses, humans might have been exposed to HIV-1 at a time when TRIM5α lacked the appropriate specific restriction activity. The application of evolutionary and comparative genetic tools should be considered for the systematic assessment of host proteins relevant in viral pathogenesis, and to guide biological and functional studies. Abstract: The world population has been confronted since the beginning of the twentieth century with the human immunodeficiency virus 1 (HIV-1). This virus has a particularly high mutation rate and can therefore evade and adapt to its host very efficiently. The organisms evolutionarily closest to humans, the non-human primates, are less susceptible to HIV-1. More generally, primates respond differently to retroviruses. This species specificity must lie in the genetic differences among primates. In this study, we applied evolutionary and comparative genetics techniques to characterize the evolutionary pattern of the cellular determinants involved in HIV-1 pathogenesis. Studying the evolution of genes coding for proteins involved in the restriction or pathogenesis of HIV-1 will help in understanding the genetic basis that has recently made humans susceptible. For the comparative genetics analyses, we assembled a collection of primate DNA and RNA in order to obtain new sequences of orthologous genes.
Recently, two new complete genomes were published (Bornean orangutan and common marmoset) in addition to the three genomes already available (human, chimpanzee, rhesus macaque). This considerably broadened the scope of the analysis. To detect molecular adaptation, we analysed the sequences using phylogenetic and statistical methods. We identified different selective pressures acting on the proteins involved in HIV-1 pathogenesis. Proteins with HIV-1 restriction properties in non-human primates show a particularly high rate of amino acid replacement (positive selection), especially in the regions of interaction with viral proteins. These regions include key amino acids for restriction activity. Proteins belonging to innate immunity show an evolutionary pattern of conservation (purifying selection), but with signs of "relaxation" compared with the general profile of purifying selection of the primate genome. A large-scale analysis made it possible to classify the patterns of evolutionary pressure according to molecular function, biological process and cellular distribution. The data generated by the various analyses enabled the ancestral reconstruction of TRIM5α, a potent antiretroviral factor. The resurrected TRIM5α corresponding to the common ancestor of the great apes and the catarrhine group is effective against modern HIV-1. The more recent resurrected TRIM5α variants, corresponding to the ancestors of the great apes, are more effective against other retroviruses. Thus, as a result of a trade-off in the ability to restrict different retroviruses, humans would have been exposed to HIV-1 at a time when TRIM5α lacked specific restriction activity against it.
The application of evolutionary and comparative genetics techniques should be considered for the systematic assessment of proteins involved in viral pathogenesis, as well as to guide biological and functional studies.

Relevance:

10.00%

Abstract:

INTRODUCTION Can the law lead to injustice? Since antiquity, the relationship between law and justice has been a fundamental theme of Western thought. This is shown in particular by the recurrence, over the centuries, of the debate between natural law theorists, who posit the existence and pre-eminence of a natural law that always conforms to equity, and positivists, for whom the question of whether a validly enacted law is just or unjust is of no relevance. This concern, however, is not confined to the speculative reflections of legal theorists. Most legislators have sought to ground their norms in equity, and thus to bring them into line with the idea of justice that prevailed when they were adopted. One may nevertheless ask whether a law can be "just" in the abstract, prior to any concrete application. Is it not, on the contrary, the fate of every norm to escape its author and to see its meaning specified, supplemented, or even modified through the many cases it is called upon to govern? Aristotle had already highlighted the imperfection inherent in any general norm. No legislator can in fact foresee the multiplicity of situations in which individuals will invoke the norm he has enacted. It therefore happens that the judge is faced with a claim that appears duly founded in law, but whose enforcement in the case at hand leads to an injustice. For this reason, most legal systems have developed means intended to prevent the application of the law from leading to an unjust result that the author of the norm had not envisaged. This study first sets out to examine the solution of Roman law. Through the exceptio doli, Roman law developed an institution that allows the magistrate to paralyse the effects of a claim even though it is perfectly well founded in civil law.
After an examination of the context in which it emerged (Title I), the study examines its many cases of application and attempts to classify them (Title II), before proposing a general definition of the institution (Title III). The main stages in the evolution of the exceptio doli are then described, from the end of the Western Roman Empire to the twentieth century (Title IV). Codified at the beginning of the twentieth century, Swiss private law has admittedly given concrete form, through a number of specific provisions, to the various cases of application of the exceptio doli; it nevertheless has an institution, the prohibition of the abuse of rights (Art. 2 para. 2 CC), whose function appears very similar to that of the ancient exceptio doli (Title V). The study then examines the cases of abuse of rights that correspond to the cases of application of the exceptio doli (Title VI) and those that owe their origin to other institutions (Title VII). This brings out the extent of the line of descent between the ancient exceptio doli and the prohibition of the abuse of rights in Swiss law, two institutions devoted to giving concrete form to the idea of justice when norms are applied.

Relevance:

10.00%

Abstract:

BACKGROUND: Several studies have shown circadian variation of ischemic burden among patients with ST-Elevation Myocardial Infarction (STEMI), but with conflicting results. The aim of this study was to analyze circadian variation of myocardial infarction size and in-hospital mortality in a large multicenter registry. METHODS: This retrospective, registry-based study used data from AMIS Plus, a large multicenter Swiss registry of patients who suffered myocardial infarction between 1999 and 2013. Peak creatine kinase (CK) was used as a proxy measure for myocardial infarction size. Associations between peak CK, in-hospital mortality, and the time of day at symptom onset were modelled using polynomial-harmonic regression methods. RESULTS: 6,223 STEMI patients were admitted to 82 acute-care hospitals in Switzerland and treated with primary angioplasty within six hours of symptom onset. Only the 24-hour harmonic was significantly associated with peak CK (p = 0.0001). The maximum average peak CK value (2,315 U/L) was for patients with symptom onset at 23:00, whereas the minimum average (2,017 U/L) was for onset at 11:00. The amplitude of variation was 298 U/L. In addition, no correlation was observed between ischemic time and circadian peak CK variation. Of the 6,223 patients, 223 (3.58%) died during index hospitalization. Remarkably, only the 24-hour harmonic was significantly associated with in-hospital mortality. The risk of death from STEMI was highest for patients with symptom onset at 00:00 and lowest for those with onset at 12:00. DISCUSSION: As part of this first large study of STEMI patients treated with primary angioplasty in Swiss hospitals, investigations confirmed a circadian pattern in both peak CK and in-hospital mortality, which were independent of total ischemic time. Accordingly, this study proposes that symptom onset time be incorporated as a prognostic factor in patients with myocardial infarction.
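
As a rough illustration of the 24-hour harmonic term described above, the following Python sketch fits a mean plus a single cosine/sine pair to hourly averages. The data are synthetic, constructed only from the summary figures quoted in the abstract (peak at 23:00, trough at 11:00, peak-to-trough difference of 298 U/L). The function names and the closed-form fit, which assumes exactly one value per hour from 0 to 23, are illustrative assumptions and do not reproduce the registry's actual polynomial-harmonic model.

```python
import math

OMEGA = 2 * math.pi / 24  # angular frequency of the 24-hour harmonic


def fit_24h_harmonic(values):
    """Fit values[h] ~ a + b*cos(OMEGA*h) + c*sin(OMEGA*h) for h = 0..23.

    With one value per hour the three regressors are orthogonal, so the
    least-squares coefficients have the closed form used here.
    """
    n = len(values)
    if n != 24:
        raise ValueError("expected one value per hour (24 values)")
    a = sum(values) / n
    b = 2 / n * sum(v * math.cos(OMEGA * h) for h, v in enumerate(values))
    c = 2 / n * sum(v * math.sin(OMEGA * h) for h, v in enumerate(values))
    return a, b, c


def peak_hour_and_range(b, c):
    """Return the hour of the fitted maximum and the peak-to-trough range."""
    peak = (math.atan2(c, b) / OMEGA) % 24
    return peak, 2 * math.hypot(b, c)


# Synthetic hourly means (assumption): midpoint of 2,315 and 2,017 U/L,
# half-range 149 U/L, maximum placed at 23:00.
synthetic = [2166 + 149 * math.cos(OMEGA * (h - 23)) for h in range(24)]
a, b, c = fit_24h_harmonic(synthetic)
peak, spread = peak_hour_and_range(b, c)
print(round(a), round(peak), round(spread))
```

On real patient-level data the onset times are not evenly spaced, so the coefficients would have to come from an ordinary regression fit and not from this shortcut.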

Relevance:

10.00%

Abstract:

One aspect of person-job fit reflects congruence between personal preferences and job design; as congruence increases so should satisfaction. We hypothesized that power distance would moderate whether fit is related to satisfaction with degree of job formalization. We obtained measures of job-formalization, fit and satisfaction, as well as organizational commitment from employees (n = 772) in a multinational firm with subsidiaries in six countries. Confirming previous findings, individuals from low power-distance cultures were most satisfied with increasing fit. However, the extent to which individuals from high power-distance cultures were satisfied did not necessarily depend on increasing fit, but mostly on whether the degree of formalization received was congruent to cultural norms. Irrespective of culture, satisfaction with formalization predicted a broad measure of organizational commitment. Apart from our novel extension of fit theory, we show how moderation can be tested in the context of polynomial response surface regression and how specific hypotheses can be tested regarding different points on the response surface.

Relevance:

10.00%

Abstract:

This paper analyses the robustness of Least-Squares Monte Carlo, a technique recently proposed by Longstaff and Schwartz (2001) for pricing American options. This method is based on least-squares regressions in which the explanatory variables are certain polynomial functions. We analyze the impact of different basis functions on option prices. Numerical results for American put options provide evidence that a) this approach is very robust to the choice of different alternative polynomials and b) few basis functions are required. However, these conclusions are not reached when analyzing more complex derivatives.
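
A minimal Python sketch of the regression step that Least-Squares Monte Carlo relies on, assuming a geometric Brownian motion underlying and a plain polynomial basis in the stock price divided by the strike. The helper names, the parameter values in the usage line and the pure-Python least-squares solver are illustrative assumptions, not taken from the paper.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least squares on the basis 1, x, ..., x**degree via the normal equations."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)]
           for i in range(degree + 1)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(degree + 1)]
    return solve_linear(ata, aty)


def lsm_american_put(s0, strike, r, sigma, maturity, steps, paths,
                     degree=2, seed=0):
    """Estimate an American put price by backward induction with regressions."""
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-r * dt)
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    grid = []  # grid[i][t] is the price of path i at time (t + 1) * dt
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        grid.append(path)
    # cash[i]: path i's future cash flow, discounted to the current step
    cash = [max(strike - path[-1], 0.0) for path in grid]
    for t in range(steps - 2, -1, -1):
        cash = [v * disc for v in cash]
        itm = [i for i in range(paths) if grid[i][t] < strike]
        if len(itm) <= degree:
            continue  # too few in-the-money paths to regress on
        xs = [grid[i][t] / strike for i in itm]
        coef = polyfit(xs, [cash[i] for i in itm], degree)
        for i, x in zip(itm, xs):
            continuation = sum(c * x ** p for p, c in enumerate(coef))
            exercise = strike - grid[i][t]
            if exercise > continuation:
                cash[i] = exercise  # exercising now beats the fitted continuation
    return disc * sum(cash) / paths


price = lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, paths=4000)
print(price)
```

Rerunning with a different `degree` on the same seed is one simple way to probe the sensitivity to the basis that the abstract discusses. A production implementation would normally use a numerically safer least-squares routine than the normal equations.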

Relevance:

10.00%

Abstract:

The objective of this paper is to compare the performance of two predictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used.

Relevance:

10.00%

Abstract:

In pediatric echocardiography, cardiac dimensions are often normalized for weight, height, or body surface area (BSA). The combined influence of height and weight on cardiac size is complex and likely varies with age. We hypothesized that increasing weight for height, as represented by body mass index (BMI) adjusted for age, is poorly accounted for in Z scores normalized for weight, height, or BSA. We aimed to evaluate whether a bias related to BMI was introduced when proximal aorta diameter Z scores are derived from bivariate models (only one normalizing variable), and whether such a bias was reduced when multivariable models are used. We analyzed 1,422 echocardiograms read as normal in children ≤18 years. We computed Z scores of the proximal aorta using allometric, polynomial, and multivariable models with four body size variables. We then assessed the level of residual association of Z scores and BMI adjusted for age and sex. In children ≥6 years, we found a significant residual linear association with BMI-for-age and Z scores for most regression models. Only a multivariable model including weight and height as independent predictors produced a Z score free of linear association with BMI. We concluded that a bias related to BMI was present in Z scores of proximal aorta diameter when normalization was done using bivariate models, regardless of the regression model or the normalizing variable. The use of multivariable models with weight and height as independent predictors should be explored to reduce this potential pitfall when pediatric echocardiography reference values are evaluated.

Relevance:

10.00%

Abstract:

Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on cystic fibrosis and diabetes datasets, with positive results.

Relevance:

10.00%

Abstract:

In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
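
The margin sampling baseline mentioned above can be sketched in a few lines of Python: among the unlabeled pixels, pick those whose decision value is closest to zero, that is, closest to the current separating surface. This is a sketch for the binary case only. The decision values are made-up numbers standing in for the output of a trained support vector machine, and the function name is hypothetical; it does not reproduce the two algorithms proposed in the paper.

```python
def margin_sampling(decision_values, batch_size):
    """Return indices of the unlabeled samples closest to the decision boundary.

    decision_values[i] is the signed classifier output for sample i (binary case);
    a small absolute value means the classifier is least certain about it.
    """
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:batch_size]


# Hypothetical decision values for five unlabeled pixels.
scores = [2.1, -0.2, 0.9, -1.5, 0.05]
to_label = margin_sampling(scores, batch_size=2)
print(to_label)  # indices handed to the analyst for manual labeling
```

In an active learning loop, the selected pixels would be labeled by the analyst, added to the training set, and the classifier retrained before the next ranking.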

Relevance:

10.00%

Abstract:

Several features that can be extracted from digital images of the sky and that can be useful for cloud-type classification of such images are presented. Some features are statistical measurements of image texture, some are based on the Fourier transform of the image and, finally, others are computed from the image where cloudy pixels are distinguished from clear-sky pixels. The use of the most suitable features in an automatic classification algorithm is also shown and discussed. Both the features and the classifier are developed over images taken by two different camera devices, namely, a total sky imager (TSI) and a whole sky imager (WSC), which are placed in two different areas of the world (Toowoomba, Australia; and Girona, Spain, respectively). The performance of the classifier is assessed by comparing its image classification with an a priori classification carried out by visual inspection of more than 200 images from each camera. The index of agreement is 76% when five different sky conditions are considered: clear, low cumuliform clouds, stratiform clouds (overcast), cirriform clouds, and mottled clouds (altocumulus, cirrocumulus). Future directions of this research are also discussed, regarding both the use of other features and the use of other classification techniques.

Relevance:

10.00%

Abstract:

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
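
The nested cross-validation scheme named in the METHODS part can be sketched as follows in Python. This is a toy version under stated assumptions: a one-dimensional k-nearest-neighbors classifier, interleaved folds, and made-up, well-separated data. All function names are hypothetical, feature selection is omitted, and nothing here reproduces the study's simulation. The point it illustrates is that tuning (here, the number of neighbors) only ever sees the outer training part, so the outer test fold stays untouched until the final evaluation.

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def split(data, fold):
    """Return (train, test) lists for one fold of indices."""
    held = set(fold)
    train = [p for i, p in enumerate(data) if i not in held]
    return train, [data[i] for i in fold]


def knn_predict(train, x, k):
    """Majority label (0 or 1) among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cv_accuracy(data, k, n_folds):
    """Plain cross-validated accuracy for a fixed number of neighbors k."""
    scores = [accuracy(*split(data, fold), k)
              for fold in k_folds(len(data), n_folds)]
    return sum(scores) / len(scores)


def nested_cv(data, candidates=(1, 3, 5), outer=4, inner=3):
    """Outer loop estimates performance; inner loop tunes k on training data only."""
    outer_scores = []
    for fold in k_folds(len(data), outer):
        train, test = split(data, fold)
        best_k = max(candidates, key=lambda k: cv_accuracy(train, k, inner))
        outer_scores.append(accuracy(train, test, best_k))
    return sum(outer_scores) / len(outer_scores)


# Made-up data: (feature, label) pairs for two well-separated groups.
data = [(0.1 * i, 0) for i in range(12)] + [(5 + 0.1 * i, 1) for i in range(12)]
print(nested_cv(data))
```

To study batch effects as the abstract describes, the outer estimate would additionally be compared with the accuracy obtained on data from a batch that took no part in the cross-validation.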