60 results for Modeling Non-Verbal Behaviors Using Machine Learning
Abstract:
Introduction: Discrimination of species-specific vocalizations is fundamental for survival and social interactions. Its unique behavioral relevance has encouraged the identification of circumscribed brain regions exhibiting selective responses (Belin et al., 2004), while the role of network dynamics has received less attention. Those studies that have examined the brain dynamics of vocalization discrimination leave unresolved the timing and the inter-relationship between general categorization, attention, and speech-related processes (Levy et al., 2001, 2003; Charest et al., 2009). Given these discrepancies and the presence of several confounding factors, electrical neuroimaging analyses were applied to auditory evoked potentials (AEPs) to acoustically and psychophysically controlled non-verbal human and animal vocalizations. This revealed which region(s) exhibit voice-sensitive responses and in which sequence. Methods: Subjects (N=10) performed a living vs. man-made 'oddball' auditory discrimination task, such that on a given block of trials 'target' stimuli occurred 10% of the time. Stimuli were complex, meaningful sounds of 500ms duration. There were 120 different sound files in total, 60 of which represented sounds of living objects and 60 man-made objects. The stimuli that were the focus of the present investigation were restricted to those of living objects within blocks where no response was required. These stimuli were further sorted between human non-verbal vocalizations and animal vocalizations. They were also controlled in terms of their spectrograms and formant distributions. Continuous 64-channel EEG was acquired through Neuroscan Synamps referenced to the nose, band-pass filtered 0.05-200Hz, and digitized at 1000Hz. Peri-stimulus epochs of continuous EEG (-100ms to 900ms) were visually inspected for artifacts, low-pass filtered at 40Hz and baseline corrected using the pre-stimulus period. Averages were computed for each subject separately.
AEPs in response to animal and human vocalizations were analyzed with respect to differences of Global Field Power (GFP) and with respect to changes of the voltage configurations at the scalp (reviewed in Murray et al., 2008). The former provides a measure of the strength of the electric field irrespective of topographic differences; the latter identifies changes in spatial configurations of the underlying sources independently of the response strength. In addition, we utilized the local auto-regressive average distributed linear inverse solution (LAURA; Grave de Peralta Menendez et al., 2001) to visualize and statistically contrast the likely underlying sources of effects identified in the preceding analysis steps. Results: We found differential activity in response to human vocalizations over three periods in the post-stimulus interval, and this response was always stronger than that to animal vocalizations. The first differential response (169-219ms) was a consequence of a modulation in strength of a common brain network localized to the right superior temporal sulcus (STS; Brodmann's Area (BA) 22) and extending into the superior temporal gyrus (STG; BA 41). A second difference (291-357ms) also followed from strength modulations of a common network with statistical differences localized to the left inferior precentral and prefrontal gyrus (BA 6/45). These first two strength modulations were correlated (Spearman's rho(8)=0.770; p=0.009), indicating functional coupling between temporally segregated stages of vocalization discrimination. A third difference (389-667ms) followed from strength and topographic modulations and was localized to the left superior frontal gyrus (BA 10), although this third difference did not reach our spatial criterion of 12 continuous voxels. Conclusions: We show that voice discrimination unfolds over multiple temporal stages, involving a wide network of brain regions.
The initial stages of vocalization discrimination are based on modulations in response strength within a common brain network with no evidence for a voice-selective module. The latency of this effect parallels that of face discrimination (Bentin et al., 2007), supporting the possibility that voice and face processes can mutually inform one another. Putative underlying sources (localized in the right STS; BA 22) are consistent with prior hemodynamic imaging evidence in humans (Belin et al., 2004). Our effect over the 291-357ms post-stimulus period overlaps the 'voice-specific-response' reported by Levy et al. (2001) and the estimated underlying sources (left BA 6/45) were in agreement with previous findings in humans (Fecteau et al., 2005). These results challenge the idea that circumscribed and selective areas subserve con-specific vocalization processing.
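The Global Field Power measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of GFP as the standard deviation of the potentials across electrodes at a single time point; the abstract does not spell out the formula, and the electrode values below are hypothetical.

```python
import math

def global_field_power(voltages):
    """GFP at one time point: spatial standard deviation across electrodes.

    Assumed definition (not stated in the abstract): the root mean squared
    deviation of each electrode from the mean of all electrodes.
    """
    n = len(voltages)
    mean = sum(voltages) / n
    return math.sqrt(sum((v - mean) ** 2 for v in voltages) / n)

# Hypothetical 4-electrode maps (microvolts): same topography, different strength.
weak_map = [1.0, -1.0, 1.0, -1.0]
strong_map = [2.0, -2.0, 2.0, -2.0]
# The same weak map recorded against a different reference (constant offset).
shifted_map = [v + 5.0 for v in weak_map]

gfp_weak = global_field_power(weak_map)
gfp_strong = global_field_power(strong_map)
gfp_shifted = global_field_power(shifted_map)
```

Doubling the amplitude of a map doubles its GFP while leaving the spatial pattern unchanged, which is why a GFP difference without a topographic difference is read as a strength modulation of a common network. Subtracting the mean also makes the value independent of the recording reference.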
Abstract:
We present a novel filtering method for multispectral satellite image classification. The proposed method learns a set of spatial filters that maximize the class separability of a binary support vector machine (SVM) through a gradient descent approach. Regularization issues are discussed in detail and a Frobenius-norm regularization is proposed to efficiently exclude uninformative filter coefficients. Experiments carried out on multiclass one-against-all classification and target detection show the capabilities of the learned spatial filters.
Abstract:
When a new treatment is compared to an established one in a randomized clinical trial, it is standard practice to statistically test for non-inferiority rather than for superiority. When the endpoint is binary, one usually compares two treatments using either an odds-ratio or a difference of proportions. In this paper, we propose a mixed approach which uses both concepts. One first defines the non-inferiority margin using an odds-ratio and one ultimately proves non-inferiority statistically using a difference of proportions. The mixed approach is shown to be more powerful than the conventional odds-ratio approach when the efficacy of the established treatment is known (with good precision) and high (e.g. with more than 56% of success). The gain of power achieved may lead in turn to a substantial reduction in the sample size needed to prove non-inferiority. The mixed approach can be generalized to ordinal endpoints.
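The first step of the mixed approach, defining the margin as an odds ratio and then expressing it as a difference of proportions, can be sketched as follows. This is only an illustration of the arithmetic under stated assumptions: the function name, the control success rate and the odds-ratio margin are hypothetical, and the paper's actual testing procedure is not reproduced here.

```python
def margin_as_difference(p_control, odds_ratio_margin):
    """Convert an odds-ratio non-inferiority margin into a difference of proportions.

    p_control: assumed known success rate of the established treatment.
    odds_ratio_margin: smallest acceptable odds ratio (new vs. established), below 1.
    Returns the largest acceptable drop in success rate.
    """
    odds_control = p_control / (1.0 - p_control)
    odds_limit = odds_ratio_margin * odds_control
    p_limit = odds_limit / (1.0 + odds_limit)
    return p_control - p_limit

# Hypothetical values: the same odds-ratio margin gives different
# absolute margins depending on the efficacy of the established treatment.
delta_high = margin_as_difference(0.80, 0.5)  # control succeeds in 80% of cases
delta_mid = margin_as_difference(0.50, 0.5)   # control succeeds in 50% of cases
```

With these hypothetical inputs, an odds-ratio margin of 0.5 corresponds to an acceptable drop from 80% to about 66.7% success, or from 50% to about 33.3%. The resulting difference is the quantity that a difference-of-proportions test would then be run against.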
Evolutionary history and its relevance in understanding and conserving southern African biodiversity
Abstract:
Understanding how biodiversity is distributed is central to any conservation effort and has traditionally been based on niche modeling and the causal relationship between spatial distribution of organisms and their environment. More recently, the study of species' evolutionary history and relatedness has permeated the fields of ecology and conservation and, coupled with spatial predictions, provides useful insights into the origin of current biodiversity patterns, community structuring and potential vulnerability to extinction. This thesis explores several key ecological questions by combining the fields of niche modeling and phylogenetics and using important components of southern African biodiversity. The aims of this thesis are to provide comparisons of biodiversity measures, to assess how climate change will affect evolutionary history loss, to ask whether there is a clear link between evolutionary history and morphology and to investigate the potential role of relatedness in macro-climatic niche structuring. The first part of my thesis provides a fine scale comparison and spatial overlap quantification of species richness and phylogenetic diversity predictions for one of the most diverse plant families in the Cape Floristic Region (CFR), the Proteaceae. In several of the measures used, patterns do not match sufficiently to argue that species relatedness information is implicit in species richness patterns. The second part of my thesis predicts how climate change may affect threat and potential extinction of southern African animal and plant taxa. I compare present and future niche models to assess whether predicted species extinction will result in higher or lower phylogenetic diversity survival than what would be experienced under random extinction processes.
I find that predicted extinction will result in lower phylogenetic diversity survival but that this non-random pattern will be detected only after a substantial proportion of the taxa in each group has been lost. The third part of my thesis explores the relationship between phylogenetic and morphological distance in southern African bats to assess whether long evolutionary histories correspond to equally high levels of morphological variation, as predicted by a neutral model of character evolution. I find no such evidence; on the contrary, weak negative trends are detected for this group, as well as in simulations of both neutral and convergent character evolution. Finally, I ask whether spatial and climatic niche occupancy in southern African bats is influenced by evolutionary history or not. I relate divergence time between species pairs to climatic niche and range overlap and find no evidence for clear phylogenetic structuring. I argue that this may be due to particularly high levels of micro-niche partitioning.
Résumé: Understanding the distribution of biodiversity is a major challenge for nature conservation. Analyses are most often based on ecological niche modeling, through the study of the causal relationships between the spatial distribution of organisms and their environment. Recently, the study of the evolutionary history of organisms has also come into use in ecology and conservation. Combined with modeling of the spatial distribution of organisms, this new approach provides relevant information for better understanding the origin of current biodiversity patterns, community structuring and potential extinction risks. This thesis explores several major ecological questions by combining the fields of niche modeling and phylogenetics. It is applied to important components of southern African biodiversity.
The objectives of this thesis were 1) to compare different measures of biodiversity, 2) to assess the impact of future climate change on the loss of phylogenetic diversity, 3) to analyze the potential link between phylogenetic diversity and morphological diversity and 4) to study the potential role of phylogeny in the structuring of species' macro-climatic niches. The first part of this thesis provides a spatial comparison, and a quantification of overlap, between predictions of species richness and predictions of phylogenetic diversity for one of the most species-rich plant families of the Cape Floristic Region (CFR), the Proteaceae. The analyses show that several measures of phylogenetic diversity had spatial distributions different from that of species richness, which is usually used to set conservation measures. The second part assesses the potential effects of expected climate change on extinction rates of southern African animals and plants. To this end, current and future species distribution models were used to determine whether species extinction will result in a greater or smaller loss of phylogenetic diversity compared with a random extinction process. The results indeed showed that species extinction linked to climate change could lead to a greater loss of phylogenetic diversity. However, this loss would exceed that of a random extinction process only after a large loss of taxa in each group. The third part of this thesis explores the relationship between phylogenetic and morphological distances among southern African bat species. More precisely, the aim is to determine whether a long evolutionary history also corresponds to greater morphological variation in this group.
This relationship is in fact predicted by a neutral model of character evolution. No evidence of this relationship emerged from the analyses. On the contrary, negative trends were detected, which would be the consequence of convergent evolution between clades and high levels of partitioning within each clade. Finally, the last part presents a study of climatic niche partitioning in southern African bats. In this study I relate evolutionary divergence time (when two species diverged from a common ancestor) to the degree of overlap of their climatic niches. The results could not demonstrate a link between these two parameters. Rather, they support the idea that this could be due to particularly high levels of fine-scale niche partitioning.
Abstract:
BACKGROUND: Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian. RESULTS: We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types. CONCLUSIONS: The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.
Abstract:
Uncertainty quantification of petroleum reservoir models is a current challenge, usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. The paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependencies from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Abstract:
1. Identifying the boundary of a species' niche from observational and environmental data is a common problem in ecology and conservation biology and a variety of techniques have been developed or applied to model niches and predict distributions. Here, we examine the performance of some pattern-recognition methods as ecological niche models (ENMs). Particularly, one-class pattern recognition is a flexible and seldom used methodology for modelling ecological niches and distributions from presence-only data. The development of one-class methods that perform comparably to two-class methods (for presence/absence data) would remove modelling decisions about sampling pseudo-absences or background data points when absence points are unavailable. 2. We studied nine methods for one-class classification and seven methods for two-class classification (five common to both), all primarily used in pattern recognition and therefore not common in species distribution and ecological niche modelling, across a set of 106 mountain plant species for which presence-absence data were available. We assessed accuracy using standard metrics and compared trade-offs in omission and commission errors between classification groups as well as effects of prevalence and spatial autocorrelation on accuracy. 3. One-class models fit to presence-only data were comparable to two-class models fit to presence-absence data when performance was evaluated with a measure weighting omission and commission errors equally. One-class models were superior for reducing omission errors (i.e. yielding higher sensitivity), and two-class models were superior for reducing commission errors (i.e. yielding higher specificity). For these methods, spatial autocorrelation was only influential when prevalence was low. 4. 
These results differ from previous efforts to evaluate alternative modelling approaches to build ENMs and are particularly noteworthy because data are from exhaustively sampled populations, minimizing false absence records. Accurate, transferable models of species' ecological niches and distributions are needed to advance ecological research and are crucial for effective environmental planning and conservation; the pattern-recognition approaches studied here show good potential for future modelling studies. This study also provides an introduction to promising methods for ecological modelling inherited from the pattern-recognition discipline.
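The trade-off between omission and commission errors described in this abstract can be made concrete with a small sketch. The abstract does not name the measure that weights both errors equally; the true skill statistic (sensitivity + specificity - 1) is used here as one assumed example of such a measure, and the presence/absence vectors are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Count true/false presences and absences from 0/1 sequences."""
    tp = fn = fp = tn = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1          # presence correctly predicted
        elif obs and not pred:
            fn += 1          # omission error: presence missed
        elif not obs and pred:
            fp += 1          # commission error: absence predicted as presence
        else:
            tn += 1          # absence correctly predicted
    return tp, fn, fp, tn

def evaluate(observed, predicted):
    tp, fn, fp, tn = confusion_counts(observed, predicted)
    sensitivity = tp / (tp + fn)   # 1 - omission error rate
    specificity = tn / (tn + fp)   # 1 - commission error rate
    tss = sensitivity + specificity - 1.0  # assumed equal-weight measure
    return sensitivity, specificity, tss

# Hypothetical survey of 8 sites: 1 = species present, 0 = absent.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
sensitivity, specificity, tss = evaluate(observed, predicted)
```

A model that lowers omission errors raises sensitivity, and one that lowers commission errors raises specificity; a measure of this form treats a unit change in either the same way.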
Abstract:
Résumé: Following recent technological advances, digital image archives have experienced unprecedented qualitative and quantitative growth. Despite the enormous possibilities they offer, these advances raise new questions about the processing of the masses of data acquired. This question is the basis of this Thesis: problems of processing digital information at very high spatial and/or spectral resolution are considered using statistical learning approaches, namely kernel methods. This Thesis studies image classification problems, i.e. the categorization of pixels into a reduced number of classes reflecting the spectral and contextual properties of the objects they represent. Emphasis is placed on the efficiency of the algorithms, as well as on their simplicity, so as to increase their potential for implementation by users. Moreover, the challenge of this Thesis is to stay close to the concrete problems of satellite image users without losing sight of the interest of the proposed methods for the machine learning community from which they originate. In this sense, this work is deliberately transdisciplinary, maintaining a strong link between the two sciences in all the developments proposed. Four models are proposed: the first addresses the problem of high dimensionality and data redundancy with a model that optimizes classification performance by adapting to the particularities of the image. This is made possible by a ranking system for the variables (the bands) that is optimized at the same time as the base model: in doing so, only the variables important for solving the problem are used by the classifier.
The lack of labeled information and the uncertainty about its relevance to the problem motivate the next two models, based respectively on active learning and semi-supervised methods: the former improves the quality of a training set through direct interaction between the user and the machine, whereas the latter uses unlabeled pixels to improve the description of the available data and the robustness of the model. Finally, the last model proposed considers the more theoretical question of structure among the outputs: the integration of this source of information, never before considered in remote sensing, opens new research challenges.
Advanced kernel methods for remote sensing image classification. Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009.
Abstract: The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to the users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The accent is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid overly complex models that users would not adopt.
The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model that learns the relevant image features addresses the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the single features. The scarcity and unreliability of labeled information were the common root of the second and third models proposed: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase robustness and quality of the description of data. Both solutions have been explored, resulting in two methodological contributions, based respectively on active learning and semi-supervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating output similarity into a model, opens new challenges and opportunities for remote sensing image processing.
Abstract:
We show how nonlinear embedding algorithms popular for use with shallow semi-supervised learning techniques such as kernel methods can be applied to deep multilayer architectures, either as a regularizer at the output layer, or on each layer of the architecture. This provides a simple alternative to existing approaches to deep learning whilst yielding competitive error rates compared to those methods, and existing shallow semi-supervised techniques.
Abstract:
We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering, our approach solves a single optimization problem rather than an ad-hoc two-stage optimization approach, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an "out-of-sample" problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.
Abstract:
This paper investigates the use of ensembles of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computationally efficient than training a single SVR model while reducing error. Boosting, however, does not improve results on this specific problem.
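The bagging scheme named in this abstract can be sketched in a few lines. To keep the example self-contained, the base learner below is a nearest-neighbour predictor used as a stand-in, not SVR; the function names, the seed and the four data points are all hypothetical, and the paper's actual sampling settings are not given in the abstract.

```python
import random

def fit_nearest_neighbour(sample):
    """Stand-in base learner (NOT SVR): returns the value of the closest training point."""
    def predict(x, y):
        nearest = min(sample, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
        return nearest[2]
    return predict

def bagging_fit(data, n_models, fit=fit_nearest_neighbour, seed=0):
    """Train n_models base learners, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit(bootstrap))
    return models

def bagging_predict(models, x, y):
    """Ensemble prediction: average of the individual predictions."""
    return sum(model(x, y) for model in models) / len(models)

# Hypothetical spatial measurements as (x, y, value) triples.
data = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]
models = bagging_fit(data, n_models=25)
estimate = bagging_predict(models, 0.4, 0.6)
```

Any regressor with the same fit-then-predict shape, including an SVR implementation, could be passed as `fit`. Averaging over resampled training sets is what distinguishes bagging from boosting, where later models are instead trained to correct the errors of earlier ones.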
Abstract:
Individual learning (e.g., trial-and-error) and social learning (e.g., imitation) are alternative ways of acquiring and expressing the appropriate phenotype in an environment. The optimal choice between using individual learning and/or social learning may be dictated by the life-stage or age of an organism. Of special interest is a learning schedule in which social learning precedes individual learning, because such a schedule is apparently a necessary condition for cumulative culture. Assuming two obligatory learning stages per discrete generation, we obtain the evolutionarily stable learning schedules for the three situations where the environment is constant, fluctuates between generations, or fluctuates within generations. During each learning stage, we assume that an organism may target the optimal phenotype in the current environment by individual learning, and/or the mature phenotype of the previous generation by oblique social learning. In the absence of exogenous costs to learning, the evolutionarily stable learning schedules are predicted to be either pure social learning followed by pure individual learning ("bang-bang" control) or pure individual learning at both stages ("flat" control). Moreover, we find for each situation that the evolutionarily stable learning schedule is also the one that optimizes the learned phenotype at equilibrium.
Abstract:
1. The ecological niche is a fundamental biological concept. Modelling species' niches is central to numerous ecological applications, including predicting species invasions, identifying reservoirs for disease, nature reserve design and forecasting the effects of anthropogenic and natural climate change on species' ranges. 2. A computational analogue of Hutchinson's ecological niche concept (the multidimensional hyperspace of species' environmental requirements) is the support of the distribution of environments in which the species persist. Recently developed machine-learning algorithms can estimate the support of such high-dimensional distributions. We show how support vector machines can be used to map ecological niches using only observations of species presence to train distribution models for 106 species of woody plants and trees in a montane environment using up to nine environmental covariates. 3. We compared the accuracy of three methods that differ in their approaches to reducing model complexity. We tested models with independent observations of both species presence and species absence. We found that the simplest procedure, which uses all available variables and no pre-processing to reduce correlation, was best overall. Ecological niche models based on support vector machines are theoretically superior to models that rely on simulating pseudo-absence data and are comparable in empirical tests. 4. Synthesis and applications. Accurate species distribution models are crucial for effective environmental planning, management and conservation, and for unravelling the role of the environment in human health and welfare. Models based on distribution estimation rather than classification overcome theoretical and practical obstacles that pervade species distribution modelling. 
In particular, ecological niche models based on machine-learning algorithms for estimating the support of a statistical distribution provide a promising new approach to identifying species' potential distributions and to project changes in these distributions as a result of climate change, land use and landscape alteration.
Abstract:
The quality of environmental data analysis and propagation of errors are heavily affected by the representativity of the initial sampling design [CRE 93, DEU 97, KAN 04a, LEN 06, MUL 07]. Geostatistical methods such as kriging are related to field samples, whose spatial distribution is crucial for the correct detection of the phenomena. Literature about the design of environmental monitoring networks (MN) is widespread and several interesting books have recently been published [GRU 06, LEN 06, MUL 07] in order to clarify the basic principles of spatial sampling design. An approach to spatial sampling design (monitoring network optimization) based on Support Vector Machines has also been proposed. Nonetheless, modelers often receive real data coming from environmental monitoring networks that suffer from problems of non-homogeneity (clustering). Clustering can be related to preferential sampling or to the impossibility of reaching certain regions.
Abstract:
Transmission of drug-resistant pathogens presents an almost-universal challenge for fighting infectious diseases. Transmitted drug resistance mutations (TDRM) can persist in the absence of drugs for considerable time. It is generally believed that differential TDRM-persistence is caused, at least partially, by variations in TDRM-fitness-costs. However, in vivo epidemiological evidence for the impact of fitness costs on TDRM-persistence is rare. Here, we studied the persistence of TDRM in HIV-1 using longitudinally-sampled nucleotide sequences from the Swiss-HIV-Cohort-Study (SHCS). All treatment-naïve individuals with TDRM at baseline were included. Persistence of TDRM was quantified via reversion rates (RR) determined with interval-censored survival models. Fitness costs of TDRM were estimated in the genetic background in which they occurred using a previously published and validated machine-learning algorithm (based on in vitro replicative capacities) and were included in the survival models as explanatory variables. In 857 sequential samples from 168 treatment-naïve patients, 17 TDRM were analyzed. RR varied substantially and ranged from 174.0 per 100 person-years (CI = [51.4, 588.8]) for 184V to 2.7 per 100 person-years (CI = [0.7, 10.9]) for 215D. RR increased significantly with fitness cost (increase by a factor of 1.6 [1.3, 2.0] per standard deviation of fitness costs). When subdividing fitness costs into the average fitness cost of a given mutation and the deviation from the average fitness cost of a mutation in a given genetic background, we found that both components were significantly associated with reversion rates. Our results show that the substantial variations of TDRM persistence in the absence of drugs are associated with fitness-cost differences both among mutations and among different genetic backgrounds for the same mutation.
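To give a feel for the spread in the reported reversion rates, the sketch below converts a rate per 100 person-years into a median persistence time. It assumes a constant reversion hazard (an exponential model), which is a simplification introduced here for illustration and not the interval-censored survival analysis used in the study; the function name is hypothetical.

```python
import math

def median_persistence_years(rate_per_100_person_years):
    """Median time to reversion under an assumed constant hazard.

    With a constant rate r per person-year, half of the mutations have
    reverted after ln(2) / r years. This assumption is NOT from the study.
    """
    rate_per_year = rate_per_100_person_years / 100.0
    return math.log(2) / rate_per_year

# Point estimates reported in the abstract.
median_184v = median_persistence_years(174.0)  # roughly 0.4 years
median_215d = median_persistence_years(2.7)    # roughly 25.7 years
```

Under this simplified reading, the two extremes of the reported range differ by a factor of about 64 in expected persistence, which is the scale of variation that the fitness-cost analysis sets out to explain.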