189 results for hierarchical classification structures


Relevance: 20.00%

Abstract:

Evidence from magnetic resonance imaging (MRI) studies shows that healthy aging is associated with profound changes in cortical and subcortical brain structures. The reliable delineation of cortex and basal ganglia using automated computational anatomy methods based on T1-weighted images remains challenging, which results in controversies in the literature. In this study we use quantitative MRI (qMRI) to gain an insight into the microstructural mechanisms underlying tissue ageing and look for potential interactions between ageing and brain tissue properties to assess their impact on automated tissue classification. To this end we acquired maps of longitudinal relaxation rate R1, effective transverse relaxation rate R2* and magnetization transfer - MT, from healthy subjects (n=96, aged 21-88 years) using a well-established multi-parameter mapping qMRI protocol. Within the framework of voxel-based quantification we find higher grey matter volume in basal ganglia, cerebellar dentate and prefrontal cortex when tissue classification is based on MT maps compared with T1 maps. These discrepancies between grey matter volume estimates can be attributed to R2* - a surrogate marker of iron concentration, and further modulation by an interaction between R2* and age, both in cortical and subcortical areas. We interpret our findings as direct evidence for the impact of ageing-related brain tissue property changes on automated tissue classification of brain structures using SPM12. Computational anatomy studies of ageing and neurodegeneration should acknowledge these effects, particularly when inferring about underlying pathophysiology from regional cortex and basal ganglia volume changes.

Relevance: 20.00%

Abstract:

This paper presents the predicted flow dynamics from the application of a Reynolds-averaged Navier-Stokes model to a series of bifurcation geometries with morphologies measured during previous flume experiments. The topography of the bifurcations consists of either plane or bedform-dominated beds which may or may not possess discordance between the two bifurcation distributaries. Numerical predictions are compared with experimental results to assess the ability of the numerical model to reproduce the division of flow into the bifurcation distributaries. The hydrodynamic model predicts: (1) diverting fluxes in the upstream channel which direct water into the distributaries; (2) super-elevation of the free surface induced at the bifurcation edge by pressure differences; and (3) counter-rotating secondary circulation cells which develop upstream of the apex of the bifurcation and move into the downstream channels, with water converging at the surface and diverging at the bed. When bedforms are not present, weak transverse fluxes characterize the upstream channel for almost its entire length, associated with clearly distinguishable secondary circulation cells, although these may be under-estimated by the turbulence model used in the solution. In the bedform-dominated case, the same hydrodynamic conditions were not observed: the influence of the bifurcation was restricted and depth-scale secondary circulation cells did not form. The results also demonstrate the dominant effect bed discordance has upon flow division between the two distributaries. Finally, results indicate that in bedform-dominated rivers. Consequently, we suggest that sand-bed river bifurcations are more likely to have an influence that extends much further upstream and have a greater impact upon water distribution. This may contribute to observed morphological differences between sand-bedded and gravel-bedded braided river networks. Copyright (C) 2012 John Wiley & Sons, Ltd.

Relevance: 20.00%

Abstract:

The paper deals with the development and application of a generic methodology for automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool for solving the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classification. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
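As a rough illustration of the tuning idea described in this abstract, the sketch below implements an isotropic GRNN as a Gaussian-kernel weighted average (the Nadaraya-Watson form) and selects the kernel width by leave-one-out cross-validation. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the synthetic grid data and the list of candidate widths are all hypothetical, and the anisotropic case is not covered.

```python
import math

def gaussian_weight(a, b, sigma):
    """Isotropic Gaussian kernel weight between two points."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def grnn_predict(points, values, query, sigma, skip=None):
    """GRNN estimate at `query`: kernel-weighted mean of training values.

    `skip` optionally excludes one training index (used for leave-one-out).
    """
    num = den = 0.0
    for i, (p, v) in enumerate(zip(points, values)):
        if i == skip:
            continue
        w = gaussian_weight(p, query, sigma)
        num += w * v
        den += w
    # If every weight underflows to zero, fall back to the global mean.
    return num / den if den > 0.0 else sum(values) / len(values)

def loo_mse(points, values, sigma):
    """Leave-one-out mean squared error for a given kernel width."""
    errors = [
        (grnn_predict(points, values, p, sigma, skip=i) - v) ** 2
        for i, (p, v) in enumerate(zip(points, values))
    ]
    return sum(errors) / len(errors)

def tune_sigma(points, values, candidates):
    """Pick the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(points, values, s))

# Hypothetical data: a 6 x 6 grid on the unit square with a linear trend.
points = [(i / 5, j / 5) for i in range(6) for j in range(6)]
values = [x + 2.0 * y for x, y in points]
candidates = [0.05, 0.1, 0.2, 0.5, 1.0]

best_sigma = tune_sigma(points, values, candidates)
centre_estimate = grnn_predict(points, values, (0.5, 0.5), best_sigma)
```

A PNN classifier can be sketched along the same lines by accumulating kernel weights per class and returning the class with the largest total, with the same leave-one-out loop used to choose the width.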

Relevance: 20.00%

Abstract:

Colorectal cancer (CRC) is a major cause of cancer mortality. Whereas some patients respond well to therapy, others do not, and thus more precise, individualized treatment strategies are needed. To that end, we analyzed gene expression profiles from 1,290 CRC tumors using consensus-based unsupervised clustering. The resultant clusters were then associated with therapeutic response data to the epidermal growth factor receptor-targeted drug cetuximab in 80 patients. The results of these studies define six clinically relevant CRC subtypes. Each subtype shares similarities to distinct cell types within the normal colon crypt and shows differing degrees of 'stemness' and Wnt signaling. Subtype-specific gene signatures are proposed to identify these subtypes. Three subtypes have markedly better disease-free survival (DFS) after surgical resection, suggesting these patients might be spared from the adverse effects of chemotherapy when they have localized disease. One of these three subtypes, identified by filamin A expression, does not respond to cetuximab but may respond to cMET receptor tyrosine kinase inhibitors in the metastatic setting. Two other subtypes, with poor and intermediate DFS, associate with improved response to the chemotherapy regimen FOLFIRI in adjuvant or metastatic settings. Development of clinically deployable assays for these subtypes and of subtype-specific therapies may contribute to more effective management of this challenging disease.

Relevance: 20.00%

Abstract:

Anti-doping authorities have high expectations of the athlete steroidal passport (ASP) for detecting misuse of anabolic-androgenic steroids. However, it is still limited to the monitoring of known, well-established compounds and might greatly benefit from the discovery of new relevant biomarker candidates. In this context, steroidomics opens the way to the untargeted simultaneous evaluation of a large number of compounds. Analytical platforms combining the performance of ultra-high pressure liquid chromatography (UHPLC) with the high mass-resolving power of quadrupole time-of-flight (QTOF) mass spectrometers are particularly well suited to this purpose. An untargeted steroidomic approach was proposed to analyse urine samples from a clinical trial for the discovery of relevant biomarkers of testosterone undecanoate oral intake. Automatic peak detection was performed and a filter of mass-to-charge ratio (m/z) values of reference steroid metabolites was applied to the raw data to ensure the selection of a subset of steroid-related features. Chemometric tools were applied for the filtering and the analysis of UHPLC-QTOF-MS(E) data. Time kinetics could be assessed with N-way projections to latent structures discriminant analysis (N-PLS-DA) and a detection window was confirmed. Orthogonal projections to latent structures discriminant analysis (O-PLS-DA) classification models were evaluated in a second step to assess the predictive power of both known metabolites and unknown compounds. A shared and unique structure plot (SUS-plot) analysis was performed to select the most promising unknown candidates, and receiver operating characteristic (ROC) curves were computed to assess specificity criteria applied in routine doping control. This approach underlined the relevance of monitoring both glucuronide and sulphate steroid conjugates and of including them in the athlete's passport, while promising biomarkers were also highlighted.

Relevance: 20.00%

Abstract:

This thesis is a compilation of projects studying the sediment processes that recharge debris flow channels. These works, conducted during my stay at the University of Lausanne, focus on the geological and morphological implications of torrent catchments in order to characterize debris supply, a fundamental element in predicting debris flows. Other aspects of sediment dynamics are considered, e.g. headwater-torrent coupling, as well as the development of modeling software that simulates sediment transfer in torrent systems. The sediment activity at Manival, an active torrent system of the northern French Alps, was investigated using terrestrial laser scanning, supplemented with geostructural investigations and a survey of sediment transferred in the main torrent. A full year of sediment flux could be observed, which coincided with two debris flows and several bedload transport events. This study revealed that both debris flows were generated in the torrent and were preceded by recharge of material from the headwaters. Debris production occurred mostly during winter and early spring and was caused by large slope failures. Sediment transfers were more puzzling, occurring almost exclusively in early spring, subject to runoff conditions, and in autumn during long rainfall. Intense rainstorms in summer did not affect debris storage, which seems to depend on the stability of debris deposits. The morpho-geological implication in debris supply was evaluated using DEM and field surveys. A slope angle-based classification of topography could characterize the mode of debris production and transfer. A slope stability analysis derived from the structures in the rock mass could assess susceptibility to failure. The modeled rockfall source areas included more than 97% of the recorded events, and the sediment budgets appeared to be correlated with the density of potential slope failure.
This work showed that analysing process-related terrain morphology and susceptibility to slope failure documents the sediment dynamics and allows quantitative assessment of the erosion zones leading to debris flow activity. The development of erosional landforms was evaluated by comparing their geometry with the orientations of potential rock slope failure and with the direction of the maximum joint frequency. Structures in the rock mass, in particular wedge failures and the dominant discontinuities, appear to be a first-order control on the erosional mechanisms affecting bedrock-dominated catchments. They represent weaknesses that are exploited primarily by mass wasting processes and erosion, promoting not only the initiation of rock couloirs and gullies, but also their propagation. Incorporating the geological control into geomorphic processes contributes to a better understanding of the landscape evolution of active catchments. A sediment flux algorithm was implemented in a sediment cascade model that discretizes the torrent catchment into channel reaches and individual process-response systems. Each conceptual element includes, in a simple manner, geomorphological and sediment flux information derived from GIS and complemented with field mapping. This tool makes it possible to simulate sediment transfers in channels considering evolving debris supply and conveyance, and helps reduce the uncertainty inherent in sediment budget prediction in torrent systems. This work humbly aims to shed light on some aspects of sediment dynamics in torrent environments.

Relevance: 20.00%

Abstract:

Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only is the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts, with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data, including Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries, with expression information, and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.
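To make the idea of annotating libraries with terms from a hierarchical vocabulary more concrete, the sketch below shows one way such a lookup could work: each term has a single parent, and a query for a term also returns libraries annotated with any of its descendants. This is only an illustration under stated assumptions. The term names, library identifiers and function names are hypothetical and are not taken from eVOC itself.

```python
# Hypothetical single-parent vocabulary (child -> parent); not actual eVOC terms.
PARENT = {
    "nervous system": "anatomical system",
    "brain": "nervous system",
    "cerebellum": "brain",
    "liver": "anatomical system",
}

# Hypothetical library annotations (library id -> vocabulary term).
LIBRARY_TERMS = {
    "lib_A": "cerebellum",
    "lib_B": "brain",
    "lib_C": "liver",
}

def ancestors(term):
    """Return the chain of broader terms from `term` up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def libraries_for(term):
    """Libraries annotated with `term` or with any narrower term beneath it."""
    return sorted(
        lib
        for lib, annotated in LIBRARY_TERMS.items()
        if annotated == term or term in ancestors(annotated)
    )

brain_libraries = libraries_for("brain")
```

Querying at a broad term therefore gathers everything annotated below it, which is the practical benefit of a controlled hierarchy over free-text source descriptions.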

Relevance: 20.00%

Abstract:

Advanced kernel methods for remote sensing image classification. Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009.
The technical developments of recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and processing. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The emphasis is on algorithmic efficiency and on the simplicity of the proposed approaches, to avoid overly complex models that would not be adopted by users. The major challenge of the Thesis is to remain close to concrete remote sensing problems without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities, and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed. First, an adaptive model learning the relevant image features is proposed to solve the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the individual features (the spectral bands). The scarcity and unreliability of labeled information were the common root of the second and third models: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine, or use the unlabeled data to increase the robustness and quality of the description of the data. Both solutions have been explored, resulting in two methodological contributions based respectively on active learning and semi-supervised learning. Finally, the more theoretical issue of structured outputs is considered in the last model, which, by integrating output similarity into the model (a source of information not previously considered in remote sensing), opens new challenges and opportunities for remote sensing image processing.

Relevance: 20.00%

Abstract:

River bifurcations are key nodes within braided river systems controlling the flow and sediment partitioning and therefore the dynamics of the river braiding process. Recent research has shown that certain geometrical configurations induce instabilities that lead to downstream mid-channel bar formation and the formation of bifurcations. However, we currently have a poor understanding of the flow division process within bifurcations and the flow dynamics in the downstream bifurcates, both of which are needed to understand bifurcation stability. This paper presents results of a numerical sensitivity experiment undertaken using computational fluid dynamics (CFD) with the purpose of understanding the flow dynamics of a series of idealized bifurcations. A geometric sensitivity analysis is undertaken for a range of channel slopes (0.005 to 0.03), bifurcation angles (22 degrees to 42 degrees) and a restricted set of inflow conditions based upon simulating flow through meander bends with different curvature on the flow field dynamics through the bifurcation. The results demonstrate that the overall slope of the bifurcation affects the velocity of flow through the bifurcation and when slope asymmetry is introduced, the flow structures in the bifurcation are modified. In terms of bifurcation evolution the most important observation appears to be that once slope asymmetry is greater than 0.2 the flow within the steep bifurcate shows potential instability and the potential for alternate channel bar formation. Bifurcation angle also defines the flow structures within the bifurcation with an increase in bifurcation angle increasing the flow velocity down both bifurcates. However, redistributive effects of secondary circulation caused by upstream curvature can very easily counter the effects of local bifurcation characteristics. Copyright (C) 2011 John Wiley & Sons, Ltd.

Relevance: 20.00%

Abstract:

BACKGROUND: To compare the prognostic relevance of Masaoka and Müller-Hermelink classifications. METHODS: We treated 71 patients with thymic tumors at our institution between 1980 and 1997. Complete follow-up was achieved in 69 patients (97%) with a mean follow-up time of 8.3 years (range, 9 months to 17 years). RESULTS: Masaoka stage I was found in 31 patients (44.9%), stage II in 17 (24.6%), stage III in 19 (27.6%), and stage IV in 2 (2.9%). The 10-year overall survival rate was 83.5% for stage I, 100% for stage IIa, 58% for stage IIb, 44% for stage III, and 0% for stage IV. The disease-free survival rates were 100%, 70%, 40%, 38%, and 0%, respectively. Histologic classification according to Müller-Hermelink found medullary tumors in 7 patients (10.1%), mixed in 18 (26.1%), organoid in 14 (20.3%), cortical in 11 (15.9%), well-differentiated thymic carcinoma in 14 (20.3%), and endocrine carcinoma in 5 (7.3%), with 10-year overall survival rates of 100%, 75%, 92%, 87.5%, 30%, and 0%, respectively, and 10-year disease-free survival rates of 100%, 100%, 77%, 75%, 37%, and 0%, respectively. Medullary, mixed, and well-differentiated organoid tumors were correlated with stage I and II, and well-differentiated thymic carcinoma and endocrine carcinoma with stage III and IV (p < 0.001). Multivariate analysis showed age, gender, myasthenia gravis, and postoperative adjuvant therapy not to be significant predictors of overall and disease-free survival after complete resection, whereas the Müller-Hermelink and Masaoka classifications were independent significant predictors for overall (p < 0.05) and disease-free survival (p < 0.004; p < 0.0001). CONCLUSIONS: The consideration of staging and histology in thymic tumors has the potential to improve recurrence prediction and patient selection for combined treatment modalities.

Relevance: 20.00%

Abstract:

MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.

Relevance: 20.00%

Abstract:

This dissertation is concerned with the development of algorithmic methods for the unsupervised learning of natural language morphology, using a symbolically transcribed wordlist. It focuses on the case of languages approaching the introflectional type, such as Arabic or Hebrew. The morphology of such languages is traditionally described in terms of discontinuous units: consonantal roots and vocalic patterns. Inferring this kind of structure is a challenging task for current unsupervised learning systems, which generally operate with continuous units. In this study, the problem of learning root-and-pattern morphology is divided into a phonological and a morphological subproblem. The phonological component of the analysis seeks to partition the symbols of a corpus (phonemes, letters) into two subsets that correspond well with the phonetic definition of consonants and vowels; building on this result, the morphological component attempts to establish the list of roots and patterns in the corpus, and to infer the rules that govern their combinations.
We assess the extent to which this can be done on the basis of two hypotheses: (i) the distinction between consonants and vowels can be learned by observing their tendency to alternate in speech; (ii) roots and patterns can be identified as sequences of the previously discovered consonants and vowels respectively. The proposed algorithm uses a purely distributional method for partitioning symbols. It then applies analogical principles to identify a preliminary set of reliable roots and patterns, and to gradually enlarge it. This extension process is guided by an evaluation procedure based on the minimum description length principle, in line with the approach to morphological learning embodied in LINGUISTICA (Goldsmith, 2001). The algorithm is implemented as a computer program named ARABICA; it is evaluated with regard to its ability to account for the system of plural formation in a corpus of Arabic nouns. This thesis shows that complex linguistic structures can be discovered without recourse to a rich set of a priori hypotheses about the phenomena under consideration. It illustrates the possible synergy between learning mechanisms operating at distinct levels of linguistic description, and attempts to determine where and why such cooperation fails. It concludes that the tension between the universality of the consonant-vowel distinction and the specificity of root-and-pattern structure is crucial for understanding the advantages and weaknesses of this approach.
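Hypothesis (i) above, that consonants and vowels can be separated by their tendency to alternate, can be illustrated with Sukhotin's algorithm, a classic distributional procedure. The abstract does not say which method the dissertation uses, so this is a swapped-in technique shown only as a sketch; the function name and the toy wordlist are hypothetical.

```python
from collections import defaultdict

def sukhotin_vowels(words):
    """Guess the vowel set of a wordlist with Sukhotin's algorithm.

    Symbols that most often stand next to *other* symbols are promoted to
    vowels one at a time; each promotion penalises the symbols adjacent to
    the new vowel, which pushes its neighbours towards the consonant class.
    """
    counts = defaultdict(lambda: defaultdict(int))
    symbols = set()
    for word in words:
        symbols.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # the diagonal of the adjacency matrix stays zero
                counts[a][b] += 1
                counts[b][a] += 1

    sums = {s: sum(counts[s].values()) for s in symbols}
    vowels = set()
    while True:
        remaining = {s: v for s, v in sums.items() if s not in vowels}
        if not remaining:
            break
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda s: remaining[s])
        if remaining[best] <= 0:
            break
        vowels.add(best)
        for s in remaining:
            if s != best:
                sums[s] -= 2 * counts[s][best]
    return vowels

# Hypothetical toy wordlist with strict consonant-vowel alternation.
toy_vowels = sukhotin_vowels(["banana", "tomato", "kitabu"])
```

On this toy input the procedure promotes `a`, `o`, `i` and `u` in turn and then stops, because every remaining symbol has a non-positive score. Real corpora with consonant clusters or rare symbols are noisier, which is part of what makes the phonological subproblem non-trivial.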

Relevance: 20.00%

Abstract:

This text is a "right of reply" by the authors of the article « Vers un naturalisme social. À la croisée des sciences sociales et des sciences cognitives », published by SociologieS in October 2011, to the debate that it provoked. After a brief clarification of the form of the debate itself, and of the specific disagreements between the various protagonists, the article responds to the perfectly legitimate concerns and the substantive questions raised by social naturalism.