968 resultados para lexical statistics
Resumo:
Ce mémoire présente une évaluation des différentes méthodes utilisées en lexicographie afin d’identifier les liens lexicaux dans les dictionnaires où sont répertoriées des collocations. Nous avons ici comparé le contenu de fiches du DiCo, un dictionnaire de dérivés sémantiques et de collocations créé selon les principes de la lexicologie explicative et combinatoire, avec les listes de cooccurrents générées automatiquement à partir du corpus Le Monde 2002. Notre objectif est ici de proposer des améliorations méthodologiques à la création de fiches de dictionnaire du type du DiCo, c’est-à-dire, des dictionnaires d’approche qualitative, où la collocation est définie comme une association récurrente et arbitraire entre deux items lexicaux et où les principaux outils méthodologiques utilisés sont la compétence linguistique de ses lexicographes et la consultation manuelle de corpus de textes. La consultation de listes de cooccurrents est une pratique associée habituellement à une approche lexicographique quantitative, qui définit la collocation comme une association entre deux items lexicaux qui est plus fréquente, dans un corpus, que ce qui pourrait être attendu si ces deux items lexicaux y étaient distribués de façon aléatoire. Nous voulons mesurer ici dans quelle mesure les outils utilisés traditionnellement dans une approche quantitative peuvent être utiles à la création de fiches lexicographiques d’approche qualitative, et de quelle façon leur utilisation peut être intégrée à la méthodologie actuelle de création de ces fiches.
Resumo:
Pós-graduação em Estudos Linguísticos - IBILCE
Resumo:
In this article, we present the first open-access lexical database that provides phonological representations for 120,000 Italian word forms. Each of these also includes syllable boundaries and stress markings and a comprehensive range of lexical statistics. Using data derived from this lexicon, we have also generated a set of derived databases and provided estimates of positional frequency use for Italian phonemes, syllables, syllable onsets and codas, and character and phoneme bigrams. These databases are freely available from phonitalia.org. This article describes the methods, content, and summarizing statistics for these databases. In a first application of this database, we also demonstrate how the distribution of phonological substitution errors made by Italian aphasic patients is related to phoneme frequency. © 2013 Psychonomic Society, Inc.
Resumo:
Peer to peer systems have been widely used in the internet. However, most of the peer to peer information systems are still missing some of the important features, for example cross-language IR (Information Retrieval) and collection selection / fusion features. Cross-language IR is the state-of-art research area in IR research community. It has not been used in any real world IR systems yet. Cross-language IR has the ability to issue a query in one language and receive documents in other languages. In typical peer to peer environment, users are from multiple countries. Their collections are definitely in multiple languages. Cross-language IR can help users to find documents more easily. E.g. many Chinese researchers will search research papers in both Chinese and English. With Cross-language IR, they can do one query in Chinese and get documents in two languages. The Out Of Vocabulary (OOV) problem is one of the key research areas in crosslanguage information retrieval. In recent years, web mining was shown to be one of the effective approaches to solving this problem. However, how to extract Multiword Lexical Units (MLUs) from the web content and how to select the correct translations from the extracted candidate MLUs are still two difficult problems in web mining based automated translation approaches. Discovering resource descriptions and merging results obtained from remote search engines are two key issues in distributed information retrieval studies. In uncooperative environments, query-based sampling and normalized-score based merging strategies are well-known approaches to solve such problems. However, such approaches only consider the content of the remote database but do not consider the retrieval performance of the remote search engine. This thesis presents research on building a peer to peer IR system with crosslanguage IR and advance collection profiling technique for fusion features. Particularly, this thesis first presents a new Chinese term measurement and new Chinese MLU extraction process that works well on small corpora. An approach to selection of MLUs in a more accurate manner is also presented. After that, this thesis proposes a collection profiling strategy which can discover not only collection content but also retrieval performance of the remote search engine. Based on collection profiling, a web-based query classification method and two collection fusion approaches are developed and presented in this thesis. Our experiments show that the proposed strategies are effective in merging results in uncooperative peer to peer environments. Here, an uncooperative environment is defined as each peer in the system is autonomous. Peer like to share documents but they do not share collection statistics. This environment is a typical peer to peer IR environment. Finally, all those approaches are grouped together to build up a secure peer to peer multilingual IR system that cooperates through X.509 and email system.
Resumo:
Matrix function approximation is a current focus of worldwide interest and finds application in a variety of areas of applied mathematics and statistics. In this thesis we focus on the approximation of A^(-α/2)b, where A ∈ ℝ^(n×n) is a large, sparse symmetric positive definite matrix and b ∈ ℝ^n is a vector. In particular, we will focus on matrix function techniques for sampling from Gaussian Markov random fields in applied statistics and the solution of fractional-in-space partial differential equations. Gaussian Markov random fields (GMRFs) are multivariate normal random variables characterised by a sparse precision (inverse covariance) matrix. GMRFs are popular models in computational spatial statistics as the sparse structure can be exploited, typically through the use of the sparse Cholesky decomposition, to construct fast sampling methods. It is well known, however, that for sufficiently large problems, iterative methods for solving linear systems outperform direct methods. Fractional-in-space partial differential equations arise in models of processes undergoing anomalous diffusion. Unfortunately, as the fractional Laplacian is a non-local operator, numerical methods based on the direct discretisation of these equations typically requires the solution of dense linear systems, which is impractical for fine discretisations. In this thesis, novel applications of Krylov subspace approximations to matrix functions for both of these problems are investigated. Matrix functions arise when sampling from a GMRF by noting that the Cholesky decomposition A = LL^T is, essentially, a `square root' of the precision matrix A. Therefore, we can replace the usual sampling method, which forms x = L^(-T)z, with x = A^(-1/2)z, where z is a vector of independent and identically distributed standard normal random variables. Similarly, the matrix transfer technique can be used to build solutions to the fractional Poisson equation of the form ϕn = A^(-α/2)b, where A is the finite difference approximation to the Laplacian. Hence both applications require the approximation of f(A)b, where f(t) = t^(-α/2) and A is sparse. In this thesis we will compare the Lanczos approximation, the shift-and-invert Lanczos approximation, the extended Krylov subspace method, rational approximations and the restarted Lanczos approximation for approximating matrix functions of this form. A number of new and novel results are presented in this thesis. Firstly, we prove the convergence of the matrix transfer technique for the solution of the fractional Poisson equation and we give conditions by which the finite difference discretisation can be replaced by other methods for discretising the Laplacian. We then investigate a number of methods for approximating matrix functions of the form A^(-α/2)b and investigate stopping criteria for these methods. In particular, we derive a new method for restarting the Lanczos approximation to f(A)b. We then apply these techniques to the problem of sampling from a GMRF and construct a full suite of methods for sampling conditioned on linear constraints and approximating the likelihood. Finally, we consider the problem of sampling from a generalised Matern random field, which combines our techniques for solving fractional-in-space partial differential equations with our method for sampling from GMRFs.
Resumo:
The refractive error of a human eye varies across the pupil and therefore may be treated as a random variable. The probability distribution of this random variable provides a means for assessing the main refractive properties of the eye without the necessity of traditional functional representation of wavefront aberrations. To demonstrate this approach, the statistical properties of refractive error maps are investigated. Closed-form expressions are derived for the probability density function (PDF) and its statistical moments for the general case of rotationally-symmetric aberrations. A closed-form expression for a PDF for a general non-rotationally symmetric wavefront aberration is difficult to derive. However, for specific cases, such as astigmatism, a closed-form expression of the PDF can be obtained. Further, interpretation of the distribution of the refractive error map as well as its moments is provided for a range of wavefront aberrations measured in real eyes. These are evaluated using a kernel density and sample moments estimators. It is concluded that the refractive error domain allows non-functional analysis of wavefront aberrations based on simple statistics in the form of its sample moments. Clinicians may find this approach to wavefront analysis easier to interpret due to the clinical familiarity and intuitive appeal of refractive error maps.
Resumo:
Now in its sixth edition, the Traffic Engineering Handbook continues to be a must have publication in the transportation industry, as it has been for the past 60 years. The new edition provides updated information for people entering the practice and for those already practicing. The handbook is a convenient desk reference, as well as an all in one source of principles and proven techniques in traffic engineering. Most chapters are presented in a new format, which divides the chapters into four areas-basics, current practice, emerging trends and information sources. Chapter topics include road users, vehicle characteristics, statistics, planning for operations, communications, safety, regulations, traffic calming, access management, geometrics, signs and markings, signals, parking, traffic demand, maintenance and studies. In addition, as the focus in transportation has shifted from project based to operations based, two new chapters have been added-"Planning for Operations" and "Managing Traffic Demand to Address Congestion: Providing Travelers with Choices." The Traffic Engineering Handbook continues to be one of the primary reference sources for study to become a certified Professional Traffic Operations Engineer™. Chapters are authored by notable and experienced authors, and reviewed and edited by a distinguished panel of traffic engineering experts.
Resumo:
For many decades correlation and power spectrum have been primary tools for digital signal processing applications in the biomedical area. The information contained in the power spectrum is essentially that of the autocorrelation sequence; which is sufficient for complete statistical descriptions of Gaussian signals of known means. However, there are practical situations where one needs to look beyond autocorrelation of a signal to extract information regarding deviation from Gaussianity and the presence of phase relations. Higher order spectra, also known as polyspectra, are spectral representations of higher order statistics, i.e. moments and cumulants of third order and beyond. HOS (higher order statistics or higher order spectra) can detect deviations from linearity, stationarity or Gaussianity in the signal. Most of the biomedical signals are non-linear, non-stationary and non-Gaussian in nature and therefore it can be more advantageous to analyze them with HOS compared to the use of second order correlations and power spectra. In this paper we have discussed the application of HOS for different bio-signals. HOS methods of analysis are explained using a typical heart rate variability (HRV) signal and applications to other signals are reviewed.