18 results for learner corpora
in BORIS: Bern Open Repository and Information System - Bern - Switzerland
Abstract:
New technologies and digital processing have considerably facilitated corpus linguistics; the Internet, for example, is an easy and inexpensive tool for compiling corpora. The Internet is becoming ever more popular and more important for communication because of the enormous influence of the new media, and it has affected life and society in many ways, some of them fundamental. It is therefore not surprising that language and communication themselves are affected. One of the most interesting phenomena within computer-mediated communication (CMC) is online social networks, which within a few years have become a widespread and continuously expanding means of communication. Their study is particularly interesting because, owing to the constant development of the technology, online social networks are not a static entity but change incessantly, with new features for their use introduced frequently. These innovations are conditioned by the electronic medium, which in turn has a decisive influence on the style of communication used in social networks such as Facebook and Tuenti. As a new medium of social interaction, online social networks produce a communication style of their own. The object of analysis of my thesis is how Facebook and Tuenti users from the city of Málaga create this style through the use of phonic features typical of the Andalusian variety, and how the users' linguistic attitudes influence the use of these phonic features. This study is based on a corpus compiled from utterances of informants on Facebook and Tuenti. A corpus consisting of broad transcriptions of recordings of speakers from Málaga serves as a comparison corpus. Another methodological tool used to collect data will be the survey: one type of survey is designed to capture the participants' attitudes towards various features of Andalusian/Málaga speech, and another to examine why people use these features on Facebook and Tuenti. The study builds on the results of a pilot study, which show that the social and linguistic factors analysed operate differently in real and virtual speech. Given these different uses, the electronic communication of Facebook and Tuenti can be regarded as a style conditioned by the type of virtual space: a style that allows users to create social meaning and to express their identities through language.
Abstract:
The Internet is becoming an ever more popular and important means of communication, especially the so-called social networking sites. This study examines how the social networking sites Facebook and Tuenti influence communication. In a corpus of users from Málaga, the use of non-standard features was analysed and compared with that in spoken language. From this comparison it can be concluded that the social and linguistic factors examined operate differently in virtual and real language. Owing to this different usage, the electronic communication of Facebook and Tuenti can be regarded as a style that serves users to create social meaning and to express their linguistic identity.
Abstract:
Software corpora facilitate the reproducibility of analyses; however, static analysis of an entire corpus still requires considerable effort, which is often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for a single language, increasing the effort required for cross-language analysis. To address these aspects we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be loaded directly into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the-box support for creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.
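To make the snapshot-based workflow more concrete, here is a minimal sketch of the underlying idea: pre-built, language-independent model snapshots are loaded without parsing and one query is deployed over the whole corpus in parallel. The snapshot format, the `count_god_classes` query and the directory layout are assumptions made for illustration only; they are not Pangea's actual (Pharo/Moose-based) API.

```python
# Hypothetical sketch of the "query snapshots instead of parsing sources" idea.
# Names (count_god_classes, SNAPSHOT_DIR, the snapshot schema) are illustrative
# assumptions, not Pangea's real interface.
import pickle
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

SNAPSHOT_DIR = Path("corpus-snapshots")  # one pre-built model snapshot per system


def count_god_classes(snapshot_path: Path) -> tuple[str, int]:
    """Load a language-independent model snapshot and run a simple query on it."""
    with snapshot_path.open("rb") as f:
        model = pickle.load(f)  # no parsing: the model is already an object graph
    god_classes = [c for c in model["classes"] if c["numberOfMethods"] > 50]
    return snapshot_path.stem, len(god_classes)


if __name__ == "__main__":
    # The "runner": deploy the same analysis over the entire corpus in parallel.
    with ProcessPoolExecutor() as pool:
        results = dict(pool.map(count_god_classes, SNAPSHOT_DIR.glob("*.pkl")))
    for system, count in sorted(results.items()):
        print(f"{system}: {count} classes with more than 50 methods")
```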
Abstract:
Discourse connectives are lexical items indicating coherence relations between discourse segments. Even though many languages possess a whole range of connectives, important divergences exist cross-linguistically in the number of connectives that are used to express a given relation. For this reason, connectives are not easily paired with a univocal translation equivalent across languages. This paper is a first attempt to design a reliable method to annotate the meaning of discourse connectives cross-linguistically using corpus data. We present the methodological choices made to reach this aim and report three annotation experiments using the framework of the Penn Discourse Tree Bank.
Abstract:
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ2 and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.
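As an illustration of the two kinds of measures mentioned above, the sketch below compares two sub-corpora with a chi-square statistic over their shared vocabulary and with the relative frequency of a small, hand-picked set of discourse connectives. The file names and the connective list are assumptions for the example, not the authors' actual data or implementation.

```python
# Illustrative sketch: compare two sub-corpora (a) by a chi-square statistic over
# shared vocabulary and (b) by counting discourse connectives.
from collections import Counter
from scipy.stats import chi2_contingency

CONNECTIVES = {"however", "therefore", "because", "although", "moreover"}  # example set


def word_counts(path: str) -> Counter:
    with open(path, encoding="utf-8") as f:
        return Counter(f.read().lower().split())


a, b = word_counts("subcorpus_a.txt"), word_counts("subcorpus_b.txt")  # assumed files

# (a) Global lexical similarity: chi-square over frequencies of the shared vocabulary.
vocab = sorted(set(a) & set(b))
table = [[a[w] for w in vocab], [b[w] for w in vocab]]
chi2, p, _, _ = chi2_contingency(table)
print(f"chi-square over {len(vocab)} shared words: {chi2:.1f} (p={p:.3g})")

# (b) A more targeted measure: relative frequency of discourse connectives.
for name, counts in (("A", a), ("B", b)):
    total = sum(counts.values())
    conn = sum(counts[c] for c in CONNECTIVES)
    print(f"subcorpus {name}: {conn / total:.4%} of tokens are connectives")
```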
Abstract:
When analysing software metrics, users find that visualisation tools lack support for (1) detecting patterns within metrics and (2) analysing software corpora. In this paper we present Explora, a visualisation tool designed for the simultaneous analysis of multiple metrics of systems in software corpora. Explora incorporates a novel lightweight visualisation technique called PolyGrid that promotes the detection of graphical patterns. We present an example where we analyse the relation of subtype polymorphism with inheritance and invocation in corpora of Smalltalk and Java systems and find that (1) subtype polymorphism is more likely to be found in large hierarchies; (2) as class hierarchies grow horizontally, they also do so vertically; and (3) in polymorphic hierarchies the length of the class names is orthogonal to the cardinality of the call sites.
Abstract:
Visualisation provides good support for software analysis. It copes with the intangible nature of software by providing concrete representations of it. By reducing the complexity of software, visualisations are especially useful when dealing with large amounts of code. One domain that usually deals with large amounts of source code data is empirical analysis. Although there are many tools for analysis and visualisation, they do not cope well with software corpora. In this paper we present Explora, an infrastructure that is specifically targeted at visualising corpora. We report on early results from a sample analysis of Smalltalk and Java corpora.
Abstract:
Statistical shape analysis techniques commonly employed in the medical imaging community, such as active shape models or active appearance models, rely on principal component analysis (PCA) to decompose shape variability into a reduced set of interpretable components. In this paper we propose principal factor analysis (PFA) as an alternative and complementary tool to PCA providing a decomposition into modes of variation that can be more easily interpretable, while still being a linear efficient technique that performs dimensionality reduction (as opposed to independent component analysis, ICA). The key difference between PFA and PCA is that PFA models covariance between variables, rather than the total variance in the data. The added value of PFA is illustrated on 2D landmark data of corpora callosa outlines. Then, a study of the 3D shape variability of the human left femur is performed. Finally, we report results on vector-valued 3D deformation fields resulting from non-rigid registration of ventricles in MRI of the brain.
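The key contrast drawn in the abstract, PCA modelling the total variance versus factor analysis modelling only the shared covariance plus a per-variable noise term, can be sketched on synthetic landmark-style data as follows. The data and dimensions are invented for illustration, and scikit-learn's FactorAnalysis stands in for the PFA variant used in the paper rather than reproducing it.

```python
# Minimal illustration of the PCA-vs-factor-analysis contrast on landmark-style
# shape vectors. The data here is synthetic; the paper uses 2D landmarks of
# corpora callosa outlines, 3D femur shapes and 3D deformation fields.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
n_shapes, n_coords, n_modes = 200, 40, 3  # e.g. 20 two-dimensional landmarks per shape

# Synthetic shapes: a few latent modes of variation plus per-coordinate noise.
latent = rng.normal(size=(n_shapes, n_modes))
loadings = rng.normal(size=(n_modes, n_coords))
shapes = latent @ loadings + 0.3 * rng.normal(size=(n_shapes, n_coords))

# PCA decomposes the *total* variance (signal plus noise) of the aligned shapes.
pca = PCA(n_components=n_modes).fit(shapes)
print("PCA explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

# Factor analysis models only the *shared covariance* between coordinates and keeps
# a separate per-coordinate noise term, which is the key difference the abstract
# points out between PFA and PCA.
fa = FactorAnalysis(n_components=n_modes).fit(shapes)
print("FA noise variance (first 5 coords):", np.round(fa.noise_variance_[:5], 3))

# Both yield low-dimensional shape parameters usable in a statistical shape model.
pca_params, fa_params = pca.transform(shapes), fa.transform(shapes)
```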
Abstract:
CONTEXT: Death from corpora aliena in the larynx is a well-known entity in forensic pathology. The correct diagnosis of this cause of death is difficult without an autopsy, and misdiagnoses by external examination alone are common. OBJECTIVE: To determine the postmortem usefulness of modern imaging techniques in the diagnosis of foreign bodies in the larynx, multislice computed tomography, magnetic resonance imaging, and postmortem full-body computed tomography-angiography were performed. DESIGN: Three decedents with a suspected foreign body in the larynx underwent the 3 different imaging techniques before medicolegal autopsy. RESULTS: Multislice computed tomography has a high diagnostic value in the noninvasive localization of a foreign body and abnormalities in the larynx. The differentiation between neoplasm or soft foreign bodies (eg, food) is possible, but difficult, by unenhanced multislice computed tomography. By magnetic resonance imaging, the discrimination of the soft tissue structures and soft foreign bodies is much easier. In addition to the postmortem multislice computed tomography, the combination with postmortem angiography will increase the diagnostic value. CONCLUSIONS: Postmortem, cross-sectional imaging methods are highly valuable procedures for the noninvasive detection of corpora aliena in the larynx.
Abstract:
In variational linguistics, the concept of space has always been a central issue. However, the different research traditions dealing with space long coexisted separately. Traditional dialectology focused primarily on the diatopic dimension of linguistic variation, whereas sociolinguistic studies considered the diastratic and diaphasic dimensions. For a long time only very few linguistic investigations tried to combine both research traditions in a two-dimensional design, a desideratum that the contributions of this volume are meant to address. The articles present findings from empirical studies which take up these different concepts and examine how they relate to one another. Besides dialectological and sociolinguistic concepts, a lay perspective on linguistic space is also considered, a paradigm often referred to as “folk dialectology”. Many of the studies in this volume make use of new computational possibilities for processing and cartographically representing large corpora of linguistic data. The empirical studies incorporate findings from different linguistic communities in Europe and pursue the objective of shedding light on the interrelationship between the different concepts of space and their relevance to variational linguistics.
Abstract:
The present article deals, by way of example, with the remarkable iconographic attestations connected with the Wadi ed-Daliye (WD) findings. The bullae discussed were attached to papyri which provide a clear dating of the hoard to between 375 and 335 BCE. In terms of style and convention, the preserved motifs are to be classified as Persian, Greek or Greco-Persian. A major goal of the presentation is the contextualization of this material; this is achieved by taking into account local parallels as well as relevant attestations of the dominant / “imperial” cultures of Persia and Greece. The correlation of motifs with the (often more complex, more detailed or more contoured) examples stemming from the “source cultures” follows a clear agenda: it is methodologically based on the approach employed by Silvia Schroer and Othmar Keel throughout the project „Die Ikonographie Palästinas/Israels und der Alte Orient (IPIAO). Eine Religionsgeschichte in Bildern” (2005, 23ff). The WD findings demand careful analysis, since the cultures influencing the imagery are deeply rooted in Greek mythology and iconography. Special attention has to be paid to the bullae, as far as they have been excavated, from a huge Punic temple archive of Carthago (Berges 1997 and 2002), as well as to those from the archive of the satrap seat at Daskyleion in the northwest of Asia Minor (Kaptan 2002), which are chronologically close (late 5th and 4th century BCE) to the WD finds. Not every single motif and artifact of the WD corpus, which comprises more than 120 items, can be dealt with in detail in the following pages; we refer to the editio princeps by Leith (1990, 1997) and to the relevant chapter in Keel’s Corpus volume II (Keel 2010, 340-379). The article gives a brief history of research (2.), some basic remarks on the development of style (3.) and a selection of detail studies (4.). A cross-check with other relevant corpora of stamp seals (5.) and a compressed synthesis (6.) serve to characterize and classify this unique iconographic assemblage. There are rather few references to the late Persian coins from Samaria (Meshorer/Qedar 1999), which were produced at roughly the same time as the WD bullae (372-333 BCE), since an article by Patrick Wyssmann in this volume is devoted to that specific corpus. Viewed through the lens of late Persian iconography, Samaria appears as a dazzling metropolis at the crossroads of Greek and Persian culture, far removed from any strict and revolutionary religious orthodoxy.
Abstract:
Software developers often ask questions about software systems and software ecosystems that entail exploration and navigation, such as “who uses this component?” and “where is this feature implemented?”. Software visualisation can be a great aid to understanding and exploring the answers to such questions, but visualisations require expertise to implement effectively, and they do not always scale well to large systems. We propose to automatically generate software visualisations based on software models derived from open source software corpora and from an analysis of the properties of typical developer queries and commonly used visualisations. The key challenges we see are (1) understanding how to match queries to suitable visualisations, and (2) scaling visualisations effectively to very large software systems and corpora. In the paper we motivate the idea of automatic software visualisation, we enumerate the challenges and our proposals to address them, and we describe some very initial results in our attempts to develop scalable visualisations of open source software corpora.