934 results for Multilingual database
Abstract:
This paper addresses the problem of multilingual digital libraries. The motivation for such a digital library comes from the diversity of languages among Internet users as well as among content authors, from e-book authors to writers of courseware. The basic definitions of such a system, the specification of its functionality and the identification of the items it holds are discussed, and the impact of multilingualism on each of these aspects is presented. The last sections describe a case study of a multilingual digital library, the Maxwell System at PUC-Rio, including its main characteristics and the current status of its digital library.
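One way to read "identification of the items it holds" in a multilingual setting is to give each work a single, language-independent identifier, attach its language versions to it, and choose a version from the reader's language preferences. The sketch below illustrates that idea under our own assumptions: the `Item` model, the `pick_version` fallback rule and the identifier `maxwell:0001` are hypothetical and are not taken from the Maxwell System.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Item:
    """Hypothetical model: one identifier per work, independent of language."""
    item_id: str
    default_lang: str
    # Language tag (e.g. "pt", "en") -> title of that language version.
    versions: Dict[str, str] = field(default_factory=dict)


def pick_version(item: Item, preferred: List[str]) -> Tuple[str, str]:
    """Return (language, title) for the first preferred language the item offers.

    A regional tag such as "pt-BR" also matches a stored "pt" version.
    If nothing matches, fall back to the item's default language.
    """
    for tag in preferred:
        for candidate in (tag, tag.split("-")[0]):
            if candidate in item.versions:
                return candidate, item.versions[candidate]
    return item.default_lang, item.versions[item.default_lang]


item = Item(
    item_id="maxwell:0001",  # hypothetical identifier
    default_lang="pt",
    versions={"pt": "Bibliotecas digitais multilíngues", "en": "Multilingual digital libraries"},
)
print(pick_version(item, ["es", "en"]))  # ('en', 'Multilingual digital libraries')
print(pick_version(item, ["fr"]))        # ('pt', 'Bibliotecas digitais multilíngues')
```

In this sketch, keeping the identifier separate from the language versions means a translation added later does not change how the item is cited or linked.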
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
In the present dissertation, multilingual thesauri were approached as cultural products, and the focus was twofold. On the empirical level, the focus was on the translatability of certain British-English social science indexing terms into the Finnish language and culture at the concept, term and indexing-term levels. On the theoretical level, the focus was on the aim of translation and on the concept of equivalence. In accordance with modern communicative and dynamic translation theories, the interest was in the human dimension. The study is qualitative. Equivalence was understood here much as dynamic, functional equivalence is commonly understood in translation studies. Translating was seen as a decision-making process in which a translator often has several options to choose from in order to fulfil the function of the translation. Accordingly, and as a starting point for the construction of the empirical part, the function of the source text was considered to be the same as or similar to the function of the target text, that is, a functional thesaurus in both the source and the target context. Further, the study approached the challenges of multilingual thesaurus construction from the perspectives of semantics and pragmatics. Semantic analysis focused on what words conventionally mean, and pragmatic analysis on the 'invisible' meaning, or how we recognise what is meant even when it is not actually said (or written). Languages, and the ideas expressed in them, are created mainly in accordance with the expressive needs of the surrounding culture, and thesauri were considered to reflect several subcultures and, consequently, the discourses that represent them.
The research material consisted of different kinds of potential discourses: dictionaries, database records and thesauri, Finnish versus British social science researchers, Finnish versus British indexers, simulated indexing tasks with five articles, and Finnish versus British thesaurus constructors. In practice, the professional backgrounds of the last two groups mentioned were rather similar. It became clear that each material type had its own characteristics, although naturally they were not entirely separate from each other. It is further noteworthy that the different types and origins of research material were not used to form true comparison pairs; the aim of triangulating methods and material was to gain a holistic view. The general research questions were: 1. Can differences be found between Finnish and British discourses regarding family roles as thesaurus terms, and if so, what kinds of differences, and what are their implications for multilingual thesaurus construction? 2. What is pragmatic indexing term equivalence? The first question studied how the same topic (family roles) was represented in different contexts and by different users, and further focused on how any differences were handled in multilingual thesaurus construction. The second question built on the findings of the first and answered the final question of what kinds of factors should be considered when defining translation equivalence in multilingual thesaurus construction. The study used multiple cases and several data collection and analysis methods, aiming at theoretical replication and complementarity.
The empirical material and analysis consisted of focused interviews (with Finnish and British social scientists, thesaurus constructors and indexers), simulated indexing tasks with Finnish and British indexers, semantic component analysis of dictionary definitions and translations, co-word analysis of datasets retrieved from databases, and discourse analysis of thesauri. Family roles was selected as the terminological starting point, serving as both topic and case. The results were clear: 1) It was possible to identify different discourses, as well as subdiscourses. For example, within the group of social scientists, orientation towards qualitative versus quantitative research affected how they reacted to the studied words and discourses, and indexers placed more emphasis on information seekers, whereas thesaurus constructors approached construction problems with more material-based solutions. The differences between the specialist groups (the social scientists, the indexers and the thesaurus constructors) were often greater than those between the geo-cultural groups (Finnish versus British). The differences arose from different translation aims, diverging expectations for multilingual thesauri and a variety of practices. For multilingual thesaurus construction, this poses severe challenges. The clearly ambiguous concept of the multilingual thesaurus, as well as different construction and translation strategies, should be considered more precisely in order to shed light on focus and equivalence types, which are clearly not self-evident. The research also revealed the close connection between the aims of multilingual thesauri and pragmatic indexing term equivalence. 2) Pragmatic indexing term equivalence is very much context-dependent.
Although thesaurus term equivalence is defined and standardised in library and information science (LIS), it is not understood in one established way, and current LIS tools do not provide adequate analytical means either for constructing and studying different kinds of multilingual thesauri or for studying their indexing term equivalence. The tools provided by translation science were both more practical and more theoretical; in particular, distinguishing the different meanings of a word proved useful in analysing pragmatic equivalence, which often differs from the ideal model presented in the thesaurus construction literature. The study thus showed that the variety of discourses should be acknowledged, that new types of multilingual thesauri need to be operationalised, and that the factors influencing pragmatic indexing term equivalence should be discussed more precisely than has traditionally been done.
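Co-word analysis, one of the methods listed in this abstract, counts how often pairs of indexing terms are assigned to the same record. A minimal sketch follows; the three records are invented for illustration and are not data from the study, and real co-word studies often normalise these raw counts further.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set


def coword_counts(records: Iterable[Set[str]]) -> Counter:
    """Count how often each pair of indexing terms is assigned to the same record."""
    pairs: Counter = Counter()
    for terms in records:
        # Sort so that ("a", "b") and ("b", "a") are counted as the same pair.
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented example records, each a set of indexing terms.
records = [
    {"fathers", "family roles", "parenthood"},
    {"mothers", "family roles", "parenthood"},
    {"fathers", "parenthood"},
]
counts = coword_counts(records)
print(counts[("family roles", "parenthood")])  # 2
```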
Abstract:
The goal of this work was to develop a query processing system using software agents. The Open Agent Architecture framework is used for system development. The system supports queries in both Hindi and Malayalam, two prominent regional languages of India. Natural language processing techniques are used to extract meaning from the plain-language query, and information from the database is returned to the user in his or her native language. The system architecture is structured so that it can be adapted to other regional languages of India. The system can be used effectively in application areas such as e-governance, agriculture, rural health, education, national resource planning, disaster management and information kiosks, where people from all walks of life are involved.
Abstract:
The goal of this work is to develop an Open Agent Architecture for multilingual information retrieval from a relational database. The query can be given in plain Hindi or Malayalam, two prominent regional languages of India. The system supports distributed processing of user requests through collaborating agents. Natural language processing techniques are used to extract meaning from the plain-language query, and information is returned to the user in his or her native language. The system architecture is structured so that it can be adapted to other regional languages of India.
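The two agent-based abstracts do not describe the agents' internals, so the following is only a single-process sketch of the query path they outline: detect the query language, extract its meaning, query the relational database, and answer in the user's language. Language detection by Unicode block (Devanagari for Hindi, Malayalam script for Malayalam) and keyword lookup stand in for the NLP techniques actually used. The lexicons and the `district` table are hypothetical, Kollam's population is a dummy figure, and the Open Agent Architecture layer is omitted entirely.

```python
import sqlite3
from typing import Optional

# Hypothetical per-language lexicons: surface word -> database column or entity.
LEXICON = {
    "hi": {"attributes": {"जनसंख्या": "population"}, "places": {"कोल्लम": "Kollam"}},
    "ml": {"attributes": {"ജനസംഖ്യ": "population"}, "places": {"കൊല്ലം": "Kollam"}},
}


def detect_language(text: str) -> Optional[str]:
    """Guess Hindi or Malayalam from the Unicode block most characters fall in."""
    devanagari = sum("\u0900" <= ch <= "\u097f" for ch in text)
    malayalam = sum("\u0d00" <= ch <= "\u0d7f" for ch in text)
    if devanagari == malayalam == 0:
        return None
    return "hi" if devanagari >= malayalam else "ml"


def answer(query: str, conn: sqlite3.Connection) -> Optional[str]:
    """Answer a keyword-style query, or return None if it cannot be understood."""
    lang = detect_language(query)
    if lang is None:
        return None
    lexicon = LEXICON[lang]
    tokens = query.split()
    attr_word = next((t for t in tokens if t in lexicon["attributes"]), None)
    place_word = next((t for t in tokens if t in lexicon["places"]), None)
    if attr_word is None or place_word is None:
        return None
    column = lexicon["attributes"][attr_word]  # from the fixed lexicon, safe to interpolate
    row = conn.execute(
        f"SELECT {column} FROM district WHERE name = ?", (lexicon["places"][place_word],)
    ).fetchone()
    if row is None:
        return None
    # Reuse the user's own words so the reply stays in the query language.
    return f"{place_word} {attr_word}: {row[0]}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE district (name TEXT PRIMARY KEY, population INTEGER)")
conn.execute("INSERT INTO district VALUES (?, ?)", ("Kollam", 123456))  # dummy figure
print(answer("कोल्लम की जनसंख्या क्या है", conn))
print(answer("കൊല്ലം ജനസംഖ്യ എത്ര", conn))
```

Exact token matching is the weakest part of this sketch: an inflected Malayalam place name would not match its lexicon entry, which is one reason the real systems rely on proper NLP.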
Abstract:
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, whose key strategy is to build language-specific wordnets that preserve as many as possible of the semantic relations available in WN.Pr. This paper stresses that an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages is the possibility of an automatic procedure that maps WN.Pr conceptual relations, such as hyponymy, co-hyponymy, troponymy, meronymy, cause and entailment, onto the lexical database of the wordnet under construction. This is viable because these are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and of this automatic procedure, illustrated with a sample of the automatic encoding of the hyponymy and co-hyponymy relations.
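The automatic mapping described in this abstract can be sketched as follows: a WN.Pr hyponymy link is copied into the new wordnet whenever both of its synsets have already been aligned to WN.Br synsets, and aligned synsets that share a WN.Pr hypernym become co-hyponyms. This is an illustrative sketch, not the WN.Br implementation; the synset identifiers, the Portuguese alignment and the one-hypernym-per-synset simplification are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative WN.Pr fragment: hyponym synset -> hypernym synset
# (one hypernym per synset, a simplification of real WordNet data).
PR_HYPERNYM = {
    "dog.n.01": "canine.n.02",
    "wolf.n.01": "canine.n.02",
    "canine.n.02": "carnivore.n.01",
}

# Hypothetical alignment of WN.Pr synsets to WN.Br synsets (sets of Portuguese word forms).
ALIGNMENT = {
    "dog.n.01": frozenset({"cão", "cachorro"}),
    "wolf.n.01": frozenset({"lobo"}),
    "canine.n.02": frozenset({"canídeo"}),
    # "carnivore.n.01" has no WN.Br counterpart yet, so links to it are not projected.
}


def project_hyponymy(hypernym, alignment):
    """(hyponym, hypernym) pairs of WN.Br synsets whose WN.Pr counterparts are linked."""
    return {
        (alignment[hypo], alignment[hyper])
        for hypo, hyper in hypernym.items()
        if hypo in alignment and hyper in alignment
    }


def project_cohyponymy(hypernym, alignment):
    """Pairs of aligned synsets that share a WN.Pr hypernym.

    The shared hypernym itself need not be lexicalized in Portuguese.
    """
    children = defaultdict(list)
    for hypo, hyper in hypernym.items():
        if hypo in alignment:
            children[hyper].append(hypo)
    pairs = set()
    for siblings in children.values():
        for a, b in combinations(sorted(siblings), 2):
            pairs.add((alignment[a], alignment[b]))
    return pairs


print(len(project_hyponymy(PR_HYPERNYM, ALIGNMENT)), len(project_cohyponymy(PR_HYPERNYM, ALIGNMENT)))  # 2 1
```

With this fragment, {cão, cachorro} and {lobo} both become hyponyms of {canídeo} and co-hyponyms of each other, while the link from canine.n.02 to carnivore.n.01 waits until that synset is aligned.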
Abstract:
Different types of water bodies, including lakes, streams and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow the source of pollution to be identified. Triplex PCR offers a fast and inexpensive approach, and here it enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B2₃ in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites and subgroup B1 with domesticated animal sources, suggesting their use as a first screen for identifying pollution sources. A simple classification based on the phylogenetic subgroup distribution is also proposed, using the w-clique metric, which enables polluted and unpolluted sites to be differentiated.
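The abstract does not say which markers the triplex PCR targets. Assuming the widely used scheme based on the chuA and yjaA genes and the TspE4.C2 DNA fragment, with subgroups A0, A1, B1, B2_2, B2_3, D1 and D2 (B2_3 being B2 with subscript 3), band presence maps to a subgroup as sketched below, and the per-site subgroup frequencies are the kind of distribution such a classification starts from. The w-clique metric itself is not reproduced, and the isolates are made-up examples.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phylo_subgroup(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign an E. coli phylogenetic subgroup from triplex PCR bands (assumed scheme)."""
    if chuA:
        if yjaA:  # chuA+ yjaA+ -> group B2
            return "B2_3" if tspE4C2 else "B2_2"
        return "D2" if tspE4C2 else "D1"  # chuA+ yjaA- -> group D
    if tspE4C2:  # chuA- TspE4.C2+ -> group B1
        return "B1"
    return "A1" if yjaA else "A0"  # chuA- TspE4.C2- -> group A


def subgroup_frequencies(isolates: Iterable[Tuple[bool, bool, bool]]) -> Dict[str, float]:
    """Relative frequency of each subgroup among the isolates from one site."""
    counts = Counter(phylo_subgroup(*bands) for bands in isolates)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# Made-up isolates from one site: (chuA, yjaA, TspE4.C2) band present?
site = [(False, True, False), (False, True, False), (True, True, True), (False, False, True)]
print(subgroup_frequencies(site))  # {'A1': 0.5, 'B2_3': 0.25, 'B1': 0.25}
```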
Abstract:
Despite a strong increase in research on the ecology and biogeography of seamounts and oceanic islands, many basic aspects of their biodiversity are still unknown. In the southwestern Atlantic, the Vitória-Trindade Seamount Chain (VTC) extends ca. 1,200 km offshore from the Brazilian continental shelf, from the Vitória seamount to the oceanic islands of Trindade and Martin Vaz. For a long time, most of the available biological information concerned its islands. Our study presents and analyzes an extensive database on VTC fish biodiversity, built from data compiled from the literature and from recent scientific expeditions that assessed both shallow and mesophotic environments. A total of 273 species were recorded, 211 of which occur on the seamounts and 173 at the islands. New records for seamounts or islands include 191 reef fish species and 64 depth range extensions. The structure of fish assemblages was similar between islands and seamounts, with no differences in species geographic distribution, trophic composition or spawning strategies. The main differences related to endemism, which was higher at the islands, and to the number of endangered species, which was higher at the seamounts. Since unregulated fishing is common in the region, and mining activities (carbonates on seamount summits and metals on slopes) are expected to increase drastically in the near future, this unique biodiversity needs urgent attention and management.
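The species counts in this abstract also determine how many species the two habitats share, provided every one of the 273 species was recorded on at least one seamount or island (an assumption; the abstract does not state it). By inclusion-exclusion, with $S$ the set of seamount species and $I$ the set of island species:

```latex
\begin{aligned}
|S \cap I| &= |S| + |I| - |S \cup I| = 211 + 173 - 273 = 111,\\
|S \setminus I| &= 211 - 111 = 100, \qquad |I \setminus S| = 173 - 111 = 62.
\end{aligned}
```

Under that assumption, 111 species would be shared, 100 recorded only on seamounts and 62 only at the islands (100 + 62 + 111 = 273).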
Abstract:
Considering the difficulty of finding good-quality images for developing and testing computer-aided diagnosis (CAD) schemes, this paper presents a public online database of mammographic images, free for all interested users and intended to help develop and evaluate CAD schemes. The mammographic images are digitized with contrast and spatial resolution suitable for processing. A broad retrieval system allows users to search for images, exams or patient characteristics. Comparison with other currently available databases showed that the presented database has a sufficient number of images, is of high quality, and is the only one that includes a functional search system.
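A search over images, exams and patient characteristics can be sketched as a relational schema in which any combination of criteria becomes a filtered join. The schema, column names and sample rows below are hypothetical and do not describe the database's actual design; only 'CC' and 'MLO', the two standard mammographic views, are real-world values.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical patient/exam/image schema and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE exam (id INTEGER PRIMARY KEY, patient_id INTEGER REFERENCES patient(id));
        CREATE TABLE image (id INTEGER PRIMARY KEY, exam_id INTEGER REFERENCES exam(id),
                            view TEXT, finding TEXT);
        INSERT INTO patient VALUES (1, 45), (2, 62);
        INSERT INTO exam VALUES (10, 1), (11, 2);
        INSERT INTO image VALUES (100, 10, 'CC', 'mass'), (101, 10, 'MLO', 'mass'),
                                 (102, 11, 'MLO', 'none');
        """
    )
    return conn


def search_images(conn, min_age=None, max_age=None, view=None, finding=None) -> List[int]:
    """Return ids of images matching every criterion that is not None."""
    filters = [
        ("patient.age >= ?", min_age),
        ("patient.age <= ?", max_age),
        ("image.view = ?", view),
        ("image.finding = ?", finding),
    ]
    active = [(clause, value) for clause, value in filters if value is not None]
    where = " AND ".join(clause for clause, _ in active) or "1 = 1"
    # Only fixed clause strings are interpolated; user values are bound parameters.
    sql = (
        "SELECT image.id FROM image "
        "JOIN exam ON image.exam_id = exam.id "
        "JOIN patient ON exam.patient_id = patient.id "
        f"WHERE {where} ORDER BY image.id"
    )
    return [row[0] for row in conn.execute(sql, [value for _, value in active])]


conn = build_demo_db()
print(search_images(conn, view="MLO"))                  # [101, 102]
print(search_images(conn, max_age=50, finding="mass"))  # [100, 101]
```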
Abstract:
This article documents the addition of 229 microsatellite marker loci to the Molecular Ecology Resources Database. Loci were developed for the following species: Acacia auriculiformis x Acacia mangium hybrid, Alabama argillacea, Anoplopoma fimbria, Aplochiton zebra, Brevicoryne brassicae, Bruguiera gymnorhiza, Bucorvus leadbeateri, Delphacodes detecta, Tumidagena minuta, Dictyostelium giganteum, Echinogammarus berilloni, Epimedium sagittatum, Fraxinus excelsior, Labeo chrysophekadion, Oncorhynchus clarki lewisi, Paratrechina longicornis, Phaeocystis antarctica, Pinus roxburghii and Potamilus capax. These loci were cross-tested on the following species: Acacia peregrinalis, Acacia crassicarpa, Bruguiera cylindrica, Delphacodes detecta, Tumidagena minuta, Dictyostelium macrocephalum, Dictyostelium discoideum, Dictyostelium purpureum, Dictyostelium mucoroides, Dictyostelium rosarium, Polysphondylium pallidum, Epimedium brevicornum, Epimedium koreanum, Epimedium pubescens, Epimedium wushanense and Fraxinus angustifolia.
Abstract:
Much information on the flavonoid content of Brazilian foods has already been obtained; however, this information is scattered across scientific publications and unpublished data. The objectives of this work were to compile national flavonoid data and evaluate their quality according to the United States Department of Agriculture's Data Quality Evaluation System (USDA-DQES), with a few modifications, for future dissemination in the TBCA-USP (Brazilian Food Composition Database). For the compilation, the most abundant compounds in each flavonoid subclass (flavonols, flavones, isoflavones, flavanones, flavan-3-ols and anthocyanidins) were considered, and analysis of the compounds by HPLC was adopted as the criterion for data inclusion. The evaluation system considers five categories, with a maximum score of 20 for each. Each data point was assigned a confidence code (CC) of A, B, C or D, indicating the quality and reliability of the information. A total of 773 flavonoid data points for 197 Brazilian foods were evaluated. CC "C" (average) was assigned to 99% of the data and "B" (above average) to 1%. The categories with low average scores were number of samples, sampling plan and analytical quality control (average scores of 2, 5 and 4, respectively). The analytical method category received an average score of 9. The category with the highest score was sample handling (average 20). These results show that researchers need to be aware of the importance of the number of samples and the sampling plan, and of completely describing and documenting all steps of method execution and analytical quality control. (C) 2010 Elsevier Inc. All rights reserved.
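As we understand the USDA-DQES, the five category ratings (0 to 20 each) are summed into a quality index of at most 100, from which the confidence code is derived. The sketch below shows that arithmetic. The cut-offs between A, B, C and D are placeholders chosen for illustration, not the USDA values, which the abstract does not give. Because a sum of averages equals the average of the sums, the average category scores reported in the abstract (2, 5, 4, 9 and 20) correspond to an average quality index of 40.

```python
from typing import Dict, Sequence, Tuple

CATEGORIES = (
    "number of samples",
    "sampling plan",
    "sample handling",
    "analytical method",
    "analytical quality control",
)
MAX_PER_CATEGORY = 20

# Placeholder cut-offs on the 0-100 quality index (hypothetical, not the USDA values).
CC_CUTOFFS: Sequence[Tuple[int, str]] = ((75, "A"), (50, "B"), (25, "C"))


def quality_index(scores: Dict[str, float]) -> float:
    """Sum the five category ratings after checking each lies in 0..20."""
    for name in CATEGORIES:
        if not 0 <= scores[name] <= MAX_PER_CATEGORY:
            raise ValueError(f"{name}: score {scores[name]} outside 0..{MAX_PER_CATEGORY}")
    return sum(scores[name] for name in CATEGORIES)


def confidence_code(qi: float, cutoffs: Sequence[Tuple[int, str]] = CC_CUTOFFS) -> str:
    """Return the first code whose cut-off the index reaches (cut-offs in descending order)."""
    for cutoff, code in cutoffs:
        if qi >= cutoff:
            return code
    return "D"


# Average category scores reported in the abstract.
average = {
    "number of samples": 2,
    "sampling plan": 5,
    "analytical quality control": 4,
    "analytical method": 9,
    "sample handling": 20,
}
qi = quality_index(average)
print(qi, confidence_code(qi))  # 40 C (the letter depends on the placeholder cut-offs)
```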
Abstract:
Foods that contain unavailable carbohydrates may lower the risk of some non-transmissible chronic diseases because of the potential benefits of the products of colonic fermentation. On the other hand, foods that are sources of available carbohydrates may have a higher energy value and increase the postprandial glycemic response. The glycemic index, a biomarker, and the resulting glycemic load may be used to classify foods according to their potential to raise blood glucose. Information about glycemic index and glycemic load may be useful in diet therapy. Currently, food composition tables in Brazil do not provide data for individually analyzed carbohydrates, even though some high-quality data are available in scientific publications. The objectives of this work were to produce and compile information about the concentrations of individual carbohydrates in foods and their glycemic responses, and to disseminate this information through the Brazilian Food Composition Database (TBCA-USP). The glycemic index and glycemic load of foods were evaluated in healthy individuals. Concentrations of available carbohydrates (soluble sugars and available starch) and unavailable carbohydrates (dietary fiber, resistant starch, beta-glucans, fructans) were quantified by official methods, and other national data were compiled. TBCA-USP (http://www.fcf.usp.br/tabela), which is used by professionals and the general population, now offers both chemical and biological information on carbohydrates. (C) 2009 Elsevier Inc. All rights reserved.
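The glycemic index compares the incremental area under a subject's blood glucose curve after a test food with that after a reference carbohydrate containing the same amount of available carbohydrate, and the glycemic load scales the index by the available carbohydrate in a serving: GL = GI × available carbohydrate (g) / 100. The sketch below shows that arithmetic for one subject. It assumes the first sample is the fasting baseline and ignores area below it (with baseline crossings interpolated linearly); the glucose readings are invented, and in practice GI is averaged over several subjects.

```python
from typing import Sequence


def incremental_auc(times: Sequence[float], glucose: Sequence[float]) -> float:
    """Trapezoidal area of the glucose curve above the fasting value (first sample).

    Area below the baseline is ignored; when a segment crosses the baseline,
    only the triangle above it is counted.
    """
    baseline = glucose[0]
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        d0, d1 = glucose[i] - baseline, glucose[i + 1] - baseline
        if d0 >= 0 and d1 >= 0:
            area += (d0 + d1) / 2 * dt
        elif d0 > 0 > d1:  # curve falls below baseline within the segment
            area += d0 * (dt * d0 / (d0 - d1)) / 2
        elif d1 > 0 > d0:  # curve rises above baseline within the segment
            area += d1 * (dt * d1 / (d1 - d0)) / 2
    return area


def glycemic_index(food_iauc: float, reference_iauc: float) -> float:
    return 100 * food_iauc / reference_iauc


def glycemic_load(gi: float, available_carb_g: float) -> float:
    return gi * available_carb_g / 100


minutes = [0, 15, 30, 45, 60, 90, 120]
reference = [5.0, 6.5, 7.5, 7.0, 6.0, 5.5, 5.0]  # mmol/L, invented readings
food = [5.0, 5.8, 6.5, 6.2, 5.6, 4.8, 5.0]       # mmol/L, invented readings

gi = glycemic_index(incremental_auc(minutes, food), incremental_auc(minutes, reference))
print(round(gi, 1), round(glycemic_load(gi, 30), 1))  # 50.0 15.0 for 30 g available carbohydrate
```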