9 results for Latent semantic indexing

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

20.00%

Publisher:

Abstract:

The Internet is the basic infrastructure of electronic mail and has long been an important source of information for academic users. It has become a significant source of information for commercial companies as they seek to stay in touch with their customers and monitor their competitors. The growth of the WWW, both in volume and in diversity, has created a growing demand for advanced information management services. Such services include clustering and classification, information discovery and filtering, and the personalization and tracking of source usage. Although the amount of scientific and commercially valuable information available on the WWW has grown considerably in recent years, searching for and finding it still depends on conventional Internet search engines. Satisfying the growing and changing needs of information retrieval has become a complex task for Internet search engines. Classification and indexing play a significant part in searching for and finding reliable and accurate information. This master's thesis presents the most common methods used in classification and indexing, together with applications and projects that use them and that have sought to solve problems related to information retrieval.

Relevance:

20.00%

Publisher:

Abstract:

In the present dissertation, multilingual thesauri were approached as cultural products and the focus was twofold. On the empirical level the focus was placed on the translatability of certain British-English social science indexing terms into the Finnish language and culture at the concept, term and indexing term levels. On the theoretical level the focus was placed on the aim of translation and on the concept of equivalence. In accordance with modern communicative and dynamic translation theories the interest was on the human dimension. The study is qualitative. In this study, equivalence was understood in a similar way to how dynamic, functional equivalence is commonly understood in translation studies. Translating was seen as a decision-making process, where a translator often has several options to choose from in order to fulfil the function of the translation. Accordingly, and as a starting point for the construction of the empirical part, the function of the source text was considered to be the same or similar to the function of the target text, that is, a functional thesaurus in both the source and the target context. Further, the study approached the challenges of multilingual thesaurus construction from the perspectives of semantics and pragmatics. In semantic analysis the focus was on what the words conventionally mean and in pragmatics on the ‘invisible’ meaning, that is, how we recognise what is meant even when it is not actually said (or written). Languages and ideas expressed by languages are created mainly in accordance with the expressive needs of the surrounding culture, and thesauri were considered to reflect several subcultures and, consequently, the discourses that represent them.
The research material consisted of different kinds of potential discourses: dictionaries, database records, and thesauri, Finnish versus British social science researchers, Finnish versus British indexers, simulated indexing tasks with five articles and Finnish versus British thesaurus constructors. In practice, the professional background of the two last mentioned groups was rather similar. It became increasingly clear that all the material types had their own characteristics, although naturally not entirely separate from each other. It is further noteworthy that the different types and origins of research material were not used to represent true comparison pairs, and that the aim of triangulation of methods and material was to gain a holistic view. The general research questions were: 1. Can differences be found between Finnish and British discourses regarding family roles as thesaurus terms, and if so, what kinds of differences and what are the implications for multilingual thesaurus construction? 2. What is the pragmatic indexing term equivalence? The first question studied how the same topic (family roles) was represented in different contexts and by different users, and further focused on how the possible differences were handled in multilingual thesaurus construction. The second question was based on the findings of the first, and addressed the final question of what kinds of factors should be considered when defining translation equivalence in multilingual thesaurus construction. The study used multiple cases and several data collection and analysis methods aiming at theoretical replication and complementarity.
The empirical material and analysis consisted of focused interviews (with Finnish and British social scientists, thesaurus constructors and indexers), simulated indexing tasks with Finnish and British indexers, semantic component analysis of dictionary definitions and translations, co-word analysis and datasets retrieved from databases, and discourse analysis of thesauri. As a terminological starting point, the topic and case of family roles was selected. The results were clear: 1) It was possible to identify different discourses. There also existed subdiscourses. For example, within the group of social scientists the orientation to qualitative versus quantitative research had an impact on the way they reacted to the studied words and discourses, and indexers placed more emphasis on the information seekers whereas thesaurus constructors approached the construction problems from a more material-based standpoint. The differences between the different specialist groups, i.e. the social scientists, the indexers and the thesaurus constructors, were often greater than between the different geo-cultural groups, i.e. Finnish versus British. The differences occurred as a result of different translation aims, diverging expectations for multilingual thesauri and a variety of practices. For multilingual thesaurus construction this means severe challenges. The clearly ambiguous concept of multilingual thesaurus as well as different construction and translation strategies should be considered more precisely in order to shed light on focus and equivalence types, which are clearly not self-evident. The research also revealed the close connection between the aims of multilingual thesauri and the pragmatic indexing term equivalence. 2) The pragmatic indexing term equivalence is very much context-dependent.
Although thesaurus term equivalence is defined and standardised in the field of library and information science (LIS), it is not understood in one established way, and the current LIS tools do not provide adequate analytical means for either constructing or studying different kinds of multilingual thesauri or their indexing term equivalence. The tools provided in translation science were more practical and theoretical, and especially the division of different meanings of a word provided a useful tool in analysing the pragmatic equivalence, which often differs from the ideal model represented in thesaurus construction literature. The study thus showed that the variety of different discourses should be acknowledged, that there is a need for operationalisation of new types of multilingual thesauri, and that the factors influencing pragmatic indexing term equivalence should be discussed more precisely than is traditionally done.

Relevance:

20.00%

Publisher:

Abstract:

"Helmiä sioille", pearls before swine, is what one says in Finnish about something good and fine that is received by a recipient who is unwilling or unable to understand, appreciate or make use of the full potential of the thing received, is uninterested in it or does not like it. For such relatively stable multi-word expressions, which are stored in language users' memories and which display various kinds of irregular features in their structure, linguistics uses, among others, the terms "idiom" or "phraseological unit". One such irregularity is, for example, the fact that the meaning of the expression is not the same as the one that would be arrived at if it were treated as an ordinary regular phrase. Another irregularity that idiom researchers have observed lies in the limited capacity for variation in form and meaning that many idioms have compared with regular phrases. For this reason, one often speaks of the "base form" and "base meaning" of an idiom, and variation is regarded as deviation from these. But when one looks at a large number of attested occurrences of idioms in language use, one notices that many of them allow variation, even to such an extent that the boundaries between a variant and a "base form" become blurred, and instead of a single idiom we suddenly encounter a "family" of several related expressions. All this raises the question of how these expressions should actually be represented in the language. The thesis undertakes a critical review of various earlier approaches to describing phraseological units, with the aim of clarifying what difficulties their structure and variation pose for linguistic theory. At the same time, an alternative way of describing these expressions is presented.
The systematic and formal model developed in this thesis integrates a description of idioms at many different linguistic levels and depicts their variation in the form of a network, as a result of the interplay between the idiom's structure and the contexts in which it occurs, as well as of interaction with other fixed expressions. The model builds on an in-depth, usage-based analysis of the Finnish idiom "X HEITTÄÄ HELMIÄ SIOILLE" (X casts pearls before swine).

Relevance:

20.00%

Publisher:

Abstract:

Search engine optimization and marketing is a set of processes widely used on websites to improve search engine rankings, which in turn generate quality web traffic and increase ROI. Content is the most important part of any website. CMS web development has become essential for most organizations and online businesses developing their online systems and websites. Every online business using a CMS wants to attract users (customers) in order to generate profit and ROI. This thesis comprises a brief study of existing SEO methods, tools and techniques and how they can be implemented to optimize a content-based website. As a result, the study provides recommendations on how to use SEO methods, tools and techniques to optimize CMS-based websites for major search engines. The study compares the SEO features of popular CMS systems such as Drupal, WordPress and Joomla, and how SEO implementation can be improved on these systems. Knowledge of search engine indexing and of how search engines work is essential for a successful SEO campaign. This work is intended as a guideline for web developers or SEO experts who want to optimize a CMS-based website for all major search engines.

Relevance:

20.00%

Publisher:

Abstract:

Human activity recognition in everyday environments is a critical, but challenging task in Ambient Intelligence applications to achieve proper Ambient Assisted Living, and key challenges still remain to be dealt with to realize robust methods. One of the major limitations of the Ambient Intelligence systems today is the lack of semantic models of those activities on the environment, so that the system can recognize the specific activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics to learn, infer and monitor human behaviours in Smart Spaces. Moreover, it is easy to recognize that some aspects of this problem have a high degree of uncertainty, and therefore, the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented to allow modelling and recognition of a set of complex real-life scenarios where vagueness and uncertainty are inherent to the human nature of the users that perform it. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and activity execution variations, since human behaviour is non-deterministic) is approached for the first time through a fuzzy ontology validated on real-time settings within a hybrid data-driven and knowledge-based architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The first module detects sub-activities (i.e., actions or basic activities) that take input data directly from a depth sensor (Kinect).
The main contribution of this thesis tackles the second component of the hybrid system, which lies on top of the previous one at a higher level of abstraction, acquires its input data from the first module's output, and executes ontological inference to provide users, activities and their influence in the environment with semantics. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Since activity recognition requires context-awareness and the ability to discriminate among activities in different environments, the semantic framework allows for modelling common-sense knowledge in the form of a rule-based system that supports expressions close to natural language in the form of fuzzy linguistic labels. The framework's advantages have been evaluated with a challenging and new public dataset, CAD-120, achieving an accuracy of 90.1% and 91.1% respectively for low- and high-level activities. This entails an improvement over both entirely data-driven approaches and merely ontology-based approaches. As an added value, for the system to be sufficiently simple and flexible to be managed by non-expert users, and thus facilitate the transfer of research to industry, a development framework composed of a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and configure human behaviour in Smart Spaces was developed in order to provide the framework with more usability in the final application. As a result, human behaviour recognition can help assist people with special needs, such as in healthcare, independent elderly living, remote rehabilitation monitoring, industrial process guideline control, and many other cases. This thesis shows use cases in these areas.

Relevance:

20.00%

Publisher:

Abstract:

In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much easier to acquire than pixel-level labels, and it is natural to find a link between image-level recognition and pixel-level classification in video data, which would transfer learned recognition models from one domain to the other. To this end, this thesis proposes two domain adaptation approaches to adapt the deep convolutional neural network (CNN) image recognition model trained from labelled image data to the target domain, exploiting both semantic evidence learned from the CNN and the intrinsic structures of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain adaptation from the source domain to the target domain, which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods by presenting extensive evaluations on challenging datasets, compared with the state-of-the-art methods.