10 results for World Wide Web (WWW)
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis examines the use of the textual content of World Wide Web pages as corpus-like material for linguistic research. The World Wide Web contains many times more text than the largest existing traditional text corpora, so web pages are likely to yield many occurrences of words and constructions that are rare in traditional corpora. Web pages can be used as material in two ways: one can collect a random sample of web pages and build an independent corpus from their contents, or use the entire World Wide Web as a corpus through web search engines. Web pages have been used as research material in many fields of linguistics, such as lexicographic research, the study of syntactic constructions, pedagogical material, and the study of minority languages. Compared with traditional corpora, web pages have several problematic properties that must be taken into account when they are used as material. Not all pages contain usable text, and pages are often in HTML format, for example, so they must be converted into a form that is easier to process. Web pages contain more linguistic errors than traditional corpora, and their text types and subject areas are more varied than those of traditional corpora. Efficient software tools are needed for collecting material from web pages. The most common of these are commercial web search engines, which provide quick access to a large number of different pages. In addition, tools developed specifically for linguistic purposes can be used. This thesis presents the software tools WebCorp, WebAsCorpus.org, BootCaT and the Web as Corpus Toolkit, which can be used to retrieve material from web pages specifically for linguistic purposes.
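The HTML-to-text conversion step mentioned above can be illustrated with a short sketch. The following Python fragment, which uses only the standard library, fetches a page and keeps its visible text while dropping script and style content; the URL is a placeholder, and the whole fragment is an illustrative simplification of what dedicated web-as-corpus tools such as BootCaT do (they additionally perform de-duplication, boilerplate removal and language identification).

# Minimal sketch (assumption: plain HTML pages reachable over HTTP):
# fetch a web page and reduce it to plain text for corpus use.
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style elements."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_to_text(url: str) -> str:
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)

if __name__ == "__main__":
    # example.org is a placeholder; any URL of a text-bearing page would do
    print(page_to_text("https://example.org/")[:500])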
Abstract:
The aim of the study is to find out what kinds of information-seeking strategies users employ when searching for information on the Internet. Users are classified into three groups according to their search strategy. Search-oriented users mostly use search engines, both those covering the whole Internet and site-internal ones. Link-oriented users either know or guess the address of the target site, or use large hierarchical directories to find information. Even within a site, they prefer to navigate by following links and do not usually use a search function. Differentiated users do not consistently favour either approach, but choose their strategy according to the task. Data were collected in two ways: with a questionnaire on a Web page and with an information-seeking test in which users were given various search tasks to perform. The tasks were sorted into three groups according to which strategy they favour: tasks favouring the search strategy, tasks favouring the link strategy, and neutral tasks. The research problem was to determine how task type and computer and Internet experience affect the choice of information-seeking strategy. It turned out that users' orientation towards a particular strategy did not affect the choice of strategy; only the task type was a significant factor. In the light of earlier research, experienced users favour a search-oriented strategy. In this study it was found that experience increased the use of both strategies equally, but this effect was observable only in the questionnaire data, not in the tests. The use of both information-seeking strategies increases with experience, but their relative proportions remain the same. As a reason why experienced users did not favour the search strategy, it is suggested that the tasks were too easy, so that experience could not come into play. No substantial differences in completion times or in the frequency of switching search strategy were found in relation to experience, only in relation to task type; this too was explained by the easiness of certain task types. The study also discusses the development of expertise in the context of information seeking and presents a metaknowledge hypothesis, according to which an important factor in the choice of information-seeking strategy is the user's metaknowledge of search services. Metaknowledge includes knowledge of which search engines are available, what information is worth searching for on the Web, which companies and organisations have content-rich sites, and what kinds of information are generally available. All in all, three levels of knowledge are proposed to underlie the choice of strategy: 1) the user's own expertise in the domain being searched, 2) metaknowledge of the Internet's information retrieval services, and 3) technical knowledge of how search engines work. Keywords: information retrieval, information-seeking strategy, search engine, WWW, metaknowledge, cognitive psychology
Abstract:
The study of social phenomena in the World Wide Web has been rather fragmentary, and there is no coherent, research-based theory about sense of community in the Web environment. Sense of community means the part of one's self-concept that has to do with perceiving oneself as belonging to, and feeling affinity with, a certain social grouping. The present study aimed to find evidence for sense of community in the Web environment, and specifically to find out what the most critical psychological factors of sense of community would be. Based on known characteristics of real-life communities and sense of community, and a few occasional studies of Web communities, it was hypothesized that the following factors would be the most critical ones and that they could be grouped as prerequisites, facilitators and consequences of sense of community: awareness and social presence (prerequisites); criteria for membership and borders, common purpose, social interaction and reciprocity, norms and conformity, common history (facilitators); trust and accountability (consequences). In addition to the critical factors, the present study aimed to find out whether this kind of grouping would be valid. Furthermore, the effect of Web community members' background variables on sense of community was of interest. To answer these questions, an online questionnaire was created and tested. It included propositions reflecting factors that precede, facilitate and follow sense of community in the Web environment. A factor analysis was calculated to find the critical factors, and analyses of variance were calculated to see whether the grouping into prerequisites, facilitators and consequences was right and how the background variables would affect sense of community in the Web environment. The results indicated that the psychological structure of sense of community in the Web environment could not be presented with critical variables grouped as prerequisites, facilitators and consequences. Most factors did facilitate sense of community, but based on these data it could not be argued that some of the factors chronologically precede sense of community while others follow it. Instead, the factor analysis revealed that the most critical factors in sense of community in the Web environment are 1) reciprocal involvement, 2) basic trust in others, 3) similarity and common purpose of members, and 4) shared history of members. The most influential background variables were the member's own participation activity (indicated by reading and writing messages) and the phase in the membership life cycle (from visitor to leader). The more the member participated, and the further along in the membership life cycle he was, the stronger his sense of community. There are many descriptions of sense of community, but the present study was one of the first to actually measure the phenomenon in the Web environment and to gain well-documented, valid results based on a large body of data, showing that sense of community in the Web environment is possible and clarifying its psychological structure, thus enhancing the understanding of sense of community in the Web environment. Keywords: sense of community, Web-community, psychology of the Internet
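The factor-analysis step described above can be sketched in a few lines. The sketch below uses scikit-learn's FactorAnalysis on a purely hypothetical respondents-by-items matrix; it only illustrates the kind of computation involved, not the analysis actually performed in the study.

# Illustrative sketch only (hypothetical data, not the study's analysis):
# extract four latent factors from questionnaire responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# hypothetical matrix: 300 respondents x 24 Likert-scale items
responses = rng.integers(1, 6, size=(300, 24)).astype(float)

fa = FactorAnalysis(n_components=4, random_state=0)
scores = fa.fit_transform(responses)   # respondent scores on the 4 factors
loadings = fa.components_              # item loadings for each factor
print(loadings.shape)                  # (4, 24)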
Abstract:
The TCP protocol is used by most Internet applications today, including the recent mobile wireless terminals that use TCP for their World-Wide Web, E-mail and other traffic. Recent wireless network technologies, such as GPRS, are known to cause delay spikes in packet transfer, which result in unnecessary TCP retransmission timeouts. This dissertation proposes a mechanism, Forward RTO-Recovery (F-RTO), for detecting unnecessary TCP retransmission timeouts, thus allowing TCP to take appropriate follow-up actions. We analyze a Linux F-RTO implementation in various network scenarios and investigate different alternatives to the basic algorithm. The second part of this dissertation focuses on quickly adapting TCP's transmission rate when the underlying link characteristics change suddenly, for example due to vertical hand-offs between GPRS and WLAN wireless technologies. We investigate the Quick-Start algorithm that, in collaboration with the network routers, aims to quickly probe the available bandwidth on a network path and to allow TCP's congestion control algorithms to use that information. Through extensive simulations we study the different router algorithms and parameters for Quick-Start, and discuss the challenges Quick-Start faces in the current Internet. We also study the performance of Quick-Start when applied to vertical hand-offs between different wireless link technologies.
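A much-simplified sketch of the decision logic behind F-RTO (specified in RFC 4138 and later RFC 5682) is given below; the function and its bookkeeping are illustrative only and do not reproduce the Linux implementation analysed in the dissertation. The idea is that after a retransmission timeout the sender retransmits the first unacknowledged segment and then lets the next two acknowledgements decide whether the timeout was spurious.

# Illustrative sketch of the F-RTO decision logic (cf. RFC 4138/5682);
# real TCP stacks keep far more state than this.
def frto_after_timeout(first_ack, second_ack, snd_una):
    """first_ack, second_ack: cumulative ACK numbers of the first two
    acknowledgements arriving after the RTO; snd_una: highest sequence
    number acknowledged before the timeout."""
    # Step 1: the first unacknowledged segment has already been retransmitted.
    if first_ack <= snd_una:
        # Duplicate ACK: the original segment really was lost.
        return "conventional RTO recovery (retransmit unacknowledged data)"
    # Step 2: the window advanced, so two new (previously unsent) segments
    # are transmitted instead of continuing retransmissions.
    snd_una = first_ack
    if second_ack > snd_una:
        # Both ACKs advanced the window: the timeout is judged spurious.
        return "declare RTO spurious; continue sending new data"
    # The second ACK was a duplicate: at least one segment was genuinely lost.
    return "conventional RTO recovery (retransmit unacknowledged data)"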
Abstract:
Electronic document management (EDM) technology has the potential to enhance information management in construction projects considerably, without radical changes to current practice. Over the past fifteen years this topic has been overshadowed by building product modelling in the construction IT research world, but at present EDM is quickly being introduced in practice, in particular in bigger projects. Often this is done in the form of third-party services available over the World Wide Web. In the paper, a typology of research questions and methods is presented, which can be used to position the individual research efforts surveyed in the paper. Questions dealt with include: What features should EDM systems have? How much are they used? Are there benefits from use, and how should these be measured? What are the barriers to widespread adoption? Which technical questions need to be solved? Is there scope for standardisation? How will the market for such systems evolve?
Abstract:
The TOTEM experiment at the LHC will measure the total proton-proton cross-section with a precision better than 1%, elastic proton scattering over a wide range in momentum transfer -t = p^2 theta^2 up to 10 GeV^2, and diffractive dissociation, including single, double and central diffraction topologies. The total cross-section will be measured with the luminosity-independent method, which requires the simultaneous measurement of the total inelastic rate and of elastic proton scattering down to four-momentum transfers of a few 10^-3 GeV^2, corresponding to leading protons scattered at angles of microradians from the interaction point. This will be achieved using silicon microstrip detectors, which offer attractive properties such as good spatial resolution (< 20 um), fast response (O(10 ns)) to particles and radiation hardness up to 10^14 "n"/cm^2. This work reports on the development of an innovative structure at the detector edge that reduces the conventional dead width of 0.5-1 mm to 50-60 um, compatible with the requirements of the experiment.
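For reference, the luminosity-independent method mentioned above combines the optical theorem with the simultaneously measured elastic and inelastic rates; in standard notation (with rho the ratio of the real to the imaginary part of the forward elastic amplitude) it can be written as follows. These are the textbook expressions for the method, not formulas quoted from the abstract itself.

% Luminosity-independent total cross-section and luminosity
% (standard notation: N_el, N_inel = elastic and inelastic rates)
\sigma_{\mathrm{tot}} = \frac{16\pi}{1+\rho^{2}}\,
  \frac{\left.\mathrm{d}N_{\mathrm{el}}/\mathrm{d}t\right|_{t=0}}{N_{\mathrm{el}}+N_{\mathrm{inel}}},
\qquad
\mathcal{L} = \frac{1+\rho^{2}}{16\pi}\,
  \frac{\left(N_{\mathrm{el}}+N_{\mathrm{inel}}\right)^{2}}{\left.\mathrm{d}N_{\mathrm{el}}/\mathrm{d}t\right|_{t=0}}.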
Abstract:
The World Wide Web provides the opportunity for a radically changed and much more efficient communication process for scientific results. A survey in the closely related domains of construction information technology and construction management was conducted in February 2000, aimed at measuring to what extent these opportunities are already changing scientific information exchange and how researchers feel about the changes. The paper presents the results based on 236 replies to an extensive Web-based questionnaire. 65% of the respondents stated their primary research interest as IT in A/E/C and 20% as construction management and economics. The questions dealt with how researchers find, access and read different sources; how much and which publications they read; how often and to which conferences they travel; how much they publish, and what the criteria are for where they eventually decide to publish. Some of the questions contrasted traditional and electronic publishing, with one final section dedicated to opinions about electronic publishing. According to the survey, researchers already download from the Web half of the material that they read, in digital form. The most popular method for retrieving an interesting publication is downloading it for free from the author's or publisher's website. Researchers are not particularly willing to pay for electronic scientific publications. There is much support for a scenario of electronic journals available entirely free on the Web, where the costs could be covered by, for instance, professional societies or the publishing university. The shift that the Web is causing seems to be towards "just in time" reading of literature. Also, frequent users of the Web rely less on scientific publications and tend to read fewer articles. If available with little effort, papers published in traditional journals are preferred; if not, the papers should be on the Web. In these circumstances, the role of paper-based journals published by established publishers is shifting from the core "information exchange" to the building of authors' prestige. The respondents feel they should build up their reputations by publishing in journals and relevant conferences, but then make their work freely available on the Web.
Abstract:
During the past ten years, large-scale transcript analysis using microarrays has become a powerful tool to identify and predict functions for new genes. It allows simultaneous monitoring of the expression of thousands of genes and has become a routinely used tool in laboratories worldwide. Microarray analysis will, together with other functional genomics tools, take us closer to understanding the functions of all genes in the genomes of living organisms. Flower development is a genetically regulated process which has mostly been studied in the traditional model species Arabidopsis thaliana, Antirrhinum majus and Petunia hybrida. The molecular mechanisms behind flower development in these species are partly applicable to other plant systems. However, not all biological phenomena can be approached with just a few model systems. In order to understand and apply the knowledge to ecologically and economically important plants, other species also need to be studied. Sequencing of 17,000 ESTs from nine different cDNA libraries of the ornamental plant Gerbera hybrida made it possible to construct a cDNA microarray with 9,000 probes. The probes of the microarray represent all the different ESTs in the database. Of the gerbera ESTs, 20% were unique to gerbera, while 373 were specific to the Asteraceae family of flowering plants. Gerbera has composite inflorescences with three different types of flowers that vary from each other morphologically. The marginal ray flowers are large, often pigmented and female, while the central disc flowers are smaller, more radially symmetrical perfect flowers. Intermediate trans flowers are similar to ray flowers but smaller in size. This feature, together with the molecular tools applied to gerbera, makes gerbera a unique system in comparison to the common model plants, which have only a single kind of flower in their inflorescences. In the first part of this thesis, conditions for gerbera microarray analysis were optimised, including experimental design, sample preparation and hybridization, as well as data analysis and verification. Moreover, in the first study, flower and flower organ-specific genes were identified. After the reliability and reproducibility of the method were confirmed, the microarrays were utilized to investigate transcriptional differences between ray and disc flowers. This study revealed novel information about the morphological development as well as the transcriptional regulation of the early stages of development in the various flower types of gerbera. The most interesting finding was the differential expression of MADS-box genes, suggesting the existence of flower type-specific regulatory complexes in the specification of different types of flowers. The gerbera microarray was further used to profile changes in expression during petal development. Gerbera ray flower petals are large, which makes them an ideal model to study organogenesis. Six different stages were compared and specifically analysed. Expression profiles of genes related to cell structure and growth implied that during stage 2 cells divide, a process marked by the expression of histones, cyclins and tubulins. Stage 4 was found to be a transition stage between cell division and expansion, and by stage 6 cells had stopped dividing and instead underwent expansion. Interestingly, at the last analysed stage, stage 9, when cells no longer grew, the highest number of upregulated genes was detected.
The gerbera microarray is a fully functioning tool for large-scale studies of flower development, and correlation with real-time RT-PCR results shows that it is also highly sensitive and reliable. The gene expression data presented here will be a source for gene expression mining and marker gene discovery in future studies performed in the Gerbera Laboratory. The publicly available data will also serve the plant research community worldwide.
Abstract:
The Transition Radiation Tracker (TRT) of the ATLAS experiment at the LHC is part of the Inner Detector. It is designed as a robust and powerful gaseous detector that provides tracking through individual drift tubes (straws) as well as particle identification via transition radiation (TR) detection. The straw tubes are operated with Xe-CO2-O2 70/27/3, a gas that combines the advantages of efficient TR absorption, a short electron drift time and minimal ageing effects. The modules of the barrel part of the TRT were built in the United States, while the end-cap wheels are assembled at two Russian institutes. Acceptance tests of barrel modules and end-cap wheels are performed at CERN before assembly and integration with the Semiconductor Tracker (SCT) and the Pixel Detector. This thesis first describes simulations of the TRT straw tube. The argon-based acceptance gas mixture as well as two xenon-based operating gases are examined for their properties. Drift velocities and Townsend coefficients are computed with the help of the program Magboltz and used to study electron drift and multiplication in the straw using the software Garfield. The inclusion of Penning transfers in the avalanche process leads to remarkable agreement with experimental data. A high level of cleanliness in the TRT's acceptance-test gas system is indispensable. To monitor gas purity, a small straw tube detector has been constructed and extensively used to study the ageing behaviour of the straw tube in Ar-CO2. A variety of ageing tests are presented and discussed. Acceptance tests for the TRT cover dimensions, wire tension, gas-tightness, high-voltage stability and gas gain uniformity along each individual straw. The thesis gives details on acceptance criteria and measurement methods in the case of the end-cap wheels. Special focus is put on wire tension and straw straightness. The effect of geometrically deformed straws on gas gain and energy resolution is examined in an experimental setup and compared to simulation studies. An overview of the most important results from the end-cap wheels tested up to this point is presented.
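The gas gain behaviour examined in the simulations and acceptance tests above follows from the first Townsend coefficient; for a cylindrical straw the standard relation (with a and b the wire and straw radii and V_0 the applied voltage) reads as follows. This is the textbook expression, not a formula quoted from the thesis.

% Gas gain from the first Townsend coefficient alpha(E) in a straw tube
% (a, b = wire and straw radii; V_0 = applied voltage; E(r) = radial field)
G = \exp\!\left(\int_{a}^{b} \alpha\bigl(E(r)\bigr)\,\mathrm{d}r\right),
\qquad
E(r) = \frac{V_{0}}{r\,\ln(b/a)}.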
Abstract:
Introduction. We estimate the total yearly volume of peer-reviewed scientific journal articles published worldwide as well as the share of these articles available openly on the Web, either directly or as copies in e-print repositories. Method. We rely on data from two commercial databases (ISI and Ulrich's Periodicals Directory) supplemented by sampling and Google searches. Analysis. A central issue is the finding that ISI-indexed journals publish far more articles per year (111) than non-ISI-indexed journals (26), which means that the total figure we obtain is much lower than many earlier estimates. Our method of analysing the number of repository copies (green open access) differs from several earlier studies, which have counted the copies in identified repositories, since we start from a random sample of articles and then test whether copies can be found by a Web search engine. Results. We estimate that in 2006 the total number of articles published was approximately 1,350,000. Of this number, 4.6% became immediately openly available and an additional 3.5% after an embargo period of, typically, one year. Furthermore, usable copies of 11.3% could be found in subject-specific or institutional repositories or on the home pages of the authors. Conclusions. We believe our results are the most reliable so far published and, therefore, should be useful in the ongoing debate about Open Access among both academics and science policy makers. The method is replicable and also lends itself to longitudinal studies in the future.
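The shares reported above add up to roughly one article in five being openly accessible in some form in 2006; the short Python check below uses only the figures quoted in the abstract.

# Back-of-the-envelope check using the abstract's own figures for 2006.
total_articles = 1_350_000   # estimated articles published in 2006
gold_immediate = 0.046       # immediately open on the publisher's site
gold_delayed   = 0.035       # open after a (typically one-year) embargo
green_copies   = 0.113       # usable copies in repositories or on home pages

open_share = gold_immediate + gold_delayed + green_copies
print(f"Openly available share: {open_share:.1%}")              # about 19.4%
print(f"Roughly {open_share * total_articles:,.0f} articles")   # about 262,000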