25 resultados para Processing wikipedia data
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Because of the increased availability of different kind of business intelligence technologies and tools it can be easy to fall in illusion that new technologies will automatically solve the problems of data management and reporting of the company. The management is not only about management of technology but also the management of processes and people. This thesis is focusing more into traditional data management and performance management of production processes which both can be seen as a requirement for long lasting development. Also some of the operative BI solutions are considered in the ideal state of reporting system. The objectives of this study are to examine what requirements effective performance management of production processes have for data management and reporting of the company and to see how they are effecting on the efficiency of it. The research is executed as a theoretical literary research about the subjects and as a qualitative case study about reporting development project of Finnsugar Ltd. The case study is examined through theoretical frameworks and by the active participant observation. To get a better picture about the ideal state of reporting system simple investment calculations are performed. According to the results of the research, requirements for effective performance management of production processes are automation in the collection of data, integration of operative databases, usage of efficient data management technologies like ETL (Extract, Transform, Load) processes, data warehouse (DW) and Online Analytical Processing (OLAP) and efficient management of processes, data and roles.
Resumo:
The general aim of the thesis was to study university students’ learning from the perspective of regulation of learning and text processing. The data were collected from the two academic disciplines of medical and teacher education, which share the features of highly scheduled study, a multidisciplinary character, a complex relationship between theory and practice and a professional nature. Contemporary information society poses new challenges for learning, as it is not possible to learn all the information needed in a profession during a study programme. Therefore, it is increasingly important to learn how to think and learn independently, how to recognise gaps in and update one’s knowledge and how to deal with the huge amount of constantly changing information. In other words, it is critical to regulate one’s learning and to process text effectively. The thesis comprises five sub-studies that employed cross-sectional, longitudinal and experimental designs and multiple methods, from surveys to eye tracking. Study I examined the connections between students’ study orientations and the ways they regulate their learning. In total, 410 second-, fourth- and sixth-year medical students from two Finnish medical schools participated in the study by completing a questionnaire measuring both general study orientations and regulation strategies. The students were generally deeply oriented towards their studies. However, they regulated their studying externally. Several interesting and theoretically reasonable connections between the variables were found. For instance, self-regulation was positively correlated with deep orientation and achievement orientation and was negatively correlated with non-commitment. However, external regulation was likewise positively correlated with deep orientation and achievement orientation but also with surface orientation and systematic orientation. It is argued that external regulation might function as an effective coping strategy in the cognitively loaded medical curriculum. Study II focused on medical students’ regulation of learning and their conceptions of the learning environment in an innovative medical course where traditional lectures were combined wth problem-based learning (PBL) group work. First-year medical and dental students (N = 153) completed a questionnaire assessing their regulation strategies of learning and views about the PBL group work. The results indicated that external regulation and self-regulation of the learning content were the most typical regulation strategies among the participants. In line with previous studies, self-regulation wasconnected with study success. Strictly organised PBL sessions were not considered as useful as lectures, although the students’ views of the teacher/tutor and the group were mainly positive. Therefore, developers of teaching methods are challenged to think of new solutions that facilitate reflection of one’s learning and that improve the development of self-regulation. In Study III, a person-centred approach to studying regulation strategies was employed, in contrast to the traditional variable-centred approach used in Study I and Study II. The aim of Study III was to identify different regulation strategy profiles among medical students (N = 162) across time and to examine to what extent these profiles predict study success in preclinical studies. Four regulation strategy profiles were identified, and connections with study success were found. Students with the lowest self-regulation and with an increasing lack of regulation performed worse than the other groups. As the person-centred approach enables us to individualise students with diverse regulation patterns, it could be used in supporting student learning and in facilitating the early diagnosis of learning difficulties. In Study IV, 91 student teachers participated in a pre-test/post-test design where they answered open-ended questions about a complex science concept both before and after reading either a traditional, expository science text or a refutational text that prompted the reader to change his/her beliefs according to scientific beliefs about the phenomenon. The student teachers completed a questionnaire concerning their regulation and processing strategies. The results showed that the students’ understanding improved after text reading intervention and that refutational text promoted understanding better than the traditional text. Additionally, regulation and processing strategies were found to be connected with understanding the science phenomenon. A weak trend showed that weaker learners would benefit more from the refutational text. It seems that learners with effective learning strategies are able to pick out the relevant content regardless of the text type, whereas weaker learners might benefit from refutational parts that contrast the most typical misconceptions with scientific views. The purpose of Study V was to use eye tracking to determine how third-year medical studets (n = 39) and internal medicine residents (n = 13) read and solve patient case texts. The results revealed differences between medical students and residents in processing patient case texts; compared to the students, the residents were more accurate in their diagnoses and processed the texts significantly faster and with a lower number of fixations. Different reading patterns were also found. The observed differences between medical students and residents in processing patient case texts could be used in medical education to model expert reasoning and to teach how a good medical text should be constructed. The main findings of the thesis indicate that even among very selected student populations, such as high-achieving medical students or student teachers, there seems to be a lot of variation in regulation strategies of learning and text processing. As these learning strategies are related to successful studying, students enter educational programmes with rather different chances of managing and achieving success. Further, the ways of engaging in learning seldom centre on a single strategy or approach; rather, students seem to combine several strategies to a certain degree. Sometimes, it can be a matter of perspective of which way of learning can be considered best; therefore, the reality of studying in higher education is often more complicated than the simplistic view of self-regulation as a good quality and external regulation as a harmful quality. The beginning of university studies may be stressful for many, as the gap between high school and university studies is huge and those strategies that were adequate during high school might not work as well in higher education. Therefore, it is important to map students’ learning strategies and to encourage them to engage in using high-quality learning strategies from the beginning. Instead of separate courses on learning skills, the integration of these skills into course contents should be considered. Furthermore, learning complex scientific phenomena could be facilitated by paying attention to high-quality learning materials and texts and other support from the learning environment also in the university. Eye tracking seems to have great potential in evaluating performance and growing diagnostic expertise in text processing, although more research using texts as stimulus is needed. Both medical and teacher education programmes and the professions themselves are challenging in terms of their multidisciplinary nature and increasing amounts of information and therefore require good lifelong learning skills during the study period and later in work life.
Resumo:
Kandidaatintyö on toteutettu kirjallisuuskatsauksena, jonka tavoitteena on selvittää data-analytiikan käyttökohteita ja datan hyödyntämisen vaikutusta liiketoimintaan. Työ käsittelee data-analytiikan käyttöä ja datan tehokkaan hyödyntämisen haasteita. Työ on rajattu tarkastelemaan yrityksen talouden ohjausta, jossa analytiikkaa käytetään johdon ja rahoituksen laskentatoimessa. Datan määrän eksponentiaalinen kasvunopeus luo data-analytiikan käytölle uusia haasteita ja mahdollisuuksia. Datalla itsessään ei kuitenkaan ole suurta arvoa yritykselle, vaan arvo syntyy prosessoinnin kautta. Vaikka data-analytiikkaa tutkitaan ja käytetään jo runsaasti, se tarjoaa paljon nykyisiä sovelluksia suurempia mahdollisuuksia. Yksi työn keskeisimmistä tuloksista on, että data-analytiikalla voidaan tehostaa johdon laskentatoimea ja helpottaa rahoituksen laskentatoimen tehtäviä. Tarjolla olevan datan määrä kasvaa kuitenkin niin nopeasti, että käytettävissä oleva teknologia ja osaamisen taso eivät pysy kehityksessä mukana. Varsinkin big datan laajempi käyttöönotto ja sen tehokas hyödyntäminen vaikuttavat jatkossa talouden ohjauksen käytäntöihin ja sovelluksiin yhä enemmän.
Resumo:
Diplomityö tarkastelee säikeistettyä ohjelmointia rinnakkaisohjelmoinnin ylemmällä hierarkiatasolla tarkastellen erityisesti hypersäikeistysteknologiaa. Työssä tarkastellaan hypersäikeistyksen hyviä ja huonoja puolia sekä sen vaikutuksia rinnakkaisalgoritmeihin. Työn tavoitteena oli ymmärtää Intel Pentium 4 prosessorin hypersäikeistyksen toteutus ja mahdollistaa sen hyödyntäminen, missä se tuo suorituskyvyllistä etua. Työssä kerättiin ja analysoitiin suorituskykytietoa ajamalla suuri joukko suorituskykytestejä eri olosuhteissa (muistin käsittely, kääntäjän asetukset, ympäristömuuttujat...). Työssä tarkasteltiin kahdentyyppisiä algoritmeja: matriisioperaatioita ja lajittelua. Näissä sovelluksissa on säännöllinen muistinkäyttökuvio, mikä on kaksiteräinen miekka. Se on etu aritmeettis-loogisissa prosessoinnissa, mutta toisaalta huonontaa muistin suorituskykyä. Syynä siihen on nykyaikaisten prosessorien erittäin hyvä raaka suorituskyky säännöllistä dataa käsiteltäessä, mutta muistiarkkitehtuuria rajoittaa välimuistien koko ja useat puskurit. Kun ongelman koko ylittää tietyn rajan, todellinen suorituskyky voi pudota murto-osaan huippusuorituskyvystä.
Resumo:
Coastal birds are an integral part of coastal ecosystems, which nowadays are subject to severe environmental pressures. Effective measures for the management and conservation of seabirds and their habitats call for insight into their population processes and the factors affecting their distribution and abundance. Central to national and international management and conservation measures is the availability of accurate data and information on bird populations, as well as on environmental trends and on measures taken to solve environmental problems. In this thesis I address different aspects of the occurrence, abundance, population trends and breeding success of waterbirds breeding on the Finnish coast of the Baltic Sea, and discuss the implications of the results for seabird monitoring, management and conservation. In addition, I assess the position and prospects of coastal bird monitoring data, in the processing and dissemination of biodiversity data and information in accordance with the Convention on Biological Diversity (CBD) and other national and international commitments. I show that important factors for seabird habitat selection are island area and elevation, water depth, shore openness, and the composition of island cover habitats. Habitat preferences are species-specific, with certain similarities within species groups. The occurrence of the colonial Arctic Tern (Sterna paradisaea) is partly affected by different habitat characteristics than its abundance. Using long-term bird monitoring data, I show that eutrophication and winter severity have reduced the populations of several Finnish seabird species. A major demographic factor through which environmental changes influence bird populations is breeding success. Breeding success can function as a more rapid indicator of sublethal environmental impacts than population trends, particularly for long-lived and slowbreeding species, and should therefore be included in coastal bird monitoring schemes. Among my target species, local breeding success can be shown to affect the populations of the Mallard (Anas platyrhynchos), the Eider (Somateria mollissima) and the Goosander (Mergus merganser) after a time lag corresponding to their species-specific recruitment age. For some of the target species, the number of individuals in late summer can be used as an easier and more cost-effective indicator of breeding success than brood counts. My results highlight that the interpretation and application of habitat and population studies require solid background knowledge of the ecology of the target species. In addition, the special characteristics of coastal birds, their habitats, and coastal bird monitoring data have to be considered in the assessment of their distribution and population trends. According to the results, the relationships between the occurrence, abundance and population trends of coastal birds and environmental factors can be quantitatively assessed using multivariate modelling and model selection. Spatial data sets widely available in Finland can be utilised in the calculation of several variables that are relevant to the habitat selection of Finnish coastal species. Concerning some habitat characteristics field work is still required, due to a lack of remotely sensed data or the low resolution of readily available data in relation to the fine scale of the habitat patches in the archipelago. While long-term data sets exist for water quality and weather, the lack of data concerning for instance the food resources of birds hampers more detailed studies of environmental effects on bird populations. Intensive studies of coastal bird species in different archipelago areas should be encouraged. The provision and free delivery of high-quality coastal data concerning bird populations and their habitats would greatly increase the capability of ecological modelling, as well as the management and conservation of coastal environments and communities. International initiatives that promote open spatial data infrastructures and sharing are therefore highly regarded. To function effectively, international information networks, such as the biodiversity Clearing House Mechanism (CHM) under the CBD, need to be rooted at regional and local levels. Attention should also be paid to the processing of data for higher levels of the information hierarchy, so that data are synthesized and developed into high-quality knowledge applicable to management and conservation.
Resumo:
Data management consists of collecting, storing, and processing the data into the format which provides value-adding information for decision-making process. The development of data management has enabled of designing increasingly effective database management systems to support business needs. Therefore as well as advanced systems are designed for reporting purposes, also operational systems allow reporting and data analyzing. The used research method in the theory part is qualitative research and the research type in the empirical part is case study. Objective of this paper is to examine database management system requirements from reporting managements and data managements perspectives. In the theory part these requirements are identified and the appropriateness of the relational data model is evaluated. In addition key performance indicators applied to the operational monitoring of production are studied. The study has revealed that the appropriate operational key performance indicators of production takes into account time, quality, flexibility and cost aspects. Especially manufacturing efficiency has been highlighted. In this paper, reporting management is defined as a continuous monitoring of given performance measures. According to the literature review, the data management tool should cover performance, usability, reliability, scalability, and data privacy aspects in order to fulfill reporting managements demands. A framework is created for the system development phase based on requirements, and is used in the empirical part of the thesis where such a system is designed and created for reporting management purposes for a company which operates in the manufacturing industry. Relational data modeling and database architectures are utilized when the system is built for relational database platform.
Resumo:
Energy scenarios are used as a tool to examine credible future states and pathways. The one who constructs a scenario defines the framework in which the possible outcomes exist. The credibility of a scenario depends on its compatibility with real world experiences, and on how well the general information of the study, methodology, and originality and processing of data are disclosed. In the thesis, selected global energy scenarios’ transparency and desirability from the society’s point of view were evaluated based on literature derived criteria. The global energy transition consists of changes to social conventions and economic development in addition to technological development. Energy solutions are economic and ethical choices due to far-reaching impacts of energy decision-making. Currently the global energy system is mostly based on fossil fuels, which is unsustainable over the long-term due to various reasons: negative climate change impacts, negative health impacts, depletion of fossil fuel reserves, resource-use conflicts with water management and food supply, loss of biodiversity, challenge to preserve ecosystems and resources for future generations, and inability of fossil fuels to provide universal access to modern energy services. Nuclear power and carbon capture and storage cannot be regarded as sustainable energy solutions due to their inherent risks and required long-term storage. The energy transition is driven by a growing energy demand, decreasing costs of renewables, modularity and scalability of renewable technologies, macroeconomic benefits of using renewables, investors’ risk awareness, renewable energy related attractive business opportunities, almost even distribution of solar and wind resources on the planet, growing awareness of the planet’s environmental status, environmental movements and tougher environmental legislation. Many of the investigated scenarios identified solar and wind power as a backbone for future energy systems. The scenarios, in which the solar and wind potentials were deployed in largest scale, met best the set out sustainability criteria. In future research, energy scenarios’ transparency can be improved by better disclosure on who has ordered the study, clarifying the funding, clearly referencing to used sources and indicating processed data, and by exploring how variations in cost assumptions and deployment of technologies influence on the outcomes of the study.
Resumo:
The Solar Intensity X-ray and particle Spectrometer (SIXS) on board BepiColombo's Mercury Planetary Orbiter (MPO) will study solar energetic particles moving towards Mercury and solar X-rays on the dayside of Mercury. The SIXS instrument consists of two detector sub-systems; X-ray detector SIXS-X and particle detector SIXS-P. The SIXS-P subdetector will detect solar energetic electrons and protons in a broad energy range using a particle telescope approach with five outer Si detectors around a central CsI(Tl) scintillator. The measurements made by the SIXS instrument are necessary for other instruments on board the spacecraft. SIXS data will be used to study the Solar X-ray corona, solar flares, solar energetic particles, the Hermean magnetosphere, and solar eruptions. The SIXS-P detector was calibrated by comparing experimental measurement data from the instrument with Geant4 simulation data. Calibration curves were produced for the different side detectors and the core scintillator for electrons and protons, respectively. The side detector energy response was found to be linear for both electrons and protons. The core scintillator energy response to protons was found to be non-linear. The core scintillator calibration for electrons was omitted due to insufficient experimental data. The electron and proton acceptance of the SIXS-P detector was determined with Geant4 simulations. Electron and proton energy channels are clean in the main energy range of the instrument. At higher energies, protons and electrons produce non-ideal response in the energy channels. Due to the limited bandwidth of the spacecraft's telemetry, the particle measurements made by SIXS-P have to be pre-processed in the data processing unit of the SIXS instrument. A lookup table was created for the pre-processing of data with Geant4 simulations, and the ability of the lookup table to provide spectral information from a simulated electron event was analysed. The lookup table produces clean electron and proton channels and is able to separate protons and electrons. Based on a simulated solar energetic electron event, the incident electron spectrum cannot be determined from channel particle counts with a standard analysis method.
Resumo:
Due to the sensitive nature of patient data, the secondary use of electronic health records (EHR) is restricted in scientific research and product development. Such restrictions pursue to preserve the privacy of respective patients by limiting the availability and variety of sensitive patient data. Current limitations do not correspond with the actual needs requested by the potential secondary users. In this thesis, the secondary use of Finnish and Swedish EHR data is explored for the purpose of enhancing the availability of such data for clinical research and product development. Involved EHR-related procedures and technologies are analysed to identify the issues limiting the secondary use of patient data. Successful secondary use of patient data increases the data value. To explore the identified circumstances, a case study of potential secondary users and use intentions regarding EHR data was carried out in Finland and Sweden. The data collection for the conducted case study was performed using semi-structured interviews. In total, 14 Finnish and Swedish experts representing scientific research, health management, and business were interviewed. The motivation for the corresponding interviews was to evaluate the protection of EHR data used for secondary purposes. The efficiency of implemented procedures and technologies was analysed in terms of data availability and privacy preserving. The results of the conducted case study show that the factors affecting EHR availability are divided to three categories: management of patient data, preservation of patients' privacy, and potential secondary users. Identified issues regarding data management included laborious and inconsistent data request procedures and the role and effect of external service providers. Based on the study findings, two secondary use approaches enabling the secondary use of EHR data are identified: data alteration and protected processing environment. Data alteration increases the availability of relevant EHR data, further decreasing the value of such data. Protected processing approach restricts the amount of potential users and use intentions while providing more valuable data content.
Resumo:
The main objective of this study was todo a statistical analysis of ecological type from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses "the Relevence Vector Machine" algorithm in ecological classification betweenforestland and wetland. Further this bi-classification technique was used to do classification of many other different species of trees and produces hierarchical classification of entire subclasses given as a target class. Also, we carried out an attempt to use airborne image of same forest area. Combining it with image analysis, using different image processing operation, we tried to extract good features and later used them to perform classification of forestland and wetland.
Resumo:
Operatiivisen tiedon tuottaminen loppukäyttäjille analyyttistä tarkastelua silmällä pitäen aiheuttaa ongelmia useille yrityksille. Diplomityö pyrkii ratkaisemaan ko. ongelman Teleste Oyj:ssä. Työ on jaettu kolmeen pääkappaleeseen. Kappale 2 selkiyttää On-Line Analytical Processing (OLAP)- käsitteen. Kappale 3 esittelee muutamia OLAP-tuotteiden valmistajia ja heidän arkkitehtuurejaan sekä tyypillisten sovellusalueiden lisäksi huomioon otettavia asioita OLAP käyttöönoton yhteydessä. Kappale 4, tuo esille varsinaisen ratkaisun. Teknisellä arkkitehtuurilla on merkittävä asema ratkaisun rakenteen kannalta. Tässä on sovellettu Microsoft:n tietovarasto kehysrakennetta. Kappaleen 4 edetessä, tapahtumakäsittelytieto muutetaan informaatioksi ja edelleen loppukäyttäjien tiedoksi. Loppukäyttäjät varustetaan tehokkaalla ja tosiaikaisella analysointityökalulla moniulotteisessa ympäristössä. Vaikka kiertonopeus otetaan työssä sovellusesimerkiksi, työ ei pyri löytämään optimaalista tasoa Telesten varastoille. Siitä huolimatta eräitä parannusehdotuksia mainitaan.
Resumo:
The globalization and development of an information society promptly change shape of the modern world. Cities and especially megacities including Saint-Petersburg are in the center of occuring changes. As a result of these changes the economic activities connected to reception and processing of the information now play very important role in economy of megacities what allows to characterize them as "information". Despite of wide experience in decision of information questions Russia, and in particular Saint-Petersburg, lag behind in development of information systems from the advanced European countries. The given master's thesis is devoted to development of an information system (data transmission network) on the basis of wireless technology in territory of Saint-Petersburg region within the framework of FTOP "Electronic Russia" and RTOP "Electronic Saint-Petersburg" programs. Logically the master's thesis can be divided into 3 parts: 1. The problems, purposes, expected results, terms and implementation of the "Electronic Russia" program. 2. Discussion about wireless data transmission networks (description of technology, substantiation of choice, description of signal's transmission techniques and types of network topology). 3. Fulfillment of the network (organization of central network node, regional centers, access lines, description of used equipment, network's capabilities), financial provision of the project, possible network management models.
Resumo:
Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.
Resumo:
The objectives of this research work “Identification of the Emerging Issues in Recycled Fiber processing” are discovering of emerging research issues and presenting of new approaches to identify promising research themes in recovered paper application and production. The projected approach consists of identifying technological problems often encountered in wastepaper preparation processes and also improving the quality of recovered paper and increasing its proportion in the composition of paper and board. The source of information for the problem retrieval is scientific publications in which waste paper application and production were discussed. The study has exploited several research methods to understand the changes related to utilization of recovered paper. The all assembled data was carefully studied and categorized by applying software called RefViz and CiteSpace. Suggestions were made on the various classes of these problems that need further investigation in order to propose an emerging research trends in recovered paper.