70 resultados para Data anonymization and sanitization
Resumo:
Acid sulfate (a.s.) soils constitute a major environmental issue. Severe ecological damage results from the considerable amounts of acidity and metals leached by these soils in the recipient watercourses. As even small hot spots may affect large areas of coastal waters, mapping represents a fundamental step in the management and mitigation of a.s. soil environmental risks (i.e. to target strategic areas). Traditional mapping in the field is time-consuming and therefore expensive. Additional more cost-effective techniques have, thus, to be developed in order to narrow down and define in detail the areas of interest. The primary aim of this thesis was to assess different spatial modeling techniques for a.s. soil mapping, and the characterization of soil properties relevant for a.s. soil environmental risk management, using all available data: soil and water samples, as well as datalayers (e.g. geological and geophysical). Different spatial modeling techniques were applied at catchment or regional scale. Two artificial neural networks were assessed on the Sirppujoki River catchment (c. 440 km2) located in southwestern Finland, while fuzzy logic was assessed on several areas along the Finnish coast. Quaternary geology, aerogeophysics and slope data (derived from a digital elevation model) were utilized as evidential datalayers. The methods also required the use of point datasets (i.e. soil profiles corresponding to known a.s. or non-a.s. soil occurrences) for training and/or validation within the modeling processes. Applying these methods, various maps were generated: probability maps for a.s. soil occurrence, as well as predictive maps for different soil properties (sulfur content, organic matter content and critical sulfide depth). The two assessed artificial neural networks (ANNs) demonstrated good classification abilities for a.s. soil probability mapping at catchment scale. Slightly better results were achieved using a Radial Basis Function (RBF) -based ANN than a Radial Basis Functional Link Net (RBFLN) method, narrowing down more accurately the most probable areas for a.s. soil occurrence and defining more properly the least probable areas. The RBF-based ANN also demonstrated promising results for the characterization of different soil properties in the most probable a.s. soil areas at catchment scale. Since a.s. soil areas constitute highly productive lands for agricultural purpose, the combination of a probability map with more specific soil property predictive maps offers a valuable toolset to more precisely target strategic areas for subsequent environmental risk management. Notably, the use of laser scanning (i.e. Light Detection And Ranging, LiDAR) data enabled a more precise definition of a.s. soil probability areas, as well as the soil property modeling classes for sulfur content and the critical sulfide depth. Given suitable training/validation points, ANNs can be trained to yield a more precise modeling of the occurrence of a.s. soils and their properties. By contrast, fuzzy logic represents a simple, fast and objective alternative to carry out preliminary surveys, at catchment or regional scale, in areas offering a limited amount of data. This method enables delimiting and prioritizing the most probable areas for a.s soil occurrence, which can be particularly useful in the field. Being easily transferable from area to area, fuzzy logic modeling can be carried out at regional scale. Mapping at this scale would be extremely time-consuming through manual assessment. The use of spatial modeling techniques enables the creation of valid and comparable maps, which represents an important development within the a.s. soil mapping process. The a.s. soil mapping was also assessed using water chemistry data for 24 different catchments along the Finnish coast (in all, covering c. 21,300 km2) which were mapped with different methods (i.e. conventional mapping, fuzzy logic and an artificial neural network). Two a.s. soil related indicators measured in the river water (sulfate content and sulfate/chloride ratio) were compared to the extent of the most probable areas for a.s. soils in the surveyed catchments. High sulfate contents and sulfate/chloride ratios measured in most of the rivers demonstrated the presence of a.s. soils in the corresponding catchments. The calculated extent of the most probable a.s. soil areas is supported by independent data on water chemistry, suggesting that the a.s. soil probability maps created with different methods are reliable and comparable.
Resumo:
Human activity recognition in everyday environments is a critical, but challenging task in Ambient Intelligence applications to achieve proper Ambient Assisted Living, and key challenges still remain to be dealt with to realize robust methods. One of the major limitations of the Ambient Intelligence systems today is the lack of semantic models of those activities on the environment, so that the system can recognize the speci c activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics to learn, infer and monitor human behaviours in Smart Spaces. Moreover, it is easy to recognize that some aspects of this problem have a high degree of uncertainty, and therefore, the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented to allow modelling and recognition of a set of complex real-life scenarios where vagueness and uncertainty are inherent to the human nature of the users that perform it. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and activity execution variations, since human behaviour is non-deterministic) is approached for the rst time through a fuzzy ontology validated on real-time settings within a hybrid data-driven and knowledgebased architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The rst module detects sub-activities (i.e., actions or basic activities) that take input data directly from a depth sensor (Kinect). The main contribution of this thesis tackles the second component of the hybrid system, which lays on top of the previous one, in a superior level of abstraction, and acquires the input data from the rst module's output, and executes ontological inference to provide users, activities and their in uence in the environment, with semantics. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Since activity recognition requires context-awareness and the ability to discriminate among activities in di erent environments, the semantic framework allows for modelling common-sense knowledge in the form of a rule-based system that supports expressions close to natural language in the form of fuzzy linguistic labels. The framework advantages have been evaluated with a challenging and new public dataset, CAD-120, achieving an accuracy of 90.1% and 91.1% respectively for low and high-level activities. This entails an improvement over both, entirely data-driven approaches, and merely ontology-based approaches. As an added value, for the system to be su ciently simple and exible to be managed by non-expert users, and thus, facilitate the transfer of research to industry, a development framework composed by a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and con gure human behaviour in Smart Spaces, were developed in order to provide the framework with more usability in the nal application. As a result, human behaviour recognition can help assisting people with special needs such as in healthcare, independent elderly living, in remote rehabilitation monitoring, industrial process guideline control, and many other cases. This thesis shows use cases in these areas.
Resumo:
Sales and operations research publications have increased significantly in the last decades. The concept of sales and operations planning (S&OP) has gained increased recognition and has been put forward as the area within Supply Chain Management (SCM). Development of S&OP is based on the need for determining future actions, both for sales and operations, since off-shoring, outsourcing, complex supply chains and extended lead times make challenges for responding to changes in the marketplace when they occur. Order intake of the case company has grown rapidly during the last years. Along with the growth, new challenges considering data management and information flow have arisen due to increasing customer orders. To manage these challenges, case company has implemented S&OP process, though initial process is in early stage and due to this, the process is not managing the increased customer orders adequately. Thesis objective is to explore extensively the S&OP process content of the case company and give further recommendations. Objectives are categorized into six different groups, to clarify the purpose of this thesis. Qualitative research methods used are active participant observation, qualitative interviews, enquiry, education, and a workshop. It is notable that demand planning was felt as cumbersome, so it is typically the biggest challenge in S&OP process. More proactive the sales forecasting can be, more expanded the time horizon of operational planning will turn out. S&OP process is 60 percent change management, 30 percent process development and 10 percent technology. The change management and continuous improvement can sometimes be arduous and set as secondary. It is important that different people are required to improve the process and the process is constantly evaluated. As well as, process governance is substantially in a central role and it has to be managed consciously. Generally, S&OP process was seen important and all the stakeholders were committed to the process. Particular sections were experienced more important than others, depending on the stakeholders’ point of views. Recommendations to objective groups are evaluated by the achievable benefit and resource requirement. The urgent and easily implemented improvement recommendations should be executed firstly. Next steps are to develop more coherent process structure and refine cost awareness. Afterwards demand planning, supply planning, and reporting should be developed more profoundly. For last, information technology system should be implemented to support the process phases.
Resumo:
The emerging technologies have recently challenged the libraries to reconsider their role as a mere mediator between the collections, researchers, and wider audiences (Sula, 2013), and libraries, especially the nationwide institutions like national libraries, haven’t always managed to face the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners to interplay and work for shared goals and objectives. In this paper, I will be drawing a picture of the crowdsourcing methods that have been established during the project to support both linguistic research and lingual diversity. The National Library of Finland has been executing the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspapers titles in various, and in some cases endangered Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, and it was the genesis and consolidation period of literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The ‘deluge’ of popular literature in the 1920s to 1930s suddenly challenged the lexical orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and in word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced in the language. This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication, and orthographically documented specifics of phonetics. The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation’s Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language societies. Wordlists are available in 17 languages, but without tokenization, lemmatization, and so on. This approach was verified with the scholars, and we consider the wordlists as raw data for linguists. Our data is used for creating the morphological analyzers and online dictionaries at the Helsinki and Tromsø Universities, for instance. In order to reach the targets, we will produce not only the digitized materials but also their development tools for supporting linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with the research of language technology. The mission is to improve the usage and usability of digitized content. During the project, we have advanced methods that will refine the raw data for further use, especially in the linguistic research. How does the library meet the objectives, which appears to be beyond its traditional playground? The written materials from this period are a gold mine, so how could we retrieve these hidden treasures of languages out of the stack that contains more than 200,000 pages of literature in various Uralic languages? The problem is that the machined-encoded text (OCR) contains often too many mistakes to be used as such in research. The mistakes in OCRed texts must be corrected. For enhancing the OCRed texts, the National Library of Finland developed an open-source code OCR editor that enabled the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary to implement, since these rare and peripheral prints did often include already perished characters, which are sadly neglected by the modern OCR software developers, but belong to the historical context of kindred languages and thus are an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing tool application is essentially an editor of Alto XML format. It consists of a back-end for managing users, permissions, and files, communicating through a REST API with a front-end interface—that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes. Could the crowd do this work to support the academic research? The challenge in crowdsourcing lies in its nature. The targets in the traditional crowdsourcing have often been split into several microtasks that do not require any special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the research’s point of view, there is a danger that the needs of linguists are not necessarily met. Also, the remarkable downside is the lack of shared goal or the social affinity. There is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012). Also, there has been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). And on top of that, the downsides of the traditional crowdsourcing become more imminent when you leave the Anglophone world. Our potential crowd is geographically scattered in Russia. This crowd is linguistically heterogeneous, speaking 17 different languages. In many cases languages are close to extinction or longing for language revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced appeasing results for linguists. Thus, one has to identify carefully the potential niches to complete the needed tasks. When using the help of a crowd in a project that is aiming to support both linguistic research and survival of endangered languages, the approach has to be a different one. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools to draw resources, their specific richness in skill is suited for complex tasks with high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation. These communities can correspond to research more precisely (de Boer et al., 2012). Instead of repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to provide qualitative results. In nichesourcing, we hand in such assignments that would precisely fill the gaps in linguistic research. A typical task would be editing and collecting the words in such fields of vocabularies where the researchers do require more information. For instance, there is lack of Hill Mari words and terminology in anatomy. We have digitized the books in medicine, and we could try to track the words related to human organs by assigning the citizen scientists to edit and collect words with the OCR editor. From the nichesourcing’s perspective, it is essential that altruism play a central role when the language communities are involved. In nichesourcing, our goal is to reach a certain level of interplay, where the language communities would benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary, which is made freely available for the public, so the society can benefit, too. This objective of interplay can be understood as an aspiration to support the endangered languages and the maintenance of lingual diversity, but also as a servant of ‘two masters’: research and society.
Resumo:
The whole research of the current Master Thesis project is related to Big Data transfer over Parallel Data Link and my main objective is to assist the Saint-Petersburg National Research University ITMO research team to accomplish this project and apply Green IT methods for the data transfer system. The goal of the team is to transfer Big Data by using parallel data links with SDN Openflow approach. My task as a team member was to compare existing data transfer applications in case to verify which results the highest data transfer speed in which occasions and explain the reasons. In the context of this thesis work a comparison between 5 different utilities was done, which including Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts where developed which consist of creating random binary data to be incompressible to have fair comparison between utilities, execute the Utilities with specified parameters, create log files, results, system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, and hence, the necessity appears to reduce the energy consumption to make them greener. In the context of Green IT approach, our team used Cloud Computing infrastructure called OpenStack. It’s more efficient to allocated specific amount of hardware resources to test different scenarios rather than using the whole resources from our testbed. Testing our implementation with OpenStack infrastructure results that the virtual channel does not consist of any traffic and we can achieve the highest possible throughput. After receiving the final results we are in place to identify which utilities produce faster data transfer in different scenarios with specific TCP parameters and we can use them in real network data links.
Integration of marketing research data in new product development. Case study: Food industry company
Resumo:
The aim of this master’s thesis is to provide a real life example of how marketing research data is used by different functions in the NPD process. In order to achieve this goal, a case study in a company was implemented where gathering, analysis, distribution and synthesis of marketing research data in NPD were studied. The main research question was formulated as follows: How is marketing research data integrated and used by different company functions in the NPD process? The theory part of the master’s thesis was focused on the discussion of the marketing function role in NPD, use of marketing research particularly in the food industry, as well as issues related to the marketing/R&D interface during the NPD process. The empirical part of the master’s thesis was based on qualitative explanatory case study research. Individual in-depth interviews with company representatives, company documents and online research were used for data collection and analyzed through triangulation method. The empirical findings advocate that the most important marketing data sources at the concept generation stage of NPD are: global trends monitoring, retailing audit and consumers insights. These data sets are crucial for establishing the potential of the product on the market and defining the desired features for the new product to be developed. The findings also suggest the example of successful crossfunctional communication during the NPD process with formal and informal communication patterns. General managerial recommendations are given on the integration in NPD of a strategy, process, continuous improvement, and motivated cross-functional product development teams.
Resumo:
This thesis presented the overview of Open Data research area, quantity of evidence and establishes the research evidence based on the Systematic Mapping Study (SMS). There are 621 such publications were identified published between years 2005 and 2014, but only 243 were selected in the review process. This thesis highlights the implications of Open Data principals’ proliferation in the emerging era of the accessibility, reusability and sustainability of data transparency. The findings of mapping study are described in quantitative and qualitative measurement based on the organization affiliation, countries, year of publications, research method, star rating and units of analysis identified. Furthermore, units of analysis were categorized by development lifecycle, linked open data, type of data, technical platforms, organizations, ontology and semantic, adoption and awareness, intermediaries, security and privacy and supply of data which are important component to provide a quality open data applications and services. The results of the mapping study help the organizations (such as academia, government and industries), re-searchers and software developers to understand the existing trend of open data, latest research development and the demand of future research. In addition, the proposed conceptual framework of Open Data research can be adopted and expanded to strengthen and improved current open data applications.
Resumo:
In the new age of information technology, big data has grown to be the prominent phenomena. As information technology evolves, organizations have begun to adopt big data and apply it as a tool throughout their decision-making processes. Research on big data has grown in the past years however mainly from a technical stance and there is a void in business related cases. This thesis fills the gap in the research by addressing big data challenges and failure cases. The Technology-Organization-Environment framework was applied to carry out a literature review on trends in Business Intelligence and Knowledge management information system failures. A review of extant literature was carried out using a collection of leading information system journals. Academic papers and articles on big data, Business Intelligence, Decision Support Systems, and Knowledge Management systems were studied from both failure and success aspects in order to build a model for big data failure. I continue and delineate the contribution of the Information System failure literature as it is the principal dynamics behind technology-organization-environment framework. The gathered literature was then categorised and a failure model was developed from the identified critical failure points. The failure constructs were further categorized, defined, and tabulated into a contextual diagram. The developed model and table were designed to act as comprehensive starting point and as general guidance for academics, CIOs or other system stakeholders to facilitate decision-making in big data adoption process by measuring the effect of technological, organizational, and environmental variables with perceived benefits, dissatisfaction and discontinued use.
Resumo:
In the new age of information technology, big data has grown to be the prominent phenomena. As information technology evolves, organizations have begun to adopt big data and apply it as a tool throughout their decision-making processes. Research on big data has grown in the past years however mainly from a technical stance and there is a void in business related cases. This thesis fills the gap in the research by addressing big data challenges and failure cases. The Technology-Organization-Environment framework was applied to carry out a literature review on trends in Business Intelligence and Knowledge management information system failures. A review of extant literature was carried out using a collection of leading information system journals. Academic papers and articles on big data, Business Intelligence, Decision Support Systems, and Knowledge Management systems were studied from both failure and success aspects in order to build a model for big data failure. I continue and delineate the contribution of the Information System failure literature as it is the principal dynamics behind technology-organization-environment framework. The gathered literature was then categorised and a failure model was developed from the identified critical failure points. The failure constructs were further categorized, defined, and tabulated into a contextual diagram. The developed model and table were designed to act as comprehensive starting point and as general guidance for academics, CIOs or other system stakeholders to facilitate decision-making in big data adoption process by measuring the effect of technological, organizational, and environmental variables with perceived benefits, dissatisfaction and discontinued use.
Resumo:
Outsourcing and offshoring or any combinations of these have not just become a popular phenomenon, but are viewed as one of the most important management strategies due to the new possibilities from globalization. They have been seen as a possibility to save costs and improve customer service. Executing offshoring and offshore outsourcing successfully can be more complex than initially expected. Potential cost savings resulting from of offshoring and offshore outsourcing are often based on lower manufacturing costs. However, these benefits might be conflicted by a more complex supply chain with service level challenges that can respectively increase costs. Therefore analyzing the total cost effects of offshoring and outsourcing is necessary. The aim of this Master´s Thesis was to to construct a total cost model using academic literature to calculate the total costs and analyze the reasonability of offshoring and offshore outsourcing production of a case company compared to insourcing production. The research data was mainly quantitative and collected mainly from the case company past sales and production records. In addition management level interviews from the case company were conducted. The information from these interviews was used for the qualification of the necessary quantitative data and adding supportive information that could not be gathered from the quantitative data. Both data collection and analysis were guided by a theoretical frame of reference that was based on academic literature concerning offshoring and outsourcing, statistical calculation of demand and total costs. The results confirm the theories that offshoring and offshore outsourcing would reduce total costs as both offshoring and offshore outsourcing options result in lower total annual costs than insourcing mainly due to lower manufacturing costs. However, increased demand uncertainty would make the alternative of offshore outsourcing more risky and difficult to manage. Therefore when assessing the overall impact of the alternatives, offshoring is the most preferable option. As the main cost savings in offshore outsourcing came from lower manufacturing costs, more specifically labour costs, the logistics costs in this case company did not have an essential effect in total costs. The management should therefore pay attention initially to manufacturing costs and then logistics costs when choosing the best production sourcing option for the company.