913 resultados para 004 - Informatik (Data processing Computer science)
Resumo:
In this session we'll explore how Microsoft uses data science and machine learning across it's entire business, from Windows and Office, to Skype and XBox. We'll look at how companies across the world use Microsoft technology for empowering their businesses in many different industries. And we'll look at data science technologies you can use yourselves, such as Azure Machine Learning and Power BI. Finally we'll discuss job opportunities for data scientists and tips on how you can be successful!
Resumo:
An emerging consensus in cognitive science views the biological brain as a hierarchically-organized predictive processing system. This is a system in which higher-order regions are continuously attempting to predict the activity of lower-order regions at a variety of (increasingly abstract) spatial and temporal scales. The brain is thus revealed as a hierarchical prediction machine that is constantly engaged in the effort to predict the flow of information originating from the sensory surfaces. Such a view seems to afford a great deal of explanatory leverage when it comes to a broad swathe of seemingly disparate psychological phenomena (e.g., learning, memory, perception, action, emotion, planning, reason, imagination, and conscious experience). In the most positive case, the predictive processing story seems to provide our first glimpse at what a unified (computationally-tractable and neurobiological plausible) account of human psychology might look like. This obviously marks out one reason why such models should be the focus of current empirical and theoretical attention. Another reason, however, is rooted in the potential of such models to advance the current state-of-the-art in machine intelligence and machine learning. Interestingly, the vision of the brain as a hierarchical prediction machine is one that establishes contact with work that goes under the heading of 'deep learning'. Deep learning systems thus often attempt to make use of predictive processing schemes and (increasingly abstract) generative models as a means of supporting the analysis of large data sets. But are such computational systems sufficient (by themselves) to provide a route to general human-level analytic capabilities? I will argue that they are not and that closer attention to a broader range of forces and factors (many of which are not confined to the neural realm) may be required to understand what it is that gives human cognition its distinctive (and largely unique) flavour. The vision that emerges is one of 'homomimetic deep learning systems', systems that situate a hierarchically-organized predictive processing core within a larger nexus of developmental, behavioural, symbolic, technological and social influences. Relative to that vision, I suggest that we should see the Web as a form of 'cognitive ecology', one that is as much involved with the transformation of machine intelligence as it is with the progressive reshaping of our own cognitive capabilities.
Resumo:
Most multidimensional projection techniques rely on distance (dissimilarity) information between data instances to embed high-dimensional data into a visual space. When data are endowed with Cartesian coordinates, an extra computational effort is necessary to compute the needed distances, making multidimensional projection prohibitive in applications dealing with interactivity and massive data. The novel multidimensional projection technique proposed in this work, called Part-Linear Multidimensional Projection (PLMP), has been tailored to handle multivariate data represented in Cartesian high-dimensional spaces, requiring only distance information between pairs of representative samples. This characteristic renders PLMP faster than previous methods when processing large data sets while still being competitive in terms of precision. Moreover, knowing the range of variation for data instances in the high-dimensional space, we can make PLMP a truly streaming data projection technique, a trait absent in previous methods.
Resumo:
The web services (WS) technology provides a comprehensive solution for representing, discovering, and invoking services in a wide variety of environments, including Service Oriented Architectures (SOA) and grid computing systems. At the core of WS technology lie a number of XML-based standards, such as the Simple Object Access Protocol (SOAP), that have successfully ensured WS extensibility, transparency, and interoperability. Nonetheless, there is an increasing demand to enhance WS performance, which is severely impaired by XML's verbosity. SOAP communications produce considerable network traffic, making them unfit for distributed, loosely coupled, and heterogeneous computing environments such as the open Internet. Also, they introduce higher latency and processing delays than other technologies, like Java RMI and CORBA. WS research has recently focused on SOAP performance enhancement. Many approaches build on the observation that SOAP message exchange usually involves highly similar messages (those created by the same implementation usually have the same structure, and those sent from a server to multiple clients tend to show similarities in structure and content). Similarity evaluation and differential encoding have thus emerged as SOAP performance enhancement techniques. The main idea is to identify the common parts of SOAP messages, to be processed only once, avoiding a large amount of overhead. Other approaches investigate nontraditional processor architectures, including micro-and macrolevel parallel processing solutions, so as to further increase the processing rates of SOAP/XML software toolkits. This survey paper provides a concise, yet comprehensive review of the research efforts aimed at SOAP performance enhancement. A unified view of the problem is provided, covering almost every phase of SOAP processing, ranging over message parsing, serialization, deserialization, compression, multicasting, security evaluation, and data/instruction-level processing.
Resumo:
Spatial data warehouses (SDWs) allow for spatial analysis together with analytical multidimensional queries over huge volumes of data. The challenge is to retrieve data related to ad hoc spatial query windows according to spatial predicates, avoiding the high cost of joining large tables. Therefore, mechanisms to provide efficient query processing over SDWs are essential. In this paper, we propose two efficient indices for SDW: the SB-index and the HSB-index. The proposed indices share the following characteristics. They enable multidimensional queries with spatial predicate for SDW and also support predefined spatial hierarchies. Furthermore, they compute the spatial predicate and transform it into a conventional one, which can be evaluated together with other conventional predicates by accessing a star-join Bitmap index. While the SB-index has a sequential data structure, the HSB-index uses a hierarchical data structure to enable spatial objects clustering and a specialized buffer-pool to decrease the number of disk accesses. The advantages of the SB-index and the HSB-index over the DBMS resources for SDW indexing (i.e. star-join computation and materialized views) were investigated through performance tests, which issued roll-up operations extended with containment and intersection range queries. The performance results showed that improvements ranged from 68% up to 99% over both the star-join computation and the materialized view. Furthermore, the proposed indices proved to be very compact, adding only less than 1% to the storage requirements. Therefore, both the SB-index and the HSB-index are excellent choices for SDW indexing. Choosing between the SB-index and the HSB-index mainly depends on the query selectivity of spatial predicates. While low query selectivity benefits the HSB-index, the SB-index provides better performance for higher query selectivity.
Resumo:
Current commercial and academic OLAP tools do not process XML data that contains XLink. Aiming at overcoming this issue, this paper proposes an analytical system composed by LMDQL, an analytical query language. Also, the XLDM metamodel is given to model cubes of XML documents with XLink and to deal with syntactic, semantic and structural heterogeneities commonly found in XML documents. As current W3C query languages for navigating in XML documents do not support XLink, XLPath is discussed in this article to provide features for the LMDQL query processing. A prototype system enabling the analytical processing of XML documents that use XLink is also detailed. This prototype includes a driver, named sql2xquery, which performs the mapping of SQL queries into XQuery. To validate the proposed system, a case study and its performance evaluation are presented to analyze the impact of analytical processing over XML/XLink documents.
Resumo:
Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.
Resumo:
In this paper, we investigate content-centric data transmission in the context of short opportunistic contacts and base our work on an existing content-centric networking architecture. In case of short interconnection times, file transfers may not be completed and the received information is discarded. Caches in content-centric networks are used for short-term storage and do not guarantee persistence. We implemented a mechanism to extend caching on persistent storage enabling the completion of disrupted content transfers. The mechanisms have been implemented in the CCNx framework and have been evaluated on wireless mesh nodes. Our evaluations using multicast and unicast communication show that the implementation can support content transfers in opportunistic environments without significant processing and storing overhead.
Resumo:
This paper considers a framework where data from correlated sources are transmitted with the help of network coding in ad hoc network topologies. The correlated data are encoded independently at sensors and network coding is employed in the intermediate nodes in order to improve the data delivery performance. In such settings, we focus on the problem of reconstructing the sources at decoder when perfect decoding is not possible due to losses or bandwidth variations. We show that the source data similarity can be used at decoder to permit decoding based on a novel and simple approximate decoding scheme. We analyze the influence of the network coding parameters and in particular the size of finite coding fields on the decoding performance. We further determine the optimal field size that maximizes the expected decoding performance as a trade-off between information loss incurred by limiting the resolution of the source data and the error probability in the reconstructed data. Moreover, we show that the performance of the approximate decoding improves when the accuracy of the source model increases even with simple approximate decoding techniques. We provide illustrative examples showing how the proposed algorithm can be deployed in sensor networks and distributed imaging applications.
Resumo:
Recognizing the increasing amount of information shared on Social Networking Sites (SNS), in this study we aim to explore the information processing strategies of users on Facebook. Specifically, we aim to investigate the impact of various factors on user attitudes towards the posts on their Newsfeed. To collect the data, we program a Facebook application that allows users to evaluate posts in real time. Applying Structural Equation Modeling to a sample of 857 observations we find that it is mostly the affective attitude that shapes user behavior on the network. This attitude, in turn, is mainly determined by the communication intensity between users, overriding comprehensibility of the post and almost neglecting post length and user posting frequency.
Resumo:
The article proposes granular computing as a theoretical, formal and methodological basis for the newly emerging research field of human–data interaction (HDI). We argue that the ability to represent and reason with information granules is a prerequisite for data legibility. As such, it allows for extending the research agenda of HDI to encompass the topic of collective intelligence amplification, which is seen as an opportunity of today’s increasingly pervasive computing environments. As an example of collective intelligence amplification in HDI, we introduce a collaborative urban planning use case in a cognitive city environment and show how an iterative process of user input and human-oriented automated data processing can support collective decision making. As a basis for automated human-oriented data processing, we use the spatial granular calculus of granular geometry.
Resumo:
Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users using their carried MDs to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and network-based services, where (i) and (ii) are done inside the serving network. For better understanding of parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we developed first a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while being invisible. A set of optimizations were made for the receiver components to support wideband capturing of the GSM spectrum while operating in real-time. Processing the wide-spectrum of the GSM is possible using a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices’ radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate in real environments devices’ locations. Given the challenging indoor environment on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before being fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid based approach that uses both WiFi and GSM interfaces to localize users. For network-based services, we used a software implementation of a LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, any prior knowledge of the indoor layout or any offline calibration of the system.
Resumo:
In this work, we propose a novel network coding enabled NDN architecture for the delivery of scalable video. Our scheme utilizes network coding in order to address the problem that arises in the original NDN protocol, where optimal use of the bandwidth and caching resources necessitates the coordination of the forwarding decisions. To optimize the performance of the proposed network coding based NDN protocol and render it appropriate for transmission of scalable video, we devise a novel rate allocation algorithm that decides on the optimal rates of Interest messages sent by clients and intermediate nodes. This algorithm guarantees that the achieved flow of Data objects will maximize the average quality of the video delivered to the client population. To support the handling of Interest messages and Data objects when intermediate nodes perform network coding, we modify the standard NDN protocol and introduce the use of Bloom filters, which store efficiently additional information about the Interest messages and Data objects. The proposed architecture is evaluated for transmission of scalable video over PlanetLab topologies. The evaluation shows that the proposed scheme performs very close to the optimal performance
Resumo:
This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
Resumo:
Ciao Prolog incorporates a module system which allows sepárate compilation and sensible creation of standalone executables. We describe some of the main aspects of the Ciao modular compiler, ciaoc, which takes advantage of the characteristics of the Ciao Prolog module system to automatically perform sepárate and incremental compilation and efficiently build small, standalone executables with competitive run-time performance, ciaoc can also detect statically a larger number of programming errors. We also present a generic code processing library for handling modular programs, which provides an important part of the functionality of ciaoc. This library allows the development of program analysis and transformation tools in a way that is to some extent orthogonal to the details of module system design, and has been used in the implementation of ciaoc and other Ciao system tools. We also describe the different types of executables which can be generated by the Ciao compiler, which offer different tradeoffs between executable size, startup time, and portability, depending, among other factors, on the linking regime used (static, dynamic, lazy, etc.). Finally, we provide experimental data which illustrate these tradeoffs.