168 resultados para data movement problem
Resumo:
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single “best” model, where “best” is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerge as having highest probability given the data, as indicated by the goodness of fit measure; the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as “best”, suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival. Keywords: Bayesian modelling; Bayesian model averaging; Cure model; Markov Chain Monte Carlo; Mixture model; Survival analysis; Weibull distribution
Resumo:
The Bluetooth technology is being increasingly used, among the Automated Vehicle Identification Systems, to retrieve important information about urban networks. Because the movement of Bluetooth-equipped vehicles can be monitored, throughout the network of Bluetooth sensors, this technology represents an effective means to acquire accurate time dependant Origin Destination information. In order to obtain reliable estimations, however, a number of issues need to be addressed, through data filtering and correction techniques. Some of the main challenges inherent to Bluetooth data are, first, that Bluetooth sensors may fail to detect all of the nearby Bluetooth-enabled vehicles. As a consequence, the exact journey for some vehicles may become a latent pattern that will need to be estimated. Second, sensors that are in close proximity to each other may have overlapping detection areas, thus making the task of retrieving the correct travelled path even more challenging. The aim of this paper is twofold: to give an overview of the issues inherent to the Bluetooth technology, through the analysis of the data available from the Bluetooth sensors in Brisbane; and to propose a method for retrieving the itineraries of the individual Bluetooth vehicles. We argue that estimating these latent itineraries, accurately, is a crucial step toward the retrieval of accurate dynamic Origin Destination Matrices.
Resumo:
Technological advances have led to an influx of affordable hardware that supports sensing, computation and communication. This hardware is increasingly deployed in public and private spaces, tracking and aggregating a wealth of real-time environmental data. Although these technologies are the focus of several research areas, there is a lack of research dealing with the problem of making these capabilities accessible to everyday users. This thesis represents a first step towards developing systems that will allow users to leverage the available infrastructure and create custom tailored solutions. It explores how this notion can be utilized in the context of energy monitoring to improve conventional approaches. The project adopted a user-centered design process to inform the development of a flexible system for real-time data stream composition and visualization. This system features an extensible architecture and defines a unified API for heterogeneous data streams. Rather than displaying the data in a predetermined fashion, it makes this information available as building blocks that can be combined and shared. It is based on the insight that individual users have diverse information needs and presentation preferences. Therefore, it allows users to compose rich information displays, incorporating personally relevant data from an extensive information ecosystem. The prototype was evaluated in an exploratory study to observe its natural use in a real-world setting, gathering empirical usage statistics and conducting semi-structured interviews. The results show that a high degree of customization does not warrant sustained usage. Other factors were identified, yielding recommendations for increasing the impact on energy consumption.
Resumo:
Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a ‘noisy’ environment such as contemporary social media, is to collect the pertinent information; be that information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or give an advantage to those involved in prediction markets. Often, such a process is iterative, with keywords and hashtags changing with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the data sets collected and analyzed are preformed, that is they are built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and report findings to stakeholders. The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be immediately placed in front of responders, for manual filtering and possible action. The paper suggests two solutions for this, content analysis and user profiling. In the former case, aspects of the tweet are assigned a score to assess its likely relationship to the topic at hand, and the urgency of the information, whilst the latter attempts to identify those users who are either serving as amplifiers of information or are known as an authoritative source. Through these techniques, the information contained in a large dataset could be filtered down to match the expected capacity of emergency responders, and knowledge as to the core keywords or hashtags relating to the current event is constantly refined for future data collection. The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to particular prediction market; tennis betting. As increasing numbers of professional sports men and women create Twitter accounts to communicate with their fans, information is being shared regarding injuries, form and emotions which have the potential to impact on future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, as with American Football (NFL) and Baseball (MLB) has paid subscription services which manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, whilst such services are still niche operations, much of the value of information is lost by the time it reaches one of these services. The paper thus considers how information could be filtered from twitter user lists and hash tag or keyword monitoring, assessing the value of the source, information, and the prediction markets to which it may relate. The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect and make the tweets available for exploration and analysis. A strategy to respond to changes in the social movement is also required or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list is created — in a way, keyword creation is part strategy and part art. In this paper we describe strategies for the creation of a social media archive, specifically tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement’s presence in Twitter changes over time. We also discuss the opportunities and methods to extract data smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study. The common theme amongst these papers is that of constructing a data set, filtering it for a specific purpose, and then using the resulting information to aid in future data collection. The intention is that through the papers presented, and subsequent discussion, the panel will inform the wider research community not only on the objectives and limitations of data collection, live analytics, and filtering, but also on current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
Resumo:
The geographic location of cloud data storage centres is an important issue for many organisations and individuals due to various regulations that require data and operations to reside in specific geographic locations. Thus, cloud users may want to be sure that their stored data have not been relocated into unknown geographic regions that may compromise the security of their stored data. Albeshri et al. (2012) combined proof of storage (POS) protocols with distance-bounding protocols to address this problem. However, their scheme involves unnecessary delay when utilising typical POS schemes due to computational overhead at the server side. The aim of this paper is to improve the basic GeoProof protocol by reducing the computation overhead at the server side. We show how this can maintain the same level of security while achieving more accurate geographic assurance.
Resumo:
The use of Mahalanobis squared distance–based novelty detection in statistical damage identification has become increasingly popular in recent years. The merit of the Mahalanobis squared distance–based method is that it is simple and requires low computational effort to enable the use of a higher dimensional damage-sensitive feature, which is generally more sensitive to structural changes. Mahalanobis squared distance–based damage identification is also believed to be one of the most suitable methods for modern sensing systems such as wireless sensors. Although possessing such advantages, this method is rather strict with the input requirement as it assumes the training data to be multivariate normal, which is not always available particularly at an early monitoring stage. As a consequence, it may result in an ill-conditioned training model with erroneous novelty detection and damage identification outcomes. To date, there appears to be no study on how to systematically cope with such practical issues especially in the context of a statistical damage identification problem. To address this need, this article proposes a controlled data generation scheme, which is based upon the Monte Carlo simulation methodology with the addition of several controlling and evaluation tools to assess the condition of output data. By evaluating the convergence of the data condition indices, the proposed scheme is able to determine the optimal setups for the data generation process and subsequently avoid unnecessarily excessive data. The efficacy of this scheme is demonstrated via applications to a benchmark structure data in the field.
Resumo:
This work considers the problem of building high-fidelity 3D representations of the environment from sensor data acquired by mobile robots. Multi-sensor data fusion allows for more complete and accurate representations, and for more reliable perception, especially when different sensing modalities are used. In this paper, we propose a thorough experimental analysis of the performance of 3D surface reconstruction from laser and mm-wave radar data using Gaussian Process Implicit Surfaces (GPIS), in a realistic field robotics scenario. We first analyse the performance of GPIS using raw laser data alone and raw radar data alone, respectively, with different choices of covariance matrices and different resolutions of the input data. We then evaluate and compare the performance of two different GPIS fusion approaches. The first, state-of-the-art approach directly fuses raw data from laser and radar. The alternative approach proposed in this paper first computes an initial estimate of the surface from each single source of data, and then fuses these two estimates. We show that this method outperforms the state of the art, especially in situations where the sensors react differently to the targets they perceive.
Resumo:
This article summarizes research from an ecological dynamics program of work on team sports exemplifying how small-sided and conditioned games (SSCG) can enhance skill acquisition and decision-making processes during training. The data highlighted show how constraints of different SSCG can facilitate emergence of continuous interpersonal coordination tendencies during practice to benefit team game players.
Resumo:
After nearly fifteen years of the open access (OA) movement and its hard-fought struggle for a more open scholarly communication system, publishers are realizing that business models can be both open and profitable. Making journal articles available on an OA license is becoming an accepted strategy for maximizing the value of content to both research communities and the businesses that serve them. The first blog in this two-part series celebrating Data Innovation Day looks at the role that data-innovation is playing in the shift to open access for journal articles.
Resumo:
Governments are challenged by the need to ensure that ageing populations stay active and engaged as they age. Therefore, it is critical to investigate the role of mobility in older people's engagement in out-of-home activities, and to identify the experiences they have within their communities. This research investigates the use of transportation by older people and its implications for their out-of-home activities within suburban environments. The qualitative, mixed-method approach employs data collection methods which include a daily travel diary (including a questionnaire), Global Positioning System (GPS) tracking and semi-structured interviews with older people living in suburban environments in Brisbane, Australia. Results show that older people are mobile throughout the city, and their car provides them with that opportunity to access desired destinations. This ability to drive allows older people to live independently and to assist others who do not drive, particularly where transport alternatives are not as accessible. The ability to transport goods and other people is a significant advantage of the private car over other transport options. People with no access to private transportation who live in low-density environments are disadvantaged when it comes to participation within the community. Further research is needed to better understand the relationship between transportation and participation within the community environment, to assist policy makers and city and transportation planners to develop strategies for age-friendly environments within the community.
Resumo:
The policies and regulations governing the practice of state asset management have emerged as an urgent question among many countries worldwide for there is heightened awareness of the complex and crucial role that state assets play in public service provision. Indonesia is an example of such country, introducing a ‘big-bang’ reform in state asset management laws, policies, regulations, and technical guidelines. Indonesia exemplified its enthusiasm in reforming state asset management policies and practices through the establishment of the Directorate General of State Assets in 2006. The Directorate General of State Assets have stressed the new direction that it is taking state asset management laws and policies through the introduction of Republic of Indonesia Law Number 38 Year 2008, which is an amended regulation overruling Republic of Indonesia Law Number 6 Year 2006 on Central/Regional Government State Asset Management. Law number 38/2008 aims to further exemplify good governance principles and puts forward a ‘the highest and best use of assets’ principle in state asset management. The purpose of this study is to explore and analyze specific contributing influences to state asset management practices, answering the question why innovative state asset management policy implementation is stagnant. The methodology of this study is that of qualitative case study approach, utilizing empirical data sample of four Indonesian regional governments. Through a thematic analytical approach this study provides an in-depth analysis of each influencing factors to state asset management reform. Such analysis suggests the potential of an ‘excuse rhetoric’; whereby the influencing factors identified are a smoke-screen, or are myths that public policy makers and implementers believe in, as a means to ex-plain stagnant implementation of innovative state asset management practice. Thus this study offers deeper insights of the intricate web that influences state as-set management innovative policies to state asset management policy makers; to be taken into consideration in future policy writing.
Resumo:
This chapter addresses data modelling as a means of promoting statistical literacy in the early grades. Consideration is first given to the importance of increasing young children’s exposure to statistical reasoning experiences and how data modelling can be a rich means of doing so. Selected components of data modelling are then reviewed, followed by a report on some findings from the third-year of a three-year longitudinal study across grades one through three.
Resumo:
Social media have become crucial tools for political activists and protest movements, providing another channel for promoting messages and garnering support. Twitter, in particular, has been identified as a noteworthy medium for protests in countries including Iran and Egypt to receive global attention. The Occupy movement, originating with protests in, and the physical occupation of, Wall Street, and inspiring similar demonstrations in other U.S. cities and around the world, has been intrinsically linked with social media through location-specific hashtags: #ows for Occupy Wall Street, #occupysf for San Francisco, and so on. While the individual protests have a specific geographical focus-highlighted by the physical occupation of parks, buildings, and other urban areas-Twitter provides a means for these different movements to be linked and promoted through tweets containing multiple hashtags. It also serves as a channel for tactical communications during actions and as a space in which movement debates take place. This paper examines Twitter's use within the Occupy Oakland movement. We use a mixture of ethnographic research through interviews with activists and participant observation of the movements' activities, and a dataset of public tweets containing the #oo hashtag from early 2012. This research methodology allows us to develop a more accurate and nuanced understanding of how movement activists use Twitter by cross-checking trends in the online data with observations and activists' own reported use of Twitter. We also study the connections between a geographically focused movement such as Occupy Oakland and related, but physically distant, protests taking place concurrently in other cities. This study forms part of a wider research project, Mapping Movements, exploring the politics of place, investigating how social movements are composed and sustained, and the uses of online communication within these movements.
Resumo:
We study the multicast stream authentication problem when an opponent can drop, reorder and introduce data packets into the communication channel. In such a model, packet overhead and computing efficiency are two parameters to be taken into account when designing a multicast stream protocol. In this paper, we propose to use two families of erasure codes to deal with this problem, namely, rateless codes and maximum distance separable codes. Our constructions will have the following advantages. First, our packet overhead will be small. Second, the number of signature verifications to be performed at the receiver is O(1). Third, every receiver will be able to recover all the original data packets emitted by the sender despite losses and injection occurred during the transmission of information.
Resumo:
We consider the following problem: members in a dynamic group retrieve their encrypted data from an untrusted server based on keywords and without any loss of data confidentiality and member’s privacy. In this paper, we investigate common secure indices for conjunctive keyword-based retrieval over encrypted data, and construct an efficient scheme from Wang et al. dynamic accumulator, Nyberg combinatorial accumulator and Kiayias et al. public-key encryption system. The proposed scheme is trapdoorless and keyword-field free. The security is proved under the random oracle, decisional composite residuosity and extended strong RSA assumptions.