557 results for Missing Data
Abstract:
Background: National physical activity data suggest that there is a considerable difference in the physical activity (PA) levels of US and Australian adults. Although different surveys (Active Australia [AA] and BRFSS) are used, the questions are similar. Different protocols, however, are used to estimate “activity” from the data collected. The primary aim of this study was to assess whether the two approaches to the management of PA data could explain some of the difference in prevalence estimates derived from the two national surveys. Methods: Secondary data analysis of the most recent AA survey (N = 2987). Results: 15% of the sample was defined as “active” using Australian criteria but as “inactive” using the BRFSS protocol, even though weekly energy expenditure was commensurate with meeting current guidelines. Younger respondents (age < 45 y) were more likely to be “misclassified” using the BRFSS criteria. Conclusions: The prevalence of activity in Australia and the US appears to be more similar than previously thought.
Abstract:
Client owners usually need an estimate or forecast of their likely building costs in advance of detailed design in order to confirm the financial feasibility of their projects. Because of their timing in the project life cycle, these early stage forecasts are characterized by the minimal amount of information available concerning the new (target) project to the point that often only its size and type are known. One approach is to use the mean contract sum of a sample, or base group, of previous projects of a similar type and size to the project for which the estimate is needed. Bernoulli’s law of large numbers implies that this base group should be as large as possible. However, increasing the size of the base group inevitably involves including projects that are less and less similar to the target project. Deciding on the optimal number of base group projects is known as the homogeneity or pooling problem. A method of solving the homogeneity problem is described involving the use of closed form equations to compare three different sampling arrangements of previous projects for their simulated forecasting ability by a cross-validation method, where a series of targets are extracted, with replacement, from the groups and compared with the mean value of the projects in the base groups. The procedure is then demonstrated with 450 Hong Kong projects (with different project types: Residential, Commercial centre, Car parking, Social community centre, School, Office, Hotel, Industrial, University and Hospital) clustered into base groups according to their type and size.
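The trade-off described above between base group size and similarity can be illustrated with a small simulation. This is only a minimal sketch, not the closed-form method of the paper: it assumes a leave-one-out scheme in which each project is used in turn as the target and then returned to the pool, and the function name `base_group_rmse`, the `(size, cost)` tuple layout, and the toy figures are all hypothetical.

```python
import math

def base_group_rmse(projects, k):
    """Root-mean-square forecast error when each project's cost is forecast
    as the mean cost of its k nearest neighbours by size.

    projects: list of (size, cost) tuples for one project type (assumed layout).
    k: number of previous projects in the base group.
    """
    squared_errors = []
    for i, (size_i, cost_i) in enumerate(projects):
        # Every other project is a candidate for the base group.
        others = [p for j, p in enumerate(projects) if j != i]
        # Most similar first: smallest difference in size to the target.
        others.sort(key=lambda p: abs(p[0] - size_i))
        base_group = others[:k]
        forecast = sum(cost for _, cost in base_group) / len(base_group)
        squared_errors.append((cost_i - forecast) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data: two clusters of projects with very different sizes and costs.
projects = [(100, 10.0), (110, 11.0), (120, 12.0),
            (500, 50.0), (510, 51.0), (520, 52.0)]
for k in (1, 2, 5):
    print(k, round(base_group_rmse(projects, k), 3))
```

In this toy data a large base group is forced to include dissimilar projects, so the error grows with k; with noisier costs a very small k would also perform poorly, which is the pooling problem in miniature.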
Abstract:
Invited book review of Carolyn Carpan, 2009, Sisters, Schoolgirls and Sleuths: Girls' Series Books in America, MD: Scarecrow Press
Abstract:
The selection of appropriate analogue materials is a central consideration in the design of realistic physical models. We investigate the rheology of highly-filled silicone polymers in order to find materials with a power-law strain-rate softening rheology suitable for modelling rock deformation by dislocation creep and report the rheological properties of the materials as functions of the filler content. The mixtures exhibit strain-rate softening behaviour but with increasing amounts of filler become strain-dependent. For the strain-independent viscous materials, flow laws are presented while for strain-dependent materials the relative importance of strain and strain rate softening/hardening is reported. If the stress or strain rate is above a threshold value some highly-filled silicone polymers may be considered linear visco-elastic (strain independent) and power-law strain-rate softening. The power-law exponent can be raised from 1 to ~3 by using mixtures of high-viscosity silicone and plasticine. However, the need for high shear strain rates to obtain the power-law rheology imposes some restrictions on the usage of such materials for geodynamic modelling. Two simple shear experiments are presented that use Newtonian and power-law strain-rate softening materials. The results demonstrate how materials with power-law rheology result in better strain localization in analogue experiments.
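A power-law flow law of the kind discussed above is commonly written as strain rate = A × stress^n, where n = 1 is Newtonian and n > 1 gives strain-rate softening. The following sketch shows one way the exponent could be estimated from rheometer readings by a least-squares fit in log-log space. The function name and the synthetic data are assumptions for illustration, not the procedure or measurements of the paper.

```python
import math

def fit_power_law(stresses, strain_rates):
    """Fit strain_rate = A * stress**n by linear regression on the logs.

    Returns (n, A). Inputs must be positive and of equal length.
    """
    xs = [math.log(s) for s in stresses]
    ys = [math.log(e) for e in strain_rates]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                       # slope of the log-log line
    a = math.exp(mean_y - n * mean_x)   # intercept gives the prefactor A
    return n, a

# Synthetic readings generated from A = 0.5, n = 3 (hypothetical values).
stresses = [1.0, 2.0, 4.0, 8.0]
strain_rates = [0.5 * s ** 3 for s in stresses]
n, a = fit_power_law(stresses, strain_rates)
print(f"n = {n:.3f}, A = {a:.3f}")
```

A fitted exponent near 1 would indicate Newtonian behaviour, while values approaching 3 correspond to the strain-rate softening range reported for the silicone and plasticine mixtures.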
Abstract:
A significant issue encountered when fusing data received from multiple sensors is the accuracy of the timestamp associated with each piece of data. This is particularly important in applications such as Simultaneous Localisation and Mapping (SLAM), where vehicle velocity forms an important part of the mapping algorithms; on fast-moving vehicles, even millisecond inconsistencies in data timestamping can produce errors which need to be compensated for. The timestamping problem is compounded in a robot swarm environment due to the use of non-deterministic, readily available hardware (such as 802.11-based wireless) and inaccurate clock synchronisation protocols (such as Network Time Protocol (NTP)). As a result, the clocks of different robots can be out of synchronisation by tens to hundreds of milliseconds, making correlation of data difficult and preventing the units from performing synchronised actions such as triggering cameras or intricate swarm manoeuvres. In this thesis, a complete data fusion unit is designed, implemented and tested. The unit, named BabelFuse, is able to accept sensor data from a number of low-speed communication buses (such as RS232, RS485 and CAN Bus) and also timestamp events that occur on General Purpose Input/Output (GPIO) pins, referencing a submillisecond-accurate, wirelessly distributed "global" clock signal. In addition to its timestamping capabilities, it can also be used to trigger an attached camera at a predefined start time and frame rate. This functionality enables the creation of a wirelessly synchronised distributed image acquisition system over a large geographic area; a real-world application for this functionality is the creation of a platform to facilitate wirelessly distributed 3D stereoscopic vision. A ‘best-practice’ design methodology is adopted within the project to ensure the final system operates according to its requirements.
Initially, requirements are generated from which a high-level architecture is distilled. This architecture is then converted into a hardware specification and low-level design, which is then manufactured. The manufactured hardware is then verified to ensure it operates as designed, and firmware and Linux Operating System (OS) drivers are written to provide the features and connectivity required of the system. Finally, integration testing is performed to ensure the unit functions as per its requirements. The BabelFuse System comprises a single Grand Master unit which is responsible for maintaining the absolute value of the "global" clock. Slave nodes then determine their local clock offset from that of the Grand Master via synchronisation events which occur multiple times per second. The mechanism used for synchronising the clocks between the boards wirelessly makes use of specific hardware and a firmware protocol based on elements of the IEEE-1588 Precision Time Protocol (PTP). With the key requirement of the system being submillisecond-accurate clock synchronisation (as a basis for timestamping and camera triggering), automated testing is carried out to monitor the offsets between each Slave and the Grand Master over time. A common strobe pulse is also sent to each unit for timestamping; the correlation between the timestamps of the different units is used to validate the clock offset results. Analysis of the automated test results shows that the BabelFuse units are almost three orders of magnitude more accurate than their requirement; clocks of the Slave and Grand Master units do not differ by more than three microseconds over a running time of six hours and the mean clock offset of Slaves to the Grand Master is less than one microsecond. The common strobe pulse used to verify the clock offset data yields a positive result with a maximum variation between units of less than two microseconds and a mean value of less than one microsecond.
The camera triggering functionality is verified by connecting the trigger pulse output of each board to a four-channel digital oscilloscope and setting each unit to output a 100 Hz periodic pulse with a common start time. The resulting waveform shows a maximum variation between the rising edges of the pulses of approximately 39 µs, well below its target of 1 ms.
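The abstract states only that the synchronisation protocol is based on elements of IEEE-1588 PTP, so the following is a sketch of the standard PTP delay request-response arithmetic rather than the BabelFuse firmware itself. It assumes a symmetric path delay; the function name and the example timestamps are hypothetical.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Estimate slave clock offset and one-way path delay from one exchange.

    t1: master clock time when the Sync message is sent
    t2: slave clock time when the Sync message is received
    t3: slave clock time when the Delay_Req message is sent
    t4: master clock time when the Delay_Req message is received
    Assumes the path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # one-way propagation delay
    return offset, delay

# Hypothetical timestamps in microseconds: slave runs 5 us ahead, delay is 2 us.
offset, delay = ptp_offset_and_delay(t1=100, t2=107, t3=120, t4=117)
print(offset, delay)  # 5.0 2.0
```

A slave would subtract the estimated offset from its local clock before timestamping events; any asymmetry between the two path directions appears directly as an error in the offset estimate.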
Abstract:
Mandatory data breach notification laws are a novel statutory solution in relation to organizational protections of personal information. They require organizations which have suffered a breach of security involving personal information to notify those persons whose information may have been affected. These laws originated in the state-based legislatures of the United States during the last decade and have subsequently garnered worldwide legislative interest. Despite their perceived utility, mandatory data breach notification laws raise several conceptual and practical concerns that limit the scope of their applicability, particularly in relation to existing information privacy law regimes. We outline these concerns and, in doing so, contend that while mandatory data breach notification laws have many useful facets, their utility as an 'add-on' to remedy the failings of current information privacy law frameworks should not necessarily be taken for granted.
Abstract:
longitudinal study of data modelling across grades 1-3. The activity engaged children in designing, implementing, and analysing a survey about their new playground. Data modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. The core components of data modelling addressed here are children’s structuring and representing of data, with a focus on their display of metarepresentational competence (diSessa, 2004). Such competence includes students’ abilities to invent or design a variety of new representations, explain their creations, understand the role they play, and critique and compare the adequacy of representations. Reported here are the ways in which the children structured and represented their data, the metarepresentational competence displayed, and links between their metarepresentational competence and conceptual competence.
Abstract:
Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. Results: We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. Conclusions: We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis.
mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers.
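The abstract does not spell out the mCOPA refinements, so the sketch below shows only the basic COPA-style idea it builds on: centre each gene on its median, scale by the median absolute deviation (MAD), and read off an upper percentile for over-expression and a lower percentile for under-expression. The function names, the nearest-rank percentile rule, and the example values are assumptions for illustration.

```python
import math
from statistics import median

def copa_transform(values):
    """Median-centre one gene's expression values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; gene has too little variation to scale")
    return [(v - med) / mad for v in values]

def nearest_rank_percentile(sorted_values, q):
    """Nearest-rank percentile of an ascending list, with q in (0, 1]."""
    rank = max(1, math.ceil(q * len(sorted_values)))
    return sorted_values[rank - 1]

def outlier_scores(values, q=0.95):
    """Return (over_score, under_score) for one gene across samples.

    A large positive over_score suggests a subset of over-expressed samples;
    a large negative under_score suggests a subset of under-expressed samples.
    """
    transformed = sorted(copa_transform(values))
    over = nearest_rank_percentile(transformed, q)
    under = nearest_rank_percentile(transformed, 1 - q)
    return over, under

# Hypothetical gene with one strongly over-expressed sample.
expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(outlier_scores(expression))
```

Genes would then be ranked by these scores, with the lower-percentile score providing the under-expressed outliers that the original COPA did not report.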
Abstract:
This paper presents an input-orientated data envelopment analysis (DEA) framework which allows the measurement and decomposition of economic, environmental and ecological efficiency levels in agricultural production across different countries. Economic, environmental and ecological optimisations search for optimal input combinations that minimise total costs, total amount of nutrients, and total amount of cumulative exergy contained in inputs, respectively. The application of the framework to an agricultural dataset of 30 OECD countries revealed that (i) there was significant scope to make their agricultural production systems more environmentally and ecologically sustainable; (ii) the improvement in environmental and ecological sustainability could be achieved by being more technically efficient and, even more significantly, by changing the input combinations; (iii) the rankings of sustainability varied significantly across OECD countries within frontier-based environmental and ecological efficiency measures and between frontier-based measures and indicators.
Abstract:
The ability to forecast machinery health is vital to reducing maintenance costs, operation downtime and safety hazards. Recent advances in condition monitoring technologies have given rise to a number of prognostic models which attempt to forecast machinery health based on condition data such as vibration measurements. This paper demonstrates how the population characteristics and condition monitoring data (both complete and suspended) of historical items can be integrated for training an intelligent agent to predict asset health multiple steps ahead. The model consists of a feed-forward neural network whose training targets are asset survival probabilities estimated using a variation of the Kaplan–Meier estimator and a degradation-based failure probability density function estimator. The trained network is capable of estimating the future survival probabilities when a series of asset condition readings are inputted. The output survival probabilities collectively form an estimated survival curve. Pump data from a pulp and paper mill were used for model validation and comparison. The results indicate that the proposed model can predict more accurately as well as further ahead than similar models which neglect population characteristics and suspended data. This work presents a compelling concept for longer-range fault prognosis utilising available information more fully and accurately.
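The paper uses a variation of the Kaplan–Meier estimator whose details are not given in the abstract. As a point of reference, the sketch below implements the standard Kaplan–Meier estimator, showing how suspended (censored) histories still contribute to the at-risk count without counting as failures. The function name and the example data are hypothetical.

```python
def kaplan_meier(times, failed):
    """Standard Kaplan-Meier survival estimate.

    times: time at which each asset failed or was suspended (removed unfailed).
    failed: True for an observed failure, False for a suspension.
    Returns a list of (time, survival_probability) at each failure time.
    """
    records = sorted(zip(times, failed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        failures_at_t = 0
        leaving_at_t = 0
        # Gather every record that shares this time.
        while i < len(records) and records[i][0] == t:
            failures_at_t += 1 if records[i][1] else 0
            leaving_at_t += 1
            i += 1
        if failures_at_t:
            survival *= 1.0 - failures_at_t / at_risk
            curve.append((t, survival))
        # Failed and suspended units both leave the risk set after time t.
        at_risk -= leaving_at_t
    return curve

# Hypothetical pump histories: operating hours and whether a failure was seen.
hours = [2, 3, 3, 5, 8]
failed = [True, True, False, True, False]
for t, s in kaplan_meier(hours, failed):
    print(t, round(s, 3))
```

Survival probabilities of this kind, evaluated at a series of future time steps, are the sort of training target the abstract describes for the feed-forward network.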
Abstract:
Our paper approaches Twitter through the lens of “platform politics” (Gillespie, 2010), focusing in particular on controversies around user data access, ownership, and control. We characterise different actors in the Twitter data ecosystem: private and institutional end users of Twitter, commercial data resellers such as Gnip and DataSift, data scientists, and finally Twitter, Inc. itself; and describe their conflicting interests. We furthermore study Twitter’s Terms of Service and application programming interface (API) as material instantiations of regulatory instruments used by the platform provider, and argue for greater promotion of data rights and literacy to strengthen the position of end users.
Abstract:
The deployment of emerging technologies, such as cooperative systems, allows the traffic community to foresee relevant improvements in traffic safety and efficiency. Vehicles are able to communicate the local traffic state in real time, which could result in an automatic and therefore better reaction to the mechanism of traffic jam formation. An upstream single-hop radio broadcast network can improve the perception of each cooperative driver within radio range and hence the traffic stability. The impact of a cooperative law on the appearance of traffic congestion is investigated analytically and through simulation. NGSIM field data are used to calibrate the Optimal Velocity with Relative Velocity (OVRV) car-following model, and the MOBIL lane-changing model is implemented. Assuming that congestion can be triggered either by a perturbation in the instability domain or by critical lane-changing behavior, the calibrated car-following behavior is used to assess the impact of a microscopic cooperative law on abnormal lane-changing behavior. The cooperative law helps reduce and delay traffic congestion as it increases traffic flow stability.
Abstract:
Background: Accumulated biological research outcomes show that biological functions do not depend on individual genes, but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive but efficient manner. Methods: In order to find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. The tables have two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with fewer than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order based on the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row is greater than the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. Repeating this process, all biclusters in the table are systematically identified until the table becomes empty. Conclusions: This paper presents a novel biclustering algorithm for the identification of additive biclusters. Since it involves exhaustively testing combinations of genes and conditions, the additive biclusters can be found more readily.
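The four table-building steps in the Methods can be sketched for a single seed as follows. This is a simplified reading of the abstract, not the authors' implementation: it assumes a seed is one (gene, condition) pair, that a gene is "similar" to the seed on a condition when their difference matches the additive offset observed at the seed condition to within a tolerance, and that the function name, parameters, and toy data are hypothetical.

```python
from collections import defaultdict

def biclusters_from_seed(expr, seed_gene, seed_cond,
                         tol=0.1, max_dissimilar=2, min_genes=1):
    """Find additive biclusters around one seed.

    expr: dict mapping gene name -> list of expression values per condition.
    Returns a list of (genes, conditions) pairs.
    """
    seed_row = expr[seed_gene]
    n_conds = len(seed_row)

    # Steps 1 and 2: keep genes with fewer than max_dissimilar dissimilar
    # conditions and group genes that share the same dissimilar conditions.
    table = defaultdict(set)
    for gene, row in expr.items():
        offset = row[seed_cond] - seed_row[seed_cond]  # additive shift vs seed
        dissimilar = frozenset(
            c for c in range(n_conds)
            if abs(row[c] - seed_row[c] - offset) > tol
        )
        if len(dissimilar) < max_dissimilar:
            table[dissimilar].add(gene)

    # Step 3: sort rows by ascending number of dissimilar conditions.
    rows = sorted(table.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))

    # Step 4: output a bicluster for each row with more than min_genes genes.
    found = []
    for dissimilar, genes in rows:
        if len(genes) > min_genes:
            conditions = [c for c in range(n_conds) if c not in dissimilar]
            found.append((sorted(genes), conditions))
    return found

# Toy data: g2 equals g1 plus a constant; g3 deviates on the last condition.
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 3, 4, 5],
    "g3": [0, 1, 2, 9],
    "g4": [5, 1, 7, 2],
}
print(biclusters_from_seed(expr, "g1", 0))
```

A full run would repeat this for every seed and merge duplicate biclusters, which is where the exhaustive cost described in the abstract arises.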