165 results for multiple data
Abstract:
This paper studies the missing covariate problem, which is often encountered in survival analysis. Three covariate imputation methods are employed in the study, and the effectiveness of each method is evaluated within the hazard prediction framework. Data from a typical engineering asset are used in the case study. Covariate values at some time steps are deliberately discarded to generate an incomplete covariate set. It is found that although the mean imputation method is simpler than the others, its results can differ substantially from the real values of the missing covariates. This study also shows that, in general, results obtained from the regression method are more accurate than those of the mean imputation method, but at a higher computational cost. The Gaussian Mixture Model (GMM) method is found to be the most effective of the three in terms of both computational efficiency and prediction accuracy.
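For readers unfamiliar with the two simpler methods, a minimal Python sketch of mean and regression imputation follows. It assumes the missing covariate can be regressed linearly on a single fully observed predictor (here the time index), which is an illustrative assumption rather than the paper's setup; the GMM method is not shown.

```python
from statistics import mean


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def regression_impute(values, predictor):
    """Fill each None with a least-squares line of `values` on `predictor`.

    Assumption: one fully observed predictor and a linear relationship.
    """
    pairs = [(x, y) for x, y in zip(predictor, values) if y is not None]
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x if y is None else y
            for x, y in zip(predictor, values)]


# Covariate with a linear trend; two time steps deliberately discarded.
time = [0, 1, 2, 3, 4, 5]
covariate = [1.0, 2.0, None, 4.0, None, 6.0]
print(mean_impute(covariate))
print(regression_impute(covariate, time))
```

On trending data the mean fill ignores the trend entirely, which mirrors the abstract's observation that mean imputation can differ substantially from the true values.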
Abstract:
The use of Wireless Sensor Networks (WSNs) for Structural Health Monitoring (SHM) has become a promising approach due to advantages such as low cost and fast, flexible deployment. However, inherent technical issues such as data synchronization error and data loss have prevented these distinct systems from being used extensively. Recently, several SHM-oriented WSNs have been proposed and are believed to overcome a large number of technical uncertainties. Nevertheless, there is limited research verifying the applicability of these WSNs to demanding SHM applications such as modal analysis and damage identification. This paper first presents a brief review of the most inherent uncertainties of SHM-oriented WSN platforms and then investigates their effects on the outcomes and performance of the most robust Output-only Modal Analysis (OMA) techniques when merged data from multiple tests are employed. The two OMA families selected for this investigation are Frequency Domain Decomposition (FDD) and data-driven Stochastic Subspace Identification (SSI-data), because both have been widely applied in the past decade. Experimental accelerations collected by a wired sensory system on a large-scale laboratory bridge model are initially used as clean data before being contaminated by different data pollutants in a sequential manner to simulate practical SHM-oriented WSN uncertainties. The results of this study show the robustness of FDD and the precautions needed for the SSI-data family when dealing with SHM-WSN uncertainties. Finally, the use of measurement channel projection for the time-domain OMA techniques and the preferred combination of OMA techniques to cope with SHM-WSN uncertainties are recommended.
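The core of FDD can be sketched in a few lines of NumPy: estimate the cross-spectral density matrix of all channels, take its singular value decomposition at each frequency, and peak-pick the first singular value to locate natural frequencies. The Welch segment length, Hann window and 50 % overlap below are assumptions, spectral scaling is omitted because only peak locations matter, and mode-shape extraction (the first singular vector) is left out.

```python
import numpy as np


def fdd_first_singular_values(data, fs, nperseg=256):
    """data: array of shape (n_channels, n_samples).

    Returns the frequency axis and the first (largest) singular value of the
    Welch-averaged cross-spectral density matrix at each frequency.
    """
    n_ch, n = data.shape
    window = np.hanning(nperseg)
    step = nperseg // 2                          # 50 % segment overlap (assumed)
    starts = range(0, n - nperseg + 1, step)
    n_freq = nperseg // 2 + 1
    G = np.zeros((n_freq, n_ch, n_ch), dtype=complex)
    for s in starts:
        X = np.fft.rfft(data[:, s:s + nperseg] * window, axis=1)  # (n_ch, n_freq)
        G += np.einsum("if,jf->fij", X, X.conj())                 # G[f, i, j] = X_i X_j*
    G /= len(starts)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    s1 = np.linalg.svd(G, compute_uv=False)[:, 0]  # singular values sorted descending
    return freqs, s1


# Example: two channels sharing one 12.5 Hz mode plus measurement noise.
fs = 100.0
t = np.arange(4096) / fs
rng = np.random.default_rng(0)
mode = np.sin(2 * np.pi * 12.5 * t)
data = np.vstack([mode, 0.5 * mode]) + 0.1 * rng.standard_normal((2, t.size))
freqs, s1 = fdd_first_singular_values(data, fs)
print(freqs[np.argmax(s1)])  # frequency of the strongest peak
```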
Abstract:
Background: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed. Methods: We used a discovery GWAS dataset (8,844 samples: 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analyses were performed using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool. Results: In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with a classification sensitivity of 62.3% and a specificity of 75.9%. No differences in disease progression or T2-lesion volumes were observed among the four predicted genetic risk groups (high, medium, low, misclassified).
On the other hand, a significant difference (F = 2.75, P = 0.04) was detected in age of disease onset between affected individuals misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years). Conclusions: The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and the completeness of current knowledge in MS genetics.
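The cumulative genetic risk (P-Hat) is the fitted probability of a logistic model, and the classification sensitivity and specificity follow from thresholding it. A minimal sketch in Python, assuming risk-allele counts of 0, 1 or 2 as predictors and a 0.5 classification threshold; the coefficients in the example are hypothetical and are not the 350-marker profile from the study.

```python
import math


def cumulative_risk(genotypes, betas, intercept):
    """P-Hat = 1 / (1 + exp(-(b0 + sum(b_i * g_i)))), g_i = risk-allele count."""
    eta = intercept + sum(b * g for b, g in zip(betas, genotypes))
    return 1.0 / (1.0 + math.exp(-eta))


def sensitivity_specificity(p_hats, is_case, threshold=0.5):
    """Classify as case when P-Hat >= threshold (threshold is an assumption)."""
    tp = sum(p >= threshold and c for p, c in zip(p_hats, is_case))
    fn = sum(p < threshold and c for p, c in zip(p_hats, is_case))
    tn = sum(p < threshold and not c for p, c in zip(p_hats, is_case))
    fp = sum(p >= threshold and not c for p, c in zip(p_hats, is_case))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical three-marker model and five individuals.
betas, intercept = [0.4, -0.2, 0.7], -1.0
people = [[2, 0, 2], [1, 1, 1], [0, 2, 0], [0, 0, 0], [2, 2, 2]]
status = [True, True, False, False, True]
p_hats = [cumulative_risk(g, betas, intercept) for g in people]
print(sensitivity_specificity(p_hats, status))
```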
Abstract:
Background: The expansion of cell colonies is driven by a delicate balance of several mechanisms, including cell motility, cell-to-cell adhesion and cell proliferation. New approaches that can independently identify and quantify the role of each mechanism will help us understand how each contributes to the expansion process. Standard mathematical modelling approaches to describe such cell colony expansion typically neglect cell-to-cell adhesion, even though it is thought to play an important role. Results: We use a combined experimental and mathematical modelling approach to determine the cell diffusivity, D, cell-to-cell adhesion strength, q, and cell proliferation rate, λ, in an expanding colony of MM127 melanoma cells. Using a circular barrier assay, we extract several types of experimental data and use a mathematical model to independently estimate D, q and λ. In our first set of experiments, we suppress cell proliferation and analyse three different types of data to estimate D and q. We find that standard types of data, such as the area enclosed by the leading edge of the expanding colony and more detailed cell density profiles throughout the expanding colony, do not provide sufficient information to uniquely identify D and q. We find that additional data relating to the degree of cell-to-cell clustering are required to provide independent estimates of q, and in turn D. In our second set of experiments, where proliferation is not suppressed, we use data describing temporal changes in cell density to determine the cell proliferation rate. In summary, we find that our experiments are best described using the ranges D = 161–243 µm² hour⁻¹, q = 0.3–0.5 (low to moderate strength) and λ = 0.0305–0.0398 hour⁻¹, and with these parameters we can accurately predict the temporal variations in the spatial extent and cell density profile throughout the expanding melanoma cell colony.
Conclusions: Our systematic approach to identify the cell diffusivity, cell-to-cell adhesion strength and cell proliferation rate highlights the importance of integrating multiple types of data to accurately quantify the factors influencing the spatial expansion of melanoma cell colonies.
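One common way to turn temporal changes in cell density into a proliferation rate is to fit the mean-field logistic growth model dC/dt = λC(1 − C), where C is the density scaled by its carrying capacity. The sketch below assumes that model (an assumption, not necessarily the estimator used in the study) and inverts its closed-form solution for λ from two density measurements, with time in hours.

```python
import math


def logistic_density(t, c0, lam):
    """Scaled density C(t) solving dC/dt = lam * C * (1 - C) with C(0) = c0."""
    return c0 / (c0 + (1.0 - c0) * math.exp(-lam * t))


def estimate_lambda(t, c0, ct):
    """Invert the logistic solution: lam = ln[ct (1 - c0) / (c0 (1 - ct))] / t."""
    return math.log(ct * (1.0 - c0) / (c0 * (1.0 - ct))) / t


# Round trip with an assumed rate of 0.035 per hour over 48 hours.
c48 = logistic_density(48.0, 0.1, 0.035)
print(estimate_lambda(48.0, 0.1, c48))
```

With real measurements one would fit λ to the whole time series (for example by least squares) rather than to two points; the two-point inversion only shows how the rate enters the model.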
Abstract:
The use of Wireless Sensor Networks (WSNs) for vibration-based Structural Health Monitoring (SHM) has become a promising approach due to advantages such as low cost and fast, flexible deployment. However, inherent technical issues such as data asynchronicity and data loss have prevented these distinct systems from being used extensively. Recently, several SHM-oriented WSNs have been proposed and are believed to overcome a large number of technical uncertainties. Nevertheless, there is limited research verifying the applicability of these WSNs to demanding SHM applications such as modal analysis and damage identification. Based on a brief review, this paper first reveals that Data Synchronization Error (DSE) is the most inherent of the uncertainties of SHM-oriented WSNs. The effects of this factor on the outcomes and performance of the most robust Output-only Modal Analysis (OMA) techniques are then investigated when data from multiple sensor setups are merged. The two OMA families selected for this investigation are Frequency Domain Decomposition (FDD) and data-driven Stochastic Subspace Identification (SSI-data), because both have been widely applied in the past decade. Accelerations collected by a wired sensory system on a large-scale laboratory bridge model are initially used as benchmark data, after a certain level of noise is added to account for the higher presence of this factor in SHM-oriented WSNs. From this source, a large number of simulations are performed to generate multiple DSE-corrupted datasets for statistical analysis. The results of this study show the robustness of FDD and the precautions needed for the SSI-data family when dealing with DSE at a relaxed level. Finally, the combination of preferred OMA techniques and the use of channel projection for the time-domain OMA technique are recommended to cope with DSE.
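A pure time offset τ on one channel multiplies its Fourier transform by a unit-magnitude phase factor, so in the ideal case the singular values of the cross-spectral density matrix used for FDD peak-picking are unchanged, while the relative phase between channels, and hence the identified mode shapes, shifts by 360·f·τ degrees at frequency f. The sketch below computes that phase error and generates a DSE-corrupted dataset by delaying each channel by a random whole number of samples. Integer-sample delays, padding with the first sample and the seeded generator are illustrative assumptions, not the paper's simulation protocol.

```python
import random


def phase_error_deg(freq_hz, sync_error_s):
    """Phase error in degrees that a pure time offset introduces at one frequency."""
    return 360.0 * freq_hz * sync_error_s


def corrupt_with_dse(channels, max_offset, seed=0):
    """Delay each channel by a random number of samples in [0, max_offset].

    The start is padded with the channel's first value so every channel keeps
    its original length. Assumes max_offset < channel length.
    """
    rng = random.Random(seed)
    corrupted, offsets = [], []
    for ch in channels:
        k = rng.randint(0, max_offset)
        offsets.append(k)
        corrupted.append([ch[0]] * k + ch[:len(ch) - k])
    return corrupted, offsets


# A 5 ms synchronization error on a 10 Hz mode gives roughly an 18 degree phase error.
print(phase_error_deg(10.0, 0.005))
print(corrupt_with_dse([[1.0, 2.0, 3.0, 4.0, 5.0]], max_offset=2, seed=1))
```

Repeating `corrupt_with_dse` with different seeds yields the kind of ensemble of DSE-corrupted datasets that a statistical comparison of FDD and SSI-data results would need.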
Abstract:
The ability to build high-fidelity 3D representations of the environment from sensor data is critical for autonomous robots. Multi-sensor data fusion allows for more complete and accurate representations. Furthermore, using distinct sensing modalities (i.e. sensors using a different physical process and/or operating at different electromagnetic frequencies) usually leads to more reliable perception, especially in challenging environments, as modalities may complement each other. However, they may react differently to certain materials or environmental conditions, leading to catastrophic fusion. In this paper, we propose a new method to reliably fuse data from multiple sensing modalities, including in situations where they detect different targets. We first compute distinct continuous surface representations for each sensing modality, with uncertainty, using Gaussian Process Implicit Surfaces (GPIS). Second, we perform a local consistency test between these representations, to separate consistent data (i.e. data corresponding to the detection of the same target by the sensors) from inconsistent data. The consistent data can then be fused together, using another GPIS process, and the rest of the data can be combined as appropriate. The approach is first validated using synthetic data. We then demonstrate its benefit using a mobile robot, equipped with a laser scanner and a radar, which operates in an outdoor environment in the presence of large clouds of airborne dust and smoke.
Abstract:
Live migration of multiple Virtual Machines (VMs) has become an integral management activity in data centers for power saving, load balancing and system maintenance. While state-of-the-art live migration techniques focus on improving the migration performance of a single independent VM, little attention has been paid to the live migration of multiple interacting VMs. Live migration is mostly influenced by network bandwidth, and arbitrarily migrating a VM that has data inter-dependencies with other VMs may increase bandwidth consumption and adversely affect the performance of subsequent migrations. In this paper, we propose a Random Key Genetic Algorithm (RKGA) that efficiently schedules the migration of a given set of VMs, accounting for both inter-VM dependency and the data center communication network. The experimental results show that the RKGA can schedule the migration of multiple VMs with significantly shorter total migration time and total downtime than a heuristic algorithm.
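In a random-key GA, a chromosome is a vector of real-valued keys, one per VM, and sorting the keys yields the migration order. This encoding is what lets standard crossover operate on schedules, because every key vector decodes to a valid permutation. The sketch below shows only the decoding and a uniform crossover; the fitness function (total migration time and downtime under inter-VM dependencies and network bandwidth) is omitted, and all names are hypothetical.

```python
import random


def random_chromosome(n_vms, rng):
    """One random key in [0, 1) per VM."""
    return [rng.random() for _ in range(n_vms)]


def decode(keys):
    """Random-key decoding: VM indices sorted by their key give the migration order."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def uniform_crossover(parent_a, parent_b, rng):
    """Each gene comes from either parent with equal probability.

    Any key vector decodes to a valid permutation, so no repair step is needed.
    """
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


rng = random.Random(42)
a, b = random_chromosome(4, rng), random_chromosome(4, rng)
child = uniform_crossover(a, b, rng)
print(decode(a), decode(b), decode(child))
```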
Abstract:
This chapter describes decentralized data fusion algorithms for a team of multiple autonomous platforms. Decentralized data fusion (DDF) provides a useful basis on which to build cooperative information gathering tasks for robotic teams operating in outdoor environments. Through the DDF algorithms, each platform can maintain a consistent global solution from which decisions may then be made. Two probabilistic representations for implementing DDF are compared using a common data set: the first uses Gaussian estimates and the second Gaussian mixtures. The overall system design is detailed, providing insight into the overall complexity of implementing a robust DDF system for information gathering tasks in outdoor UAV applications.
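Decentralized fusion of Gaussian estimates is commonly formulated in information form (information Y = P⁻¹, information vector y = P⁻¹x), where fusing two nodes amounts to adding their information and subtracting what they already share, the quantity a DDF channel filter keeps track of. A scalar sketch under those assumptions; the numbers are illustrative only, and the Gaussian-mixture variant compared in the chapter is not shown.

```python
def to_information(mean, var):
    """Scalar Gaussian (mean, variance) -> (information Y, information vector y)."""
    return 1.0 / var, mean / var


def from_information(Y, y):
    """(Y, y) -> (mean, variance)."""
    return y / Y, 1.0 / Y


def ddf_fuse(local, remote, common):
    """Add both nodes' information and remove the shared (common) information once."""
    Y = local[0] + remote[0] - common[0]
    y = local[1] + remote[1] - common[1]
    return Y, y


# Both nodes started from a shared prior N(2, 2), then each took its own observation.
common = to_information(2.0, 2.0)
local = to_information(2.0, 1.0)
remote = to_information(4.0, 1.0)
print(from_information(*ddf_fuse(local, remote, common)))            # consistent
print(from_information(*ddf_fuse(local, remote, (0.0, 0.0))))        # double-counts the prior
```

Skipping the subtraction counts the shared prior twice and yields an overconfident variance, which is exactly the inconsistency the channel filter exists to prevent.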
Abstract:
Background: Australian national biomonitoring for persistent organic pollutants (POPs) relies upon age-specific pooled serum samples to characterize central tendencies of concentrations but does not provide estimates of upper bound concentrations. This analysis compares population variation in biomonitoring datasets from the US, Canada, Germany, Spain, and Belgium to identify and test patterns potentially useful for estimating population upper bound reference values for the Australian population. Methods: Arithmetic means and the ratio of the 95th percentile to the arithmetic mean (P95:mean) were assessed by survey for defined age subgroups for three polychlorinated biphenyls (PCBs 138, 153, and 180), hexachlorobenzene (HCB), p,p′-dichlorodiphenyldichloroethylene (DDE), 2,2′,4,4′-tetrabrominated diphenyl ether (PBDE 47), perfluorooctanoic acid (PFOA) and perfluorooctane sulfonate (PFOS). Results: Arithmetic mean concentrations of each analyte varied widely across surveys and age groups. However, P95:mean ratios differed only to a limited extent, with no systematic variation across ages. The average P95:mean ratios were 2.2 for the three PCBs and HCB; 3.0 for DDE; and 2.0 and 2.3 for PFOA and PFOS, respectively. The P95:mean ratio for PBDE 47 was more variable among age groups, ranging from 2.7 to 4.8. The average P95:mean ratios accurately estimated age group-specific P95s in the Flemish Environmental Health Survey II and were used to estimate the P95s for the Australian population by age group from the pooled biomonitoring data. Conclusions: Similar population variation patterns for POPs were observed across multiple surveys, even when absolute concentrations differed widely. These patterns can be used to estimate population upper bounds when only pooled sampling data are available.
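The estimation step amounts to multiplying a pooled (arithmetic mean) concentration by the average P95:mean ratio for the analyte. A small Python sketch using the average ratios reported above (PBDE 47 is left out because its ratio varied from 2.7 to 4.8 across age groups); the pooled mean in the example is a made-up input, and `statistics.quantiles` (Python 3.8+, default exclusive method) is only one of several percentile definitions a survey might use.

```python
from statistics import mean, quantiles

# Average P95:mean ratios as reported in the abstract.
AVG_P95_TO_MEAN = {
    "PCB 138": 2.2, "PCB 153": 2.2, "PCB 180": 2.2, "HCB": 2.2,
    "DDE": 3.0, "PFOA": 2.0, "PFOS": 2.3,
}


def p95_to_mean_ratio(values):
    """P95:mean for one survey subgroup of individual measurements."""
    p95 = quantiles(values, n=20)[-1]  # last of 19 cut points = 95th percentile
    return p95 / mean(values)


def estimate_p95(pooled_mean, analyte):
    """Upper-bound estimate from a pooled (arithmetic mean) concentration."""
    return pooled_mean * AVG_P95_TO_MEAN[analyte]


# Hypothetical pooled PFOS mean of 10 ng/mL for one age group.
print(estimate_p95(10.0, "PFOS"))
```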
Abstract:
The reliance on police data for counting road crash injuries can be problematic, as it is well known that not all road crash injuries are reported to police, which leads to the overall burden of road crash injuries being underestimated. The aim of this study was to use multiple linked data sources to estimate the extent of under-reporting of road crash injuries to police in the Australian state of Queensland. Data from the Queensland Road Crash Database (QRCD), the Queensland Hospital Admitted Patients Data Collection (QHAPDC), the Emergency Department Information System (EDIS), and the Queensland Injury Surveillance Unit (QISU) for the year 2009 were linked. The completeness of road crash cases reported to police was examined via discordance rates between the police data (QRCD) and the hospital data collections. In addition, the potential bias of this discordance (under-reporting) was assessed based on gender, age, road user group, and regional location. Results showed that the level of under-reporting varied depending on the data set with which the police data were compared. When all hospital data collections were examined together, the estimated population of road crash injuries was approximately 28,000, with around two-thirds not linking to any record in the police data. The results also showed that under-reporting was more likely for motorcyclists, cyclists, males, young people, and injuries occurring in Remote and Inner Regional areas. These results have important implications for road safety research and policy in terms of prioritising funding and resources, targeting road safety interventions to areas of higher risk, and estimating the burden of road crash injuries.
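After record linkage, a discordance (under-reporting) rate is the share of hospital-identified cases with no linked police record, and it depends on which hospital collection, or union of collections, serves as the reference. A minimal sketch, assuming linkage has already mapped every record to a shared case identifier; the identifiers below are hypothetical.

```python
def under_reporting_rate(hospital_cases, police_linked):
    """Share of hospital-identified road crash cases with no linked police record."""
    unmatched = hospital_cases - police_linked
    return len(unmatched) / len(hospital_cases)


# Hypothetical linked case identifiers per collection.
qhapdc = {"c1", "c2", "c3"}
edis = {"c3", "c4", "c5", "c6"}
qisu = {"c6", "c7"}
police_linked = {"c2", "c5"}   # cases that link to a QRCD record

for name, cases in [("QHAPDC", qhapdc), ("EDIS", edis), ("QISU", qisu)]:
    print(name, under_reporting_rate(cases, police_linked))
print("All", under_reporting_rate(qhapdc | edis | qisu, police_linked))
```

Taking the union before counting ensures that a person who appears in several hospital collections is counted once in the estimated injury population.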
Abstract:
Fusing data from multiple sensing modalities, e.g. laser and radar, is a promising approach to achieve resilient perception in challenging environmental conditions. However, this may lead to catastrophic fusion in the presence of inconsistent data, i.e. when the sensors do not detect the same target due to distinct attenuation properties. It is often difficult to discriminate consistent from inconsistent data across sensing modalities using local spatial information alone. In this paper we present a novel consistency test based on the log marginal likelihood of a Gaussian process model that evaluates data from range sensors in a relative manner. A new data point is deemed to be consistent if the model statistically improves as a result of its fusion. This approach avoids the need for absolute spatial distance threshold parameters as required by previous work. We report results from object reconstruction with both synthetic and experimental data that demonstrate an improvement in reconstruction quality, particularly in cases where data points are inconsistent yet spatially proximal.
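The exact statistical criterion is not given here, so the sketch below substitutes a hypothetical relative rule: compute the Gaussian process log marginal likelihood, log p(y | X) = −½ yᵀK⁻¹y − ½ log|K| − (n/2) log 2π, and accept a new range point if the per-point log marginal likelihood does not drop once the point is fused. The squared-exponential kernel, its hyperparameters, the noise variance and the 1-D surface profile are all assumptions, and no distance threshold is involved.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)


def log_marginal_likelihood(x, y, noise_var=0.01, **kern):
    """log p(y | x) for a zero-mean GP, computed via a Cholesky factorization."""
    K = rbf_kernel(x, x, **kern) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^-1 y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # = 0.5 * log|K|
            - 0.5 * len(x) * np.log(2.0 * np.pi))


def is_consistent(x, y, x_new, y_new, **gp):
    """Hypothetical relative test: does fusing the point keep the average
    (per-point) log marginal likelihood from decreasing?"""
    before = log_marginal_likelihood(x, y, **gp) / len(x)
    x2, y2 = np.append(x, x_new), np.append(y, y_new)
    after = log_marginal_likelihood(x2, y2, **gp) / len(x2)
    return bool(after >= before)


# Surface profile seen by one sensor; a second sensor adds one point.
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x)
print(is_consistent(x, y, 2.5, np.sin(2.5)))        # same surface
print(is_consistent(x, y, 2.5, np.sin(2.5) + 3.0))  # different target
```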
Abstract:
Although live VM migration has been intensively studied, the problem of live migration of multiple interdependent VMs has hardly been investigated. The most important problem in the live migration of multiple interdependent VMs is how to schedule VM migrations as the schedule will directly affect the total migration time and the total downtime of those VMs. Aiming at minimizing both the total migration time and the total downtime simultaneously, this paper presents a Strength Pareto Evolutionary Algorithm 2 (SPEA2) for the multi-VM migration scheduling problem. The SPEA2 has been evaluated by experiments, and the experimental results show that the SPEA2 can generate a set of VM migration schedules with a shorter total migration time and a shorter total downtime than an existing genetic algorithm, namely Random Key Genetic Algorithm (RKGA). This paper also studies the scalability of the SPEA2.
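SPEA2 ranks candidate schedules by Pareto dominance: each solution's strength is the number of solutions it dominates, and its raw fitness is the sum of the strengths of the solutions that dominate it, so non-dominated schedules score 0. A sketch of that step for the two objectives named here (total migration time and total downtime, both minimized); the density estimate, archive truncation and variation operators of the full algorithm are omitted, and the objective values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def spea2_raw_fitness(objectives):
    """Raw fitness R(i) = sum of strengths S(j) over all j that dominate i."""
    n = len(objectives)
    strength = [sum(dominates(objectives[i], objectives[j]) for j in range(n))
                for i in range(n)]
    return [sum(strength[j] for j in range(n) if dominates(objectives[j], objectives[i]))
            for i in range(n)]


# Hypothetical schedules as (total migration time, total downtime).
schedules = [(10, 2), (8, 3), (12, 1), (11, 3)]
raw = spea2_raw_fitness(schedules)
print(raw)
print([s for s, r in zip(schedules, raw) if r == 0])  # non-dominated trade-offs
```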
Abstract:
Though difficult, the study of gene-environment interactions in multifactorial diseases is crucial for interpreting the relevance of non-heritable factors and prevents genetic associations with small but measurable effects from being overlooked. We propose a "candidate interactome" (i.e. a group of genes whose products are known to physically interact with environmental factors that may be relevant for disease pathogenesis) analysis of genome-wide association data in multiple sclerosis. We looked for statistical enrichment of associations among interactomes that, at the current state of knowledge, may be representative of gene-environment interactions of potential, uncertain or unlikely relevance for multiple sclerosis pathogenesis: Epstein-Barr virus, human immunodeficiency virus, hepatitis B virus, hepatitis C virus, cytomegalovirus, HHV8-Kaposi sarcoma, H1N1-influenza, JC virus, the human innate immunity interactome for type I interferon, autoimmune regulator, vitamin D receptor, aryl hydrocarbon receptor, and a panel of proteins targeted by 70 innate immune-modulating viral open reading frames from 30 viral species. Interactomes were either obtained from the literature or manually curated. The P values of all single nucleotide polymorphisms mapping to a given interactome were obtained from the most recent genome-wide association study of the International Multiple Sclerosis Genetics Consortium and the Wellcome Trust Case Control Consortium 2. The interaction between genotype and Epstein-Barr virus emerges as relevant for multiple sclerosis etiology. However, in line with recent data on the coexistence of common and unique strategies used by viruses to perturb the human molecular system, other viruses also have a similar potential, though they are probably less relevant in epidemiological terms. © 2013 Mechelli et al.
Abstract:
Environmental data, such as water quality data, usually include measurements that fall below detection limits because of limitations of the instruments or of the analytical methods used. The fact that some responses are not detected needs to be properly taken into account in the statistical analysis of such data. However, it is well known that analyzing a data set with detection limits is challenging, and we often have to rely on traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference, and justifying a distribution is often not possible when the data are correlated and a large proportion of the data lie below detection limits. The extent of bias is usually unknown. To draw valid conclusions, and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to water quality data collected in the Susquehanna River Basin in the United States of America, which clearly demonstrates the advantages of the rank regression models.
Abstract:
The extended recruitment season for short-lived species such as prawns biases the estimation of growth parameters from length-frequency data when conventional methods are used. We propose a simple method for overcoming this bias given a time series of length-frequency data. The difficulties arising from extended recruitment are eliminated by predicting the growth of the succeeding samples and the length increments of the recruits in previous samples. This method requires that some maximum size at recruitment can be specified. The advantages of this multiple length-frequency method are: it is simple to use; it requires only three parameters; no specific distributions need to be assumed; and the actual seasonal recruitment pattern does not have to be specified. We illustrate the new method with length-frequency data on the tiger prawn Penaeus esculentus from the north-western Gulf of Carpentaria, Australia.