982 results for Data handling


Relevance: 30.00%

Publisher:

Abstract:

Modern technology has allowed real-time data collection in a variety of domains, ranging from environmental monitoring to healthcare. Consequently, there is a growing need for algorithms capable of performing inferential tasks in an online manner, continuously revising their estimates to reflect the current status of the underlying process. In particular, we are interested in constructing online and temporally adaptive classifiers capable of handling the possibly drifting decision boundaries arising in streaming environments. We first make a quadratic approximation to the log-likelihood that yields a recursive algorithm for fitting logistic regression online. We then suggest a novel way of equipping this framework with self-tuning forgetting factors. The resulting scheme is capable of tracking changes in the underlying probability distribution, adapting the decision boundary appropriately and hence maintaining high classification accuracy in dynamic or unstable environments. We demonstrate the scheme's effectiveness in both real and simulated streaming environments. © Springer-Verlag 2009.
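
The core recursive update described here can be sketched as follows. This is a minimal illustration under a quadratic (Newton-style) approximation of the log-likelihood, using a fixed forgetting factor; the self-tuning forgetting factors that are the paper's main contribution are not reproduced, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_logistic(stream, dim, forget=0.98, ridge=1e-3):
    """Recursive logistic regression with a fixed forgetting factor (sketch).

    `stream` yields (x, y) pairs with x an array of length `dim` and y in {0, 1}.
    A quadratic approximation of the log-likelihood gives a recursive Newton-like
    update; `forget` < 1 down-weights old observations so the decision boundary
    can track drift in the underlying distribution.
    """
    w = np.zeros(dim)
    M = ridge * np.eye(dim)                              # discounted information matrix
    for x, y in stream:
        x = np.asarray(x, dtype=float)
        p = sigmoid(w @ x)
        M = forget * M + p * (1 - p) * np.outer(x, x)    # forget old curvature, add new
        w = w + np.linalg.solve(M, (y - p) * x)          # Newton-like step on new point
        yield w.copy()                                   # current decision boundary
```

In a streaming setting the generator would be consumed one observation at a time, with the yielded weight vector used to classify the next arriving point.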

Relevance: 30.00%

Publisher:

Abstract:

Molecular markers have been demonstrated to be useful for the estimation of stock mixture proportions where the origin of individuals is determined from baseline samples. Bayesian statistical methods are widely recognized as providing a preferable strategy for such analyses. In general, Bayesian estimation is based on standard latent class models using data augmentation through Markov chain Monte Carlo techniques. In this study, we introduce a novel approach based on recent developments in the estimation of genetic population structure. Our strategy combines analytical integration with stochastic optimization to identify stock mixtures. An important enhancement over previous methods is the possibility of appropriately handling data where only partial baseline sample information is available. We address the potential use of nonmolecular, auxiliary biological information in our Bayesian model.
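
The basic computational problem, estimating mixture proportions when each baseline stock assigns a likelihood to every sampled individual's genotype, can be illustrated with a simple EM iteration. This is only a sketch of the standard finite-mixture setting; the paper's method (analytical integration with stochastic optimization, handling partial baseline information) is more involved, and all names are illustrative.

```python
import numpy as np

def em_mixture_proportions(lik, n_iter=200):
    """Estimate stock mixture proportions by EM (illustrative).

    `lik` is an (n_individuals, n_stocks) array: lik[i, k] is the likelihood of
    individual i's multilocus genotype under baseline stock k, assumed known here.
    """
    n, K = lik.shape
    pi = np.full(K, 1.0 / K)                       # start from equal proportions
    for _ in range(n_iter):
        post = lik * pi                            # E-step: unnormalised memberships
        post = post / post.sum(axis=1, keepdims=True)
        pi = post.mean(axis=0)                     # M-step: average membership per stock
    return pi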

Relevance: 30.00%

Publisher:

Abstract:

The baseline survey conducted for the project in 2009 had gaps that prevented assessment of project performance against the outcome and impact indicators. This study was therefore commissioned to reconstruct the baseline data, aligned to the impact and outcome indicators in the project logframe and results framework, against which project achievements could be assessed. The purpose and scope of the study were to reconstruct baseline data and analysis describing the situation prior to QAFM Project inception, taking 2008 as the baseline year and aligning with the project logframe outcome and impact indicators, and to collect data on the current status so that project outcomes (and where possible impacts) at improved fish handling sites could be compared both with the baseline and with comparable non-improved fish landing sites serving as a control group. The study drew on secondary data from sources at NaFIRRI, DFR and ICEIDA. Field data were collected through a sample survey covering 312 respondents, including boat and gear owners, crew members, processors and traders, at eight project and two control landing sites. Key Informant Interviews were conducted with DFOs and BMU leaders in the study districts and at the landing sites, respectively.

Relevance: 30.00%

Publisher:

Abstract:

RFID is a technology that enables the automated capture of observations of uniquely identified physical objects as they move through supply chains. Discovery Services provide links to repositories that hold traceability information about specific physical objects. Each supply chain party publishes records to a Discovery Service to create such links and also specifies access control policies to restrict who has visibility of link information, since it is commercially sensitive and could reveal inventory levels, flow patterns, trading relationships, etc. The requirement to share information on a need-to-know basis, e.g. within the specific chain of custody of an individual object, poses a particular challenge for authorization and access control: in many supply chain situations the information owner may not know in advance all the companies that should be authorized to view the information, because the path taken by an individual physical object only emerges over time rather than being fully pre-determined at the time of manufacture. This led us to consider novel approaches to delegating trust and controlling access to information. This paper presents an assessment of visibility restriction mechanisms for Discovery Services capable of handling emergent object paths. We compare three approaches: enumerated access control (EAC), chain-of-communication tokens (CCT), and chain-of-trust assertions (CTA). A cost model was developed to estimate the additional cost of restricting visibility in a baseline traceability system, and the estimates were used to compare the approaches and to discuss the trade-offs. © 2012 IEEE.
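
The contrast between enumerating authorized parties up front and deriving authorization from the emergent chain of custody can be sketched as below. The record structures and checks are hypothetical illustrations of the two ideas, not the protocols or message formats assessed in the paper.

```python
def eac_visible(record, requester):
    """Enumerated access control: the owner listed every authorised party
    when the record was published (illustrative record structure)."""
    return requester in record["authorized_parties"]

def chain_of_trust_visible(assertions, owner, requester):
    """Chain-of-trust style check: authorisation follows custody hand-overs.

    `assertions` is a set of (from_party, to_party) hand-over assertions.
    The requester may read if a custody path links the owner to it, even
    though the owner never knew the requester at publication time.
    """
    frontier, seen = {owner}, {owner}
    while frontier:
        nxt = {b for (a, b) in assertions if a in frontier and b not in seen}
        seen |= nxt
        frontier = nxt
    return requester in seen
```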

Relevance: 30.00%

Publisher:

Abstract:

Personal communication devices are increasingly being equipped with sensors that are able to passively collect information from their surroundings, information that could be stored in fairly small local caches. We envision a system in which users of such devices use their collective sensing, storage, and communication resources to query the state of (possibly remote) neighborhoods. The goal of such a system is to achieve the highest query success ratio using the least communication overhead (power). We show that the use of Data Centric Storage (DCS), or directed placement, is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose amorphous placement, in which sensory samples are cached locally and informed exchanges of cached samples are used to diffuse the sensory data throughout the whole network. In handling queries, the local cache is searched first for potential answers. If unsuccessful, the query is forwarded to one or more direct neighbors for answers. This technique leverages node mobility and caching capabilities to avoid the multi-hop communication overhead of directed placement. Using a simplified mobility model, we provide analytical lower and upper bounds on the ability of amorphous placement to achieve uniform field coverage in one and two dimensions. We show that combining informed shuffling of cached samples upon an encounter between two nodes with the querying of direct neighbors can lead to significant performance improvements. For instance, under realistic mobility models, our simulation experiments show that amorphous placement achieves a 10% to 40% better query answering ratio at a 25% to 35% saving in consumed power over directed placement.
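
The query-handling policy, answer from the local cache if possible and otherwise ask one-hop neighbours, can be sketched as below. The node representation and cache contents are hypothetical; this is only the lookup step, not the informed cache-shuffling or the analytical coverage bounds.

```python
class Node:
    """A mobile device with a small local cache of sensory samples (illustrative)."""
    def __init__(self, samples=None):
        self.cache = dict(samples or {})          # region identifier -> cached sample

def handle_query(node, region, neighbours):
    """Amorphous-placement style query handling: search the local cache first,
    then ask direct (one-hop) neighbours, avoiding multi-hop routing."""
    if region in node.cache:
        return node.cache[region]
    for peer in neighbours:
        if region in peer.cache:
            return peer.cache[region]
    return None   # miss; coverage relies on node mobility and cache shuffling
```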

Relevance: 30.00%

Publisher:

Abstract:

Training data for supervised learning neural networks can be clustered such that the input/output pairs in each cluster are redundant. Redundant training data can adversely affect training time. In this paper we apply two clustering algorithms, ART2-A and the Generalized Equality Classifier, to identify training data clusters and thus reduce the training data and training time. The approach is demonstrated for a high-dimensional nonlinear continuous-time mapping. The demonstration shows a six-fold decrease in training time with little or no loss of accuracy on evaluation data.
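
The reduction idea, cluster the input/output pairs and keep one representative per cluster, can be sketched as follows. ART2-A and the Generalized Equality Classifier are not available in common libraries, so k-means is used here purely as a stand-in.

```python
import numpy as np
from sklearn.cluster import KMeans

def reduce_training_set(X, y, n_clusters):
    """Cluster the joint (input, output) pairs and keep one representative per
    cluster, discarding redundant examples before network training.
    k-means is a stand-in for the ART2-A / Generalized Equality Classifier
    algorithms used in the paper."""
    pairs = np.hstack([X, np.asarray(y).reshape(len(y), -1)])   # cluster in joint space
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pairs)
    keep = [np.flatnonzero(labels == c)[0]
            for c in range(n_clusters) if np.any(labels == c)]
    return X[keep], y[keep]
```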

Relevance: 30.00%

Publisher:

Abstract:

The contribution of buildings to total worldwide energy consumption in developed countries is between 20% and 40%. Energy consumption by Heating, Ventilation and Air Conditioning (HVAC) systems, and more specifically by Air Handling Units (AHUs), accounts on average for 40% of a typical medical device manufacturing or pharmaceutical facility's energy consumption. Studies have indicated that 20-30% energy savings are achievable by recommissioning HVAC systems, and more specifically AHU operations, to rectify faulty operation. Automated Fault Detection and Diagnosis (AFDD) is a process concerned with potentially automating, partially or fully, the commissioning process through the detection of faults. An expert system is a knowledge-based system which employs Artificial Intelligence (AI) methods to replicate the knowledge of a human subject matter expert in a particular field, such as engineering, medicine, finance or marketing. This thesis details the research and development work undertaken in developing and testing a new AFDD expert system for AHUs which can be installed, with minimal set-up time, on a large cross-section of AHU types in a building-management-system vendor-neutral manner. Both simulated and extensive field testing were undertaken against a widely available and industry-known expert rule set, the Air Handling Unit Performance Assessment Rules (APAR) (and a later, more developed version known as APAR_extended), in order to prove its effectiveness. Specifically, in tests against a dataset of 52 simulated faults, the new AFDD expert system identified all 52 derived issues whereas the APAR rule set identified just 10. In tests using actual field data from 5 operating AHUs in 4 manufacturing facilities, the newly developed AFDD expert system for AHUs was shown to identify four individual fault case categories that the APAR method did not, as well as showing improvements in the area of fault diagnosis.
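
A single illustrative rule in the spirit of rule-based AHU fault detection (a simplified APAR-style check, not the published rule set and not the expert system developed in the thesis): in heating mode the supply air temperature should exceed the mixed air temperature plus the small rise across the supply fan, because the heating coil adds heat.

```python
def heating_mode_rule(t_supply, t_mixed, fan_rise=1.1, tol=0.5):
    """Illustrative, simplified AHU fault-detection rule.

    In heating mode the supply air temperature (t_supply, degrees C) should be
    above the mixed air temperature (t_mixed) plus the temperature rise across
    the supply fan; if it is not, flag a possible heating-coil or sensor fault.
    The thresholds here are placeholder values."""
    return t_supply < t_mixed + fan_rise - tol   # True -> potential fault
```

An AFDD engine of this kind evaluates many such rules against trended building-management-system data and reports which rules fired and when.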

Relevance: 30.00%

Publisher:

Abstract:

It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2^70 bytes), and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of the digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising such systems have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. This multiplicity of analysis techniques introduces another layer of heterogeneity, namely heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question asked is whether a generic solution can be identified for the monitoring and analysis of data that accommodates temporal constraints, bridges the gap between expert knowledge and raw data, and enables data to be effectively interpreted and exploited in a transparent manner. The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating those techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating hidden bias. To illustrate and realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions. To demonstrate these concepts, a complex real-world example involving the near real-time capture and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, because, given the dearth of neurophysiologists, there is a real-world need to provide a solution for this domain.
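
The workflow idea, each processing step applied to the raw data leaves a provenance record that a third party can inspect, can be sketched minimally as below. The record fields and step names are hypothetical, not the platform's actual schema.

```python
import hashlib
import json
import time

def run_step(name, func, data, provenance):
    """Apply one analysis step and append a provenance record so an independent
    third party can see what was done to the data and in what order
    (illustrative fields only)."""
    digest_in = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    result = func(data)
    provenance.append({"step": name, "input_sha256": digest_in, "timestamp": time.time()})
    return result

provenance = []
raw = {"samples": [1, 2, 3]}
cleaned = run_step("clean", lambda d: {"samples": [s for s in d["samples"] if s > 1]},
                   raw, provenance)
```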

Relevance: 30.00%

Publisher:

Abstract:

BACKGROUND: Dropouts and missing data are nearly ubiquitous in obesity randomized controlled trials, threatening the validity and generalizability of conclusions. Herein, we meta-analytically evaluate the extent of missing data, the frequency with which various analytic methods are employed to accommodate dropouts, and the performance of multiple statistical methods. METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed and Cochrane databases (2000-2006) for articles published in English and manually searched bibliographic references. Articles on pharmaceutical randomized controlled trials with weight loss or weight gain prevention as major endpoints were included. Two authors independently reviewed each publication for inclusion; 121 articles met the inclusion criteria. Two authors independently extracted treatment, sample size, drop-out rates, study duration, and the statistical method used to handle missing data from all articles and resolved disagreements by consensus. In the meta-analysis, drop-out rates were substantial, with the survival (non-dropout) rates being approximated by an exponential decay curve e^(-λt), where λ was estimated to be 0.0088 (95% bootstrap confidence interval: 0.0076 to 0.0100) and t represents time in weeks. The estimated drop-out rate at 1 year was 37%. Most studies used last observation carried forward as the primary analytic method to handle missing data. We also obtained 12 raw obesity randomized controlled trial datasets for empirical analyses. Analyses of the raw randomized controlled trial data suggested that both mixed models and multiple imputation performed well, but that multiple imputation may be more robust when missing data are extensive. CONCLUSION/SIGNIFICANCE: Our analysis offers an equation for predicting dropout rates, useful for future study planning. Our raw data analyses suggest that multiple imputation is better than other methods for handling missing data in obesity randomized controlled trials, followed closely by mixed models. We suggest these methods supplant last observation carried forward as the primary method of analysis.
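
A quick check of the reported figures: with the estimated hazard λ = 0.0088 per week, the survival (non-dropout) fraction after one year is e^(-0.0088 × 52) ≈ 0.63, i.e. a dropout rate of roughly 37%, matching the abstract.

```python
import math

lam = 0.0088                        # weekly dropout hazard from the meta-analysis
survival_1yr = math.exp(-lam * 52)  # non-dropout fraction after 52 weeks, ~0.63
dropout_1yr = 1 - survival_1yr      # ~0.37, matching the reported 37%
```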

Relevance: 30.00%

Publisher:

Abstract:

This paper is concerned with handling uncertainty as part of the analysis of data from a medical study. The study is investigating connections between the birth weight of babies and the dietary intake of their mothers. Bayesian belief networks were used in the analysis. Their perceived benefits include (i) an ability to represent the evidence emerging from the evolving study, dealing effectively with the inherent uncertainty involved; (ii) providing a way of representing evidence graphically to facilitate analysis and communication with clinicians; (iii) helping in the exploration of the data to reveal undiscovered knowledge; and (iv) providing a means of developing an expert system application.
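
A minimal two-node illustration of the kind of belief-network reasoning described: the variables, states and probabilities below are invented for illustration and are not taken from the study.

```python
# Hypothetical two-node belief network: maternal dietary intake -> birth weight.
p_diet_adequate = 0.7                      # prior P(diet adequate)
p_low_bw = {True: 0.05, False: 0.20}       # P(low birth weight | diet adequate?)

# Posterior P(diet adequate | low birth weight observed), by Bayes' rule:
p_low = (p_low_bw[True] * p_diet_adequate
         + p_low_bw[False] * (1 - p_diet_adequate))
p_diet_given_low = p_low_bw[True] * p_diet_adequate / p_low
print(round(p_diet_given_low, 3))          # ~0.368: evidence lowers the prior of 0.7
```

Real networks in such a study would have many more nodes and conditional probability tables elicited from data and clinicians, but the evidence-propagation step is of this form.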

Relevance: 30.00%

Publisher:

Abstract:

In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. The problem of how to account for background damage from experimental handling is also carefully considered. We have shown that the conventional procedure of subtracting the background damage from the experimental data may lead to erroneous conclusions during the analysis of both initial fragmentation and DSB rejoining. Despite its relative simplicity, the method presented allows both the quantitative and qualitative description of radiation-induced DNA fragmentation and subsequent rejoining of double-stranded DNA fragments. (C) 2004 by Radiation Research Society.
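
The initial-damage step of a clustered-breakage simulation has the three free parameters named in the abstract; a minimal sketch follows, with distributional choices (Poisson multiplicity, uniform placement) and units as simplifying assumptions rather than the authors' exact model.

```python
import numpy as np

def simulate_clustered_dsbs(genome_length, n_clusters, mean_multiplicity,
                            max_radius, rng=None):
    """Simulate initial radiation-induced DSB positions (illustrative).

    Free parameters as in the abstract: number of independently located
    clusters, average DSBs per cluster (multiplicity), and maximum radius
    within which DSBs of the same cluster are placed.  Random breakage is
    recovered as a special case, e.g. multiplicity 1."""
    rng = np.random.default_rng() if rng is None else rng
    breaks = []
    centres = rng.uniform(0, genome_length, size=n_clusters)   # cluster locations
    for c in centres:
        k = max(1, rng.poisson(mean_multiplicity))             # DSBs in this cluster
        offsets = rng.uniform(-max_radius, max_radius, size=k)
        breaks.extend(np.clip(c + offsets, 0, genome_length))
    return np.sort(np.array(breaks))
```

Fragment-size distributions derived from these break positions could then be compared with PFGE measurements, with background breakage simulated and added rather than simply subtracted.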

Relevance: 30.00%

Publisher:

Abstract:

The member states of the European Union are faced with the challenges of handling "big data" as well as with the growing impact of the supranational level. Given that the success of efforts at European level strongly depends on corresponding national and local activities, i.e., the quality of implementation and the degree of consistency, this chapter centers on the coherence of European strategies and national implementations concerning the reuse of public sector information. Taking the City of Vienna's open data activities as an illustrative example, we seek an answer to the question of whether, and to what extent, developments at European level and other factors have an effect on local efforts towards open data. We find that the European Commission's ambitions are driven by a strong economic argumentation, while the efforts of the City of Vienna have very little to do with the European orientation and are instead dominated by lifestyle and administrative reform arguments. Hence, we observe a decoupling of supranational strategies and national implementation activities. The very reluctant attitude at Austrian federal level might be one reason for this; nationally induced barriers, such as the administrative culture, might be another. In order to enhance the correspondence between the strategies of the supranational level and those of the implementers at national and regional levels, the strengthening of soft law measures could be promising.

Relevance: 30.00%

Publisher:

Abstract:

Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness is to discard patients with missing covariates from the analysis, further reducing the sample size. Alternatively, if the mechanism that generated the missing data allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods that require complete data to be applied later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding results similar to the original ones. Instead, our methodology models the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation-maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).
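
The overall pipeline, impute the incomplete covariates from a model of their dependencies and then fit the survival tree on the completed data, can be sketched with a generic model-based imputer standing in for the Bayesian network learned by structural EM in the paper; `fit_survival_tree` is a placeholder for whatever survival tree implementation is available.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_then_stratify(X_incomplete, fit_survival_tree):
    """Model-based imputation of missing covariates followed by survival-tree
    stratification (sketch).  IterativeImputer is only a stand-in for the
    Bayesian network used in the paper; the survival tree itself is supplied
    by the caller."""
    X_complete = IterativeImputer(max_iter=20, random_state=0).fit_transform(X_incomplete)
    return fit_survival_tree(X_complete)
```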

Relevance: 30.00%

Publisher:

Abstract:

Perfect information is seldom available to man or machines due to uncertainties inherent in real-world problems. Uncertainties in geographic information systems (GIS) stem from either vague/ambiguous or imprecise/inaccurate/incomplete information, and it is necessary for GIS to develop tools and techniques to manage these uncertainties. There is widespread agreement in the GIS community that, although GIS has the potential to support a wide range of spatial data analysis problems, this potential is often hindered by a lack of consistency and uniformity. Uncertainties come in many shapes and forms, and processing uncertain spatial data requires a practical taxonomy to aid decision makers in choosing the most suitable data modeling and analysis method. In this paper, we: (1) review important developments in handling uncertainties when working with spatial data and GIS applications; (2) propose a taxonomy of models for dealing with uncertainties in GIS; and (3) identify current challenges and future research directions in spatial data analysis and GIS for managing uncertainties.

Relevance: 30.00%

Publisher:

Abstract:

This paper demonstrates the unparalleled value of full-scale data acquired from ocean trials of Aquamarine Power's Oyster 800 Wave Energy Converter (WEC) at the European Marine Energy Centre (EMEC), Orkney, Scotland.
High quality prototype and wave data were simultaneously recorded in over 750 distinct sea states (comprising different wave height, wave period and tidal height combinations) and include periods of operation where the hydraulic Power Take-Off (PTO) system was both pressurised (damped operation) and de-pressurised (undamped operation).
A detailed model-prototype correlation procedure is presented in which the full-scale prototype behaviour is compared to predictions from both experimental and numerical modelling techniques via a high-temporal-resolution wave-by-wave reconstruction. This provides a definitive verification of the capabilities of such research techniques and allows a robust and meaningful uncertainty analysis to be performed on their outputs.
The importance of a good data capture methodology, both in terms of handling and accuracy, is also presented. The techniques and procedures implemented by Aquamarine Power for real-time data management are discussed, including lessons learned on the instrumentation and infrastructure required to collect high-value data.