989 results for Imprecise Data
Abstract:
Objective: To determine whether there are clinical and public health dilemmas resulting from the reproducibility of routine vitamin D assays. Methods: Blinded agreement studies were conducted in eight clinical laboratories in Australasia and Canada using two commonly used assays to measure serum 25-hydroxyvitamin D (25(OH)D) levels (the DiaSorin Radioimmunoassay (RIA) and the DiaSorin LIAISON®). Results: Only one laboratory measured 25(OH)D with excellent precision. Replicate 25(OH)D measurements varied by up to 97%, and 15% of paired results differed by more than 50%. Thirteen percent of subjects received one result indicating insufficiency (25-50 nmol/l) and another suggesting adequacy (>50 nmol/l). Agreement ranged from poor to excellent for laboratories using the manual RIA, while the precision of the semi-automated LIAISON® system was consistently poor. Conclusions: Recent interest in the relevance of vitamin D to human health has increased demand for 25(OH)D testing and associated costs. Our results suggest clinicians and public health authorities are making decisions about treatment or changes to public health policy based on imprecise data. Clinicians, researchers and policy makers should be made aware of the imprecision of current 25(OH)D testing so that they exercise caution when using these assays in clinical practice and when interpreting the findings of epidemiological studies based on vitamin D levels measured with these assays. Development of a rapid, reproducible, accurate and robust assay should be a priority, given the interest in population-based screening programs and in research to inform public health policy about the amount of sun exposure required for human health. In the interim, 25(OH)D results should routinely include a statement of measurement uncertainty.
Abstract:
Data envelopment analysis (DEA), as introduced by Charnes, Cooper, and Rhodes (1978), is a linear programming technique that has been widely used to evaluate the relative efficiency of a set of homogeneous decision making units (DMUs). In many real applications, the input-output variables cannot be precisely measured. This is particularly important when assessing the efficiency of DMUs with DEA, since the efficiency scores of inefficient DMUs are very sensitive to possible data errors. Hence, several approaches have been proposed to deal with imprecise data. Perhaps the most popular fuzzy DEA model is based on the α-cut. One drawback of the α-cut approach is that it cannot include all information about uncertainty. This paper aims to introduce an alternative linear programming model that can include some uncertainty information from the intervals within the α-cut approach. We introduce the concept of a "local α-level" to develop a multi-objective linear programming model to measure the efficiency of DMUs under uncertainty. An example is given to illustrate the use of this method.
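The α-cut idea can be sketched in the single-input, single-output case, where the DEA efficiency score reduces to a ratio against the best peer. This is only an illustration with hypothetical interval data: the upper bound places the evaluated DMU at its most favourable point and its peers at their least favourable points, and the lower bound does the opposite; the full multi-input model would replace the ratios with a pair of linear programs per DMU.

```python
def interval_efficiency(dmus):
    """dmus: list of ((x_lo, x_hi), (y_lo, y_hi)) input/output intervals,
    e.g. the alpha-cuts of fuzzy data at a fixed alpha level.
    Returns an (eff_lo, eff_hi) efficiency interval per DMU."""
    results = []
    for k, (xk, yk) in enumerate(dmus):
        opt_k = yk[1] / xk[0]   # DMU k at max output / min input
        pes_k = yk[0] / xk[1]   # DMU k at min output / max input
        # frontier of peers at their pessimistic / optimistic points
        peers_pes = max([yj[0] / xj[1] for j, (xj, yj) in enumerate(dmus)
                         if j != k] + [opt_k])
        peers_opt = max([yj[1] / xj[0] for j, (xj, yj) in enumerate(dmus)
                         if j != k] + [pes_k])
        results.append((pes_k / peers_opt, opt_k / peers_pes))
    return results
```

The width of each resulting interval shows how sensitive the efficiency score is to the data imprecision, which is exactly the sensitivity the abstract warns about.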
Abstract:
Although crisp data are fundamentally indispensable for determining the profit Malmquist productivity index (MPI), the observed values in real-world problems are often imprecise or vague. These imprecise or vague data can be suitably characterized with fuzzy and interval methods. In this paper, we reformulate the conventional profit MPI problem as an imprecise data envelopment analysis (DEA) problem, and propose two novel methods for measuring the overall profit MPI when the inputs, outputs, and price vectors are fuzzy or vary in intervals. We develop a fuzzy version of the conventional MPI model by using a ranking method, and solve the model with a commercial off-the-shelf DEA software package. In addition, we define an interval for the overall profit MPI of each decision-making unit (DMU) and divide the DMUs into six groups according to the intervals obtained for their overall profit efficiency and MPIs. We also present two numerical examples to demonstrate the applicability of the two proposed models and exhibit the efficacy of the procedures and algorithms. © 2011 Elsevier Ltd.
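The interval grouping of DMUs can be illustrated with a minimal sketch. The efficiency intervals below are hypothetical, and the three-way split is a coarse version of the paper's six groups, which further refine it by the overall profit efficiency intervals.

```python
def mpi_interval(eff_t1, eff_t2):
    """Interval for a Malmquist-type productivity ratio when the period-1
    and period-2 efficiency scores are themselves intervals."""
    return (eff_t2[0] / eff_t1[1], eff_t2[1] / eff_t1[0])

def classify(mpi):
    """Group a DMU by where its MPI interval sits relative to 1."""
    lo, hi = mpi
    if lo > 1:
        return "progress"        # productivity improved in every scenario
    if hi < 1:
        return "regress"         # productivity declined in every scenario
    return "indeterminate"       # the interval straddles 1
```

A DMU is only unambiguously improving or declining when its whole interval lies on one side of 1; otherwise the imprecise data leave the direction of productivity change undecided.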
Abstract:
Lattice valued fuzziness is more general than crispness or fuzziness based on the unit interval. In this work, we present a query language for a lattice based fuzzy database. We define a Lattice Fuzzy Structured Query Language (LFSQL) taking its membership values from an arbitrary lattice L. LFSQL can handle, manage and represent crisp values, linear ordered membership degrees and also allows membership degrees from lattices with non-comparable values. This gives richer membership degrees, and hence makes LFSQL more flexible than FSQL or SQL. In order to handle vagueness or imprecise information, every entry into an L-fuzzy database is an L-fuzzy set instead of crisp values. All of this makes LFSQL an ideal query language to handle imprecise data where some factors are non-comparable. After defining the syntax of the language formally, we provide its semantics using L-fuzzy sets and relations. The semantics can be used in future work to investigate concepts such as functional dependencies. Last but not least, we present a parser for LFSQL implemented in Haskell.
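The non-comparable membership degrees that distinguish LFSQL from FSQL can be sketched with a powerset lattice in Python (illustrative only: the actual LFSQL parser is written in Haskell, and the criteria names here are invented). Meet is intersection, join is union, and {"price"} versus {"quality"} are non-comparable, which a single [0,1] membership degree cannot express.

```python
TOP = frozenset({"price", "quality"})   # lattice top: fully satisfied
BOT = frozenset()                       # lattice bottom: not satisfied

def meet(a, b):          # lattice AND
    return a & b

def join(a, b):          # lattice OR
    return a | b

def leq(a, b):           # lattice order: a <= b iff meet(a, b) == a
    return (a & b) == a

def select(rows, required):
    """Toy L-fuzzy SELECT: keep rows whose stored membership degree
    dominates the required degree in the lattice order."""
    return [r["name"] for r in rows if leq(required, r["deg"])]
```

A query requiring only the "price" aspect keeps rows satisfying price alone or both aspects, while a row that satisfies only "quality" is excluded without ever being forced onto a linear scale.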
Abstract:
This work presents a hybrid approach for the supplier selection problem in Supply Chain Management. We combined the decision-making perspectives of business school researchers and engineering researchers in order to address the problem more comprehensively. We used traditional multicriteria decision-making methods, AHP and TOPSIS, to evaluate alternatives according to the decision maker's preferences. Both techniques were modeled using definitions from fuzzy set theory to deal with imprecise data. Additionally, we propose a multiobjective GRASP algorithm to perform an order allocation procedure among the pre-selected alternatives; these alternatives must be pre-qualified on the basis of the AHP and TOPSIS methods before entering the LCR. Our allocation procedure showed low CPU times for five pseudorandom instances containing up to 1000 alternatives, as well as good values for all considered objectives. We therefore consider the proposed model appropriate for solving the supplier selection problem in the SCM context. It can help decision makers reduce lead times, costs and risks in their supply chains. According to decision makers, the proposed model can also improve a firm's efficiency with respect to business strategies, even when a large number of alternatives must be considered, unlike classical models in the purchasing literature.
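The TOPSIS evaluation step can be sketched in its crisp form; the fuzzy variant described above would replace each rating with a fuzzy number before ranking. The decision matrix, weights and criteria below are hypothetical.

```python
import math

def topsis(matrix, weights, benefit):
    """Rank alternatives by closeness to the ideal solution.
    matrix[i][j]: rating of alternative i on criterion j;
    benefit[j]: True if criterion j is to be maximised."""
    m, n = len(matrix), len(matrix[0])
    # vector-normalise each criterion column, then apply weights
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to ideal solution
        d_neg = math.dist(row, anti)    # distance to anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

An alternative that dominates on every criterion scores 1.0 and a dominated one scores 0.0; in the hybrid approach, the alternatives clearing a score threshold would then feed the GRASP order-allocation phase.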
Abstract:
The objective of this research is to investigate the consequences of sharing or using information generated in one phase of a project in subsequent life cycle phases. Sometimes the assumptions supporting the information change, and at other times the context within which the information was created changes in a way that invalidates the information. Often these inconsistencies are not discovered until the damage has occurred. This study builds on previous research that proposed a framework based on the metaphor of 'ecosystems' to model such inconsistencies in the 'supply chain' of life cycle information (Brokaw and Mukherjee, 2012). Such inconsistencies often result in litigation. Therefore, this paper studies a set of legal cases that resulted from inconsistencies in life cycle information, within the ecosystems framework. For each project, the errant information type, the creator and user of the information and their relationship, and the time of creation and usage of the information in the life cycle of the project are investigated to assess the causes of failure of precise and accurate information flow, as well as the impact of such failures in later stages of the project. The analysis shows that misleading information is mostly due to lack of collaboration. Moreover, in all the studied cases, lack of compliance checking, imprecise data and insufficient clarification hinder the accurate and smooth flow of information. The paper presents findings regarding the bottlenecks of the information flow process during the design, construction and post-construction phases. It also highlights the role of collaboration as well as information integration and management during the project life cycle, and presents a baseline for improvement of the information supply chain through the life cycle of the project.
Abstract:
Linear programming (LP) is the most widely used optimization technique for solving real-life problems because of its simplicity and efficiency. Although conventional LP models require precise data, managers and decision makers dealing with real-world optimization problems often do not have access to exact values. Fuzzy sets have been used in the fuzzy LP (FLP) problems to deal with the imprecise data in the decision variables, objective function and/or the constraints. The imprecisions in the FLP problems could be related to (1) the decision variables; (2) the coefficients of the decision variables in the objective function; (3) the coefficients of the decision variables in the constraints; (4) the right-hand-side of the constraints; or (5) all of these parameters. In this paper, we develop a new stepwise FLP model where fuzzy numbers are considered for the coefficients of the decision variables in the objective function, the coefficients of the decision variables in the constraints and the right-hand-side of the constraints. In the first step, we use the possibility and necessity relations for fuzzy constraints without considering the fuzzy objective function. In the subsequent step, we extend our method to the fuzzy objective function. We use two numerical examples from the FLP literature for comparison purposes and to demonstrate the applicability of the proposed method and the computational efficiency of the procedures and algorithms. © 2013-IOS Press and the authors. All rights reserved.
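The possibility relation used for fuzzy constraints in the first step can be illustrated for triangular fuzzy numbers, where Pos(A ≤ B) is the height at which A's rising side meets B's falling side. This is a standard formula for the triangular case, sketched here with hypothetical numbers rather than the paper's own models.

```python
def possibility_leq(A, B):
    """Pos(A <= B) for triangular fuzzy numbers A = (a1, a2, a3) and
    B = (b1, b2, b3): 1 if the modal values already satisfy a2 <= b2,
    0 if the supports are disjoint in the wrong order, and otherwise
    the height where A's rising side crosses B's falling side."""
    a1, a2, _a3 = A
    _b1, b2, b3 = B
    if a2 <= b2:
        return 1.0
    if a1 >= b3:
        return 0.0
    return (b3 - a1) / ((a2 - a1) + (b3 - b2))
```

In a stepwise FLP model, a fuzzy constraint such as "fuzzy resource use ≤ fuzzy capacity" would be required to hold at a chosen possibility (or necessity) level, turning each fuzzy inequality into a crisp one.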
Abstract:
Stochastic arithmetic has been developed as a model for exact computing with imprecise data. Stochastic arithmetic provides confidence intervals for the numerical results and can be implemented in any existing numerical software by redefining types of the variables and overloading the operators on them. Here some properties of stochastic arithmetic are further investigated and applied to the computation of inner products and the solution to linear systems. Several numerical experiments are performed showing the efficiency of the proposed approach.
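The idea can be sketched roughly as follows. This toy only perturbs the inputs of a computation, whereas a full implementation such as CESTAC/CADNA randomly rounds every intermediate operation (by overloading the operators) and uses a Student-t factor in the digit estimate; the digit count below is therefore only indicative.

```python
import math
import random

def perturb(x):
    """Randomly shift x by a few ulps -- a crude stand-in for the
    random rounding that stochastic arithmetic applies everywhere."""
    return x + random.uniform(-1.0, 1.0) * 4.0 * math.ulp(x)

def stochastic_eval(f, runs=30):
    """Run f several times under random perturbation; return the mean
    result and a rough estimate of its exact significant digits."""
    samples = [f(perturb) for _ in range(runs)]
    mean = sum(samples) / runs
    spread = math.sqrt(sum((s - mean) ** 2 for s in samples) / (runs - 1))
    if spread == 0.0:
        return mean, 15.0   # spread below double precision: treat as exact
    if mean == 0.0:
        return mean, 0.0    # a pure-noise result has no significant digits
    return mean, max(0.0, math.log10(abs(mean) / spread))
```

Running it on a catastrophic cancellation such as (1.0 + 1e-10) - 1.0 shows the point: the mean is a usable estimate, but the digit count reveals that most of the 16 digits of double precision have been lost, which is the confidence-interval information the abstract refers to.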
Abstract:
The wide range of contributing factors and circumstances surrounding crashes on road curves suggests that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationships between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data and to reason with the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers between 50 and 59 years old, driving during evening peak hours and involved in a collision, have the lowest crash risk. Drivers between 25 and 29 years old, driving from around midnight to 6 am in a new car, have the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with 25 to 42 years of driving experience who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based approaches to analysing road crashes. Our data mining approach is supported by proven theory and will allow road safety practitioners to understand the dependencies between contributing factors and crash types, with a view to designing tailored countermeasures.
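The lower/upper approximations at the core of rough set theory can be sketched as follows, using toy incident records with invented attribute values. Objects that fall between the two approximations are exactly those the imprecise data cannot classify.

```python
from collections import defaultdict

def approximations(objects, attrs, concept):
    """Rough-set lower/upper approximation of `concept` under the
    indiscernibility relation induced by the attribute subset `attrs`:
    objects with identical attribute values fall into one block."""
    blocks = defaultdict(set)
    for name, desc in objects.items():
        blocks[tuple(desc[a] for a in attrs)].add(name)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= concept:      # block lies entirely inside the concept
            lower |= block
        if block & concept:       # block overlaps the concept
            upper |= block
    return lower, upper
```

When the lower approximation is empty but the upper covers everything, the chosen attributes cannot discern high-cost crashes at all; dependency analysis then searches for attribute subsets that shrink this boundary region.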
Abstract:
Hydrologic impacts of climate change are usually assessed by downscaling the General Circulation Model (GCM) output of large-scale climate variables to local-scale hydrologic variables. Such an assessment is characterized by uncertainty resulting from the ensembles of projections generated with multiple GCMs, which is known as intermodel or GCM uncertainty. Ensemble averaging with the assignment of weights to GCMs based on model evaluation is one of the methods to address such uncertainty and is used in the present study for regional-scale impact assessment. GCM outputs of large-scale climate variables are downscaled to subdivisional-scale monsoon rainfall. Weights are assigned to the GCMs on the basis of model performance and model convergence, which are evaluated with the Cumulative Distribution Functions (CDFs) generated from the downscaled GCM output (for both 20th Century [20C3M] and future scenarios) and observed data. Ensemble averaging approach, with the assignment of weights to GCMs, is characterized by the uncertainty caused by partial ignorance, which stems from nonavailability of the outputs of some of the GCMs for a few scenarios (in Intergovernmental Panel on Climate Change [IPCC] data distribution center for Assessment Report 4 [AR4]). This uncertainty is modeled with imprecise probability, i.e., the probability being represented as an interval gray number. Furthermore, the CDF generated with one GCM is entirely different from that with another and therefore the use of multiple GCMs results in a band of CDFs. Representing this band of CDFs with a single valued weighted mean CDF may be misleading. Such a band of CDFs can only be represented with an envelope that contains all the CDFs generated with a number of GCMs. Imprecise CDF represents such an envelope, which not only contains the CDFs generated with all the available GCMs but also to an extent accounts for the uncertainty resulting from the missing GCM output. 
This concept of imprecise probability is also validated in the present study. The imprecise CDFs of monsoon rainfall are derived for three 30-year time slices, 2020s, 2050s and 2080s, with A1B, A2 and B1 scenarios. The model is demonstrated with the prediction of monsoon rainfall in Orissa meteorological subdivision, which shows a possible decreasing trend in the future.
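The envelope of CDFs described above can be sketched with empirical CDFs, using hypothetical rainfall samples standing in for the downscaled output of different GCMs. The lower and upper envelopes bound every member CDF, so the band between them is the imprecise CDF.

```python
import bisect

def empirical_cdf(samples):
    """Step CDF from one model's downscaled output."""
    xs = sorted(samples)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)

def cdf_envelope(cdfs):
    """Pointwise lower/upper bounding CDFs over an ensemble -- an
    envelope that contains every member's CDF."""
    def lower(x):
        return min(F(x) for F in cdfs)
    def upper(x):
        return max(F(x) for F in cdfs)
    return lower, upper
```

At any rainfall value the probability of non-exceedance is then reported as an interval rather than a single number, which avoids the misleading precision of a single weighted-mean CDF.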
Abstract:
Uncertainty plays an important role in water quality management problems. The major sources of uncertainty in a water quality management problem are the random nature of hydrologic variables and the imprecision (fuzziness) associated with the goals of the dischargers and pollution control agencies (PCA). Many Waste Load Allocation (WLA) problems are solved by considering these two sources of uncertainty. Apart from randomness and fuzziness, missing data in the time series of a hydrologic variable may result in additional uncertainty due to partial ignorance. These uncertainties render the input parameters imprecise in water quality decision making. In this paper an Imprecise Fuzzy Waste Load Allocation Model (IFWLAM) is developed for water quality management of a river system subject to uncertainty arising from partial ignorance. In a WLA problem, both randomness and imprecision can be addressed simultaneously by the fuzzy risk of low water quality. A methodology is developed for the computation of imprecise fuzzy risk of low water quality when the parameters are characterized by uncertainty due to partial ignorance. A Monte Carlo simulation is performed to evaluate the imprecise fuzzy risk of low water quality by considering the input variables as imprecise. The model is formulated as a fuzzy multiobjective optimization problem with max-min as the operator. This usually does not result in a unique solution but gives multiple solutions. Two optimization models are developed to capture all the decision alternatives or multiple solutions. The objective of the two optimization models is to obtain a range of fractional removal levels for the dischargers such that the resultant fuzzy risk will be within acceptable limits. Specification of a range for fractional removal levels enhances flexibility in decision making.
The methodology is demonstrated with a case study of the Tunga-Bhadra river system in India.
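The max-min operator at the heart of the fuzzy multiobjective formulation can be sketched with hypothetical linear membership functions for a discharger (satisfaction falls as the fractional removal level rises) and the PCA (satisfaction rises with removal), grid-searched over candidate removal levels.

```python
def maximin(candidates, goals):
    """Fuzzy multiobjective max-min: choose the decision whose worst
    goal satisfaction (lambda) is largest."""
    best, best_lam = None, -1.0
    for x in candidates:
        lam = min(g(x) for g in goals)   # worst-satisfied goal at x
        if lam > best_lam:
            best, best_lam = x, lam
    return best, best_lam
```

With these two opposing linear goals the compromise lands at the crossing point of the memberships; the paper's two additional optimization models then recover the whole range of removal levels whose lambda stays acceptable, rather than this single point.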
Abstract:
We present TANC, a TAN classifier (tree-augmented naive) based on imprecise probabilities. TANC models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM). A first contribution of this paper is the experimental comparison between EDM and the global Imprecise Dirichlet Model using the naive credal classifier (NCC), with the aim of showing that EDM is a sensible approximation of the global IDM. TANC is able to deal with missing data in a conservative manner by considering all possible completions (without assuming them to be missing-at-random), but avoiding an exponential increase of the computational time. By experiments on real data sets, we show that TANC is more reliable than the Bayesian TAN and that it provides better performance compared to previous TANs based on imprecise probabilities. Yet, TANC is sometimes outperformed by NCC because the learned TAN structures are too complex; this calls for novel algorithms for learning the TAN structures, better suited for an imprecise probability classifier.
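The interval probabilities that credal classifiers such as TANC and NCC build on can be sketched with the Imprecise Dirichlet Model. The counts below are hypothetical, and s = 2 is a common choice for the IDM imprecision parameter.

```python
def idm_interval(count, total, s=2.0):
    """IDM lower/upper probability for one class after observing
    `count` of `total` instances; s controls the imprecision."""
    return count / (total + s), (count + s) / (total + s)

def dominates(count_a, count_b, total, s=2.0):
    """Class a credally dominates class b when even a's lower
    probability beats b's upper probability."""
    return idm_interval(count_a, total, s)[0] > idm_interval(count_b, total, s)[1]
```

When no class dominates all others, a credal classifier returns a set of classes instead of forcing a single (unreliable) prediction, which is the source of the reliability gain over the Bayesian TAN reported above.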
Abstract:
In this paper we present TANC, i.e., a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing-at-random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM), which considerably simplifies the computation of upper and lower probabilities; yet, having been only recently introduced, the quality of the provided approximation needs still to be verified. As first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, by experiments we show that TANC is more reliable than the precise TAN (learned with uniform prior), and also that it provides better performance compared to a previous (Zaffalon, 2003) TAN model based on imprecise probabilities. TANC treats missing data by considering all possible completions of the training set, but avoiding an exponential increase of the computational times; eventually, we present some preliminary results with missing data.
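The conservative (non-missing-at-random) treatment of missing data can be illustrated in miniature: instead of assuming anything about why values are missing, every possible completion is considered, which turns a point estimate into an interval. The real TANC does this over training-set completions without enumerating them; this toy shows only the principle on a single proportion.

```python
def conservative_proportion(observations):
    """Interval for the proportion of positives when some observations
    are missing (None) and nothing is assumed about the missingness:
    the bounds come from completing every None as 0 or as 1."""
    n = len(observations)
    ones = sum(1 for o in observations if o == 1)
    missing = sum(1 for o in observations if o is None)
    return ones / n, (ones + missing) / n
```

The more data are missing, the wider the interval, so conclusions drawn from it remain valid under any missingness mechanism.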
Abstract:
Background: Long working hours might increase the risk of cardiovascular disease, but prospective evidence is scarce, imprecise, and mostly limited to coronary heart disease. We aimed to assess long working hours as a risk factor for incident coronary heart disease and stroke.
Methods We identified published studies through a systematic review of PubMed and Embase from inception to Aug 20, 2014. We obtained unpublished data for 20 cohort studies from the Individual-Participant-Data Meta-analysis in Working Populations (IPD-Work) Consortium and open-access data archives. We used cumulative random-effects meta-analysis to combine effect estimates from published and unpublished data.
Findings We included 25 studies from 24 cohorts in Europe, the USA, and Australia. The meta-analysis of coronary heart disease comprised data for 603 838 men and women who were free from coronary heart disease at baseline; the meta-analysis of stroke comprised data for 528 908 men and women who were free from stroke at baseline. Follow-up for coronary heart disease was 5·1 million person-years (mean 8·5 years), in which 4768 events were recorded, and for stroke was 3·8 million person-years (mean 7·2 years), in which 1722 events were recorded. In cumulative meta-analysis adjusted for age, sex, and socioeconomic status, compared with standard hours (35-40 h per week), working long hours (≥55 h per week) was associated with an increase in risk of incident coronary heart disease (relative risk [RR] 1·13, 95% CI 1·02-1·26; p=0·02) and incident stroke (1·33, 1·11-1·61; p=0·002). The excess risk of stroke remained unchanged in analyses that addressed reverse causation, multivariable adjustments for other risk factors, and different methods of stroke ascertainment (range of RR estimates 1·30-1·42). We recorded a dose-response association for stroke, with RR estimates of 1·10 (95% CI 0·94-1·28; p=0·24) for 41-48 working hours, 1·27 (1·03-1·56; p=0·03) for 49-54 working hours, and 1·33 (1·11-1·61; p=0·002) for 55 working hours or more per week compared with standard working hours (ptrend<0·0001).
Interpretation Employees who work long hours have a higher risk of stroke than those working standard hours; the association with coronary heart disease is weaker. These findings suggest that more attention should be paid to the management of vascular risk factors in individuals who work long hours.
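The random-effects pooling of relative risks used above can be sketched with a DerSimonian-Laird estimator on the log scale. The inputs below are illustrative only, not the IPD-Work data.

```python
import math

def random_effects_rr(log_rrs, variances):
    """DerSimonian-Laird random-effects meta-analysis of log relative
    risks; returns (pooled RR, 95% CI low, 95% CI high)."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rrs) - 1)) / c)         # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_rrs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))
```

A cumulative meta-analysis, as in the study, would rerun this pooling as each cohort is added; when between-study heterogeneity (tau-squared) is zero the estimate reduces to the fixed-effect result.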
Abstract:
Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today's MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise, and the additional entity information extracted from text should not impact the trustworthiness of MDM data.

In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
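Matching an imprecise textual reference against known MDM entities can be sketched with a normalized edit-distance resolver. The names and the threshold are hypothetical, and a production system would correlate far richer features than the name string alone.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def resolve(mention, entity_names, threshold=0.8):
    """Rank known entities whose normalized name similarity to a
    textual mention clears the threshold."""
    scored = []
    for name in entity_names:
        sim = 1.0 - levenshtein(mention.lower(), name.lower()) / max(len(mention), len(name))
        if sim >= threshold:
            scored.append((name, sim))
    return sorted(scored, key=lambda t: -t[1])
```

Keeping a similarity score alongside each candidate, instead of a hard merge, is one way to add textual evidence without degrading the trustworthiness of the curated MDM records.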