3 resultados para data quality issues
em Universidad de Alicante
Open business intelligence: on the importance of data quality awareness in user-friendly data mining
Resumo:
Citizens demand more and more data for making decisions in their daily life. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are highly required. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI facilitates non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and to (ii) share the new acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (as citizens) need guidance during preprocessing and application of mining algorithms due to the complexity of the mining process and the low quality of the data sources. This is even worst when dealing with LOD, not only because of the different kind of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.
Resumo:
Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.
Resumo:
Phase equilibrium data regression is an unavoidable task necessary to obtain the appropriate values for any model to be used in separation equipment design for chemical process simulation and optimization. The accuracy of this process depends on different factors such as the experimental data quality, the selected model and the calculation algorithm. The present paper summarizes the results and conclusions achieved in our research on the capabilities and limitations of the existing GE models and about strategies that can be included in the correlation algorithms to improve the convergence and avoid inconsistencies. The NRTL model has been selected as a representative local composition model. New capabilities of this model, but also several relevant limitations, have been identified and some examples of the application of a modified NRTL equation have been discussed. Furthermore, a regression algorithm has been developed that allows for the advisable simultaneous regression of all the condensed phase equilibrium regions that are present in ternary systems at constant T and P. It includes specific strategies designed to avoid some of the pitfalls frequently found in commercial regression tools for phase equilibrium calculations. Most of the proposed strategies are based on the geometrical interpretation of the lowest common tangent plane equilibrium criterion, which allows an unambiguous comprehension of the behavior of the mixtures. The paper aims to show all the work as a whole in order to reveal the necessary efforts that must be devoted to overcome the difficulties that still exist in the phase equilibrium data regression problem.