846 results for Internet of Things, Internet of Things collaborativo, Open data, Data Mining, Clustering, Classificazione, Dati sensoristici


Relevance:

100.00% 100.00%

Publisher:

Abstract:

As the number of data sources publishing their data on the Web of Data is growing, we are experiencing an immense growth of the Linked Open Data cloud. The lack of control on the published sources, which could be untrustworthy or unreliable, along with their dynamic nature that often invalidates links and causes conflicts or other discrepancies, could lead to poor quality data. In order to judge data quality, a number of quality indicators have been proposed, coupled with quality metrics that quantify the “quality level” of a dataset. In addition to the above, some approaches address how to improve the quality of the datasets through a repair process that focuses on how to correct invalidities caused by constraint violations by either removing or adding triples. In this paper we argue that provenance is a critical factor that should be taken into account during repairs to ensure that the most reliable data is kept. Based on this idea, we propose quality metrics that take into account provenance and evaluate their applicability as repair guidelines in a particular data fusion setting.
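The idea of letting provenance guide a repair can be illustrated with a minimal sketch. It is not the metric or repair procedure proposed in the paper: it assumes a single kind of constraint (a property that may hold only one value per subject) and a hypothetical per-source trust score, and it resolves each violation by keeping the triple from the most trusted source. The names `Triple`, `repair_functional` and `trust` are illustrative only.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: which dataset asserted this triple


def repair_functional(triples, functional_predicates, trust):
    """Keep one triple per (subject, predicate) for functional predicates.

    `trust` maps a source name to a score (hypothetical, assumed given);
    unknown sources default to 0.0. Triples whose predicate is not
    functional are kept unchanged.
    """
    groups = defaultdict(list)
    kept = []
    for t in triples:
        if t.predicate in functional_predicates:
            groups[(t.subject, t.predicate)].append(t)
        else:
            kept.append(t)
    for candidates in groups.values():
        # Resolve the conflict by removing all but the most trusted triple.
        kept.append(max(candidates, key=lambda t: trust.get(t.source, 0.0)))
    return kept


triples = [
    Triple("ex:alice", "ex:birthYear", "1980", "source_a"),
    Triple("ex:alice", "ex:birthYear", "1981", "source_b"),
    Triple("ex:alice", "ex:knows", "ex:bob", "source_b"),
]
repaired = repair_functional(
    triples, {"ex:birthYear"}, {"source_a": 0.9, "source_b": 0.4}
)
```

Here the conflicting `ex:birthYear` value from `source_b` is dropped, while the non-conflicting `ex:knows` triple from the same source is retained.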

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Linked Data initiative offers a straightforward method to publish structured data on the World Wide Web and link it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing from the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes ortholog/disease information using Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey was used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) were used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) were the input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) was the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
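To make the reported rules concrete, the sketch below encodes the GRI and CART thresholds exactly as listed in the abstract as simple predicates. It is only an illustration of how such rules read when applied to one subject, not the study's software; the `Subject` class and function names are hypothetical, and the C5 rules are left out because their conditions are incomplete in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    fpg_mg_dl: float   # fasting plasma glucose, mg/dl
    age_years: float


def gri_rule_fires(s: Subject) -> bool:
    """GRI rules from the abstract: (1) FPG >= 108.9, or (2) FPG >= 107.1 and age > 39.5."""
    return s.fpg_mg_dl >= 108.9 or (s.fpg_mg_dl >= 107.1 and s.age_years > 39.5)


def cart_rule_fires(s: Subject) -> bool:
    """CART rule from the abstract: FPG >= 110.7 mg/dl."""
    return s.fpg_mg_dl >= 110.7


older = Subject(fpg_mg_dl=108.0, age_years=45)
younger = Subject(fpg_mg_dl=108.0, age_years=30)
high = Subject(fpg_mg_dl=111.0, age_years=20)
```

With these made-up subjects, the same FPG of 108.0 mg/dl triggers the GRI rule set only for the older subject, which is the age dependence the abstract argues for.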

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. The Pb-Zn cluster data mining model was tested and validated in order to show its reasonable accuracy before being used in a production environment. The model can be used to adjust the mine's grinding and flotation processing parameters in almost real time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for Intrusion Detection and the modified versions of this dataset, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, where the instances are classified into normal and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks, in all categories and in each category, are evaluated with Weka’s Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is the k-NN algorithm. The most important features for detecting cyber-attacks are basic features such as the number of seconds of a network connection, the protocol used for the connection, the network service used, the normal or error status of the connection and the number of data bytes sent. The most important features for detecting DoS, Probing and R2L attacks are basic features, and the least important are content features. For U2R attacks, by contrast, the content features are the most important.
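Ranking features by Information Gain, as described above, can be sketched in a few lines. This is a plain-Python illustration of the standard definition (class entropy minus the weighted entropy after splitting on the feature) for categorical attributes, not Weka's implementation; the function names and the toy connection records are made up for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted from most to least informative."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy, hand-made records (not taken from NSL-KDD).
rows = [
    {"protocol_type": "tcp", "service": "http"},
    {"protocol_type": "tcp", "service": "ftp"},
    {"protocol_type": "icmp", "service": "http"},
    {"protocol_type": "icmp", "service": "ftp"},
]
labels = ["normal", "normal", "dos", "dos"]
ranking = rank_features(rows, labels)
```

In this toy data `protocol_type` separates the two classes perfectly while `service` carries no information about the class, so `protocol_type` is ranked first.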

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A project to identify metrics for assessing the quality of open data based on the needs of small voluntary sector organisations in the UK and India. For this project we assumed that the purpose of open data metrics is to determine the value of a group of open datasets to a defined community of users. We adopted a much more user-centred approach than most open data research, using small structured workshops to identify users’ key problems and then working from those problems to understand how open data can help address them and which key attributes the data needs if it is to be successful. We then piloted different metrics that might be used to measure the presence of those attributes. The result was six metrics that we assessed for validity, reliability, discrimination, transferability and comparability. This user-centred approach to open data research highlighted some fundamental issues with expanding the use of open data beyond its enthusiast base.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The strategic management of information plays a fundamental role in the organizational management process, since the decision-making process depends on the need for survival in a highly competitive market. Companies are constantly concerned about information transparency and good practices of corporate governance (CG), which, in turn, direct relations between the controlling power of the company and investors. In this context, this article presents the relationship between the disclosure of information by joint-stock companies using XBRL, the open data model adopted by the Brazilian government, a model that boosted the publication of the Information Access Law (Lei de Acesso à Informação), No. 12,527 of 18 November 2011. Information access should be permeated by a mediation policy in order to support the knowledge construction and decision-making of investors. XBRL is the main model for publishing financial information. The use of XBRL with a new semantic standard created for Linked Data strengthens information dissemination and creates mechanisms for analysis and for cross-referencing data with different open databases available on the Internet, providing added value to the data and information accessed by civil society.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes the integration of information between the Digital Library of Historical Cartography and the Bibliographical Database (DEDALUS), both of the University of São Paulo (USP), to guarantee open, public access via the Internet to the maps in the collection and make them available to users everywhere. This digital library was designed by the Historical Cartography Studies Laboratory team (LECH/USP) and provides high-resolution map images on the Web, as well as information on these maps such as technical-scientific data (projection, scale, coordinates), printing techniques and the material support that made their circulation and cultural consumption possible. The Digital Library of Historical Cartography is accessible not only to historical cartography researchers, but also to students and the general public. Beyond being a source of information about maps, the Digital Library of Historical Cartography seeks to be interactive, exchanging information and seeking dialogue with different branches of knowledge.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Mode of access: Internet.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

"Mr. J. S. Hales ... was assisted by Mr. W. C. Moss, B. SC., and the work was supervised by Mr. A. Blackie ..."--p. iii.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Mode of access: Internet.