7 resultados para Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
em Dalarna University College Electronic Archive
Resumo:
The thesis aims to elaborate on the optimum trigger speed for Vehicle Activated Signs (VAS) and to study the effectiveness of VAS trigger speed on drivers’ behaviour. Vehicle activated signs (VAS) are speed warning signs that are activated by individual vehicle when the driver exceeds a speed threshold. The threshold, which triggers the VAS, is commonly based on a driver speed, and accordingly, is called a trigger speed. At present, the trigger speed activating the VAS is usually set to a constant value and does not consider the fact that an optimal trigger speed might exist. The optimal trigger speed significantly impacts driver behaviour. In order to be able to fulfil the aims of this thesis, systematic vehicle speed data were collected from field experiments that utilized Doppler radar. Further calibration methods for the radar used in the experiment have been developed and evaluated to provide accurate data for the experiment. The calibration method was bidirectional; consisting of data cleaning and data reconstruction. The data cleaning calibration had a superior performance than the calibration based on the reconstructed data. To study the effectiveness of trigger speed on driver behaviour, the collected data were analysed by both descriptive and inferential statistics. Both descriptive and inferential statistics showed that the change in trigger speed had an effect on vehicle mean speed and on vehicle standard deviation of the mean speed. When the trigger speed was set near the speed limit, the standard deviation was high. Therefore, the choice of trigger speed cannot be based solely on the speed limit at the proposed VAS location. The optimal trigger speeds for VAS were not considered in previous studies. As well, the relationship between the trigger value and its consequences under different conditions were not clearly stated. The finding from this thesis is that the optimal trigger speed should be primarily based on lowering the standard deviation rather than lowering the mean speed of vehicles. Furthermore, the optimal trigger speed should be set near the 85th percentile speed, with the goal of lowering the standard deviation.
Resumo:
Data mining is a relatively new field of research that its objective is to acquire knowledge from large amounts of data. In medical and health care areas, due to regulations and due to the availability of computers, a large amount of data is becoming available [27]. On the one hand, practitioners are expected to use all this data in their work but, at the same time, such a large amount of data cannot be processed by humans in a short time to make diagnosis, prognosis and treatment schedules. A major objective of this thesis is to evaluate data mining tools in medical and health care applications to develop a tool that can help make rather accurate decisions. In this thesis, the goal is finding a pattern among patients who got pneumonia by clustering of lab data values which have been recorded every day. By this pattern we can generalize it to the patients who did not have been diagnosed by this disease whose lab values shows the same trend as pneumonia patients does. There are 10 tables which have been extracted from a big data base of a hospital in Jena for my work .In ICU (intensive care unit), COPRA system which is a patient management system has been used. All the tables and data stored in German Language database.
Resumo:
The main purpose of this thesis project is to prediction of symptom severity and cause in data from test battery of the Parkinson’s disease patient, which is based on data mining. The collection of the data is from test battery on a hand in computer. We use the Chi-Square method and check which variables are important and which are not important. Then we apply different data mining techniques on our normalize data and check which technique or method gives good results.The implementation of this thesis is in WEKA. We normalize our data and then apply different methods on this data. The methods which we used are Naïve Bayes, CART and KNN. We draw the Bland Altman and Spearman’s Correlation for checking the final results and prediction of data. The Bland Altman tells how the percentage of our confident level in this data is correct and Spearman’s Correlation tells us our relationship is strong. On the basis of results and analysis we see all three methods give nearly same results. But if we see our CART (J48 Decision Tree) it gives good result of under predicted and over predicted values that’s lies between -2 to +2. The correlation between the Actual and Predicted values is 0,794in CART. Cause gives the better percentage classification result then disability because it can use two classes.
Resumo:
Market research is often conducted through conventional methods such as surveys, focus groups and interviews. But the drawbacks of these methods are that they can be costly and timeconsuming. This study develops a new method, based on a combination of standard techniques like sentiment analysis and normalisation, to conduct market research in a manner that is free and quick. The method can be used in many application-areas, but this study focuses mainly on the veganism market to identify vegan food preferences in the form of a profile. Several food words are identified, along with their distribution between positive and negative sentiments in the profile. Surprisingly, non-vegan foods such as cheese, cake, milk, pizza and chicken dominate the profile, indicating that there is a significant market for vegan-suitable alternatives for such foods. Meanwhile, vegan-suitable foods such as coconut, potato, blueberries, kale and tofu also make strong appearances in the profile. Validation is performed by using the method on Volkswagen vehicle data to identify positive and negative sentiment across five car models. Some results were found to be consistent with sales figures and expert reviews, while others were inconsistent. The reliability of the method is therefore questionable, so the results should be used with caution.
Resumo:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects PD can have a profound effect on speech and voice.The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria.The disease can be difficult to diagnose accurately, especially in its early stages, due to this reason, automatic techniques based on Artificial Intelligence should increase the diagnosing accuracy and to help the doctors make better decisions. The aim of the thesis work is to predict the PD based on the audio files collected from various patients.Audio files are preprocessed in order to attain the features.The preprocessed data contains 23 attributes and 195 instances. On an average there are six voice recordings per person, By using data compression technique such as Discrete Cosine Transform (DCT) number of instances can be minimized, after data compression, attribute selection is done using several WEKA build in methods such as ChiSquared, GainRatio, Infogain after identifying the important attributes, we evaluate attributes one by one by using stepwise regression.Based on the selected attributes we process in WEKA by using cost sensitive classifier with various algorithms like MultiPass LVQ, Logistic Model Tree(LMT), K-Star.The classified results shows on an average 80%.By using this features 95% approximate classification of PD is acheived.This shows that using the audio dataset, PD could be predicted with a higher level of accuracy.
Resumo:
The accurate measurement of a vehicle’s velocity is an essential feature in adaptive vehicle activated sign systems. Since the velocities of the vehicles are acquired from a continuous wave Doppler radar, the data collection becomes challenging. Data accuracy is sensitive to the calibration of the radar on the road. However, clear methodologies for in-field calibration have not been carefully established. The signs are often installed by subjective judgment which results in measurement errors. This paper develops a calibration method based on mining the data collected and matching individual vehicles travelling between two radars. The data was cleaned and prepared in two ways: cleaning and reconstructing. The results showed that the proposed correction factor derived from the cleaned data corresponded well with the experimental factor done on site. In addition, this proposed factor showed superior performance to the one derived from the reconstructed data.
Resumo:
The Twitter System is the biggest social network in the world, and everyday millions of tweets are posted and talked about, expressing various views and opinions. A large variety of research activities have been conducted to study how the opinions can be clustered and analyzed, so that some tendencies can be uncovered. Due to the inherent weaknesses of the tweets - very short texts and very informal styles of writing - it is rather hard to make an investigation of tweet data analysis giving results with good performance and accuracy. In this paper, we intend to attack the problem from another aspect - using a two-layer structure to analyze the twitter data: LDA with topic map modelling. The experimental results demonstrate that this approach shows a progress in twitter data analysis. However, more experiments with this method are expected in order to ensure that the accurate analytic results can be maintained.