847 resultados para Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
Resumo:
Teknologian nopea kehitys ja toimintaympäristön muutokset kannustavat organisaatioita omaksumaan innovatiivisia ratkaisuja, pysyäkseen kilpailukykyisinä ja kehityksessä mukana. Perinteisen kustannussäästöjen ja toiminnan tehostamisen tavoittelun onkin syrjäyttänyt halu vahvistaa ja kasvattaa kilpailuetua. Näistä kannusteista huolimatta tutkimukset osoittavat, että suurin osa Suomessa toimivista organisaatioista ei johda eri teknologioihin pohjautuvia innovaatioita kovinkaan kokonaisvaltaisesti, vaan IT:n käyttö on enemmänkin reaktiivista. Tutkimuksessa tutkimmekin, miten Suomessa johdetaan IT-innovaatioiden, ja näistä erityisesti big datan, käyttöönottoa sekä madollisia poikkeavuuksia näiden välillä. Tutkimus on tehty kvalitatiivisin tutkimusmenetelmin, hyödyntämällä fokusryhmätutkimusmetodia empiirisen aineiston keruussa. Tutkimus esittää IT-innovaatioiden käyttöönoton johtamisen prosessina, joka lähtee liikkeelle viestintäkanavista saatavasta ärsykkeesta. Tietämystä vahvistetaan suostutteluvaiheessa, joka laukaisee arviointivaiheen. Prosessi etenee lopulta IT-innovaatioiden hyötyjen ja kustannusten arvioinnin myötä käyttöönotosta tehtävään päätökseen. Käyttöönottoon vaikuttavat myös erilaiset taustatekijät, jotka voivat edistää tai estää IT-innovaation omaksumista. Päätöksentekovaiheessa organisaation tietohallintojohdolla ja liiketoimintajohdolla on omat roolinsa, jotka muotoutuvat organisaation työnjaon ja investoinnin suuruuden mukaan. Tutkimuskohteiden käyttöönoton johtamistavoista kertovat, miten organisaatioiden käyttöönottoprosessin ja päätöksentekoprosessin vaiheet etenevät, mitkä taustatekijät vaikuttavat käyttöönottopäätökseen ja millaisia hyötyjä tavoitellaan. Big datan johtamistapojen selvittämiseen vaikuttaa myös se, onko organisaatiolla strategiaa tai toimintasuunnitelmaa sen hyödyntämiseksi. Tutkielman johtopäätöksenä toteamme, että yleistä IT-innovaatioiden käyttöönottoa johdetaan kolmella tavalla: strategisesti, reaktiivisesti ja muutoksen pakottamana. Johtamistapojen erot tulevat esiin investoinnin suuruuden, käyttöönottoon johtavan päätöksenteon sekä käyttöönoton taustalla vaikuttavien syiden kautta. Yleisesti IT-innovaatioita näytettäisiin johdettavan melko samassa suhteessa strategisesti, reaktiivisesti ja muutoksen pakottamana. Big datan käyttöönoton johtamisessa havaitsimme piirteitä vain strategisesta ja reaktiivisesta johtamisesta. Yleinen IT-innovaatioiden ja big datan käyttöönoton johtaminen eroavat toisistaan sen suhteen, että big dataa näytettäisiin johdettavan vielä vähemmän strategisesti ja sen päätöksentekovastuut ovat hajanaisempia. Yleisesti voidaan sanoa, että tutkimuskohteilla esiintyi heikosti selkeitä ja kokonaisvaltaisia strategioita tai toimintasuunnitelmia IT-innovaatioiden käyttöönoton johtamiseksi.
Resumo:
In the new age of information technology, big data has grown to be the prominent phenomena. As information technology evolves, organizations have begun to adopt big data and apply it as a tool throughout their decision-making processes. Research on big data has grown in the past years however mainly from a technical stance and there is a void in business related cases. This thesis fills the gap in the research by addressing big data challenges and failure cases. The Technology-Organization-Environment framework was applied to carry out a literature review on trends in Business Intelligence and Knowledge management information system failures. A review of extant literature was carried out using a collection of leading information system journals. Academic papers and articles on big data, Business Intelligence, Decision Support Systems, and Knowledge Management systems were studied from both failure and success aspects in order to build a model for big data failure. I continue and delineate the contribution of the Information System failure literature as it is the principal dynamics behind technology-organization-environment framework. The gathered literature was then categorised and a failure model was developed from the identified critical failure points. The failure constructs were further categorized, defined, and tabulated into a contextual diagram. The developed model and table were designed to act as comprehensive starting point and as general guidance for academics, CIOs or other system stakeholders to facilitate decision-making in big data adoption process by measuring the effect of technological, organizational, and environmental variables with perceived benefits, dissatisfaction and discontinued use.
Resumo:
In the new age of information technology, big data has grown to be the prominent phenomena. As information technology evolves, organizations have begun to adopt big data and apply it as a tool throughout their decision-making processes. Research on big data has grown in the past years however mainly from a technical stance and there is a void in business related cases. This thesis fills the gap in the research by addressing big data challenges and failure cases. The Technology-Organization-Environment framework was applied to carry out a literature review on trends in Business Intelligence and Knowledge management information system failures. A review of extant literature was carried out using a collection of leading information system journals. Academic papers and articles on big data, Business Intelligence, Decision Support Systems, and Knowledge Management systems were studied from both failure and success aspects in order to build a model for big data failure. I continue and delineate the contribution of the Information System failure literature as it is the principal dynamics behind technology-organization-environment framework. The gathered literature was then categorised and a failure model was developed from the identified critical failure points. The failure constructs were further categorized, defined, and tabulated into a contextual diagram. The developed model and table were designed to act as comprehensive starting point and as general guidance for academics, CIOs or other system stakeholders to facilitate decision-making in big data adoption process by measuring the effect of technological, organizational, and environmental variables with perceived benefits, dissatisfaction and discontinued use.
Resumo:
Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Resumo:
For years, choosing the right career by monitoring the trends and scope for different career paths have been a requirement for all youngsters all over the world. In this paper we provide a scientific, data mining based method for job absorption rate prediction and predicting the waiting time needed for 100% placement, for different engineering courses in India. This will help the students in India in a great deal in deciding the right discipline for them for a bright future. Information about passed out students are obtained from the NTMIS ( National technical manpower information system ) NODAL center in Kochi, India residing in Cochin University of science and technology
Resumo:
In the current study, epidemiology study is done by means of literature survey in groups identified to be at higher potential for DDIs as well as in other cases to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in data mining the drug-drug interactions. The necessary pre-processing algorithms are developed based on the analysis and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are compared using standard drug interaction database for validation. 31% of the associations obtained were identified to be new and the match with existing interactions was 69%. This match clearly indicates the validity of the methodology and its applicability to similar databases. Formulation of the results using the generic names expanded the relevance of the results to a global scale. The global applicability helps the health care professionals worldwide to observe caution during various stages of drug administration thus considerably enhancing pharmacovigilance
Resumo:
Data mining means to summarize information from large amounts of raw data. It is one of the key technologies in many areas of economy, science, administration and the internet. In this report we introduce an approach for utilizing evolutionary algorithms to breed fuzzy classifier systems. This approach was exercised as part of a structured procedure by the students Achler, Göb and Voigtmann as contribution to the 2006 Data-Mining-Cup contest, yielding encouragingly positive results.
Resumo:
We present a new algorithm called TITANIC for computing concept lattices. It is based on data mining techniques for computing frequent itemsets. The algorithm is experimentally evaluated and compared with B. Ganter's Next-Closure algorithm.
Resumo:
Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
Resumo:
Die zunehmende Vernetzung der Informations- und Kommunikationssysteme führt zu einer weiteren Erhöhung der Komplexität und damit auch zu einer weiteren Zunahme von Sicherheitslücken. Klassische Schutzmechanismen wie Firewall-Systeme und Anti-Malware-Lösungen bieten schon lange keinen Schutz mehr vor Eindringversuchen in IT-Infrastrukturen. Als ein sehr wirkungsvolles Instrument zum Schutz gegenüber Cyber-Attacken haben sich hierbei die Intrusion Detection Systeme (IDS) etabliert. Solche Systeme sammeln und analysieren Informationen von Netzwerkkomponenten und Rechnern, um ungewöhnliches Verhalten und Sicherheitsverletzungen automatisiert festzustellen. Während signatur-basierte Ansätze nur bereits bekannte Angriffsmuster detektieren können, sind anomalie-basierte IDS auch in der Lage, neue bisher unbekannte Angriffe (Zero-Day-Attacks) frühzeitig zu erkennen. Das Kernproblem von Intrusion Detection Systeme besteht jedoch in der optimalen Verarbeitung der gewaltigen Netzdaten und der Entwicklung eines in Echtzeit arbeitenden adaptiven Erkennungsmodells. Um diese Herausforderungen lösen zu können, stellt diese Dissertation ein Framework bereit, das aus zwei Hauptteilen besteht. Der erste Teil, OptiFilter genannt, verwendet ein dynamisches "Queuing Concept", um die zahlreich anfallenden Netzdaten weiter zu verarbeiten, baut fortlaufend Netzverbindungen auf, und exportiert strukturierte Input-Daten für das IDS. Den zweiten Teil stellt ein adaptiver Klassifikator dar, der ein Klassifikator-Modell basierend auf "Enhanced Growing Hierarchical Self Organizing Map" (EGHSOM), ein Modell für Netzwerk Normalzustand (NNB) und ein "Update Model" umfasst. In dem OptiFilter werden Tcpdump und SNMP traps benutzt, um die Netzwerkpakete und Hostereignisse fortlaufend zu aggregieren. Diese aggregierten Netzwerkpackete und Hostereignisse werden weiter analysiert und in Verbindungsvektoren umgewandelt. Zur Verbesserung der Erkennungsrate des adaptiven Klassifikators wird das künstliche neuronale Netz GHSOM intensiv untersucht und wesentlich weiterentwickelt. In dieser Dissertation werden unterschiedliche Ansätze vorgeschlagen und diskutiert. So wird eine classification-confidence margin threshold definiert, um die unbekannten bösartigen Verbindungen aufzudecken, die Stabilität der Wachstumstopologie durch neuartige Ansätze für die Initialisierung der Gewichtvektoren und durch die Stärkung der Winner Neuronen erhöht, und ein selbst-adaptives Verfahren eingeführt, um das Modell ständig aktualisieren zu können. Darüber hinaus besteht die Hauptaufgabe des NNB-Modells in der weiteren Untersuchung der erkannten unbekannten Verbindungen von der EGHSOM und der Überprüfung, ob sie normal sind. Jedoch, ändern sich die Netzverkehrsdaten wegen des Concept drif Phänomens ständig, was in Echtzeit zur Erzeugung nicht stationärer Netzdaten führt. Dieses Phänomen wird von dem Update-Modell besser kontrolliert. Das EGHSOM-Modell kann die neuen Anomalien effektiv erkennen und das NNB-Model passt die Änderungen in Netzdaten optimal an. Bei den experimentellen Untersuchungen hat das Framework erfolgversprechende Ergebnisse gezeigt. Im ersten Experiment wurde das Framework in Offline-Betriebsmodus evaluiert. Der OptiFilter wurde mit offline-, synthetischen- und realistischen Daten ausgewertet. Der adaptive Klassifikator wurde mit dem 10-Fold Cross Validation Verfahren evaluiert, um dessen Genauigkeit abzuschätzen. Im zweiten Experiment wurde das Framework auf einer 1 bis 10 GB Netzwerkstrecke installiert und im Online-Betriebsmodus in Echtzeit ausgewertet. Der OptiFilter hat erfolgreich die gewaltige Menge von Netzdaten in die strukturierten Verbindungsvektoren umgewandelt und der adaptive Klassifikator hat sie präzise klassifiziert. Die Vergleichsstudie zwischen dem entwickelten Framework und anderen bekannten IDS-Ansätzen zeigt, dass der vorgeschlagene IDSFramework alle anderen Ansätze übertrifft. Dies lässt sich auf folgende Kernpunkte zurückführen: Bearbeitung der gesammelten Netzdaten, Erreichung der besten Performanz (wie die Gesamtgenauigkeit), Detektieren unbekannter Verbindungen und Entwicklung des in Echtzeit arbeitenden Erkennungsmodells von Eindringversuchen.
Resumo:
Relates to the following software for analysing Blackboard stats http://www.edshare.soton.ac.uk/11134/ Is supporting material for the following podcast: http://youtu.be/yHxCzjiYBoU
Resumo:
Abstract: Big Data has been characterised as a great economic opportunity and a massive threat to privacy. Both may be correct: the same technology can indeed be used in ways that are highly beneficial and those that are ethically intolerable, maybe even simultaneously. Using examples of how Big Data might be used in education - normally referred to as "learning analytics" - the seminar will discuss possible ethical and legal frameworks for Big Data, and how these might guide the development of technologies, processes and policies that can deliver the benefits of Big Data without the nightmares. Speaker Biography: Andrew Cormack is Chief Regulatory Adviser, Jisc Technologies. He joined the company in 1999 as head of the JANET-CERT and EuroCERT incident response teams. In his current role he concentrates on the security, policy and regulatory issues around the network and services that Janet provides to its customer universities and colleges. Previously he worked for Cardiff University running web and email services, and for NERC's Shipboard Computer Group. He has degrees in Mathematics, Humanities and Law.
Resumo:
Abstract Big data nowadays is a fashionable topic, independently of what people mean when they use this term. But being big is just a matter of volume, although there is no clear agreement in the size threshold. On the other hand, it is easy to capture large amounts of data using a brute force approach. So the real goal should not be big data but to ask ourselves, for a given problem, what is the right data and how much of it is needed. For some problems this would imply big data, but for the majority of the problems much less data will and is needed. In this talk we explore the trade-offs involved and the main problems that come with big data using the Web as case study: scalability, redundancy, bias, noise, spam, and privacy. Speaker Biography Ricardo Baeza-Yates Ricardo Baeza-Yates is VP of Research for Yahoo Labs leading teams in United States, Europe and Latin America since 2006 and based in Sunnyvale, California, since August 2014. During this time he has lead the labs in Barcelona and Santiago de Chile. Between 2008 and 2012 he also oversaw the Haifa lab. He is also part time Professor at the Dept. of Information and Communication Technologies of the Universitat Pompeu Fabra, in Barcelona, Spain. During 2005 he was an ICREA research professor at the same university. Until 2004 he was Professor and before founder and Director of the Center for Web Research at the Dept. of Computing Science of the University of Chile (in leave of absence until today). He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989. Before he obtained two masters (M.Sc. CS & M.Eng. EE) and the electronics engineer degree from the University of Chile in Santiago. He is co-author of the best-seller Modern Information Retrieval textbook, published in 1999 by Addison-Wesley with a second enlarged edition in 2011, that won the ASIST 2012 Book of the Year award. He is also co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992, among more than 500 other publications. From 2002 to 2004 he was elected to the board of governors of the IEEE Computer Society and in 2012 he was elected for the ACM Council. He has received the Organization of American States award for young researchers in exact sciences (1993), the Graham Medal for innovation in computing given by the University of Waterloo to distinguished ex-alumni (2007), the CLEI Latin American distinction for contributions to CS in the region (2009), and the National Award of the Chilean Association of Engineers (2010), among other distinctions. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences and since 2010 is a founding member of the Chilean Academy of Engineering. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow.