Biblioteca Digital

855 results for text mining clusterizzazione clustering auto-organizzazione conoscenza MoK

Towards Next Generation Vertical Search Engines

Relevance:

30.00%

Abstract:

As the Web evolves at an unexpectedly fast pace, information grows explosively. Useful resources become increasingly difficult to find because of their dynamic and unstructured nature. A vertical search engine is designed and implemented for a specific domain. Instead of processing the giant volume of miscellaneous information distributed across the Web, a vertical search engine aims to identify relevant information in specific domains or topics and ultimately provides users with up-to-date information, highly focused insights and actionable knowledge representation. As mobile devices become more popular, the nature of search is changing. Acquiring information on a mobile device therefore places unique requirements on traditional search engines, which will potentially change every feature they used to have. In summary, users strongly expect search engines that can satisfy their individual information needs, adapt to their current situation, and present highly personalized search results. In my research, the next-generation vertical search engine utilizes and enriches existing domain information to close the loop of a vertical search engine system whose components mutually facilitate knowledge discovery, actionable information extraction, and user interest modeling and recommendation. I investigate three problems in which domain taxonomy plays an important role: taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to capture users' interests. As the underlying theory, ultrametrics, dendrograms, and hierarchical clustering are discussed in depth. Methods for taxonomy generation based on my research on hierarchical clustering are developed. The related vertical search engine techniques are applied in practice in the disaster management domain. In particular, three disaster information management systems are developed and presented as real use cases of my research work.
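The abstract names ultrametrics and dendrograms as its theoretical basis without giving details. As a minimal sketch of the link between the two (not the author's method): the merge heights of any dendrogram satisfy the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), whereas raw dissimilarities usually do not. Both matrices below are hypothetical illustration data.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-12):
    """True if d[i][k] <= max(d[i][j], d[j][k]) holds for every triple of distinct points."""
    return all(
        d[i][k] <= max(d[i][j], d[j][k]) + tol
        for i, j, k in permutations(range(len(d)), 3)
    )

# Hypothetical merge heights read off a small dendrogram over four topics.
tree_distances = [
    [0.0, 1.0, 3.0, 3.0],
    [1.0, 0.0, 3.0, 3.0],
    [3.0, 3.0, 0.0, 2.0],
    [3.0, 3.0, 2.0, 0.0],
]

# Hypothetical raw dissimilarities that no dendrogram can reproduce exactly.
raw_distances = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 3.0],
    [6.0, 3.0, 0.0],
]

print(is_ultrametric(tree_distances), is_ultrametric(raw_distances))
```

The check is cubic in the number of points, which is acceptable for inspecting a small taxonomy but not for large collections.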

AUTO-RUINS. Thinks about automobile into the contemporary landscape

Relevance:

30.00%

Abstract:

The article analyses how the representation of the automobile within natural and urban environments has evolved in contemporary art, from the appearance of the first cars at the beginning of the 20th century to the present day. The text compares the diverse attitudes and analyses of some representative artists who have used the image of the machine in general, and the car in particular, in their aesthetic discourse, taking the metaphor of the life cycle (birth, growth, feeding, reproduction and death) as a guiding thread. It deals with the discovery, development and coexistence of humans and the automobile, and with the interpretation of the automobile as a basic element of the artistic work. The text situates the image of the automobile in the contemporary industrial landscape by drawing on artists who have integrated the car into their work within the natural or artificial environment characteristic of each period. At the same time, the article explores in depth the relationship between the romantic ruin and the natural landscape, and the evolution of the modern industrial and architectural environment, through the work of artists who have used the car as an inhabitant of the landscape.

Clustering in non-parametric multivariate analyses.

Relevance:

30.00%

Abstract:

Non-parametric multivariate analyses of complex ecological datasets are widely used. Following appropriate pre-treatment of the data, inter-sample resemblances are calculated using appropriate measures. Ordination and clustering derived from these resemblances are used to visualise relationships among samples (or variables). Hierarchical agglomerative clustering with group-average (UPGMA) linkage is often the clustering method chosen. Using an example dataset of zooplankton densities from the Bristol Channel and Severn Estuary, UK, a range of existing and new clustering methods are applied and the results compared. Although the examples focus on analysis of samples, the methods may also be applied to species analysis. Dendrograms derived by hierarchical clustering are compared using cophenetic correlations, which are also used to determine optimum β in flexible beta clustering. A plot of cophenetic correlation against original dissimilarities reveals that a tree may be a poor representation of the full multivariate information. UNCTREE is an unconstrained binary divisive clustering algorithm in which values of the ANOSIM R statistic are used to determine (binary) splits in the data, to form a dendrogram. A form of flat clustering, k-R clustering, uses a combination of ANOSIM R and Similarity Profiles (SIMPROF) analyses to determine the optimum value of k, the number of groups into which samples should be clustered, and the sample membership of the groups. Robust outcomes from the application of such a range of differing techniques to the same resemblance matrix, as here, result in greater confidence in the validity of a clustering approach.
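To make the group-average linkage and the cophenetic correlation mentioned in this abstract concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' software: the 4 x 4 dissimilarity matrix is hypothetical, and the function names are invented for this example.

```python
from itertools import combinations
from math import sqrt

def upgma_cophenetic(dist):
    """Group-average (UPGMA) clustering of a square dissimilarity matrix.
    Returns the matrix of cophenetic distances (height at which two samples first join)."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member samples
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        # Every pair split across a and b joins for the first time at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        # Size-weighted average distance from the merged cluster to each remaining one.
        na, nb = len(clusters[a]), len(clusters[b])
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged
        next_id += 1
    return coph

def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances over all sample pairs."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical dissimilarities among four samples.
D = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 5.0, 9.0],
    [6.0, 5.0, 0.0, 4.0],
    [10.0, 9.0, 4.0, 0.0],
]
C = upgma_cophenetic(D)
r = cophenetic_correlation(D, C)
print(round(r, 3))
```

A correlation well below 1 signals that the tree distorts the original resemblances, which is the kind of diagnostic the abstract describes. For real datasets, an established library routine is preferable to this quadratic-memory sketch.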

Extracting Patterns from Educational Traces via Clustering and Associated Quality Metrics

Relevance:

30.00%

Abstract:

Clustering algorithms, pattern mining techniques and associated quality metrics have emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specific characteristics of the available data, such as missing values, extreme values or outliers, make it challenging to extract user models that are significant from an educational perspective. In this paper we introduce a pattern detection mechanism within our data analytics tool based on k-means clustering and on the SSE, silhouette, Dunn index and Xie-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the monitoring activities performed created a strong basis for generating automatic feedback to learners on their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely action.
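The following sketch shows how k-means and three of the quality metrics named above (SSE, silhouette, Dunn index) can be computed. It is a plain-Python illustration, not the authors' tool: the learner features, the fixed initial centroids and all function names are assumptions made for this example, and the Xie-Beni index is omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm from the given initial centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def mean_silhouette(points, labels):
    """Average silhouette width; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == k) / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest within-cluster distance.
    Assumes at least one cluster holds two distinct points."""
    tagged = list(zip(points, labels))
    inter = min(dist(p, q) for p, lp in tagged for q, lq in tagged if lp != lq)
    intra = max(dist(p, q) for p, lp in tagged for q, lq in tagged if lp == lq)
    return inter / intra

# Hypothetical learners described by (weekly logins, average quiz score out of 10).
learners = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(learners, [learners[0], learners[3]])
total_sse = sse(learners, labels, centroids)
silhouette = mean_silhouette(learners, labels)
dunn = dunn_index(learners, labels)
print(labels, round(total_sse, 3), round(silhouette, 3), round(dunn, 3))
```

Lower SSE and higher silhouette and Dunn values indicate tighter, better separated clusters. Real traces with missing values or outliers would need cleaning or scaling first, as the abstract notes.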

Data Mining for Network Intrusion Detection : A comparison of data mining algorithms and an analysis of relevant features for detecting cyber-attacks

Relevance:

30.00%

Abstract:

Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for Intrusion Detection and the modified versions of this dataset, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, where the instances are classified into normal and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks, across all categories and within each category, are evaluated with Weka’s Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is k-NN. The most important features for detecting cyber-attacks are basic features such as the number of seconds of a network connection, the protocol used for the connection, the network service used, the normal or error status of the connection and the number of data bytes sent. For DoS, Probing and R2L attacks, the most important features are basic features and the least important are content features. For U2R attacks, by contrast, the content features are the most important.
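Ranking features by Information Gain, as described above, amounts to measuring how much the class entropy drops once a feature's value is known. The sketch below computes this for nominal features in plain Python. It does not reproduce Weka's evaluator, and the eight connection records are invented toy data that merely borrow two NSL-KDD style feature names.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Class entropy minus the expected entropy after splitting on a nominal feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Toy records (hypothetical): protocol_type, flag, class.
records = [
    ("tcp", "SF", "normal"),
    ("udp", "SF", "normal"),
    ("tcp", "SF", "normal"),
    ("udp", "SF", "normal"),
    ("tcp", "S0", "attack"),
    ("udp", "S0", "attack"),
    ("tcp", "S0", "attack"),
    ("udp", "S0", "attack"),
]
classes = [r[2] for r in records]
gains = {
    "protocol_type": information_gain([r[0] for r in records], classes),
    "flag": information_gain([r[1] for r in records], classes),
}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

In this toy set the connection status flag separates the classes perfectly while the protocol carries no information, so the flag ranks first. Numeric features such as connection duration would need to be discretised before this calculation applies.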

LDA-TM : A two-step approach to twitter topic data clustering

Relevance:

30.00%

Abstract:

The Twitter System is the biggest social network in the world, and every day millions of tweets are posted and discussed, expressing various views and opinions. A wide variety of research has been conducted to study how these opinions can be clustered and analyzed so that tendencies can be uncovered. Because of the inherent weaknesses of tweets (very short texts and very informal styles of writing), it is rather hard to analyze tweet data with good performance and accuracy. In this paper, we approach the problem from another angle, using a two-layer structure to analyze Twitter data: LDA with topic map modelling. The experimental results demonstrate that this approach represents progress in Twitter data analysis. However, more experiments with this method are needed to ensure that the accuracy of the analytic results can be maintained.

A New Clustering Methodology for the Analysis of Sorted or Categorized Stimuli

Relevance:

30.00%

Abstract:

This paper introduces a new stochastic clustering methodology devised for the analysis of categorized or sorted data. The methodology reveals consumers' common category knowledge as well as individual differences in using this knowledge for classifying brands in a designated product class. A small study involving the categorization of 28 brands of U.S. automobiles is presented where the results of the proposed methodology are compared with those obtained from KMEANS clustering. Finally, directions for future research are discussed.

Innovazione nel Semantic Web: Evoluzione della base di conoscenza semantica YAGO

Relevance:

30.00%

Abstract:

This research studies knowledge bases, with the aim of facilitating the collection, organization and distribution of knowledge. The topic was chosen because of the growing importance of this research area and the innovation it can bring to the field of the Semantic Web. The YAGO knowledge base is analyzed: its state of the art, its applications and plans for future development are described. The work was carried out by examining the publications on the subject and constitutes an Italian-language resource on the topic.

Basi di conoscenza collaborative: il progetto Wikidata

Relevance:

30.00%

Abstract:

This work is motivated by the need for tools to manage the large quantities of data that have become available with the spread of the Web. Knowledge bases were analyzed, defining their common characteristics and then presenting a comparison of some of the most significant ones. Finally, the Wikidata project was analyzed in more detail.

Cosmic Topologies of Imitation: From the Horror of Digital Autotoxicus to the Auto-Toxicity of the Social

Relevance:

30.00%

Abstract:

This article expands on an earlier concept of horror autotoxicus linked to digital contagions of spam and network Virality. It aims to present, as such, a broader conception of cosmic topologies of imitation (CTI) intended to better grasp the relatively new practices of social media marketing. Similar to digital autotoxicity, CTI provide the perfect medium for sharing while also spreading contagions that can potentially contaminate the medium itself. However, whereas digital contagions are perhaps limited to the toxicity of a technical layer of information viruses, the contagions of CTI are an all-pervasive auto-toxicity which can infect human bodies and technologies increasingly in concert with each other. This is an exceptional autotoxicus that significantly blurs the immunological line of exemption between self and nonself, and potentially, the anthropomorphic distinction between individual self and collective others.

Mining Police-Recorded Offence and Incident Data to Inform a Definition of Repeat Domestic Abuse Victimization for Statistical Reporting

Relevance:

30.00%

Abstract:

Following inspections in 2013 of all police forces, Her Majesty’s Inspectorate of Constabulary found that one-third of forces could not provide data on repeat victims of domestic abuse (DA) and concluded that in general there were ambiguities around the term ‘repeat victim’ and that there was a need for consistent and comparable statistics on DA. Using an analysis of police-recorded DA data from two forces, an argument is made for including both offences and non-crime incidents when identifying repeat victims of DA. Furthermore, for statistical purposes the counting period for repeat victimizations should be taken as a rolling 12 months from first recorded victimization. Examples are given of summary statistics that can be derived from these data down to Community Safety Partnership level. To reinforce the need to include both offences and incidents in analyses, repeat victim chronologies from police-recorded data are also used to briefly examine cases of escalation to homicide as an example of how they can offer new insights and greater scope for evaluating risk and effectiveness of interventions.
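One possible reading of the proposed counting rule can be sketched as follows. The sketch assumes that the 12-month window opens on each victim's first recorded event, that offences and non-crime incidents count equally, and that a repeat victim has at least two events inside the window. The record layout, the identifiers and the treatment of the window's end date are assumptions of this example, not details given in the abstract.

```python
from collections import defaultdict
from datetime import date

def one_year_after(d):
    """Same calendar day one year later (29 February maps to 28 February)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)

def repeat_victims(records):
    """records: iterable of (victim_id, event_date, kind), kind being 'offence' or 'incident'.
    Returns {victim_id: events in the 12 months from the first record} for repeat victims."""
    by_victim = defaultdict(list)
    for victim_id, event_date, _kind in records:  # both kinds are counted
        by_victim[victim_id].append(event_date)
    repeats = {}
    for victim_id, dates in by_victim.items():
        first = min(dates)
        window_end = one_year_after(first)  # exclusive upper bound
        in_window = sum(1 for d in dates if first <= d < window_end)
        if in_window >= 2:
            repeats[victim_id] = in_window
    return repeats

# Hypothetical anonymised records.
records = [
    ("A", date(2013, 1, 10), "incident"),
    ("A", date(2013, 6, 2), "offence"),
    ("A", date(2014, 3, 1), "offence"),   # outside A's 12-month window
    ("B", date(2013, 2, 1), "offence"),
    ("B", date(2014, 2, 1), "incident"),  # falls exactly on the exclusive window end
    ("C", date(2013, 5, 5), "incident"),
]
result = repeat_victims(records)
print(result)
```

Counting incidents alongside offences matters here: victim A would not be flagged as a repeat victim if only offences inside the window were counted.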

Search Based Clustering for Protecting Software with Diversified Updates

Relevance:

30.00%

Abstract:

Reverse engineering is usually the stepping stone of a variety of attacks aiming at identifying sensitive information (keys, credentials, data, algorithms) or vulnerabilities and flaws for broader exploitation. Software applications are usually deployed as identical binary code installed on millions of computers, enabling an adversary to develop a generic reverse-engineering strategy that, if working on one code instance, could be applied to crack all the other instances. A solution to mitigate this problem is represented by Software Diversity, which aims at creating several structurally different (but functionally equivalent) binary code versions out of the same source code, so that even if a successful attack can be elaborated for one version, it should not work on a diversified version. In this paper, we address the problem of maximizing software diversity from a search-based optimization point of view. The program to protect is subject to a catalogue of transformations to generate many candidate versions. The problem of selecting the subset of most diversified versions to be deployed is formulated as an optimisation problem, that we tackle with different search heuristics. We show the applicability of this approach on some popular Android apps.
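The abstract does not say which search heuristics or diversity measures the authors use. Purely as an illustration of the subset-selection problem, the sketch below applies a greedy max-min heuristic: start from the most distant pair of candidates, then repeatedly add the candidate whose nearest already-chosen version is farthest away. Representing each version by the set of transformations applied to it, and comparing versions by Jaccard distance, are stand-in assumptions; all names are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 when both are empty."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def greedy_diverse_subset(versions, k):
    """Pick k version names (k >= 2) by greedy max-min distance.
    versions: dict mapping a version name to its set of descriptive features."""
    names = sorted(versions)
    # Seed with the most distant pair of candidates.
    chosen = list(max(
        combinations(names, 2),
        key=lambda pair: jaccard_distance(versions[pair[0]], versions[pair[1]]),
    ))
    while len(chosen) < k:
        # Add the candidate whose closest chosen version is as far away as possible.
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(jaccard_distance(versions[n], versions[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

# Hypothetical candidate versions described by the transformations applied to them.
versions = {
    "v1": {"flatten", "opaque"},
    "v2": {"flatten", "opaque", "rename"},
    "v3": {"encode", "split"},
    "v4": {"flatten", "rename"},
    "v5": {"virtualize"},
}
selected = greedy_diverse_subset(versions, 3)
print(selected)
```

A greedy pass like this is fast but not guaranteed to be optimal, which is one reason a search-based formulation with several heuristics, as the abstract describes, is attractive.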

Sustainability reports: environmental friendly or a greenwashing tool? : A study of how global mining companies use sustainability report

Relevance:

30.00%

Exploring the feasibility of applying data mining for library reference service improvement : a case study of Turku Main Library

Relevance:

30.00%

Abstract:

Data mining, a much-discussed term, has been studied in various fields. Its potential for refining the decision-making process, recognizing patterns and creating valuable knowledge has won the attention of scholars and practitioners. However, few studies attempt to combine data mining with libraries, where data is generated all the time. This thesis therefore aims to fill that gap. In addition, the opportunities created by data mining are explored in order to enhance one of the most important elements of libraries: reference service. To demonstrate the feasibility and applicability of data mining thoroughly, the literature is reviewed to establish a critical understanding of data mining in libraries and to determine the current status of library reference service. The literature review indicates that free online data resources, other than data generated on social media, are rarely considered in current library data mining projects. This finding motivates the present study to use free online resources. Furthermore, a natural match between data mining and libraries is established. The natural match is explained by emphasizing the richness of library data and by considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method for overcoming reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question of whether data mining is feasible and applicable for reference service improvement. In this case, daily visit data from 2009 to 2015 at Turku Main Library serve as the resource for data mining. In addition, the corresponding weather conditions are collected from Weather Underground, which is freely available online.
Before being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of the data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the dependent variable, and weather conditions, the Discomfort Index and the seven days of the week are the independent variables. Four models, one per season, are established to predict visiting patterns in each season. Patterns are identified in the different seasons and implications are drawn from them. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians of using weather data to forecast library visits. The data mining result is then interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians in order to collect professional opinions on the possibility of employing data mining to improve reference services. The opinions collected are positive, which implies that it is feasible to use data mining as a tool to enhance library reference service.
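As a rough illustration of the multiple regression step, the sketch below fits an ordinary least squares model of visits on weather and calendar predictors by solving the normal equations in plain Python. The predictors, the data values and the coefficients are invented for this example (the toy data are generated from a known linear rule so the fit can be checked), and the thesis's actual variables, including the Discomfort Index, are not reproduced.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical observations: (temperature in Celsius, rain 0/1, weekend 0/1).
rows = [
    (5.0, 0, 0),
    (10.0, 1, 0),
    (15.0, 0, 1),
    (20.0, 1, 1),
    (25.0, 0, 0),
    (0.0, 1, 1),
]
# Visits generated from an invented rule: 120 - 1.5*temp + 20*rain - 30*weekend.
visits = [120 - 1.5 * t + 20 * rain - 30 * weekend for t, rain, weekend in rows]
coefficients = fit_ols(rows, visits)
print([round(c, 3) for c in coefficients])
```

Visits are the response here and the weather and calendar terms are the predictors. Encoding all seven weekdays would require six indicator columns (one day as the baseline) to keep the design matrix full rank.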
