844 resultados para Data mining and knowledge discovery


Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a new algorithm called TITANIC for computing concept lattices. It is based on data mining techniques for computing frequent itemsets. The algorithm is experimentally evaluated and compared with B. Ganter's Next-Closure algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to use these concepts to circumscribe what Web space is, what it represents and how it can be represented and analyzed. This is used to sketch the role that Semantic Web Mining and the software agents and human agents involved in it can play in the evolution of Web space.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our purpose is to provide a set-theoretical frame to clustering fuzzy relational data basically based on cardinality of the fuzzy subsets that represent objects and their complementaries, without applying any crisp property. From this perspective we define a family of fuzzy similarity indexes which includes a set of fuzzy indexes introduced by Tolias et al, and we analyze under which conditions it is defined a fuzzy proximity relation. Following an original idea due to S. Miyamoto we evaluate the similarity between objects and features by means the same mathematical procedure. Joining these concepts and methods we establish an algorithm to clustering fuzzy relational data. Finally, we present an example to make clear all the process

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The “butterfly effect” is a popularly known paradigm; commonly it is said that when a butterfly flaps its wings in Brazil, it may cause a tornado in Texas. This essentially describes how weather forecasts can be extremely senstive to small changes in the given atmospheric data, or initial conditions, used in computer model simulations. In 1961 Edward Lorenz found, when running a weather model, that small changes in the initial conditions given to the model can, over time, lead to entriely different forecasts (Lorenz, 1963). This discovery highlights one of the major challenges in modern weather forecasting; that is to provide the computer model with the most accurately specified initial conditions possible. A process known as data assimilation seeks to minimize the errors in the given initial conditions and was, in 1911, described by Bjerkness as “the ultimate problem in meteorology” (Bjerkness, 1911).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reports the findings of a small-scale research project, which investigated the levels of awareness and knowledge of written standard English of 10- and 11-year-old children in two English primary schools over a six-year period, coinciding with the implementation in the schools of the National Literacy Strategy (NLS). A questionnaire was used to provide quantitative and qualitative data relating to: features of writing which were recognised as standard or non-standard; children's understanding of technical terminology; variations between boys' and girls' performance; and the impact of the NLS over time. The findings reveal variations in levels of recognition of different non-standard features, differences between girls' and boys' recognition, possible examples of language change, but no evidence of a positive impact of the NLS. The implications of these findings are discussed both in terms of changes in educational standards and changes to standard English.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijden’s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbase’s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are a number of challenges associated with managing knowledge and information in construction organizations delivering major capital assets. These include the ever-increasing volumes of information, losing people because of retirement or competitors, the continuously changing nature of information, lack of methods on eliciting useful knowledge, development of new information technologies and changes in management and innovation practices. Existing tools and methodologies for valuing intangible assets in fields such as engineering, project management and financial, accounting, do not address fully the issues associated with the valuation of information and knowledge. Information is rarely recorded in a way that a document can be valued, when either produced or subsequently retrieved and re-used. In addition there is a wealth of tacit personal knowledge which, if codified into documentary information, may prove to be very valuable to operators of the finished asset or future designers. This paper addresses the problem of information overload and identifies the differences between data, information and knowledge. An exploratory study was conducted with a leading construction consultant examining three perspectives (business, project management and document management) by structured interviews and specifically how to value information in practical terms. Major challenges in information management are identified. An through-life Information Evaluation methodology (IEM) is presented to reduce information overload and to make the information more valuable in the future.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper provides an account of the changing livelihood dynamics unfolding in diamond-rich territories of rural Liberia. In these areas, many farm families are using the rice harvested on their plots to attract and feed labourers recruited specifically to mine for diamonds. The monies accrued from the sales of all recovered stones are divided evenly between the family and hired hands, an arrangement which, for thousands of people, has proved to be an effective short-term buffer against poverty. A deepened knowledge of these dynamics could be an important step towards facilitating lasting development in Liberia’s highly-impoverished rural areas.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on row-data based quotation systems to select the best suppliers for the customers (airlines). The data quantity and quality becomes a key issue to determining the success of an MRO job, since we need to ensure we achieve cost and quality benchmarks. This paper introduces a data mining approach to create an MRO quotation system that enhances the data quantity and data quality, and enables significantly more precise MRO job quotations. Regular Expression was utilized to analyse descriptive textual feedback (i.e. engineer’s reports) in order to extract more referable highly normalised data for job quotation. A text mining based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that system data would improve cost quotation in 40% of MRO jobs, would reduce service cost without causing a drop in service quality.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reports the findings of a small-scale research project which investigated the levels of awareness and knowledge of written standard English of 10 and 11 year old children in two English primary schools. The project involved repeating in 2010 a written questionnaire previously used with children in the same schools in three separate surveys in 1999, 2002 and 2005. Data from the latest survey are compared to those from the previous three. The analysis seeks to identify any changes over time in children’s ability to recognise non-standard forms and supply standard English alternatives, as well as their ability to use technical terms related to language variation. Differences between the performance of boys and girls and that of the two schools are also analysed. The paper concludes that the socio-economic context of the schools may be a more important factor than gender in variations over time identified in the data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams are now available for subscription on our smart mobile phones, the potential of using this data for decision making using data stream mining techniques has now been achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among the mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly the autonomic intelligent behaviour of the agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in details in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining PDM. Hoeffding trees techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model achieving promising accuracy for real deployment.