89 results for Data mining and knowledge discovery


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper reports the findings of a small-scale research project which investigated the levels of awareness and knowledge of written standard English of 10- and 11-year-old children in two English primary schools. The project involved repeating in 2010 a written questionnaire previously used with children in the same schools in three separate surveys in 1999, 2002 and 2005. Data from the latest survey are compared to those from the previous three. The analysis seeks to identify any changes over time in children’s ability to recognise non-standard forms and supply standard English alternatives, as well as their ability to use technical terms related to language variation. Differences between the performance of boys and girls and that of the two schools are also analysed. The paper concludes that the socio-economic context of the schools may be a more important factor than gender in variations over time identified in the data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With vast numbers of data streams now available for subscription on our smart mobile phones, using this data for decision making with data stream mining techniques has become achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly, the autonomic intelligent behaviour of agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in detail in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining (PDM). Hoeffding tree techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model, which achieves promising accuracy for real deployment.
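
The abstract above does not spell out how a Hoeffding tree decides when to split. As a minimal sketch, assuming the standard Hoeffding bound that this family of learners relies on, a leaf splits once the observed gap between the two best attributes exceeds the bound. The function names `hoeffding_bound` and `should_split` and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split a leaf when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical values: gains measured at a leaf after 1000 stream examples.
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
    print(f"epsilon = {eps:.4f}")
    print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))
```

Because the bound shrinks as more examples arrive, a leaf that cannot yet separate two close attributes simply waits for more data instead of storing the stream.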

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large amounts of available data streams to which smart phones can subscribe or which they can sense, coupled with the increasing computational power of handheld devices, motivate the development of PDM as a decision making system. This emerging area of study has been shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
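
The final step of the process described above, in which a decision agent consults the mining agents, can be sketched as a weighted vote. The abstract does not say how the collaborative decision is formed, so the weighted majority scheme, the use of a local accuracy estimate as the weight, and the name `collaborative_decision` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def collaborative_decision(votes: Iterable[Tuple[str, float]]) -> str:
    """Combine (label, weight) votes from mining agents into one label.

    Each mining agent is assumed to classify using only its own subset of
    attributes and to report a weight, for example its local accuracy estimate.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    if not totals:
        raise ValueError("no votes received")
    # Iterating over sorted labels makes tie-breaking deterministic.
    return max(sorted(totals), key=totals.__getitem__)


if __name__ == "__main__":
    # Hypothetical votes from three agents on a stock-movement label.
    votes = [("up", 0.8), ("down", 0.6), ("down", 0.5)]
    print(collaborative_decision(votes))
```

Weighting lets an agent that sees more informative attributes count for more than one that sees only weak ones, which matters when the data are vertically partitioned.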

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article clarifies what was done with the sub-7-man positions in data-mining Harold van der Heijden's 'HHdbIV' database of chess studies prior to its publication. It emphasises that only positions in the main lines of studies were examined and that the information about uniqueness of move was not incorporated in HHdbIV. There is some reflection on the separate technical and artistic dimensions of study evaluation.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article explores the contribution that artisanal and small-scale mining (ASM) makes to poverty reduction in Tanzania, based on data on gold and diamond mining in Mwanza Region. The evidence suggests that people working in mining or related services are less likely to be in poverty than those with other occupations. However, the picture is complex; while mining income can help reduce poverty and provide a buffer from livelihood shocks, people's inability to obtain a formal mineral claim, or to effectively exploit their claims, contributes to insecurity. This is reinforced by a context in which ASM is peripheral to large-scale mining interests, is only gradually being addressed within national poverty reduction policies, and is segregated from district-level planning.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The third chapter, data mining in education, examines potentials and constraints in the use of data mining in education, summarizing the potential they have to offer meaningful support to: students, teachers, tutors, authors, developers, researchers, and the education and training institutions in which they work and study.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This chapter introduces the latest practices and technologies in the interactive interpretation of environmental data. With environmental data becoming ever larger, more diverse and more complex, there is a need for a new generation of tools that provides new capabilities over and above those of the standard workhorses of science. These new tools aid the scientist in discovering interesting new features (and also problems) in large datasets by allowing the data to be explored interactively using simple, intuitive graphical tools. In this way, new discoveries are made that are commonly missed by automated batch data processing. This chapter discusses the characteristics of environmental science data, common current practice in data analysis and the supporting tools and infrastructure. New approaches are introduced and illustrated from the points of view of both the end user and the underlying technology. We conclude by speculating as to future developments in the field and what must be achieved to fulfil this vision.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Much recent research in SLA is guided by the hypothesis of L2 interface vulnerability (see Sorace 2005). This study contributes to this general project by examining the acquisition of two classes of subjunctive complement clauses in L2 Spanish: subjunctive complements of volitional predicates (purely syntactic) and subjunctive vs. indicative complements with negated epistemic matrix predicates, where the mood distinction is discourse dependent (thus involving the syntax-discourse interface). We provide an analysis of the volitional subjunctive in English and Spanish, suggesting that English learners of L2 Spanish need to access the functional projection Mood P and an uninterpretable modal feature on the Force head available to them from their formal English register grammar, and simultaneously must unacquire the structure of English for-to clauses. For negated epistemic predicates, our analysis maintains that they need to revalue the modal feature on the Force head from uninterpretable to interpretable, within the L2 grammar. With others (e.g. Borgonovo & Prévost 2003; Borgonovo, Bruhn de Garavito & Prévost 2005) and in line with Sorace's (2000, 2003, 2005) notion of interface vulnerability, we maintain that the latter case is more difficult for L2 learners, which is borne out in the data we present. However, the data also show that the indicative/subjunctive distinction with negated epistemics can be acquired by advanced stages of acquisition, questioning the notion of obligatory residual optionality for all properties which require the integration of syntactic and discourse information.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
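
For context, the global communication that the abstract above sets out to remove can be seen in the conventional parallel k-means step, where every process contributes partial sums that are then combined for all centroids. The sketch below shows only that conventional baseline in plain Python, with lists standing in for processes; it is not the dynamic group protocol of the paper, and the function names and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Partial = Tuple[List[List[float]], List[int]]


def local_partial_sums(points: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Per-process step: assign local points to the nearest centroid and
    accumulate per-centroid coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def combine_and_update(partials: Sequence[Partial],
                       centroids: Sequence[Point]) -> List[List[float]]:
    """Global step: combine every process's partial sums (the all-to-all
    reduction) and recompute each centroid as the mean of its points."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for j in range(k):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            updated.append(list(centroids[j]))  # keep an empty cluster in place
        else:
            updated.append(
                [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
            )
    return updated


if __name__ == "__main__":
    # Two simulated processes, each holding its own partition of the data.
    partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    centroids = [(1.0, 1.0), (9.0, 9.0)]
    partials = [local_partial_sums(part, centroids) for part in partitions]
    print(combine_and_update(partials, centroids))
```

In this toy run each process only ever contributes to one centroid, which is the kind of non-uniform data distribution the abstract says can be exploited to restrict communication to smaller groups.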

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated in Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions that take place in a data-set consisting of approximately 4 million tweets collected over a period of 28 days that contain at least one mention. We ignore tweet content and instead use the mention network structure and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation in Twitter, with the potential aim of developing conversational models.
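
As a rough illustration of the two quantities named in the abstract above, the sketch below finds pairs of users who mention each other and computes a balance value for each pair. Since tweet content is ignored, balance is assumed here to be each party's share of the tweets exchanged; that reading, the `(sender, mentioned)` input format and the function names are assumptions, not the authors' definitions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Mention = Tuple[str, str]  # (sender, mentioned user)


def reciprocated_pairs(mentions: Iterable[Mention]) -> Dict[Mention, Tuple[int, int]]:
    """Return {(a, b): (a_to_b, b_to_a)} for pairs that mention each other.

    Pairs are keyed with a < b so each conversation appears once;
    self-mentions are ignored.
    """
    counts = Counter(mentions)
    pairs = {}
    for (a, b), a_to_b in counts.items():
        b_to_a = counts.get((b, a), 0)
        if a < b and b_to_a > 0:
            pairs[(a, b)] = (a_to_b, b_to_a)
    return pairs


def balance(a_to_b: int, b_to_a: int) -> float:
    """Fraction of the exchanged tweets contributed by the first user."""
    return a_to_b / (a_to_b + b_to_a)


if __name__ == "__main__":
    # Hypothetical mention log.
    log = [("ann", "bob"), ("bob", "ann"), ("ann", "bob"), ("ann", "cat")]
    for pair, (ab, ba) in reciprocated_pairs(log).items():
        print(pair, round(balance(ab, ba), 2))
```

A balance near 0.5 indicates an even exchange, while values near 0 or 1 indicate that one party does most of the talking.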

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Satellite data, reanalysis products and climate models are combined to monitor changes in water vapour, clear-sky radiative cooling of the atmosphere and precipitation over the period 1979-2006. Climate models are able to simulate observed increases in column integrated water vapour (CWV) with surface temperature (Ts) over the ocean. Changes in the observing system lead to spurious variability in water vapour and clear-sky longwave radiation in reanalysis products. Nevertheless, all products considered exhibit a robust increase in clear-sky longwave radiative cooling from the atmosphere to the surface; clear-sky longwave radiative cooling of the atmosphere is found to increase with Ts at the rate of ~4 W m^-2 K^-1 over tropical ocean regions of mean descending vertical motion. Precipitation (P) is tightly coupled to atmospheric radiative cooling rates and this implies an increase in P with warming at a slower rate than the observed increases in CWV. Since convective precipitation depends on moisture convergence, the above implies enhanced precipitation over convective regions and reduced precipitation over convectively suppressed regimes. To quantify this response, observed and simulated changes in precipitation rate are analysed separately over regions of mean ascending and descending vertical motion over the tropics. The observed response is found to be substantially larger than the model simulations and climate change projections. It is currently not clear whether this is due to deficiencies in model parametrizations or errors in satellite retrievals.