916 results for Data Mining and its Application
Abstract:
Data mining refers to summarizing and extracting information from large amounts of raw data. It is one of the key technologies in many areas of the economy, science, administration and the internet. In this report we introduce an approach that uses evolutionary algorithms to breed fuzzy classifier systems. The approach was exercised as part of a structured procedure by the students Achler, Göb and Voigtmann as a contribution to the 2006 Data-Mining-Cup contest, yielding encouraging results.
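The report's actual implementation is not reproduced in this abstract; the following is a minimal sketch of the general idea, a genetic algorithm tuning the parameters of a single fuzzy rule on toy data. The data, the triangular membership function and the operators are illustrative assumptions, not the contest entry.

```python
import random

# Toy data: (feature, label); label 1 if the feature is "high". Illustrative only.
DATA = [(x / 100.0, 1 if x > 60 else 0) for x in range(100)]

def membership(x, center, width):
    """Triangular fuzzy membership of x in a set centered at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width) if width > 0 else 0.0

def fitness(chrom):
    """Classification accuracy of the rule 'IF x is HIGH THEN class 1'."""
    center, width = chrom
    hits = sum((membership(x, center, width) > 0.5) == bool(y) for x, y in DATA)
    return hits / len(DATA)

def evolve(pop_size=30, generations=50, mut=0.1):
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)  # arithmetic crossover
            child = tuple(max(0.0, g + random.gauss(0, mut)) for g in child)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best rule parameters:", best, "accuracy:", fitness(best))
```

A real fuzzy classifier system would evolve whole rule bases rather than a single rule, but the selection/crossover/mutation loop has the same shape.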
Abstract:
We present a new algorithm called TITANIC for computing concept lattices. It is based on data mining techniques for computing frequent itemsets. The algorithm is experimentally evaluated and compared with B. Ganter's Next-Closure algorithm.
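TITANIC itself counts key sets level-wise; the sketch below is only a naive illustration of the closure operator it computes, on a made-up formal context, not the algorithm's efficient procedure.

```python
from itertools import combinations

# A tiny formal context: object -> set of attributes (illustrative data).
CONTEXT = {
    "o1": {"a", "b"},
    "o2": {"a", "b", "c"},
    "o3": {"a", "c"},
    "o4": {"b", "c"},
}
ATTRS = sorted(set().union(*CONTEXT.values()))

def extent(B):
    """Objects having every attribute in B."""
    return {g for g, atts in CONTEXT.items() if B <= atts}

def closure(B):
    """Attributes common to all objects in the extent of B (the concept intent)."""
    objs = extent(B)
    if not objs:
        return set(ATTRS)
    return set.intersection(*(CONTEXT[g] for g in objs))

def frequent_closed_itemsets(minsupp):
    """Naive enumeration of closed attribute sets with support >= minsupp.
    TITANIC finds the same sets far more efficiently via key-set counting."""
    closed = set()
    for k in range(len(ATTRS) + 1):
        for B in combinations(ATTRS, k):
            B = set(B)
            if len(extent(B)) >= minsupp and closure(B) == B:
                closed.add(frozenset(B))
    return closed

for c in sorted(frequent_closed_itemsets(minsupp=2), key=len):
    print(sorted(c), "support:", len(extent(set(c))))
```

The closed sets produced here are exactly the intents of the concept lattice restricted to the support threshold.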
Abstract:
Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular, they serve as a condensed representation of the frequent patterns known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices serve as a starting point for computing condensed sets of association rules without loss of information, and as a visualization method for the resulting rules.
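As a hedged illustration of how closures condense rules without information loss: for any attribute set B, the exact rule B -> closure(B) \ B holds with confidence 1. The toy transactions below are invented for illustration and are not the paper's data.

```python
# Exact (confidence-1) association rules read off a closure operator.
CONTEXT = {
    "t1": {"bread", "butter"},
    "t2": {"bread", "butter", "milk"},
    "t3": {"bread", "milk"},
    "t4": {"bread"},
}

def closure(B):
    """Items common to all transactions containing B."""
    rows = [atts for atts in CONTEXT.values() if B <= atts]
    return set.intersection(*rows) if rows else B

for item in ("bread", "butter", "milk"):
    premise = {item}
    conclusion = closure(premise) - premise
    support = sum(premise <= atts for atts in CONTEXT.values())
    if conclusion:
        print(f"{sorted(premise)} -> {sorted(conclusion)} "
              f"(support {support}, confidence 1.0)")
```

Here the closure yields "butter -> bread" and "milk -> bread"; every other exact rule is derivable from the closed sets, which is the sense in which the representation is condensed.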
Abstract:
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to use these concepts to circumscribe what Web space is, what it represents and how it can be represented and analyzed. This is used to sketch the role that Semantic Web Mining and the software agents and human agents involved in it can play in the evolution of Web space.
Abstract:
The application of compositional data analysis through log-ratio transformations corresponds to a multinomial logit model for the shares themselves. This model is characterized by the property of Independence of Irrelevant Alternatives (IIA). IIA states that the odds ratio, in this case the ratio of shares, is invariant to the addition or deletion of outcomes to the problem. It is exactly this invariance of the ratio that underlies the commonly used zero replacement procedure in compositional data analysis. In this paper we investigate using the nested logit model, which does not embody IIA, and an associated zero replacement procedure, and compare its performance with that of the more usual approach of using the multinomial logit model. Our comparisons exploit a data set that combines voting data by electoral division with corresponding census data for each division for the 2001 Federal election in Australia.
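A small numerical check of the IIA property under the multinomial logit share model, where share i is proportional to exp(v_i); the utilities below are arbitrary illustrative numbers, not estimates from the paper.

```python
import math

def mnl_shares(utilities):
    """Multinomial logit shares: s_i = exp(v_i) / sum_j exp(v_j)."""
    expv = [math.exp(v) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

v3 = [1.0, 0.5, -0.2]   # three alternatives (illustrative utilities)
v4 = v3 + [0.8]         # add a fourth, "irrelevant" alternative

s3, s4 = mnl_shares(v3), mnl_shares(v4)
# IIA: the ratio of any two shares is exp(v_i - v_j), so it is unchanged
# by adding the extra alternative, even though both shares shrink.
print(s3[0] / s3[1])   # exp(0.5) = 1.6487...
print(s4[0] / s4[1])   # identical ratio
```

The nested logit relaxes exactly this invariance within nests, which is what motivates the alternative zero replacement procedure studied in the paper.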
Abstract:
This ICO training video helps answer questions about the Data Protection Act, its impact on the working environment and how to handle and protect people's information. (Produced by the Central Office of Information, Crown Copyright 2006)
Abstract:
Big Data has been characterised as a great economic opportunity and as a massive threat to privacy. Both may be correct: the same technology can indeed be used in ways that are highly beneficial and in ways that are ethically intolerable, maybe even simultaneously. Using examples of how Big Data might be used in education, normally referred to as "learning analytics", the seminar will discuss possible ethical and legal frameworks for Big Data, and how these might guide the development of technologies, processes and policies that can deliver the benefits of Big Data without the nightmares. Speaker Biography: Andrew Cormack is Chief Regulatory Adviser, Jisc Technologies. He joined the company in 1999 as head of the JANET-CERT and EuroCERT incident response teams. In his current role he concentrates on the security, policy and regulatory issues around the network and services that Janet provides to its customer universities and colleges. Previously he worked for Cardiff University, running web and email services, and for NERC's Shipboard Computer Group. He has degrees in Mathematics, Humanities and Law.
Abstract:
This paper examines the impact on old age poverty and the fiscal cost of universal minimum old-age pensions in Latin America, using recent household survey data for 18 countries. Alleviating old age poverty requires a different approach from other age groups, and a minimum pension is likely to be the only alternative available. First, we measure old age poverty rates for all countries. Second, we discuss the design of minimum pension schemes, means-tested or not, as well as the disincentive effects that they are expected to have on the economic and social behavior of households, including labor supply, saving and family solidarity. Third, we use the household surveys to simulate the fiscal cost and the impact on poverty rates of alternative minimum pension schemes in the 18 countries. We show that a universal minimum pension would substantially reduce poverty among the elderly, except in Argentina, Brazil, Chile and Uruguay, where minimum pension systems already exist and poverty rates are low. Such schemes have much to be commended in terms of incentives, spillover effects and administrative simplicity, but have a high fiscal cost. The latter is a function of the age at which benefits are awarded, the prevailing longevity, the generosity of benefits, the efficacy of means testing, and naturally the fiscal capacity of the country.
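The paper's microsimulations run on actual survey microdata; the skeleton below only illustrates the accounting involved in such a simulation. The records, poverty line, eligibility age and benefit level are made-up placeholders, not the paper's data or parameter choices.

```python
# Hypothetical micro-simulation skeleton for a universal minimum pension.
PEOPLE = [
    {"age": 70, "income": 40.0},
    {"age": 68, "income": 110.0},
    {"age": 72, "income": 0.0},
    {"age": 35, "income": 90.0},
]
POVERTY_LINE = 60.0   # per-person income threshold (placeholder units)
ELIGIBLE_AGE = 65
BENEFIT = 50.0        # flat universal pension (placeholder)

def old_age_poverty_rate(people):
    old = [p for p in people if p["age"] >= ELIGIBLE_AGE]
    return sum(p["income"] < POVERTY_LINE for p in old) / len(old)

def apply_pension(people):
    """Grant the flat benefit to everyone above the eligibility age."""
    return [dict(p, income=p["income"] + (BENEFIT if p["age"] >= ELIGIBLE_AGE else 0.0))
            for p in people]

fiscal_cost = sum(BENEFIT for p in PEOPLE if p["age"] >= ELIGIBLE_AGE)
print("old-age poverty before:", old_age_poverty_rate(PEOPLE))
print("old-age poverty after: ", old_age_poverty_rate(apply_pension(PEOPLE)))
print("fiscal cost:", fiscal_cost)
```

Varying the eligibility age, benefit level and a means test in such a loop is how the fiscal-cost sensitivities described in the abstract would be traced out.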
Abstract:
Abstract based on that of the publication.
Abstract:
This thesis studies how to estimate the distribution of regionalized variables whose sample space and scale admit a Euclidean space structure. We apply the principle of working in coordinates: we choose an orthonormal basis, do statistics on the coordinates of the data, and apply the outputs back to the basis in order to recover a result in the original space. Applied to regionalized variables, this yields a single consistent approach that generalizes the well-known properties of kriging techniques to several sample spaces: real, positive or compositional data (vectors of positive components with constant sum) are treated as particular cases. In this way, linear geostatistics is generalized, and solutions are offered to well-known problems of non-linear geostatistics, adapting the measure and the representativity criteria (i.e., means) to the data at hand. The estimator for positive data coincides with a weighted geometric mean, equivalent to estimating the median, without any of the problems of classical lognormal kriging. The compositional case offers equivalent solutions, but in addition allows the estimation of multinomial probability vectors. With a preliminary Bayesian approach, kriging of compositions also becomes a consistent alternative to indicator kriging. The latter technique is used to estimate probability functions of arbitrary variables, but often yields negative estimates, which the proposed alternative avoids. The usefulness of this set of techniques is verified by studying ammonia pollution at an automatic water-quality monitoring station in the Tordera basin, and it is concluded that only with the proposed techniques can one detect the moments at which ammonium turns into ammonia at a concentration above the legal limit.
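A rough sketch of the "working in coordinates" idea for positive data: krige the logarithms (the coordinates) with ordinary kriging and map back with exp, which yields a weighted geometric mean of the observations. The exponential covariance model and the one-dimensional toy data below are assumptions for illustration, not the thesis's case study.

```python
import numpy as np

def cov(h, sill=1.0, rng=10.0):
    """Assumed exponential covariance model."""
    return sill * np.exp(-np.abs(h) / rng)

x_obs = np.array([1.0, 4.0, 9.0])   # sample locations
z_obs = np.array([2.0, 8.0, 3.0])   # positive observations
y_obs = np.log(z_obs)               # coordinates in the Euclidean space
x0 = 5.0                            # prediction location

# Ordinary kriging system: data covariances plus the unbiasedness constraint.
n = len(x_obs)
A = np.ones((n + 1, n + 1))
A[:n, :n] = cov(x_obs[:, None] - x_obs[None, :])
A[n, n] = 0.0
b = np.append(cov(x_obs - x0), 1.0)
w = np.linalg.solve(A, b)[:n]       # kriging weights (sum to 1)

y_hat = w @ y_obs
z_hat = np.exp(y_hat)               # weighted geometric mean of z_obs
print("weights:", w, "estimate:", z_hat)
```

Because exp of a weighted mean of logs is a weighted geometric mean, the back-transformed estimate targets the median, which is the property the thesis exploits to avoid the bias issues of classical lognormal kriging.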
Abstract:
The representation of the diurnal cycle in the Hadley Centre climate model is evaluated using simulations of the infrared radiances observed by Meteosat 7. In both the window and water vapour channels, the standard version of the model with 19 levels produces a good simulation of the geographical distributions of the mean radiances and of the amplitude of the diurnal cycle. Increasing the vertical resolution to 30 levels leads to further improvements in the mean fields. The timing of the maximum and minimum radiances reveals significant model errors, however, which are sensitive to the frequency with which the radiation scheme is called. In most regions, these errors are consistent with well documented errors in the timing of convective precipitation, which peaks before noon in the model, in contrast to the observed peak in the late afternoon or evening. When the radiation scheme is called every model time step (half an hour), as opposed to every three hours in the standard version, the timing of the minimum radiance is improved for convective regions over central Africa, due to the creation of upper-level layer-cloud by detrainment from the convection scheme, which persists well after the convection itself has dissipated. However, this produces a decoupling between the timing of the diurnal cycles of precipitation and window channel radiance. The possibility is raised that a similar decoupling may occur in reality and the implications of this for the retrieval of the diurnal cycle of precipitation from infrared radiances are discussed.
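The timing of a diurnal maximum or minimum is commonly summarized by the phase of the first 24-hour harmonic fitted to hourly means. The minimal sketch below uses synthetic hourly data and is not the paper's Meteosat 7 processing.

```python
import numpy as np

# Estimate the local time of the diurnal maximum from an hourly-mean series
# by projecting onto the first 24-hour harmonic. Synthetic data, peak at 17:00.
hours = np.arange(24)
series = 250.0 + 15.0 * np.cos(2 * np.pi * (hours - 17) / 24)

omega = 2 * np.pi / 24.0
a = 2.0 * np.mean(series * np.cos(omega * hours))   # cosine coefficient
b = 2.0 * np.mean(series * np.sin(omega * hours))   # sine coefficient
phase = np.arctan2(b, a)                            # radians
t_max = (phase / omega) % 24                        # local time of maximum
print("diurnal peak at about %.1f h local time" % t_max)
```

Comparing such fitted phases between simulated and observed radiances is one standard way to quantify the timing errors the abstract describes.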
Abstract:
One of the major uncertainties in the ability to predict future climate change, and hence its impacts, is the lack of knowledge of the earth's climate sensitivity. Here, data from the 1985-96 Earth Radiation Budget Experiment (ERBE) are combined with surface temperature change information and estimates of radiative forcing to diagnose the climate sensitivity. Importantly, the estimate is completely independent of climate model results. A climate feedback parameter of 2.3 ± 1.4 W m⁻² K⁻¹ is found. This corresponds to a 1.0-4.1 K range for the equilibrium warming due to a doubling of carbon dioxide (assuming Gaussian errors in observable parameters, which is approximately equivalent to a uniform "prior" in feedback parameter). The uncertainty range is due to a combination of the short time period for the analysis as well as uncertainties in the surface temperature time series and radiative forcing time series, mostly the former. Radiative forcings may not all be fully accounted for; however, an argument is presented that the estimate of climate sensitivity is still likely to be representative of longer-term climate change. The methodology can be used to 1) retrieve shortwave and longwave components of climate feedback and 2) suggest clear-sky and cloud feedback terms. There is preliminary evidence of a neutral or even negative longwave feedback in the observations, suggesting that current climate models may not be representing some processes correctly if they give a net positive longwave feedback.
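As a worked check of the quoted numbers: the equilibrium warming for doubled CO2 is the doubling forcing divided by the feedback parameter. Taking the standard doubling forcing of about 3.7 W m⁻² (an assumed value here, not stated in the abstract) reproduces the 1.0-4.1 K range.

```python
# Worked check: equilibrium warming = forcing / feedback parameter.
F_2x = 3.7                  # W m^-2, standard CO2-doubling forcing (assumed)
lam, err = 2.3, 1.4         # feedback parameter and uncertainty, W m^-2 K^-1

dT_central = F_2x / lam             # ~1.6 K
dT_low = F_2x / (lam + err)         # 3.7 / 3.7 = 1.0 K
dT_high = F_2x / (lam - err)        # 3.7 / 0.9 ~ 4.1 K
print(f"{dT_low:.1f} K to {dT_high:.1f} K (central {dT_central:.1f} K)")
```

Note how the symmetric uncertainty in the feedback parameter maps to a strongly asymmetric range in warming, because sensitivity is the reciprocal of the feedback.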