8 resultados para membership categorisation analysis
em CentAUR: Central Archive University of Reading - UK
Resumo:
Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can either be because they represent what the author believes the paper is about not what it actually is, or because they include keyphrases which are more classificatory than explanatory e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate appropriate and diverse range of keyphrases that reflect the document. This paper proposes a solution that examines the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. The primary method explores taking n-grams of the source document phrases, and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show the primary method produces good results and that the secondary method produces both good results and potential for future work.
Resumo:
Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can either be because they represent what the author believes a paper is about not what it actually is, or because they include keyphrases which are more classificatory than explanatory e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes two possible solutions that examine the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. Using three different freely available thesauri, the work undertaken examines two different methods of producing keywords and compares the outcomes across multiple strands in the timeline. The primary method explores taking n-grams of the source document phrases, and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show the primary method produces good results and that the secondary method produces both good results and potential for future work. In addition, the different qualities of the thesauri are examined and it is concluded that the more entries in a thesaurus, the better it is likely to perform. The age of the thesaurus or the size of each entry does not correlate to performance.
Resumo:
This paper investigates the scale and drivers of cross-border real estate development in western and central and eastern Europe (CEE). Drawing upon existing literature on the integration of international real estate markets, we make some inferences on expected patterns of cross-border real estate development from this literature review. The paper draws upon a transactions database in order to assess the penetration of national markets by international real estate developers. The determinants of cross-border transaction flows are modeled as a function the range of economic and real estate variables. Whilst western European markets tend to be dominated by local developers, much higher levels of market penetration by international real estate developers are found in the less mature markets of central and eastern Europe. Empirical modelling based on gravity model specifications reveal the importance of size of the economies, distance between countries, extent of globalization and EU membership as significant determinants of cross-border real estate development flow.
Resumo:
The paper analyses the impact of a priori determinants of biosecurity behaviour of farmers in Great Britain. We use a dataset collected through a stratified telephone survey of 900 cattle and sheep farmers in Great Britain (400 in England and a further 250 in Wales and Scotland respectively) which took place between 25 March 2010 and 18 June 2010. The survey was stratified by farm type, farm size and region. To test the influence of a priori determinants on biosecurity behaviour we used a behavioural economics method, structural equation modelling (SEM) with observed and latent variables. SEM is a statistical technique for testing and estimating causal relationships amongst variables, some of which may be latent using a combination of statistical data and qualitative causal assumptions. Thirteen latent variables were identified and extracted, expressing the behaviour and the underlying determining factors. The variables were: experience, economic factors, organic certification of farm, membership in a cattle/sheep health scheme, perceived usefulness of biosecurity information sources, knowledge about biosecurity measures, perceived importance of specific biosecurity strategies, perceived effect (on farm business in the past five years) of welfare/health regulation, perceived effect of severe outbreaks of animal diseases, attitudes towards livestock biosecurity, attitudes towards animal welfare, influence on decision to apply biosecurity measures and biosecurity behaviour. The SEM model applied on the Great Britain sample has an adequate fit according to the measures of absolute, incremental and parsimonious fit. The results suggest that farmers’ perceived importance of specific biosecurity strategies, organic certification of farm, knowledge about biosecurity measures, attitudes towards animal welfare, perceived usefulness of biosecurity information sources, perceived effect on business during the past five years of severe outbreaks of animal diseases, membership in a cattle/sheep health scheme, attitudes towards livestock biosecurity, influence on decision to apply biosecurity measures, experience and economic factors are significantly influencing behaviour (overall explaining 64% of the variance in behaviour).
Resumo:
This paper describes the methodology used to compile a corpus called MorphoQuantics that contains a comprehensive set of 17,943 complex word types extracted from the spoken component of the British National Corpus (BNC). The categorisation of these complex words was derived primarily from the classification of Prefixes, Suffixes and Combining Forms proposed by Stein (2007). The MorphoQuantics corpus has been made available on a website of the same name; it lists 554 word-initial and 281 word-final morphemes in English, their etymology and meaning, and records the type and token frequencies of all the associated complex words containing these morphemes from the spoken element of the BNC, together with their Part of Speech. The results show that, although the number of word-initial affixes is nearly double that of word-final affixes, the relative number of each observed in the BNC is very similar; however, word-final affixes are more productive in that, on average, the frequency with which they attach to different bases is three times that of word-initial affixes. Finally, this paper considers how linguists, psycholinguists and psychologists may use MorphoQuantics to support their empirical work in first and second language acquisition, and clinical and educational research.
Resumo:
Land cover plays a key role in global to regional monitoring and modeling because it affects and is being affected by climate change and thus became one of the essential variables for climate change studies. National and international organizations require timely and accurate land cover information for reporting and management actions. The North American Land Change Monitoring System (NALCMS) is an international cooperation of organizations and entities of Canada, the United States, and Mexico to map land cover change of North America's changing environment. This paper presents the methodology to derive the land cover map of Mexico for the year 2005 which was integrated in the NALCMS continental map. Based on a time series of 250 m Moderate Resolution Imaging Spectroradiometer (MODIS) data and an extensive sample data base the complexity of the Mexican landscape required a specific approach to reflect land cover heterogeneity. To estimate the proportion of each land cover class for every pixel several decision tree classifications were combined to obtain class membership maps which were finally converted to a discrete map accompanied by a confidence estimate. The map yielded an overall accuracy of 82.5% (Kappa of 0.79) for pixels with at least 50% map confidence (71.3% of the data). An additional assessment with 780 randomly stratified samples and primary and alternative calls in the reference data to account for ambiguity indicated 83.4% overall accuracy (Kappa of 0.80). A high agreement of 83.6% for all pixels and 92.6% for pixels with a map confidence of more than 50% was found for the comparison between the land cover maps of 2005 and 2006. Further wall-to-wall comparisons to related land cover maps resulted in 56.6% agreement with the MODIS land cover product and a congruence of 49.5 with Globcover.
Resumo:
Background: The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. New method: We propose a complete pipeline for the cluster analysis of ERP data. To increase the signalto-noise (SNR) ratio of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA)to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). Results: After validating the pipeline on simulated data, we tested it on data from two experiments – a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership.
Resumo:
Epidemic protocols are a bio-inspired communication and computation paradigm for extreme-scale network system based on randomized communication. The protocols rely on a membership service to build decentralized and random overlay topologies. In a weakly connected overlay topology, a naive mechanism of membership protocols can break the connectivity, thus impairing the accuracy of the application. This work investigates the factors in membership protocols that cause the loss of global connectivity and introduces the first topology connectivity recovery mechanism. The mechanism is integrated into the Expander Membership Protocol, which is then evaluated against other membership protocols. The analysis shows that the proposed connectivity recovery mechanism is effective in preserving topology connectivity and also helps to improve the application performance in terms of convergence speed.