946 results for Statistics as Topic
Abstract:
This is a discussion of the journal article "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation". The article and discussion have appeared in the Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Abstract:
Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in scenarios where redundancy in retrieved documents is of major concern, as is the case in the sub-topic retrieval task. In this chapter, we propose a new ranking strategy for sub-topic retrieval that builds upon interdependent document relevance and topic-oriented models. With respect to the topic-oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state-of-the-art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster-based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub-topics of a more general query topic. The experimental results show that our approaches outperform the state-of-the-art strategies with respect to a number of diversity measures.
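For orientation, Maximal Marginal Relevance, one of the baselines named in this abstract, can be sketched as a greedy re-ranking that trades query relevance against redundancy with documents already selected. This is a minimal illustration of the standard MMR idea and not the chapter's proposed method; the function name `mmr_rank`, the precomputed `relevance` and `similarity` inputs, and the default `lam` are assumptions made for the example.

```python
def mmr_rank(relevance, similarity, lam=0.7, k=None):
    """Greedy Maximal Marginal Relevance re-ranking (illustrative sketch).

    relevance:  list of query-document relevance scores (assumed precomputed)
    similarity: square matrix, similarity[i][j] between documents i and j
    lam:        trade-off; 1.0 ranks by relevance only, 0.0 by novelty only
    k:          number of documents to return (defaults to all)
    """
    n = len(relevance)
    k = n if k is None else min(k, n)
    selected = []
    remaining = list(range(n))
    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already-selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 covers a different sub-topic.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(mmr_rank(relevance, similarity, lam=0.5))
```

With `lam=1.0` the ranking reduces to sorting by relevance alone, which is the independence assumption the abstract questions.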
Abstract:
Loop detectors are widely used on motorway networks, where they provide point speeds and traffic volumes. Models have been proposed for temporal and spatial generalization of speed for average travel time estimation. Advances in technology provide complementary data sources such as the Bluetooth MAC Scanner (BMS), which detects the MAC IDs of Bluetooth devices carried by travellers. Matching the data from two BMS stations provides individual vehicle travel times. Generally, loops on motorways are closely spaced, whereas BMS stations are placed a few kilometres apart. In this research, we fuse BMS and loop data to define the trajectories of the Bluetooth vehicles. The trajectories are used to estimate travel time statistics between any two points along the motorway. The proposed model is tested using simulation and validated with real data from the Pacific Motorway, Brisbane. Compared with trajectories based on linear interpolation, the model provides significant improvements.
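The matching step mentioned in this abstract can be illustrated with a small sketch: a device seen at an upstream and a downstream scanner yields one travel-time observation. This is not the paper's fusion model; the function name, the dictionary layout (MAC ID mapped to a detection time in seconds), and the sample values are hypothetical.

```python
from statistics import median


def match_travel_times(upstream, downstream):
    """Match MAC IDs seen at two stations and return per-device travel times.

    upstream, downstream: dicts mapping a MAC ID to a detection time in
    seconds (hypothetical layout; one detection per device per station).
    """
    times = {}
    for mac, t_up in upstream.items():
        t_down = downstream.get(mac)
        # Keep only devices seen at both stations, in the direction of travel.
        if t_down is not None and t_down > t_up:
            times[mac] = t_down - t_up
    return times


upstream = {"aa:01": 100, "aa:02": 130, "aa:03": 150}
downstream = {"aa:01": 400, "aa:03": 420, "aa:04": 500}

travel_times = match_travel_times(upstream, downstream)
print(travel_times)
print(median(travel_times.values()))
```

Real scanner data would need extra handling that the sketch omits, such as multiple detections per device and outliers from vehicles that stop between stations.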
Abstract:
Basic mathematical skills are critical to a student’s ability to successfully undertake an introductory statistics course. Yet in business education this vitally important area of mathematics and statistics education is under-researched. The question therefore arises as to what level of mathematical skill a typical business studies student will possess as they enter the tertiary environment, and whether there are any common deficiencies that we can identify with a view to tackling the problem. This paper will focus on a study designed to measure the level of mathematical ability of first year business students. The results provide timely insight into a growing problem faced by many tertiary educators in this field.
Abstract:
This paper investigates the effect of topic dependent language models (TDLM) on phonetic spoken term detection (STD) using dynamic match lattice spotting (DMLS). Phonetic STD consists of two steps: indexing and search. The accuracy of indexing audio segments into phone sequences using phone recognition methods directly affects the accuracy of the final STD system. If the topic of a document is known, recognizing the spoken words and indexing them to an intermediate representation is an easier task, and consequently detecting a search word in it will be more accurate and robust. In this paper, we propose the use of TDLMs in the indexing stage to improve the accuracy of STD in situations where the topic of the audio document is known in advance. It is shown that using TDLMs instead of the traditional general language model (GLM) improves STD performance according to the figure of merit (FOM) criterion.
Abstract:
The aim of spoken term detection (STD) is to find all occurrences of a specified query term in a large audio database. This process is usually divided into two steps: indexing and search. In a previous study, it was shown that knowing the topic of an audio document helps to improve the accuracy of the indexing step, which results in better performance for the STD system. In this paper, we propose the use of topic information not only in the indexing step but also in the search step. Results of our experiments show that topic information can also be used in the search step to improve STD accuracy.
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Abstract:
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment
Abstract:
Australia has a significantly higher suicide rate than England. Rather than accepting that this ‘statistical fact’ is a direct reflection of some positivist truth, this paper begins with the premise that how suicide is counted depends upon what counts as suicide. This study involves semi-structured interviews with coroners both in Australia and England, as well as observations at inquests. Important differences between the two coronial systems include: first, quite different logics of operation; second, the burden of proof for reaching a finding of suicide is significantly higher in England; and third, the presence of family members at English inquests results in far greater pressure being brought to bear upon coroners. These combined factors result in a reduced likelihood of English coroners reaching a finding of suicide. The conclusions are twofold. First, this research supports existing criticisms of comparative suicide statistics. Second, this research adds theoretical weight to criticisms of positivist analyses of social phenomena.
Abstract:
Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distributions and/or non-linear inter-variable relationships. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function (CPDF) is approximated via polynomial expansions and is then used to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.
Abstract:
The use of ‘topic’ concepts has shown improved search performance, given a query, by bringing together relevant documents which use different terms to describe a higher level concept. In this paper, we propose a method for discovering and utilizing concepts in indexing and search for a domain specific document collection being utilized in industry. This approach differs from others in that we only collect focused concepts to build the concept space and that instead of turning a user’s query into a concept based query, we experiment with different techniques of combining the original query with a concept query. We apply the proposed approach to a real-world document collection and the results show that in this scenario the use of concept knowledge at index and search can improve the relevancy of results.
Abstract:
Meta-analysis is a method to obtain a weighted average of results from various studies. In addition to pooling effect sizes, meta-analysis can also be used to estimate disease frequencies, such as incidence and prevalence. In this article we present methods for the meta-analysis of prevalence. We discuss the logit and double arcsine transformations to stabilise the variance. We note the special situation of multiple category prevalence, and propose solutions to the problems that arise. We describe the implementation of these methods in the MetaXL software, and present a simulation study and the example of multiple sclerosis from the Global Burden of Disease 2010 project. We conclude that the double arcsine transformation is preferred over the logit, and that the MetaXL implementation of multiple category prevalence is an improvement in the methodology of the meta-analysis of prevalence.
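The two variance-stabilising transformations named in this abstract can be sketched as follows, using their commonly cited textbook forms (the logit with variance 1/x + 1/(n - x), and the Freeman-Tukey double arcsine with variance 1/(n + 0.5)). This is an illustration under those assumptions and not the MetaXL implementation; the function names, the inverse-variance fixed-effect pooling helper, and the sample counts are hypothetical.

```python
import math


def logit_transform(x, n):
    """Return (logit of prevalence, variance) for x cases among n people."""
    if not 0 < x < n:
        # The logit and its variance are undefined for prevalences of 0 or 1.
        raise ValueError("logit requires 0 < x < n")
    p = x / n
    return math.log(p / (1 - p)), 1 / x + 1 / (n - x)


def double_arcsine_transform(x, n):
    """Return (Freeman-Tukey double arcsine value, variance) for x cases among n."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1 / (n + 0.5)


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of (estimate, variance) pairs."""
    weights = [1 / variance for _, variance in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, 1 / total


# Hypothetical studies as (cases, sample size); the first has zero cases.
studies = [(0, 50), (4, 120), (9, 200)]
transformed = [double_arcsine_transform(x, n) for x, n in studies]
print(pool_fixed_effect(transformed))  # pooled value on the transformed scale
```

The zero-case study shows one practical difference: the double arcsine value and its variance are still defined there, whereas `logit_transform(0, 50)` raises an error. Back-transformation to a prevalence is omitted from the sketch.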
Abstract:
Yield in cultivated cotton (Gossypium spp.) is affected by the number and distribution of fibres initiated on the seed surface but, apart from simple statistical summaries, little has been done to assess this phenotype quantitatively. Here we use two types of spatial statistics to describe and quantify differences in patterning of cotton ovule fibre initials (FI). The following five species of Gossypium were analysed: G. hirsutum L., G. barbadense L., G. arboreum, G. raimondii Ulbrich. and G. trilobum (DC.) Skovsted. Scanning electron micrographs of FIs were taken on the day of anthesis. Cell centres for fibre and epidermal cells were digitised and analysed by spatial statistics methods appropriate for marked point processes and tessellations. Results were consistent with previously published reports of fibre number and spacing. However, it was shown that the spatial distributions of FIs in all of the species examined exhibit regularity, and are not completely random as previously implied. The regular arrangement indicates that FIs do not appear independently of each other, and we surmise there may be some form of mutual inhibition specifying fibre-initial development. It is concluded that genetic control of FIs differs from that of stomata, another well-studied plant idioblast. Since spatial statistics show clear species differences in the distribution of FIs within this genus, they provide a useful method for phenotyping cotton. © CSIRO 2007.
Abstract:
Three core components in developing children’s understanding and appreciation of data — establish a context, pose and answer statistical questions, represent and interpret data — lay the foundation for the fourth component: use data to enhance existing context.
Abstract:
This paper is a bridge between two studies by the author: (i) completed MA research; and (ii) on-going PhD research, on male sexual health and the street healing system in Bangladesh. Street healing, a traditional healing system in Bangladesh, is at the centre of the studies. This is a popular form of folk healing in Bangladesh, where male impotency is a central issue. The author has been researching street healing to understand male sexual health-seeking behaviour in Bangladesh. In this paper, the author brings in experiences from his MA research to explore the challenges of studying sexuality and street healing in Bangladesh and concludes by describing his plan to address those issues in his on-going PhD research.