898 results for Latent semantic indexing
Abstract:
Latent class and genetic analyses were used to identify subgroups of migraine sufferers in a community sample of 6,265 Australian twins (55% female) aged 25-36 who had completed an interview based on International Headache Society (IHS) criteria. Consistent with prevalence rates from other population-based studies, 703 (20%) female and 250 (9%) male twins satisfied the IHS criteria for migraine without aura (MO), and of these, 432 (13%) female and 166 (6%) male twins satisfied the criteria for migraine with aura (MA) as indicated by visual symptoms. Latent class analysis (LCA) of IHS symptoms identified three major symptomatic classes, representing 1) a mild form of recurrent nonmigrainous headache, 2) a moderately severe form of migraine, typically without visual aura symptoms (although 40% of individuals in this class were positive for aura), and 3) a severe form of migraine typically with visual aura symptoms (although 24% of individuals were negative for aura). Using the LCA classification, many more individuals were considered affected to some degree than when using IHS criteria (35% vs. 13%). Furthermore, genetic model fitting indicated a greater genetic contribution to migraine using the LCA classification (heritability, h² = 0.40; 95% CI, 0.29-0.46) compared with the IHS classification (h² = 0.36; 95% CI, 0.22-0.42). Exploratory latent class modeling, fitting up to 10 classes, did not identify classes corresponding to either the IHS MO or MA classification. Our data indicate the existence of a continuum of severity, with MA more severe but not etiologically distinct from MO. In searching for predisposing genes, we should therefore expect to find some genes that may underlie all major recurrent headache subtypes, with modifying genetic or environmental factors that may lead to differential expression of the liability for migraine.
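To illustrate the latent class machinery used in this kind of study, the sketch below fits a binary latent class model to symptom indicators with a plain EM loop. The data, class count, and item set are hypothetical stand-ins; this is a minimal sketch of LCA on binary items, not the authors' analysis (which also involved genetic model fitting).

```python
# Minimal EM for a latent class model on binary symptom data.
# Hypothetical stand-in for the LCA described above; numpy only.
import numpy as np

rng = np.random.default_rng(0)

def fit_lca(X, n_classes, n_iter=200):
    """X: (n_subjects, n_items) binary array. Returns class prevalences,
    per-class item endorsement probabilities, and posterior memberships."""
    n, m = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)            # class prevalences
    rho = rng.uniform(0.25, 0.75, size=(n_classes, m))  # P(item=1 | class)
    for _ in range(n_iter):
        # E-step: posterior P(class | responses) for every subject
        log_lik = (X @ np.log(rho).T) + ((1 - X) @ np.log(1 - rho).T) \
                  + np.log(pi)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update prevalences and endorsement probabilities
        pi = post.mean(axis=0)
        rho = ((post.T @ X) / post.sum(axis=0)[:, None]).clip(1e-6, 1 - 1e-6)
    return pi, rho, post

# Hypothetical data: 1000 subjects, 8 IHS-style binary symptom items.
X = (rng.random((1000, 8)) < 0.3).astype(float)
pi, rho, post = fit_lca(X, n_classes=3)
print("class prevalences:", pi.round(3))
```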
Abstract:
For zygosity diagnosis in the absence of genotypic data, or in the recruitment phase of a twin study where only single twins from same-sex pairs are being screened, or to provide a test for sample duplication leading to the false identification of a dizygotic pair as monozygotic, the appropriate analysis of respondents' answers to questions about zygosity is critical. Using data from a young adult Australian twin cohort (N = 2094 complete pairs and 519 singleton twins from same-sex pairs with complete responses to all zygosity items), we show that application of latent class analysis (LCA), fitting a 2-class model, yields results that show good concordance with traditional methods of zygosity diagnosis, but with certain important advantages. These include the ability, in many cases, to assign zygosity with specified probability on the basis of responses of a single informant (advantageous when one zygosity type is being oversampled); and the ability to quantify the probability of misassignment of zygosity, allowing prioritization of cases for genotyping as well as identification of cases of probable laboratory error. Out of 242 twins (from 121 like-sex pairs) where genotypic data were available for zygosity confirmation, only a single case was identified of incorrect zygosity assignment by the latent class algorithm. Zygosity assignment for that single case was identified by the LCA as uncertain (probability of being a monozygotic twin only 76%), and the co-twin's responses clearly identified the pair as dizygotic (probability of being dizygotic 100%). In the absence of genotypic data, or as a safeguard against sample duplication, application of LCA for zygosity assignment or confirmation is strongly recommended.
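To make the probabilistic assignment concrete, here is a hedged sketch of how a fitted 2-class model yields a zygosity probability for a single informant's responses via Bayes' rule, the mechanism behind statements like "probability of being a monozygotic twin only 76%". All numbers below are invented for illustration, not the study's estimates.

```python
# Posterior zygosity probability from a fitted 2-class model (Bayes' rule).
# Prevalences, item probabilities, and answers are hypothetical.
import numpy as np

pi = np.array([0.55, 0.45])            # assumed class prevalences: [MZ, DZ]
rho = np.array([                       # assumed P(answer "yes" | class)
    [0.95, 0.90, 0.85],                # MZ class, three zygosity items
    [0.15, 0.20, 0.30],                # DZ class
])
answers = np.array([1, 1, 0])          # one informant's yes/no responses

lik = (rho ** answers * (1 - rho) ** (1 - answers)).prod(axis=1)
posterior = pi * lik / (pi * lik).sum()
print(f"P(MZ | answers) = {posterior[0]:.2f}")
# Intermediate posteriors flag cases worth prioritizing for genotyping.
```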
Abstract:
The prevalence of latent autoimmune diabetes in adults (LADA) in patients diagnosed with type 2 diabetes mellitus (T2DM) ranges from 7 to 10% (1). Patients with LADA present at a younger age and have a lower BMI but poorer glycemic control, which may increase the risk of complications (2). However, a recent analysis of the Collaborative Atorvastatin Diabetes Study (CARDS) has demonstrated no difference in macrovascular or microvascular events between patients with LADA and T2DM, although neuropathy was not assessed (3). Previous studies quantifying neuropathy in patients with LADA are limited. In this study, we aimed to accurately quantify neuropathy in subjects with LADA compared with matched patients with T2DM.
Abstract:
Despite compulsory mathematics throughout primary and junior secondary schooling, many schools across Australia continue to struggle to achieve satisfactory numeracy levels. Numeracy is not a distinct subject in the school curriculum; rather, it appears as a general capability in the Australian Curriculum, wherein all teachers across all curriculum areas are responsible for numeracy. This general capability approach obscures what numeracy should look like, especially when compared to the way numeracy is structured on standardised national tests. In seeking to define numeracy, schools tend to look to past NAPLAN papers, yet there we do not find examples drawn from the various aspects of the school curriculum; what we find instead are more traditional forms of mathematical worded problems.
Abstract:
In this article, we introduce the general statistical analysis approach known as latent class analysis and discuss some of the issues associated with this type of analysis in practice. Two recent examples from the respiratory health literature are used to highlight the types of research questions that have been addressed using this approach.
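For readers new to the method, the core of a latent class model for J binary indicators is a finite mixture over C unobserved classes. The textbook formulation is sketched below; this is the standard model, not a formula reproduced from the article:

```latex
P(Y_1 = y_1, \ldots, Y_J = y_J)
  \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_j}\,(1 - \rho_{jc})^{1 - y_j}
```

Here π_c is the prevalence of class c and ρ_jc is the probability that a member of class c endorses item j; fitting amounts to estimating these quantities from the observed response patterns.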
Abstract:
The current state of the practice in Blackspot Identification (BSI) utilizes safety performance functions based on total crash counts to identify transport system sites with potentially high crash risk. This paper postulates that total crash count variation over a transport network is the result of multiple distinct crash generating processes, including geometric characteristics of the road, spatial features of the surrounding environment, and driver behaviour factors. However, these multiple sources are ignored in current modelling methodologies when either explaining or predicting crash frequencies across sites. Instead, current practice employs models that imply that a single underlying crash generating process exists. This model mis-specification may lead to attributing crashes to the incorrect sources of contributing factors (e.g., concluding a crash is predominantly caused by a geometric feature when it is a behavioural issue), which may ultimately lead to inefficient use of public funds and misidentification of true blackspots. This study aims to propose a latent class model consistent with a multiple crash process theory, and to investigate the influence this model has on correctly identifying crash blackspots. We first present the theoretical and corresponding methodological approach, in which a Bayesian Latent Class (BLC) model is estimated assuming that crashes arise from two distinct risk generating processes: engineering factors and unobserved spatial factors. The Bayesian model is used to incorporate prior information about the contribution of each underlying process to the total crash count. The methodology is applied to the state-controlled roads in Queensland, Australia, and the results are compared to an Empirical Bayesian Negative Binomial (EB-NB) model. A comparison of goodness-of-fit measures illustrates significantly improved performance of the proposed model compared to the EB-NB model. The detection of blackspots was also improved compared to the EB-NB model. In addition, modelling crashes as the result of two fundamentally separate underlying processes reveals more detailed information about unobserved crash causes.
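The paper's estimation is Bayesian; as a rough non-Bayesian stand-in, the sketch below fits a two-component Poisson mixture to site-level crash counts with EM, illustrating the core idea that total counts can arise from two distinct latent processes. The data and component rates are hypothetical.

```python
# Two-component Poisson mixture fitted by EM: a simplified, non-Bayesian
# stand-in for the latent class crash model described above.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
# Hypothetical site-level crash counts drawn from two latent processes.
counts = np.concatenate([rng.poisson(1.5, 400), rng.poisson(8.0, 100)])

w, lam = np.array([0.5, 0.5]), np.array([1.0, 5.0])  # initial guesses
for _ in range(100):
    # E-step: responsibility of each latent process for each site
    resp = w * poisson.pmf(counts[:, None], lam)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and component crash rates
    w = resp.mean(axis=0)
    lam = (resp * counts[:, None]).sum(axis=0) / resp.sum(axis=0)

print("mixing weights:", w.round(2), "rates:", lam.round(2))
# Sites with high responsibility for the high-rate component would be
# candidate blackspots under this toy model.
```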
Abstract:
With livestock manures being increasingly sought as alternatives to costly synthetic fertilisers, it is imperative that we understand and manage their associated greenhouse gas (GHG) emissions. Here we provide the first dedicated assessment of how the GHG-emitting potential of various manures responds to the different stages of the manure management continuum (e.g., from feedpen surface vs stockpiled). The research is important from the perspective of manure application to agricultural soils. Manures studied included: manure from beef feedpen surfaces and stockpiles; poultry broiler litter (8-week batch); fresh and composted egg layer litter; and fresh and composted piggery litter. Gases assessed were methane (CH4) and nitrous oxide (N2O), the two principal agricultural GHGs. We employed proven protocols to determine the manures' ultimate CH4-producing potential. We also devised a novel incubation experiment to elucidate their N2O-emitting potential, a measure for which no established methods exist. We found lower CH4 potentials in manures from later stages in their management sequence compared with earlier stages, but only by a factor of 0.65×. Moreover, for the beef manures this decrease was not significant at the P < 0.05 level. Nitrous oxide emission potential was significantly positively correlated (P < 0.05) with C/N ratios yet showed no obvious relationship with manure management stage. Indeed, N2O emissions from the composted egg manure were considerably (13×) and significantly (P < 0.05) higher than those from the fresh egg manure. Our study demonstrates that manures from all stages of the manure management continuum potentially entail significant GHG risk when applied to arable landscapes. Efforts to harness manure resources need to account for this.
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research whose focus is news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new and previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems; it has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of the class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. When comparing documents for event-based similarity, we look not only at matching terms but also at how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. The news stream reflects changes in the real world, and in order to keep up, the system has to change its behavior based on its contents. We put forward two strategies for rebuilding the topic representations and report experimental results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
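As a hedged sketch of the class-wise similarity idea, the snippet below compares two documents per semantic class, combines the class-wise cosine similarities with weights, and discounts the result by temporal distance between the documents' anchors. The class weights, decay constant, and toy documents are assumptions for illustration, not values from the thesis.

```python
# Class-wise document similarity with a temporal discount: a toy sketch
# of the approach described above. Weights and decay are hypothetical.
from collections import Counter
from math import exp, sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Each document: per-class term bags plus a time anchor (day index).
doc1 = {"terms": Counter({"earthquake": 2, "rescue": 1}),
        "locations": Counter({"Kobe": 2}), "time": 100}
doc2 = {"terms": Counter({"earthquake": 1, "aftershock": 1}),
        "locations": Counter({"Kobe": 1}), "time": 103}

WEIGHTS = {"terms": 0.6, "locations": 0.4}  # assumed class weights
DECAY = 0.05                                # assumed temporal decay per day

sim = sum(w * cosine(doc1[c], doc2[c]) for c, w in WEIGHTS.items())
sim *= exp(-DECAY * abs(doc1["time"] - doc2["time"]))
print(f"event-based similarity: {sim:.3f}")
```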
Abstract:
XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content, ranging all the way from records of data fields to narrative full texts, methods for Information Retrieval face a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full text from data. As a result, we are able both to reduce the size of the index by 5-6% and to improve retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but also easy to confuse with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content, including titles, captions, and references, is associated with the indexed full text. Experimental results show that, for certain types of document collections at least, the proposed methods help us find the relevant answers. Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
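A minimal sketch of the text-versus-data separation: using the standard library's xml.etree, score each element by the amount of character content per node in its subtree and route high-density fragments to full-text indexing. The threshold and sample document are assumptions; the thesis's actual selection criteria are richer than this heuristic.

```python
# Heuristic separation of narrative full-text from data fields in XML:
# a toy sketch of the idea above. Threshold and sample are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
  <meta><id>42</id><date>2024-01-01</date></meta>
  <body><p>XML retrieval must decide which content is narrative
  full-text and which is structured data subject to data queries.</p></body>
</doc>"""

def text_density(elem):
    """Characters of text per node in the element's subtree."""
    n_nodes = sum(1 for _ in elem.iter())
    n_chars = sum(len((e.text or "") + (e.tail or "")) for e in elem.iter())
    return n_chars / n_nodes

root = ET.fromstring(SAMPLE)
for child in root:
    kind = "full-text index" if text_density(child) > 20 else "data query"
    print(f"<{child.tag}>: density={text_density(child):.1f} -> {kind}")
```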
Abstract:
Natural history collections are an invaluable resource housing a wealth of knowledge, with a long tradition of contributing to a wide range of fields such as taxonomy, quarantine, conservation and climate change. It is recognized, however [Smith and Blagoderov 2012], that such physical collections are often heavily underutilized as a result of the practical issues of accessibility. The digitization of these collections is a step towards removing these access issues, but other hurdles must be addressed before we truly unlock the potential of this knowledge.
Abstract:
This paper shows that by using only symbolic language phrases, a mobile robot can purposefully navigate to specified rooms in previously unexplored environments. The robot intelligently organises a symbolic language description of the unseen environment and “imagines” a representative map, called the abstract map. The abstract map is an internal representation of the topological structure and spatial layout of symbolically defined locations. To perform goal-directed exploration, the abstract map creates a high-level semantic plan to reason about spaces beyond the robot’s known world. While completing the plan, the robot uses the metric guidance provided by a spatial layout, and grounded observations of door labels, to efficiently guide its navigation. The system is shown to complete exploration in unexplored spaces by travelling only 13.3% further than the optimal path.
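To make the abstract-map idea concrete, here is a minimal sketch in which symbolic connectivity phrases are parsed into a topological graph and a high-level plan is found by breadth-first search. The phrase format, room names, and parser are invented for illustration; the actual system also reasons about metric spatial layout and grounds door-label observations.

```python
# Toy abstract-map sketch: parse symbolic connectivity phrases into a
# topological graph, then plan a route with BFS. Phrases are invented.
from collections import defaultdict, deque

phrases = [
    "lobby connects to corridor",
    "corridor connects to kitchen",
    "corridor connects to office",
]

graph = defaultdict(set)
for p in phrases:
    a, _, b = p.partition(" connects to ")
    graph[a].add(b)
    graph[b].add(a)

def plan(start, goal):
    """Breadth-first search over the symbolic topology."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

print(plan("lobby", "office"))  # ['lobby', 'corridor', 'office']
```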
Abstract:
Mobile applications are being increasingly deployed on a massive scale in various mobile sensor grid database systems. With the limited resources of mobile devices, how to process the huge number of queries from mobile users against distributed sensor grid databases becomes a critical problem for such mobile systems. While the fundamental semantic cache technique has been investigated for query optimization in sensor grid database systems, the problem remains difficult because more realistic multi-dimensional constraints have not been considered in existing methods. To address this, a new semantic cache scheme is presented in this paper for location-dependent data queries in distributed sensor grid database systems. It considers multi-dimensional constraints or factors in a unified cost model architecture, determines the parameters of the cost model by using the concept of Nash equilibrium from game theory, and makes semantic cache decisions based on the established cost model. Scenarios involving three factors, namely semantics, time, and location, are investigated as special cases that improve on existing methods. Experiments are conducted to demonstrate the effectiveness of the proposed semantic cache scheme for distributed sensor grid database systems.
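The paper derives its cost-model parameters via a Nash equilibrium, which is not reproduced here; the hedged sketch below only illustrates the unified-cost idea, scoring cached segments by weighted semantic, temporal, and spatial factors. The weights, normalization constants, and segment structure are all invented for illustration.

```python
# Toy unified cost model for a location-dependent semantic cache.
# Weights are invented; the paper derives them via Nash equilibrium.
from dataclasses import dataclass
from math import hypot

@dataclass
class CacheSegment:
    semantic_overlap: float  # fraction of the query answerable from cache
    age_s: float             # seconds since the segment was fetched
    x: float                 # location where the segment was cached
    y: float

def cost(seg: CacheSegment, qx: float, qy: float,
         w_sem=0.5, w_time=0.2, w_loc=0.3, max_age=600.0, max_dist=1000.0):
    """Lower is better: combine semantic, temporal, and spatial factors."""
    return (w_sem * (1 - seg.semantic_overlap)
            + w_time * min(seg.age_s / max_age, 1.0)
            + w_loc * min(hypot(seg.x - qx, seg.y - qy) / max_dist, 1.0))

segs = [CacheSegment(0.9, 120, 10, 10), CacheSegment(0.4, 30, 500, 500)]
best = min(segs, key=lambda s: cost(s, qx=0.0, qy=0.0))
print("answer query from segment with overlap", best.semantic_overlap)
```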