283 resultados para Compressed text search


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The literacy demands of tables and graphs are different from those of prose texts such as narrative. This paper draws from part of a qualitative case study which sought to investigate strategies that scaffold and enhance the teaching and learning of varied representations in text. As indicated in the paper, the method focused on the teaching and learning of tables and graphs with use of Freebody and Luke's (1990) four resources model from literacy education.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Traditional text classification technology based on machine learning and data mining techniques has made a big progress. However, it is still a big problem on how to draw an exact decision boundary between relevant and irrelevant objects in binary classification due to much uncertainty produced in the process of the traditional algorithms. The proposed model CTTC (Centroid Training for Text Classification) aims to build an uncertainty boundary to absorb as many indeterminate objects as possible so as to elevate the certainty of the relevant and irrelevant groups through the centroid clustering and training process. The clustering starts from the two training subsets labelled as relevant or irrelevant respectively to create two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are proposed to be trained and optimized through the subsequent iterative multi-learning process, all of which are proposed to collaboratively help predict the polarities of the incoming objects thereafter. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it can display the overall performance improvement of the final classifier better than Accuracy. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1) which is important standard dataset in the field. The experiment results show that the proposed model has significantly improved the binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Advances in neural network language models have demonstrated that these models can effectively learn representations of words meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present an original approach for finding approximate nearest neighbours in collections of locality-sensitive hashes. The paper demonstrates that this approach makes high-performance nearest-neighbour searching feasible on Web-scale collections and commodity hardware with minimal degradation in search quality.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Despite extensive literature on female mate choice, empirical evidence on women’s mating preferences in the search for a sperm donor is scarce, even though this search, by isolating a male’s genetic impact on offspring from other factors like paternal investment, offers a naturally ”controlled” research setting. In this paper, we work to fill this void by examining the rapidly growing online sperm donor market, which is raising new challenges by offering women novel ways to seek out donor sperm. We not only identify individual factors that influence women’s mating preferences but find strong support for the proposition that behavioural traits (inner values) are more important in these choices than physical appearance (exterior values). We also report evidence that physical factors matter more than resources or other external cues of material success, perhaps because the relevance of good character in donor selection is part of a female psychological adaptation throughout evolutionary history. The lack of evidence on a preference for material resources, on the other hand, may indicate the ability of peer socialization and better access to resources to rapidly shape the female decision process. Overall, the paper makes useful contributions to both the literature on human behaviour and that on decision-making in extreme and highly important situations.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Monte-Carlo Tree Search (MCTS) is a heuristic to search in large trees. We apply it to argumentative puzzles where MCTS pursues the best argumentation with respect to a set of arguments to be argued. To make our ideas as widely applicable as possible, we integrate MCTS to an abstract setting for argumentation where the content of arguments is left unspecified. Experimental results show the pertinence of this integration for learning argumentations by comparing it with a basic reinforcement learning.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Particle swarm optimization (PSO), a new population based algorithm, has recently been used on multi-robot systems. Although this algorithm is applied to solve many optimization problems as well as multi-robot systems, it has some drawbacks when it is applied on multi-robot search systems to find a target in a search space containing big static obstacles. One of these defects is premature convergence. This means that one of the properties of basic PSO is that when particles are spread in a search space, as time increases they tend to converge in a small area. This shortcoming is also evident on a multi-robot search system, particularly when there are big static obstacles in the search space that prevent the robots from finding the target easily; therefore, as time increases, based on this property they converge to a small area that may not contain the target and become entrapped in that area.Another shortcoming is that basic PSO cannot guarantee the global convergence of the algorithm. In other words, initially particles explore different areas, but in some cases they are not good at exploiting promising areas, which will increase the search time.This study proposes a method based on the particle swarm optimization (PSO) technique on a multi-robot system to find a target in a search space containing big static obstacles. This method is not only able to overcome the premature convergence problem but also establishes an efficient balance between exploration and exploitation and guarantees global convergence, reducing the search time by combining with a local search method, such as A-star.To validate the effectiveness and usefulness of algorithms,a simulation environment has been developed for conducting simulation-based experiments in different scenarios and for reporting experimental results. These experimental results have demonstrated that the proposed method is able to overcome the premature convergence problem and guarantee global convergence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This report identifies the outcomes of a program evaluation of the five year Workplace Health and Safety Strategy (2012-2017), specifically, the engagement component within the Queensland Ambulance Service. As part of the former Department of Community Safety, their objective was to work towards harmonising the occupational health and safety policies and process to improve the workplace culture. The report examines and assess the process paths and resource inputs into the strategy, provides feedback on progress to achieving identified goals as well as identify opportunities for improvements and barriers to progress. Consultations were held with key stakeholders within QAS and focus groups were facilitated with managers and health and safety representatives of each Local Area Service Network.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The SNP-SNP interactome has rarely been explored in the context of neuroimaging genetics mainly due to the complexity of conducting approximately 10(11) pairwise statistical tests. However, recent advances in machine learning, specifically the iterative sure independence screening (SIS) method, have enabled the analysis of datasets where the number of predictors is much larger than the number of observations. Using an implementation of the SIS algorithm (called EPISIS), we used exhaustive search of the genome-wide, SNP-SNP interactome to identify and prioritize SNPs for interaction analysis. We identified a significant SNP pair, rs1345203 and rs1213205, associated with temporal lobe volume. We further examined the full-brain, voxelwise effects of the interaction in the ADNI dataset and separately in an independent dataset of healthy twins (QTIM). We found that each additional loading in the epistatic effect was associated with approximately 5% greater brain regional brain volume (a protective effect) in both the ADNI and QTIM samples.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The caudate is a subcortical brain structure implicated in many common neurological and psychiatric disorders. To identify specific genes associated with variations in caudate volume, structural magnetic resonance imaging and genome-wide genotypes were acquired from two large cohorts, the Alzheimer's Disease NeuroImaging Initiative (ADNI; N=734) and the Brisbane Adolescent/Young Adult Longitudinal Twin Study (BLTS; N=464). In a preliminary analysis of heritability, around 90% of the variation in caudate volume was due to genetic factors. We then conducted genome-wide association to find common variants that contribute to this relatively high heritability. Replicated genetic association was found for the right caudate volume at single-nucleotide polymorphism rs163030 in the ADNI discovery sample (P=2.36 × 10 -6) and in the BLTS replication sample (P=0.012). This genetic variation accounted for 2.79 and 1.61% of the trait variance, respectively. The peak of association was found in and around two genes, WDR41 and PDE8B, involved in dopamine signaling and development. In addition, a previously identified mutation in PDE8B causes a rare autosomal-dominant type of striatal degeneration. Searching across both samples offers a rigorous way to screen for genes consistently influencing brain structure at different stages of life. Variants identified here may be relevant to common disorders affecting the caudate.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Several genetic variants are thought to influence white matter (WM) integrity, measured with diffusion tensor imaging (DTI). Voxel based methods can test genetic associations, but heavy multiple comparisons corrections are required to adjust for searching the whole brain and for all genetic variants analyzed. Thus, genetic associations are hard to detect even in large studies. Using a recently developed multi-SNP analysis, we examined the joint predictive power of a group of 18 cholesterol-related single nucleotide polymorphisms (SNPs) on WM integrity, measured by fractional anisotropy. To boost power, we limited the analysis to brain voxels that showed significant associations with total serum cholesterol levels. From this space, we identified two genes with effects that replicated in individual voxel-wise analyses of the whole brain. Multivariate analyses of genetic variants on a reduced anatomical search space may help to identify SNPs with strongest effects on the brain from a broad panel of genes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we use an experimental design to compare the performance of elicitation rules for subjective beliefs. Contrary to previous works in which elicited beliefs are compared to an objective benchmark, we consider a purely subjective belief framework (confidence in one’s own performance in a cognitive task and a perceptual task). The performance of different elicitation rules is assessed according to the accuracy of stated beliefs in predicting success. We measure this accuracy using two main factors: calibration and discrimination. For each of them, we propose two statistical indexes and we compare the rules’ performances for each measurement. The matching probability method provides more accurate beliefs in terms of discrimination, while the quadratic scoring rule reduces overconfidence and the free rule, a simple rule with no incentives, which succeeds in eliciting accurate beliefs. Nevertheless, the matching probability appears to be the best mechanism for eliciting beliefs due to its performances in terms of calibration and discrimination, but also its ability to elicit consistent beliefs across measures and across tasks, as well as its empirical and theoretical properties.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we present a robust method to detect handwritten text from unconstrained drawings on normal whiteboards. Unlike printed text on documents, free form handwritten text has no pattern in terms of size, orientation and font and it is often mixed with other drawings such as lines and shapes. Unlike handwritings on paper, handwritings on a normal whiteboard cannot be scanned so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experiment results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed in a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Experiences showed that developing business applications that base on text analysis normally requires a lot of time and expertise in the field of computer linguistics. Several approaches of integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach which would enable building scalable and flexible applications of text analysis in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experiences with building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components, and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text analytics enabled business application that addresses a concrete business scenario. © 2010 IEEE.