295 results for n-grams


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Experiments show that for a large corpus, Zipf’s law does not hold for all ranks of words: the frequencies fall below those predicted by Zipf’s law for ranks greater than about 5,000 word types in English and about 30,000 word types in the inflected languages Irish and Latin. It also does not hold for syllables or words in the syllable-based languages Chinese and Vietnamese. However, when single words are combined with word n-grams in one list and put in rank order, the frequency of tokens in the combined list extends Zipf’s law with a slope close to -1 on a log-log plot in all five languages. Further experiments have demonstrated the validity of this extension of Zipf’s law to n-grams of letters, phonemes or binary bits in English. It is shown theoretically that probability theory alone can predict this behavior in randomly created n-grams of binary bits.
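The combined rank-frequency list described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names `ngram_counts` and `loglog_slope`, the whitespace tokenisation, and the ordinary least-squares fit are all assumptions made for the example.

```python
from collections import Counter
import math


def ngram_counts(tokens, max_n=3):
    """Count single words and word n-grams (up to max_n) in one combined table."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes at least two positive frequencies; a value near -1 is what
    Zipf's law predicts.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On a real corpus, one would pass `ngram_counts(tokens).values()` to `loglog_slope` and compare the result with the slope obtained from single words alone.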

Relevance:

20.00% 20.00%

Publisher:

Abstract:

To evaluate the mortality and long-term morbidity rates of extremely low birth weight (ELBW) infants admitted to neonatal intensive care units (NICUs).

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Author identification is the problem of identifying, from a given set of authors, the author of a text that is anonymous or whose authorship is in doubt. The works of different authors are strongly distinguished by quantifiable features of the text. This paper deals with attempts to identify the most likely author of a text in Malayalam from a list of authors. Malayalam is an agglutinative Dravidian language, and few successful tools have been developed to extract syntactic and semantic features of texts in this language. We have made a detailed study of the various stylometric features that can be used to form an author's profile and have found that the frequencies of word collocations can be used to clearly distinguish an author in a highly inflectional language such as Malayalam. In our work we try to extract the word-level and character-level features present in the text to characterize the style of an author. Our first step was to create a profile for each of the candidate authors whose texts were available to us, first from word n-gram frequencies and then from variable-length character n-gram frequencies. The profiles of the authors under consideration were then compared with the features extracted from the anonymous text to suggest the most likely author.
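The profile-and-compare step outlined above might look like the sketch below. The abstract does not name a comparison measure, so cosine similarity is an assumption here, as are the function names and the choice of character 2-grams and 3-grams for the variable-length profile.

```python
from collections import Counter
import math


def char_ngram_profile(text, n_values=(2, 3)):
    """Relative frequencies of character n-grams for each n in n_values."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def most_likely_author(anonymous_text, author_texts):
    """Return the author whose profile is most similar to the anonymous text.

    author_texts maps an author name to a sample of that author's writing.
    """
    target = char_ngram_profile(anonymous_text)
    profiles = {a: char_ngram_profile(t) for a, t in author_texts.items()}
    return max(profiles, key=lambda a: cosine(target, profiles[a]))
```

Because the profile works on raw characters, the same code applies to Malayalam text without a tokenizer, which may be why character n-grams are attractive for languages with few analysis tools.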

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper a method of copy detection in short Malayalam text passages is proposed. Given two passages, one as the source text and the other as the copied text, it is determined whether the second passage is a plagiarized version of the source text. An algorithm for plagiarism detection using the n-gram model for word retrieval is developed, and trigrams are found to be the best model for comparing Malayalam text. Based on the probability and resemblance measures calculated from the n-gram comparison, the text is categorized against a threshold. Texts are compared using variable-length n-gram comparisons (n = 2, 3, 4). The experiments show that the trigram model gives acceptable average performance at an affordable cost in terms of complexity.
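A threshold test on n-gram overlap, as described above, can be sketched like this. The abstract does not define its resemblance measure, so the Jaccard ratio over word n-gram sets is an assumption, and the function names and the 0.5 threshold are hypothetical values chosen only for illustration.

```python
def word_ngrams(text, n):
    """Set of word n-grams in a whitespace-tokenised passage."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def resemblance(source, candidate, n=3):
    """Jaccard overlap between the n-gram sets of two passages (0.0 to 1.0)."""
    a = word_ngrams(source, n)
    b = word_ngrams(candidate, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_copied(source, candidate, n=3, threshold=0.5):
    """Flag the candidate as copied when resemblance reaches the threshold."""
    return resemblance(source, candidate, n) >= threshold
```

Calling `resemblance` with `n` set to 2, 3 and 4 in turn gives the variable-length comparison mentioned in the abstract; a suitable threshold would have to be tuned on labelled passages.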

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can be either because they represent what the author believes the paper is about, not what it actually is, or because they include keyphrases which are more classificatory than explanatory, e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes a solution that examines the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. The primary method explores taking n-grams of the source document phrases and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show that the primary method produces good results and that the secondary method produces both good results and potential for future work.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can be either because they represent what the author believes a paper is about, not what it actually is, or because they include keyphrases which are more classificatory than explanatory, e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes two possible solutions that examine the synonyms of words and phrases in the document to find the underlying themes, and present these as appropriate keyphrases. Using three different freely available thesauri, the work undertaken examines two different methods of producing keywords and compares the outcomes across multiple strands in the timeline. The primary method explores taking n-grams of the source document phrases and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show that the primary method produces good results and that the secondary method produces both good results and potential for future work. In addition, the different qualities of the thesauri are examined, and it is concluded that the more entries a thesaurus has, the better it is likely to perform. Neither the age of the thesaurus nor the size of each entry correlates with performance.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper reports results from a study in which we automatically classified the query reformulation patterns for 964,780 Web searching sessions (composed of 1,523,072 queries) in order to predict what the next query reformulation would be. We employed an n-gram modeling approach to describe the probability of searchers transitioning from one query reformulation state to another and to predict their next state. We developed first-, second-, third-, and fourth-order models and evaluated each model for accuracy of prediction. Findings show that Reformulation and Assistance account for approximately 45 percent of all query reformulations. Searchers seem to seek system searching assistance early in the session or after a content change. The results of our evaluations show that the first- and second-order models provided the best predictability, between 28 and 40 percent overall, and higher than 70 percent for some patterns. The implication is that the n-gram approach can be used to improve searching systems and searching assistance in real time.
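An order-k transition model of the kind described above can be sketched as a table of counts keyed by the previous k states. This is an illustrative sketch only: the function names are hypothetical, and any state labels other than "Reformulation" and "Assistance" (which appear in the abstract) would be placeholders rather than the study's actual categories.

```python
from collections import Counter, defaultdict


def train(sessions, order=1):
    """Count which state follows each run of `order` preceding states.

    sessions is a list of state sequences, one per search session.
    """
    model = defaultdict(Counter)
    for states in sessions:
        for i in range(order, len(states)):
            context = tuple(states[i - order:i])
            model[context][states[i]] += 1
    return model


def predict_next(model, history, order=1):
    """Most frequent next state after the last `order` states, or None if unseen."""
    context = tuple(history[-order:])
    followers = model.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]
```

Prediction accuracy for each order could then be estimated by training on one portion of the sessions and counting correct predictions on the remainder.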

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Typical quadrotor aerial robots used in research weigh [illegible] and carry payloads measured in hundreds of grams. Several obstacles in design and control must be overcome to cater for expected industry demands that push the boundaries of existing quadrotor performance. The X-4 Flyer, a 4 kg quadrotor with a 1 kg payload, is intended to be prototypical of useful commercial quadrotors. The custom-built craft uses tuned plant dynamics with an onboard embedded attitude controller to stabilise flight. Independent linear SISO controllers were designed to regulate flyer attitude. The performance of the system is demonstrated in indoor and outdoor flight.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. In our graph-based concept representation, concepts are vertices in a graph built from a document, and edges represent associations between concepts. This representation naturally captures dependencies between concepts, an important requirement for interpreting medical text and a feature lacking in bag-of-words representations. We apply existing graph-based term weighting methods to weight medical concepts. Using concepts rather than terms addresses vocabulary mismatch and encapsulates terms belonging to a single medical entity into a single concept. In addition, we extend previous graph-based approaches by injecting domain knowledge that estimates the importance of a concept within the global medical domain. Retrieval experiments on the TREC Medical Records collection show that our method outperforms both term and concept baselines. More generally, this work provides a means of integrating background knowledge contained in medical ontologies into data-driven information retrieval approaches.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Sex-based comparisons of myofibrillar protein synthesis after resistance exercise in the fed state. J Appl Physiol 112: 1805-1813, 2012. First published March 1, 2012; doi:10.1152/japplphysiol.00170.2012.- We made sex-based comparisons of rates of myofibrillar protein synthesis (MPS) and anabolic signaling after a single bout of high-intensity resistance exercise. Eight men (20 ± 10 yr, BMI = 24.3 ± 2.4) and eight women (22 ± 1.8 yr, BMI = 23.0 ± 1.9) underwent primed constant infusions of L-[ring-13C6]phenylalanine on consecutive days with serial muscle biopsies. Biopsies were taken from the vastus lateralis at rest and 1, 3, 5, 24, 26, and 28 h after exercise. Twenty-five grams of whey protein was ingested immediately and 26 h after exercise. We also measured exercise-induced serum testosterone because it is purported to contribute to increases in myofibrillar protein synthesis (MPS) postexercise and its absence has been hypothesized to attenuate adaptative responses to resistance exercise in women. The exercise-induced area under the testosterone curve was 45-fold greater in men than women in the early (1 h) recovery period following exercise (P < 0.001). MPS was elevated similarly in men and women (2.3- and 2.7-fold, respectively) 1-5 h postexercise and after protein ingestion following 24 h recovery. Phosphorylation of mTORSer2448 was elevated to a greater extent in men than women acutely after exercise (P = 0.003), whereas increased phosphorylation of p70S6K1Thr389 was not different between sexes. Androgen receptor content was greater in men (main effect for sex, P = 0.049). Atrogin-1 mRNA abundance was decreased after 5 h recovery in both men and women (P < 0.001), and MuRF-1 expression was elevated in men after protein ingestion following 24 h recovery (P = 0.003). These results demonstrate minor sex-based differences in signaling responses and no difference in the MPS response to resistance exercise in the fed state. 
Interestingly, our data demonstrate that exercise-induced increases in MPS are dissociated from postexercise testosteronemia and that stimulation of MPS occurs effectively with low systemic testosterone concentrations in women.