991 results for text length


Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this study two new measures of lexical diversity are tested for the first time on French. The usefulness of these measures, MTLD (McCarthy and Jarvis 2010 and this volume) and HD-D (McCarthy and Jarvis 2007), in predicting different aspects of language proficiency is assessed and compared with D (Malvern and Richards 1997; Malvern, Richards, Chipere and Durán 2004) and Maas (1972) in analyses of stories told by two groups of learners (n=41) of two different proficiency levels and one group of native speakers of French (n=23). The importance of careful lemmatization in studies of lexical diversity which involve highly inflected languages is also demonstrated. The paper shows that the measures of lexical diversity under study are valid proxies for language ability in that they explain up to 62 percent of the variance in French C-test scores, and up to 33 percent of the variance in a measure of complexity. The paper also provides evidence that dependence on segment size continues to be a problem for the measures of lexical diversity discussed in this paper. The paper concludes that limiting the range of text lengths or even keeping text length constant is the safest option in analysing lexical diversity.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This study contributes to ongoing discussions on how measures of lexical diversity (LD) can help discriminate between essays from second language learners of English, whose work has been assessed as belonging to levels B1 to C2 of the Common European Framework of Reference (CEFR). The focus is in particular on how different operationalisations of what constitutes a “different word” (type) impact on the LD measures themselves and on their ability to discriminate between CEFR levels. The results show that basic measures of LD, such as the number of different words, the TTR (Templin 1957) and the Index of Guiraud (Guiraud 1954), explain more variance in the CEFR levels than sophisticated measures, such as D (Malvern et al. 2004), HD-D (McCarthy and Jarvis 2007) and MTLD (McCarthy 2005), provided text length is kept constant across texts. A simple count of different words (defined as lemmas and not as word families) was the best predictor of CEFR levels and explained 22 percent of the variance in overall scores on the Pearson Test of English Academic in essays written by 176 test takers.
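The basic measures named in the abstract above can be illustrated with a minimal sketch. It assumes the usual textbook definitions (TTR as types divided by tokens, the Index of Guiraud as types divided by the square root of tokens) and treats every distinct string as a type; the function name `lexical_diversity` and the sample sentence are illustrative and do not come from the study, which counted lemmas.

```python
import math


def lexical_diversity(tokens):
    """Return (number of types, TTR, Guiraud index) for a list of tokens.

    Assumption: a "type" is any distinct string. The study above counts
    lemmas instead, so real input would need to be lemmatized first.
    """
    n_tokens = len(tokens)
    if n_tokens == 0:
        raise ValueError("need at least one token")
    n_types = len(set(tokens))
    ttr = n_types / n_tokens                 # type-token ratio
    guiraud = n_types / math.sqrt(n_tokens)  # Index of Guiraud
    return n_types, ttr, guiraud


sample = "the cat sat on the mat and the dog sat too".split()
types, ttr, guiraud = lexical_diversity(sample)
print(types, round(ttr, 3), round(guiraud, 3))
```

Both ratios change with the number of tokens, which is one way to see why the abstract stresses keeping text length constant across texts.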

Relevance:

60.00% 60.00%

Publisher:

Abstract:

More and more people today use their smartphones to browse the web and use online services, which means that a large share of text is read on small screens. This thesis examines how text is best designed and structured to be read and understood as easily as possible on a mobile screen. The factors covered are type size, white space, text length, typeface class, line length, images in text, and contrast. The work assumes a normative Swedish reader without disabilities. It is based on literature, original analyses, interviews with industry professionals, a questionnaire survey, and a focus group test. The results indicate that text on a smartphone is best set with several paragraph breaks in the form of blank lines, with images where the image adds informational value, and with a text length based on subject category. The image should be placed above the body text. Longer texts should be set with a scroll function. Contrast matters a great deal on mobile phones, and text remains readable even when set in reverse (light on dark). According to the results of this report, typeface class is not important: both serif and sans-serif can be read without problems on smartphones. The type size should be enlarged slightly relative to the phone's default settings. Because on-screen text does not support hyphenation, short words are recommended where possible to avoid an overly ragged right edge.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Students in upper secondary school write in a number of different genres, and do this in school contexts as well as in their spare time. The study presented here is an overview of this activity and the genres concerned. The theoretical framework of the study is that of genre theory, whereby genre is understood as a socially situated concept. The study is based on 2 000 texts gathered from students on different study programmes all over Sweden in the school year of 1996-97. The texts were written in different situations. The most important distinction made here is between test texts (i.e. texts from national tests) and self-chosen texts, which may come from school writing or spare-time writing. The texts are categorized according to genre. This text inventory shows a repertoire of 33 different genres in the text material. A small number of genres, such as story, book review and expository essay, dominate the school writing. The test genres differ from this pattern in that they clearly imitate texts with a genuine communicative intent. The most frequent genres are studied further and each of them is demonstrated by an interpretative reading. This reading shows that the genres differ considerably with respect to genre character and stability of text structure. A quantitative study of text length and variation in vocabulary further shows that texts written by two categories of students, those on vocationally oriented programmes and those on programmes preparing for higher education, differ significantly. Reference cohesion is studied in a smaller sample of the texts. This lexico-semantic mechanism of cohesion proves to exhibit an interrelation with variation in vocabulary as well as with text type. One particular cohesive tie, inference, shows different patterns in texts written by the two categories of students mentioned above.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This study addresses three important issues in tree bucking optimization in the context of cut-to-length harvesting. (1) Would the fit between the log demand and log output distributions be better if the price and/or demand matrices controlling the bucking decisions on modern cut-to-length harvesters were adjusted to the unique conditions of each individual stand? (2) In what ways can we generate stand and product specific price and demand matrices? (3) What alternatives do we have to measure the fit between the log demand and log output distributions, and what would be an ideal goodness-of-fit measure? Three iterative search systems were developed for seeking stand-specific price and demand matrix sets: (1) A fuzzy logic control system for calibrating the price matrix of one log product for one stand at a time (the stand-level one-product approach); (2) a genetic algorithm system for adjusting the price matrices of one log product in parallel for several stands (the forest-level one-product approach); and (3) a genetic algorithm system for dividing the overall demand matrix of each of the several log products into stand-specific sub-demands simultaneously for several stands and products (the forest-level multi-product approach). The stem material used for testing the performance of the stand-specific price and demand matrices against that of the reference matrices was comprised of 9 155 Norway spruce (Picea abies (L.) Karst.) sawlog stems gathered by harvesters from 15 mature spruce-dominated stands in southern Finland. The reference price and demand matrices were either direct copies or slightly modified versions of those used by two Finnish sawmilling companies. Two types of stand-specific bucking matrices were compiled for each log product. One was from the harvester-collected stem profiles and the other was from the pre-harvest inventory data. 
Four goodness-of-fit measures were analyzed for their appropriateness in determining the similarity between the log demand and log output distributions: (1) the apportionment degree (index), (2) the chi-square statistic, (3) Laspeyres quantity index, and (4) the price-weighted apportionment degree. The study confirmed that any improvement in the fit between the log demand and log output distributions can only be realized at the expense of log volumes produced. Stand-level pre-control of price matrices was found to be advantageous, provided the control is done with perfect stem data. Forest-level pre-control of price matrices resulted in no improvement in the cumulative apportionment degree. Cutting stands under the control of stand-specific demand matrices yielded a better total fit between the demand and output matrices at the forest level than was obtained by cutting each stand with non-stand-specific reference matrices. The theoretical and experimental analyses suggest that none of the three alternative goodness-of-fit measures clearly outperforms the traditional apportionment degree measure. Keywords: harvesting, tree bucking optimization, simulation, fuzzy control, genetic algorithms, goodness-of-fit

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR system involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations in carrying out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96.
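As a rough illustration of the starting feature, the sketch below computes a horizontal projection profile for a small binary image. It assumes the image is a list of equal-width rows of 0/1 pixels and reads "length-normalized" as dividing each row's ink count by the row width; that reading is an assumption, since the abstract does not define the normalization. The function name is hypothetical.

```python
def horizontal_projection_profile(image):
    """Return the fraction of ink pixels (value 1) in each row.

    `image` is a list of equal-length rows of 0/1 values. Dividing by
    the row width is one possible normalization, assumed here.
    """
    if not image or not image[0]:
        raise ValueError("image must have at least one non-empty row")
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")
    return [sum(row) / width for row in image]


# Two "text lines" separated by blank rows give a regular peak/valley pattern.
block = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
profile = horizontal_projection_profile(block)
print(profile)
```

Printed text tends to produce alternating peaks and valleys of this kind, which is the regularity the abstract relies on; the eigen and Fisher transforms and the classifiers are not shown here.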

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The aim of this study was to examine the applicability of the Phonological Mean Length of Utterance (pMLU) method to the data of children acquiring Finnish, for both typically developing children and children with a Specific Language Impairment (SLI). Study I examined typically developing children at the end of the one-word stage (N=17, mean age 1;8), and Study II analysed children's (N=5) productions in a follow-up study with four assessment points (ages 2;0, 2;6, 3;0, 3;6). Study III was carried out in the form of a review article that examined recent research on the phonological development of children acquiring Finnish and compared the results with general trends and cross-linguistic findings in phonological development. Study IV included children with SLI (N=4, mean age 4;10) and age-matched peers. The analyses in Studies I, II and IV were made using the quantitative pMLU method. In the pMLU method, pMLU values are counted for both the words that the children targeted (so-called target words) and the words produced by the children. When the child's average pMLU value is divided by the average target word pMLU value, it is possible to examine that child's accuracy in producing the words with the Whole-Word Proximity (PWP) value. In addition, the number of entirely correctly produced words is counted to obtain the Whole-Word Correctness (PWC) value. Qualitative analyses were carried out in order to examine how the children's phoneme inventories and deficiencies in phonotactics would explain the observed pMLU, PWP and PWC values. The results showed that the pMLU values for children acquiring Finnish were already relatively high at the end of the one-word stage (Study I). The values were found to reflect the characteristics of the ambient language.
Typological features that lead to cross-linguistic differences in pMLU values were also observed in the review article (Study III), which noted that in the course of phonological acquisition there are a large number of language-specific phenomena and processes. Study II indicated that overall the children's phonological development during the follow-up period was reflected in the pMLU, PWP and PWC values, although the method showed limitations in detecting qualitative differences between the children. Correct vowels were not scored in the pMLU counts, which led to some misleadingly high pMLU and PWP results: vowel errors were only reflected in the PWC values. Typically developing children in Study II reached the highest possible pMLU results already around age 3;6. At the same time, the differences between the children with SLI and age-matched peers in the pMLU values were very prominent (Study IV). The values for the children with SLI were similar to the ones reported for two-year-old children. Qualitative analyses revealed that the phonologies of the children with SLI largely resembled those of younger, typically developing children. However, unusual errors were also witnessed (e.g., vowel errors, omissions of word-initial stops, consonants added to the initial position in words beginning with a vowel). This dissertation provides an application of a new tool for quantitative phonological assessment and analysis in children acquiring Finnish. The preliminary results suggest that, with some modifications, the pMLU method can be used to assess children's phonological development and that it has some advantages compared to the earlier, segment-oriented approaches. Qualitative analyses complemented the pMLU's observations on the children's phonologies. More research is needed in order to verify the levels of the pMLU, PWP and PWC values in children acquiring Finnish.
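The PWP and PWC calculations described in the abstract above can be sketched as follows. The sketch assumes that per-word pMLU scores have already been assigned by hand and that PWC is reported as a proportion of the sample, which the abstract does not state explicitly; the function name and the scores are invented for illustration.

```python
def pwp_and_pwc(words):
    """Compute (PWP, PWC) from per-word scores.

    `words` is a list of (target_pmlu, child_pmlu, fully_correct) tuples.
    PWP = mean child pMLU / mean target pMLU, as described in the abstract.
    PWC is taken here as the share of entirely correct words (assumption).
    """
    if not words:
        raise ValueError("need at least one word")
    n = len(words)
    mean_target = sum(target for target, _, _ in words) / n
    mean_child = sum(child for _, child, _ in words) / n
    pwp = mean_child / mean_target
    pwc = sum(1 for _, _, correct in words if correct) / n
    return pwp, pwc


# Hypothetical scores for four words produced by one child.
scores = [(6, 6, True), (8, 6, False), (6, 3, False), (4, 4, True)]
pwp, pwc = pwp_and_pwc(scores)
print(round(pwp, 3), pwc)
```

Because PWP is a ratio of averages, a child can score fairly high on it while still producing few words entirely correctly, which is why the two values are reported side by side.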

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents speaker normalization approaches for the audio search task. The conventional state-of-the-art feature set, viz., Mel Frequency Cepstral Coefficients (MFCC), is known to contain speaker-specific and linguistic information implicitly. This might create a problem for the speaker-independent audio search task. In this paper, a universal warping-based approach is used for vocal tract length normalization in audio search. In particular, features such as scale transform and warped linear prediction are used to compensate for speaker variability in audio matching. The advantage of these features over the conventional feature set is that they apply universal frequency warping to both the templates to be matched during audio search. The performance of Scale Transform Cepstral Coefficients (STCC) and Warped Linear Prediction Cepstral Coefficients (WLPCC) is about 3% higher than that of the state-of-the-art MFCC feature sets on the TIMIT database.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background: Few studies have analyzed predictors of length of stay (LOS) in patients admitted due to acute bipolar manic episodes. The purpose of the present study was to estimate LOS and to determine the potential sociodemographic and clinical risk factors associated with a longer hospitalization. Such information could be useful to identify those patients at high risk for long LOS and to allocate them to special treatments, with the aim of optimizing their hospital management. Methods: This was a cross-sectional study recruiting adult patients with a diagnosis of bipolar disorder (Diagnostic and Statistical Manual of Mental Disorders, 4th edition, text revision (DSM-IV-TR) criteria) who had been hospitalized due to an acute manic episode with a Young Mania Rating Scale total score greater than 20. Bivariate correlational and multiple linear regression analyses were performed to identify independent predictors of LOS. Results: A total of 235 patients from 44 centers were included in the study. The only factors that were significantly associated with LOS in the regression model were the number of previous episodes and the Montgomery-Åsberg Depression Rating Scale (MADRS) total score at admission (P < 0.05). Conclusions: Patients with a high number of previous episodes and those with depressive symptoms during mania are more likely to stay longer in hospital. Patients with severe depressive symptoms may have a more severe or treatment-resistant course of the acute bipolar manic episode.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

N-gram analysis is an approach that investigates the structure of a program using bytes, characters or text strings. This research uses dynamic analysis to investigate malware detection using a classification approach based on N-gram analysis. A key issue with dynamic analysis is the length of time a program has to be run to ensure a correct classification. The motivation for this research is to find the optimum subset of operational codes (opcodes) that make the best indicators of malware and to determine how long a program has to be monitored to ensure an accurate support vector machine (SVM) classification of benign and malicious software. The experiments within this study represent programs as opcode density histograms gained through dynamic analysis for different program run periods. An SVM is used as the program classifier to determine the ability of different program run lengths to correctly determine the presence of malicious software. The findings show that malware can be detected with different program run lengths using a small number of opcodes.
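An opcode density histogram of the kind mentioned above can be sketched in a few lines. The sketch assumes the dynamic trace is already available as a list of opcode mnemonics and that density means each opcode's share of all executed instructions; the trace, the selected opcodes and the function name are invented for illustration, and the SVM stage is not shown.

```python
from collections import Counter


def opcode_density(trace, opcodes):
    """Return the relative frequency of each selected opcode in a trace.

    `trace` is a list of executed opcode mnemonics; `opcodes` is the
    subset used as features. Density = count / total instructions
    (an assumed definition, not taken from the study).
    """
    total = len(trace)
    if total == 0:
        return {op: 0.0 for op in opcodes}
    counts = Counter(trace)
    return {op: counts[op] / total for op in opcodes}


# Hypothetical four-instruction trace and three feature opcodes.
trace = ["mov", "add", "mov", "jmp"]
features = opcode_density(trace, ["mov", "jmp", "xor"])
print(features)
```

Truncating `trace` to its first k entries before calling the function would mimic a shorter program run period, which is the variable the study explores.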

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper a method of copy detection in short Malayalam text passages is proposed. Given two passages, one as the source text and another as the copied text, it is determined whether the second passage is a plagiarized version of the source text. An algorithm for plagiarism detection using the n-gram model for word retrieval is developed, and trigrams are found to be the best model for comparing Malayalam text. Based on the probability and resemblance measures calculated from the n-gram comparison, the text is categorized using a threshold. Texts are compared by variable-length n-gram (n = {2, 3, 4}) comparisons. The experiments show that the trigram model gives acceptable average performance at an affordable cost in terms of complexity.
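A word n-gram comparison of the kind described above might look like the following minimal sketch. It assumes whitespace tokenization and uses a Jaccard-style resemblance (shared n-grams divided by all distinct n-grams); the abstract does not give the exact probability and resemblance formulas, so this measure and the 0.5 threshold are stand-ins, and the function names are hypothetical.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in a whitespace-tokenized text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def resemblance(source, suspect, n=3):
    """Jaccard-style overlap of word n-grams (assumed measure)."""
    a, b = word_ngrams(source, n), word_ngrams(suspect, n)
    if not a or not b:
        return 0.0  # passages shorter than n words share no n-grams
    return len(a & b) / len(a | b)


def is_copied(source, suspect, n=3, threshold=0.5):
    """Flag the suspect passage when resemblance reaches the threshold.

    The threshold value is hypothetical, not taken from the paper.
    """
    return resemblance(source, suspect, n) >= threshold


score = resemblance("a b c d e", "a b c d f")
print(score, is_copied("a b c d e", "a b c d f"))
```

Passing n=2 or n=4 gives the other comparison lengths mentioned in the abstract; larger n makes the match stricter and the n-gram sets smaller.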

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Exam questions and solutions in LaTeX

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper compares conventional auditory brainstem response tests (ABRs) and Maximum Length Sequence auditory brainstem response tests (MLS ABRs). The results found that the faster MLS ABRs could prove an accurate screening tool for auditory sensitivity.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper reviews speechreading and the effect of sentence length and linguistic complexity on deaf children.