928 results for Textual simplification
Abstract:
Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched were PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years.
With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see a continued growth and advancement in knowledge of text mining in the injury field.
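As an illustration of the Bayesian category prediction mentioned in the abstract above, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not taken from any of the reviewed studies: the function names `train` and `predict`, the whitespace tokenisation, and the toy narratives used in the usage note are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (narrative_text, label) pairs."""
    class_counts = Counter()            # number of narratives per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in examples:
        words = text.lower().split()    # assumption: plain whitespace tokens
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior under naive Bayes."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            if word in vocab:           # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)
```

With four invented narratives such as "fell from ladder at work" and "slipped on wet floor" labelled `fall`, and "burned hand on hot stove" and "scalded by hot water" labelled `burn`, calling `predict(model, "fell on wet floor")` selects `fall`. Real surveillance narratives would need far more data and proper evaluation against gold standard coding.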
Abstract:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns.
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
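The abstract above states that the similarity metric is based on edit distance. As a rough sketch of what such a metric can look like, the following computes the standard Levenshtein distance over strings or token lists and normalises it to a similarity in [0, 1]. The normalisation, the function names, and the `<ANSWER>` placeholder in the usage note are assumptions for illustration and may differ from what the thesis actually uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))      # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                      # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

Applied to token lists, `similarity("was born in <ANSWER>".split(), "was born on <ANSWER>".split())` differs by one substitution out of four tokens. A pairwise score of this kind is the sort of input that hierarchical clustering of candidate patterns would need.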
Abstract:
Concept inventory tests are one method to evaluate conceptual understanding and identify possible misconceptions. The multiple-choice question format, offering a choice between a correct selection and common misconceptions, can provide an assessment of students' conceptual understanding in various dimensions. Misconceptions of some engineering concepts exist due to a lack of mental frameworks, or schemas, for these types of concepts or conceptual areas. This study incorporated an open textual response component in a multiple-choice concept inventory test to capture written explanations of students' selections. The study's goal was to identify, through text analysis of student responses, the types and categorizations of concepts in these explanations that had not been uncovered by the distractor selections. The analysis of the textual explanations of a subset of the discrete-time signals and systems concept inventory questions revealed that students have difficulty conceptually explaining several dimensions of signal processing. This contributed to their inability to provide a clear explanation of the underlying concepts, such as mathematical concepts. The methods used in this study evaluate students' understanding of signals and systems concepts through their ability to express understanding in written text. This may present a bias for students with strong written communication skills. This study presents a framework for extracting and identifying the types of concepts students use to express their reasoning when answering conceptual questions.
Abstract:
Non-government actors such as think-tanks are playing an important role in Australian policy work. As governments increasingly outsource policy work previously done by education departments and academics to these new policy actors, more think-tanks have emerged that represent a wide range of political views and ideological positions. This paper looks at the emergence of the Grattan Institute as one significant player in Australian education policy with a particular emphasis on Grattan’s report ‘Turning around low-performing schools’. Grattan exemplifies many of the facets of Barber’s ‘deliverology’, as they produce reports designed to be easily digested, simply actioned and provide reassurance that there is an answer, often through focusing on ‘what works’ recipes. ‘Turning around low-performing schools’ is a perfect example of this deliverology. However, a close analysis of the Report suggests that it contains four major problems which seriously impact its usefulness for schools and policymakers: it ignores data that may be more important in explaining the turn-around of schools, the Report is overly reliant on NAPLAN data, there are reasons to be suspicious about the evidence assembled, and finally the Report falls into a classic trap of logic—the post hoc fallacy.
Abstract:
The diffusion equation-based modeling of near infrared light propagation in tissue is achieved by using a finite-element mesh for imaging real-tissue types, such as breast and brain. The finite-element mesh size (number of nodes) dictates the parameter space in optical tomographic imaging. The most commonly used finite-element meshing algorithms do not provide the flexibility of distinct nodal spacing in different regions of the imaging domain to take the sensitivity of the problem into consideration. This study aims to present a computationally efficient mesh simplification method that can be used as a preprocessing step to iterative image reconstruction, where the finite-element mesh is simplified by using an edge collapsing algorithm to reduce the parameter space in regions where the sensitivity of the problem is relatively low. It is shown, using simulations and experimental phantom data for simple meshes/domains, that a significant reduction in parameter space can be achieved without compromising the reconstructed image quality. The maximum errors observed when using the simplified meshes were less than 0.27% in the forward problem and 5% in the inverse problem.
Abstract:
We address the task of mapping a given textual domain model (e.g., an industry-standard reference model) for a given domain (e.g., ERP) to the source code of an independently developed application in the same domain. This has applications in improving the understandability of an existing application, migrating it to a more flexible architecture, or integrating it with other related applications. We use the vector-space model to abstractly represent domain model elements as well as source-code artifacts. The key novelty in our approach is to leverage the relationships between source-code artifacts in a principled way to improve the mapping process. We describe experiments wherein we apply our approach to the task of matching two real, open-source applications to corresponding industry-standard domain models. We demonstrate the overall usefulness of our approach, as well as the role of our propagation techniques in improving the precision and recall of the mapping task.
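To make the vector-space representation mentioned above concrete, here is a minimal sketch that turns a domain model element and each source-code artifact into term-frequency vectors and picks the artifact with the highest cosine similarity. The identifier splitting, the raw term-frequency weighting, and the names `tokenize`, `cosine` and `best_match` are assumptions for the example. It does not reproduce the authors' propagation over relationships between artifacts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; camelCase and snake_case identifiers are split."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def cosine(u, v):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def best_match(element, artifacts):
    """Return the key of the artifact whose text is closest to the element.

    `artifacts` maps a hypothetical artifact name to its source text.
    """
    vec = Counter(tokenize(element))
    return max(artifacts,
               key=lambda name: cosine(vec, Counter(tokenize(artifacts[name]))))
```

For instance, given two hypothetical artifacts, one containing `class PurchaseOrderLine { int quantity; }` and one containing `class InvoiceService { void sendInvoice() {} }`, the element description "Purchase order line item" matches the first. A fuller treatment would typically weight terms (for example with TF-IDF) and then adjust scores using relationships such as calls or inheritance.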
Abstract:
As one of the most abundant polysaccharides on Earth, xylan will provide more than a third of the sugars for lignocellulosic biofuel production when using grass or hardwood feedstocks. Xylan is characterized by a linear β(1,4)-linked backbone of xylosyl residues substituted by glucuronic acid, 4-O-methylglucuronic acid or arabinose, depending on plant species and cell types. The biological role of these decorations is unclear, but they have a major influence on the properties of the polysaccharide. Despite the recent isolation of several mutants with reduced backbone, the mechanisms of xylan synthesis and substitution are unclear. We identified two Golgi-localized putative glycosyltransferases, GlucUronic acid substitution of Xylan (GUX)-1 and GUX2 that are required for the addition of both glucuronic acid and 4-O-methylglucuronic acid branches to xylan in Arabidopsis stem cell walls. The gux1 gux2 double mutants show loss of xylan glucuronyltransferase activity and lack almost all detectable xylan substitution. Unexpectedly, they show no change in xylan backbone quantity, indicating that backbone synthesis and substitution can be uncoupled. Although the stems are weakened, the xylem vessels are not collapsed, and the plants grow to normal size. The xylan in these plants shows improved extractability from the cell wall, is composed of a single monosaccharide, and requires fewer enzymes for complete hydrolysis. These findings have implications for our understanding of the synthesis and function of xylan in plants. The results also demonstrate the potential for manipulating and simplifying the structure of xylan to improve the properties of lignocellulose for bioenergy and other uses.
Abstract:
This work studies the use of discourse markers and asyndeton as means of textual articulation between the various utterances that make up the "Progumnásmata" of Nicolao. The study makes it possible to observe whether there are differences between the two parts that make up Felten's edition, and whether Nicolao's use of particles differs from that of the other authors of "Progumnásmata".
Abstract:
Raquel Merino Álvarez, José Miguel Santamaría, Eterio Pajares (eds.)
Abstract:
Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). It has many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation and Summarization, as well as in related tasks such as Information Retrieval and Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work has involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
Abstract:
This thesis aims to present a new attitude toward the teaching of text production. It is the result of a didactic-pedagogical experience whose goal is to spark in students competence in text production. Accordingly, it describes techniques that, by exploring various languages and codes, stimulate students toward verbal expression and, in particular, toward the production of written texts. Based on semiotic-linguistic assumptions, the activities used in class create a space in which text production takes place in a playful, appealing way, far from the blocks that normally prevent students from being proficient in socio-communicative interaction and, specifically, in written text production across different textual genres. The three techniques that gave rise to this thesis are part of a set of fifteen proposed activities gathered under the title Técnicas de Comunicação e Expressão (TCE). These techniques seek to reduce inhibition and promote verbal expression, written expression in particular. TCE (or the elective Semiótica & Linguagem) emerges as a new paradigm in the teaching of text production, bringing future teachers motivating elements for textual practice, so as to enliven a moment that is almost always synonymous with torture, fear, insecurity and, consequently, failure.
Abstract:
Literature and History have always been decisive in the evolution and affirmation of all peoples who have suffered foreign domination, which so often led the subjugated peoples to lose all or a good part of their specific characteristics. This situation prompted the questioning of these peoples' histories, which were written by the dominators of the hegemonic culture of the time, identified in our work as colonizers. Through two highly characteristic works, the Brazilian Viva o Povo Brasileiro, by João Ubaldo Ribeiro, and the Senegalese Sundjata ou a Epopéia Mandinga, by Djibril Tamsir Niane, this work sets out not only to revisit and highlight the impact of the occupations on the daily life of these peoples, but also to discuss and contribute to the destruction of the stereotyped view of these peoples spread by the colonizers, before projecting the reconstruction of the national and cultural identities corrupted by cultural dependence, one of the consequences of colonization. This will be carried out through the foreground and background action of the Hero-Myth who, going beyond the marvelous and the fantastic with which the character is generally identified, insistently underlines the evolution of a totalizing entity such as the people-nation: the past, the present and the future. Senegal and Brazil, on the basis of a detailed exploration of their cultures, are fully aware of the more than close ties that define them as half-brothers, fruits of a... polygamous father