Abstract:
Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data-based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality therefore become key to determining the success of an MRO job, since both cost and quality benchmarks must be met. This paper introduces a data mining approach to building an MRO quotation system that enhances data quantity and quality and enables significantly more precise MRO job quotations. Regular expressions were used to analyse descriptive textual feedback (i.e. engineers’ reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
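The abstract does not give the report format or the patterns that were used. The sketch below is therefore only an illustration of regex-based extraction. It assumes a hypothetical free-text report in which sub-parts, defects and actions follow fixed keywords (`PART:`, `DEFECT:`, `ACTION:`), and it shows how Python's `re` module could turn such a report into a normalised record for a quotation query. Both the keyword layout and the synonym table `NORMALISE` are invented for the example.

```python
import re

# Hypothetical report layout: "PART: <text>; DEFECT: <text>; ACTION: <text>".
# These keywords are an assumption, not the format used in the paper.
FIELD_PATTERN = re.compile(r"\b(PART|DEFECT|ACTION)\s*:\s*([^;\n]+)", re.IGNORECASE)

# Hypothetical synonym table mapping free-text variants to one normalised term.
NORMALISE = {
    "cracked": "crack",
    "cracks": "crack",
    "corroded": "corrosion",
    "replaced": "replace",
    "repl.": "replace",
}


def normalise(text: str) -> str:
    """Lower-case, collapse runs of whitespace and map known variants to one term."""
    words = re.sub(r"\s+", " ", text.strip().lower()).split(" ")
    return " ".join(NORMALISE.get(word, word) for word in words)


def extract_fields(report: str) -> dict:
    """Return {field: [normalised values]} for every keyword found in a report."""
    record = {}
    for key, value in FIELD_PATTERN.findall(report):
        record.setdefault(key.lower(), []).append(normalise(value))
    return record


report = "Part: Fan Blade; Defect: leading edge  CRACKED; Action: repl. blade"
print(extract_fields(report))
# {'part': ['fan blade'], 'defect': ['leading edge crack'], 'action': ['replace blade']}
```

Records in this form can be matched against historical jobs on exact normalised terms, which is the kind of "highly normalised data" the abstract refers to. A real system would need a much larger, domain-specific synonym list.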
Abstract:
Recently, major processor manufacturers have announced a dramatic shift in their approach to increasing computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend clearly goes towards multi-core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Work on parallel algorithms is, of course, not new per se, but concentrated efforts in many application domains are still missing. Multi-core systems, as well as clusters of workstations and even large-scale distributed computing infrastructures, provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high-performance computing systems, research on the corresponding algorithms must be at the forefront of parallel algorithm research in order to keep making data mining and machine learning applications more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism. The first contribution addresses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. Next, a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared-memory systems is presented. The next paper discusses the design of a parallel approach to the frequent subgraph mining problem for distributed-memory systems.
This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational environments. The fourth paper describes the combined use and customization of software packages to facilitate top-down parallelism in the tuning of Support Vector Machines (SVM), and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution focuses on very efficient feature selection: it describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that provided detailed reviews for each paper. We would also like to thank Matthew Otey, who helped with publicity for the workshop.
Abstract:
Twenty-five monolingual (L1) children with Specific Language Impairment (SLI), 32 sequential bilingual (L2) children, and 29 L1 controls completed the Test of Active & Passive Sentences-Revised (van der Lely, 1996) and the self-paced listening task with picture verification for actives and passives (Marinis, 2007). Both tasks revealed important between-group differences. The children with SLI showed difficulties with both actives and passives when they had to reanalyse thematic roles on-line. Their error pattern provided evidence for working memory limitations. The L2 children showed difficulties only with passives, both on-line and off-line. We suggest that these relate to the complex syntactic algorithm in passives and reflect an earlier developmental stage due to reduced exposure to the L2. The results are discussed in relation to theories of SLI and are best accommodated within accounts proposing that difficulties in the comprehension of passives stem from processing limitations.
Abstract:
This cross-sectional study examines the role of L1-L2 differences and structural distance in the processing of gender and number agreement by English-speaking learners of Spanish at three different levels of proficiency. Preliminary results show that differences between the L1 and L2 affect L2 development: sensitivity to gender agreement violations, as opposed to number agreement violations, emerges only in learners at advanced levels of proficiency. Results also show that the structural distance between the agreeing elements affects the establishment of agreement dependencies for native speakers and for learners at intermediate and advanced levels of proficiency, but not for low-proficiency learners. The overall pattern of results suggests that the linguistic factors examined here affect development but do not constrain ultimate attainment; for advanced learners, the results suggest that second language processing is qualitatively similar to native processing.
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver performance on the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can either be found in real-world distributed applications or be induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested on a parallel computing system with 64 processors and in simulations with 1024 processing elements.
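The dynamic group protocol is not described in enough detail here to reproduce it. The sketch below instead shows, for context, the standard data-parallel k-means iteration that such protocols aim to make cheaper. Each process reduces its local points to per-cluster coordinate sums and counts, and only these summaries are combined, never the points themselves. In this sketch, processes are simulated as a list of partitions, and the `combine` function stands in for a global all-reduce. Both choices are simplifying assumptions, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def local_summary(partition: Sequence[Point], centroids: Sequence[Point]):
    """Per-process step: partial coordinate sums and point counts per cluster."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        c = nearest(point, centroids)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += point[d]
    return sums, counts


def combine(summaries):
    """Stand-in for a global all-reduce: element-wise sum of all summaries."""
    first_sums, first_counts = summaries[0]
    sums = [row[:] for row in first_sums]
    counts = first_counts[:]
    for other_sums, other_counts in summaries[1:]:
        for c, row in enumerate(other_sums):
            counts[c] += other_counts[c]
            for d, value in enumerate(row):
                sums[c][d] += value
    return sums, counts


def parallel_kmeans(partitions: List[List[Point]], centroids: List[Point], iters: int = 10):
    """Run k-means where each partition plays the role of one process."""
    for _ in range(iters):
        summaries = [local_summary(part, centroids) for part in partitions]
        sums, counts = combine(summaries)
        # Keep the previous centroid if a cluster received no points.
        centroids = [
            tuple(s / counts[c] for s in sums[c]) if counts[c] else centroids[c]
            for c in range(len(centroids))
        ]
    return centroids


partitions = [
    [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
    [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
]
print(parallel_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Each process sends k × (dimension + 1) numbers per iteration, regardless of how many points it holds. That global exchange among all processes is the communication the abstract's protocol seeks to restrict to dynamic groups.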
Abstract:
While there has been a fair amount of research investigating children’s syntactic processing during spoken language comprehension, and a wealth of research examining adults’ syntactic processing during reading, as yet very little research has focused on syntactic processing during text reading in children. In two experiments, children and adults read sentences containing a temporary syntactic ambiguity while their eye movements were monitored. In Experiment 1, participants read sentences such as, ‘The boy poked the elephant with the long stick/trunk from outside the cage’ in which the attachment of a prepositional phrase was manipulated. In Experiment 2, participants read sentences such as, ‘I think I’ll wear the new skirt I bought tomorrow/yesterday. It’s really nice’ in which the attachment of an adverbial phrase was manipulated. Results showed that adults and children exhibited similar processing preferences, but that children were delayed relative to adults in their detection of initial syntactic misanalysis. It is concluded that children and adults have the same sentence-parsing mechanism in place, but that it operates with a slightly different time course. In addition, the data support the hypothesis that the visual processing system develops at a different rate than the linguistic processing system in children.
Abstract:
This special issue is a testament to the recent burgeoning interest in the neuroscience of language among theoretical linguists, language acquisitionists and teaching practitioners. It offers a highly valuable, state-of-the-art overview of the neurophysiological methods that are currently being applied to questions in the field of second language (L2) acquisition, teaching and processing. Research in neurolinguistics has developed dramatically in the past twenty years, providing a wealth of exciting findings, many of which are discussed in the papers in this volume. The goal of this commentary is twofold. The first is to critically assess the current state of neurolinguistic data from the point of view of language acquisition and processing (informed by the papers that comprise this special issue and by the literature as a whole), considering how the neuroscience of language and processing might inform linguistic and language acquisition theories. The second is to draw out the implications of the first for language teachers and for the creation of linguistically and neurolinguistically informed, evidence-based pedagogies for non-native language teaching.
Abstract:
The present study examines the processing of subject-verb (SV) number agreement with coordinate subjects in pre-verbal and post-verbal positions in Greek. Greek is a language in which number is morphologically marked on nominal and verbal elements. Coordinate SV agreement, however, is special in Greek because it is sensitive to the coordinate subject's position: when the subject is pre-verbal, the verb is marked for plural, whereas when it is post-verbal, the verb can be in the singular. We conducted two experiments, an acceptability judgment task with adult monolinguals as a pre-study (Experiment 1) and a self-paced reading task as the main study (Experiment 2), in order to obtain both acceptability and processing data. Forty adult monolingual speakers of Greek participated in Experiment 1 and 141 in Experiment 2. Seventy-one children also participated in Experiment 2: 30 Albanian-Greek sequential bilingual children and 41 Greek monolingual children aged 10–12 years. The adult data in Experiment 1 establish the difference in acceptability between singular VPs in SV and VS constructions, reaffirming our hypothesis. Meanwhile, the adult data in Experiment 2 show that plural verbs accelerate processing regardless of subject position. The child online data show that sequential bilingual children have longer reading times (RTs) than the age-matched monolingual control group. However, both child groups follow a similar processing pattern in both pre-verbal and post-verbal constructions, showing longer RTs immediately after a singular verb when the subject was pre-verbal, which indicates a grammaticality effect. In the post-verbal coordinate subject sentences, both child groups showed longer RTs on the first subject following the plural verb, due to the temporary number mismatch between the verb and the first subject.
This effect was resolved in monolingual children but was still present at the end of the sentence for bilingual children, indicating difficulty in reanalyzing and integrating information. Taken together, these findings demonstrate that (a) 10–12-year-old sequential bilingual children are sensitive to number agreement in SV coordinate constructions, parsing sentences in the same way as monolingual children even though their vocabulary abilities are lower than those of age-matched monolingual peers, and (b) bilinguals are slower in processing overall.
Abstract:
The amount of digitally stored textual information is growing every day. However, our capability to process and analyze that information is not growing at the same pace. To overcome this limitation, it is important to develop semiautomatic processes, such as text mining, to extract relevant knowledge from textual information. One of the main and most expensive stages of the text mining process is text pre-processing, in which unstructured text must be transformed into a structured format such as an attribute-value table. Stemming, i.e. linguistic normalization, is usually used to find the attributes of this table. However, stemming is strongly dependent on the language in which the original text is written. Furthermore, for most languages, the stemming algorithms proposed in the literature are computationally expensive. In this work, several improvements to the well-known Porter stemming algorithm for Portuguese are proposed that exploit the characteristics of this language. Experimental results show that the proposed algorithm executes in far less time without affecting the quality of the generated stems.
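The specific improvements are not described in the abstract. The following is therefore only a toy sketch of the kind of rule-based suffix stripping that Porter-style stemmers perform. Rules are tried longest suffix first, and each rule applies only if enough of the word remains. The rule list `RULES` is a small hypothetical subset chosen for illustration. It is neither the Porter algorithm for Portuguese nor the authors' variant.

```python
# Hypothetical, heavily reduced rule set: (suffix, replacement, min_stem_len).
# Not the actual Porter rules for Portuguese; chosen only to show the mechanism.
RULES = [
    ("amentos", "", 3),
    ("amento", "", 3),
    ("ações", "", 3),
    ("ação", "", 3),
    ("mente", "", 4),
    ("ões", "ão", 2),
    ("ais", "al", 2),
    ("es", "", 3),
    ("s", "", 3),
]
# Try longer suffixes first; sort is stable, so equal lengths keep list order.
RULES.sort(key=lambda rule: len(rule[0]), reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching suffix if at least min_stem_len characters remain."""
    word = word.lower()
    for suffix, replacement, min_stem in RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["informação", "informações", "jornais", "rapidamente"]:
    print(w, "->", stem(w))
# informação -> inform
# informações -> inform
# jornais -> jornal
# rapidamente -> rapida
```

Because each word is examined against a short, fixed list of suffixes, the cost per word is small and predictable. Speed-oriented variants typically try to preserve that property while handling more of the language's morphology.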
Abstract:
Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional complex network metrics, whose values were used as input to machine learning methods and allowed MT texts of distinct qualities to be distinguished. We also show that the node-to-node mapping between source and target texts (English-Portuguese and Spanish-Portuguese pairs) can be improved by adding further hierarchical levels for the metrics out-degree, in-degree, hierarchical common degree, clustering coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results presented here amount to a proof of principle that capturing a wider context through hierarchical levels can be combined with machine learning methods to yield an approach for assessing the quality of MT systems. (C) 2010 Elsevier B.V. All rights reserved.
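The following is a minimal sketch of how per-node network metrics can be computed from a text. It assumes a word-adjacency network in which each distinct word is a node and a directed edge links each word to the word that follows it. The sketch computes three of the listed metrics: out-degree, in-degree and the clustering coefficient, the last on the undirected version of the graph. The hierarchical levels and the remaining metrics are not reproduced. Tokenisation is deliberately naive (lower-case, whitespace split), and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def adjacency_network(text: str):
    """Directed word-adjacency network: edge a -> b when word b follows word a."""
    words = text.lower().split()
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # self-loops (a word followed by itself) are ignored
            out_edges[a].add(b)
            in_edges[b].add(a)
    return set(words), out_edges, in_edges


def clustering(node, neighbours) -> float:
    """Local clustering coefficient of a node in an undirected neighbour map."""
    nbrs = neighbours[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))


def node_metrics(text: str):
    """Map each word to (out-degree, in-degree, clustering coefficient)."""
    nodes, out_edges, in_edges = adjacency_network(text)
    undirected = {n: out_edges[n] | in_edges[n] for n in nodes}
    return {
        n: (len(out_edges[n]), len(in_edges[n]), clustering(n, undirected))
        for n in nodes
    }


metrics = node_metrics("the cat saw the dog and the dog saw the cat")
print(metrics["the"])  # (2, 2, 0.5)
```

Per-node values like these are what a node-to-node mapping between a source text and its translation would compare. A text-level feature vector for a classifier could then be formed by summarising them, for example with their mean and spread.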
Abstract:
Scenarios for the emergence or bootstrap of a lexicon involve the repeated interaction between at least two agents who must reach a consensus on how to name N objects using H words. Here we consider minimal models of two types of learning algorithms: cross-situational learning, in which the individuals determine the meaning of a word by looking for something in common across all observed uses of that word, and supervised operant conditioning learning, in which there is strong feedback between individuals about the intended meaning of the words. Despite the stark differences between these learning schemes, we show that they yield the same communication accuracy in the limits of large N and H, which coincides with the result of the classical occupancy problem of randomly assigning N objects to H words.
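The classical occupancy result mentioned above can be written down and checked numerically. Suppose each of N objects is assigned independently and uniformly at random to one of H words. Then the expected number of words used at least once is H[1 - (1 - 1/H)^N].

For illustration, we assume that a hearer receiving a word shared by k objects guesses uniformly among them. Under that assumption, each used word yields exactly one expected correct identification, so the expected accuracy is (used words)/N. This reading of accuracy is our assumption, not a quotation of the paper. With α = H/N held fixed and N, H large, (1 - 1/H)^N tends to e^(-1/α), so the accuracy tends to α(1 - e^(-1/α)). The sketch compares the formula, a Monte Carlo simulation and this limit. The function names are hypothetical.

```python
import math
import random


def expected_accuracy(n_objects: int, n_words: int) -> float:
    """Occupancy formula: expected number of used words divided by N."""
    occupied = n_words * (1.0 - (1.0 - 1.0 / n_words) ** n_objects)
    return occupied / n_objects


def simulated_accuracy(n_objects: int, n_words: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo: give each object a random word and count the distinct words used."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        used = {rng.randrange(n_words) for _ in range(n_objects)}
        total += len(used)
    return total / (trials * n_objects)


def limit_accuracy(alpha: float) -> float:
    """Large-N, large-H limit with alpha = H / N held fixed."""
    return alpha * (1.0 - math.exp(-1.0 / alpha))


print(expected_accuracy(1000, 1000))
print(simulated_accuracy(1000, 1000, trials=200))
print(limit_accuracy(1.0))
```

With N = H = 1000, all three values come out close to 1 - 1/e (about 0.632). This is the common limit that, according to the abstract, both learning schemes reach.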
Abstract:
An Intelligent Transportation System (ITS) builds a safe, effective and integrated transportation environment based on advanced technologies. Road sign detection and recognition is an important part of ITS, offering a way to collect real-time traffic data for processing at a central facility. This project implements a road sign recognition model based on AI and image analysis technologies, which applies a machine learning method, Support Vector Machines (SVM), to recognize road signs. We focus on recognizing seven categories of road sign shapes and five categories of speed limit signs. Two kinds of features, binary images and Zernike moments, are used to represent the data to the SVM for training and testing. We compared and analyzed the performance of the SVM recognition model using different features and different kernels. Moreover, the performance of two different recognition models, SVM and Fuzzy ARTMAP, is also examined.
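The abstract names Zernike moments as one of the two feature types passed to the SVM. The sketch below covers only that feature-extraction step; SVM training and the kernel comparison are not shown. It computes discrete Zernike moments of a square binary image using the textbook radial-polynomial definition. The feature vector keeps the magnitudes |Z_nm|, which are invariant to rotation of the sign in the image plane. Several details are assumptions rather than the project's settings: the pixel-to-unit-disk mapping, the pixel-area normalisation and the chosen (n, m) orders.

```python
import cmath
import math


def radial_poly(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_nm(rho); assumes |m| <= n and n - |m| even."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = (-1) ** s * math.factorial(n - s) / (
            math.factorial(s)
            * math.factorial((n + m) // 2 - s)
            * math.factorial((n - m) // 2 - s)
        )
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(img, n: int, m: int) -> complex:
    """Discrete Zernike moment Z_nm of a square binary image mapped onto the unit disk."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    size = len(img)
    pixel_area = (2.0 / size) ** 2  # each pixel covers (2/size)^2 of [-1, 1]^2
    acc = 0j
    for i, row in enumerate(img):
        for j, value in enumerate(row):
            if not value:
                continue
            # Pixel centre in [-1, 1] x [-1, 1], with y pointing up.
            x = (2 * j + 1 - size) / size
            y = (size - 1 - 2 * i) / size
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # only pixels inside the unit disk contribute
            theta = math.atan2(y, x)
            acc += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * pixel_area


def zernike_features(img, orders=((2, 0), (2, 2), (3, 1), (3, 3), (4, 2), (4, 4))):
    """Rotation-invariant feature vector: magnitudes |Z_nm| for the chosen orders."""
    return [abs(zernike_moment(img, n, m)) for n, m in orders]
```

The list returned by `zernike_features` could then be passed to any SVM implementation, alongside or instead of the raw binary-image pixels, to compare feature sets and kernels as described above.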
Abstract:
Parkinson’s disease (PD) is an increasingly prevalent neurological disorder in aging societies. The motor and non-motor symptoms of PD advance with disease progression and occur with varying frequency and duration. To establish the full extent of a patient’s condition, repeated assessments are necessary to adjust medication. In clinical studies, symptoms are assessed using the Unified Parkinson’s Disease Rating Scale (UPDRS). On the one hand, subjective rating using the UPDRS relies on clinical expertise. On the other hand, it requires the physical presence of patients in clinics, which implies high logistical costs. Another limitation of clinical assessment is that observation in hospital may not accurately represent a patient’s situation at home. For these reasons, the practical frequency of tracking PD symptoms may under-represent the true time scale of PD fluctuations and may result in an overall inaccurate assessment. Current technologies for at-home PD treatment are based on data-driven approaches whose results are problematic to interpret and reproduce. The overall objective of this thesis is to develop and evaluate unobtrusive computer methods for enabling remote monitoring of patients with PD. It investigates novel signal and image processing techniques, based on first-principles and data-driven models, for extracting clinically useful information from audio recordings of speech (texts read aloud) and from video recordings of gait and finger-tapping motor examinations. The aim is to map between PD symptom severities estimated using novel computer methods and clinical ratings based on UPDRS part III (motor examination). A web-based test battery system consisting of self-assessment of symptoms and motor function tests was previously constructed for a touch-screen mobile device.
A comprehensive speech framework has been developed for this device to analyze text-dependent running speech by: (1) extracting novel signal features that are able to represent PD deficits in each individual component of the speech system, (2) mapping between clinical ratings and feature estimates of speech symptom severity, and (3) classifying between UPDRS part III severity levels using speech features and statistical machine learning tools. A novel speech processing method called cepstral separation difference showed a stronger ability to classify speech symptom severities than existing features of PD speech. In the case of finger tapping, the recorded videos of rapid finger-tapping examinations were processed using a novel computer-vision (CV) algorithm. The algorithm extracts symptom information from video-based tapping signals using motion analysis of the index finger and incorporates a face detection module for signal calibration. This algorithm was able to discriminate between UPDRS part III severity levels of finger tapping with high classification rates. Further analysis was performed on novel CV-based gait features, constructed using a standard human model, to discriminate between a healthy gait and a Parkinsonian gait. The findings of this study suggest that symptom severity levels in PD can be discriminated with high accuracy by combining first-principles (feature) and data-driven (classification) approaches. On the one hand, processing audio and video recordings allows clinical staff to monitor speech, gait and finger-tapping examinations remotely. On the other hand, the first-principles approach makes symptom estimates easier for clinicians to understand. We have demonstrated that the selected features of speech, gait and finger tapping were able to discriminate between symptom severity levels, as well as between healthy controls and PD patients, with high classification rates.
The findings support suitability of these methods to be used as decision support tools in the context of PD assessment.
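The abstract does not detail which tapping features the computer-vision algorithm extracts. As a hypothetical illustration of a signal-level summary that could follow motion tracking, the sketch below takes a one-dimensional index-finger displacement signal, with one sample per video frame, assumed already extracted and calibrated. From it, the sketch computes tapping rate, mean peak amplitude and amplitude decrement. Speed, amplitude and progressive decrement are among the aspects raters judge in the UPDRS finger-tapping item. Two simplifications are assumptions chosen for brevity: the bare local-maximum peak detector (no smoothing or thresholds) and the first-half versus second-half decrement measure.

```python
import math
from statistics import mean


def tapping_features(signal, fps):
    """Tapping rate, mean peak amplitude and amplitude decrement.

    signal: hypothetical finger displacement per video frame (pre-calibrated).
    fps: video frame rate in frames per second.
    """
    # Each strict rise followed by a non-increase is treated as one tap peak.
    peaks = [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two detected taps")
    intervals = [(b - a) / fps for a, b in zip(peaks, peaks[1:])]
    amplitudes = [signal[i] for i in peaks]
    half = len(amplitudes) // 2
    return {
        "taps_per_second": 1.0 / mean(intervals),
        "mean_amplitude": mean(amplitudes),
        # Positive when later taps are smaller than earlier ones.
        "amplitude_decrement": mean(amplitudes[:half]) - mean(amplitudes[half:]),
    }


# Synthetic example: 2 Hz tapping sampled at 30 frames per second for 3 s,
# with an amplitude envelope that shrinks over time.
fps = 30
signal = [
    (1.0 - 0.1 * (i / fps)) * (1 + math.cos(2 * math.pi * 2 * i / fps))
    for i in range(3 * fps)
]
print(tapping_features(signal, fps))
```

Features of this interpretable kind are the "first-principles" half of the approach described above. A separate classifier would then map them to UPDRS severity levels.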
Abstract:
This thesis addresses the accumulation of technological capabilities at the firm level, its sources (learning mechanisms) and its implications for firms’ competitive performance in natural-resource-related industries in the context of emerging economies. Over the last 40 years, research on firm-level technological capability accumulation in emerging economies has advanced considerably. However, important gaps remain to be explored, particularly regarding the nature and dynamics of intra-firm trajectories of technological capability accumulation in natural-resource-related industries, especially mining. The research reported here aims to help narrow this gap. In particular, this thesis explores intra-firm variations in trajectories of technological capability accumulation, their sources (underlying learning mechanisms) and some of their implications for the firm’s competitive performance in the Brazilian mining industry, focusing on the mining company Vale between 1942 and 2015. To this end, the research combines the literatures on technological capability accumulation, innovation, learning and industrial development to build its conceptual basis. In parallel, it adopts a qualitative, inductive research design grounded in primary evidence from extensive fieldwork. This design is operationalized through a single in-depth case study of Vale covering three major technological areas: mineral exploration and research, mining operations, and mineral processing. By implementing this strategy, the research found: (1) relatively distinct trajectories of technological capability accumulation across the three areas of analysis.
Specifically: (i) in mineral exploration and research, a technological-follower trajectory was observed that went on to reach a position of world leadership in innovation and production; (ii) in mining operations, a technological-follower trajectory reaching world leadership in innovation and production was also found, but with late accumulation of technological capabilities; and (iii) in mineral processing, a position of world leadership in innovation and production was reached early by creating a trajectory distinct from the one already mapped out by the global leaders. The trajectories of technological capability accumulation converged to a similar situation from 2011 onwards. From then on, all three areas held positions of world leadership in innovation while also showing signs of entering a process of stagnation, which restricts the opportunities to enter new technological trajectories and new businesses. (2) The variations found in the trajectories of technological capability accumulation are explained by the way different underlying learning mechanisms were combined and used by the company. Four combinations of learning mechanisms were found that help explain the different directions of the trajectories. (3) The variations in the three trajectories had distinct implications for competitive performance, specifically in terms of innovative, operational/environmental and new-business performance. The results contribute to the understanding of the intra-firm relationship between technological capability accumulation and the underlying learning mechanisms. They also contribute to the understanding of mining as an industry that offers opportunities for significant innovations, which may even lead to the diversification of the national industrial base.
Therefore, this industry should receive special attention from public policymakers and business decision-makers. Such attention would help prevent companies with high innovative potential from dismantling their innovative capabilities and, consequently, limiting their impact on technological and economic development in emerging economies.