69 resultados para Text similarity analysis


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we present a robust method to detect handwritten text from unconstrained drawings on normal whiteboards. Unlike printed text on documents, free form handwritten text has no pattern in terms of size, orientation and font and it is often mixed with other drawings such as lines and shapes. Unlike handwritings on paper, handwritings on a normal whiteboard cannot be scanned so the detection has to be based on photos. Our work traces straight edges on photos of the whiteboard and builds graph representation of connected components. We use geometric properties such as edge density, graph density, aspect ratio and neighborhood similarity to differentiate handwritten text from other drawings. The experiment results show that our method achieves satisfactory precision and recall. Furthermore, the method is robust and efficient enough to be deployed in a mobile device. This is an important enabler of business applications that support whiteboard-centric visual meetings in enterprise scenarios. © 2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Experiences showed that developing business applications that base on text analysis normally requires a lot of time and expertise in the field of computer linguistics. Several approaches of integrating text analysis systems with business applications have been proposed, but so far there has been no coordinated approach which would enable building scalable and flexible applications of text analysis in enterprise scenarios. In this paper, a service-oriented architecture for text processing applications in the business domain is introduced. It comprises various groups of processing components and knowledge resources. The architecture, created as a result of our experiences with building natural language processing applications in business scenarios, allows for the reuse of text analysis and other components, and facilitates the development of business applications. We verify our approach by showing how the proposed architecture can be applied to create a text analytics enabled business application that addresses a concrete business scenario. © 2010 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Changes in water quality parameters such as pH and salinity can have a significant effect on productivity of aquaculture species. Similarly, relative osmotic pressure influences various physiological processes and regulates expression of a number of osmoregulatory genes. Among those, carbonic anhydrase (CA) plays a key role in systemic acid–base balance and ion regulation. Redclaw crayfish (Cherax quadricarinatus) are unique in their ability to thrive in environments with naturally varied pH levels, suggesting unique adaptation to pH stress. To date, however, no studies have focused on identification and characterisation of CA or other osmoregulatory genes in C. quadricarinatus. Here, we analysed the redclaw gill transcriptome and characterized CA genes along with a number of other key osmoregulatory genes that were identified in the transcriptome. We also examined patterns of gene expression of these CA genes when exposed to three pH treatments. In total, 72,382,710 paired end Illumina reads were assembled into 36,128 contigs with an average length of 800 bp. Approximately 37% of contigs received significant BLAST hits and 22% were assigned gene ontology terms. Three full length CA isoforms; cytoplasmic CA (ChqCAc), glycosyl-phosphatidylinositol-linked CA (ChqCAg), and β-CA (ChqCA-beta) as well as two partial CA gene sequences were identified. Both partial CA genes showed high similarity to ChqCAg and appeared to be duplicated from the ChqCAg. Full length coding sequences of Na+/K+-ATPase, V-type H+-ATPase, sarcoplasmic Ca+-ATPase, arginine kinase, calreticulin and Cl− channel protein 2 were also identified. Only the ChqCAc gene showed significant differences in expression across the three pH treatments. These data provide valuable information on the gill expressed CA genes and their expression patterns in freshwater crayfish. Overall our data suggest an important role for the ChqCAc gene in response to changes in pH and in systemic acid–base balance in freshwater crayfish.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we illustrate a set of features of the Apromore process model repository for analyzing business process variants. Two types of analysis are provided: one is static and based on differences on the process control flow, the other is dynamic and based on differences in the process behavior between the variants. These features combine techniques for the management of large process model collections with those for mining process knowledge from process execution logs. The tool demonstration will be useful for researchers and practitioners working on large process model collections and process execution logs, and specifically for those with an interest in understanding, managing and consolidating business process variants both within and across organizational boundaries.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The world is rich with information such as signage and maps to assist humans to navigate. We present a method to extract topological spatial information from a generic bitmap floor plan and build a topometric graph that can be used by a mobile robot for tasks such as path planning and guided exploration. The algorithm first detects and extracts text in an image of the floor plan. Using the locations of the extracted text, flood fill is used to find the rooms and hallways. Doors are found by matching SURF features and these form the connections between rooms, which are the edges of the topological graph. Our system is able to automatically detect doors and differentiate between hallways and rooms, which is important for effective navigation. We show that our method can extract a topometric graph from a floor plan and is robust against ambiguous cases most commonly seen in floor plans including elevators and stairwells.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates – an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer related causes of death from death certificates. Methods Detailed features, including terms, n-grams and SNOMED CT concepts were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/nocancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed a combination of features were important for cancer type classification, with SNOMED CT concept and oncology specific morphology features proving the most valuable. Conclusion The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen's kappa, which is significantly higher than values reported in the previous studies. Besides improvement in classification accuracy, the developed system is also less sensitive to overfitting as it uses only 205 classification features, which is around 100 times less features than in similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence that gives an additional insights into the nature of cognitive presence learning cycle. Overall, our results show great potential of the proposed approach, with an added benefit of providing further characterization of the cognitive presence coding scheme.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Concept inventory tests are one method to evaluate conceptual understanding and identify possible misconceptions. The multiple-choice question format, offering a choice between a correct selection and common misconceptions, can provide an assessment of students' conceptual understanding in various dimensions. Misconceptions of some engineering concepts exist due to a lack of mental frameworks, or schemas, for these types of concepts or conceptual areas. This study incorporated an open textual response component in a multiple-choice concept inventory test to capture written explanations of students' selections. The study's goal was to identify, through text analysis of student responses, the types and categorizations of concepts in these explanations that had not been uncovered by the distractor selections. The analysis of the textual explanations of a subset of the discrete-time signals and systems concept inventory questions revealed that students have difficulty conceptually explaining several dimensions of signal processing. This contributed to their inability to provide a clear explanation of the underlying concepts, such as mathematical concepts. The methods used in this study evaluate students' understanding of signals and systems concepts through their ability to express understanding in written text. This may present a bias for students with strong written communication skills. This study presents a framework for extracting and identifying the types of concepts students use to express their reasoning when answering conceptual questions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In an estuary, mixing and dispersion resulting from turbulence and small scale fluctuation has strong spatio-temporal variability which cannot be resolved in conventional hydrodynamic models while some models employs parameterizations large water bodies. This paper presents small scale diffusivity estimates from high resolution drifters sampled at 10 Hz for periods of about 4 hours to resolve turbulence and shear diffusivity within a tidal shallow estuary (depth < 3 m). Taylor's diffusion theorem forms the basis of a first order estimate for the diffusivity scale. Diffusivity varied between 0.001 – 0.02 m2/s during the flood tide experiment. The diffusivity showed strong dependence (R2 > 0.9) on the horizontal mean velocity within the channel. Enhanced diffusivity caused by shear dispersion resulting from the interaction of large scale flow with the boundary geometries was observed. Turbulence within the shallow channel showed some similarities with the boundary layer flow which include consistency with slope of 5/3 predicted by Kolmogorov's similarity hypothesis within the inertial subrange. The diffusivities scale locally by 4/3 power law following Okubo's scaling and the length scale scales as 3/2 power law of the time scale. The diffusivity scaling herein suggests that the modelling of small scale mixing within tidal shallow estuaries can be approached from classical turbulence scaling upon identifying pertinent parameters.