919 resultados para optical character recognition system


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nos encontramos a escasos 8 años del siglo XXI y la información que cobra día a día más auge y más importancia, el acelerado devenir tecnológico y los descubrimientos científicos, hacen que el bibliotecario sea un transmisor de innovación y comunicación que se desenvuelva en un mundo competitivo, en donde debe ser agresivo, dinámico y capaz de adoptar todo ese cúmulo tecnológico y científico si quiere sobrevivir en el futuro como profesional.Si retrocedemos cinco años, nos damos cuenta que la Bibliotecología es una de las disciplinas que más ha evolucionado con respecto a términos relacionados con gestión automatizada de información. Palabras como scanners, videodisco, reconocimiento de caracteres ópticos, CD-ROM, CD-I (Disco compacto interactivo), etc., forman parte del vocabulario bibliotecológico que ha sido incorporado por los profesionales quienes se desenvuelven en el complicado mundo de la información.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This presentation was given at the Panhandle Library Access Network's (PLAN) Innovation Conference: Digitization- Preserving the Past for the Future Conference on August 14th, 2015. The presentation uses a specific collection of directories as a case study of the complications librarians and archivists face in digitizing older materials that may also be quite large, such as a directory. Prime OCR and Abbyy Fine Reader are discussed and their pros and cons covered. Troubleshooting and editing with Adobe Photoshop is also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This presentation was given at the FLVC regional conference at Broward College on May 7, 2015 and introduced scanning, processing, record creation, dissemination, and preservation in FIU Libraries' Digital Collections Center. The main focus was on processing, specifically employing OCR technology with difficult sources.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose to develop a 3-D optical flow features based human action recognition system. Optical flow based features are employed here since they can capture the apparent movement in object, by design. Moreover, they can represent information hierarchically from local pixel level to global object level. In this work, 3-D optical flow based features a re extracted by combining the 2-1) optical flow based features with the depth flow features obtained from depth camera. In order to develop an action recognition system, we employ a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). The m of McFIS is to find the decision boundary separating different classes based on their respective optical flow based features. McFIS consists of a neuro-fuzzy inference system (cognitive component) and a self-regulatory learning mechanism (meta-cognitive component). During the supervised learning, self-regulatory learning mechanism monitors the knowledge of the current sample with respect to the existing knowledge in the network and controls the learning by deciding on sample deletion, sample learning or sample reserve strategies. The performance of the proposed action recognition system was evaluated on a proprietary data set consisting of eight subjects. The performance evaluation with standard support vector machine classifier and extreme learning machine indicates improved performance of McFIS is recognizing actions based of 3-D optical flow based features.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Chinese language is based on characters which are syllabic in nature. Since languages have syllabotactic rules which govern the construction of syllables and their allowed sequences, Chinese character sequence models can be used as a first level approximation of allowed syllable sequences. N-gram character sequence models were trained on 4.3 billion characters. Characters are used as a first level recognition unit with multiple pronunciations per character. For comparison the CU-HTK Mandarin word based system was used to recognize words which were then converted to character sequences. The character only system error rates for one best recognition were slightly worse than word based character recognition. However combining the two systems using log-linear combination gives better results than either system separately. An equally weighted combination gave consistent CER gains of 0.1-0.2% absolute over the word based standard system. Copyright © 2009 ISCA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Identifying an individual from surveillance video is a difficult, time consuming and labour intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or present the human operator with likely matches from a database. A person tracker is used to speed up the subject detection and super-resolution process by tracking moving subjects and cropping a region of interest around the subject's face to reduce the number and size of the image frames to be super-resolved respectively. In this paper, experiments have been conducted to demonstrate how the optical flow super-resolution method used improves surveillance imagery for visual inspection as well as automatic face recognition on an Eigenface and Elastic Bunch Graph Matching system. The optical flow based method has also been benchmarked against the ``hallucination'' algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced less artifacts and more visually correct images suitable for human consumption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Characteristics of surveillance video generally include low resolution and poor quality due to environmental, storage and processing limitations. It is extremely difficult for computers and human operators to identify individuals from these videos. To overcome this problem, super-resolution can be used in conjunction with an automated face recognition system to enhance the spatial resolution of video frames containing the subject and narrow down the number of manual verifications performed by the human operator by presenting a list of most likely candidates from the database. As the super-resolution reconstruction process is ill-posed, visual artifacts are often generated as a result. These artifacts can be visually distracting to humans and/or affect machine recognition algorithms. While it is intuitive that higher resolution should lead to improved recognition accuracy, the effects of super-resolution and such artifacts on face recognition performance have not been systematically studied. This paper aims to address this gap while illustrating that super-resolution allows more accurate identification of individuals from low-resolution surveillance footage. The proposed optical flow-based super-resolution method is benchmarked against Baker et al.’s hallucination and Schultz et al.’s super-resolution techniques on images from the Terrascope and XM2VTS databases. Ground truth and interpolated images were also tested to provide a baseline for comparison. Results show that a suitable super-resolution system can improve the discriminability of surveillance video and enhance face recognition accuracy. The experiments also show that Schultz et al.’s method fails when dealing surveillance footage due to its assumption of rigid objects in the scene. The hallucination and optical flow-based methods performed comparably, with the optical flow-based method producing less visually distracting artifacts that interfered with human recognition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the design and development of a novel optical vehicle classifier system, which is based on interruption of laser beams, that is suitable for use in places with poor transportation infrastructure. The system can estimate the speed, axle count, wheelbase, tire diameter, and the lane of motion of a vehicle. The design of the system eliminates the need for careful optical alignment, whereas the proposed estimation strategies render the estimates insensitive to angular mounting errors and to unevenness of the road. Strategies to estimate vehicular parameters are described along with the optimization of the geometry of the system to minimize estimation errors due to quantization. The system is subsequently fabricated, and the proposed features of the system are experimentally demonstrated. The relative errors in the estimation of velocity and tire diameter are shown to be within 0.5% and to change by less than 17% for angular mounting errors up to 30 degrees. In the field, the classifier demonstrates accuracy better than 97.5% and 94%, respectively, in the estimation of the wheelbase and lane of motion and can classify vehicles with an average accuracy of over 89.5%.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Facial emotions are the most expressive way to display emotions. Many algorithms have been proposed which employ a particular set of people (usually a database) to both train and test their model. This paper focuses on the challenging task of database independent emotion recognition, which is a generalized case of subject-independent emotion recognition. The emotion recognition system employed in this work is a Meta-Cognitive Neuro-Fuzzy Inference System (McFIS). McFIS has two components, a neuro-fuzzy inference system, which is the cognitive component and a self-regulatory learning mechanism, which is the meta-cognitive component. The meta-cognitive component, monitors the knowledge in the neuro-fuzzy inference system and decides on what-to-learn, when-to-learn and how-to-learn the training samples, efficiently. For each sample, the McFIS decides whether to delete the sample without being learnt, use it to add/prune or update the network parameter or reserve it for future use. This helps the network avoid over-training and as a result improve its generalization performance over untrained databases. In this study, we extract pixel based emotion features from well-known (Japanese Female Facial Expression) JAFFE and (Taiwanese Female Expression Image) TFEID database. Two sets of experiment are conducted. First, we study the individual performance of both databases on McFIS based on 5-fold cross validation study. Next, in order to study the generalization performance, McFIS trained on JAFFE database is tested on TFEID and vice-versa. The performance The performance comparison in both experiments against SVNI classifier gives promising results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Handwriting is an acquired tool used for communication of one's observations or feelings. Factors that inuence a person's handwriting not only dependent on the individual's bio-mechanical constraints, handwriting education received, writing instrument, type of paper, background, but also factors like stress, motivation and the purpose of the handwriting. Despite the high variation in a person's handwriting, recent results from different writer identification studies have shown that it possesses sufficient individual traits to be used as an identification method. Handwriting as a behavioral biometric has had the interest of researchers for a long time. But recently it has been enjoying new interest due to an increased need and effort to deal with problems ranging from white-collar crime to terrorist threats. The identification of the writer based on a piece of handwriting is a challenging task for pattern recognition. The main objective of this thesis is to develop a text independent writer identification system for Malayalam Handwriting. The study also extends to developing a framework for online character recognition of Grantha script and Malayalam characters

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Die stereoskopische 3-D-Darstellung beruht auf der naturgetreuen Präsentation verschiedener Perspektiven für das rechte und linke Auge. Sie erlangt in der Medizin, der Architektur, im Design sowie bei Computerspielen und im Kino, zukünftig möglicherweise auch im Fernsehen, eine immer größere Bedeutung. 3-D-Displays dienen der zusätzlichen Wiedergabe der räumlichen Tiefe und lassen sich grob in die vier Gruppen Stereoskope und Head-mounted-Displays, Brillensysteme, autostereoskopische Displays sowie echte 3-D-Displays einteilen. Darunter besitzt der autostereoskopische Ansatz ohne Brillen, bei dem N≥2 Perspektiven genutzt werden, ein hohes Potenzial. Die beste Qualität in dieser Gruppe kann mit der Methode der Integral Photography, die sowohl horizontale als auch vertikale Parallaxe kodiert, erreicht werden. Allerdings ist das Verfahren sehr aufwendig und wird deshalb wenig genutzt. Den besten Kompromiss zwischen Leistung und Preis bieten präzise gefertigte Linsenrasterscheiben (LRS), die hinsichtlich Lichtausbeute und optischen Eigenschaften den bereits früher bekannten Barrieremasken überlegen sind. Insbesondere für die ergonomisch günstige Multiperspektiven-3-D-Darstellung wird eine hohe physikalische Monitorauflösung benötigt. Diese ist bei modernen TFT-Displays schon recht hoch. Eine weitere Verbesserung mit dem theoretischen Faktor drei erreicht man durch gezielte Ansteuerung der einzelnen, nebeneinander angeordneten Subpixel in den Farben Rot, Grün und Blau. Ermöglicht wird dies durch die um etwa eine Größenordnung geringere Farbauflösung des menschlichen visuellen Systems im Vergleich zur Helligkeitsauflösung. Somit gelingt die Implementierung einer Subpixel-Filterung, welche entsprechend den physiologischen Gegebenheiten mit dem in Luminanz und Chrominanz trennenden YUV-Farbmodell arbeitet. Weiterhin erweist sich eine Schrägstellung der Linsen im Verhältnis von 1:6 als günstig. Farbstörungen werden minimiert, und die Schärfe der Bilder wird durch eine weniger systematische Vergrößerung der technologisch unvermeidbaren Trennelemente zwischen den Subpixeln erhöht. Der Grad der Schrägstellung ist frei wählbar. In diesem Sinne ist die Filterung als adaptiv an den Neigungswinkel zu verstehen, obwohl dieser Wert für einen konkreten 3-D-Monitor eine Invariante darstellt. Die zu maximierende Zielgröße ist der Parameter Perspektiven-Pixel als Produkt aus Anzahl der Perspektiven N und der effektiven Auflösung pro Perspektive. Der Idealfall einer Verdreifachung wird praktisch nicht erreicht. Messungen mit Hilfe von Testbildern sowie Schrifterkennungstests lieferten einen Wert von knapp über 2. Dies ist trotzdem als eine signifikante Verbesserung der Qualität der 3-D-Darstellung anzusehen. In der Zukunft sind weitere Verbesserungen hinsichtlich der Zielgröße durch Nutzung neuer, feiner als TFT auflösender Technologien wie LCoS oder OLED zu erwarten. Eine Kombination mit der vorgeschlagenen Filtermethode wird natürlich weiterhin möglich und ggf. auch sinnvoll sein.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The automatic interpretation of conventional traffic signs is very complex and time consuming. The paper concerns an automatic warning system for driving assistance. It does not interpret the standard traffic signs on the roadside; the proposal is to incorporate into the existing signs another type of traffic sign whose information will be more easily interpreted by a processor. The type of information to be added is profuse and therefore the most important object is the robustness of the system. The basic proposal of this new philosophy is that the co-pilot system for automatic warning and driving assistance can interpret with greater ease the information contained in the new sign, whilst the human driver only has to interpret the "classic" sign. One of the codings that has been tested with good results and which seems to us easy to implement is that which has a rectangular shape and 4 vertical bars of different colours. The size of these signs is equivalent to the size of the conventional signs (approximately 0.4 m2). The colour information from the sign can be easily interpreted by the proposed processor and the interpretation is much easier and quicker than the information shown by the pictographs of the classic signs

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Physicochemical experimental techniques combined with the specificity of a biological recognition system have resulted in a variety of new analytical devices known as biosensors. Biosensors are under intensive development worldwide because they have many potential applications, e.g. in the fields of clinical diagnostics, food analysis, and environmental monitoring. Much effort is spent on the development of highly sensitive sensor platforms to study interactions on the molecular scale. In the first part, this thesis focuses on exploiting the biosensing application of nanoporous gold (NPG) membranes. NPG with randomly distributed nanopores (pore sizes less than 50 nm) will be discussed here. The NPG membrane shows unique plasmonic features, i.e. it supports both propagating and localized surface plasmon resonance modes (p SPR and l-SPR, respectively), both offering sensitive probing of the local refractive index variation on/in NPG. Surface refractive index sensors have an inherent advantage over fluorescence optical biosensors that require a chromophoric group or other luminescence label to transduce the binding event. In the second part, gold/silica composite inverse opals with macroporous structures were investigated with bio- or chemical sensing applications in mind. These samples combined the advantages of a larger available gold surface area with a regular and highly ordered grating structure. The signal of the plasmon was less noisy in these ordered substrate structures compared to the random pore structures of the NPG samples. In the third part of the thesis, surface plasmon resonance (SPR) spectroscopy was applied to probe the protein-protein interaction of the calcium binding protein centrin with the heterotrimeric G-protein transducin on a newly designed sensor platform. SPR spectroscopy was intended to elucidate how the binding of centrin to transducin is regulated towards understanding centrin functions in photoreceptor cells.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.