994 results for born digital
Abstract:
In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of an OCR engine on the word image. The optimal value of gamma for a word image is chosen automatically by our algorithm using a fixed stroke-width threshold. We have tested our algorithm exhaustively by varying the gamma and stroke-width threshold values. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the born-digital dataset of the ICDAR Robust Reading Systems Challenge 1 (Word Recognition Task), we achieved 82.9% using Omnipage OCR applied to the images processed by our algorithm, compared with the recognition rate of 61.5% achieved by TH-OCR after suitable pre-processing by Yang et al. and 63.4% by ABBYY FineReader (used as the baseline by the competition organizers without any preprocessing).
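As a rough illustration of the power-law (gamma) transformation mentioned in the abstract above, the following Python sketch applies s = 255 · (r / 255)^gamma to an 8-bit grayscale image and then thresholds the result. The function names, the fixed threshold of 128, and the sample pixel values are assumptions made for illustration only. The abstract does not describe the authors' gamma-selection criterion or their binarization step, so neither is reproduced here.

```python
def power_law_transform(image, gamma):
    """Apply s = 255 * (r / 255) ** gamma to every pixel.

    `image` is a list of rows of 8-bit grayscale values (0..255).
    gamma < 1 brightens mid-tones; gamma > 1 darkens them.
    """
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in image]


def binarize(image, threshold=128):
    """Global threshold (an assumed, simplistic stand-in for a real binarizer)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 1x3 "word image": dark, mid-gray, and bright pixels.
    word_image = [[0, 100, 255]]
    for gamma in (0.5, 1.0, 2.0):
        transformed = power_law_transform(word_image, gamma)
        print(gamma, transformed, binarize(transformed))
```

The mid-gray pixel ends up on different sides of the threshold depending on gamma, which is why the choice of gamma can change the binarization result.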
Abstract:
Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs that contain sufficient edge pixels are retained. A novel approach is presented in which the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are removed from the minimum spanning tree of the graph. The pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset, and values of precision, recall and H-mean are obtained using different approaches.
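The graph step described in the abstract above (centroids as nodes, a minimum spanning tree, long edges removed) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of Prim's algorithm on a complete Euclidean graph, the `max_edge_length` parameter and the sample centroids are all hypothetical. The height-ratio filtering and the bounding-box steps are not covered.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) edges, where i and j index `points`.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge connecting each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def group_components(points, max_edge_length):
    """Drop MST edges longer than `max_edge_length`; return the remaining groups."""
    kept = [e for e in minimum_spanning_tree(points) if e[2] <= max_edge_length]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for i, j, _ in kept:
        root[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical CC centroids: two text strings far apart on the same line.
    centroids = [(0, 0), (10, 0), (20, 0), (200, 0), (210, 0)]
    print(group_components(centroids, max_edge_length=50))
```

Cutting the long edges of the tree leaves one connected group per cluster of nearby centroids, which is the property the grouping step relies on.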
Abstract:
This paper reflects on the experience of PanamaTipico.com, an independent website specialized in the research and preservation of the cultural heritage of the Republic of Panama, a developing country located in Central America. Basic information about the project is described. Also discussed are some of the challenges confronted by the project and the results achieved. The goal of this paper is to encourage a discussion on whether or not the experience of PanamaTipico.com is comparable to the experiences of similar projects in developing countries in Eastern Europe and elsewhere.
Abstract:
Presentation from the MARAC conference in Roanoke, VA on October 7–10, 2015. S6 - Digital Archives: New Colleagues, New Solutions.
Abstract:
Social media play a prominent role in mediating issues of public concern, not only providing the stage on which public debates play out but also shaping their topics and dynamics. Building on and extending existing approaches to both issue mapping and social media analysis, this article explores ways of accounting for popular media practices and the special case of ‘born digital’ sociocultural controversies. We present a case study of the GamerGate controversy with a particular focus on a spike in activity associated with a 2015 Law and Order: SVU episode about gender-based violence and harassment in games culture that was widely interpreted as being based on events associated with GamerGate. The case highlights the importance and challenges of accounting for the cultural dynamics of digital media within and across platforms.
Abstract:
Why do people become archivists? Historically (and anecdotally) it was a deep love of musty, old records that drew people to the profession. While many other motivating forces have inspired would-be archivists, one most often hears of people seeking jobs in archives for love of “the stuff,” as evidenced in Kate Thiemer’s blog post, Honest tips for wannabe archivists (2012). As a result of the continually advancing presence of digitized and born digital archival collections, the physical nature of archival “stuff” is changing. While there remains the physical imprint of digital information on floppy disks, CDs, DVDs, hard drives, and old computers, these physical artifacts might not evoke the same visceral pull to the profession as musty, raspy, paper-based documents. In light of this shift in the physical presentation of information, we are faced with the question: how does love of archival “stuff” translate to work in digital archives? What is and/or will be the pull to become a digital archivist? To answer these questions, we will perform a survey-based study in which we will invite archivists who work with both traditional and digital archival material to answer questions related to the aspects of their work that inspired or motivated them to join the profession. What motivates people to become archivists? What aspects of digital archives do or can potentially motivate people to seek out a career as an archivist? What, if any, motivational factors for becoming a traditional archivist are the same as those for becoming a digital archivist? What, if any, motivational factors for becoming a traditional archivist are different from those for becoming a digital archivist? By answering these questions, we hope to expand the archival discussion on what it means to be an archivist in the digital age.
What compelling intrinsic, evidential, or informational values are present in digital archival content that will draw professionals to the field? Are there other values inherent in digital content that are currently unexplored? In our poster, we will present our discussion of the topic, our survey design, and results we have at the time of the Institute. Thiemer, K. (2012). Honest tips for wannabe archivists. Archivesnext blog. Retrieved from http://www.archivesnext.com/?p=2849
Abstract:
Purpose – The internet is transforming possibilities for creative interaction, experimentation and cultural consumption in China and raising important questions about the role that “publishers” might play in an open and networked digital world. The purpose of this paper is to consider the role that copyright is playing in the growth of a publishing industry that is being “born digital”. Design/methodology/approach – The paper approaches online literature as an example of a creative industry that is generating value for a wider creative economy through its social network market functions. It builds on the social network market definition of the creative industries proposed by Potts et al. and uses this definition to interrogate the role that copyright plays in a rapidly-evolving creative economy. Findings – The rapid growth of a market for crowd-sourced content is combining with growing commercial freedom in cultural space to produce a dynamic landscape of business model experimentation. Using the social web to engage audiences, generate content, establish popularity and build reputation and then converting those assets into profit through less networked channels appears to be a driving strategy in the expansion of wider creative industries markets in China. Originality/value – At a moment when publishing industries all over the world are struggling to come to terms with digital technology, the emergence of a rapidly-growing area of publishing that is being born digital offers important clues about the future of publishing and what social network markets might mean for the role of copyright in a digital age.
Abstract:
It is imperative that we consider the use of current and emerging technologies in terms of the nature of our learners, the physical environment of the lecture theatre, and how technology may help to support appropriate pedagogies that facilitate the capturing of student attention in active, engaging learning experiences. It is argued that a re-evaluation of pedagogy is required to address the tech-savvy traits of the 21st century learner and the extent to which their mobile devices are capable of not only distracting them from learning but also enhancing face-to-face learning experiences.
Abstract:
Like music and the news media before it, the film and television business is now facing its time of digital disruption. Major changes are being brought about in global online distribution of film and television by new players, such as Google/YouTube, Apple, Amazon, Yahoo!, Facebook, Netflix and Hulu, some of whom massively outrank in size and growth the companies that run film and television today. Content, Hollywood has always asserted, is King. But the power and profitability in screen industries have always resided in distribution. Incumbents in the screen industries tried to control the emerging dynamics of online distribution, but failed. The new, born digital, globally focused, players are developing TV network-like strategies, including commissioning content that has widened the net of what counts as television. Content may be King, but these new players may become the King Kongs of the online world.
Abstract:
Games and the broader interactive entertainment industry are the major ‘born global/born digital’ creative industry. The videogame industry (formally referred to as interactive entertainment) is the economic sector that develops, markets and sells videogames to millions of people worldwide. There are over 11 countries with revenues of over $1 billion. This number was expected to grow 9.1 per cent annually to $48.9 billion in 2011 and $68 billion in 2012, making it the fastest-growing component of the international media sector (Scanlon, 2007; Caron, 2008).
Abstract:
We have benchmarked the maximum obtainable recognition accuracy on five publicly available standard word image data sets using semi-automated segmentation and a commercial OCR. These images have been cropped from camera-captured scene images, born-digital images (BDI) and street view images. Using a MATLAB-based tool that we developed, we have annotated more than 3600 word images from the five data sets at the pixel level. The word images binarized by the tool, as well as by our own midline analysis and propagation of segmentation (MAPS) algorithm, are recognized using the trial version of Nuance Omnipage OCR, and these two results are compared with the best reported in the literature. The benchmark word recognition rates obtained on the ICDAR 2003, Sign evaluation, Street view, Born-digital and ICDAR 2011 data sets are 83.9%, 89.3%, 79.6%, 88.5% and 86.7%, respectively. The results obtained from MAPS-binarized word images without the use of any lexicon are 64.5% and 71.7% for ICDAR 2003 and 2011, respectively, and these values are higher than the best reported values in the literature of 61.1% and 41.2%, respectively. The MAPS result of 82.8% for the BDI 2011 dataset matches the performance of the state-of-the-art method based on the power-law transform.
Abstract:
Digitized and born-digital information, a product of the technological advances enabled by Information and Communication Technologies (ICT) and of the participatory philosophy of Web 2.0, has made it necessary to reflect on whether current models for the organization and representation of information can meet info-communicational needs and support users' access to electronic information in Memory Institutions. This research aims to design and evaluate a generic, normative and harmonizing model for the organization and representation of electronic information in an information system for use by users and information professionals, in the current collaborative and participatory context. The definition of the proposed objectives was based on the study and qualitative analysis of the standards adopted by memory institutions for authority records, bibliographic records and representation formats. After conceptualization, the prototype was developed and evaluated, essentially through qualitative analysis of the data obtained from information retrieval tests. The experiment took place in a laboratory environment where tests, interviews and questionnaire surveys were carried out. Cross-analysis of the results, obtained by triangulating the data collected from the various sources, led to the conclusion that both users and information professionals found very interesting the integration of the normative harmonization reflected in the various modules, the integration of communication services/tools, and the use of the platform's participatory/collaborative component, favoring the Wiki, followed by Comments, Tags, Discussion Forum and E-mail.
Abstract:
This work is a compilation of information on digital technology and its evolution up to the present day. It aims to show how innovation has been a driver of change in this sector, devising a new business model whose value chain for reaching the customer is faster, more flexible and more profitable. The digital world spans many fields of knowledge and has revolutionized the knowledge society through communication technologies, in academia, entertainment and all the sciences. In Colombia, the digital industry has received a strong boost from the ministry of technology and communication and from companies that have motivated and encouraged entrepreneurs to develop applications and enter this market, which offers great competitive advantages.
Abstract:
This poster presentation from the May 2015 Florida Library Association Conference, along with the Everglades Explorer discovery portal at http://ee.fiu.edu, demonstrates how traditional bibliographic and curatorial principles can be applied to: 1) selection, cross-walking and aggregation of metadata linking end-users to wide-spread digital resources from multiple silos; 2) harvesting of select PDFs, HTML and media for web archiving and access; 3) selection of CMS domains, sub-domains and folders for targeted searching using an API. Choosing content for this discovery portal is comparable to the past scholarly practice of creating and publishing subject bibliographies, except that metadata and data are housed in relational databases. This new and yet traditional capacity coincides with: growth of bibliographic utilities (MarcEdit); evolution of open-source discovery systems (eXtensible Catalog); development of target-capable web crawling and archiving systems (Archive-it); and specialized search APIs (Google). At the same time, historical and technical changes – specifically the increasing fluidity and re-purposing of syndicated metadata – make this possible. It equally stems from the expansion of freely accessible digitized legacy and born-digital resources. Innovation principles helped frame the process by which the thematic Everglades discovery portal was created at Florida International University. The path -- to providing for more effective searching and co-location of digital scientific, educational and historical material related to the Everglades -- is contextualized through five concepts found within Dyer and Christensen’s “The Innovator’s DNA: Mastering the five skills of disruptive innovators” (2011). The project also aligns with Ranganathan’s Laws of Library Science, especially the 4th Law -- to “save the time of the user.”
Abstract:
We propose to design a Custom Learning System that responds to the unique needs and potentials of individual students, regardless of their location, abilities, attitudes, and circumstances. This project is intentionally provocative and future-looking but it is not unrealistic or unfeasible. We propose that by combining complex learning databases with a learner’s personal data, we could provide all students with a personal, customizable, and flexible education. This paper presents the initial research undertaken for this project of which the main challenges were to broadly map the complex web of data available, to identify what logic models are required to make the data meaningful for learning, and to translate this knowledge into simple and easy-to-use interfaces. The ultimate outcome of this research will be a series of candidate user interfaces and a broad system logic model for a new smart system for personalized learning. This project is student-centered, not techno-centric, aiming to deliver innovative solutions for learners and schools. It is deliberately future-looking, allowing us to ask questions that take us beyond the limitations of today to motivate new demands on technology.