989 results for Unstructured data
Abstract:
Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
Abstract:
Opinion Mining is becoming increasingly important, especially for analyzing and forecasting customers’ behavior for business purposes. Making the right decision about new products or services, based on data about customers’ characteristics, means profit for an organization or company. This paper proposes a new architecture for Opinion Mining that uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step in achieving this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as customers, products, time, and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible product attributes. A case study is also presented to show the advantages of using OLAP and data cubes to analyze customers’ opinions.
Abstract:
Online business, or Electronic Commerce (EC), is increasingly popular among customers, and as a result a large number of product reviews have been posted online. This information is valuable not only for prospective customers deciding whether to buy a product but also for companies gathering information about customers’ satisfaction with their products. Opinion mining is used to capture customer reviews and separate them into subjective expressions (containing sentiment words) and objective expressions (containing no sentiment words). This paper proposes a novel multi-dimensional model for opinion mining that integrates customers’ characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them along multiple dimensions, namely customers, products, time, and location. Data warehouse techniques such as OLAP and data cubes were used to analyze opinionated sentences. A comprehensive way to calculate customers’ orientation on products’ features and attributes is presented in this paper.
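The fact-table idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rows, the dimension names (`segment`, `product`, `attribute`, `month`, `location`), and the +1/-1 orientation measure are all hypothetical. It only shows how opinions stored as facts can be rolled up along chosen dimensions, as an OLAP tool would do.

```python
from collections import defaultdict

# Hypothetical fact table: one row per opinion extracted from a review.
# The dimension values and the orientation measure (+1 positive, -1 negative)
# are illustrative only and do not come from the paper.
FACTS = [
    {"segment": "student", "product": "phone", "attribute": "battery",
     "month": "2012-01", "location": "north", "orientation": 1},
    {"segment": "student", "product": "phone", "attribute": "battery",
     "month": "2012-01", "location": "south", "orientation": -1},
    {"segment": "student", "product": "phone", "attribute": "screen",
     "month": "2012-02", "location": "north", "orientation": 1},
    {"segment": "retiree", "product": "phone", "attribute": "battery",
     "month": "2012-02", "location": "north", "orientation": -1},
    {"segment": "retiree", "product": "laptop", "attribute": "screen",
     "month": "2012-02", "location": "south", "orientation": 1},
]


def roll_up(facts, dimensions):
    """Aggregate the orientation measure over the given dimensions."""
    cells = defaultdict(lambda: {"count": 0, "total": 0})
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        cells[key]["count"] += 1
        cells[key]["total"] += row["orientation"]
    # Report the number of opinions and their mean orientation per cell.
    return {
        key: {"count": c["count"], "mean_orientation": c["total"] / c["count"]}
        for key, c in cells.items()
    }


by_product = roll_up(FACTS, ("product",))
by_segment_attribute = roll_up(FACTS, ("segment", "attribute"))
```

Calling `roll_up` with a different tuple of dimensions gives a different view of the same facts, which is the sense in which one fact table can serve summaries for different customer groups and product categories.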
Abstract:
This research proposes a multi-dimensional model for Opinion Mining, which integrates customers' characteristics and their opinions about products (or services). Customer opinions are valuable for companies seeking to deliver the right products or services to their customers. This research presents a comprehensive framework for evaluating the orientation of opinions based on the hierarchy of product attributes. It also provides an alternative way to obtain opinion summaries for different groups of customers and different categories of products.
Abstract:
In the modern and dynamic construction environment it is important to access information in a fast and efficient manner in order to improve the decision making processes for construction managers. This capability is, in most cases, straightforward with today’s technologies for data types with an inherent structure that resides primarily on established database structures like estimating and scheduling software. However, previous research has demonstrated that a significant percentage of construction data is stored in semi-structured or unstructured data formats (text, images, etc.) and that manually locating and identifying such data is a very hard and time-consuming task. This paper focuses on construction site image data and presents a novel image retrieval model that interfaces with established construction data management structures. This model is designed to retrieve images from related objects in project models or construction databases using location, date, and material information (extracted from the image content with pattern recognition techniques).
Abstract:
Thesis (Ph.D.)--University of Washington, 2013
Abstract:
To study the personal and group perception of assumed norms and values, stereotyped images and beliefs, crystallized social codes and schemas (individual and group attitudes), and particular routes and trajectories in relation to juvenile outgroup violence. To validate qualitatively the Theory of Planned Behavior and, where appropriate, to propose theoretical modifications whose explanatory power can be tested. 19 young people (17 males and 2 females) aged between 15 and 25, resident in Madrid. All of them meet the selection criteria: during the last year they have physically assaulted, on two or more occasions and as group members, one or more people belonging to other groups. Individual in-depth, semi-structured interviews with a low level of directiveness, conducted in two sessions. The protocol for the first session included the variables and factors most cited in the specialized literature, structured on three levels: macrosocial (culture and socioeconomics), mesosocial (social identity), and microsocial (personal identity and phenomenology of the behavior). For the second session, based on the analysis of the first interview, a personalized script was drawn up with a common part and a part adapted to the particularities of the informant and his or her group. The interviews were recorded and later transcribed. A content analysis was carried out on the resulting material to operationalize the variables included in the causal model developed. A mixed qualitative analysis procedure was also developed in order to formulate, refute, and modify hypotheses about violent outgroup behavior. The subjects' statements were categorized and coded using the ARS-NUDIST program (Non-Numerical Unstructured Data Indexing Searching and Theorizing). Interviews were used.
The results show the processual and systemic character of the acquisition and evolution of juvenile outgroup violence. It should not be inferred from this premise that this behavior results on each occasion from the incidence of all the factors analyzed; rather, a progressive, convergent influence of macro-, meso-, and microsocial variables is postulated, which predisposes the young person to conform to violence and, later, on occasion, to internalize it as a basic element of his or her social and/or personal identity. Social support and individual and social self-esteem are both concentrated, preferentially, in the peer group, which appears to replace or complement the scarce or inadequate socializing influence of other people, groups, and institutions; complementarily, loneliness and isolation are valued very negatively. Preventive and intervention programs should avoid the temptation to psychologize the problem by focusing on the individual, and should instead, or in addition, foster the development of prosocial norms and behaviors that generalize to different settings. The main objective of programs to reduce outgroup violence is the promotion of positive personal and social identities through socially valued behaviors.
Abstract:
To determine whether single-sex secondary schools offer an alternative to coeducation that is more effective in achieving an optimal all-round education. The sample comprises 1532 students in the second and third years of ESO (compulsory secondary education), aged between 13 and 15. The schools belong to three autonomous communities, Madrid, Catalonia, and Murcia, for a total of 12 schools: 5 mixed, 3 girls' schools, and 4 boys' schools. Whole classes were selected, trying to match the groups to be compared with respect to socioeconomic level, age, school year, and sex. Each student completed the following questionnaires: self-esteem questionnaires (Coopersmith, 1986); the Risk Behavior Questionnaire (González Molina, 2006); the Values Scale (Mitchell, 1984); and the Eating Disorder Inventory (EDI, Garner et al., 1983). A descriptive and correlational analysis of the results is performed, looking for significant differences between the samples. The qualitative analysis was based on a series of semi-structured interviews with the following subjects: students in the second and third years of ESO, teachers of those same years, and parents of the students. A total of 48 subjects were interviewed. The interviews were analyzed following the structure of the NUDIST (Non-numerical Unstructured Data, Indexing, Searching and Theorising) program for processing qualitative data, a program that goes beyond the filing-cabinet model limited to coding and retrieving text. The results of the quantitative and qualitative analyses are compared. Coeducation favors boys more, although this finding cannot be related to academic success.
The lowest self-concept is found in the group of girls in coeducation, with differences from girls educated in single-sex schools, from which it is deduced that the problems adolescent girls face at this stage of their lives increase when they are in coeducational schools. Students in single-sex schools behave more sincerely, regret more naturally what they consider they have done wrong, and are more able to acknowledge that they would like to change aspects of their character. Regarding values, single-sex students differ clearly from coeducation students in scoring higher on religious values and lower on hedonistic and individualistic values. From a methodological point of view, it is very difficult in Spain to find single-sex schools that are non-denominational. Private boys' schools rank second in religious values but first in valuing social status. There is greater fear of maturity in coeducational schools and greater perfectionism in non-coeducational ones. Personal and ethical aspects should form an integral part of education, since it is the autonomy of the individual that best expresses his or her rationality. This research set out to confront established prejudices against non-coeducation, which could well be an effective means worth advocating, even if only on an experimental basis.
Abstract:
Multidimensional visualization techniques are invaluable tools for the analysis of structured and unstructured data with variable dimensionality. This paper introduces PEx-Image (Projection Explorer for Images), a tool aimed at supporting the analysis of image collections. The tool supports a methodology that employs interactive visualizations to aid user-driven feature detection and classification tasks, thus offering improved analysis and exploration capabilities. The visual mappings employ similarity-based multidimensional projections and point placement to lay out the data on a plane for visual exploration. In addition to its application to image databases, we also illustrate how the proposed approach can be successfully employed in the simultaneous analysis of different data types, such as text and images, offering a common visual representation for data expressed in different modalities.
Abstract:
This work evaluates how effective knowledge disclosure is in attenuating negative institutional reactions caused by the uncertainties of firms’ new strategies in response to novel technologies. The empirical setting is an era of technological ferment: the introduction of voice over internet protocol (VoIP) in the USA in the early 2000s. This technology led to the convergence of the wireline telecommunications and cable television industries. The Institutional Brokers’ Estimate System (also known as I/B/E/S) was used to capture the reactions of securities analysts, an important source of institutional pressure on firms’ strategies. To assess knowledge disclosure, a coding technique and an established content analysis framework were used to quantitatively measure the non-numerical and unstructured data in transcripts of business events that occurred at that time. Several binary response models were then tested to assess the effect of knowledge disclosure on the probability of positive institutional reactions. The findings are that the odds of favorable institutional reactions increase when a specific kind of knowledge is disclosed. It can be concluded that knowledge disclosure can serve as a weapon in situations of technological change, attenuating adverse institutional reactions to companies’ strategies.
Abstract:
In a peer-to-peer network, the nodes interact with each other by sharing resources, services, and information. Many applications have been developed using such networks, one class of which is peer-to-peer databases. Peer-to-peer database systems allow the sharing of unstructured data and can integrate data from several sources without large investments, because they use existing repositories. However, the high flexibility and dynamicity of the network, as well as the absence of centralized management of information, make locating information among the various participants in the network a complex process. In this context, this paper presents original contributions in the form of a proposed architecture for a routing system that uses the Ant Colony algorithm to optimize the search for desired information, supported by ontologies that add semantics to shared data. This enables integration among heterogeneous databases while seeking to reduce message traffic on the network without reducing the number of responses, as confirmed by an improvement of 22.5% in this number. © 2011 IEEE.
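The abstract does not describe the routing rules themselves, so the following is only a generic sketch of how ant-colony-style routing is commonly set up, not the architecture proposed in the paper. Each peer is assumed to keep a pheromone level per neighbor; levels evaporate over time, are reinforced when a query forwarded to that neighbor returns answers, and determine the probability of forwarding the next query. The peer names, the evaporation rate `rho`, and the deposit amount are hypothetical.

```python
def evaporate(pheromone, rho):
    """Decay every pheromone level by the evaporation rate rho (0 < rho < 1)."""
    return {neighbor: (1.0 - rho) * tau for neighbor, tau in pheromone.items()}


def reinforce(pheromone, neighbor, deposit):
    """Add pheromone to the neighbor through which a query was answered."""
    updated = dict(pheromone)
    updated[neighbor] += deposit
    return updated


def forwarding_probabilities(pheromone):
    """Probability of forwarding a query to each neighbor, proportional to pheromone."""
    total = sum(pheromone.values())
    return {neighbor: tau / total for neighbor, tau in pheromone.items()}


# Hypothetical routing table of one peer with two neighbors.
table = {"peer_b": 1.0, "peer_c": 1.0}
table = evaporate(table, rho=0.1)                 # both levels drop to 0.9
table = reinforce(table, "peer_b", deposit=0.5)   # peer_b answered a query
probs = forwarding_probabilities(table)
```

After one successful answer through `peer_b`, later queries are more likely to be sent that way, which is how such schemes can cut message traffic compared with flooding every neighbor.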
Abstract:
Graduate Program in Production Engineering - FEB
Abstract:
Large amounts of animal health care data are present in veterinary electronic medical records (EMRs), and they present an opportunity for companion animal disease surveillance. Veterinary patient records are largely in free text, without clinical coding or a fixed vocabulary. Text mining, a computer and information technology application, is needed to identify cases of interest and to add structure to the otherwise unstructured data. In this study, EMRs were extracted from the veterinary management programs of 12 participating veterinary practices and stored in a data warehouse. Using commercially available text-mining software (WordStat™), we developed a categorization dictionary that could be used to automatically classify and extract enteric syndrome cases from the warehoused electronic medical records. The diagnostic accuracy of the text-miner for retrieving cases of enteric syndrome was measured against human reviewers who independently categorized a random sample of 2500 cases as enteric syndrome positive or negative. Compared to the reviewers, the text-miner retrieved cases with enteric signs with a sensitivity of 87.6% (95% CI, 80.4-92.9%) and a specificity of 99.3% (95% CI, 98.9-99.6%). Automatic and accurate detection of enteric syndrome cases provides an opportunity for community surveillance of enteric pathogens in companion animals.
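As a rough illustration of the evaluation described above, the sketch below classifies free-text records with a small keyword dictionary and compares the result with reviewer labels. It does not use WordStat or the study's dictionary, neither of which is given in the abstract. The term list, the five toy records, and their labels are invented for illustration, and the matcher ignores negation (for example "no vomiting").

```python
import re

# Hypothetical categorization dictionary; the study's actual dictionary is not
# given in the abstract.
ENTERIC_TERMS = ("vomit", "diarrhea", "diarrhoea", "gastroenteritis")
PATTERN = re.compile(
    r"\b(?:%s)\w*" % "|".join(re.escape(t) for t in ENTERIC_TERMS),
    re.IGNORECASE,
)


def is_enteric(record_text):
    """Flag a record as enteric if any dictionary term appears in it."""
    return PATTERN.search(record_text) is not None


def sensitivity_specificity(predicted, reference):
    """Compare predicted labels with reviewer labels (both lists of booleans)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Toy records with invented reviewer labels (True = enteric syndrome).
records = [
    ("Vomiting since yesterday, off food", True),
    ("Annual vaccination, healthy", False),
    ("Loose stool, owner reports diarrhoea", True),
    ("Soft faeces for two days", True),   # missed: no dictionary term matches
    ("Lameness in left hind leg", False),
]
predicted = [is_enteric(text) for text, _ in records]
reference = [label for _, label in records]
sensitivity, specificity = sensitivity_specificity(predicted, reference)
```

On this toy sample the dictionary misses one positive record, so sensitivity is 2/3 while specificity is 1.0. Missed wording like this lowers sensitivity without affecting specificity.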
Abstract:
Clinical text understanding (CTU) is of interest to health informatics because critical clinical information, frequently represented as unconstrained text in electronic health records, is extensively used by human experts to guide clinical practice and decision making and to document delivery of care, but is largely unusable by information systems for queries and computations. Recent initiatives advocating translational research call for technologies that can integrate structured clinical data with unstructured data, provide a unified interface to all data, and contextualize clinical information for reuse in the multidisciplinary and collaborative environment envisioned by the CTSA program. This implies that technologies for the processing and interpretation of clinical text should be evaluated not only in terms of their validity and reliability in their intended environment, but also in light of their interoperability and their ability to support information integration and contextualization in a distributed and dynamic environment. This vision adds a new layer of information representation requirements that need to be accounted for when conceptualizing the implementation or acquisition of clinical text processing tools and technologies for multidisciplinary research. On the other hand, electronic health records frequently contain unconstrained clinical text with high variability in the use of terms and in documentation practices, and without commitment to the grammatical or syntactic structure of the language (e.g., triage notes, physician and nurse notes, and chief complaints). This hinders the performance of natural language processing technologies, which typically rely heavily on the syntax of the language and the grammatical structure of the text.
This document introduces our method for transforming unconstrained clinical text found in electronic health information systems into a formal (computationally understandable) representation that is suitable for querying, integration, contextualization, and reuse, and is resilient to the grammatical and syntactic irregularities of clinical text. We present our design rationale, method, and evaluation results from processing chief complaints and triage notes from 8 different emergency departments in Houston, Texas. Finally, we discuss the significance of our contribution in enabling the use of clinical text in a practical bio-surveillance setting.
Abstract:
This Final Degree Project (Trabajo Fin de Grado, TFG) arises from the need for technologies that facilitate Natural Language Processing (NLP) in Spanish within the medical sector. It focuses specifically on extracting knowledge from electronic health records (EHR), which contain all the information related to a patient's health, and in particular aims to obtain all the medicine-related terms from the documents contained in these records. Natural Language Processing makes it possible to obtain structured data from unstructured information. These techniques enable text analysis that generates labels giving semantic meaning to words so that the information can be manipulated. Based on a review of the state of the art in NLP and of existing technologies for other languages, a module for annotating medical terms extracted from clinical documents is proposed as a solution. Symptoms, diseases, body parts, and treatments obtained from UMLS, a categorized ontology that aggregates different sources of medical data, are considered medical terms. The module was designed and implemented, and its results were analyzed through an evaluation on thirty-two documents containing 1372 mentions of medical terminology, which gave average results of Precision: 70.4%, Recall: 36.2%, Accuracy: 31.4%, and F-Measure: 47.2%.
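To show how the reported metrics relate to one another, here is a short sketch using the standard definition of the F-measure as the harmonic mean of precision and recall. The accuracy formula is an assumption, since the abstract does not define the term: it is taken here as TP / (TP + FP + FN), a definition sometimes used for annotation tasks where true negatives are not counted.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    return 2 * precision * recall / (precision + recall)


def overlap_accuracy(precision, recall):
    """TP / (TP + FP + FN), rewritten in terms of precision and recall.

    Assumed definition: the abstract does not say how accuracy was computed.
    """
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)


# Average precision and recall as reported in the abstract.
precision, recall = 0.704, 0.362
f1 = f_measure(precision, recall)
accuracy = overlap_accuracy(precision, recall)
print(f"F-measure from averages: {f1:.1%}")   # about 47.8%
print(f"Accuracy under the assumed definition: {accuracy:.1%}")  # about 31.4%
```

Under that assumed definition, the accuracy computed from the reported precision and recall comes out at about 31.4%, in line with the reported value. The F-measure computed from the averages is about 47.8%, slightly above the reported 47.2%. A gap of that size would arise if the metrics were computed per document and then averaged, but the abstract does not say whether that was done.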