874 results for Unstructured data
Abstract:
Modern enterprise knowledge management systems typically require distributed approaches and the integration of numerous heterogeneous sources of information. A powerful foundation for these tasks can be Topic Maps, which not only provide a semantic net-like knowledge representation means and the possibility to use ontologies for modelling knowledge structures, but also offer concepts to link these knowledge structures with unstructured data stored in files, external documents etc. In this paper, we present the architecture and prototypical implementation of a Topic Map application infrastructure, the ‘Topic Grid’, which enables transparent, node-spanning access to different Topic Maps distributed in a network.
Abstract:
Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
Abstract:
Opinion Mining is becoming increasingly important, especially for analyzing and forecasting customers’ behavior for business purposes. Making the right decision about new products or services, based on data about customers’ characteristics, means profit for an organization or company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step in achieving this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as customers, products, time and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible product attributes. A use case study is also presented to show the advantages of using OLAP and data cubes to analyze customers’ opinions.
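The fact-table idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rows, the field names, the `orientation` measure (+1 for a positive opinion, -1 for a negative one) and the `roll_up` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: one row per opinion, keyed by the four dimensions
# named in the abstract, with an assumed orientation measure (+1 or -1).
FACTS = [
    {"customer": "c1", "product": "phone", "time": "Q1", "location": "north", "orientation": 1},
    {"customer": "c2", "product": "phone", "time": "Q1", "location": "south", "orientation": -1},
    {"customer": "c3", "product": "phone", "time": "Q2", "location": "north", "orientation": 1},
    {"customer": "c1", "product": "laptop", "time": "Q2", "location": "north", "orientation": -1},
]


def roll_up(facts, dimensions):
    """Average the orientation measure over the chosen dimensions (an OLAP-style roll-up)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of orientations, row count]
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key][0] += row["orientation"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Coarse view: mean orientation per product.
by_product = roll_up(FACTS, ["product"])
# Finer view: mean orientation per product and location.
by_product_location = roll_up(FACTS, ["product", "location"])
```

Changing the list of dimensions passed to `roll_up` gives coarser or finer views of the same facts, which is the kind of aggregation a data cube supports.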
Abstract:
Online business, or Electronic Commerce (EC), is becoming popular among customers; as a result, a large number of product reviews have been posted online. This information is valuable not only for prospective customers deciding whether to buy a product but also for companies gathering information on customers’ satisfaction with their products. Opinion mining is used to capture customer reviews and separate them into subjective expressions (sentiment words) and objective expressions (no sentiment words). This paper proposes a novel multi-dimensional model for opinion mining, which integrates customers’ characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table, which is then represented along the dimensions of customers, products, time and location. Data warehouse techniques such as OLAP and data cubes are used to analyze opinionated sentences. A comprehensive way to calculate customers’ orientation on products’ features and attributes is presented in this paper.
Abstract:
This research proposes a multi-dimensional model for Opinion Mining, which integrates customers' characteristics and their opinions about products (or services). Customer opinions are valuable for companies seeking to deliver the right products or services to their customers. This research presents a comprehensive framework to evaluate opinions' orientation based on products' hierarchy attributes. It also provides an alternative way to obtain opinion summaries for different groups of customers and different categories of products.
Abstract:
1. Ecological data sets often use clustered measurements or use repeated sampling in a longitudinal design. Choosing the correct covariance structure is an important step in the analysis of such data, as the covariance describes the degree of similarity among the repeated observations. 2. Three methods for choosing the covariance are: the Akaike information criterion (AIC), the quasi-information criterion (QIC), and the deviance information criterion (DIC). We compared the methods using a simulation study and using a data set that explored effects of forest fragmentation on avian species richness over 15 years. 3. The overall success was 80.6% for the AIC, 29.4% for the QIC and 81.6% for the DIC. For the forest fragmentation study the AIC and DIC selected the unstructured covariance, whereas the QIC selected the simpler autoregressive covariance. Graphical diagnostics suggested that the unstructured covariance was probably correct. 4. We recommend using DIC for selecting the correct covariance structure.
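As a rough illustration of how an information criterion trades fit against complexity when choosing a covariance structure, the sketch below computes AIC = 2k - 2 ln L for two candidate structures. The log-likelihood values are invented for illustration and are not results from the study. The parameter counts assume a common variance for the autoregressive structure, and only covariance parameters are counted, on the assumption that the mean model is the same for every candidate.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def n_cov_params(structure, n_times):
    """Number of covariance parameters for n_times repeated measurements."""
    if structure == "ar1":
        return 2  # one variance plus one autocorrelation (assumed homogeneous variance)
    if structure == "unstructured":
        return n_times * (n_times + 1) // 2  # every variance and covariance is free
    raise ValueError(f"unknown structure: {structure}")


N_TIMES = 4
# Hypothetical maximized log-likelihoods, invented for this example only.
LOG_LIK = {"ar1": -520.0, "unstructured": -505.0}

scores = {name: aic(ll, n_cov_params(name, N_TIMES)) for name, ll in LOG_LIK.items()}
best = min(scores, key=scores.get)
```

With these made-up numbers the unstructured covariance is preferred despite its ten parameters, because its gain in log-likelihood outweighs the penalty.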
Abstract:
Objectives: Demonstrate the application of decision trees – classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs) – to understand structure in missing data. Setting: Data taken from employees at three different industry sites in Australia. Participants: 7915 observations were included. Materials and Methods: The approach was evaluated using an occupational health dataset comprising results of questionnaires, medical tests, and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results: CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured compared to structured missingness. Discussion: Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusion: Researchers are encouraged to use CART and BRT models to explore and understand missing data.
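The core idea, treating "is this value missing?" as the outcome a tree tries to predict, can be sketched in plain Python with a single Gini-based split, which is the first step of a CART-style tree. This is not the study's ‘rpart’ or ‘gbm’ analysis: the records, the field names and the `best_split` helper are hypothetical.

```python
def gini(flags):
    """Gini impurity of a list of 0/1 missingness flags."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2 * p * (1 - p)


def best_split(rows, target, predictors):
    """Find the 'variable == value' split with the lowest weighted Gini impurity."""
    best = None
    for var in predictors:
        for value in sorted({row[var] for row in rows}):
            left = [row[target] for row in rows if row[var] == value]
            right = [row[target] for row in rows if row[var] != value]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, var, value)
    return best


# Hypothetical records: 'missing' is 1 when a measurement was not recorded.
ROWS = [
    {"site": "A", "data_type": "medical", "missing": 0},
    {"site": "A", "data_type": "environmental", "missing": 0},
    {"site": "B", "data_type": "medical", "missing": 0},
    {"site": "B", "data_type": "environmental", "missing": 0},
    {"site": "C", "data_type": "medical", "missing": 1},
    {"site": "C", "data_type": "environmental", "missing": 1},
]

split = best_split(ROWS, "missing", ["data_type", "site"])
```

In this toy data all missing values come from site C, so the chosen split isolates that site; a full CART model would keep splitting each resulting subset recursively.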
Abstract:
Huge amounts of data are generated from a variety of information sources in healthcare, and these data sources originate from a variety of clinical information systems and corporate data warehouses. The data derived from these sources are used for analysis and trending purposes, thus playing an influential role as a real-time decision-making tool. The unstructured, narrative data provided by these data sources qualify as healthcare big data, and researchers argue that the application of big data in healthcare might enable accountability and efficiency.
Abstract:
Recent data indicate that levels of overweight and obesity are increasing at an alarming rate throughout the world. At a population level (and commonly to assess individual health risk), the prevalence of overweight and obesity is calculated using cut-offs of the Body Mass Index (BMI) derived from height and weight. Similarly, the BMI is also used to classify individuals and to provide a notional indication of potential health risk. It is likely that epidemiologic surveys that are reliant on BMI as a measure of adiposity will overestimate the number of individuals in the overweight (and slightly obese) categories. This tendency to misclassify individuals may be more pronounced in athletic populations or groups in which the proportion of more active individuals is higher. This differential is most pronounced in sports where it is advantageous to have a high BMI (but not necessarily high fatness). To illustrate this point we calculated the BMIs of international professional rugby players from the four teams involved in the semi-finals of the 2003 Rugby Union World Cup. According to the World Health Organisation (WHO) cut-offs for BMI, approximately 65% of the players were classified as overweight and approximately 25% as obese. These findings demonstrate that a high BMI is commonplace (and a potentially desirable attribute for sport performance) in professional rugby players. An unanswered question is what proportion of the wider population, classified as overweight (or obese) according to the BMI, is misclassified according to both fatness and health risk? It is evident that being overweight should not be an obstacle to a physically active lifestyle. Similarly, a reliance on BMI alone may misclassify a number of individuals who might otherwise have been automatically considered fat and/or unfit.
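The classification discussed in this abstract rests on a simple calculation: BMI is weight in kilograms divided by the square of height in metres, compared against the WHO adult cut-offs (overweight from 25, obese from 30). A minimal sketch follows; the example weights and heights are hypothetical and are not data from the study.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def who_category(bmi_value):
    """Classify a BMI value using the standard WHO adult cut-offs."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal"
    if bmi_value < 30:
        return "overweight"
    return "obese"


# Hypothetical athletes (weight in kg, height in m), invented for illustration.
players = [(100, 1.90), (105, 1.85)]
categories = [who_category(bmi(weight, height)) for weight, height in players]
```

Because the formula uses only height and weight, a heavily muscled athlete and a person with high body fat can land in the same category, which is the misclassification the abstract describes.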
Abstract:
In this paper, a singularly perturbed ordinary differential equation with non-smooth data is considered. The numerical method is generated by means of a Petrov-Galerkin finite element method with a piecewise-exponential test function and a piecewise-linear trial function. At the point of discontinuity of the coefficient, a special technique is used. The method is shown to be first-order accurate and to converge uniformly with respect to the singular perturbation parameter. Finally, numerical results are presented, which are in agreement with the theoretical results.