903 results for Data-driven knowledge acquisition
Abstract:
Social media classification problems have drawn increasing attention in recent years. With the rapid development of the Internet and the popularity of computers, there is an astronomical amount of information on social media platforms. The datasets are generally large scale and are often corrupted by noise, and the presence of noise in the training set has a strong impact on the performance of supervised learning (classification) techniques. This thesis presents a budget-driven One-class SVM approach suitable for large-scale social media data classification. Our approach is based on an existing online One-class SVM learning algorithm, referred to as STOCS (Self-Tuning One-Class SVM). To justify this choice, we first analyze the noise resilience of STOCS using synthetic data. The experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle the big data classification problem for social media data, we introduce several budget-driven features, which allow the algorithm to be trained within limited time and memory. Moreover, the resulting algorithm can be easily adapted to changes in dynamic data with minimal computational cost. Compared with two state-of-the-art approaches, LIBLINEAR and kNN, our approach is shown to be competitive while requiring less memory and time.
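To give a concrete sense of the kind of one-class classification this thesis builds on, the following is a minimal sketch using scikit-learn's OneClassSVM as a stand-in; it is not the STOCS or budget-driven implementation described above, and the data and parameter values are invented for illustration.

```python
# Minimal one-class SVM sketch (a stand-in for the budget-driven STOCS
# approach described in the abstract; hypothetical data and parameters).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# "Normal" training data plus a small fraction of noisy/outlier points.
X_clean = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
X_noise = rng.uniform(low=-6.0, high=6.0, size=(50, 2))
X_train = np.vstack([X_clean, X_noise])

# nu bounds the fraction of training points treated as outliers,
# which is one way of tolerating noise in the training set.
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
clf.fit(X_train)

X_test = rng.normal(loc=0.0, scale=1.0, size=(10, 2))
print(clf.predict(X_test))  # +1 = inlier, -1 = outlier
```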
Abstract:
At the moment, the phrases “big data” and “analytics” are often being used as if they were magic incantations that will solve all an organization’s problems at a stroke. The reality is that data on its own, even with the application of analytics, will not solve any problems. The resources that analytics and big data can consume represent a significant strategic risk if applied ineffectively. Any analysis of data needs to be guided, and to lead to action. So while analytics may lead to knowledge and intelligence (in the military sense of that term), it also needs the input of knowledge and intelligence (in the human sense of that term). And somebody then has to do something new or different as a result of the new insights, or it won’t have been done to any purpose. Using an analytics example concerning accounts payable in the public sector in Canada, this paper reviews thinking from the domains of analytics, risk management and knowledge management, to show some of the pitfalls, and to present a holistic picture of how knowledge management might help tackle the challenges of big data and analytics.
Abstract:
This dissertation offers a critical international political economy (IPE) analysis of the ways in which consumer information has been governed throughout the formal history of consumer finance (1840 – present). Drawing primarily on the United States, this project problematizes the notion of consumer financial big data as a ‘new era’ by tracing its roots historically from the late nineteenth century through to the present. Using a qualitative case study approach, this project applies a unique theoretical framework to three instances of governance in consumer credit big data. Throughout, it shows that the historically specific means used to govern consumer credit data are rooted in dominant ideas, institutions and material factors.
Abstract:
This paper is based on the novel use of a very high fidelity decimation filter chain for Electrocardiogram (ECG) signal acquisition and data conversion. The multiplier-free and multi-stage structure of the proposed filters lowers the power dissipation while minimizing the circuit area, both crucial design constraints for wireless noninvasive wearable health monitoring products given the scarce operational resources available to their electronic implementation. The presented filter has a decimation ratio of 128 and works in tandem with a 1-bit 3rd-order Sigma-Delta (ΣΔ) modulator, achieving 0.04 dB passband ripple and -74 dB stopband attenuation. The work reported here investigates the non-linear phase effects of the proposed decimation filters on the ECG signal by carrying out a comparative study after phase correction. It concludes that enhanced phase linearity is not crucial for ECG acquisition and data conversion applications, since the signal distortion of the acquired signal due to phase non-linearity is insignificant for both the original and the phase-compensated filters. To the best of the authors’ knowledge, and as stated in the state of the art, freedom from signal distortion is essential, since distortion might lead to misdiagnosis. This article demonstrates that, with their minimal power consumption and minimal signal distortion, the proposed decimation filters can effectively be employed in biosignal data processing units.
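For readers unfamiliar with multi-stage decimation, the sketch below illustrates the general idea of a ×128 decimation chain in software using SciPy. It is a simplified stand-in, not the paper's multiplier-free hardware filters; the input sampling rate, stage factors and test signal are assumptions made for the example.

```python
# Simplified software illustration of a x128 multi-stage decimation chain
# (not the paper's multiplier-free hardware design; rates/factors assumed).
import numpy as np
from scipy import signal

fs_in = 128_000          # assumed oversampled input rate (Hz)
stages = [4, 4, 8]       # 4 * 4 * 8 = 128 overall decimation ratio

t = np.arange(0, 2.0, 1.0 / fs_in)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.05 * np.random.randn(t.size)

x = ecg_like
for q in stages:
    # FIR anti-aliasing filter + downsampling per stage; zero_phase=True
    # applies the filter forward and backward to avoid phase distortion.
    x = signal.decimate(x, q, ftype="fir", zero_phase=True)

fs_out = fs_in // np.prod(stages)
print(f"output rate: {fs_out} Hz, samples: {x.size}")
```

Setting zero_phase=False in this sketch would give an ordinary causal filter with non-linear or group-delay effects, loosely analogous to the original-versus-phase-compensated comparison studied in the paper.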
Abstract:
Here, we describe gene expression compositional assignment (GECA), a powerful, yet simple method based on compositional statistics that can validate the transfer of prior knowledge, such as gene lists, into independent data sets, platforms and technologies. Transcriptional profiling has been used to derive gene lists that stratify patients into prognostic molecular subgroups and assess biomarker performance in the pre-clinical setting. Archived public data sets are an invaluable resource for subsequent in silico validation, though their use can lead to data integration issues. We show that GECA can be used without the need for normalising expression levels between data sets and can outperform rank-based correlation methods. To validate GECA, we demonstrate its success in the cross-platform transfer of gene lists in different domains including: bladder cancer staging, tumour site of origin and mislabelled cell lines. We also show its effectiveness in transferring an epithelial ovarian cancer prognostic gene signature across technologies, from a microarray to a next-generation sequencing setting. In a final case study, we predict the tumour site of origin and histopathology of epithelial ovarian cancer cell lines. In particular, we identify and validate the commonly-used cell line OVCAR-5 as non-ovarian, being gastrointestinal in origin. GECA is available as an open-source R package.
Abstract:
The use of remote sensing for monitoring of submerged aquatic vegetation (SAV) in fluvial environments has been limited by the spatial and spectral resolution of available image data. The absorption of light in water also complicates the use of common image analysis methods. This paper presents the results of a study that uses very high resolution (VHR) image data, collected with a near-infrared-sensitive DSLR camera, to map the distribution of SAV species at three sites along the Desselse Nete, a lowland river in Flanders, Belgium. Plant species, including Ranunculus aquatilis L., Callitriche obtusangula Le Gall, Potamogeton natans L., Sparganium emersum L. and Potamogeton crispus L., were classified from the data using Object-Based Image Analysis (OBIA) and expert knowledge. A classification rule set based on a combination of spectral and structural image variation (e.g. texture and shape) was developed using images from two sites. Comparing the classifications with manually delineated ground-truth maps yielded 61% overall accuracy for both sites. Applying the rule set to a third validation image resulted in 53% overall accuracy. These consistent results show promise for species-level mapping in such biodiverse environments, but also prompt a discussion on the assessment of classification accuracy.
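The overall accuracy figures quoted above are simply the fraction of correctly labelled objects (or pixels) when the classification is compared against the ground-truth map. A minimal sketch with invented labels, using a confusion matrix as in a standard accuracy assessment:

```python
# Overall accuracy = correctly classified samples / total samples,
# computed here from hypothetical per-object labels (not the study's data).
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Ranunculus", "Callitriche", "Potamogeton", "background"]
rng = np.random.default_rng(1)

truth = rng.choice(classes, size=500)                  # ground-truth map
predicted = np.where(rng.random(500) < 0.6, truth,     # roughly 60% agreement
                     rng.choice(classes, size=500))

cm = confusion_matrix(truth, predicted, labels=classes)
overall_accuracy = np.trace(cm) / cm.sum()             # diagonal / total
print(f"overall accuracy: {overall_accuracy:.2%}")
```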
Abstract:
This paper explores, both with empirical data and with computer simulations, the extent to which modularity characterises experts' knowledge. We discuss a replication of Chase and Simon's (1973) classic method of identifying 'chunks', i.e., perceptual patterns stored in memory and used as units. This method uses data about the placement of pairs of items in a memory task and consists of comparing latencies between these items and the number and type of relations they share. We then compare the human data with simulations carried out with CHREST, a computer model of perception and memory. We show that the model, based upon the acquisition of a large number of chunks, accounts for the human data well. This is taken as evidence that human knowledge is organised in a modular fashion.
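The latency-based chunking idea described above can be illustrated with a small sketch: successive placements separated by short pauses are grouped into the same chunk, while a long pause is taken as a chunk boundary. The 2-second threshold is the conventional value in this literature; the placements and pauses below are invented, and this is not CHREST itself.

```python
# Toy illustration of latency-based chunk identification: placements
# separated by less than a threshold are grouped into one chunk.
# (Invented data; threshold is the conventional 2-second criterion.)

def split_into_chunks(placements, latencies, threshold=2.0):
    """placements: ordered items; latencies[i] is the pause before
    placements[i+1]. Returns a list of chunks (lists of items)."""
    chunks = [[placements[0]]]
    for item, pause in zip(placements[1:], latencies):
        if pause < threshold:
            chunks[-1].append(item)      # short pause: same chunk
        else:
            chunks.append([item])        # long pause: new chunk
    return chunks

pieces = ["Ke1", "Rf1", "Pg2", "Qd8", "Rc8"]
pauses = [0.7, 1.1, 4.2, 0.9]            # seconds between placements
print(split_into_chunks(pieces, pauses))
# [['Ke1', 'Rf1', 'Pg2'], ['Qd8', 'Rc8']]
```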
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology for analyzing structured data. In essence, DDA helps analysts discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving the analyst indications of which dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system that extends DDA technology to semi-structured text documents, that is, text documents accompanied by a small amount of structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions using a semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonably sized document datasets in real time, without any need for pre-computation.
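To give a concrete sense of the 'unexpected value' highlighting that DDA performs on a cube, the sketch below flags cells whose values deviate strongly from what a simple additive row/column model would predict. This is a deliberately simplified surrogate, not the paper's actual DDA or semi-PLSA pipeline, and the cube values are invented.

```python
# Simplified illustration of discovery-driven 'unexpected value' detection:
# flag cube cells that deviate strongly from an additive row/column model.
# (Invented data; not the paper's actual system.)
import numpy as np
import pandas as pd

cube = pd.DataFrame(
    [[120, 130, 125], [118, 400, 122], [121, 128, 119]],
    index=["Region A", "Region B", "Region C"],
    columns=["Jan", "Feb", "Mar"],
)

row_mean = cube.mean(axis=1).values[:, None]
col_mean = cube.mean(axis=0).values[None, :]
expected = row_mean + col_mean - cube.values.mean()

residual = cube.values - expected
surprise = np.abs(residual) / residual.std()

# Cells more than 1.5 standard deviations from the additive prediction
# are marked True, pointing the analyst at the anomalous dimension values.
print(pd.DataFrame(surprise > 1.5, index=cube.index, columns=cube.columns))
```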
Abstract:
Availability, Data Privacy and Copyrights – Opening Knowledge via Contracts and Pilots discusses how, in the Aviisi project of the National Library of Finland, digital contents and their availability were addressed together with pilot organizations.
Abstract:
The society we live in today is immersed in technology across all areas and fields of knowledge, demanding from citizens the ability to think and act rationally in the face of the challenges and problems that arise throughout life. Information and communication technologies (ICT) are an evident presence in everyday life, and various investigations and studies have emphasized their potential in the teaching-learning process. Moreover, given the scientific and technological challenges of our society, it is essential to educate citizens capable of thinking critically and rationally and of making decisions focused on what one should do or believe. Thus, recognizing the importance of promoting critical thinking (CT) through ICT, the present investigation was developed with the aim of promoting students' CT through the exploration of various digital resources already freely available on the internet. From this aim, the following research questions were formulated: (i) What are the contributions of digital resources that explicitly promote critical thinking in third-grade students?; and (ii) What are the contributions of digital resources that explicitly promote critical thinking to the construction of scientific knowledge, in the area of Science, by third-grade students? The investigation was carried out within a socio-critical paradigm, following a predominantly qualitative methodological perspective and based on an Action Research plan. The implementation of the three activities developed, including the exploration of the digital resources, took place in a class of 26 students at a school in the district of Aveiro, where the researcher-teacher carried out her Supervised Pedagogical Practice. Several instruments were used for data collection, namely initial and final tests of CT abilities, checklists, the researcher's diary, worksheets completed by the students (based on Ennis's Taxonomy), and self-assessment questionnaires on their performance. In the data analysis, content analysis was favoured, using the WebQDA software. Based on these data collection instruments, it was found that students showed greater ease in answering questions that promoted the use of CT abilities related to Elementary Clarification and Induction, and marked difficulty in the Strategies and Tactics dimension. With regard to the construction of scientific knowledge, students mainly showed difficulties in recognizing that motion is driven by the gear wheels and in observing that the length of the string influences the speed of the pendulum. It can be concluded that the digital resources, in general, contributed to mobilizing the potential of CT abilities and to the construction of scientific knowledge. The contribution of this study, although modest, lies in the potential of ICT in Science Education in the first cycle of basic education (1.º CEB) for promoting critical thinking and knowledge.
Abstract:
The main purpose of the current study was to examine the role of vocabulary knowledge (VK) and syntactic knowledge (SK) in L2 listening comprehension, as well as their relative significance. Unlike previous studies, the current project employed assessment tasks to measure aural and proceduralized VK and SK. In terms of VK, to avoid under-representing the construct, measures of both breadth (VB) and depth (VD) were included. Additionally, the current study examined the role of VK and SK while accounting for individual differences in two important cognitive factors in L2 listening: metacognitive knowledge (MK) and working memory (WM). To explore the role of VK and SK more fully, the study also accounted for the negative impact of anxiety on WM and L2 listening. The study was carried out in an English as a Foreign Language (EFL) context, and participants were 263 Iranian learners ranging in English proficiency from lower-intermediate to advanced. Participants took a battery of ten linguistic, cognitive and affective measures. The collected data were first subjected to several preliminary analyses; structural equation modeling (SEM) was then used as the primary analysis method to answer the study's research questions. Results of the preliminary analyses revealed that MK and WM were significant predictors of L2 listening ability; thus, they were kept in the main SEM analyses. The significant role of WM was only observed when the negative effect of anxiety on WM was accounted for. Preliminary analyses also showed that VB and VD were not distinct measures of VK. However, the results also showed that if VB and VD were considered separately, VD was a better predictor of L2 listening success. The main analyses revealed a significant role for both VK and SK in explaining success in L2 listening comprehension, which differs from findings of previous empirical studies. However, the SEM analysis did not reveal a statistically significant difference in the predictive power of the two linguistic factors. Descriptive results of the SEM analysis, along with results from regression analysis, indicated a more significant role for VK.
Abstract:
Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.
Abstract:
With the ever-growing number of connected sensors (IoT), making sense of sensed data becomes ever more important. Pervasive computing is a key enabler for sustainable solutions; prominent examples are smart energy systems and decision support systems. A key feature of pervasive systems is situation awareness, which allows a system to thoroughly understand its environment. It is based on external interpretation of data and thus relies on expert knowledge. Due to the distinct nature of situations in different domains and applications, the development of situation-aware applications remains a complex process. This thesis is concerned with a general framework for situation awareness which simplifies the development of such applications. It is based on the Situation Theory Ontology to provide a foundation for situation modelling which allows knowledge reuse. Concepts of Situation Theory are mapped to the Context Space Theory, which is used for situation reasoning. Situation Spaces in the Context Space are automatically generated from the defined knowledge. For the acquisition of sensor data, the IoT standards O-MI/O-DF are integrated into the framework. These allow a peer-to-peer data exchange between data publishers and the proposed framework, and thus a platform-independent subscription to sensed data. The framework is then applied to a use case for reducing food waste. The use case validates the applicability of the framework and furthermore serves as a showcase for a pervasive system contributing to sustainability goals. Leading institutions, e.g. the United Nations, stress the need for a more resource-efficient society and acknowledge the capability of ICT systems. The use case scenario is based on a smart neighbourhood in which the system recommends the most efficient use of food items through situation awareness, to reduce food waste at the consumption stage.
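As a rough illustration of situation reasoning over a context space, in the spirit of Context Space Theory but much simplified relative to the framework described above, the sketch below defines a situation as a set of acceptable ranges over context attributes and computes a weighted confidence that the current context state lies inside that situation space. All attributes, ranges, weights and the threshold are invented for this example.

```python
# Much-simplified illustration of Context-Space-style situation reasoning:
# a situation space is a set of acceptable ranges per context attribute,
# and confidence is a weighted count of attributes inside their range.
# (All attributes, ranges, weights and thresholds are invented.)

situation_food_at_risk = {
    # attribute: (low, high, weight)
    "days_until_expiry": (0.0, 2.0, 0.5),
    "fridge_temperature_c": (8.0, 30.0, 0.3),
    "quantity_remaining": (0.5, 10.0, 0.2),
}

def situation_confidence(state, situation):
    """Weighted fraction of context attributes within their acceptable region."""
    confidence = 0.0
    for attribute, (low, high, weight) in situation.items():
        value = state.get(attribute)
        if value is not None and low <= value <= high:
            confidence += weight
    return confidence

current_state = {"days_until_expiry": 1.0,
                 "fridge_temperature_c": 9.5,
                 "quantity_remaining": 3.0}

conf = situation_confidence(current_state, situation_food_at_risk)
print(f"confidence 'food at risk of being wasted': {conf:.2f}")
if conf >= 0.7:
    print("recommend: use these items in the next meal plan")
```

A sensor-fed system would update current_state from subscribed measurements (for example via O-MI/O-DF messages) and re-evaluate the situation confidence whenever new data arrive.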