838 results for text and data mining


Relevance: 100.00%

Abstract:

What different forms of engagement do image and text allow the spectator/reader? We know that text and image communicate, and that all communication depends on a relationship between those who communicate. The objective of this text is therefore to understand the new possibilities available to an anthropology of the expression of knowledge that makes use of images, such as photographs and films.

Relevance: 100.00%

Abstract:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
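
The likelihood that such an EM procedure works with can be illustrated in one dimension. The following is a minimal sketch, not the paper's implementation: it assumes a univariate Gaussian mixture, and the function names, bin edges, and counts are hypothetical. Each bin's probability is a difference of mixture CDF values, renormalised by the mass inside the truncation region. In the multivariate case described in the abstract, these differences become multidimensional integrals over each bin.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, weights, means, sds):
    """CDF of a Gaussian mixture with the given component parameters."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))

def bin_probabilities(edges, weights, means, sds):
    """Probability of each bin, conditional on the observation falling
    inside the truncation region [edges[0], edges[-1]]."""
    cdf = [mixture_cdf(e, weights, means, sds) for e in edges]
    mass = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    observed = cdf[-1] - cdf[0]  # mass that survives truncation
    return [m / observed for m in mass]

def binned_log_likelihood(counts, edges, weights, means, sds):
    """Multinomial log-likelihood of bin counts (constant term omitted)."""
    probs = bin_probabilities(edges, weights, means, sds)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

# Hypothetical example: six bins on [-2, 4], two components.
edges = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
counts = [5, 20, 30, 25, 15, 5]
params = ([0.6, 0.4], [0.0, 2.0], [1.0, 1.0])
log_lik = binned_log_likelihood(counts, edges, *params)
```

An EM or direct optimiser would adjust the weights, means, and standard deviations to increase `log_lik`.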

Relevance: 100.00%

Abstract:

In recent years the Data Mining field has seen huge growth and consolidation. Efforts are under way to establish standards in the area, among them SEMMA and CRISP-DM. Both are growing as industry standards and define a set of sequential steps intended to guide the implementation of data mining applications. This raised the question of whether there are substantial differences between them and the traditional KDD process. This paper draws a parallel between these standards and the KDD process and seeks to understand the similarities between them.
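
The kind of parallel described here can be written down as a small lookup table. The sketch below uses the commonly cited stage names of each process. The row-by-row alignment is an illustrative assumption, not necessarily the mapping the paper arrives at, and the names `ALIGNMENT` and `stages` are hypothetical.

```python
# Each row aligns one KDD stage with a SEMMA stage and a CRISP-DM stage.
# None marks a step with no counterpart in that process (assumed alignment).
ALIGNMENT = [
    # (KDD,                       SEMMA,     CRISP-DM)
    (None,                        None,      "Business Understanding"),
    ("Selection",                 "Sample",  "Data Understanding"),
    ("Preprocessing",             "Explore", "Data Understanding"),
    ("Transformation",            "Modify",  "Data Preparation"),
    ("Data Mining",               "Model",   "Modeling"),
    ("Interpretation/Evaluation", "Assess",  "Evaluation"),
    (None,                        None,      "Deployment"),
]

def stages(column):
    """Ordered, de-duplicated stage names for one process
    (column 0 = KDD, 1 = SEMMA, 2 = CRISP-DM)."""
    seen = []
    for row in ALIGNMENT:
        name = row[column]
        if name is not None and name not in seen:
            seen.append(name)
    return seen

kdd, semma, crisp = stages(0), stages(1), stages(2)
```

Under this assumed alignment, CRISP-DM adds a business-oriented first step and a deployment step that have no direct counterpart among the five KDD stages.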

Relevance: 100.00%

Abstract:

The growing importance and influence of new resources connected to power systems has caused many changes in their operation. Environmental policies and several well-known advantages have led to the wide dissemination of renewable-based energy resources. These resources, including Distributed Generation (DG), are being connected to lower voltage levels, where Demand Response (DR) must also be considered. These changes increase the complexity of system operation due to both new operational constraints and the amount of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. Addressing these issues, this paper proposes a methodology to support VPP actions when these act as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33-bus distribution network.

Relevance: 100.00%

Abstract:

This paper establishes a Virtual Producer/Consumer Agent (VPCA) in order to optimize the integrated management of distributed energy resources and to improve and control Demand Side Management (DSM) and its aggregated loads. The paper presents the VPCA architecture and the proposed function-based organization used to coordinate the several generation technologies, the different load types and the storage systems. This VPCA organization uses a framework based on data mining techniques to characterize the customers. The paper includes results of several experimental test cases, using real data and taking into account electricity generation resources as well as consumption data.

Relevance: 100.00%

Abstract:

This article analyzes the relationship between the level of phonological awareness, letter knowledge, and the strategies used to read and write in five-year-old children taught in Catalan. Sixty-nine children from three different classes participated. Each of their teachers used a different teaching method: analytic, synthetic, or analytic-synthetic. The children were assessed at the beginning and at the end of the school year on letter recognition, oral word segmentation, word reading, reading of a short text, and a dictation. Fine-grained analyses of the children's responses were carried out to identify specific strategies and patterns. The qualitative analysis indicates that the ability to orally segment a word into syllables appears to be sufficient for children to begin reading in a conventional way. In addition, phonological awareness and letter knowledge are used in relatively different ways, depending on the type of text being read. The teachers' teaching approaches appear to influence the children's results.

Relevance: 100.00%

Abstract:

OBJECTIVE To analyze lifestyle risk factors related to direct healthcare costs and the indirect costs due to sick leave among workers of an airline company in Brazil. METHODS In this longitudinal 12-month study of 2,201 employees of a Brazilian airline company, the costs of sick leave and healthcare were the primary outcomes of interest. Information on the independent variables, such as gender, age, educational level, type of work, stress, and lifestyle-related factors (body mass index, physical activity, and smoking), was collected using a questionnaire on enrolment in the study. Data on sick leave days were available from the company register, and data on healthcare costs were obtained from insurance records. Multivariate linear regression analysis was used to investigate the association between direct and indirect healthcare costs with sociodemographic, work, and lifestyle-related factors. RESULTS Over the 12-month study period, the average direct healthcare expenditure per worker was US$505.00 and the average indirect cost because of sick leave was US$249.00 per worker. Direct costs were more than twice the indirect costs and both were higher in women. Body mass index was a determinant of direct costs and smoking was a determinant of indirect costs. CONCLUSIONS Obesity and smoking among workers in a Brazilian airline company were associated with increased health costs. Therefore, promoting a healthy diet, physical activity, and anti-tobacco campaigns are important targets for health promotion in this study population.

Relevance: 100.00%

Abstract:

OBJECTIVE To analyze the cases of tuberculosis and the impact of direct follow-up on the assessment of treatment outcomes. METHODS This open prospective cohort study evaluated 504 cases of tuberculosis reported in the Sistema de Informação de Agravos de Notificação (SINAN – Notifiable Diseases Information System) in Juiz de Fora, MG, Southeastern Brazil, between 2008 and 2009. The incidence of treatment outcomes was compared between a group of patients diagnosed with tuberculosis and directly followed up by monthly consultations during return visits (287) and a patient group for which the information was indirectly collected (217) through the city’s surveillance system. The Chi-square test was used to compare the percentages, with a significance level of 0.05. The relative risk (RR) was used to evaluate the differences in the incidence rate of each type of treatment outcome between the two groups. RESULTS Of the outcomes directly and indirectly evaluated, 18.5% and 3.2% corresponded to treatment default and 3.8% and 0.5% corresponded to treatment failure, respectively. The incidence of treatment default and failure was higher in the group with direct follow-up (p < 0.05) (RR = 5.72, 95%CI 2.65;12.34, and RR = 8.31, 95%CI 1.08;63.92, respectively). CONCLUSIONS A higher incidence of treatment default and failure was observed in the directly followed up group, and most of these cases were neglected by the disease reporting system. Therefore, effective measures are needed to improve the control of tuberculosis and data quality.
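
The relative risk and its confidence interval can be sketched with the standard log-transform (Katz) method. This is an illustration, not the authors' code. The abstract reports percentages only, so the event counts used below (53 of 287 and 7 of 217 treatment defaults) are assumptions back-calculated from 18.5% and 3.2%. The function name `relative_risk` is hypothetical.

```python
import math

def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a Wald-type
    confidence interval computed on the log scale."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial samples.
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Treatment default: direct follow-up (assumed 53/287) versus
# indirect follow-up (assumed 7/217).
rr, lower, upper = relative_risk(53, 287, 7, 217)
```

With these assumed counts the result is close to the reported RR of 5.72 with a 95% interval of roughly 2.65 to 12.34. The wide interval reflects the small number of events in the indirectly followed group.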

Relevance: 100.00%

Abstract:

OBJECTIVE To analyze temporal trends of the prevalence of alcohol and tobacco use among Brazilian students. METHODS We analyzed data published between 1989 and 2010 from five epidemiological surveys on students from the 6th to the 12th grade of public schools from the ten largest state capitals of Brazil. The total sample consisted of 104,104 students and data were collected in classrooms. The same collection tool – a World Health Organization self-reporting questionnaire – and the same sampling and weighting procedures were used in the five surveys. The Chi-square test for trend was used to compare the prevalence from different years. RESULTS The prevalence of alcohol and tobacco use varied among the years and cities studied. Alcohol consumption decreased in the 10 state capitals (p < 0.001) over the 21 years. Tobacco use also decreased significantly in eight cities (p < 0.001). The highest prevalence of alcohol use was found in the Southeast region in 1993 (72.8%, in Belo Horizonte) and the lowest one in Belem (30.6%) in 2010. The highest past-year prevalence of tobacco use was found in the South region in 1997 (28.0%, in Curitiba) and the lowest one in the Southeast in 2010 (7.8%, in Sao Paulo). CONCLUSIONS The decreasing trend in the prevalence of tobacco and alcohol use among students detected across the country can be related to the successful and comprehensive Brazilian antitobacco and antialcohol policies. Despite these results, the past-year prevalence of alcohol consumption remained high in all Brazilian regions.

Relevance: 100.00%

Abstract:

Dissertation presented at the Faculty of Sciences and Technology of the New University of Lisbon to obtain the degree of Doctor in Electrical Engineering, specialty of Robotics and Integrated Manufacturing

Relevance: 100.00%

Abstract:

Dissertation presented at the Faculty of Sciences and Technology of the New University of Lisbon to obtain the degree of Master in Electrical and Computer Engineering

Relevance: 100.00%

Abstract:

To meet the increasing demands of complex inter-organizational processes and the demand for continuous innovation and internationalization, new forms of organisation are being adopted, fostering more intensive collaboration processes and sharing of resources, in what can be called collaborative networks (Camarinha-Matos, 2006:03). Information and knowledge are crucial resources in collaborative networks, and their management is a fundamental process to optimize. Knowledge organisation and collaboration systems are thus important instruments for the success of collaborative networks of organisations, and have been researched in the last decade in the areas of computer science, information science, management sciences, terminology and linguistics. Nevertheless, research in this area has paid little attention to multilingual contexts of collaboration, which pose specific and challenging problems. It is clear that access to and representation of knowledge will happen more and more in a multilingual setting, which implies overcoming the difficulties inherent in the presence of multiple languages, through processes such as the localization of ontologies. Although localization, like other processes that involve multilingualism, is a rather well-developed practice and its methodologies and tools are fruitfully employed by the language industry in the development and adaptation of multilingual content, it has not yet been sufficiently explored as an element of support to the development of knowledge representations, in particular ontologies, expressed in more than one language. Multilingual knowledge representation is thus an open research area calling for cross-contributions from knowledge engineering, terminology, ontology engineering, cognitive sciences, computational linguistics, natural language processing, and management sciences.
This workshop brought together researchers interested in multilingual knowledge representation, in a multidisciplinary environment, to debate the possibilities of cross-fertilization between knowledge engineering, terminology, ontology engineering, cognitive sciences, computational linguistics, natural language processing, and management sciences applied to contexts where multilingualism continuously creates new and demanding challenges to current knowledge representation methods and techniques. In this workshop six papers dealing with different approaches to multilingual knowledge representation are presented, most of them describing tools, approaches and results obtained in the development of ongoing projects. In the first paper, Andrés Domínguez Burgos, Koen Kerremans and Rita Temmerman present a software module that is part of a workbench for terminological and ontological mining: Termontospider, a wiki crawler that aims to optimally traverse Wikipedia in search of domain-specific texts for extracting terminological and ontological information. The crawler is part of a tool suite for automatically developing multilingual termontological databases, i.e. ontologically-underpinned multilingual terminological databases. In this paper the authors describe the basic principles behind the crawler and summarize the research setting in which the tool is currently tested. In the second paper, Fumiko Kano presents work comparing four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis presented by the author is to identify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain. To that end, datasets based on standardized pre-defined feature dimensions and values, which are obtainable from the UNESCO Institute for Statistics (UIS), have been used for the comparative analysis of the similarity measures.
The purpose of the comparison is to verify the similarity measures against objectively developed datasets. According to the author, the results demonstrate that the Bayesian Model of Generalization provides the most effective cognitive model for identifying the most similar corresponding concepts existing for a targeted socio-cultural community. In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present ongoing work and propose an approach to the automatic extraction of information from multilingual financial Web resources, to provide candidate terms for building ontology elements or instances of ontology concepts. The authors present a complementary approach to the direct localization/translation of ontology labels: terminologies are acquired by accessing and harvesting the multilingual Web presences of structured information providers in the field of finance. This leads to the detection of candidate terms in various multilingual sources in the financial domain that can be used not only as labels of ontology classes and properties but also for the possible generation of (multilingual) domain ontologies themselves. In the next paper, Manuel Silva, António Lucas Soares and Rute Costa claim that despite the availability of tools, resources and techniques aimed at the construction of ontological artifacts, developing a shared conceptualization of a given reality still raises questions about the principles and methods that support the initial phases of conceptualization. These questions become, according to the authors, more complex when the conceptualization occurs in a multilingual setting.
To tackle these issues the authors present a collaborative platform – conceptME - where terminological and knowledge representation processes support domain experts throughout a conceptualization framework, allowing the inclusion of multilingual data as a way to promote knowledge sharing, enhance conceptualization and support a multilingual ontology specification. In another presentation Frieda Steurs and Hendrik J. Kockaert present TermWise, a large project dealing with legal terminology and phraseology for the Belgian public services, i.e. the translation office of the ministry of justice. The project aims at developing an advanced tool that includes expert knowledge in the algorithms that extract specialized language from textual data (legal documents), and its outcome is a knowledge database including Dutch/French equivalents for legal concepts, enriched with the phraseology related to the terms under discussion. Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary results of a pilot project currently ongoing at UCSC Central Library, where they propose to adapt the model used by translators working within European Union Institutions to subject librarians employed in large and multilingual Academic Institutions. The authors are using User Experience (UX) Analysis in order to provide subject librarians with visual support, by means of “ontology tables” depicting conceptual linking and connections of words with concepts presented according to their semantic and linguistic meaning. The organizers hope that the selection of papers presented here will be of interest to a broad audience, and will be a starting point for further discussion and cooperation.

Relevance: 100.00%

Abstract:

This document presents a tool able to automatically gather data provided by real energy markets and to generate scenarios, capturing and improving market players’ profiles and strategies by using knowledge discovery processes in databases supported by artificial intelligence techniques, data mining algorithms and machine learning methods. It provides the means for generating scenarios with different dimensions and characteristics, ensuring the representation of real and adapted markets and their participating entities. The scenarios generator module enhances the MASCEM (Multi-Agent Simulator of Competitive Electricity Markets) simulator, providing a more effective tool for decision support. The implementation of the proposed module enables researchers and electricity markets’ participating entities to analyze data, create real scenarios and experiment with them. In addition, applying knowledge discovery techniques to real data allows the improvement of MASCEM agents’ profiles and strategies, resulting in a better representation of real market players’ behavior. This work aims to improve the comprehension of electricity markets and the interactions among the involved entities through adequate multi-agent simulation.

Relevance: 100.00%

Abstract:

Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Relevance: 100.00%

Abstract:

This paper presents several forecasting methodologies based on the application of Artificial Neural Networks (ANN) and Support Vector Machines (SVM), directed to the prediction of the solar radiance intensity. The methodologies differ from each other in the information used to train the methods, i.e., different complementary environmental fields such as wind speed, temperature, and humidity. Additionally, different ways of handling the data series information have been considered. Sensitivity testing has been performed on all methodologies in order to achieve the best parameterizations for the proposed approaches. Results show that the SVM approach using the exponential Radial Basis Function (eRBF) is capable of achieving the best forecasting results, in half the execution time of the ANN-based approaches.
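
For reference, the exponential RBF kernel can be contrasted with the more familiar Gaussian RBF. The abstract does not state the kernel formula, so the sketch below assumes one commonly used definition, k(x, y) = exp(-||x - y|| / (2σ²)), which uses the plain Euclidean distance where the Gaussian RBF uses its square. The function names and the sample inputs are hypothetical.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def erbf_kernel(x, y, sigma=1.0):
    """Exponential RBF (assumed form): distance is not squared."""
    return math.exp(-euclidean_distance(x, y) / (2.0 * sigma ** 2))

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF, shown for comparison: distance is squared."""
    return math.exp(-euclidean_distance(x, y) ** 2 / (2.0 * sigma ** 2))

# Hypothetical inputs, e.g. scaled (wind speed, temperature) pairs.
a, b = [0.0, 0.0], [3.0, 4.0]
k_erbf = erbf_kernel(a, b)
k_rbf = rbf_kernel(a, b)
```

Because the distance is not squared, the eRBF value decays more slowly for distant points than the Gaussian RBF at the same σ.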