928 resultados para Grouping of data
Resumo:
Worldwide electricity markets have been evolving into regional and even continental scales. The aim at an efficient use of renewable based generation in places where it exceeds the local needs is one of the main reasons. A reference case of this evolution is the European Electricity Market, where countries are connected, and several regional markets were created, each one grouping several countries, and supporting transactions of huge amounts of electrical energy. The continuous transformations electricity markets have been experiencing over the years create the need to use simulation platforms to support operators, regulators, and involved players for understanding and dealing with this complex environment. This paper focuses on demonstrating the advantage that real electricity markets data has for the creation of realistic simulation scenarios, which allow the study of the impacts and implications that electricity markets transformations will bring to the participant countries. A case study using MASCEM (Multi-Agent System for Competitive Electricity Markets) is presented, with a scenario based on real data, simulating the European Electricity Market environment, and comparing its performance when using several different market mechanisms.
Resumo:
The study of motor unit action potential (MUAP) activity from electrornyographic signals is an important stage on neurological investigations that aim to understand the state of the neuromuscular system. In this context, the identification and clustering of MUAPs that exhibit common characteristics, and the assessment of which data features are most relevant for the definition of such cluster structure are central issues. In this paper, we propose the application of an unsupervised Feature Relevance Determination (FRD) method to the analysis of experimental MUAPs obtained from healthy human subjects. In contrast to approaches that require the knowledge of a priori information from the data, this FRD method is embedded on a constrained mixture model, known as Generative Topographic Mapping, which simultaneously performs clustering and visualization of MUAPs. The experimental results of the analysis of a data set consisting of MUAPs measured from the surface of the First Dorsal Interosseous, a hand muscle, indicate that the MUAP features corresponding to the hyperpolarization period in the physisiological process of generation of muscle fibre action potentials are consistently estimated as the most relevant and, therefore, as those that should be paid preferential attention for the interpretation of the MUAP groupings.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Resumo:
To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.
Resumo:
We tested the effects of four data characteristics on the results of reserve selection algorithms. The data characteristics were nestedness of features (land types in this case), rarity of features, size variation of sites (potential reserves) and size of data sets (numbers of sites and features). We manipulated data sets to produce three levels, with replication, of each of these data characteristics while holding the other three characteristics constant. We then used an optimizing algorithm and three heuristic algorithms to select sites to solve several reservation problems. We measured efficiency as the number or total area of selected sites, indicating the relative cost of a reserve system. Higher nestedness increased the efficiency of all algorithms (reduced the total cost of new reserves). Higher rarity reduced the efficiency of all algorithms (increased the total cost of new reserves). More variation in site size increased the efficiency of all algorithms expressed in terms of total area of selected sites. We measured the suboptimality of heuristic algorithms as the percentage increase of their results over optimal (minimum possible) results. Suboptimality is a measure of the reliability of heuristics as indicative costing analyses. Higher rarity reduced the suboptimality of heuristics (increased their reliability) and there is some evidence that more size variation did the same for the total area of selected sites. We discuss the implications of these results for the use of reserve selection algorithms as indicative and real-world planning tools.
Resumo:
Wireless medical systems are comprised of four stages, namely the medical device, the data transport, the data collection and the data evaluation stages. Whereas the performance of the first stage is highly regulated, the others are not. This paper concentrates on the data transport stage and argues that it is necessary to establish standardized tests to be used by medical device manufacturers to provide comparable results concerning the communication performance of the wireless networks used to transport medical data. Besides, it suggests test parameters and procedures to be used to produce comparable communication performance results.
Resumo:
In this work, we consider the numerical solution of a large eigenvalue problem resulting from a finite rank discretization of an integral operator. We are interested in computing a few eigenpairs, with an iterative method, so a matrix representation that allows for fast matrix-vector products is required. Hierarchical matrices are appropriate for this setting, and also provide cheap LU decompositions required in the spectral transformation technique. We illustrate the use of freely available software tools to address the problem, in particular SLEPc for the eigensolvers and HLib for the construction of H-matrices. The numerical tests are performed using an astrophysics application. Results show the benefits of the data-sparse representation compared to standard storage schemes, in terms of computational cost as well as memory requirements.
Resumo:
Websites are, nowadays, the face of institutions, but they are often neglected, especially when it comes to contents. In the present paper, we put forth an investigation work whose final goal is the development of a model for the measurement of data quality in institutional websites for health units. To that end, we have carried out a bibliographic review of the available approaches for the evaluation of website content quality, in order to identify the most recurrent dimensions and the attributes, and we are currently carrying out a Delphi Method process, presently in its second stage, with the purpose of reaching an adequate set of attributes for the measurement of content quality.
Resumo:
This article presents a research work, the goal of which was to achieve a model for the evaluation of data quality in institutional websites of health units in a broad and balanced way. We have carried out a literature review of the available approaches for the evaluation of website content quality, in order to identify the most recurrent dimensions and the attributes, and we have also carried out a Delphi method process with experts in order to reach an adequate set of attributes and their respective weights for the measurement of content quality. The results obtained revealed a high level of consensus among the experts who participated in the Delphi process. On the other hand, the different statistical analysis and techniques implemented are robust and attach confidence to our results and consequent model obtained.
Resumo:
Business Intelligence (BI) is one emergent area of the Decision Support Systems (DSS) discipline. Over the last years, the evolution in this area has been considerable. Similarly, in the last years, there has been a huge growth and consolidation of the Data Mining (DM) field. DM is being used with success in BI systems, but a truly DM integration with BI is lacking. Therefore, a lack of an effective usage of DM in BI can be found in some BI systems. An architecture that pretends to conduct to an effective usage of DM in BI is presented.
Resumo:
The aim of this paper is to develop models for experimental open-channel water delivery systems and assess the use of three data-driven modeling tools toward that end. Water delivery canals are nonlinear dynamical systems and thus should be modeled to meet given operational requirements while capturing all relevant dynamics, including transport delays. Typically, the derivation of first principle models for open-channel systems is based on the use of Saint-Venant equations for shallow water, which is a time-consuming task and demands for specific expertise. The present paper proposes and assesses the use of three data-driven modeling tools: artificial neural networks, composite local linear models and fuzzy systems. The canal from Hydraulics and Canal Control Nucleus (A parts per thousand vora University, Portugal) will be used as a benchmark: The models are identified using data collected from the experimental facility, and then their performances are assessed based on suitable validation criterion. The performance of all models is compared among each other and against the experimental data to show the effectiveness of such tools to capture all significant dynamics within the canal system and, therefore, provide accurate nonlinear models that can be used for simulation or control. The models are available upon request to the authors.