722 results for Fuzzy Database
Abstract:
Volumes of data used in science and industry are growing rapidly. When researchers set out to analyze them, the data format is often the first obstacle. Because there are no standardized ways of exploring different data layouts, the problem has to be solved from scratch each time. The ability to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) is one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSV files into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling large numbers of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address these problems, I present a system for accessing very large CSV-based datasets by means of SQL. It is characterized by: a "no copy" approach, in which data stay mostly in the CSV files; "zero configuration", with no need to specify a database schema; an implementation in C++ with boost [1], SQLite [2] and Qt [3] that requires no installation and has a very small size; efficient plan execution, ensured by query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files; effortless support for millions of columns; per-value typing, which makes mixed text/number data easy to use; and a very simple network protocol that provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware, along with educational videos, on its website [4]. It needs no prerequisites to run, as all of the libraries are included in the distribution package.
I test it against existing database solutions using a battery of benchmarks and discuss the results.
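The general idea of querying CSV content through SQL with per-value typing can be sketched in a few lines of Python. This is only an illustration, not the system described in the abstract: unlike the "no copy" approach, it copies the rows into an in-memory SQLite table, and the table name, column names and sample values are all made up for the example.

```python
import csv
import io
import sqlite3


def coerce(value):
    """Per-value typing: try int, then float, otherwise keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def load_csv(conn, table, text):
    """Create an untyped table from the CSV header and insert the rows."""
    rows = csv.reader(io.StringIO(text))
    header = next(rows)
    # Columns are declared without a type, so SQLite stores each value
    # with whatever type it was given (integer, real or text).
    cols = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({marks})',
        ([coerce(v) for v in row] for row in rows),
    )


# Hypothetical sample data with a mixed text/number column ("score").
SAMPLE = "gene,score,label\nA,1.5,high\nB,2,low\nC,n/a,high\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "data", SAMPLE)
# An index on the filtered column, created after loading.
conn.execute('CREATE INDEX idx_label ON data("label")')
result = conn.execute(
    "SELECT gene, score, typeof(score) FROM data WHERE label = ? ORDER BY gene",
    ("high",),
).fetchall()
print(result)
```

The `typeof(score)` column shows that numeric and textual values coexist in the same column, which is what per-value typing is meant to allow.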
Abstract:
Selectome (http://selectome.unil.ch/) is a database of positive selection, based on a branch-site likelihood test. This model estimates the number of nonsynonymous substitutions (dN) and synonymous substitutions (dS) to evaluate the variation in selective pressure (dN/dS ratio) over branches and over sites. Since the original release of Selectome, we have benchmarked and implemented a thorough quality control procedure on multiple sequence alignments, aiming to minimize false-positive results. We have also improved the computational efficiency of the branch-site test implementation, allowing larger data sets and more frequent updates. Release 6 of Selectome includes all gene trees from Ensembl for Primates and Glires, as well as a large set of vertebrate gene trees. A total of 6810 gene trees have some evidence of positive selection. Finally, the web interface has been improved to be more responsive and to facilitate searches and browsing.
Abstract:
The objective of this article is to demonstrate the effects of different decision styles on strategic decisions and, likewise, on an organization. The technique presented in the study is based on the transformation of linguistic variables into numerical value intervals. In this model, the study draws on fuzzy logic methodology and fuzzy numbers. This fuzzy approach allows us to examine the relations between decision-making styles and strategic management processes under uncertainty. The purpose is to provide companies with results that may help them to exercise the most appropriate decision-making style for their different strategic management processes. The study leaves open further research topics that may be applied to other decision-making areas within the strategic management process.
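One common way to turn linguistic variables into numerical intervals is to represent each term as a triangular fuzzy number and take an alpha-cut. The sketch below assumes that representation; the abstract does not say which kind of fuzzy number the study uses, and the terms and their values on a 0 to 1 scale are hypothetical.

```python
# Hypothetical linguistic terms as triangular fuzzy numbers (a, b, c):
# membership is 0 at a and c, and 1 at the peak b.
TERMS = {
    "low": (0.0, 0.25, 0.5),
    "medium": (0.25, 0.5, 0.75),
    "high": (0.5, 0.75, 1.0),
}


def membership(tfn, x):
    """Degree to which the crisp value x belongs to the fuzzy number."""
    a, b, c = tfn
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def alpha_cut(tfn, alpha):
    """Interval of values whose membership is at least alpha (0 <= alpha <= 1)."""
    a, b, c = tfn
    return (a + alpha * (b - a), c - alpha * (c - b))


print(alpha_cut(TERMS["medium"], 0.5))
print(membership(TERMS["high"], 0.6))
```

Raising `alpha` narrows the interval toward the peak value, so the same linguistic term can be read more or less strictly depending on how much uncertainty is tolerated.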
Abstract:
Information about the genomic coordinates and the sequence of experimentally identified transcription factor binding sites is scattered across a variety of formats. The availability of standard collections of such high-quality data is important to design, evaluate and improve novel computational approaches to identify binding motifs on promoter sequences from related genes. ABS (http://genome.imim.es/datasets/abs2005/index.html) is a public database of known binding sites identified in promoters of orthologous vertebrate genes that have been manually curated from the literature. We have annotated 650 experimental binding sites from 68 transcription factors and 100 orthologous target genes in human, mouse, rat or chicken genome sequences. Computational predictions and promoter alignment information are also provided for each entry. A simple and easy-to-use web interface facilitates data retrieval, allowing different views of the information. In addition, release 1.0 of ABS includes a customizable generator of artificial datasets based on the known sites contained in the collection and an evaluation tool to aid during the training and the assessment of motif-finding programs.
Abstract:
The atomic force microscope is not only a very convenient tool for studying the topography of different samples, but it can also be used to measure specific binding forces between molecules. For this purpose, one type of molecule is attached to the tip and the other one to the substrate. Approaching the tip to the substrate allows the molecules to bind together. Retracting the tip breaks the newly formed bond. The rupture of a specific bond appears in the force-distance curves as a spike from which the binding force can be deduced. In this article we present an algorithm to automatically process force-distance curves in order to obtain bond strength histograms. The algorithm is based on a fuzzy logic approach that permits an evaluation of "quality" for every event and makes the detection procedure much faster than manual selection. The software has been applied to measure the binding strength between tubulin and microtubule-associated proteins.
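The abstract does not give the details of the algorithm, so the following is only a guess at what a fuzzy "quality" score for rupture events might look like. It assumes a retraction curve sampled as a list of forces (negative while the bond pulls on the tip), a known noise level, and two hypothetical criteria combined with a fuzzy AND (minimum): the jump must be large compared with the noise, and the bond must have been under load just before it broke. All thresholds are invented for illustration.

```python
def ramp(x, low, high):
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def detect_ruptures(force, noise, min_quality=0.5):
    """Return (index, jump, quality) for each candidate rupture event."""
    events = []
    for i in range(1, len(force)):
        # A rupture shows up as a sudden relaxation of the force toward zero.
        jump = force[i] - force[i - 1]
        big_jump = ramp(jump / noise, 2.0, 6.0)
        was_loaded = ramp(-force[i - 1] / noise, 1.0, 4.0)
        # Fuzzy AND: an event is only as good as its weakest criterion.
        quality = min(big_jump, was_loaded)
        if quality >= min_quality:
            events.append((i, jump, quality))
    return events


# Hypothetical retraction curve (arbitrary force units) with one rupture.
curve = [0.0, -10.0, -20.0, -30.0, -40.0, -2.0, 0.0]
events = detect_ruptures(curve, noise=5.0)
print(events)
```

The jump sizes of the accepted events are what would be collected into a bond strength histogram, optionally weighted or filtered by their quality.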
Abstract:
The main goal of CleanEx is to provide access to public gene expression data via unique gene names. A second objective is to represent heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-data set comparisons. A consistent and up-to-date gene nomenclature is achieved by associating each single experiment with a permanent target identifier consisting of a physical description of the targeted RNA population or the hybridization reagent used. These targets are then mapped at regular intervals to the growing and evolving catalogues of human genes and genes from model organisms. The completely automatic mapping procedure relies partly on external genome information resources such as UniGene and RefSeq. The central part of CleanEx is a weekly built gene index containing cross-references to all public expression data already incorporated into the system. In addition, the expression target database of CleanEx provides gene mapping and quality control information for various types of experimental resource, such as cDNA clones or Affymetrix probe sets. The web-based query interfaces offer access to individual entries via text string searches or quantitative expression criteria. CleanEx is accessible at: http://www.cleanex.isb-sib.ch/.
Abstract:
For well over 100 years, the Working Stress Design (WSD) approach has been the traditional basis for geotechnical design with regard to settlements or failure conditions. However, considerable effort has been put forth over the past couple of decades in relation to the adoption of the Load and Resistance Factor Design (LRFD) approach into geotechnical design. With the goal of producing engineered designs with consistent levels of reliability, the Federal Highway Administration (FHWA) issued a policy memorandum on June 28, 2000, requiring all new bridges initiated after October 1, 2007, to be designed according to the LRFD approach. Likewise, regionally calibrated LRFD resistance factors were permitted by the American Association of State Highway and Transportation Officials (AASHTO) to improve the economy of bridge foundation elements. Thus, projects TR-573, TR-583 and TR-584 were undertaken by a research team at Iowa State University’s Bridge Engineering Center with the goal of developing resistance factors for pile design using available pile static load test data. To accomplish this goal, the available data were first analyzed for reliability and then placed in a newly designed relational database management system termed PIle LOad Tests (PILOT), to which this first volume of the final report for project TR-573 is dedicated. PILOT is an amalgamated, electronic source of information consisting of both static and dynamic data for pile load tests conducted in the State of Iowa. The database, which includes historical data on pile load tests dating back to 1966, is intended for use in the establishment of LRFD resistance factors for design and construction control of driven pile foundations in Iowa. 
Although a considerable amount of geotechnical and pile load test data is available in literature as well as in various State Department of Transportation files, PILOT is one of the first regional databases to be exclusively used in the development of LRFD resistance factors for the design and construction control of driven pile foundations. Currently providing an electronically organized assimilation of geotechnical and pile load test data for 274 piles of various types (e.g., steel H-shaped, timber, pipe, Monotube, and concrete), PILOT (http://srg.cce.iastate.edu/lrfd/) is on par with such familiar national databases used in the calibration of LRFD resistance factors for pile foundations as the FHWA’s Deep Foundation Load Test Database. By narrowing geographical boundaries while maintaining a high number of pile load tests, PILOT exemplifies a model for effective regional LRFD calibration procedures.
Abstract:
In the context of recent attempts to redefine the 'skin notation' concept, a position paper summarizing an international workshop on the topic stated that the skin notation should be a hazard indicator related to the degree of toxicity and the potential for transdermal exposure of a chemical. Within the framework of developing a web-based tool integrating this concept, we constructed a database of 7101 agents for which a percutaneous permeation constant can be estimated (using molecular weight and octanol-water partition constant), and for which at least one of the following toxicity indices could be retrieved: inhalation occupational exposure limit (n=644), oral lethal dose 50 (LD50, n=6708), cutaneous LD50 (n=1801), oral no observed adverse effect level (NOAEL, n=1600), and cutaneous NOAEL (n=187). Data sources included the Registry of Toxic Effects of Chemical Substances (RTECS, MDL Information Systems, Inc.), PHYSPROP (Syracuse Research Corp.) and safety cards from the International Programme on Chemical Safety (IPCS). A hazard index was calculated, corresponding to the product of exposure duration and exposed skin surface that would yield an internal dose equal to a toxic reference dose. This presentation provides a descriptive summary of the database, correlations between toxicity indices, and an example of how the web tool will help industrial hygienists decide on the possibility of a dermal risk using the hazard index.
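The hazard index can be illustrated with a simple steady-state permeation model. This is an assumption made for the sketch, since the abstract does not state the formula used: absorbed dose is taken as permeation constant × concentration × exposed area × duration, so the index is the reference dose divided by permeation constant × concentration. The function names, the concentration input and the numbers are hypothetical.

```python
def absorbed_dose_mg(kp_cm_per_h, conc_mg_per_cm3, area_cm2, hours):
    """Assumed steady-state model: dose = Kp * C * A * t."""
    return kp_cm_per_h * conc_mg_per_cm3 * area_cm2 * hours


def hazard_index_cm2_h(ref_dose_mg, kp_cm_per_h, conc_mg_per_cm3):
    """Area x duration product (cm^2 * h) that delivers the reference dose."""
    return ref_dose_mg / (kp_cm_per_h * conc_mg_per_cm3)


# Hypothetical agent: reference dose 10 mg, Kp 0.01 cm/h, 2 mg/cm^3 on the skin.
index = hazard_index_cm2_h(10.0, 0.01, 2.0)
print(index)
# Exposing `index` cm^2 of skin for one hour gives back the reference dose.
print(absorbed_dose_mg(0.01, 2.0, index, 1.0))
```

Under this model a small index means that even a brief exposure of a small skin area reaches the reference dose, which is the situation a skin notation is meant to flag.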
Abstract:
Vagueness and high dimensionality are common features of current data. This paper presents an approach to identifying conceptual structures among fuzzy three-dimensional data sets in order to obtain a conceptual hierarchy. We propose a fuzzy extension of Galois connections that allows us to prove an isomorphism theorem between fuzzy set closures, which is the basis for generating lattice-ordered sets.
Abstract:
This work focuses on the prediction of the two main nitrogenous variables that describe the water quality at the effluent of a wastewater treatment plant. We have developed two kinds of neural network architectures: one with a single output and, on the other hand, one with the usual five effluent variables that define the water quality: suspended solids, biochemical organic matter, chemical organic matter, total nitrogen and total Kjeldahl nitrogen. Two learning techniques, based on a classical adaptive gradient and on a Kalman filter, have been implemented. To try to improve generalization and performance, we have selected variables by means of genetic algorithms and fuzzy systems. The training, testing and validation sets show that the final networks are able to learn the available simulated data well enough, especially for total nitrogen.
Abstract:
The Quaternary Active Faults Database of Iberia (QAFI) is an initiative led by the Institute of Geology and Mines of Spain (IGME) for building a public repository of scientific data regarding faults having documented activity during the last 2.59 Ma (Quaternary). QAFI also addresses a need to transfer geologic knowledge to practitioners of seismic hazard and risk in Iberia by identifying and characterizing seismogenic fault-sources. QAFI is populated with information freely provided by more than 40 Earth science researchers, storing to date a total of 262 records. In this article we describe the development and evolution of the database, as well as its internal architecture. Additionally, a first global analysis of the data is provided, with a special focus on length and slip-rate fault parameters. Finally, the database completeness and the internal consistency of the data are discussed. Even though QAFI v.2.0 is the most current resource for calculating fault-related seismic hazard in Iberia, the database is still incomplete and requires further review.
Abstract:
BACKGROUND: Several European HIV observational databases have, over the last decade, accumulated a substantial number of resistance test results and developed large sample repositories. There is a need to link these efforts together. We here describe the development of a novel tool that binds these databases together in a distributed fashion, in which control and data remain with the cohorts, rather than through classic data mergers. METHODS: As a proof of concept, we entered two basic queries into the tool: available resistance tests and available samples. We asked for patients still alive after 1998-01-01 and between 180 and 195 cm in height, and for how many samples or resistance tests would be available for these patients. The queries were uploaded with the tool to a central web server, from which each participating cohort downloaded the queries with the tool and ran them against its own database. The numbers gathered were then submitted back to the server, where we could accumulate the number of available samples and resistance tests. RESULTS: We obtained the following results from the cohorts on available samples/resistance tests: EuResist: not available/11,194; EuroSIDA: 20,716/1,992; ICONA: 3,751/500; Rega: 302/302; SHCS: 53,783/1,485. In total, 78,552 samples and 15,473 resistance tests were available amongst these five cohorts.
Once these data items have been identified, it is trivial to generate lists of relevant samples that would be useful for ultra-deep sequencing in addition to the already available resistance tests. Soon the tool will include small analysis packages that allow each cohort to pull a report on its cohort profile and also survey emerging resistance trends in its own cohort. CONCLUSIONS: We plan on providing this tool to all cohorts within the Collaborative HIV and Anti-HIV Drug Resistance Network (CHAIN) and will provide the tool free of charge to others for any non-commercial use. The potential of this tool is to ease collaborations, that is, in projects requiring data, to speed up identification of novel resistance mutations by increasing the number of observations across multiple cohorts instead of waiting for single cohorts or studies to reach the critical number needed to address such issues.
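The distributed pattern described in this abstract, where each cohort runs the query locally and only counts travel to the server, can be sketched as follows. The record layout and function names are hypothetical and not those of the actual tool; the per-cohort figures are the reported ones that add up to the stated totals of 78,552 samples and 15,473 resistance tests.

```python
from datetime import date


def run_query_locally(patients):
    """Runs inside one cohort; only the two counts leave the site."""
    selected = [
        p for p in patients
        if (p["died"] is None or p["died"] > date(1998, 1, 1))
        and 180 <= p["height_cm"] <= 195
    ]
    samples = sum(p["samples"] for p in selected)
    tests = sum(p["resistance_tests"] for p in selected)
    return samples, tests


def accumulate(replies):
    """Server side: add up the counts, skipping values a cohort cannot give."""
    samples = sum(s for s, _ in replies.values() if s is not None)
    tests = sum(t for _, t in replies.values() if t is not None)
    return samples, tests


# Hypothetical patient records for one cohort.
toy_cohort = [
    {"died": None, "height_cm": 182, "samples": 3, "resistance_tests": 1},
    {"died": date(1997, 5, 1), "height_cm": 185, "samples": 2, "resistance_tests": 2},
    {"died": None, "height_cm": 170, "samples": 4, "resistance_tests": 0},
]
print(run_query_locally(toy_cohort))

# Reported counts as (samples, resistance tests); None means "not available".
REPORTED = {
    "EuResist": (None, 11194),
    "EuroSIDA": (20716, 1992),
    "ICONA": (3751, 500),
    "Rega": (302, 302),
    "SHCS": (53783, 1485),
}
print(accumulate(REPORTED))
```

Because only aggregate counts are exchanged, patient-level data never leave the cohort, which is the point of the distributed design.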
Abstract:
This master's thesis studies the implementation of real-time activity-based costing in the information system of a Finnish small or medium-sized enterprise (SME) that manufactures laser chips. It also examines the effects of activity-based costing on operations and on activity management. The literature part of the thesis reviews, on the basis of literature sources, the theories of activity-based costing, its calculation methods, and the technologies used in the technical implementation. In the implementation part, a web-based activity-based costing system was designed and implemented to support the case company's cost accounting and financial administration. The tool was integrated into the company's enterprise resource planning and manufacturing execution systems. Compared with the data collection systems of traditional activity-based costing models, in the case company the inputs to the activity-based costing system arrive in real time as part of a larger information system integration. The thesis seeks to establish a relationship between the requirements of activity-based costing and database systems. The company can use the activity-based costing system, for example, in product pricing and cost accounting by viewing product-related costs from different perspectives. Conclusions can be drawn from accurate cost information, and the data produced by the system can be used to determine whether developing a particular project, customer relationship or product is economically viable.