958 results for Engineering -- Data processing
Abstract:
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
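The idea of coding ambiguous bases with IUPAC symbols can be illustrated with a minimal Python sketch. It assumes that per-cycle base probabilities are already available (the abstract obtains them by model-based clustering, which is not reproduced here). The function names and the `ratio` threshold rule are hypothetical and chosen only for illustration; they are not taken from the R package.

```python
# IUPAC nucleotide codes, keyed by the set of bases each symbol stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_base(probs, ratio=0.5):
    """Return an IUPAC symbol for one sequencing cycle.

    probs maps each of A, C, G, T to a probability. Every base whose
    probability is at least `ratio` times the largest one is kept.
    This threshold rule is an assumption made for the sketch.
    """
    top = max(probs.values())
    kept = frozenset(base for base, p in probs.items() if p >= ratio * top)
    return IUPAC[kept]


def call_read(cycles, ratio=0.5):
    """Call every cycle of a read and join the symbols into one tag."""
    return "".join(call_base(probs, ratio) for probs in cycles)
```

A cycle where A and G are almost equally likely is coded as `R` instead of being forced to a single, possibly wrong, base, so the tag can still be matched against a reference with an ambiguity-aware aligner.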
Abstract:
Gaia is the most ambitious space astrometry mission currently envisaged and is a technological challenge in all its aspects. We describe a proposal for the payload data handling system of Gaia, as an example of a high-performance, real-time, concurrent, and pipelined data system. This proposal includes the front-end systems for the instrumentation, the data acquisition and management modules, the star data processing modules, and the payload data handling unit. We also review other payload and service module elements and we illustrate a data flux proposal.
Abstract:
The purpose of this project was to investigate the potential for collecting and using data from mobile terrestrial laser scanning (MTLS) technology that would reduce the need for traditional survey methods for the development of highway improvement projects at the Iowa Department of Transportation (Iowa DOT). The primary interest in investigating mobile scanning technology is to minimize the exposure of field surveyors to dangerous high-volume traffic situations. Issues investigated were cost, timeframe, accuracy, contracting specifications, data capture extents, data extraction capabilities and data storage issues associated with mobile scanning. The project area selected for evaluation was the I-35/IA 92 interchange in Warren County, Iowa. This project covers approximately one mile of I-35, one mile of IA 92, 4 interchange ramps, and bridges within these limits. Delivered LAS and image files for this project totaled almost 31 GB. There is nearly a 6-fold increase in the size of the scan data after post-processing. Camera data, when enabled, produced approximately 900 MB of imagery data per mile using a 2-camera, 5-megapixel system. A comparison was done between 1823 points on the pavement that were surveyed by Iowa DOT staff using a total station and the same points generated through the MTLS process. The data acquired through the MTLS and data processing met the Iowa DOT specifications for engineering survey. A list of benefits and challenges is included in the detailed report. With the success of this project, it is anticipated that additional projects will be scanned for the Iowa DOT for use in the development of highway improvement projects.
Abstract:
This work proposes a parallel architecture for a motion estimation algorithm. It is well known that image processing requires a huge amount of computation, mainly in low-level processing, where the algorithms deal with a large number of pixels. One approach to estimating motion involves detecting the correspondences between two images. Because of its regular processing scheme, a parallel implementation of the correspondence problem can be an adequate way to reduce computation time. This work introduces a parallel, real-time implementation of these low-level tasks, which run from the moment the current image is acquired by the camera until the pairs of point matchings are detected.
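To make the correspondence problem concrete, here is a minimal Python sketch of block matching with a sum-of-absolute-differences (SAD) cost. The abstract does not say which matching criterion the authors use, so SAD, the function names, and the default block size and search radius are all assumptions made for illustration.

```python
def sad(prev, curr, y, x, dy, dx, size):
    """Sum of absolute differences between the block at (y, x) in prev
    and the block shifted by (dy, dx) in curr."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(prev[y + i][x + j] - curr[y + dy + i][x + dx + j])
    return total


def best_match(prev, curr, y, x, size=2, radius=1):
    """Return the displacement (dy, dx) within the search radius that
    gives the lowest SAD for the block whose top-left corner is (y, x)."""
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            # Skip candidates whose block would fall outside the image.
            if ny < 0 or nx < 0 or ny + size > height or nx + size > width:
                continue
            cost = sad(prev, curr, y, x, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]
```

Each call to `best_match` reads only the two images and writes nothing shared, so the searches for different blocks are independent of one another. That regularity is what makes this kind of low-level task a natural candidate for a parallel implementation.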
Abstract:
Background: Combining different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Kernel-based methods are among the most powerful methods for integrating heterogeneous data types. Kernel-based data integration approaches consist of two basic steps: first, the right kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions: The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.
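The two-step scheme described here (one kernel per data set, then a combination of the kernels) can be sketched in plain Python. The kernel choices, the equal default weights, and the use of power iteration for the leading kernel PCA component are assumptions made for illustration, not the authors' implementation; the representation of input variables on the plot is not covered.

```python
import math


def linear_kernel(X):
    """Gram matrix of dot products between the rows of X."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix between the rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def combine(kernels, weights=None):
    """Weighted sum of Gram matrices computed on the same samples."""
    n = len(kernels[0])
    weights = weights or [1.0 / len(kernels)] * len(kernels)
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def center(K):
    """Centre a symmetric Gram matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def first_component(K, iters=200):
    """Leading eigenvalue and unit eigenvector of a centred Gram matrix,
    found by power iteration."""
    n = len(K)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(K[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v
```

With an eigenvalue `lam` and unit eigenvector `v`, the score of sample `i` on the first kernel principal component is `math.sqrt(lam) * v[i]`. A weighted sum of valid kernels with non-negative weights is again a valid kernel, which is why the combination step can be this simple.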
Abstract:
DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.
Abstract:
Peer-reviewed
Abstract:
The aim of this master's thesis was to describe the workflow of the different functions of the order-delivery process when a product data management system is part of the working environment. The theoretical part of the thesis examined business process re-engineering and process definition and presented the key areas of product data management (PDM). The background and strategies of the case company were presented, after which the changes were assessed against the findings of the theoretical part. To define current practices, people from every stage of the order-delivery process within the production unit were interviewed. Finally, the company's product data management principles were described and the workflow at the different stages of the process was defined. As the new product data management system is introduced, the company must also adopt the product data management way of thinking. Management of the product structure is now divided among different functions, so that the engineering structure, the manufacturing structure and the service structure are the responsibility of different people. Configuring these structures during the order-delivery process determines the order in which activities must be carried out between the different systems. The multinational engineering organization must also be taken into account during order processing. The product data management system is used together with the familiar design software and the enterprise resource planning (ERP) system. The workflow diagram defines a company-wide model of how, and in what order, tasks must be performed in the different systems during the order-delivery process. This thesis studied the order-delivery process functions most essential to product definition and design data management: sales, sales support, production control, application engineering and documentation. In the future, it is advisable to consider introducing the product data management system in production and purchasing as well.
Development efforts related to the order-delivery process should next be directed at the order definition stage at the seller-customer interface, where errors are multiplied at every subsequent stage of the process.
Abstract:
One of the fundamental problems with image processing of petrographic thin sections is that the appearance (colour/intensity) of a mineral grain will vary with the orientation of the crystal lattice to the preferred direction of the polarizing filters on a petrographic microscope. This makes it very difficult to determine grain boundaries, grain orientation and mineral species from a single captured image. To overcome this problem, the Rotating Polarizer Stage was used to replace the fixed polarizer and analyzer on a standard petrographic microscope. The Rotating Polarizer Stage rotates the polarizers while the thin section remains stationary, allowing for better data gathering possibilities. Instead of capturing a single image of a thin section, six composite data sets are created by rotating the polarizers through 90° (or 180° if quartz c-axis measurements need to be taken) in both plane- and cross-polarized light. The composite data sets can be viewed as separate images and consist of the average intensity image, the maximum intensity image, the minimum intensity image, the maximum position image, the minimum position image and the gradient image. The overall strategy used by the image processing system is to gather the composite data sets, determine the grain boundaries using the gradient image, classify the different mineral species present using the minimum and maximum intensity images and then perform measurements of grain shape and, where possible, partial crystallographic orientation using the maximum intensity and maximum position images.
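Five of the six composite data sets can be derived per pixel from a stack of images taken at successive polarizer positions. The following Python sketch assumes the images are plain 2D lists of intensities with one polarizer angle per image; the function name and data layout are hypothetical. The gradient image is left out because the abstract does not define how it is computed.

```python
def composites(stack, angles):
    """Build per-pixel composite images from a stack of same-sized images.

    stack  : list of 2D lists of intensities, one image per polarizer position
    angles : polarizer angle in degrees for each image in stack
    """
    if not stack or len(stack) != len(angles):
        raise ValueError("need at least one image and one angle per image")
    height, width = len(stack[0]), len(stack[0][0])
    names = ("average", "maximum", "minimum", "max_position", "min_position")
    out = {name: [[0] * width for _ in range(height)] for name in names}
    for y in range(height):
        for x in range(width):
            values = [image[y][x] for image in stack]
            # Index of the brightest and darkest polarizer position
            # (the first one wins when several positions tie).
            hi = max(range(len(values)), key=values.__getitem__)
            lo = min(range(len(values)), key=values.__getitem__)
            out["average"][y][x] = sum(values) / len(values)
            out["maximum"][y][x] = values[hi]
            out["minimum"][y][x] = values[lo]
            out["max_position"][y][x] = angles[hi]
            out["min_position"][y][x] = angles[lo]
    return out
```

Because the maximum and minimum are taken over all polarizer positions, these images no longer depend on how a grain happened to be oriented in any single capture, which is the property the classification step relies on.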
Abstract:
Analysis by reduction is a linguistically motivated method for checking the correctness of a sentence. It can be modelled by restarting automata. In this paper we propose a method for learning restarting automata which are strictly locally testable (SLT-R-automata). The method is based on the concept of identification in the limit from positive examples only. We also characterize the class of languages accepted by SLT-R-automata with respect to the Chomsky hierarchy.
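As background for learning from positive examples only, the sketch below infers an ordinary strictly k-testable language from a sample by collecting the prefixes, suffixes and length-k factors that occur in it. This is the classical construction for strictly locally testable languages, shown here as an assumption-laden illustration; it is not the SLT-R-automaton learning method proposed in the paper, and the function names are hypothetical.

```python
def learn_k_testable(samples, k):
    """Infer a strictly k-testable language from positive examples.

    Returns the allowed prefixes and suffixes of length k-1, the allowed
    factors of length k, and the sample words shorter than k.
    """
    if k < 2:
        raise ValueError("this sketch requires k >= 2")
    prefixes, suffixes, factors, short = set(), set(), set(), set()
    for word in samples:
        if len(word) < k:
            short.add(word)
            continue
        prefixes.add(word[:k - 1])
        suffixes.add(word[-(k - 1):])
        factors.update(word[i:i + k] for i in range(len(word) - k + 1))
    return prefixes, suffixes, factors, short, k


def accepts(model, word):
    """Check whether a word belongs to the inferred language."""
    prefixes, suffixes, factors, short, k = model
    if len(word) < k:
        return word in short
    return (word[:k - 1] in prefixes
            and word[-(k - 1):] in suffixes
            and all(word[i:i + k] in factors
                    for i in range(len(word) - k + 1)))
```

Each new positive example can only add entries to the four sets, so the hypothesis grows monotonically and stops changing once every allowed prefix, suffix and factor has been seen. That is the sense in which such a learner identifies its target in the limit.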
Abstract:
Conference proceedings - Talks from the Automation Symposium 2006
Abstract:
The share of decentralized embedded systems is growing in numerous application fields, such as automotive electronics and plant automation [ScZu03]. In addition, demands on the flexibility and functional scope of modern automation systems are rising. Agent-oriented methods are a suitable approach to meeting these demands [WGU03]. Agents make it possible to develop flexible, adaptable software systems that mirror in software the distribution of information, tasks, resources or decision processes found in the real problem. The desired flexibility of the system, in terms of structure or behaviour, can thus be designed deliberately. A drawback, however, is the indeterminism of the overall system's behaviour, which results from interactions that are hard to predict [Jen00]. By contrast, static software systems are highly deterministic but offer little flexibility with respect to changes in the system structure or in the course of the real process. With the growing complexity of systems, however, predictability is increasingly hard to guarantee even with a static design. Once all possible faults, failures and external influences (a dynamic environment) are taken into account, the number of possible plant states becomes so large that they can hardly be captured with reasonable effort and therefore cannot be handled. In this context, the DFG-funded project AVE [AVE05], carried out in cooperation with the Institut für Automatisierungs- und Softwaretechnik at the Universität Stuttgart, addresses the conflict of combining the flexibility and adaptability of agent-oriented software with the specific requirements of the real-time systems domain, such as timing and dependability requirements.
A detailed analysis of these requirements examined how adaptability and flexibility in principle affect real-time and dependability requirements and, conversely, how real-time and dependability requirements can restrict adaptability and flexibility. Building on these findings, methods and concepts are being developed for the design and implementation of agent systems on common automation hardware, in particular programmable logic controllers (PLCs). Within this framework, a concept for modelling safety in agent systems is presented that takes particular account of the modular character of agents. The core idea is to give the developer a framework that supports the creation of a safety concept with as few gaps as possible, while leaving enough freedom to set the effort spent on fault detection, fault diagnosis and fault handling strategies individually for each module, according to its requirements. Furthermore, particular emphasis was placed on ensuring that the representations and diagrams used come from the domain and provide a good template for later implementation on automation hardware.
Abstract:
Data mining means summarizing information from large amounts of raw data. It is one of the key technologies in many areas of the economy, science, administration and the internet. In this report we introduce an approach for utilizing evolutionary algorithms to breed fuzzy classifier systems. This approach was applied as part of a structured procedure by the students Achler, Göb and Voigtmann as a contribution to the 2006 Data-Mining-Cup contest, yielding encouragingly positive results.