819 resultados para Classification error rate
Resumo:
The understanding of the embryogenesis in living systems requires reliable quantitative analysis of the cell migration throughout all the stages of development. This is a major challenge of the "in-toto" reconstruction based on different modalities of "in-vivo" imaging techniques -spatio-temporal resolution and image artifacts and noise. Several methods for cell tracking are available, but expensive manual interaction -time and human resources- is always required to enforce coherence. Because of this limitation it is necessary to restrict the experiments or assume an uncontrolled error rate. Is it possible to obtain automated reliable measurements of migration? can we provide a seed for biologists to complete cell lineages efficiently? We propose a filtering technique that considers trajectories as spatio-temporal connected structures that prunes out those that might introduce noise and false positives by using multi-dimensional morphological operators.
Resumo:
We present two approaches to cluster dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and use them to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction of the word error rate of 15.17%, which helps to improve the performance of the understanding and the dialogue manager modules.
Resumo:
This paper presents an automatic strategy to decide how to pronounce a Capital Letter Sequence (CLS) in a Text to Speech system (TTS). If CLS is well known by the TTS, it can be expanded in several words. But when the CLS is unknown, the system has two alternatives: spelling it (abbreviation) or pronouncing it as a new word (acronym). In Spanish, there is a high relationship between letters and phonemes. Because of this, when a CLS is similar to other words in Spanish, there is a high tendency to pronounce it as a standard word. This paper proposes an automatic method for detecting acronyms. Additionaly, this paper analyses the discrimination capability of some features, and several strategies for combining them in order to obtain the best classifier. For the best classifier, the classification error is 8.45%. About the feature analysis, the best features have been the Letter Sequence Perplexity and the Average N-gram order.
Resumo:
En este proyecto realizaremos un estudio del efecto de las interferencias procedentes de las redes públicas y veremos cómo afectan el rendimiento de las comunicaciones GSM-R que están en la banda de frecuencias adyacente, por un lado, definiremos las características de las redes públicas y como afectan los niveles de potencia y los anchos de banda de redes de banda ancha, especialmente LTE que dispone de un ancho de banda adaptativo que puede llegar hasta 20 MHZ, y por otro lado definiremos las características y las exigencias de las comunicaciones GSM-R que es una red privada que se utiliza actualmente para comunicaciones ferroviales. Con el objetivo de determinar el origen y los motivos de estas interferencias vamos a explicar cómo se produzcan las emisiones no deseadas de las redes públicas que son fruto de la intermodulación que se produzca por las características no lineales de los amplificadores, entre las emisiones no deseadas se puede diferenciar entre el dominio de los espurios y el dominio de las emisiones fuera de banda, para determinar el nivel de las emisiones fuera de banda definiremos la relación de fugas del canal adyacente, ACLR, que determina la diferencia entre el pico de la señal deseada y el nivel de señal interferente en la banda de paso. Veremos cómo afectan estas emisiones no deseadas a las comunicaciones GSMR en el caso de interferencias procedentes de señales de banda estrecha, como es el caso de GSM, y como afectan en el caso de emisiones de banda ancha con los protocolos UMTS y LTE, también estudiaremos como varia el rendimiento de la comunicación GSM-R frente a señales LTE de diferentes anchos de banda. Para reducir el impacto de las interferencias sobre los receptores GSM-R, analizaremos el efecto de los filtros de entrada de los receptores GSM-R y veremos cómo varia la BER y la ACLR. Además, con el objetivo de evaluar el rendimiento del receptor GSM-R ante diferentes tipos de interferencias, simularemos dos escenarios donde la red GSM-R se verá afectada por las interferencias procedente de una estación base de red pública, en el primer escenario la distancia entre la BS y MS GSM-R será de 4.6 KM, mientras en el segundo escenario simularemos una situación típica cuando un tren está a una distancia corta (25 m) de la BS de red pública. Finalmente presentaremos los resultados en forma de graficas de BER y ACLR, y tablas indicando los diferentes niveles de interferencias y la diferencia entre la potencia a la que obtenemos un valor óptimo de BER, 10-3, sin interferencia y la potencia a la que obtenemos el mismo valor con interferencias. ABSTRACT In this project we will study the interference effect from public networks and how they affect the performance of GSM-R communications that are in the adjacent frequency band, furthermore, we will define the characteristics of public networks and will explain how the power levels and bandwidth broadband networks are affected as a result, especially LTE with adaptive bandwidth that can reach 20 MHZ. Lastly, we will define the characteristics and requirements of the GSM-R communications, a private network that is currently used for railways communications. In order to determine the origin and motives of these interferences, we will explain what causes unwanted emissions of public networks that occur as a result. The intermodulation, which is caused by the nonlinear characteristics of amplifiers. Unwanted emissions from the transmitter are divided into OOB (out-of-band) emission and spurious emissions. The OOB emissions are defined by an Adjacent Channel Leakage Ratio (ACLR) requirement. We'll analyze the effect of the OOB emission on the GSM-R communication in the case of interference from narrowband signals such as GSM, and how they affect emissions in the case of broadband such as UMTS and LTE; also we will study how performance varies with GSM-R versus LTE signals of different bandwidths. To reduce the impact of interference on the GSM-R receiver, we analyze the effect of input filters GSM-R receivers to see how it affects the BER (Bits Error Rate) and ACLR. To analyze the GSM-R receiver performance in this project, we will simulate two scenarios when the GSM-R will be affected by interference from a base station (BS). In the first case the distance between the public network BS and MS GSM-R is 4.6 KM, while the second case simulates a typical situation when a train is within a short distance, 25 m, of a public network BS. Finally, we will present the results as BER and ACLR graphs, and tables showing different levels of interference and the differences between the power to obtain an optimal value of BER, 10-3, without interference, and the power that gets the same value with interference.
Resumo:
El vertiginoso avance de la informática y las telecomunicaciones en las últimas décadas ha incidido invariablemente en la producción y la prestación de servicios, en la educación, en la industria, en la medicina, en las comunicaciones e inclusive en las relaciones interpersonales. No obstante estos avances, y a pesar de la creciente aportación del software al mundo actual, durante su desarrollo continuamente se incurre en el mismo tipo de problemas que provocan un retraso sistemático en los plazos de entrega, se exceda en presupuesto, se entregue con una alta tasa de errores y su utilidad sea inferior a la esperada. En gran medida, esta problemática es atribuible a defectos en los procesos utilizados para recoger, documentar, acordar y modificar los requisitos del sistema. Los requisitos son los cimientos sobre los cuáles se construye un producto software, y sin embargo, la incapacidad de gestionar sus cambios es una de las principales causas por las que un producto software se entrega fuera de tiempo, se exceda en coste y no cumpla con la calidad esperada por el cliente. El presente trabajo de investigación ha identificado la necesidad de contar con metodologías que ayuden a desplegar un proceso de Gestión de Requisitos en pequeños grupos y entornos de trabajo o en pequeñas y medianas empresas. Para efectos de esta tesis llamaremos Small-Settings a este tipo de organizaciones. El objetivo de este trabajo de tesis doctoral es desarrollar un metamodelo que permita, por un lado, la implementación y despliegue del proceso de Gestión de Requisitos de forma natural y a bajo coste y, por otro lado, el desarrollo de mecanismos para la mejora continua del mismo. Este metamodelo esta soportado por el desarrollo herramientas que permiten mantener una biblioteca de activos de proceso para la Gestión de Requisitos y a su vez contar con plantillas para implementar el proceso partiendo del uso de activos previamente definidos. El metamodelo contempla el desarrollo de prácticas y actividades para guiar, paso a paso, la implementación del proceso de Gestión de Requisitos para una Small-Setting utilizando un modelo de procesos como referencia y una biblioteca de activos de proceso como principal herramienta de apoyo. El mantener los activos de proceso bien organizados, indexados, y fácilmente asequibles, facilita la introducción de las mejores prácticas al interior de una organización. ABSTRACT The fast growth of computer science and telecommunication in recent decades has invariably affected the provision of products and services in education, industry, healthcare, communications and also interpersonal relationships. In spite of such progress and the active role of the software in the world, its development and production continually incurs in the same type of problems that cause systematic delivery delays, over budget, a high error rate and consequently its use is lower than expected. These problems are largely attributed to defects in the processes used to identify, document, organize, and track all system's requirements. It is generally accepted that requirements are the foundation upon which the software process is built, however, the inability to manage changes in requirements is one of the principal factors that contribute to delays on the software development process, which in turn, may cause customer dissatisfaction. The aim of the present research work has identified the need for appropriate methodologies to help on the requirement management process for those organizations that are categorised as small and medium size enterprises, small groups within large companies, or small projects. For the purposes of this work, these organizations are named Small-Settings. The main goal of this research work is to develop a metamodel to manage the requirement process using a Process Asset Library (PAL) and to provide predefined tools and actives to help on the implementation process. The metamodel includes the development of practices and activities to guide step by step the deployment of the requirement management process in Small-Settings. Keeping assets organized, indexed, and readily available are a main factor to the success of the organization process improvement effort and facilitate the introduction of best practices within the organization. The Process Asset Library (PAL) will become a repository of information used to keep and make available all process assets that are useful to those who are defining, implementing, and managing processes in the organization.
Resumo:
Optical communications receivers using wavelet signals processing is proposed in this paper for dense wavelength-division multiplexed (DWDM) systems and modal-division multiplexed (MDM) transmissions. The optical signal-to-noise ratio (OSNR) required to demodulate polarization-division multiplexed quadrature phase shift keying (PDM-QPSK) modulation format is alleviated with the wavelet denoising process. This procedure improves the bit error rate (BER) performance and increasing the transmission distance in DWDM systems. Additionally, the wavelet-based design relies on signal decomposition using time-limited basis functions allowing to reduce the computational cost in Digital-Signal-Processing (DSP) module. Attending to MDM systems, a new scheme of encoding data bits based on wavelets is presented to minimize the mode coupling in few-mode (FWF) and multimode fibers (MMF). The Shifted Prolate Wave Spheroidal (SPWS) functions are proposed to reduce the modal interference.
Resumo:
Differential Phase Shift Keying (DPSK) modulation format has been shown as a robust solution for next-generation optical transmission systems. One key device enabling such systems is the delay interferometer, converting the signal phase information into intensity modulation to be detected by the photodiodes. Usually, Mach-Zehnder interferometer (MZI) is used for demodulating DPSK signals. In this paper, we developed an MZI which is based on all-fiber Multimode Interference (MI) structure: a multimode fiber (MMF) located between two single-mode fibers (SMF) without any transition zones. The standard MZI is not very stable since the two beams go through two different paths before they recombine. In our design the two arms of the MZI are in the same fiber, which will make it less temperature-sensitive than the standard MZI. Performance of such MZI will be analyzed from transmission spectrum. Finally such all-fiber MI-based MZI (MI-MZI) is used to demodulate 10 Gbps DPSK signals. The demodulated signals are analyzed from eye diagram and bit error rate (BER).
Resumo:
This paper describes the application of language translation technologies for generating bus information in Spanish Sign Language (LSE: Lengua de Signos Española). In this work, two main systems have been developed: the first for translating text messages from information panels and the second for translating spoken Spanish into natural conversations at the information point of the bus company. Both systems are made up of a natural language translator (for converting a word sentence into a sequence of LSE signs), and a 3D avatar animation module (for playing back the signs). For the natural language translator, two technological approaches have been analyzed and integrated: an example-based strategy and a statistical translator. When translating spoken utterances, it is also necessary to incorporate a speech recognizer for decoding the spoken utterance into a word sequence, prior to the language translation module. This paper includes a detailed description of the field evaluation carried out in this domain. This evaluation has been carried out at the customer information office in Madrid involving both real bus company employees and deaf people. The evaluation includes objective measurements from the system and information from questionnaires. In the field evaluation, the whole translation presents an SER (Sign Error Rate) of less than 10% and a BLEU greater than 90%.
Resumo:
In this contribution a novel iterative bit- and power allocation (IBPA) approach has been developed when transmitting a given bit/s/Hz data rate over a correlated frequency non-selective (4× 4) Multiple-Input MultipleOutput (MIMO) channel. The iterative resources allocation algorithm developed in this investigation is aimed at the achievement of the minimum bit-error rate (BER) in a correlated MIMO communication system. In order to achieve this goal, the available bits are iteratively allocated in the MIMO active layers which present the minimum transmit power requirement per time slot.
Resumo:
Automated Teller Machines (ATMs) are sensitive self-service systems that require important investments in security and testing. ATM certifications are testing processes for machines that integrate software components from different vendors and are performed before their deployment for public use. This project was originated from the need of optimization of the certification process in an ATM manufacturing company. The process identifies compatibility problems between software components through testing. It is composed by a huge number of manual user tasks that makes the process very expensive and error-prone. Moreover, it is not possible to fully automate the process as it requires human intervention for manipulating ATM peripherals. This project presented important challenges for the development team. First, this is a critical process, as all the ATM operations rely on the software under test. Second, the context of use of ATMs applications is vastly different from ordinary software. Third, ATMs’ useful lifetime is beyond 15 years and both new and old models need to be supported. Fourth, the know-how for efficient testing depends on each specialist and it is not explicitly documented. Fifth, the huge number of tests and their importance implies the need for user efficiency and accuracy. All these factors led us conclude that besides the technical challenges, the usability of the intended software solution was critical for the project success. This business context is the motivation of this Master Thesis project. Our proposal focused in the development process applied. By combining user-centered design (UCD) with agile development we ensured both the high priority of usability and the early mitigation of software development risks caused by all the technology constraints. We performed 23 development iterations and finally we were able to provide a working solution on time according to users’ expectations. The evaluation of the project was carried out through usability tests, where 4 real users participated in different tests in the real context of use. The results were positive, according to different metrics: error rate, efficiency, effectiveness, and user satisfaction. We discuss the problems found, the benefits and the lessons learned in the process. Finally, we measured the expected project benefits by comparing the effort required by the current and the new process (once the new software tool is adopted). The savings corresponded to 40% less effort (man-hours) per certification. Future work includes additional evaluation of product usability in a real scenario (with customers) and the measuring of benefits in terms of quality improvement.
Resumo:
A methodology for developing an advanced communications system for the Deaf in a new domain is presented in this paper. This methodology is a user-centred design approach consisting of four main steps: requirement analysis, parallel corpus generation, technology adaptation to the new domain, and finally, system evaluation. During the requirement analysis, both the user and technical requirements are evaluated and defined. For generating the parallel corpus, it is necessary to collect Spanish sentences in the new domain and translate them into LSE (Lengua de Signos Española: Spanish Sign Language). LSE is represented by glosses and using video recordings. This corpus is used for training the two main modules of the advanced communications system to the new domain: the spoken Spanish into the LSE translation module and the Spanish generation from the LSE module. The main aspects to be generated are the vocabularies for both languages (Spanish words and signs), and the knowledge for translating in both directions. Finally, the field evaluation is carried out with deaf people using the advanced communications system to interact with hearing people in several scenarios. In this evaluation, the paper proposes several objective and subjective measurements for evaluating the performance. In this paper, the new considered domain is about dialogues in a hotel reception. Using this methodology, the system was developed in several months, obtaining very good performance: good translation rates (10% Sign Error Rate) with small processing times, allowing face-to-face dialogues.
Design and Simulation of Deep Nanometer SRAM Cells under Energy, Mismatch, and Radiation Constraints
Resumo:
La fiabilidad está pasando a ser el principal problema de los circuitos integrados según la tecnología desciende por debajo de los 22nm. Pequeñas imperfecciones en la fabricación de los dispositivos dan lugar ahora a importantes diferencias aleatorias en sus características eléctricas, que han de ser tenidas en cuenta durante la fase de diseño. Los nuevos procesos y materiales requeridos para la fabricación de dispositivos de dimensiones tan reducidas están dando lugar a diferentes efectos que resultan finalmente en un incremento del consumo estático, o una mayor vulnerabilidad frente a radiación. Las memorias SRAM son ya la parte más vulnerable de un sistema electrónico, no solo por representar más de la mitad del área de los SoCs y microprocesadores actuales, sino también porque las variaciones de proceso les afectan de forma crítica, donde el fallo de una única célula afecta a la memoria entera. Esta tesis aborda los diferentes retos que presenta el diseño de memorias SRAM en las tecnologías más pequeñas. En un escenario de aumento de la variabilidad, se consideran problemas como el consumo de energía, el diseño teniendo en cuenta efectos de la tecnología a bajo nivel o el endurecimiento frente a radiación. En primer lugar, dado el aumento de la variabilidad de los dispositivos pertenecientes a los nodos tecnológicos más pequeños, así como a la aparición de nuevas fuentes de variabilidad por la inclusión de nuevos dispositivos y la reducción de sus dimensiones, la precisión del modelado de dicha variabilidad es crucial. Se propone en la tesis extender el método de inyectores, que modela la variabilidad a nivel de circuito, abstrayendo sus causas físicas, añadiendo dos nuevas fuentes para modelar la pendiente sub-umbral y el DIBL, de creciente importancia en la tecnología FinFET. Los dos nuevos inyectores propuestos incrementan la exactitud de figuras de mérito a diferentes niveles de abstracción del diseño electrónico: a nivel de transistor, de puerta y de circuito. El error cuadrático medio al simular métricas de estabilidad y prestaciones de células SRAM se reduce un mínimo de 1,5 veces y hasta un máximo de 7,5 a la vez que la estimación de la probabilidad de fallo se mejora en varios ordenes de magnitud. El diseño para bajo consumo es una de las principales aplicaciones actuales dada la creciente importancia de los dispositivos móviles dependientes de baterías. Es igualmente necesario debido a las importantes densidades de potencia en los sistemas actuales, con el fin de reducir su disipación térmica y sus consecuencias en cuanto al envejecimiento. El método tradicional de reducir la tensión de alimentación para reducir el consumo es problemático en el caso de las memorias SRAM dado el creciente impacto de la variabilidad a bajas tensiones. Se propone el diseño de una célula que usa valores negativos en la bit-line para reducir los fallos de escritura según se reduce la tensión de alimentación principal. A pesar de usar una segunda fuente de alimentación para la tensión negativa en la bit-line, el diseño propuesto consigue reducir el consumo hasta en un 20 % comparado con una célula convencional. Una nueva métrica, el hold trip point se ha propuesto para prevenir nuevos tipos de fallo debidos al uso de tensiones negativas, así como un método alternativo para estimar la velocidad de lectura, reduciendo el número de simulaciones necesarias. Según continúa la reducción del tamaño de los dispositivos electrónicos, se incluyen nuevos mecanismos que permiten facilitar el proceso de fabricación, o alcanzar las prestaciones requeridas para cada nueva generación tecnológica. Se puede citar como ejemplo el estrés compresivo o extensivo aplicado a los fins en tecnologías FinFET, que altera la movilidad de los transistores fabricados a partir de dichos fins. Los efectos de estos mecanismos dependen mucho del layout, la posición de unos transistores afecta a los transistores colindantes y pudiendo ser el efecto diferente en diferentes tipos de transistores. Se propone el uso de una célula SRAM complementaria que utiliza dispositivos pMOS en los transistores de paso, así reduciendo la longitud de los fins de los transistores nMOS y alargando los de los pMOS, extendiéndolos a las células vecinas y hasta los límites de la matriz de células. Considerando los efectos del STI y estresores de SiGe, el diseño propuesto mejora los dos tipos de transistores, mejorando las prestaciones de la célula SRAM complementaria en más de un 10% para una misma probabilidad de fallo y un mismo consumo estático, sin que se requiera aumentar el área. Finalmente, la radiación ha sido un problema recurrente en la electrónica para aplicaciones espaciales, pero la reducción de las corrientes y tensiones de los dispositivos actuales los está volviendo vulnerables al ruido generado por radiación, incluso a nivel de suelo. Pese a que tecnologías como SOI o FinFET reducen la cantidad de energía colectada por el circuito durante el impacto de una partícula, las importantes variaciones de proceso en los nodos más pequeños va a afectar su inmunidad frente a la radiación. Se demuestra que los errores inducidos por radiación pueden aumentar hasta en un 40 % en el nodo de 7nm cuando se consideran las variaciones de proceso, comparado con el caso nominal. Este incremento es de una magnitud mayor que la mejora obtenida mediante el diseño de células de memoria específicamente endurecidas frente a radiación, sugiriendo que la reducción de la variabilidad representaría una mayor mejora. ABSTRACT Reliability is becoming the main concern on integrated circuit as the technology goes beyond 22nm. Small imperfections in the device manufacturing result now in important random differences of the devices at electrical level which must be dealt with during the design. New processes and materials, required to allow the fabrication of the extremely short devices, are making new effects appear resulting ultimately on increased static power consumption, or higher vulnerability to radiation SRAMs have become the most vulnerable part of electronic systems, not only they account for more than half of the chip area of nowadays SoCs and microprocessors, but they are critical as soon as different variation sources are regarded, with failures in a single cell making the whole memory fail. This thesis addresses the different challenges that SRAM design has in the smallest technologies. In a common scenario of increasing variability, issues like energy consumption, design aware of the technology and radiation hardening are considered. First, given the increasing magnitude of device variability in the smallest nodes, as well as new sources of variability appearing as a consequence of new devices and shortened lengths, an accurate modeling of the variability is crucial. We propose to extend the injectors method that models variability at circuit level, abstracting its physical sources, to better model sub-threshold slope and drain induced barrier lowering that are gaining importance in FinFET technology. The two new proposed injectors bring an increased accuracy of figures of merit at different abstraction levels of electronic design, at transistor, gate and circuit levels. The mean square error estimating performance and stability metrics of SRAM cells is reduced by at least 1.5 and up to 7.5 while the yield estimation is improved by orders of magnitude. Low power design is a major constraint given the high-growing market of mobile devices that run on battery. It is also relevant because of the increased power densities of nowadays systems, in order to reduce the thermal dissipation and its impact on aging. The traditional approach of reducing the voltage to lower the energy consumption if challenging in the case of SRAMs given the increased impact of process variations at low voltage supplies. We propose a cell design that makes use of negative bit-line write-assist to overcome write failures as the main supply voltage is lowered. Despite using a second power source for the negative bit-line, the design achieves an energy reduction up to 20% compared to a conventional cell. A new metric, the hold trip point has been introduced to deal with new sources of failures to cells using a negative bit-line voltage, as well as an alternative method to estimate cell speed, requiring less simulations. With the continuous reduction of device sizes, new mechanisms need to be included to ease the fabrication process and to meet the performance targets of the successive nodes. As example we can consider the compressive or tensile strains included in FinFET technology, that alter the mobility of the transistors made out of the concerned fins. The effects of these mechanisms are very dependent on the layout, with transistor being affected by their neighbors, and different types of transistors being affected in a different way. We propose to use complementary SRAM cells with pMOS pass-gates in order to reduce the fin length of nMOS devices and achieve long uncut fins for the pMOS devices when the cell is included in its corresponding array. Once Shallow Trench isolation and SiGe stressors are considered the proposed design improves both kinds of transistor, boosting the performance of complementary SRAM cells by more than 10% for a same failure probability and static power consumption, with no area overhead. While radiation has been a traditional concern in space electronics, the small currents and voltages used in the latest nodes are making them more vulnerable to radiation-induced transient noise, even at ground level. Even if SOI or FinFET technologies reduce the amount of energy transferred from the striking particle to the circuit, the important process variation that the smallest nodes will present will affect their radiation hardening capabilities. We demonstrate that process variations can increase the radiation-induced error rate by up to 40% in the 7nm node compared to the nominal case. This increase is higher than the improvement achieved by radiation-hardened cells suggesting that the reduction of process variations would bring a higher improvement.
Resumo:
Pairwise sequence comparison methods have been assessed using proteins whose relationships are known reliably from their structures and functions, as described in the scop database [Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia C. (1995) J. Mol. Biol. 247, 536–540]. The evaluation tested the programs blast [Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). J. Mol. Biol. 215, 403–410], wu-blast2 [Altschul, S. F. & Gish, W. (1996) Methods Enzymol. 266, 460–480], fasta [Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444–2448], and ssearch [Smith, T. F. & Waterman, M. S. (1981) J. Mol. Biol. 147, 195–197] and their scoring schemes. The error rate of all algorithms is greatly reduced by using statistical scores to evaluate matches rather than percentage identity or raw scores. The E-value statistical scores of ssearch and fasta are reliable: the number of false positives found in our tests agrees well with the scores reported. However, the P-values reported by blast and wu-blast2 exaggerate significance by orders of magnitude. ssearch, fasta ktup = 1, and wu-blast2 perform best, and they are capable of detecting almost all relationships between proteins whose sequence identities are >30%. For more distantly related proteins, they do much less well; only one-half of the relationships between proteins with 20–30% identity are found. Because many homologs have low sequence similarity, most distant relationships cannot be detected by any pairwise comparison method; however, those which are identified may be used with confidence.
Resumo:
This report documents the error rate in a commercially distributed subset of the IMAGE Consortium mouse cDNA clone collection. After isolation of plasmid DNA from 1189 bacterial stock cultures, only 62.2% were uncontaminated and contained cDNA inserts that had significant sequence identity to published data for the ordered clones. An agarose gel electrophoresis pre-screening strategy identified 361 stock cultures that appeared to contain two or more plasmid species. Isolation of individual colonies from these stocks demonstrated that 7.1% of the original 1189 stocks contained both a correct and an incorrect plasmid. 5.9% of the original 1189 stocks contained multiple, distinct, incorrect plasmids, indicating the likelihood of multiple contaminating events. While only 739 of the stocks purchased contained the desired cDNA clone, agarose gel pre-screening, colony isolation and similarity searching of dbEST allowed for the identification of an additional 420 clones that would have otherwise been discarded. Considering the high error rate in this subset of the IMAGE cDNA clone set, the use of sequence verified clones for cDNA microarray construction is warranted. When this is not possible, pre-screening non-sequence verified clones with agarose gel electrophoresis provides an inexpensive and efficient method to eliminate contaminated clones from the probe set.
Resumo:
The replication of double-stranded plasmids containing a single adduct was analyzed in vivo by means of a sequence heterology that marks the two DNA strands. The single adduct was located within the sequence heterology, making it possible to distinguish trans-lesion synthesis (TLS) events from damage avoidance events in which replication did not proceed through the lesion. When the SOS system of the host bacteria is not induced, the C8-guanine adduct formed by the carcinogen N-2-acetylaminofluorene (AAF) yields less than 1% of TLS events, showing that replication does not readily proceed through the lesion. In contrast, the deacetylated adduct N-(deoxyguanosin-8-yl)-2-aminofluorene yields approximately 70% of TLS events under both SOS-induced and uninduced conditions. These results for TLS in vivo are in good agreement with the observation that AAF blocks DNA replication in vitro, whereas aminofluorene does so only weakly. Induction of the SOS response causes an increase in TLS events through the AAF adduct (approximately 13%). The increase in TLS is accompanied by a proportional increase in the frequency of AAF-induced frameshift mutations. However, the polymerase frameshift error rate per TLS event was essentially constant throughout the SOS response. In an SOS-induced delta umuD/C strain, both US events and mutagenesis are totally abolished even though there is no decrease in plasmid survival. Error-free replication evidently proceeds efficiently by means of the damage avoidance pathway. We conclude that SOS mutagenesis results from increased TLS rather than from an increased frameshift error rate of the polymerase.