997 results for k-mer


Relevance: 60.00%

Abstract:

This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
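As a rough illustration of what a "sequence k-mer representation" can look like, the sketch below turns a DNA sequence into a fixed-length vector of normalised k-mer frequencies, which is the kind of input a classifier such as an SVM could consume. The function names `kmer_counts` and `kmer_vector`, the choice of k, and the normalisation are assumptions made for this example; the abstract does not state how the authors built their features.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping k-mer (substring of length k) in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k):
    """Return k-mer frequencies in a fixed lexicographic order over A, C, G, T.

    k-mers containing other symbols (for example N) are ignored.
    This is an illustrative feature layout, not the paper's.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


if __name__ == "__main__":
    # Toy read, not real sequencing data.
    print(kmer_counts("ACGTACGT", 2))
    print(kmer_vector("ACGTACGT", 2))
```

Each sequencing project would then be summarised by one such vector (4^k entries), so projects of different sizes become comparable points in the same feature space.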

Relevance: 60.00%

Abstract:

Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.

Relevance: 60.00%

Abstract:

Background: Nicotiana benthamiana is an allo-tetraploid plant, which can be challenging for de novo transcriptome assemblies due to homeologous and duplicated gene copies. Transcripts generated from such genes can be distinct yet highly similar in sequence, with markedly differing expression levels. This can lead to unassembled, partially assembled or mis-assembled contigs. Due to the different properties of de novo assemblers, no single assembler with any one given parameter space can re-assemble all possible transcripts from a transcriptome. Results: In an effort to maximise the diversity and completeness of de novo assembled transcripts, we utilised four de novo transcriptome assemblers, TransAbyss, Trinity, SOAPdenovo-Trans, and Oases, using a range of k-mer sizes and different input RNA-seq read counts. We complemented the parameter space biologically by using RNA from 10 plant tissues. We then combined the output of all assemblies into a large super-set of sequences. Using a method from the EvidentialGene pipeline, the combined assembly was reduced from 9.9 million de novo assembled transcripts to about 235,000, of which about 50,000 were classified as primary. Metrics such as average bit-scores, feature response curves and the ability to distinguish paralogous or homeologous transcripts indicated that the EvidentialGene processed assembly was of high quality. Of 35 RNA silencing gene transcripts, 34 were identified as assembled to full length, whereas in a previous assembly using only one assembler, 9 of these were partially assembled. Conclusions: To achieve a high quality transcriptome, it is advantageous to implement and combine the output from as many different de novo assemblers as possible. We have in essence taken the ‘best’ output from each assembler while minimising sequence redundancy. We have also shown that simultaneous assessment of a variety of metrics, not just focused on contig length, is necessary to gauge the quality of assemblies.

Relevance: 60.00%

Abstract:

Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used not only to detect novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered; instead, several distinct contigs derived from a single entity are obtained, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with relatively large genomes that were divergent from previously known viruses of the viral families Rhabdoviridae and Coronaviridae. Depending on specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near complete viral genomes.

Relevance: 60.00%

Abstract:

The goal of this project was to develop de novo assembly methods for assembling small genomes, mainly bacterial, from next-generation sequencing data. Eventually, these methods could be applied to assembling the genome of StachEndo, an unknown alphaproteobacterium that is an endosymbiont of the amoeba Stachyamoeba lipophora. Following several preliminary analyses, it was observed that using Illumina reads with de Bruijn graph assemblers produced the best results. These experiments also showed that contigs produced from different k-mer sizes were complementary for genome finishing. The addition of long overlapping read pairs proved essential for completely finishing large genomic repeats. These methods made it possible to assemble the StachEndo genome (1.7 Mb). Annotation of this genome showed that StachEndo has several characteristics that are unusual among endosymbionts. StachEndo is a species of interest for the study of endosymbiotic development.

Relevance: 60.00%

Abstract:

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in half the time and with a quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
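To make the idea of a "sorted k-mer list" concrete, here is a minimal single-machine sketch: each sequence is turned into a list of (k-mer, position) pairs sorted by k-mer, and two such lists are merged to find k-mers shared by both sequences, which could then serve as candidate alignment seeds. The function names and the exact-match strategy are assumptions for illustration only; this is not the progressiveMauve or Blue Gene/P implementation described in the abstract.

```python
def sorted_kmer_list(sequence, k):
    """Return (k-mer, position) pairs sorted lexicographically by k-mer."""
    return sorted((sequence[i:i + k], i) for i in range(len(sequence) - k + 1))


def shared_kmers(list_a, list_b):
    """Merge two sorted k-mer lists and return (k-mer, pos_a, pos_b) matches."""
    matches = []
    i = j = 0
    while i < len(list_a) and j < len(list_b):
        kmer_a, kmer_b = list_a[i][0], list_b[j][0]
        if kmer_a < kmer_b:
            i += 1
        elif kmer_a > kmer_b:
            j += 1
        else:
            # Find the full run of this k-mer in each list.
            i_end = i
            while i_end < len(list_a) and list_a[i_end][0] == kmer_a:
                i_end += 1
            j_end = j
            while j_end < len(list_b) and list_b[j_end][0] == kmer_a:
                j_end += 1
            # Report every pairing of positions for the shared k-mer.
            for _, pos_a in list_a[i:i_end]:
                for _, pos_b in list_b[j:j_end]:
                    matches.append((kmer_a, pos_a, pos_b))
            i, j = i_end, j_end
    return matches


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    a = sorted_kmer_list("ACGTAC", 3)
    b = sorted_kmer_list("TACG", 3)
    print(shared_kmers(a, b))
```

Because both lists are sorted, the merge visits each entry once, so finding shared k-mers is linear in the list lengths apart from the cost of reporting repeated k-mers.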

Relevance: 60.00%

Abstract:

al-Maḥāmid al-thamānīyah / Aḥmad ibn Idrīs. İstanbul, [1307] 1891 -- Tercüme-yi Hidayet üt-talibîn / Abdullah Dihlavi. İstanbul, 1299 [1881 or 1882] -- Şerh-i Kaside-yi şümu-ı lâmi fi beyan-ı etvar-ı sabi / Ahmet Müsellem -- Âsâr-ı aşk / Ömer Ruşeni. İstanbul, 1314/1316 [1899] -- Nüzhet ül-ihvan / Müsellem Efendi. İstanbul, 1310 [1892 or 1893]

Relevance: 60.00%

Abstract:

To provide biological insights into transcriptional regulation, several groups have recently presented models relating promoter DNA-bound transcription factors (TFs) to the downstream gene’s mean transcript level or transcript production rate over time. However, transcript production is dynamic in response to changes in TF concentrations over time. Also, TFs are not the only factors binding to promoters; other DNA binding factors (DBFs) bind as well, especially nucleosomes, resulting in competition between DBFs for binding at the same genomic location. Additionally, not only TFs but also other elements regulate transcription. Within the core promoter, various regulatory elements influence RNAPII recruitment, PIC formation, RNAPII searching for the TSS, and RNAPII initiating transcription. Moreover, it is proposed that downstream from the TSS, nucleosomes resist RNAPII elongation.

Here, we provide a machine learning framework to predict transcript production rates from DNA sequences. We applied this framework to the yeast S. cerevisiae in two scenarios: a) to predict the dynamic transcript production rate during the cell cycle for native promoters; b) to predict the mean transcript production rate over time for synthetic promoters. As far as we know, our framework is the first successful attempt at a model that can predict dynamic transcript production rates from DNA sequences alone: on the cell cycle data set, we obtained a Pearson correlation coefficient Cp = 0.751 and a coefficient of determination r² = 0.564 on the test set for predicting the dynamic transcript production rate over time. Also, for the DREAM6 Gene Promoter Expression Prediction challenge, our fitted model outperformed all participating teams, the best of all teams, and a model combining the best team’s k-mer based sequence features with another paper’s biologically mechanistic features, in terms of all scoring metrics.

Moreover, our framework shows its capability of identifying generalizable features by interpreting the highly predictive models, thereby providing support for associated hypothesized mechanisms of transcriptional regulation. With the learned sparse linear models, we obtained results supporting the following biological insights: a) TFs govern the probability of RNAPII recruitment and initiation, possibly through interactions with PIC components and transcription cofactors; b) the core promoter amplifies transcript production, probably by influencing PIC formation, RNAPII recruitment, DNA melting, RNAPII searching for and selecting the TSS, releasing RNAPII from general transcription factors, and thereby initiation; c) there is strong transcriptional synergy between TFs and core promoter elements; d) the regulatory elements within the core promoter region go beyond the TATA box and the nucleosome-free region, suggesting the existence of still unidentified TAF-dependent and cofactor-dependent core promoter elements in the yeast S. cerevisiae; e) nucleosome occupancy is helpful for representing the regulatory roles of the +1 and -1 nucleosomes in transcription.
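The two test-set metrics quoted in this abstract can be related with a short calculation. The sketch below computes a Pearson correlation coefficient from scratch and checks that the reported values are numerically consistent with the coefficient of determination being the square of the Pearson coefficient (0.751² ≈ 0.564). That the authors computed it this way is an assumption; the abstract only reports the two numbers, and the helper name `pearson` is introduced here for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Perfectly linear toy data (not from the paper) gives a correlation of 1.
    print(pearson([1, 2, 3], [2, 4, 6]))
    # Squaring the reported correlation reproduces the reported r² to 3 decimals.
    print(round(0.751 ** 2, 3))  # 0.564
```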

Relevance: 30.00%

Abstract:

Holothurians belonging to the species Holothuria scabra have been collected by diver-fishermen and processed for export. There is an active fishery for these animals in Palk Bay and the Gulf of Mannar, off the north-western coast of Sri Lanka. The absence of a standardized method of grading has led to unwarranted variation in sale prices and sometimes to a loss of foreign exchange for Sri Lanka. In this paper an attempt is made to present a method to grade processed beche-de-mer by using the length-weight relationship.

Relevance: 30.00%

Abstract:

An improved method for processing sea cucumber (beche-de-mer) is described. Details of a machine named de-scummer are presented. The traditional method for preparation is discussed, and the new method outlined; this involves burying boiled sea cucumber in clean sand contained in cement pits for periods of 6-8 h. The animals are then transferred to the de-scummer for mechanical treatment and are boiled again. After this they are dried.

Relevance: 30.00%

Abstract:

The beche-de-mer industry in India is a 100% export-oriented industry confined to the south-east coast, in Palk Bay and the Gulf of Mannar in Tamil Nadu. The chemical quality of 180 trade samples of beche-de-mer of four sizes, collected from the beche-de-mer curing centres of Ramanathapuram district, was studied. Moisture ranged from 6.2 to 24.4% and sand content from 0.11 to 20.42% for all grades. Mean sand content was 3.47% for grade 1, 4.50% for grade 2, 3.68% for grade 3 and 6.87% for grade 4. Sodium chloride was almost constant for all grades at 5.7%. TVBN values ranged from 10 to 78.4 mg%. In addition, 44 laboratory samples of different grades were prepared following trade practice and examined for chemical quality. Mean moisture was 13.4% for grade 1, 12.44% for grade 2, 12.62% for grade 3 and 12.08% for grade 4, and mean sand content was 0.70% for grade 1, 0.90% for grade 2, 1.16% for grade 3 and 2.15% for grade 4. The percentage of shrinkage of the animals ranged from 56% to 60% for dried beche-de-mer of 7.5 cm size and above.

Relevance: 30.00%

Abstract:

A growing number of second-language pupils in Swedish schools places new demands on teaching and on teachers. Difficulties arise when pupils must learn a new language while also acquiring knowledge in that language. In mathematics, word problems pose an extra challenge, since a single missed word can undo the pupil's entire understanding. Mathematical language has its own particular form and differs from everyday language. Word problems may therefore contain words that are new and unfamiliar to pupils who have not come far in their Swedish language development. Many mathematical concepts also have several meanings, for example volym (volume), rymmer (holds), axel (axis), udda (odd) and so on; when pupils do not understand the meaning of the text, they do not know what to do and therefore cannot carry out the calculation the problem asks for.

The main aim of this study is therefore to gain knowledge about the difficulties that second-language pupils encounter in mathematics teaching. These are difficulties that mathematics teachers need to be aware of in order to take them into account and to support second-language pupils towards good learning, in the best possible way, with the resources available in schools today.

The thesis presents the results of a language test with mathematical word problems. These results are then complemented with interviews with teachers, pupils and the pupils' study tutors.

Mathematics teaching needs to be designed with greater knowledge of language and language learning. To form an idea of a new word's meaning, the word must be used in different contexts; the word must be needed. More oral exercises and more practice in describing thoughts and concepts would benefit not only second-language pupils but all pupils in the class.

Mathematics teachers may feel that they have too little time and too few resources to help second-language pupils achieve good mathematical knowledge. There is also a lack of knowledge about what is difficult in mathematics and in the text of mathematical word problems. To some extent there is also a lack of knowledge about what it means to learn a language and what it is like to study in a language one has not mastered. The pupils I spoke with are satisfied with their mathematics teaching, their teacher and their study tutor. Two of them are enthusiastic and highly motivated and have good results. The other two find mathematics difficult and boring and, despite their efforts, struggle with comprehension. Teachers would need more knowledge about mathematics teaching for second-language pupils. They also need to take greater overall responsibility for each individual pupil's learning. As things stand today, responsibility for the second-language pupil is often placed on someone else, such as the study tutor, the teacher of Swedish as a second language, the mother-tongue teacher or a special needs teacher. Naturally all these teachers should cooperate to create a good study situation for the pupil, but someone must have the main responsibility. The most natural arrangement is for that responsibility to rest with the regular teacher, who knows the goals and syllabi for the subject.

Relevance: 30.00%

Abstract:

This thesis examines the explosive rise of Korean popular music from the late 1990s onwards. It tries to answer what K-pop and its idols have more specifically represented and offered fans to consume, and to link this to its growing popularity in a historical perspective. The thesis further tries to answer why K-pop has grown markedly since the late 2000s in peripheral regions such as Sweden. It analyses a selection of idols' representation in various media, as well as sources dealing with how Swedish fans relate to K-pop. The results show that idols have, increasingly over time, appeared in many different media formats where they have built up a performative image in relation to the prevailing discourse, as a form of acting. At the same time, the idols that followed have copied the original concept to the extreme, which also includes plastic surgery. This has resulted in an increase in often indistinguishable but visually beautiful idols. For many fans, a large part of the attraction lies here: the possibility of constructing their own delimited conception of idols to fulfil different needs. For Swedish fans, this also appears to be part of the reason for K-pop's growing popularity, as the boundary between the real and the fictional is blurred. It is also an opportunity to embrace another figurative universe where exotic idols, in their all-encompassing role, can be freely consumed within the K-pop discourse.

Relevance: 30.00%

Abstract:

Langmuir monolayers and Langmuir-Blodgett (LB) films have been produced from polyaniline (PANI) and a biphosphinic ruthenium complex, referred to as Rupy. Strong, repulsive interaction between the two components led to a nonlinear change in area per molecule and surface potential with the concentration of Rupy in the mixed film. Molecular interaction was also evident in the spectroscopic and electrochemical properties of the Y-type LB transferred films. The Raman spectra of mixed PANI-Rupy films indicated that the degree of oxidation of PANI increased linearly with the concentration of Rupy. With PANI being increasingly oxidized by the presence of Rupy, the electroactivity of the mixed films decreased with the amount of Rupy, becoming undetectable when the mixed LB film was 50% mol in Rupy. The presence of Rupy caused the electrical properties of the mixed LB films to be less sensitive to environmental changes. The electrical capacitance of a mixed film changed only by 15% when the sample was taken from vacuum to air, whereas the change was 215% for a pure PANI LB film.

Relevance: 30.00%

Abstract:

cis-Diamminedichloroplatinum(II) (cisplatin) is a widely used anticancer drug that binds to and crosslinks DNA. The major DNA adduct of the drug results from coordination of two adjacent guanine bases to platinum to form the intrastrand crosslink cis-[Pt(NH3)2[d(GpG)-N7(1), -N7(2)]] (cis-Pt-GG). In the present study, spectroscopic and calorimetric techniques were employed to characterize the influence of this crosslink on the conformation, thermal stability, and energetics of a site-specifically platinated 20-mer DNA duplex. CD spectroscopic and thermal denaturation data revealed that the crosslink alters the structure of the host duplex, consistent with a shift from a B-like to an A-like conformation; lowers its thermal stability by approximately 9 degrees C; and reduces its thermodynamic stability by 6.3 kcal/mol at 25 degrees C, most of which is enthalpic in origin; but it does not alter the two-state melting behavior exhibited by the parent, unmodified duplex, despite the significant crosslink-induced changes noted above. The energetic consequences of the cis-Pt-GG crosslink are discussed in relation to the structural perturbations it induces in DNA and to how these crosslink-induced perturbations might modulate protein binding.