951 results for sequence based alignments


Relevance:

30.00%

Abstract:

In many application domains, data can be naturally represented as graphs. When analytical solutions to a given problem are infeasible, machine learning techniques can be a viable alternative. Classical machine learning techniques are defined for data represented in vectorial form. Recently, some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is challenging because it is difficult to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. This thesis makes three main contributions. The first is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computational cost and classification performance on real-world datasets. The second contribution makes the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose applying kernel methods to this domain, obtaining state-of-the-art results.
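To make the notion of a kernel function as a similarity measure between two graphs concrete, the following minimal Python sketch computes a node-label histogram kernel. It is not the DAG-based kernel family proposed in the thesis, which the abstract does not specify; the graph representation (a dict mapping a node id to a label and a neighbour list) and the function names are assumptions chosen only for illustration.

```python
from collections import Counter


def label_histogram(graph):
    """Count how often each node label occurs in a graph.

    `graph` is a hypothetical representation: {node_id: (label, [neighbour ids])}.
    """
    return Counter(label for label, _ in graph.values())


def histogram_kernel(g1, g2):
    """Dot product of two label histograms: a valid but very crude graph kernel."""
    h1, h2 = label_histogram(g1), label_histogram(g2)
    # Only labels present in both graphs contribute to the dot product.
    return sum(h1[label] * h2[label] for label in h1.keys() & h2.keys())


# Two toy labelled graphs (illustrative data, not taken from the thesis).
g_a = {0: ("C", [1]), 1: ("O", [0, 2]), 2: ("C", [1])}
g_b = {0: ("C", [1]), 1: ("N", [0])}
similarity = histogram_kernel(g_a, g_b)
```

This kernel ignores the edges entirely, which keeps it cheap but makes it inexpressive; that is one end of the tradeoff between computational complexity and expressiveness that the abstract describes.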

Relevance:

30.00%

Abstract:

Within this thesis, new approaches to peptide-polymer conjugates and peptide-based hybrid nanomaterials are investigated. In the first part, a polymer-peptide-polymer triblock is synthesized by a typical peptide coupling reaction, both in solution and on solid phase. The peptide sequence is chosen so that it is cleaved by an enzyme preparation of trypsin. End-functionalized polystyrene is used as a model hydrophobic polymer and coupled to the peptide sequence. The results show successful coupling reactions with both methods, while the solid-phase method produced a better-defined product. Suspensions consisting of peptide-polymer conjugate particles are prepared in water by ultrasonication. In contact with the enzyme, the peptide constituting the conjugate particles is cleaved. This demonstrates the enzymatic cleavage, in heterophase, of an enzyme-cleavable sequence bound to hydrophobic polymers, and is of great interest for the encapsulation and delivery of hydrophobic molecules.
A second approach is the preparation of peptide-based hybrid nanocapsules. This is achieved by interfacial polyaddition in inverse miniemulsion with the peptide sequence functionalized with additional amino acids. A method suited to using a peptide sequence for interfacial polyaddition was developed. It is shown that the polarity of the dispersed phase influences the structures obtained, which range from particle-like to a polymeric shell with a liquid core.
The peptide sequence is equipped with a FRET pair (more exactly, an internally quenched fluorescent system), which allows real-time monitoring of the enzymatic cleavage of the recognition site. This system shows the successful cleavage of the peptide-based nanocapsules when the trypsin preparation is added to the suspensions. A water-soluble fluorescent polymer is efficiently entrapped, and its possible use as a marker for the capsules is highlighted.
Furthermore, a small water-soluble fluorescent dye (SR-101) is successfully encapsulated, and the encapsulation efficiency is studied as a function of the functionality of the peptide and the amount of comonomer equivalent (toluene diisocyanate). The dye is encapsulated at such a high concentration that self-quenching occurs. Thus, the release of the encapsulated dye triggered by the enzymatic cleavage of the peptide results in a fluorescence recovery of the dye. The fluorescence recovery of the FRET pair in the peptide and that of the encapsulated dye correlate well.
Finally, nanocapsules based on a hepsin-cleavable peptide sequence are prepared. Hepsin is an enzyme that is highly upregulated in prostate cancer cells. The cleavage of the nanocapsules is investigated with healthy and “cancerous” (hepsin-expressing) cell cultures. The degradation, followed via fluorescence recovery of the FRET system, is faster for the suspensions introduced into the hepsin-expressing cell cultures.
In summary, this work tackles the domain of responsive nanomaterials for drug delivery from a new perspective. It presents the adaptation of the miniemulsion process to hybrid peptide-based materials and their successful use in preparing specific enzyme-responsive nanoparticles with hydrophilic payload release properties.

Relevance:

30.00%

Abstract:

The recent advent of next-generation sequencing technologies has revolutionized genome analysis. This innovation yields deeper information at lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). The final aim of the statistical analysis is hypothesis testing, and the Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for "over dispersion". However, the estimation of the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled first-type errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Afterwards, three consistent statistical tests are developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to the common procedures, since it comes closest to the nominal value for the first-type error while keeping high power. The method is finally illustrated on prostate cancer RNA-seq data.
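The overdispersion mentioned above can be made concrete with a small Python sketch. It assumes the common mean-dispersion parameterization of the Negative Binomial, in which a count with mean mu and dispersion phi has variance mu + phi * mu^2; the abstract does not state which parameterization the thesis uses. The per-gene method-of-moments estimator shown here is exactly the kind of plug-in estimate the abstract criticizes, not the proposed mixture model, and the function names and counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, phi):
    """Variance of a Negative Binomial count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate for one gene.

    Solves sample_variance = mean + phi * mean**2 for phi and clips the
    result at 0, which corresponds to the Poisson case (no overdispersion).
    """
    m = mean(counts)
    v = variance(counts)  # sample variance with the n - 1 denominator
    if m == 0:
        return 0.0
    return max(0.0, (v - m) / m ** 2)


# Hypothetical read counts for one gene across five samples.
counts = [2, 10, 4, 20, 14]
phi_hat = moment_dispersion(counts)
```

With only a handful of samples per gene, such an estimate is very noisy, which is the motivation the abstract gives for sharing information across genes with similar variability.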

Relevance:

30.00%

Abstract:

Staphylococcus carnosus is a facultatively anaerobic bacterium capable of aerobic respiration, anaerobic nitrate respiration, and fermentative metabolism. Expression of nitrate metabolism is regulated by the three-component system NreABC.
Under anaerobic conditions, the sensor histidine kinase NreB carries a [Fe4S4]2+ cluster in its PAS domain. After autophosphorylation, the active (anaerobic) [Fe4S4]2+-NreB transfers the phosphoryl group to the response regulator NreC, which then activates expression of the nitrate respiration genes. Nitrate induces these genes with the help of the NreA protein. This work shows that NreA is a GAF domain protein and a novel nitrate receptor.
The nature of NreA as a GAF domain protein was confirmed by comparing its crystal structure with those of other GAF domains. GAF domains are widespread and typically bind small molecules. Nitrate proved to be the physiological ligand of NreA and is bound within a defined binding pocket. NreA presumably binds in dimeric form to dimeric NreB and thereby inhibits phosphorylation of the sensor histidine kinase NreB. The interaction of NreA with NreB was demonstrated in vivo by BACTH measurements and both in vivo and in vitro by cross-linking experiments. According to these results, nitrate reduces the interaction of NreA with NreB.
Sequence comparisons of NreA with homologs identified conserved amino acids. Using site-directed mutagenesis, 25 NreA variants were generated and tested for their nitrate-dependent behavior in narG-lip reporter gene studies. Based on their phenotype, they were classified as wild type, NreA mutants, and NreABC mutants. The nitrate binding pocket was affected in six cases. The phenotypes of the mutations in the periphery can be explained by effects on the presumed conformational change or on the interaction with NreB.
Mutations of conserved, surface-exposed residues more frequently resulted in NreA/ON variants. Regions on the protein surface were identified that could be important for NreA/NreA or NreA/NreB interactions.
The investigations showed that NreA interacts with NreB, thereby forming an NreA/NreB sensor complex for the joint sensing of nitrate and oxygen.

Relevance:

30.00%

Abstract:

In the course of this research, samples from five agricultural biogas plants fed with renewable raw materials (NawaRo) were examined by molecular methods for their community of methanogenic archaea. Using amplified rDNA restriction analysis (ARDRA) screening of libraries based on 16S rRNA gene fragments, representatives of the genera Methanoculleus (Mcu.), Methanobacterium (Mb.), Methanosarcina (Msc.), and Methanosaeta (Mst.) were detected in two exemplary biogas plants. Denaturing gradient gel electrophoresis (DGGE) showed that these microorganisms were also present in the remaining plants. In addition, Methanospirillum hungatei was detected in three plants. After genus-specific isolation strategies had been worked out, a total of ten representatives of the genus Methanobacterium (isolates Mb1 to Mb10) and one representative each of the genera Methanoculleus (isolate Mcu(1)), Methanosarcina (isolate NieKK), and Methanosaeta (isolate Mst1.3) were isolated from the biogas plant samples. In silico comparison of the partial 16S rRNA gene sequences identified them as relatives of Mb. formicicum MFT, Mcu. bourgensis MS2T, Msc. mazei S-6T, and Mst. concilii FE, with a sequence identity > 97%. Further molecular analyses by DGGE and ARDRA assigned the isolates to the reference strains. For the genus Methanobacterium, however, slight deviations were found. These were confirmed in comparative analyses of the genomic fingerprint by specifically amplified polymorphic DNA PCR (SAPD-PCR), which in this work was applied successfully to archaeal organisms for the first time. Here the isolates showed two main amplification patterns that differed from the fingerprints of the reference strains examined.
Because of the large number of isolates and their significant occurrence in qPCR analyses and clone libraries, further work on these deviations focused on phylogenetic analyses of the genus Methanobacterium and on the development of detection systems. Determining a large part of the 23S rRNA gene sequences of the isolates and of selected type strains enabled phylogenetic analyses that complement the 16S rRNA analyses already performed. In these, the isolates were each placed in a separate cluster apart from most reference strains of the genus Methanobacterium. Analogous to the pattern formation in the SAPD analysis, the isolates separated into two branches and, in agreement with the in silico sequence comparisons, showed the closest relationship to Mb. formicicum MFT. The suitability of SAPD-PCR for deriving specific primer pairs was demonstrated for methanogenic archaea for the first time. Two primer pairs specific for the Methanobacterium isolates Mb1 to Mb10 and for the type strain Mb. formicicum MFT were derived and applied successfully to pure cultures and fermenter samples in a direct PCR assay. Including the sequenced 23S rRNA gene fragments, oligonucleotide probes were designed for use in fluorescence in situ hybridization experiments. In practical tests, these probes were specific for all tested representatives of the genus Methanobacterium as well as for Methanosphaera stadtmanae MCB-3T and Methanobrevibacter smithii PST.
Thus, over the course of this work, the dominant methanogenic archaea in NawaRo biogas plants were detected in multiphase experiments, quantified, and narrowed down to only a few genera. Representatives of the four dominant genera were isolated, and detection systems for species of the genus Methanobacterium were established.

Relevance:

30.00%

Abstract:

Accurate placement of lesions is crucial for the effectiveness and safety of a retinal laser photocoagulation treatment. Computer assistance provides the capability for improvements to treatment accuracy and execution time. The idea is to use video frames acquired from a scanning digital ophthalmoscope (SDO) to compensate for retinal motion during laser treatment. This paper presents a method for the multimodal registration of the initial frame from an SDO retinal video sequence to a retinal composite image, which may contain a treatment plan. The retinal registration procedure comprises the following steps: 1) detection of vessel centerline points and identification of the optic disc; 2) prealignment of the video frame and the composite image based on optic disc parameters; and 3) iterative matching of the detected vessel centerline points in expanding matching regions. This registration algorithm was designed for the initialization of a real-time registration procedure that registers the subsequent video frames to the composite image. The algorithm demonstrated its capability to register various pairs of SDO video frames and composite images acquired from patients.

Relevance:

30.00%

Abstract:

To evaluate, in a prospective pilot study, the feasibility of identifying pathogens in urine using real-time polymerase chain reaction (PCR), and to compare the results with the conventional urine culture-based procedures.

Relevance:

30.00%

Abstract:

Diffusely infiltrating gliomas (WHO grade II-IV) are the most common primary brain tumours in adults. These tumours are not amenable to cure by surgery alone, so suitable biomarkers for adjuvant modalities are required to guide therapeutic decision-making. Epigenetic silencing of the O(6)-methylguanine-DNA methyltransferase (MGMT) gene by promoter methylation has been associated with longer survival of patients with high-grade gliomas who receive alkylating chemotherapy; and molecular testing for the methylation status of the MGMT promoter sequence is regarded as among the most relevant of such markers. We have developed a primer extension-based assay adapted to formalin-fixed paraffin-embedded tissues that enables quantitative assessment of the methylation status of the MGMT promoter. The assay is very sensitive, highly reproducible, and provides valid test results in nearly 100% of cases. Our results indicate that oligodendrogliomas, empirically known to have a relatively favourable prognosis, are also the most homogeneous entities in terms of MGMT promoter methylation. Conversely, astrocytomas, which are more prone to spontaneous progression to higher grade malignancy, are significantly more heterogeneous. In addition, we show that the degree of promoter methylation correlates with the prevalence of loss of heterozygosity on chromosome arm 1p in the oligodendroglioma group, but not the astrocytoma group. Our results may have potentially important implications for clinical molecular diagnosis.

Relevance:

30.00%

Abstract:

Over the last decade, the end-state comfort effect (e.g., Rosenbaum et al., 2006) has received a considerable amount of attention. However, some of the underlying mechanisms are still to be investigated, amongst others, how sequential planning affects end-state comfort and how this effect develops over learning. In a two-step sequencing task, e.g., postural comfort can be planned on the intermediate position (next state) or on the actual end position (final state). It might be hypothesized that, in initial acquisition, next state’s comfort is crucial for action planning but that, in the course of learning, final state’s comfort is taken more and more into account. To test this hypothesis, a variant of Rosenbaum’s vertical stick transportation task was used. Participants (N = 16, right-handed) received extensive practice on a two-step transportation task (10,000 trials over 12 sessions). From the initial position on the middle stair of a staircase in front of the participant, the stick had to be transported either 20 cm upwards and then 40 cm downwards or 20 cm downwards and then 40 cm upwards (N = 8 per subgroup). Participants were supposed to produce fluid movements without changing grasp. In the pre- and posttest, participants were tested on both two-step sequencing tasks as well as on 20 cm single-step upwards and downwards movements (10 trials per condition). For the test trials, grasp height was calculated kinematographically. In the pretest, large end/next/final-state comfort effects for single-step transportation tasks and large next-state comfort effects for sequenced tasks were found. However, no change in grasp height from pre- to posttest could be revealed. Results show that, in vertical stick transportation sequences, the final state is not taken into account when planning grasp height. Instead, action planning seems to be solely based on aspects of the next action goal that is to be reached.

Relevance:

30.00%

Abstract:

Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. 
Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies.
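The divergence values reported above are percentages of differing sites between aligned sequences. The abstract does not state how they were computed, so the Python sketch below assumes the simplest option, an uncorrected pairwise distance (p-distance) over already aligned sequences, with gapped sites skipped; the function name and the toy sequences are hypothetical.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of differing sites between two aligned sequences.

    Both sequences must have the same (aligned) length. Sites where either
    sequence has a gap character '-' are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differing += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments (not real rpoB or MLSA data).
divergence = percent_divergence("ACGTACGTAC", "ACGTTCGTAC")
```

Applied to every pair of strains drawn from two clusters, a function of this kind would give the range and mean of between-cluster divergence; model-corrected distances would differ slightly from these raw percentages.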

Relevance:

30.00%

Abstract:

Molecular genetic testing is commonly used to confirm clinical diagnoses of inherited urea cycle disorders (UCDs); however, conventional mutation screenings encompassing only the coding regions of genes may not detect disease-causing mutations occurring in regulatory elements and introns. Microarray-based target enrichment and next-generation sequencing now allow more-comprehensive genetic screening. We applied this approach to UCDs and combined it with the use of DNA bar codes for more cost-effective, parallel analyses of multiple samples.

Relevance:

30.00%

Abstract:

Background: In protein sequence classification, identification of the sequence motifs or n-grams that can precisely discriminate between classes is a more interesting scientific question than the classification itself. A number of classification methods aim at accurate classification but fail to explain which sequence features actually contribute to the accuracy. We hypothesize that short subsequences (n-grams) can be used to explore the sequence landscape and to identify class-specific motifs that discriminate between classes during classification. Discriminative n-grams are short peptide sequences that are highly frequent in one class but are either minimally present or absent in other classes. In this study, we present a new substitution-based scoring function for identifying discriminative n-grams that are highly specific to a class. Results: We present a scoring function based on discriminative n-grams that can effectively discriminate between classes. The scoring function first harvests the entire set of 4- to 8-grams from the protein sequences of the different classes in the dataset. Similar n-grams of the same size are combined to form new n-grams, where similarity is defined by positive amino acid substitution scores in the BLOSUM62 matrix. Substitution resulted in a large increase in the number of discriminatory n-grams harvested. Because the dataset is unbalanced, the frequencies of the n-grams are normalized using a dampening factor, which gives more weight to the n-grams that appear in fewer classes and vice versa. After the n-grams are normalized, the scoring function identifies discriminative 4- to 8-grams for each class that are frequent enough to be above a selection threshold. By mapping these discriminative n-grams back to the protein sequences, we obtained contiguous n-grams that represent short class-specific motifs in protein sequences.
Our method fared well compared to an existing motif-finding method known as Wordspy. We have validated our enriched set of class-specific motifs against the functionally important motifs obtained from the NLSdb, Prosite and ELM databases. We demonstrate that this method is very generic and thus can be widely applied to detect class-specific motifs in many protein sequence classification tasks. Conclusion: The proposed scoring function and methodology are able to identify class-specific motifs using discriminative n-grams derived from the protein sequences. The use of amino acid substitution scores for similarity detection and of the dampening factor to normalize the unbalanced datasets has a significant effect on the performance of the scoring function. Our multipronged validation tests demonstrate that this method can detect class-specific motifs from a wide variety of protein sequence classes, with a potential application to detecting proteome-specific motifs of different organisms.
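The harvesting and normalization steps described in the Results can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the formula for the dampening factor, so an IDF-like logarithmic weight is used as a stand-in, and the BLOSUM62-based merging of similar n-grams is omitted. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def harvest_ngrams(sequence, n_min=4, n_max=8):
    """Yield every contiguous n-gram of length n_min..n_max from a sequence."""
    for n in range(n_min, n_max + 1):
        for i in range(len(sequence) - n + 1):
            yield sequence[i:i + n]


def dampened_scores(classes, n_min=4, n_max=8):
    """Return {class_name: {ngram: score}} for a dict of class -> sequences.

    score = relative frequency of the n-gram within the class
            * log(1 + number_of_classes / number_of_classes_containing_it)
    The log term is an assumed stand-in for the dampening factor: it gives
    more weight to n-grams that appear in fewer classes.
    """
    counts = {
        name: Counter(g for seq in seqs for g in harvest_ngrams(seq, n_min, n_max))
        for name, seqs in classes.items()
    }
    # For each n-gram, the number of classes in which it occurs at least once.
    spread = Counter(g for class_counts in counts.values() for g in class_counts)
    scores = {}
    for name, class_counts in counts.items():
        total = sum(class_counts.values())
        scores[name] = {
            g: (k / total) * math.log(1 + len(classes) / spread[g])
            for g, k in class_counts.items()
        }
    return scores


# Toy classes with made-up peptide sequences.
toy = {"nuclear": ["PKKKRKV"], "other": ["MKKKRAA", "GGGGSGG"]}
toy_scores = dampened_scores(toy, n_min=4, n_max=4)
```

In this toy run, the 4-gram KRKV occurs only in the "nuclear" class and therefore scores higher there than KKKR, which occurs in both classes. Selecting the n-grams whose score exceeds a threshold would then correspond to the selection step described in the abstract.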

Relevance:

30.00%

Abstract:

To use a new approach which provides, based on the widely used three-dimensional double-echo steady-state (DESS) sequence, in addition to the morphological information, the generation of biochemical T2 maps in one hybrid sequence.

Relevance:

30.00%

Abstract:

Three fundamental types of suppressor additives for copper electroplating could be identified by means of potential transient measurements. These suppressor additives differ in their synergistic and antagonistic interplay with anions that are chemisorbed on the metallic copper surface during electrodeposition. In addition, these suppressor chemistries reveal different barrier properties with respect to cupric ions and plating additives (Cl, SPS). While the type-I suppressor selectively forms efficient barriers for copper inter-diffusion on chloride-terminated electrode surfaces, we identified a type-II suppressor that interacts non-selectively with any kind of anion chemisorbed on copper (chloride, sulfate, sulfonate). Type-I suppressors are vital for the superconformal copper growth mode in Damascene processing and show an antagonistic interaction with SPS (Bis-Sodium-Sulfopropyl-Disulfide), which involves the deactivation of this suppressor chemistry. This suppressor deactivation is rationalized in terms of compositional changes in the layer of the chemisorbed anions due to the competition of chloride and MPS (Mercaptopropane Sulfonic Acid) for adsorption sites on the metallic copper surface. MPS is the product of the dissociative SPS adsorption within the preexisting chloride matrix on the copper surface. The non-selectivity in the adsorption behavior of the type-II suppressor is rationalized in terms of anion/cation pairing effects of the poly-cationic suppressor and the anion-modified copper substrate. Atomic-scale insights into the competitive Cl/MPS adsorption are gained from in situ STM (Scanning Tunneling Microscopy) using single crystalline copper surfaces as model substrates. Type-III suppressors are a third class of suppressors. In the case of type-I and type-II suppressor chemistries, the resulting steady-state deposition conditions are completely independent of the particular succession of additive adsorption.
In contrast, a strong dependence of the suppressing capabilities on the sequence of additive adsorption ("first comes, first serves" principle) is observed for the type-III suppressor. This behavior is explained by a suppressor barrier that impedes not only the copper inter-diffusion but also the transport of other additives (e.g. SPS) to the copper surface. (C) 2011 Elsevier Ltd. All rights reserved.

Relevance:

30.00%

Abstract:

With a virus such as Human Immunodeficiency Virus (HIV) that has infected millions of people worldwide, and with many unaware that they are infected, it becomes vital to understand how the virus works and how it functions at the molecular level. Because there currently is no vaccine and no way to eradicate the virus from an infected person, any information about how the virus interacts with its host greatly increases the chances of understanding how HIV works and brings scientists one step closer to being able to combat such a destructive virus. Thousands of HIV viruses have been sequenced and are available in many online databases for public use. Attributes that are linked to each sequence include the viral load within the host and how sick the patient is currently. Being able to predict the stage of infection for someone is a valuable resource, as it could potentially aid in treatment options and proper medication use. Our approach of analyzing region-specific amino acid composition for select genes has been able to predict patient disease state up to an accuracy of 85.4%. Moreover, we output a set of classification rules based on the sequence that may prove useful for diagnosing the expected clinical outcome of the infected patient.
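A region-specific amino acid composition of the kind mentioned above could be computed roughly as in this Python sketch. The abstract does not say which genes, regions, or classifier were used, so the region boundaries, the function name, and the toy sequence are all hypothetical; the sketch only shows how a protein region can be turned into a 20-dimensional feature vector that a rule-based classifier could consume.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def region_composition(protein, start, end):
    """Fraction of each standard amino acid in protein[start:end].

    Uses Python slice indexing (0-based, end exclusive). Characters that are
    not one of the 20 standard amino acids (gaps, X, stop symbols) are ignored.
    """
    region = protein[start:end].upper()
    counts = Counter(aa for aa in region if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("region contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence and an arbitrary region, for illustration only.
features = region_composition("MKKLAAAG", 1, 5)
```

Each entry of `features` is the fraction of one amino acid in the chosen region, so vectors from several regions or genes can simply be concatenated before being passed to a classifier.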