924 resultados para Computational methods


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of a bigger and yet unsolved problem of protein structure prediction. Improvement in the prediction of disulfide bonding states of cysteine residues will help in putting a constraint in the three dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of 3D structure of proteins. Results of this work will have direct implications in site-directed mutational studies of proteins, proteins engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN) as a machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately we have reached to a remarkable accuracy of 94% on cysteine basis for both Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on protein basis for Eukaryotic dataset and Prokaryotic dataset respectively. These accuracies are best so far ever reached by any existing prediction methods, and thus our prediction method has outperformed all the previously developed approaches and therefore is more reliable. Most interesting part of this thesis work is the differences in the prediction performances of Eukaryotes and Prokaryotes at the basic level of input coding when ‘profile’ information was given as input to our prediction method. And one of the reasons for this we discover is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, where as Prokaryotic bonded examples lack it.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Here I will focus on three main topics that best address and include the projects I have been working in during my three year PhD period that I have spent in different research laboratories addressing both computationally and practically important problems all related to modern molecular genomics. The first topic is the use of livestock species (pigs) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of Single Nucleotide Polymorphisms. I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related with obesity the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity-genes. 565 of these SNPs were analyzed on an Illumina chip to test the involvement in obesity on a population composed by more than 500 pigs. Results will be discussed. All the computational analysis and experiments were done in collaboration with the Biocomputing group and Dr.Luca Fontanesi, respectively, under the direction of prof. Rita Casadio at the Bologna University, Italy. The second topic concerns developing a methodology, based on Factor Analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini, at the “CAS-MPG Partner Institute for Computational Biology” of Shangai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences jointly) The third topic concerns the development of a new method to overcome the variety of PCR technologies routinely adopted to characterize unknown flanking DNA regions of a viral integration locus of the human genome after clinical gene therapy. This new method is entirely based on next generation sequencing and it reduces the time required to detect insertion sites, decreasing the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany) supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore I add as an Appendix the description of a R package for gene network reconstruction that I helped to develop for scientific usage (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The cellular basis of cardiac pacemaking activity, and specifically the quantitative contributions of particular mechanisms, is still debated. Reliable computational models of sinoatrial nodal (SAN) cells may provide mechanistic insights, but competing models are built from different data sets and with different underlying assumptions. To understand quantitative differences between alternative models, we performed thorough parameter sensitivity analyses of the SAN models of Maltsev & Lakatta (2009) and Severi et al (2012). Model parameters were randomized to generate a population of cell models with different properties, simulations performed with each set of random parameters generated 14 quantitative outputs that characterized cellular activity, and regression methods were used to analyze the population behavior. Clear differences between the two models were observed at every step of the analysis. Specifically: (1) SR Ca2+ pump activity had a greater effect on SAN cell cycle length (CL) in the Maltsev model; (2) conversely, parameters describing the funny current (If) had a greater effect on CL in the Severi model; (3) changes in rapid delayed rectifier conductance (GKr) had opposite effects on action potential amplitude in the two models; (4) within the population, a greater percentage of model cells failed to exhibit action potentials in the Maltsev model (27%) compared with the Severi model (7%), implying greater robustness in the latter; (5) confirming this initial impression, bifurcation analyses indicated that smaller relative changes in GKr or Na+-K+ pump activity led to failed action potentials in the Maltsev model. Overall, the results suggest experimental tests that can distinguish between models and alternative hypotheses, and the analysis offers strategies for developing anti-arrhythmic pharmaceuticals by predicting their effect on the pacemaking activity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis studies molecular dynamics simulations on two levels of resolution: the detailed level of atomistic simulations, where the motion of explicit atoms in a many-particle system is considered, and the coarse-grained level, where the motion of superatoms composed of up to 10 atoms is modeled. While atomistic models are capable of describing material specific effects on small scales, the time and length scales they can cover are limited due to their computational costs. Polymer systems are typically characterized by effects on a broad range of length and time scales. Therefore it is often impossible to atomistically simulate processes, which determine macroscopic properties in polymer systems. Coarse-grained (CG) simulations extend the range of accessible time and length scales by three to four orders of magnitude. However, no standardized coarse-graining procedure has been established yet. Following the ideas of structure-based coarse-graining, a coarse-grained model for polystyrene is presented. Structure-based methods parameterize CG models to reproduce static properties of atomistic melts such as radial distribution functions between superatoms or other probability distributions for coarse-grained degrees of freedom. Two enhancements of the coarse-graining methodology are suggested. Correlations between local degrees of freedom are implicitly taken into account by additional potentials acting between neighboring superatoms in the polymer chain. This improves the reproduction of local chain conformations and allows the study of different tacticities of polystyrene. It also gives better control of the chain stiffness, which agrees perfectly with the atomistic model, and leads to a reproduction of experimental results for overall chain dimensions, such as the characteristic ratio, for all different tacticities. The second new aspect is the computationally cheap development of nonbonded CG potentials based on the sampling of pairs of oligomers in vacuum. Static properties of polymer melts are obtained as predictions of the CG model in contrast to other structure-based CG models, which are iteratively refined to reproduce reference melt structures. The dynamics of simulations at the two levels of resolution are compared. The time scales of dynamical processes in atomistic and coarse-grained simulations can be connected by a time scaling factor, which depends on several specific system properties as molecular weight, density, temperature, and other components in mixtures. In this thesis the influence of molecular weight in systems of oligomers and the situation in two-component mixtures is studied. For a system of small additives in a melt of long polymer chains the temperature dependence of the additive diffusion is predicted and compared to experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Theories and numerical modeling are fundamental tools for understanding, optimizing and designing present and future laser-plasma accelerators (LPAs). Laser evolution and plasma wave excitation in a LPA driven by a weakly relativistically intense, short-pulse laser propagating in a preformed parabolic plasma channel, is studied analytically in 3D including the effects of pulse steepening and energy depletion. At higher laser intensities, the process of electron self-injection in the nonlinear bubble wake regime is studied by means of fully self-consistent Particle-in-Cell simulations. Considering a non-evolving laser driver propagating with a prescribed velocity, the geometrical properties of the non-evolving bubble wake are studied. For a range of parameters of interest for laser plasma acceleration, The dependence of the threshold for self-injection in the non-evolving wake on laser intensity and wake velocity is characterized. Due to the nonlinear and complex nature of the Physics involved, computationally challenging numerical simulations are required to model laser-plasma accelerators operating at relativistic laser intensities. The numerical and computational optimizations, that combined in the codes INF&RNO and INF&RNO/quasi-static give the possibility to accurately model multi-GeV laser wakefield acceleration stages with present supercomputing architectures, are discussed. The PIC code jasmine, capable of efficiently running laser-plasma simulations on Graphics Processing Units (GPUs) clusters, is presented. GPUs deliver exceptional performance to PIC codes, but the core algorithms had to be redesigned for satisfying the constraints imposed by the intrinsic parallelism of the architecture. The simulation campaigns, run with the code jasmine for modeling the recent LPA experiments with the INFN-FLAME and CNR-ILIL laser systems, are also presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Die Flachwassergleichungen (SWE) sind ein hyperbolisches System von Bilanzgleichungen, die adäquate Approximationen an groß-skalige Strömungen der Ozeane, Flüsse und der Atmosphäre liefern. Dabei werden Masse und Impuls erhalten. Wir unterscheiden zwei charakteristische Geschwindigkeiten: die Advektionsgeschwindigkeit, d.h. die Geschwindigkeit des Massentransports, und die Geschwindigkeit von Schwerewellen, d.h. die Geschwindigkeit der Oberflächenwellen, die Energie und Impuls tragen. Die Froude-Zahl ist eine Kennzahl und ist durch das Verhältnis der Referenzadvektionsgeschwindigkeit zu der Referenzgeschwindigkeit der Schwerewellen gegeben. Für die oben genannten Anwendungen ist sie typischerweise sehr klein, z.B. 0.01. Zeit-explizite Finite-Volume-Verfahren werden am öftersten zur numerischen Berechnung hyperbolischer Bilanzgleichungen benutzt. Daher muss die CFL-Stabilitätsbedingung eingehalten werden und das Zeitinkrement ist ungefähr proportional zu der Froude-Zahl. Deswegen entsteht bei kleinen Froude-Zahlen, etwa kleiner als 0.2, ein hoher Rechenaufwand. Ferner sind die numerischen Lösungen dissipativ. Es ist allgemein bekannt, dass die Lösungen der SWE gegen die Lösungen der Seegleichungen/ Froude-Zahl Null SWE für Froude-Zahl gegen Null konvergieren, falls adäquate Bedingungen erfüllt sind. In diesem Grenzwertprozess ändern die Gleichungen ihren Typ von hyperbolisch zu hyperbolisch.-elliptisch. Ferner kann bei kleinen Froude-Zahlen die Konvergenzordnung sinken oder das numerische Verfahren zusammenbrechen. Insbesondere wurde bei zeit-expliziten Verfahren falsches asymptotisches Verhalten (bzgl. der Froude-Zahl) beobachtet, das diese Effekte verursachen könnte.Ozeanographische und atmosphärische Strömungen sind typischerweise kleine Störungen eines unterliegenden Equilibriumzustandes. Wir möchten, dass numerische Verfahren für Bilanzgleichungen gewisse Equilibriumzustände exakt erhalten, sonst können künstliche Strömungen vom Verfahren erzeugt werden. Daher ist die Quelltermapproximation essentiell. Numerische Verfahren die Equilibriumzustände erhalten heißen ausbalanciert.rnrnIn der vorliegenden Arbeit spalten wir die SWE in einen steifen, linearen und einen nicht-steifen Teil, um die starke Einschränkung der Zeitschritte durch die CFL-Bedingung zu umgehen. Der steife Teil wird implizit und der nicht-steife explizit approximiert. Dazu verwenden wir IMEX (implicit-explicit) Runge-Kutta und IMEX Mehrschritt-Zeitdiskretisierungen. Die Raumdiskretisierung erfolgt mittels der Finite-Volumen-Methode. Der steife Teil wird mit Hilfe von finiter Differenzen oder au eine acht mehrdimensional Art und Weise approximniert. Zur mehrdimensionalen Approximation verwenden wir approximative Evolutionsoperatoren, die alle unendlich viele Informationsausbreitungsrichtungen berücksichtigen. Die expliziten Terme werden mit gewöhnlichen numerischen Flüssen approximiert. Daher erhalten wir eine Stabilitätsbedingung analog zu einer rein advektiven Strömung, d.h. das Zeitinkrement vergrößert um den Faktor Kehrwert der Froude-Zahl. Die in dieser Arbeit hergeleiteten Verfahren sind asymptotisch erhaltend und ausbalanciert. Die asymptotischer Erhaltung stellt sicher, dass numerische Lösung das "korrekte" asymptotische Verhalten bezüglich kleiner Froude-Zahlen besitzt. Wir präsentieren Verfahren erster und zweiter Ordnung. Numerische Resultate bestätigen die Konvergenzordnung, so wie Stabilität, Ausbalanciertheit und die asymptotische Erhaltung. Insbesondere beobachten wir bei machen Verfahren, dass die Konvergenzordnung fast unabhängig von der Froude-Zahl ist.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We have investigated the thermodynamics of sulfuric acid dimer hydration using ab initio quantum mechanical methods. For (H2SO4)2(H2O)n where n = 0−6, we employed high-level ab initio calculations to locate the most stable minima for each cluster size. The results presented herein yield a detailed understanding of the first deprotonation of sulfuric acid as a function of temperature for a system consisting of two sulfuric acid molecules and up to six waters. At 0 K, a cluster of two sulfuric acid molecules and one water remains undissociated. Addition of a second water begins the deprotonation of the first sulfuric acid leading to the di-ionic species (the bisulfate anion HSO4−, the hydronium cation H3O+, an undissociated sulfuric acid molecule, and a water). Upon the addition of a third water molecule, the second sulfuric acid molecule begins to dissociate. For the (H2SO4)2(H2O)3 cluster, the di-ionic cluster is a few kcal mol−1 more stable than the neutral cluster, which is just slightly more stable than the tetra-ionic cluster (two bisulfate anions, two hydronium cations, and one water). With four water molecules, the tetra-ionic cluster, (HSO4−)2(H3O+)2(H2O)2, becomes as favorable as the di-ionic cluster H2SO4(HSO4−)(H3O+)(H2O)3 at 0 K. Increasing the temperature favors the undissociated clusters, and at room temperature we predict that the di-ionic species is slightly more favorable than the neutral cluster once three waters have been added to the cluster. The tetra-ionic species competes with the di-ionic species once five waters have been added to the cluster. The thermodynamics of stepwise hydration of sulfuric acid dimer is similar to that of the monomer; it is favorable up to n = 4−5 at 298 K. A much more thermodynamically favorable pathway forming sulfuric acid dimer hydrates is through the combination of sulfuric acid monomer hydrates, but the low concentration of sulfuric acid relative to water vapor at ambient conditions limits that process.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Dynamic core-shell nanoparticles have received increasing attention in recent years. This paper presents a detailed study of Au-Hg nanoalloys, whose composing elements show a large difference in cohesive energy. A simple method to prepare Au@Hg particles with precise control over the composition up to 15 atom% mercury is introduced, based on reacting a citrate stabilized gold sol with elemental mercury. Transmission electron microscopy shows an increase of particle size with increasing mercury content and, together with X-ray powder diffraction, points towards the presence of a core-shell structure with a gold core surrounded by an Au-Hg solid solution layer. The amalgamation process is described by pseudo-zero-order reaction kinetics, which indicates slow dissolution of mercury in water as the rate determining step, followed by fast scavenging by nanoparticles in solution. Once adsorbed at the surface, slow diffusion of Hg into the particle lattice occurs, to a depth of ca. 3 nm, independent of Hg concentration. Discrete dipole approximation calculations relate the UV-vis spectra to the microscopic details of the nanoalloy structure. Segregation energies and metal distribution in the nanoalloys were modeled by density functional theory calculations. The results indicate slow metal interdiffusion at the nanoscale, which has important implications for synthetic methods aimed at core-shell particles.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We, then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first order or rank-one extrapolation methods for two reasons, (1) simplicity, and (2) computational efficiency. In particular, we study two first order extrapolation methods, the reduced rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm but they have a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of the SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm. They only require the basic EM step for their implementation and do not require any other auxiliary quantities such as the complete data log likelihood, and its gradient or hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow and each EM step is computationally demanding.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This article gives an overview over the methods used in the low--level analysis of gene expression data generated using DNA microarrays. This type of experiment allows to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of microarray scanned images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single gene and multiple gene levels, and the assessment of significance of these findings, considering the fact that there is a lot of noise and thus random features in the data. For all of these components, access to a flexible and efficient statistical computing environment is an essential aspect.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.