The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.


The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.


Background: Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost-effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method. Results: We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end. Conclusions: De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.


We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
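
The idea of aligning sequences of factor labels rather than nucleotides can be illustrated with a minimal sketch. This is not the algorithm described above, which adapts a restriction-map alignment method; it is a plain Needleman-Wunsch global alignment over label sequences that ignores site positions and prediction scores. The function name `align_labels` and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for illustration only.

```python
def align_labels(a, b, match=2, mismatch=-1, gap=-1):
    """Globally align two lists of TF labels (hypothetical scoring values).

    Returns the optimal score and the list of matched label pairs as
    (index_in_a, index_in_b, label).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, collecting matched labels.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if a[i - 1] == b[j - 1]:
                pairs.append((i - 1, j - 1, a[i - 1]))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Two made-up promoter maps that share three predicted sites.
human = ["SP1", "TBP", "CEBP", "NF1"]
mouse = ["SP1", "CEBP", "NF1"]
best_score, matched = align_labels(human, mouse)
```

In this toy case the shared labels are recovered in order even though the two maps differ in length, which is the property that a nucleotide-level comparison of poorly conserved promoters would not provide.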


Time scale parametric spike train distances like the Victor and the van Rossum distances are often applied to study the neural code based on neural stimuli discrimination. Different neural coding hypotheses, such as rate or coincidence coding, can be assessed by combining a time scale parametric spike train distance with a classifier in order to obtain the optimal discrimination performance. The time scale for which the responses to different stimuli are distinguished best is assumed to be the discriminative precision of the neural code. The relevance of temporal coding is evaluated by comparing the optimal discrimination performance with the one achieved when assuming a rate code. We here characterize the measures quantifying the discrimination performance, the discriminative precision, and the relevance of temporal coding. Furthermore, we evaluate the information these quantities provide about the neural code. We show that the discriminative precision is too unspecific to be interpreted in terms of the time scales relevant for encoding. Accordingly, the time scale parametric nature of the distances is mainly an advantage because it allows maximizing the discrimination performance across a whole set of measures with different sensitivities determined by the time scale parameter, but not due to the possibility to examine the temporal properties of the neural code.
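
To make the role of the time scale parameter concrete, the following is a minimal sketch of the Victor spike train distance in its standard dynamic-programming form, where inserting or deleting a spike costs 1 and shifting a spike by dt costs q*|dt|. The function name `victor_purpura` and the example spike times are assumptions for illustration; the classifier step described above is not included.

```python
def victor_purpura(t1, t2, q):
    """Victor spike train distance between two sorted lists of spike times.

    q is the cost per unit time of shifting a spike; 1/q acts as the
    time scale. q = 0 compares spike counts only (rate code), while a
    large q counts only near-coincident spikes as equal.
    """
    n, m = len(t1), len(t2)
    # d[i][j] = distance between the first i spikes of t1 and first j of t2
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)          # delete i spikes
    for j in range(1, m + 1):
        d[0][j] = float(j)          # insert j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            shift = q * abs(t1[i - 1] - t2[j - 1])
            d[i][j] = min(d[i - 1][j] + 1.0,         # delete spike i
                          d[i][j - 1] + 1.0,         # insert spike j
                          d[i - 1][j - 1] + shift)   # shift spike i onto j
    return d[n][m]


# Made-up spike times in seconds.
rate_like = victor_purpura([0.1, 0.5], [0.1, 0.5, 0.9], q=0.0)
small_shift = victor_purpura([0.1], [0.2], q=1.0)
fine_scale = victor_purpura([0.1], [0.2], q=100.0)
```

Scanning q over a range and keeping the value that gives the best classification is the procedure whose interpretation the abstract questions: the sketch shows that q changes the sensitivity of the measure, not that the optimal q must equal an encoding time scale.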


In this paper we present a description of the role of definitional verbal patterns for the extraction of semantic relations. Several studies show that semantic relations can be extracted from analytic definitions contained in machine-readable dictionaries (MRDs). In addition, definitions found in specialised texts are a good starting point to search for different types of definitions where other semantic relations occur. The extraction of definitional knowledge from specialised corpora represents another interesting approach for the extraction of semantic relations. Here, we present a descriptive analysis of definitional verbal patterns in Spanish and the first steps towards the development of a system for the automatic extraction of definitional knowledge.


The central project carried out at the Institut Universitari de Lingüística Aplicada (IULA) of the Universitat Pompeu Fabra is the corpus of specialised languages. Within this project, which covers five specialised domains (law, economics, computer science, environment and medicine) and five languages (Catalan, Spanish, French, English and German), two tagsets have been developed, one for Catalan and one for Spanish. These tagsets are intended to facilitate the linguistic processing stage of the corpus. This paper discusses some theoretical aspects of tagset construction and presents the two tagsets developed at IULA.


In this paper, we analyse the main ontologies in order to give a general overview of one of the most widely used tools for structuring knowledge. First, we present a broad description of the five ontologies most widespread among the scientific community devoted to information management. Next, we briefly review some of the management tools used to create and update ontologies. Finally, we present some conclusions regarding the selection of an ontology and a management system for use within the current projects of the IULATERM group.


As the size of a corpus grows, so does the number of concordances obtained when querying a form. A very large number of concordances, in the hundreds or thousands, hinders the systematic work of the lexicographer. This article proposes an automatic system that groups concordances by lexical similarity (that is, by the lexical items they share), so that the concordances are presented in groups, each associated with a single representative of all those considered lexically similar, thereby reducing the effective cardinality of the corpus data. The system was designed to take advantage of a distributed architecture: each part of the system (stemming, stop-word identification, computation of similarity between concordances, final sorting of the data, etc.) was developed as a separate module that can be hosted on servers, since the system's computational requirements would make it very slow to run on a personal computer.
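
The grouping step can be sketched as follows. This is a minimal single-process illustration, not the distributed system described above: the similarity measure (Jaccard overlap of content words), the greedy grouping strategy, the stop-word list, the threshold, and all function names are assumptions, and stemming is omitted.

```python
# Hypothetical stop-word list; a real system would use a full one.
STOP_WORDS = {"the", "a", "of", "in", "on", "to", "is"}


def content_words(line):
    """Lower-case a concordance line and drop stop words."""
    return {w for w in line.lower().split() if w not in STOP_WORDS}


def jaccard(a, b):
    """Share of lexical items two word sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_concordances(lines, threshold=0.5):
    """Greedily attach each line to the first group whose representative
    (the group's first line) is similar enough; otherwise start a group."""
    groups = []  # list of (representative_word_set, member_lines)
    for line in lines:
        words = content_words(line)
        for rep_words, members in groups:
            if jaccard(words, rep_words) >= threshold:
                members.append(line)
                break
        else:
            groups.append((words, [line]))
    return [members for _, members in groups]


# Made-up concordances for the form "bank".
lines = ["the bank of the river",
         "a bank in the river",
         "the bank is closed"]
grouped = group_concordances(lines)
```

Here three concordances collapse into two groups, so the lexicographer would inspect two representatives instead of three lines; the comparison of every line against every representative is the part whose cost motivates hosting the modules on servers.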


The spatial, spectral, and temporal resolutions of remote sensing images, acquired over a reasonably sized image extent, result in imagery that can be processed to represent land cover over large areas with an amount of spatial detail that is very attractive for monitoring, management, and scientific activities. With Moore's Law alive and well, more and more parallelism is being introduced into all computing platforms, at all levels of integration and programming, to achieve higher performance and energy efficiency. Because geometric calibration is one of the most time-consuming processes when using remote sensing images, the aim of this work is to accelerate it by taking advantage of new computing architectures and technologies, focusing in particular on computation over shared-memory multi-threading hardware. A parallel version of the most time-consuming process in remote sensing geometric correction has been implemented using OpenMP directives. This work compares the performance of the original serial binary with that of the parallelized implementation on several modern multi-threaded CPU architectures, and discusses how to find the optimum hardware for cost-effective execution.


Nominal Unification is an extension of first-order unification where terms can contain binders and unification is performed modulo α-equivalence. Here we prove that the existence of nominal unifiers can be decided in quadratic time. First, we linearly reduce nominal unification problems to a sequence of freshness constraints and equalities between atoms, modulo a permutation, using ideas similar to those of Paterson and Wegman for first-order unification. Second, we prove that solvability of these reduced problems may be checked in quadratic time. Finally, we point out how, using ideas of Brown and Tarjan for unbalanced merging, we could solve these reduced problems more efficiently.


We present a non-equilibrium theory in a system with heat and radiative fluxes. The obtained expression for the entropy production is applied to a simple one-dimensional climate model based on the first law of thermodynamics. In the model, the dissipative fluxes are assumed to be independent variables, following the criteria of Extended Irreversible Thermodynamics (EIT), which enlarges, in reference to the classical expression, the applicability of a macroscopic thermodynamic theory for systems far from equilibrium. We analyze the second differential of the classical and the generalized entropy as a criterion for the stability of the steady states. Finally, the extreme state is obtained using variational techniques, and we observe that the system is close to the maximum dissipation rate.


The long-term mean properties of the global climate system and those of turbulent fluid systems are reviewed from a thermodynamic viewpoint. Two general expressions are derived for a rate of entropy production due to thermal and viscous dissipation (turbulent dissipation) in a fluid system. It is shown with these expressions that maximum entropy production in the Earth's climate system suggested by Paltridge, as well as maximum transport properties of heat or momentum in a turbulent system suggested by Malkus and Busse, correspond to a state in which the rate of entropy production due to the turbulent dissipation is at a maximum. Entropy production due to absorption of solar radiation in the climate system is found to be irrelevant to the maximized properties associated with turbulence. The hypothesis of maximum entropy production also seems to be applicable to the planetary atmospheres of Mars and Titan and perhaps to mantle convection. Lorenz's conjecture on maximum generation of available potential energy is shown to be akin to this hypothesis with a few minor approximations. A possible mechanism by which turbulent fluid systems adjust themselves to the states of maximum entropy production is presented as a self-feedback mechanism for the generation of available potential energy. These results tend to support the hypothesis of maximum entropy production that underlies a wide variety of nonlinear fluid systems, including our planet as well as other planets and stars.
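
What it means for heat transport to maximize entropy production can be illustrated with a two-box energy balance toy model. This sketch is not taken from the review: the two-box setup, the linearized outgoing radiation, every parameter value, and the function names are assumptions for illustration. A warm box at temperature T1 exports a heat flux F to a cold box at T2, and the entropy production by that transport is F*(1/T2 - 1/T1).

```python
# Illustrative parameters (assumptions, not values from the text).
I1, I2 = 300.0, 170.0   # absorbed solar flux in warm and cold box, W m^-2
A, B = 203.3, 2.09      # linearized outgoing longwave: A + B*(T - 273.15)


def temperatures(F):
    """Steady-state box temperatures (K) for a given transport F (W m^-2)."""
    T1 = 273.15 + (I1 - F - A) / B   # warm box loses F
    T2 = 273.15 + (I2 + F - A) / B   # cold box gains F
    return T1, T2


def entropy_production(F):
    """Entropy production by heat transport from T1 to T2 (W m^-2 K^-1)."""
    T1, T2 = temperatures(F)
    return F * (1.0 / T2 - 1.0 / T1)


def mep_flux(steps=1000):
    """Grid search for the transport that maximizes entropy production.

    F ranges from 0 (no transport) to (I1 - I2)/2 (boxes isothermal);
    entropy production vanishes at both ends.
    """
    F_iso = (I1 - I2) / 2.0
    candidates = [F_iso * k / steps for k in range(steps + 1)]
    return max(candidates, key=entropy_production)


F_mep = mep_flux()
T1_mep, T2_mep = temperatures(F_mep)
```

With no transport the temperature contrast is large but nothing flows, and with the largest transport the boxes are isothermal, so the entropy production is zero at both extremes and peaks at an intermediate flux. That intermediate state is the kind of maximum the hypothesis refers to.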


The second differential of the entropy is used for analysing the stability of a thermodynamic climatic model. A delay time for the heat flux is introduced whereby it becomes an independent variable. Two different expressions for the second differential of the entropy are used: one follows classical irreversible thermodynamics theory; the second is related to the introduction of response time and is due to the extended irreversible thermodynamics theory. The second differential of the classical entropy leads to unstable solutions for high values of delay times. The extended expression always implies stable states for an ice-free earth. When the ice-albedo feedback is included, a discontinuous distribution of stable states is found for high response times. Following the thermodynamic analysis of the model, the maximum rates of entropy production at the steady state are obtained. A latitudinally isothermal earth produces the extremum in global entropy production. The material contribution to entropy production (by which we mean the production of entropy by material transport of heat) is a maximum when the latitudinal distribution of temperatures becomes less homogeneous than present values.


We investigate the hypothesis that the atmosphere is constrained to maximize its entropy production by using a one-dimensional (1-D) vertical model. We prescribe the lapse rate in the convective layer as that of the standard troposphere. The assumption that convection sustains a critical lapse rate was absent in previous studies, which focused on the vertical distribution of climatic variables, since such a convective adjustment reduces the degrees of freedom of the system and may prevent the application of the maximum entropy production (MEP) principle. This is not the case in the radiative–convective model (RCM) developed here, since we accept a discontinuity of temperatures at the surface similar to that adopted in many RCMs. For current conditions, the MEP state gives a difference between the ground temperature and the air temperature at the surface of ≈10 K. In comparison, conventional RCMs obtain a discontinuity of only ≈2 K. However, the surface boundary layer velocity in the MEP state appears reasonable (≈3 m s⁻¹). Moreover, although the convective flux at the surface in MEP states is almost uniform in optically thick atmospheres, it reaches a maximum value for an optical thickness similar to current conditions. This additional result may support the maximum convection hypothesis suggested by Paltridge (1978).