993 results for Retrieval models
Abstract:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors, so such errors must be taken into account when analyzing EST sequences computationally. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting such regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains performance in detecting coding sequences.
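The HMM machinery behind this kind of approach can be illustrated with a minimal Viterbi decoder. The sketch below is not the authors' model: it assumes a hypothetical two-state HMM ("coding" vs. "error") with invented emission and transition probabilities, just to show how the most likely state path over a nucleotide sequence is recovered.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for an observation sequence (log-space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs[t]]))
                 for p in states),
                key=lambda ps: ps[1])
            V[t][s] = score
            new_path[s] = path[prev] + [s]
        path = new_path
    return path[max(states, key=lambda s: V[-1][s])]

# Toy parameters (illustrative only, not fitted to real EST data).
states = ("coding", "error")
start = {"coding": 0.9, "error": 0.1}
trans = {"coding": {"coding": 0.95, "error": 0.05},
         "error": {"coding": 0.9, "error": 0.1}}
emit = {"coding": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
        "error": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
best_path = viterbi("ACGT", states, start, trans, emit)
```

A real EST model would add dedicated insertion/deletion error states and codon-level emissions; the decoding principle is the same.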
Abstract:
Search engines are part of our daily lives. Currently, more than a third of the world's population uses the Internet, and search engines allow them to quickly find the information or products they want. Information retrieval (IR) is the foundation of modern search engines. Traditional IR approaches assume that index terms are independent. However, terms that appear in the same context are often dependent, and failing to account for these dependencies is one cause of noise (irrelevant results) in retrieval output. Some studies have proposed integrating certain types of dependency, such as proximity, co-occurrence, adjacency, and grammatical dependency. In most cases, the dependency models are built separately and then combined with the traditional word-based model with a constant weight. Consequently, they cannot properly capture variable dependencies and dependency strength. For example, the dependency between the adjacent words "Black Friday" is stronger than that between the words "road constructions". In this thesis, we study different approaches to capturing term relations and their dependency strengths. We propose the following methods: ─ We re-examine the combination approach using different indexing units for Chinese monolingual IR and English-Chinese cross-language IR. In addition to words, we study the possibility of using bigrams and unigrams as translation units for Chinese. Several translation models are built to translate English words into Chinese unigrams, bigrams, and words using a parallel corpus. An English query is then translated in several ways, and a ranking score is produced for each translation.
The final ranking score combines all these types of translation. ─ We account for dependencies between terms using Dempster-Shafer evidence theory. An occurrence of a text fragment (of several words) in a document is considered to represent the set of all its constituent terms. Probability mass is assigned to such a set of terms rather than to each individual term. At query evaluation time, this mass is redistributed to the query terms if they differ. This approach allows us to integrate dependency relations between terms. ─ We propose a discriminative model to integrate the different types of dependency according to their strength and their usefulness for IR. In particular, we consider adjacency and co-occurrence dependencies at different distances, i.e., bigrams and term pairs within windows of 2, 4, 8, and 16 words. The weight of a bigram or of a dependent term pair is determined from a set of features, using SVM regression. All the proposed methods are evaluated on several English and/or Chinese collections, and the experimental results show that these methods yield substantial improvements over the state of the art.
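The window-based dependency features used in the third method (term pairs within windows of 2, 4, 8, and 16 words) can be sketched as simple co-occurrence counts; the SVM-regression weighting itself is not reproduced here, and the example tokens are invented.

```python
from collections import Counter

def window_pair_counts(tokens, window_sizes=(2, 4, 8, 16)):
    """Count unordered term pairs co-occurring within each window size.
    A window of size w counts pairs at most w-1 positions apart."""
    counts = {w: Counter() for w in window_sizes}
    for w in window_sizes:
        for i, t in enumerate(tokens):
            for j in range(i + 1, min(i + w, len(tokens))):
                counts[w][frozenset((t, tokens[j]))] += 1
    return counts

tokens = "black friday sale on black friday".split()
counts = window_pair_counts(tokens)
```

Counts like these, at several window sizes, would be among the features from which a regressor learns that "black friday" deserves a larger dependency weight than a loose pair.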
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
Abstract:
Ice clouds are an important yet largely unvalidated component of weather forecasting and climate models, but radar offers the potential to provide the necessary data to evaluate them. First in this paper, coordinated aircraft in situ measurements and scans by a 3-GHz radar are presented, demonstrating that, for stratiform midlatitude ice clouds, radar reflectivity in the Rayleigh-scattering regime may be reliably calculated from aircraft size spectra if the "Brown and Francis" mass-size relationship is used. The comparisons spanned radar reflectivity values from -15 to +20 dBZ, ice water contents (IWCs) from 0.01 to 0.4 g m^-3, and median volumetric diameters between 0.2 and 3 mm. In mixed-phase conditions the agreement is much poorer because of the higher-density ice particles present. A large midlatitude aircraft dataset is then used to derive expressions that relate radar reflectivity and temperature to ice water content and visible extinction coefficient. The analysis is an advance over previous work in several ways: the retrievals vary smoothly with both input parameters, different relationships are derived for the common radar frequencies of 3, 35, and 94 GHz, and the problem of retrieving the long-term mean and the horizontal variance of ice cloud parameters is considered separately. It is shown that the dependence on temperature arises because of the temperature dependence of the number concentration "intercept parameter" rather than mean particle size. A comparison is presented of ice water content derived from scanning 3-GHz radar with the values held in the Met Office mesoscale forecast model, for eight precipitating cases spanning 39 h over Southern England. It is found that the model predicted mean IWC to within 10% of the observations at temperatures between -30 and -10 degrees C but tended to underestimate it by around a factor of 2 at colder temperatures.
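The general shape of such reflectivity-temperature relations can be sketched as a log-linear expression in Z and T. The coefficients below are purely illustrative placeholders, not the fitted values derived in the paper (which also differ by radar frequency).

```python
def iwc_from_z_t(z_dbz, t_c, a=0.06, b=-0.02, c=-1.7):
    """Illustrative relation: log10(IWC [g m^-3]) = a*Z + b*T + c,
    with Z in dBZ and T in degrees C. The coefficients a, b, c are
    placeholders for demonstration, not the paper's regression results."""
    return 10.0 ** (a * z_dbz + b * t_c + c)
```

With a > 0 the retrieved IWC grows with reflectivity, and with b < 0 a given reflectivity maps to more ice at colder temperatures, qualitatively reflecting the temperature dependence of the intercept parameter noted above.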
Abstract:
A novel framework referred to as collaterally confirmed labelling (CCL) is proposed, aiming at localising the visual semantics to regions of interest in images with textual keywords. Both the primary image and collateral textual modalities are exploited in a mutually co-referencing and complementary fashion. The collateral content- and context-based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods such as the Gaussian distribution or Euclidean distance together with a collateral content- and context-driven inference mechanism. We introduce a novel high-level visual content descriptor that is devised for performing semantic-based image classification and retrieval. The proposed image feature vector model is fundamentally underpinned by the CCL framework. Two different high-level image feature vector models are developed based on the CCL labelling results, for the purposes of image data clustering and retrieval, respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to date already indicate that the proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models.
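The Euclidean-distance variant of the mapping from region-level visual primitives to vocabulary concepts can be sketched as a nearest-centroid assignment. The feature vectors and keyword centroids below are invented for illustration; the CCL framework additionally biases this mapping with collateral text and context, which is not reproduced here.

```python
def label_regions(region_feats, vocab):
    """Assign each region the visual keyword whose centroid is nearest (Euclidean)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(vocab, key=lambda w: d2(f, vocab[w])) for f in region_feats]

# Hypothetical 2-D colour/texture features and keyword centroids.
vocab = {"sky": (0.2, 0.8), "grass": (0.6, 0.3)}
labels = label_regions([(0.25, 0.75), (0.55, 0.35)], vocab)
```

In the full framework, a co-occurrence matrix over these keyword labels then serves as the "collateral context" that corrects implausible assignments.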
Abstract:
We test the response of the Oxford-RAL Aerosol and Cloud (ORAC) retrieval algorithm for MSG SEVIRI to changes in the aerosol properties used in the dust aerosol model, using data from the Dust Outflow and Deposition to the Ocean (DODO) flight campaign in August 2006. We find that using the observed DODO free-tropospheric aerosol size distribution and refractive index increases simulated top-of-atmosphere radiance at 0.55 µm, assuming a fixed aerosol optical depth of 0.5, by 10-15 %, reaching a maximum difference at low solar zenith angles. We test the sensitivity of the retrieval to the vertical distribution of the aerosol and find that this is unimportant in determining simulated radiance at 0.55 µm. We also test the ability of the ORAC retrieval, as used to produce the GlobAerosol dataset, to correctly identify continental aerosol outflow from the African continent, and we find that it poorly constrains aerosol speciation. We develop spatially and temporally resolved prior distributions of aerosols to inform the retrieval, incorporating five aerosol models: desert dust, maritime, biomass burning, urban, and continental. We use a Saharan Dust Index and the GEOS-Chem chemistry transport model to describe dust and biomass burning aerosol outflow, and compare AOD using our speciation against the GlobAerosol retrieval during January and July 2006. We find AOD discrepancies of 0.2-1 over regions of intense biomass burning outflow, where AOD from our aerosol speciation and the GlobAerosol speciation can differ by as much as 50-70 %.
Abstract:
The need for consistent assimilation of satellite measurements for numerical weather prediction led operational meteorological centers to assimilate satellite radiances directly using variational data assimilation systems. More recently there has been a renewed interest in assimilating satellite retrievals (e.g., to avoid the use of relatively complicated radiative transfer models as observation operators for data assimilation). The aim of this paper is to provide a rigorous and comprehensive discussion of the conditions for the equivalence between radiance and retrieval assimilation. It is shown that two requirements need to be satisfied for the equivalence: (i) the radiance observation operator needs to be approximately linear in a region of the state space centered at the retrieval and with a radius of the order of the retrieval error; and (ii) any prior information used to constrain the retrieval should not underrepresent the variability of the state, so as to retain the information content of the measurements. Both these requirements can be tested in practice. When these requirements are met, retrievals can be transformed so as to represent only the portion of the state that is well constrained by the original radiance measurements and can be assimilated in a consistent and optimal way, by means of an appropriate observation operator and a unit matrix as error covariance. Finally, specific cases when retrieval assimilation can be more advantageous (e.g., when the estimate sought by the operational assimilation system depends on the first guess) are discussed.
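Requirement (i) can be tested numerically. The sketch below probes an observation operator H within a ball whose radius is comparable to the retrieval error, comparing H against its finite-difference linearization at the retrieval. The operators, state, and tolerance are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linearity_check(H, x, radius, n=100, tol=0.05, seed=0):
    """Test whether H is approximately linear within `radius` of x
    (requirement (i) above). Returns (passes, worst relative error)."""
    rng = np.random.default_rng(seed)
    eps = 1e-6
    # Finite-difference Jacobian at the retrieval x.
    J = np.array([(H(x + eps * e) - H(x)) / eps for e in np.eye(len(x))]).T
    worst = 0.0
    for _ in range(n):
        d = rng.normal(size=len(x))
        d *= radius / np.linalg.norm(d)  # sample on the sphere of given radius
        err = np.linalg.norm(H(x + d) - H(x) - J @ d) / max(np.linalg.norm(J @ d), 1e-12)
        worst = max(worst, err)
    return worst < tol, worst

# A linear operator passes; a strongly quadratic one fails at this radius.
ok_lin, _ = linearity_check(lambda v: np.array([2 * v[0] + v[1], v[0] - v[1]]),
                            np.array([1.0, 2.0]), radius=0.5)
ok_quad, _ = linearity_check(lambda v: np.array([v[0] ** 2]),
                             np.array([1.0]), radius=1.0)
```

A real radiance operator would be a radiative transfer model, and the radius would come from the retrieval error covariance; the test logic is unchanged.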
Abstract:
The retrieval (estimation) of sea surface temperatures (SSTs) from space-based infrared observations is increasingly performed using retrieval coefficients derived from radiative transfer simulations of top-of-atmosphere brightness temperatures (BTs). Typically, an estimate of SST is formed from a weighted combination of BTs at a few wavelengths, plus an offset. This paper addresses two questions about the radiative transfer modeling approach to deriving these weighting and offset coefficients. How precisely specified do the coefficients need to be in order to obtain the required SST accuracy (e.g., scatter <0.3 K in week-average SST, bias <0.1 K)? And how precisely is it actually possible to specify them using current forward models? The conclusions are that weighting coefficients can be obtained with adequate precision, while the offset coefficient will often require an empirical adjustment of the order of a few tenths of a kelvin against validation data. Thus, a rational approach to defining retrieval coefficients is one of radiative transfer modeling followed by offset adjustment. The need for this approach is illustrated from experience in defining SST retrieval schemes for operational meteorological satellites. A strategy is described for obtaining the required offset adjustment, and the paper highlights some of the subtler aspects involved with reference to the example of SST retrievals from the imager on the geostationary satellite GOES-8.
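The retrieval form discussed here, and the empirical offset adjustment that follows it, can be written down directly. The coefficient values below are invented for illustration only; operational schemes derive the weights from radiative transfer simulation, as the abstract describes.

```python
def sst_estimate(bts, weights, offset):
    """SST as a weighted combination of brightness temperatures plus an offset."""
    return sum(w * bt for w, bt in zip(weights, bts)) + offset

def adjust_offset(offset, retrieved, validated):
    """Empirical offset adjustment: subtract the mean bias against validation SSTs."""
    bias = sum(r - v for r, v in zip(retrieved, validated)) / len(retrieved)
    return offset - bias

# Illustrative split-window-style coefficients (not an operational scheme).
sst = sst_estimate((290.0, 288.0), weights=(3.0, -2.0), offset=1.5)
new_offset = adjust_offset(1.5, retrieved=[295.5, 295.7], validated=[295.3, 295.5])
```

Because the adjustment shifts only the constant term, it leaves the simulation-derived weighting of the channels untouched, which is exactly the "modeling followed by offset adjustment" strategy advocated above.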
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon, and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable-length Markov chains. To carry out this approach, we compare texts from European and Brazilian Portuguese. These texts are first encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion, a constant-free procedure for model selection. As a by-product, this provides a solution to the problem of the optimal choice of the penalty constant when using the BIC to select a variable-length Markov chain. Besides proving the consistency of the smallest maximizer criterion as the sample size diverges, we also conduct a simulation study comparing our approach with both standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
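The role of the BIC penalty in this kind of model selection can be illustrated with fixed-order Markov chains, a simplified stand-in for the variable-length context trees used in the article; the rhythmic encoding and the smallest maximizer criterion itself are not reproduced here.

```python
import math
from collections import Counter

def bic_markov(seq, order):
    """BIC score of a fixed-order Markov chain fitted to seq (higher is better)."""
    ctx, trans = Counter(), Counter()
    for i in range(order, len(seq)):
        c = seq[i - order:i]          # the conditioning context
        ctx[c] += 1
        trans[(c, seq[i])] += 1
    loglik = sum(n * math.log(n / ctx[c]) for (c, s), n in trans.items())
    alphabet = set(seq)
    k = len(alphabet) ** order * (len(alphabet) - 1)  # free parameters
    return loglik - 0.5 * k * math.log(len(seq) - order)

# A strictly alternating sequence: order 1 suffices; order 2 is over-penalized.
s = "ab" * 30
```

For this toy sequence the order-1 model beats both the memoryless model (too little structure) and the order-2 model (same fit, more parameters), which is the trade-off the penalty constant controls.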
Abstract:
The research activity carried out during the PhD course focused on the development of mathematical models of some cognitive processes and their validation against data in the literature, with a double aim: i) to achieve a better interpretation and explanation of the great amount of data obtained on these processes by different methodologies (electrophysiological recordings in animals; neuropsychological, psychophysical, and neuroimaging studies in humans); ii) to exploit model predictions and results to guide future research and experiments. In particular, the research activity focused on two projects: 1) the first concerns the development of networks of neural oscillators, in order to investigate the mechanisms of synchronization of neural oscillatory activity during cognitive processes such as object recognition, memory, language, and attention; 2) the second concerns the mathematical modelling of multisensory integration processes (e.g., visual-acoustic), which occur in several cortical and subcortical regions (in particular in a subcortical structure named the Superior Colliculus (SC)) and which are fundamental for orienting motor and attentive responses to external-world stimuli. This activity was carried out in collaboration with the Center for Studies and Researches in Cognitive Neuroscience of the University of Bologna (in Cesena) and the Department of Neurobiology and Anatomy of the Wake Forest University School of Medicine (NC, USA). PART 1. The representation of objects in a number of cognitive functions, like perception and recognition, involves distributed processes in different cortical areas. One of the main neurophysiological questions concerns how the correlation between these disparate areas is realized, so as to succeed in grouping together the characteristics of the same object (the binding problem) and in keeping segregated the properties belonging to different objects that are simultaneously present (the segmentation problem).
Different theories have been proposed to address these questions (Barlow, 1972). One of the most influential is the so-called "assembly coding" theory, postulated by Singer (2003), according to which: 1) an object is well described by a few fundamental properties, processed in different and distributed cortical areas; 2) recognition of the object is realized by means of the simultaneous activation of the cortical areas representing its different features; 3) groups of properties belonging to different objects are kept separated in the time domain. In Chapter 1.1 and Chapter 1.2 we present two neural network models for object recognition based on the "assembly coding" hypothesis. These models are networks of Wilson-Cowan oscillators which exploit: i) two high-level "Gestalt rules" (the similarity and previous-knowledge rules) to realize the functional link between elements of different cortical areas representing properties of the same object (binding problem); ii) the synchronization of neural oscillatory activity in the γ-band (30-100 Hz) to segregate in time the representations of different objects simultaneously present (segmentation problem). These models are able to recognize and reconstruct multiple simultaneous external objects, even in difficult cases (some wrong or missing features, shared features, superimposed noise). In Chapter 1.3 the previous models are extended to realize a semantic memory, in which sensory-motor representations of objects are linked with words. To this aim, the previously developed network, devoted to the representation of objects as collections of sensory-motor features, is reciprocally linked with a second network devoted to the representation of words (lexical network). Synapses linking the two networks are trained via a time-dependent Hebbian rule, during a training period in which individual objects are presented together with the corresponding words.
Simulation results demonstrate that, during the retrieval phase, the network can deal with the simultaneous presence of objects (from sensory-motor inputs) and words (from linguistic inputs), can correctly associate objects with words, and can segment objects even in the presence of incomplete information. Moreover, the network can realize some semantic links among words representing objects with shared features. These results support the idea that semantic memory can be described as an integrated process, whose content is retrieved by the co-activation of different multimodal regions. In perspective, extended versions of this model may be used to test conceptual theories and to provide a quantitative assessment of existing data (for instance concerning patients with neural deficits). PART 2. The ability of the brain to integrate information from different sensory channels is fundamental to the perception of the external world (Stein et al., 1993). It is well documented that a number of extraprimary areas have neurons capable of such a task; one of the best known of these is the superior colliculus (SC). This midbrain structure receives auditory, visual, and somatosensory inputs from different subcortical and cortical areas, and is involved in the control of orientation to external events (Wallace et al., 1993). SC neurons respond to each of these sensory inputs separately, but the SC is also capable of integrating them (Stein et al., 1993), so that the response to combined multisensory stimuli is greater than that to the individual component stimuli (enhancement). This enhancement is proportionately greater when the modality-specific paired stimuli are weaker (the principle of inverse effectiveness). Several studies have shown that the capability of SC neurons to engage in multisensory integration requires inputs from cortex: primarily the anterior ectosylvian sulcus (AES), but also the rostral lateral suprasylvian sulcus (rLS).
If these cortical inputs are deactivated, the response of SC neurons to cross-modal stimulation is no different from that evoked by the most effective of its individual component stimuli (Jiang et al., 2001). This phenomenon can be better understood through mathematical models: the use of mathematical models and neural networks can place the mass of data that has been accumulated about this phenomenon and its underlying circuitry into a coherent theoretical structure. In Chapter 2.1 a simple neural network model of this structure is presented; this model is able to reproduce a large number of SC behaviours, such as multisensory enhancement, multisensory and unisensory depression, and inverse effectiveness. In Chapter 2.2 this model is improved by incorporating more neurophysiological knowledge about the neural circuitry underlying SC multisensory integration, in order to suggest possible physiological mechanisms through which it is effected. This endeavour was realized in collaboration with Professor B.E. Stein and Doctor B. Rowland during the six-month period spent at the Department of Neurobiology and Anatomy of the Wake Forest University School of Medicine (NC, USA), within the Marco Polo Project. The model includes four distinct unisensory areas that are devoted to a topological representation of external stimuli. Two of them represent subregions of the AES (i.e., FAES, an auditory area, and AEV, a visual area) and send descending inputs to the ipsilateral SC; the other two represent subcortical areas (one auditory and one visual) projecting ascending inputs to the same SC. Different competitive mechanisms, realized by means of populations of interneurons, are used in the model to reproduce the different behaviour of SC neurons under cortical activation and deactivation.
The model, with a single set of parameters, is able to mimic the behaviour of SC multisensory neurons in response to very different stimulus conditions (multisensory enhancement, inverse effectiveness, within- and cross-modal suppression of spatially disparate stimuli), with cortex functional and cortex deactivated, and with a particular type of membrane receptors (NMDA receptors) active or inhibited. All these results agree with the data reported in Jiang et al. (2001) and in Binns and Salt (1996). The model suggests that non-linearities in neural responses and synaptic (excitatory and inhibitory) connections can explain the fundamental aspects of multisensory integration, and provides a biologically plausible hypothesis about the underlying circuitry.
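The qualitative behaviours listed above can be illustrated with a deliberately simple toy neuron, not the network model of the thesis: a sigmoidal unit whose drive is additive when cortex is active and reduced to the best unisensory input when cortex is deactivated. All numbers are invented for illustration.

```python
import math

def sc_response(v, a, cortical=True):
    """Toy SC neuron: sigmoid of summed (cortex active) or best-of (cortex
    deactivated) visual and auditory drive. Purely illustrative."""
    drive = (v + a) if cortical else max(v, a)
    return 1.0 / (1.0 + math.exp(-(8.0 * drive - 4.0)))

def enhancement(v, a, cortical=True):
    """Percent multisensory enhancement over the best unisensory response."""
    multi = sc_response(v, a, cortical)
    best_uni = max(sc_response(v, 0.0, cortical), sc_response(0.0, a, cortical))
    return 100.0 * (multi - best_uni) / best_uni
```

Even this caricature reproduces two signatures: weak paired stimuli yield proportionately larger enhancement than strong ones (inverse effectiveness, from the sigmoid's steep low-drive region), and with "cortex" off the combined response collapses to the best unisensory one, as reported after AES/rLS deactivation.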
Abstract:
Energy efficiency has become an important research topic in intralogistics. In this field the focus is placed especially on automated storage and retrieval systems (AS/RS) utilizing stacker cranes, as these systems are widespread and consume a significant portion of the total energy demand of intralogistics systems. Numerical simulation models have been developed to calculate the energy demand rather precisely for discrete single and dual command cycles. Unfortunately, these simulation models are not suitable for fast calculations of the mean energy demand of a complete storage aisle. For this purpose analytical approaches would be more convenient, but until now analytical approaches only deliver results for certain configurations. In particular, for the commonly used stacker cranes equipped with an intermediate circuit connection within their drive configuration, there is no analytical approach available to calculate the mean energy demand. This article addresses this research gap and presents a calculation approach which enables planners to quickly calculate the energy demand of these systems.
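A much-simplified version of such an aisle-mean calculation can be sketched as follows, assuming energy linear in travel distance and a recoverable fraction of lifting energy as a stand-in for the intermediate circuit connection. All parameters are illustrative, and this is not the article's analytical approach.

```python
def mean_energy_single_cycle(L, H, n_x, n_y, e_h, e_v, recup=0.0):
    """Mean single-command-cycle energy over an n_x-by-n_y rack face.
    e_h: J per metre of horizontal travel (paid in both directions);
    e_v: J per metre of lifting; `recup` is the fraction of lifting energy
    recovered when lowering (stand-in for the intermediate circuit link)."""
    total = 0.0
    for i in range(n_x):
        for j in range(n_y):
            x = (i + 0.5) * L / n_x   # horizontal distance to the bay
            y = (j + 0.5) * H / n_y   # vertical distance to the tier
            total += 2.0 * e_h * x + e_v * y * (1.0 - recup)
    return total / (n_x * n_y)

baseline = mean_energy_single_cycle(10.0, 10.0, 10, 10, e_h=1.0, e_v=2.0)
with_recovery = mean_energy_single_cycle(10.0, 10.0, 10, 10, e_h=1.0, e_v=2.0,
                                         recup=0.5)
```

A closed-form mean exists for this toy model (the averages of x and y are simply L/2 and H/2), which is the kind of shortcut an analytical approach offers over per-cycle simulation.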
Abstract:
Directly imaged exoplanets are unexplored laboratories for the application of the spectral and temperature retrieval method, where the chemistry and composition of their atmospheres are inferred from inverse modeling of the available data. As a pilot study, we focus on the extrasolar gas giant HR 8799b, for which more than 50 data points are available. We upgrade our non-linear optimal estimation retrieval method to include a phenomenological model of clouds that requires the cloud optical depth and monodisperse particle size to be specified. Previous studies have focused on forward models with assumed values of the exoplanetary properties; there is no consensus on the best-fit values of the radius, mass, surface gravity, and effective temperature of HR 8799b. We show that cloud-free models produce reasonable fits to the data if the atmosphere is of super-solar metallicity and non-solar elemental abundances. Intermediate cloudy models with moderate values of the cloud optical depth and micron-sized particles provide an equally reasonable fit to the data and require a lower mean molecular weight. We report our best-fit values for the radius, mass, surface gravity, and effective temperature of HR 8799b. The mean molecular weight is about 3.8, while the carbon-to-oxygen ratio is about unity due to the prevalence of carbon monoxide. Our study emphasizes the need for robust claims about the nature of an exoplanetary atmosphere to be based on analyses involving both photometry and spectroscopy and inferred from beyond a few photometric data points, such as are typically reported for hot Jupiters.
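The reported mean molecular weight can be understood as a mixing-ratio-weighted average over species. The sketch below uses standard molecular masses but an invented composition, not the retrieved abundances of HR 8799b.

```python
# Approximate molecular masses in atomic mass units.
MASSES = {"H2": 2.016, "He": 4.003, "H2O": 18.015, "CO": 28.010, "CH4": 16.043}

def mean_molecular_weight(vmr):
    """Mean molecular weight from volume mixing ratios (should sum to 1)."""
    return sum(x * MASSES[sp] for sp, x in vmr.items())

# A hydrogen-dominated mix sits near 2.3; enriching it with heavy molecules
# such as CO pushes the mean upward, in the spirit of the ~3.8 value above.
mu_solarish = mean_molecular_weight({"H2": 0.85, "He": 0.15})
mu_enriched = mean_molecular_weight({"H2": 0.78, "He": 0.14,
                                     "CO": 0.05, "H2O": 0.03})
```

This is why a super-solar metallicity (more CO and H2O relative to H2) raises the mean molecular weight, and why cloudy fits that need less molecular absorption can get by with a lower value.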
Abstract:
Driven by the latest discoveries enabled by recent technological advances and space missions, the study of asteroids has awakened the interest of the scientific community. In fact, asteroid missions have proliferated in recent years (Hayabusa, Dawn, OSIRIS-REx, ARM, AIM-DART, ...), motivated by their outstanding scientific interest. Asteroids are fundamental constituents in the evolution of the Solar System, can be seen as vast concentrations of valuable natural resources, and are also considered strategic targets for the future of space exploration. It has long been hypothesized that small near-Earth objects (NEOs) could be captured and delivered to the vicinity of the Earth, allowing affordable access to them for in-situ science, resource utilization, and other purposes. On the other side of the balance, asteroids are often seen as potential planetary hazards, since impacts with the Earth happen all the time, and a sufficiently large asteroid could trigger catastrophic events. In spite of the severity of such occurrences, they are also utterly hard to predict. In fact, the rich dynamical behaviour of asteroids, their complex modeling, and observational uncertainties make it exceptionally challenging to predict their future positions accurately enough. This becomes particularly relevant when asteroids undergo close encounters with the Earth, and more so when these recur. In such situations, where mitigation measures may need to be taken, it is of paramount importance to be able to accurately estimate their trajectories and collision probabilities. As a consequence, advanced tools are needed to model their dynamics and accurately predict their orbits, as well as new technological concepts to manipulate their orbits if necessary. The goal of this Thesis is to provide new methods, techniques, and solutions to address these challenges. The contributions of this Thesis fall into two areas: one devoted to the numerical propagation of asteroids, and another to asteroid deflection and capture concepts. Hence, the first part of the dissertation presents novel advances applicable to the high-accuracy dynamical propagation of near-Earth asteroids using regularization and perturbation techniques, with special emphasis on the DROMO method, whereas the second part presents pioneering ideas for asteroid retrieval missions and discusses the use of an "ion beam shepherd" (IBS) for asteroid deflection purposes.